This post is a continuation of the previous post, Text Classification With Python Using fastText. It describes how to improve the fastText classifier using various techniques.

### More on Precision and Recall

Precision: the number of correct labels among all the labels predicted by the classifier.

Recall: the number of real labels successfully predicted by the classifier, out of all the real labels.

Example:

Why not put knives in the dishwasher?

This question has three labels on StackExchange: **equipment**, **cleaning** and **knives**.

Let us obtain the top five labels predicted by our model (k = number of top labels to predict):

```
# predict the top five labels for the example question
text = ['Why not put knives in the dishwasher?']
labels = classifier.predict(text, k=5)
print labels
```

This gives **food-safety**, **baking**, **equipment**, **substitutions** and **bread**.

One out of five labels predicted by the model is correct, giving a precision of 0.20. Out of the three real labels, only one is predicted by the model, giving a recall of 0.33.
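As a quick sanity check (plain Python, not part of the fastText API), these two numbers can be reproduced directly from the predicted and real label sets:

```
# compute precision and recall for the single example above
predicted = set(['food-safety', 'baking', 'equipment', 'substitutions', 'bread'])
real = set(['equipment', 'cleaning', 'knives'])

correct = predicted & real
precision = float(len(correct)) / len(predicted)  # 1/5 = 0.20
recall = float(len(correct)) / len(real)          # 1/3 = 0.33

print precision, recall
```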

### Improving the Model

So far we have run the model with default parameters and the training data as it is. Now let’s tweak it a little. We will employ the following techniques to improve the model:

- Preprocessing the data
- Changing the number of epochs (using the option **epoch**, standard range [5 – 50])
- Changing the learning rate (using the option **lr**, standard range [0.1 – 1.0])
- Using word n-grams (using the option **wordNgrams**, standard range [1 – 5])

We will apply these techniques one by one and observe the improvement in precision and recall at each stage.

__Preprocessing The Data__

Preprocessing includes separating punctuation characters from words and converting the entire text to lower case.

```
cat cooking.stackexchange.txt | sed -e "s/\([.\!?,'/()]\)/ \1 /g" | tr "[:upper:]" "[:lower:]" > cooking.preprocessed.txt
head -n 12404 cooking.preprocessed.txt > cooking.train
tail -n 3000 cooking.preprocessed.txt > cooking.valid
```
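
If you prefer to stay in Python, here is a rough equivalent of the shell pipeline above (a minimal sketch assuming the same file names; the sed/tr one-liner is what was actually used for the results below):

```
import re

# pad punctuation with spaces and lower-case the text,
# mirroring the sed/tr pipeline above
def preprocess(line):
    line = re.sub(r"([.\!?,'/()])", r" \1 ", line)
    return line.lower()

with open('cooking.stackexchange.txt') as f:
    lines = [preprocess(line) for line in f]

with open('cooking.train', 'w') as f:
    f.writelines(lines[:12404])
with open('cooking.valid', 'w') as f:
    f.writelines(lines[-3000:])
```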

```
classifier = fasttext.supervised('cooking.train', 'model_cooking')
result = classifier.test('cooking.valid')
print result.precision
0.161
print result.recall
0.0696266397578
```

So after preprocessing, both precision and recall have improved.

__More Epochs and an Increased Learning Rate__

The number of epochs can be set using the **epoch** parameter. The default value is 5; we are going to set it to 25. More epochs mean longer training time, but it is worth it.

```
classifier = fasttext.supervised('cooking.train', 'model_cooking', epoch=25)
result = classifier.test('cooking.valid')
print result.precision
0.493
print result.recall
0.213204555283
```

Now let’s change the learning rate with the **lr** parameter:

```
classifier = fasttext.supervised('cooking.train', 'model_cooking', lr=1.0)
result = classifier.test('cooking.valid')
print result.precision
0.546
print result.recall
0.236125126135
```

Results with both epoch and lr together:

```
classifier = fasttext.supervised('cooking.train', 'model_cooking', epoch=25, lr=1.0)
result = classifier.test('cooking.valid')
print result.precision
0.565
print result.recall
0.244630243621
```

__Using Word n-grams__

Word n-grams capture the ordering of adjacent tokens in the text (see examples of word n-grams on Wikipedia).
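
As a small illustration (plain Python, not fastText itself), here is what the word bigrams (n = 2) of our example question look like:

```
# illustrative only: word bigrams (n = 2) of the example question
tokens = 'why not put knives in the dishwasher ?'.split()
bigrams = [' '.join(tokens[i:i + 2]) for i in range(len(tokens) - 1)]
print bigrams
# ['why not', 'not put', 'put knives', 'knives in', 'in the', 'the dishwasher', 'dishwasher ?']
```

With **wordNgrams** set to 2, fastText uses such bigrams as additional features on top of the individual words.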

```
# word_ngrams corresponds to the wordNgrams command-line option
classifier = fasttext.supervised('cooking.train', 'model_cooking', epoch=25, lr=1.0, word_ngrams=2)
result = classifier.test('cooking.valid')
print result.precision
???#
print result.recall
???#
```

*# I am unable to show results for word n-grams because Python on my system keeps crashing. I will update the post asap.*