Getting Started with fastText

This article is for those who are getting started with fastText, a text representation and classification library from Facebook AI Research (FAIR). Classification of text documents is an important natural language processing (NLP) task, and fastText is built to handle it quickly. The library is written in C++ but can be used through a Python interface. See the references for the two defining papers.

In this article we’ll discuss installing fastText for Python. pip and Cython are prerequisites; install them if they are not already installed:

#Installing pip (on Debian/Ubuntu the package is named python3-pip)
sudo apt-get install python3-pip

Now install Cython:

#Installing Cython
pip install cython

Finally, install fastText, which may also download other missing packages such as numpy for us:

#Installing fastText
pip install fasttext

And we are done!
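To confirm that the steps above worked, a quick check from Python can help. The following is a minimal sketch, not part of fastText itself: the helper name check_installed is made up for illustration, and the import names Cython, numpy and fasttext are assumed to match the packages installed above. It only asks Python whether each module can be located, without importing it.

```python
from importlib.util import find_spec


def check_installed(module_names):
    """Map each top-level module name to True if Python can locate it."""
    # find_spec returns None when a top-level module cannot be found.
    return {name: find_spec(name) is not None for name in module_names}


if __name__ == "__main__":
    # Assumed import names for the packages installed in this article.
    for name, found in check_installed(["Cython", "numpy", "fasttext"]).items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

If any line reports MISSING, rerun the corresponding pip install command and check that pip belongs to the same Python interpreter you are running.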

See the next post, Text Classification With Python Using fastText.

References:

  1. fastText – Facebook Research
  2. Representation: Enriching Word Vectors with Subword Information, Piotr Bojanowski, Edouard Grave, Armand Joulin, Tomas Mikolov, 2016
  3. Classification: Bag of Tricks for Efficient Text Classification, Armand Joulin, Edouard Grave, Piotr Bojanowski, Tomas Mikolov, 2016
