As a young adult nothing thrilled me more than Jeremy Brett’s performance as Sherlock Holmes. “You know my methods, apply them!” he would say. So let’s try to play Sherlock ourselves. We use Natural Language Tool Kit or NLTK to guess setting of a Sherlock story in terms of its geographic location. In this NLTK […]
Most NLP applications need to look beyond text and HTML documents as information is contained in PDF, ePub or other formats. Apache Tika is a toolkit that extracts meta data and text from documents. There is a REST based Python library for Tika.
We start by training the classifier with training data. It contains questions from cooking.stackexchange.com and their associated tags on the site. Let’s build a classifier that automatically recognize a topic of the question and assign a label to it.
fastText is a text representation and classification library from Facebook Research developed by FAIR lab. Classification of text documents is an important natural language processing (NLP) task. It is originally written in C++ but can be accessed using Python interface. It is massively fast. See references for two defining papers.
Can a computer read the stories, the way humans do? Of course computers can read from files much faster and accurately but second part of the question is more important. When we read a story we understand it we read the feelings of the protagonist, challenge here is to make computers do the same.