Extracting Text from PDF Using Apache Tika – Learn NLP

Most NLP applications need to look beyond plain text and HTML documents, because much of the information they need is stored in PDF, ePub, or other formats. Apache Tika is a toolkit that extracts metadata and text from documents, and there is a REST-based Python library for it. …
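Here is a minimal sketch of that workflow. It assumes the `tika` package (tika-python) is installed with `pip install tika` and that a Java runtime is available, because tika-python launches or contacts a Tika server and talks to it over REST. The call used is `parser.from_file`, which returns a dict with `status`, `content`, and `metadata` keys. The helper names `extract_pdf` and `normalize_tika_result`, and the choice to drop blank lines, are illustrative assumptions rather than part of Tika.

```python
def extract_pdf(path: str) -> tuple[str, dict]:
    """Return (text, metadata) for the document at `path` using tika-python."""
    # Imported lazily so the normalisation helper below works without tika installed.
    from tika import parser

    # Sends the file to a Tika server over REST; returns 'status', 'content', 'metadata'.
    parsed = parser.from_file(path)
    return normalize_tika_result(parsed)


def normalize_tika_result(parsed: dict) -> tuple[str, dict]:
    """Tidy the dict returned by tika's parser.from_file (illustrative helper).

    'content' can be None, e.g. for a scanned PDF with no text layer, so fall
    back to an empty string. Tika output often contains many blank lines, so
    strip each line and keep only the non-empty ones.
    """
    content = parsed.get("content") or ""
    lines = [line.strip() for line in content.splitlines()]
    text = "\n".join(line for line in lines if line)
    metadata = parsed.get("metadata") or {}
    return text, metadata
```

For example, `text, meta = extract_pdf("report.pdf")` (a hypothetical file name) gives the plain text and a metadata dict. If the PDF has no extractable text, `content` may be `None`, which is why the helper falls back to an empty string.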

Tutorial: Text Classification With Python Using fastText

Let's build a classifier that automatically recognizes the topic of a question and assigns a label to it. We start by training the classifier on training data: questions from cooking.stackexchange.com and the tags associated with them on the site. …
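fastText's supervised mode expects one example per line, with each label written as `__label__<tag>` before the text. The sketch below assumes you start from (question, tags) pairs and have the `fasttext` Python package installed (`pip install fasttext`). The helper names, the file paths, and the hyperparameters (`epoch=25`, `lr=1.0`, `wordNgrams=2`) are illustrative choices. The punctuation handling is modelled on the preprocessing suggested in the official fastText tutorial.

```python
import re


def to_fasttext_line(question: str, tags: list[str]) -> str:
    """Format one labelled example as '__label__tag1 __label__tag2 text'."""
    labels = " ".join(f"__label__{tag.strip().lower()}" for tag in tags)
    # Lowercase and pad punctuation with spaces so "bread?" and "bread" share a token.
    text = re.sub(r"([.!?,'/()])", r" \1 ", question.lower())
    text = " ".join(text.split())  # collapse the extra whitespace
    return f"{labels} {text}"


def train_classifier(train_path: str, valid_path: str):
    """Train and evaluate a supervised fastText model (hypothetical file paths)."""
    import fasttext  # imported lazily; only needed for training

    model = fasttext.train_supervised(input=train_path, epoch=25, lr=1.0, wordNgrams=2)
    # model.test returns (number of examples, precision@1, recall@1).
    n, precision, recall = model.test(valid_path)
    print(f"N={n}  P@1={precision:.3f}  R@1={recall:.3f}")
    return model
```

Once the model is trained, `model.predict("Which baking dish is best to bake a banana bread ?", k=3)` returns the three most likely labels together with their probabilities.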

Getting Started with fastText – Learn NLP

fastText is a library for text representation and classification developed by Facebook AI Research (FAIR). Classification of text documents is an important natural language processing (NLP) task. fastText is written in C++ but can also be used through a Python interface, and it is very fast. See the references for the two papers that define it. …
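One idea behind fastText's text representation, described in its subword-information paper, is that a word is represented by its character n-grams plus the whole word itself, with `<` and `>` marking the word boundaries. The pure-Python function below is a simplified illustration of that scheme, not fastText's own code: the library also hashes n-grams into a fixed number of buckets. The defaults `minn=3` and `maxn=6` mirror fastText's defaults for unsupervised training.

```python
def char_ngrams(word: str, minn: int = 3, maxn: int = 6) -> list[str]:
    """Character n-grams of `word` in the style of fastText's subword model.

    The word is wrapped in '<' and '>' so that prefixes and suffixes differ
    from the same characters in the middle of a word. The full wrapped word
    is kept as an extra feature.
    """
    wrapped = f"<{word}>"
    grams = []
    for n in range(minn, maxn + 1):
        # Slide a window of width n across the wrapped word.
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    if wrapped not in grams:
        grams.append(wrapped)
    return grams
```

For `where` with n = 3, this yields `<wh`, `whe`, `her`, `ere`, `re>`, plus `<where>`. A word's vector is then built from the vectors of these pieces, which lets fastText produce vectors even for words it did not see during training.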

Programming Computers to Read Stories

Can a computer read stories the way humans do? Computers can certainly read files much faster and more accurately than we can, but the second part of the question, reading the way humans do, is the more important one. When we read a story, we understand it and sense the feelings of the protagonist; the challenge is to make computers do the same. …