NLTK Example: Detecting Geographic Setting of Sherlock Holmes Stories


As a young adult, nothing thrilled me more than Jeremy Brett’s performance as Sherlock Holmes. “You know my methods, apply them!” he would say. So let’s try to play Sherlock ourselves. We use the Natural Language Toolkit (NLTK) to guess the setting of a Sherlock Holmes story in terms of its geographic location. In this NLTK example, our approach is very naive: identify the place mentioned most often in the story.

We use Named Entity Recognition (NER) to identify geopolitical entities (GPEs) and pick out the most frequent one. This approach is very naive because the text is not pre-processed, and GPEs can include concepts other than geographic locations, such as nationalities. But we want to keep this really simple and fun. So here we go:

Code:

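# NLTK example (Python 3)
# This code reads one text file at a time.
# It assumes the NLTK tokenizer, POS tagger and named-entity chunker models
# have already been installed with nltk.download().

from collections import Counter

from nltk import word_tokenize, pos_tag, ne_chunk

# read a text file and replace each newline with a space
with open('filepath/file.txt', encoding='utf-8') as text:
    data = text.read().replace('\n', ' ')

# tokenize, tag parts of speech, then chunk named entities
chunked = ne_chunk(pos_tag(word_tokenize(data)))

# extract GPEs; multi-word names (e.g. "New York") are joined with a space
extracted = []
for chunk in chunked:
    if hasattr(chunk, 'label') and chunk.label() == 'GPE':
        extracted.append(' '.join(c[0] for c in chunk))

# print the most frequent GPE and its count
count = Counter(extracted)
print(count.most_common(1))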
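The code above handles one story at a time. To run it over a whole folder of stories (for example, the plain-text files fetched by the download command in the comments), one option is a small driver like the sketch below. The function names `nltk_gpes` and `top_gpe_per_story` are hypothetical. The default extractor repeats the NER steps above and imports NLTK only when it is actually called, so a different extractor (or a stub for testing) can be passed in instead.

```python
from collections import Counter
from pathlib import Path
from typing import Callable, Dict, List, Optional, Tuple


def nltk_gpes(text: str) -> List[str]:
    """Extract GPE names with NLTK, following the same steps as the code above."""
    # Imported lazily so the driver can be used without NLTK installed.
    from nltk import ne_chunk, pos_tag, word_tokenize

    chunked = ne_chunk(pos_tag(word_tokenize(text)))
    return [' '.join(word for word, _tag in chunk)
            for chunk in chunked
            if hasattr(chunk, 'label') and chunk.label() == 'GPE']


def top_gpe_per_story(folder: str,
                      extract: Callable[[str], List[str]] = nltk_gpes
                      ) -> Dict[str, Optional[Tuple[str, int]]]:
    """Map each .txt file in `folder` to its most frequent GPE and its count."""
    results: Dict[str, Optional[Tuple[str, int]]] = {}
    for path in sorted(Path(folder).glob('*.txt')):
        # same pre-processing as the single-file version: newlines become spaces
        text = path.read_text(encoding='utf-8').replace('\n', ' ')
        top = Counter(extract(text)).most_common(1)
        # None marks a story in which no GPE was found
        results[path.name] = top[0] if top else None
    return results
```

Calling `top_gpe_per_story('stories/')` on a folder of downloaded stories would return one `(place, count)` pair per file, the same information as the "Extracted Location" column in the results below.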

Results:

No. | Story | Extracted location (GPE, count) | Actual setting | Result
1 | The Adventure of the Dancing Men | [('Norfolk', 14)] | Norfolk | Success
2 | The Adventure of the Solitary Cyclist | [('Farnham', 6)] | Farnham | Success
3 | A Scandal in Bohemia | [('Bohemia', 6)] | Bohemia | Success
4 | The Red-Headed League | [('London', 7)] | London | Success
5 | The Final Problem | [('London', 8)] | London | Success
6 | The Greek Interpreter | [('Greek', 15)] | Greece | Fail
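The Result column amounts to an exact string comparison between the top extracted GPE and the actual setting. A minimal sketch of that scoring, using the rows of the table above (the `RESULTS` list and `accuracy` function are illustrative names, not part of NLTK):

```python
# Rows copied from the results table: (story, top extracted GPE, actual setting)
RESULTS = [
    ('The Adventure of the Dancing Men', 'Norfolk', 'Norfolk'),
    ('The Adventure of the Solitary Cyclist', 'Farnham', 'Farnham'),
    ('A Scandal in Bohemia', 'Bohemia', 'Bohemia'),
    ('The Red-Headed League', 'London', 'London'),
    ('The Final Problem', 'London', 'London'),
    ('The Greek Interpreter', 'Greek', 'Greece'),
]


def accuracy(rows):
    """Fraction of stories whose top GPE exactly matches the actual setting."""
    hits = sum(1 for _story, predicted, actual in rows if predicted == actual)
    return hits / len(rows)


print(f'{accuracy(RESULTS):.2f}')  # 0.83
```

Because the comparison is exact, 'Greek' does not match 'Greece', which is why that row counts as a failure.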

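We got 5 of 6 predictions correct! The one failure is exactly the weakness noted earlier: "Greek" is a nationality, not a place. For such a naive approach these are encouraging results, although six stories is a small sample, so the code would need more testing before we use it in a more serious application.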
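One way to handle nationalities is to map them to place names before counting. The sketch below assumes a small, hand-written `DEMONYMS` table (hypothetical, not something NLTK provides) and works on a list of GPE strings such as the `extracted` list built by the code above. With it, the 15 "Greek" mentions in The Greek Interpreter would be counted under "Greece".

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical, hand-written nationality-to-place table; extend as needed.
DEMONYMS: Dict[str, str] = {
    'Greek': 'Greece',
    'English': 'England',
    'French': 'France',
    'German': 'Germany',
}


def most_common_place(gpes: Iterable[str], n: int = 1) -> List[Tuple[str, int]]:
    """Count GPEs after replacing nationality words with their place names."""
    return Counter(DEMONYMS.get(g, g) for g in gpes).most_common(n)
```

For example, `most_common_place(['Greek', 'London', 'Greek'])` returns `[('Greece', 2)]`. An alternative design is to drop nationality words entirely; which choice is right depends on whether a nationality mentioned in a story should count as evidence for its setting.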

References:

  1. Sherlock Holmes Stories in Plain Text
  2. NLTK Documentation

Comments

Anonymous:

For those who want to download all the stories in one go:
wget -r --no-parent https://sherlock-holm.es/stories/plain-text/