As a young adult nothing thrilled me more than Jeremy Brett’s performance as Sherlock Holmes. “You know my methods, apply them!” he would say. So let’s try to play Sherlock ourselves. We use Natural Language Tool Kit or NLTK to guess setting of a Sherlock story in terms of its geographic location. In this NLTK example, our approach is very naive: identify the most frequent place mentioned in the story.
We use Named Entity Recognition (NRE) to identify geopolitical entities (GPE) and filter out the most frequent of them. This approach is very naive because there is no pre-processing on the text and GPEs may include other concepts apart from geographic locations such as nationalities. But we want to keep this really simple and fun. So here we go:
#NLTK example #This code reads one text file at a time from nltk import word_tokenize, pos_tag, ne_chunk # read a text file text = file ('filepath/file.txt') # replace \n with a spcae data=text.read().replace('\n', ' ') chunked = ne_chunk (pos_tag ( word_tokenize (data) )) # extract GPEs extracted =  for chunk in chunked: if hasattr (chunk, 'label'): if chunk.label() == 'GPE': extracted.append (''.join (c for c in chunk)) # extract most frequent GPE from collections import Counter count = Counter(extracted) count.most_common(1)
|Sr.||Story||Extracted Location||Actual Setting||Result|
|1.||The Adventure of the Dancing Men||[(‘Norfolk’, 14)]||Norfolk||Success|
|2.||The Adventure of the Solitary Cyclist||[(‘Farnham’, 6)]||Farnham||Success|
|3.||A Scandal in Bohemia||[(‘Bohemia’, 6)]||Bohemia||Success|
|4.||The Red-Headed League||[(‘London’, 7)]||London||Success|
|5.||The Final Problem||[(‘London’, 8)]||London||Success|
|6.||The Greek Interpreter||[(‘Greek’, 15)]||Greece||Fail|
We got 5/6 predictions correct! These are not discouraging results and we may think of using this code somewhere in a more serious application.