Map Reduce Word Count With Python : Learn Data Science
We spent multiple lectures at the university talking about Hadoop architecture. Yes, I even demonstrated the cool playing cards example! In fact, we have an 18-page PDF from our data science lab on the installation. Still, I saw students shy away, perhaps because of the complex installation process involved. This tutorial jumps straight into hands-on coding to help anyone get up and running with Map Reduce. No Hadoop installation is required.
Problem : Count word frequencies (word count) in a file.
Data : Create a file named sample.txt with the following lines. Preferably, create a directory for this tutorial and put all files there, including this one.
my home is kolkata
but my real home is kutch
Mapper : Create a file mapper.py and paste the code below into it. The mapper receives data from stdin, splits it into words and prints the output. Any UNIX/Linux user would know about the beauty of pipes. We’ll later use pipes to send the data from sample.txt to stdin.
#!/usr/bin/env python
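Only the first line of the mapper listing is shown above. As a rough illustration of the description (read stdin, split it into words, print tuples), here is a minimal sketch. It is not the original mapper.py: the function name `map_words` and the tab-separated `word<TAB>1` output format are assumptions made for this example.

```python
#!/usr/bin/env python
"""Hypothetical mapper.py sketch: emits one "word<TAB>1" line per input word."""
import sys


def map_words(lines):
    """Yield a (word, 1) tuple for every whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield word, 1


if __name__ == "__main__":
    for word, count in map_words(sys.stdin):
        # Assumed convention: tab between key and value, so the
        # reducer sketch can split each line on the tab.
        print("%s\t%d" % (word, count))
```

The mapper does no counting of its own. It only tags each word with a 1 and leaves the aggregation to the reducer.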
Reducer : Create a file reducer.py and paste the code below into it. The reducer reads the tuples generated by the mapper and aggregates them.
#!/usr/bin/env python
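The reducer listing above is likewise cut off after its first line. A minimal sketch of a reducer that matches the description might look like the following. It is not the original reducer.py: it assumes the mapper prints tab-separated `word<TAB>count` lines, and the name `reduce_counts` is made up for this example.

```python
#!/usr/bin/env python
"""Hypothetical reducer.py sketch: sums the counts emitted for each word."""
import sys


def reduce_counts(lines):
    """Return a dict that maps each word to the sum of its counts."""
    counts = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Assumed input format: word, a tab, then an integer count.
        word, count = line.rsplit("\t", 1)
        counts[word] = counts.get(word, 0) + int(count)
    return counts


if __name__ == "__main__":
    for word, total in reduce_counts(sys.stdin).items():
        print("%s\t%d" % (word, total))
```

Keeping the totals in a dictionary means the input does not have to arrive sorted by word, which suits a plain pipe from mapper to reducer with no sort step in between.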
Execution : cd to the directory where all the files are kept and make both Python files executable:
chmod +x mapper.py
chmod +x reducer.py
And now we will feed the output of the cat command to the mapper, and the mapper's output to the reducer, using pipes (|). That is, the output of cat goes to the mapper and the mapper’s output goes to the reducer. (Recall that the cat command displays the contents of a file.)
cat sample.txt | ./mapper.py | ./reducer.py
Output :
real 1
Yay, so we get the word counts: real x 1, kutch x 1, is x 2, but x 1, kolkata x 1, home x 2 and my x 2! You can put your questions in the comments section below!
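As a quick sanity check on these numbers, the same counts can be computed in a single process with Python's standard `collections.Counter`. This is only a cross-check sketch and not part of the Map Reduce pipeline. The sample text is copied from sample.txt above.

```python
from collections import Counter

# Contents of sample.txt from this tutorial.
sample = """my home is kolkata
but my real home is kutch
"""

# Split on whitespace and count every word in one pass.
counts = Counter(sample.split())

for word, total in counts.most_common():
    print(word, total)
```

If the pipeline and this check disagree, the mapper's word splitting is the first place to look.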
Satya Nadella Quotes Ghalib at a Presentation in New Delhi
“Hazaaron khwaishein aisi, ke har khwaish pe dum nikle. Bohat nikle mere armaan, fir bhi kam nikle.” People were surprised when Microsoft CEO Satya Nadella quoted the great Rekhta poet Ghalib during a presentation in New Delhi last year. ‘Yet another geek into poetry’, one may wonder. But it perfectly suits the CEO of a tech giant whose products aspire to be as good as humans in some ways. If poetry can ease the complexity of affairs by creatively deploying words, why shouldn’t robots use them?
Also, an ill-informed reading of history undermines innovations and experiments in social and political thought, which I think is a great cause of concern for tech students. For example, civil disobedience as a revolutionary approach to fighting oppression may not occupy any place in the mind of an engineer. It is sheer hypocrisy to talk about singularity and at the same time deny human culture and values any place in it. If the future is what we build today, we should build it good and not evil.
In India, universities are finally moving towards offering students a more diverse learning experience, where an engineering student can study Shakespeare. Though the progress is very slow, the outcomes should be positive with CBCS, the choice based credit system.