The Hortonworks Sandbox for the Hortonworks Data Platform (HDP) is a quick and easy personal desktop environment for learning, developing, testing and trying out new features. It saves the user from installing and configuring Hadoop and related tools. This article explains how to run a Python MapReduce word count example using Hadoop Streaming.
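
The mapper.py and reducer.py scripts used in this article come from a download and are not listed here. As a rough sketch only, and not the author's actual code, a Hadoop Streaming word count usually follows this pattern: the mapper emits a tab-separated word and count for every word it reads, Hadoop sorts those records by key, and the reducer sums the counts of adjacent identical keys. The function names map_lines and reduce_lines and the sample sentence are hypothetical; in a real job each function would sit in its own script and loop over sys.stdin.

```python
#!/usr/bin/env python
"""Sketch of a Hadoop Streaming word count (not the article's download)."""
from itertools import groupby


def map_lines(lines):
    """Mapper logic: emit one 'word<TAB>1' record per word."""
    for line in lines:
        for word in line.split():
            yield "%s\t1" % word


def reduce_lines(sorted_lines):
    """Reducer logic: sum the counts of each run of identical keys.

    Hadoop Streaming hands the reducer its input sorted by key, so all
    records for one word arrive next to each other.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield "%s\t%d" % (word, total)


if __name__ == "__main__":
    # Local dry run with a hypothetical sentence: sorted() stands in
    # for the sort that Hadoop performs between the map and reduce steps.
    sample = ["to be or not to be"]
    for record in reduce_lines(sorted(map_lines(sample))):
        print(record)
```

Run locally, the sketch prints be 2, not 1, or 1 and to 2 (tab-separated), which mirrors the sort-then-sum flow Hadoop applies between the mapper and the reducer.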


The minimum system requirement is 8 GB of RAM for the sandbox VM, so in practice the host machine needs 10 GB or more to run a VM with 8 GB. If your machine does not meet this requirement, you can try the sandbox on a cloud service such as Azure, AWS or Google Cloud.

This article uses examples based on HDP 2.3.2 running in Oracle VirtualBox on an Ubuntu 16.04 host.

Download and Installation: Follow this guide from Hortonworks to install the sandbox on Oracle VirtualBox.


  1. Download the example code and data from here.
  2. Start the sandbox image from VirtualBox.
  3. From Ubuntu’s web browser, log in to the dashboard with username/password: raj_ops/raj_ops
  4. From the dashboard GUI, create a directory named input at the HDFS root (the job in step 8 reads it as /input).
  5. Upload sample.txt to input using Ambari > Files View > Upload.
  6. Again from the web browser, log in to the HDP shell with username/password: root/password
  7. From a shell on the Ubuntu host, copy mapper.py and reducer.py to the sandbox using the following secure copy (scp) commands:
    scp -P 2222 /home/username/Downloads/mapper.py  root@sandbox.hortonworks.com:/
    scp -P 2222 /home/username/Downloads/reducer.py  root@sandbox.hortonworks.com:/
  8. Run the job using:
    hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-streaming.jar \
     -input /input -output /output -mapper /mapper.py -reducer /reducer.py

    Note: Do not create the output directory in advance. Hadoop creates it, and the job fails if the directory already exists.

  9. Test output:
    hadoop fs -cat /output/part-00000
    real 1
    my 2
    is 2
    but 1
    kolkata 1
    home 2
    kutch 2
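
To check the result without reading it by eye, the listing can be parsed into a dictionary. This is a minimal sketch, assuming each line is a word followed by whitespace and an integer count (Hadoop Streaming normally separates key and value with a tab). parse_counts is a hypothetical helper, and the sample text is the listing shown in step 9.

```python
def parse_counts(text):
    """Parse 'word count' lines into a dict mapping word -> int."""
    counts = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the last run of whitespace so a tab or a space both work
        word, count = line.rsplit(None, 1)
        counts[word] = counts.get(word, 0) + int(count)
    return counts


# The listing from step 9, pasted as text
LISTING = """\
real 1
my 2
is 2
but 1
kolkata 1
home 2
kutch 2
"""

counts = parse_counts(LISTING)
print("distinct words:", len(counts))
print("total words:", sum(counts.values()))
```

For the listing above this reports 7 distinct words and 11 words in total, which can be compared against a manual count of sample.txt.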




Vivian Anto

Hello sir,
I'm not able to access the HDP shell using the browser (it cannot display the website).
I have finished up to step number 5.
Can you please help?


I cannot do it either. It just prints:

ssh: connect to host sandbox.hortonworks.com port 2222: Connection refused
lost connection