Category: Big Data

An Introduction to Hadoop and Hadoop Ecosystem


Welcome to the Hadoop and Big Data series! This is the first article in the series, in which we present an introduction to Hadoop and its ecosystem. In the beginning: In October 2003, a paper titled Google File System (Ghemawat et al.) was published. The paper describes the design and implementation of a scalable distributed file system. This paper, along with another paper on MapReduce, inspired Doug Cutting and...

Setting up Apache Hadoop Single Node Cluster


This guide will help you install a single-node Apache Hadoop cluster on your machine. System Requirements: Ubuntu 16.04, Java 8 installed. 1. Download Hadoop: wget 2. Prepare for Installation: tar xfz hadoop-2.7.0.tar.gz; sudo mv hadoop-2.7.0 /usr/local/hadoop 3. Create Dedicated Group and User: sudo addgroup hadoop; sudo adduser --ingroup hadoop hduser; sudo adduser hduser sudo 4. Switch to Newly...

Configure Anaconda on Emacs


Perhaps my quest for the ultimate IDE ends with Emacs. My goal was to use Emacs as a full-fledged Python IDE. This post describes how to set up Anaconda on Emacs. My Setup: OS: Trisquel 8.0, Emacs: GNU Emacs 25.3.2. Quick Key Guide (see full guide): C-x = Ctrl + x, M-x = Alt + x, RET = ENTER. 1. Downloading and installing Anaconda 1.1 Download: Download Anaconda from here. You should download Python 3.x...

Apache Log Visualization with Matplotlib : Learn Data Science


This post discusses Apache log visualization with the Matplotlib library. First, download the data file used in this example from here. We will require numpy and matplotlib (In [1]: import numpy as np; import matplotlib.pyplot as plt). numpy.loadtxt() can directly load a text file in an...

Building a Movie Recommendation Service with Apache Spark


In this tutorial I’ll show you how to build a movie recommendation service with Apache Spark. Two users are alike if they rated a product similarly. For example, if Alice rated a book 3/5 and Bob rated the same book 3.3/5, they are very much alike. Now if Bob buys another book and rates it 4/5, we should suggest that book to Alice. That’s what a recommender system does. See references...

GraphFrames PySpark Example : Learn Data Science


This post walks through a GraphFrames PySpark example using the shortest path problem. GraphFrames is a Spark package that allows DataFrame-based graphs in Spark. All examples assume Spark version 1.6.2. Including the package with the PySpark shell: pyspark --packages graphframes:graphframes:0.1.0-spark1.6 Code: from pyspark import SparkContext; from pyspark.sql import SQLContext; sc =...

Logistic Regression with Spark : Learn Data Science


Logistic regression with Spark is done using MLlib. Logistic regression returns binary class labels, that is, “0” or “1”. In this example, we consider a data set that consists of only one variable, “study hours”, and the class label is whether the student passed (1) or did not pass (0). from pyspark import SparkContext; import numpy as np...
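To make the idea in the excerpt above concrete, here is a minimal pure-Python sketch of logistic regression on a single "study hours" variable. It does not use Spark or MLlib; the data values, the function names (`train_logistic`, `predict`), the learning rate, and the number of epochs are all made up for illustration and are not taken from the original post.

```python
import math

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(hours, labels, learning_rate=0.1, epochs=5000):
    """Fit weight w and bias b with plain batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(hours)
    for _ in range(epochs):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in zip(hours, labels):
            error = sigmoid(w * x + b) - y  # predicted probability minus label
            grad_w += error * x
            grad_b += error
        w -= learning_rate * grad_w / n
        b -= learning_rate * grad_b / n
    return w, b

def predict(w, b, x):
    """Return class label 1 (passed) or 0 (not passed)."""
    return 1 if sigmoid(w * x + b) >= 0.5 else 0

# Hypothetical data: hours studied and whether the student passed.
study_hours = [1, 2, 3, 4, 5, 6, 7, 8]
passed = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = train_logistic(study_hours, passed)
few_hours = predict(w, b, 2)
many_hours = predict(w, b, 7)
```

With this toy data, a student who studies few hours lands in class 0 and one who studies many hours lands in class 1. A Spark version would hand the same kind of labeled data to MLlib instead of running the loop by hand.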

k-Means Clustering Spark Tutorial : Learn Data Science


k-Means clustering with Spark is easy to understand. MLlib comes bundled with a k-Means implementation (KMeans), which can be imported from the pyspark.mllib.clustering package. Here is a very simple example of clustering data with height and weight attributes. Arguments to KMeans.train: k is the number of desired clusters; maxIterations is the maximum number of iterations to run; runs is the number of...
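For readers who want to see what `k` and a maximum iteration count control, the following is a small pure-Python sketch of the basic k-Means loop (Lloyd's algorithm) on height and weight pairs. It is not MLlib's implementation: the function name `kmeans`, the sample data, and the choice to seed centroids with the first `k` points are assumptions made only for this illustration.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, max_iterations=10):
    """Cluster points into k groups; returns (labels, centroids)."""
    # Assumption for this sketch: seed the centroids with the first k points.
    centroids = [tuple(p) for p in points[:k]]
    labels = [None] * len(points)
    for _ in range(max_iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(
                    sum(dim) / len(members) for dim in zip(*members)
                )
    return labels, centroids

# Hypothetical (height in cm, weight in kg) observations.
people = [(150, 50), (152, 53), (155, 55), (180, 80), (182, 85), (185, 88)]
labels, centroids = kmeans(people, k=2, max_iterations=10)
```

On this toy data the three shorter, lighter people end up in one cluster and the three taller, heavier people in the other. A larger iteration cap only matters when the assignments have not yet settled.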

Apriori Algorithm for Generating Frequent Itemsets


The Apriori algorithm is used to find frequent itemsets. Identifying associations between items in a dataset of transactions can be useful in various data mining tasks. For example, a supermarket can arrange its shelves better if it knows which items are frequently purchased together. The challenge is that given a dataset D having T transactions each with n attributes, how to find...
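The level-wise search behind Apriori can be sketched in a few lines of plain Python. This is a simplified illustration, not the code from the original post: the function name `apriori`, the use of an absolute count for `min_support`, and the tiny supermarket dataset are assumptions chosen for the example.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support count} for itemsets seen in at least
    min_support transactions (min_support is an absolute count here)."""
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        # Number of transactions that contain every item in the itemset.
        return sum(1 for t in transactions if itemset <= t)

    items = {item for t in transactions for item in t}
    current = {
        frozenset([item])
        for item in items
        if support(frozenset([item])) >= min_support
    }
    frequent = {}
    k = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be
        # frequent (the Apriori property), then check the candidate's support.
        current = {
            c
            for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
            and support(c) >= min_support
        }
    return frequent

# Hypothetical supermarket baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
]
frequent_itemsets = apriori(baskets, min_support=2)
```

Here every single item and every pair appears in at least two baskets, while the triple of bread, butter, and milk appears only once and is dropped. The prune step is what keeps the candidate set small on larger datasets.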

Devji Chhanga

I have taught computer science at the University of Kutch since 2011. Kutch is the westernmost district of India. At iDevji, I share tech stories that excite me. You will love reading the blog if you too believe in the disruptive power of technology. Some stories are purely technical, while others involve an empathetic approach to problem solving using technology.
