The objective of this post is to present an intuitive overview of the features of the pandas DataFrame object. Minimum temperature data from 1901 to 2017, provided by data.gov.in, is used as an example. Table of Contents: What is pandas? Installing pandas Running this example on Kaggle Creating a DataFrame from Excel or CSV Glancing at the […]
Perhaps my quest for an ultimate IDE ends with Emacs. My goal was to use Emacs as a full-fledged Python IDE. This post describes how to set up Anaconda on Emacs. My Setup: OS: Trisquel 8.0 Emacs: GNU Emacs 25.3.2 Quick Key Guide (See full guide) : C-x = Ctrl + x M-x = Alt + x […]
Cryptocurrencies are changing the way people buy or sell anonymously. But they are plagued by hype, regulatory issues and possible abuse. As a result of so much buzz around the term, many think cryptocurrency and blockchain are the same thing. That is not true; cryptocurrencies are just one of the many blockchain applications. Blockchain is the […]
In this tutorial we’ll learn how to create a very basic Blockchain with Python. We will create a Blockchain with just 30 lines of code! The aim is to introduce you to Blockchain programming without getting into inessential details. You should already know the fundamentals of Blockchain; if not, you may want to read this […]
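To give a flavour of what such a basic chain might look like, here is a minimal sketch. It is not the code from the tutorial itself; the block layout (a plain dict) and the helper names `block_hash`, `make_block` and `is_valid` are assumptions made for illustration. Each block stores the hash of the previous block, so changing any earlier block breaks the chain.

```python
import hashlib
import json


def block_hash(index, data, previous_hash):
    """Return the SHA-256 hex digest of a block's contents."""
    payload = json.dumps(
        {"index": index, "data": data, "previous_hash": previous_hash},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def make_block(index, data, previous_hash):
    """Build a block as a plain dict (hypothetical layout for this sketch)."""
    return {
        "index": index,
        "data": data,
        "previous_hash": previous_hash,
        "hash": block_hash(index, data, previous_hash),
    }


def is_valid(chain):
    """Check every stored hash and every link to the previous block."""
    for i, block in enumerate(chain):
        expected = block_hash(block["index"], block["data"], block["previous_hash"])
        if block["hash"] != expected:
            return False
        if i > 0 and block["previous_hash"] != chain[i - 1]["hash"]:
            return False
    return True


# Start with a genesis block, then append two made-up transactions.
chain = [make_block(0, "genesis", "0")]
for data in ["Alice pays Bob 5", "Bob pays Carol 2"]:
    previous = chain[-1]
    chain.append(make_block(previous["index"] + 1, data, previous["hash"]))

print(is_valid(chain))
```

Editing the data of any block without recomputing every later hash makes `is_valid` return False, which is the tamper-evidence property the chain is built for.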
The temptation to earn Bitcoins is understandable, as Bitcoin is valued as the world’s most expensive currency at current rates. But who mints Bitcoins? How are they distributed? And most importantly, how can you earn them? These are the questions we will answer in this beginner’s guide to Bitcoin mining. But let me give you a […]
Proof of Work (PoW) is a consensus algorithm used in the original Bitcoin implementation. In a Blockchain system, new transactions are periodically added by packaging them in a block. This block is then added to the Blockchain. Please read What is Blockchain Technology if you are not already familiar with it. Background Users send each other digital […]
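The core idea of Proof of Work can be sketched in a few lines. This is a simplified illustration, not Bitcoin's actual scheme (Bitcoin hashes a block header twice with SHA-256 and compares against a numeric target); the function names and the "leading zeros" difficulty rule here are assumptions chosen for readability. Finding the nonce takes many attempts, while checking it takes a single hash.

```python
import hashlib


def digest_for(block_data, nonce):
    """Hash the block data together with a candidate nonce."""
    return hashlib.sha256(f"{block_data}{nonce}".encode("utf-8")).hexdigest()


def proof_of_work(block_data, difficulty=3):
    """Search for a nonce whose digest starts with `difficulty` zero hex digits."""
    prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = digest_for(block_data, nonce)
        if digest.startswith(prefix):
            return nonce, digest
        nonce += 1


def verify(block_data, nonce, difficulty=3):
    """Verification needs only one hash, however long the search took."""
    return digest_for(block_data, nonce).startswith("0" * difficulty)


nonce, digest = proof_of_work("Alice pays Bob 5")
print(nonce, digest)
```

Raising `difficulty` by one multiplies the expected search time by 16, since each extra hex digit must also be zero.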
What is blockchain technology? This is perhaps the most talked-about question in tech right now. A blockchain, as the name suggests, is a chain of blocks. Each block contains some information; a blockchain can store complete information about a financial transaction, a contract or a medical record. An important property of a blockchain is that the data stored […]
MapReduce is a great approach to problem solving. It is very popular too, but MapReduce examples other than word count are scarce on the web. This article describes MapReduce problem solving that goes beyond word count.
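As one example of a non-word-count problem, the sketch below finds the maximum reading per key using the map, shuffle and reduce phases written out in plain Python. The input records and the function names are made up for illustration, and no Hadoop cluster is involved; the point is only to show the shape of the computation.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical input records: (city, temperature reading).
records = [
    ("Pune", 31), ("Delhi", 40), ("Pune", 35),
    ("Delhi", 38), ("Mumbai", 33),
]


def mapper(record):
    """Emit one (key, value) pair per input record."""
    city, temperature = record
    return city, temperature


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key, values):
    """Collapse all values for one key into a single result."""
    return key, reduce(max, values)


grouped = shuffle(map(mapper, records))
max_by_city = dict(reducer(key, values) for key, values in grouped.items())
print(max_by_city)  # {'Pune': 35, 'Delhi': 40, 'Mumbai': 33}
```

Swapping `max` for another associative function (sum, min, set union) changes the problem being solved without touching the map or shuffle steps.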
Arduino uses asynchronous serial communication to send and receive data to and from other devices. Arduino Uno supports serial communication via its on-board UART port and Tx/Rx pins. Generally this transmission happens at 9600 bits per second, which is termed the baud rate.
This Arduino pushbutton example shows how to read a pushbutton with Arduino Uno. Pushbuttons (also spelled push-buttons) are widely used in calculators, phones and appliances. A pushbutton closes the circuit when pushed and keeps it closed for as long as it is held down. As soon as you release the button the circuit is open again. Here, we are going to […]
In this article we’ll learn how to display hello world on an LCD by interfacing a 16×2 LCD with Arduino Uno and displaying some text on it. Writing Hello World pleases the gods of any new programming language that you want to learn. But in the case of Arduino it is the LED blinking program that is generally written first. Any […]
LED Blinking on Arduino Uno should be your first Arduino project. Arduino Uno has an on-board or built-in LED; in this project we will see how to blink it. Parts you will need: An Arduino Uno Steps: Connect Arduino with your computer. Download, install and open Arduino IDE. Within IDE choose File Menu > Examples […]
Arduino is an open-source hardware prototyping platform. It is widely used today in electronics projects because it is easy to learn, simple in design, well documented and inexpensive. We call it a platform because it is both a hardware circuit and a piece of software, the IDE. It also has its own programming language. All these […]
A major threat to your privacy emanates from your smartphone. These devices have become the central medium of social interaction for everyone. Android is the only open source platform among the popular smartphone operating systems. But even with Android your privacy is under attack as large corporations have figured out that data is the […]
Searching and sorting are two basic problems which occur in various more complex computing problems. The time complexity of comparison-based sorting algorithms cannot get better than O(n log n). Many popular algorithms have quadratic time complexity, O(n^2). Undoubtedly sorted lists are interesting because we can solve many problems using them: from binary search to Kruskal’s algorithm. This […]
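Binary search is the simplest illustration of why a sorted list is worth having. The sketch below uses the standard library's `bisect_left`; the wrapper name `binary_search` and the sample values are assumptions for this example.

```python
from bisect import bisect_left


def binary_search(sorted_items, target):
    """Return the index of target in sorted_items, or -1 if it is absent."""
    position = bisect_left(sorted_items, target)
    if position < len(sorted_items) and sorted_items[position] == target:
        return position
    return -1


values = sorted([42, 7, 19, 3, 25])  # [3, 7, 19, 25, 42]
print(binary_search(values, 19))     # 2
print(binary_search(values, 8))      # -1
```

Each lookup takes O(log n) comparisons, so the one-time O(n log n) cost of sorting pays off once the list is searched many times.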
Zeno was a Greek philosopher who lived circa 490 to 430 BC. Zeno’s paradoxes have puzzled us for more than 2500 years now; three of them are presented here for you to ponder upon. 1. Achilles and the tortoise In a race, the quickest runner can never overtake the slowest, since the pursuer must […]
This post discusses Apache log visualization with the Matplotlib library. First, download the data file used in this example. We will require numpy and matplotlib: import numpy as np, import matplotlib.pyplot as plt. numpy.loadtxt() can directly load a text file into an array. requests-idevji.txt contains only the hour in which each request was made; this is achieved […]
As a young adult nothing thrilled me more than Jeremy Brett’s performance as Sherlock Holmes. “You know my methods, apply them!” he would say. So let’s try to play Sherlock ourselves. We use the Natural Language Toolkit, or NLTK, to guess the setting of a Sherlock story in terms of its geographic location. In this NLTK […]
In this tutorial I’ll show you how to build a movie recommendation service with Apache Spark. Two users are alike if they rated a product similarly. For example, if Alice rated a book 3/5 and Bob rated the same book 3.3/5, they are very much alike. Now if Bob buys another book and rates it 4/5 […]
In this post, a GraphFrames PySpark example is discussed using the shortest path problem. GraphFrames is a Spark package that allows DataFrame-based graphs in Spark. Spark version 1.6.2 is considered for all examples. Including the package with the PySpark shell: pyspark --packages graphframes:graphframes:0.1.0-spark1.6 Code: from pyspark import SparkContext from pyspark.sql import SQLContext sc = SparkContext […]
Logistic regression with Spark is achieved using MLlib. Logistic regression returns binary class labels, that is, “0” or “1”. In this example, we consider a data set that consists of only one variable, “study hours”, and the class label is whether the student passed (1) or did not pass (0). from pyspark import SparkContext […]
k-Means clustering with Spark is easy to understand. MLlib comes bundled with a k-Means implementation (KMeans), which can be imported from the pyspark.mllib.clustering package. Here is a very simple example of clustering data with height and weight attributes. Arguments to KMeans.train: k is the number of desired clusters; maxIterations is the maximum number of iterations to run. […]
The Apriori algorithm is used for finding frequent itemsets. Identifying associations between items in a dataset of transactions can be useful in various data mining tasks. For example, a supermarket can arrange its shelves better if it knows which items are frequently purchased together. The challenge is that given a dataset D having T transactions each […]
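A compact sketch of the level-wise search behind Apriori is given below. The transactions, the `min_support` threshold (an absolute count here) and the function name are assumptions for illustration; real implementations add further optimisations. The key idea is that an itemset can only be frequent if all of its subsets are frequent, so candidates of size k are built only from frequent itemsets of size k-1.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    items = {item for t in transactions for item in t}
    candidates = {frozenset([item]) for item in items}
    frequent = {}
    size = 1
    while candidates:
        # Count how many transactions contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        size += 1
        # Join step: union pairs of frequent itemsets into candidates of the next size.
        joined = {a | b for a in level for b in level if len(a | b) == size}
        # Prune step: every (size - 1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, size - 1))
        }
    return frequent


# Hypothetical shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
]
frequent = apriori(baskets, min_support=2)
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

With these baskets every single item and every pair is frequent, while the triple appears in only one basket and is dropped.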
Functional programming in Python is possible with the use of the lambda, map, reduce and filter functions. This article briefly describes the use of each of these functions. Lambda: A lambda specifies an anonymous function. It is used to declare a function with no name, when you want to use the function only once. But why would you declare […]
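A quick sketch of the four tools working together, assuming Python 3 (where `reduce` lives in `functools` and `map` and `filter` return lazy iterators, hence the `list()` calls). The variable names are illustrative only.

```python
from functools import reduce

numbers = [1, 2, 3, 4, 5, 6]

# lambda: a small anonymous function, bound to a name here only for reuse.
square = lambda x: x * x

# map: apply a function to every element.
squares = list(map(square, numbers))                  # [1, 4, 9, 16, 25, 36]

# filter: keep only the elements for which the function returns True.
evens = list(filter(lambda x: x % 2 == 0, numbers))   # [2, 4, 6]

# reduce: fold the sequence into a single value, left to right.
total = reduce(lambda acc, x: acc + x, numbers)       # 21

print(squares, evens, total)
```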
Intuitive partitioning, or natural partitioning, is used in data discretization. Data discretization is the process of converting continuous values of an attribute into categorical data, partitions or intervals. Discretization helps reduce data size by reducing the number of possible values. Instead of storing every observation we can store only the partition range in which each observation […]
The binary logarithm, log2 n, is the power to which the number 2 must be raised to obtain the value n. The binary logarithm (like other logarithms) has numerous applications in computer science. Take the analysis of algorithms, for example. All algorithms have a running time, also called the time complexity of the algorithm.
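The link between log2 n and running time shows up whenever a problem is halved at every step, as in binary search. The small sketch below (the function name `halvings` is an assumption for this example) counts the halvings directly and compares the result with `math.log2`.

```python
import math


def halvings(n):
    """Count how many times n can be halved (integer division) before reaching 1."""
    steps = 0
    while n > 1:
        n //= 2
        steps += 1
    return steps


print(math.log2(8))      # 3.0, because 2 ** 3 == 8
print(halvings(1024))    # 10, because 2 ** 10 == 1024
print(halvings(1000), math.floor(math.log2(1000)))  # 9 9
```

For any positive integer n, the number of halvings equals the floor of log2 n, which is why algorithms that halve their input are said to run in O(log n) time.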
The k-means clustering algorithm is used to group samples (items) into k clusters; k is specified by the user. The method assigns each sample to its nearest cluster centroid and then recomputes each centroid as the mean of the samples assigned to it, hence the name k-means clustering. Euclidean distance is used as the distance measure. See references for more information on the algorithm. This article describes k-means […]
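The assign-then-average loop can be sketched in plain Python as follows. This is an illustrative version only, assuming Python 3.8+ for `math.dist`; the sample points, the choice of initial centroids and the function names are made up for this example, and a production implementation would pick the initial centroids more carefully.

```python
import math


def mean_point(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    return tuple(sum(axis) / len(points) for axis in zip(*points))


def k_means(samples, centroids, max_iterations=100):
    """Repeat: assign each sample to its nearest centroid, then recompute the means."""
    clusters = [[] for _ in centroids]
    for _ in range(max_iterations):
        clusters = [[] for _ in centroids]
        for sample in samples:
            nearest = min(
                range(len(centroids)),
                key=lambda i: math.dist(sample, centroids[i]),  # Euclidean distance
            )
            clusters[nearest].append(sample)
        # Keep the old centroid if a cluster ends up empty.
        updated = [
            mean_point(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if updated == centroids:  # converged: no centroid moved
            break
        centroids = updated
    return centroids, clusters


# Hypothetical 2-D samples forming two obvious groups.
samples = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.0), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, clusters = k_means(samples, centroids=[samples[0], samples[3]])
print(centroids)
print(clusters)
```

Here k is implied by the number of initial centroids passed in (two), and the loop stops as soon as an iteration leaves every centroid unchanged.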
Hadoop Distributed File System, or HDFS, is the underlying storage for all Hadoop applications. HDFS can be manipulated using APIs such as the Java API or the REST API, but the HDFS shell is the most commonly used option. Below is a list of ten commonly used HDFS commands. 1. Invoking the file system: HDFS Shell supports […]
The Hortonworks sandbox for Hortonworks Data Platform (HDP) is a quick and easy personal desktop environment to get started on learning, developing, testing and trying out new features. It saves the user from installing and configuring Hadoop and other tools. This article explains how to run a Python MapReduce word count example using Hadoop Streaming.
Most NLP applications need to look beyond text and HTML documents, as information is also contained in PDF, ePub or other formats. Apache Tika is a toolkit that extracts metadata and text from documents. There is a REST-based Python library for Tika.
We start by training the classifier with training data. The data contains questions from cooking.stackexchange.com and their associated tags on the site. Let’s build a classifier that automatically recognizes the topic of a question and assigns a label to it.
E-commerce fails to deliver an exploratory shopping experience where a customer does not know what she is looking for. Online stores are too organized, with products arranged in categories and sub-categories.
fastText is a text representation and classification library from Facebook Research, developed by the FAIR lab. Classification of text documents is an important natural language processing (NLP) task. fastText is written in C++ but can be accessed through a Python interface. It is very fast. See references for two defining papers.
Can a computer read stories the way humans do? Of course, computers can read from files much faster and more accurately, but the second part of the question is more important. When we read a story we understand it and we read the feelings of the protagonist; the challenge here is to make computers do the same.
Nadella talks about culture, empathy, philosophy and trust, apart from technology, in his Hit Refresh: The Quest to Rediscover Microsoft’s Soul and Imagine a Better Future for Everyone. Here are the six most powerful thoughts from the book that inspired me and will surely help you stay grounded.