Perhaps my quest for an ultimate IDE ends with Emacs. My goal was to use Emacs as full-flagged Python IDE. This post describes how to setup Anaconda on Emacs. My Setup: OS: Trisquel 8.0 Emacs: GNU Emacs 25.3.2 Quick Key Guide (See full guide) : C-x = Ctrl + x M-x = Alt + x […]
In this tutorial we’ll learn how to create a very basic Blockchain with Python. We will create a Blockchain with just 30 lines of code! The aim is to introduce you to Blockchain programming without getting into inessential details. You should already know fundamentals of Blockchain, if not then you may want to read this […]
Logistic regression with Spark is achieved using MLlib. Logistic regression returns binary class labels that is “0” or “1”. In this example, we consider a data set that consists only one variable “study hours” and class label is whether the student passed (1) or not passed (0). from pyspark import SparkContext from pyspark import SparkContext […]
k-Means clustering with Spark is easy to understand. MLlib comes bundled with k-Means implementation (KMeans) which can be imported from pyspark.mllib.clustering package. Here is a very simple example of clustering data with height and weight attributes. Arguments to KMeans.train: k is the number of desired clusters maxIterations is the maximum number of iterations to run. […]
Functional programming in Python is possible with the use of lambda map reduce and filter functions. This article briefly describe use of each these functions. Lambda : Lambda specifies an anonymous function. It is used to declare a function with no name; When you want to use function only once. But why would you declare […]
k-means clustering algorithm is used to group samples (items) in k clusters; k is specified by the user. The method works by calculating mean distance between cluster centroids and samples, hence the name k-means clustering. Euclidean distance is used as distance measure. See references for more information on the algorithm. This is a article describes k-means […]
Hortonworks sandbox for Hadoop Data Platform (HDP) is a quick and easy personal desktop environment to get started on learning, developing, testing and trying out new features. It saves the user from installation and configuration of Hadoop and other tools. This article explains how to run Python MapReduce word count example using Hadoop Streaming.
Most NLP applications need to look beyond text and HTML documents as information is contained in PDF, ePub or other formats. Apache Tika is a toolkit that extracts meta data and text from documents. There is a REST based Python library for Tika.
We start by training the classifier with training data. It contains questions from cooking.stackexchange.com and their associated tags on the site. Let’s build a classifier that automatically recognize a topic of the question and assign a label to it.