Proof of Work Explained Simply : Learn Blockchain

proof of work

Proof of Work (PoW) is a consensus algorithm used in the original Bitcoin implementation. In a blockchain system, new transactions are periodically packaged into a block, which is then added to the blockchain. Please read What is Blockchain Technology if you are not already familiar with it. Background: Users send each other digital […]
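The core idea can be sketched in a few lines of Python. This is a simplified illustration, not the actual Bitcoin implementation: difficulty is measured here as leading hex zeros, and the block data is a made-up string.

```python
import hashlib

def proof_of_work(block_data: str, difficulty: int = 4) -> int:
    """Find a nonce such that SHA-256(block_data + nonce) starts with
    `difficulty` hex zeros -- a simplified Bitcoin-style target."""
    prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()
        if digest.startswith(prefix):
            return nonce
        nonce += 1

nonce = proof_of_work("block: alice pays bob 1 BTC", difficulty=4)
digest = hashlib.sha256(f"block: alice pays bob 1 BTC{nonce}".encode()).hexdigest()
print(nonce, digest)
```

Finding the nonce takes many hash attempts, but anyone can verify the result with a single hash, which is what makes the scheme work as a proof.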

Merge Sort : Why Sorting Lists in Halves is Interesting?

merge sort

Searching and sorting are two basic problems that occur inside many more complex computing problems. Comparison-based sorting algorithms cannot do better than O(n log n) time complexity, while many popular algorithms have polynomial time complexity O(n²). Sorted lists are undoubtedly interesting because many problems build on them, from binary search to Kruskal's algorithm. This […]
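Merge sort achieves the O(n log n) bound by splitting the list in halves, sorting each half recursively, and merging the sorted halves. A minimal Python sketch:

```python
def merge_sort(items):
    """Sort by recursively splitting into halves and merging the
    sorted halves -- O(n log n) comparisons."""
    if len(items) <= 1:
        return items
    mid = len(items) // 2
    left = merge_sort(items[:mid])
    right = merge_sort(items[mid:])
    # Merge two already-sorted lists in linear time.
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

print(merge_sort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
```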

Apache Log Visualization with Matplotlib : Learn Data Science

Apache log visualization with Matplotlib

This post discusses Apache log visualization with the Matplotlib library. First, download the data file used in this example. We will require NumPy and Matplotlib: In [1]: import numpy as np import matplotlib.pyplot as plt. numpy.loadtxt() can directly load a text file into an array. requests-idevji.txt contains only the hour at which each request was made; this is achieved […]
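The overall flow looks like the sketch below. It uses a small synthetic array in place of requests-idevji.txt (whose exact contents aren't reproduced here); with the real file you would load the hours with numpy.loadtxt() instead.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt

# Synthetic stand-in for requests-idevji.txt: one request hour (0-23) per line.
# With the real file: hours = np.loadtxt("requests-idevji.txt")
hours = np.array([0, 1, 1, 2, 2, 2, 13, 13, 23])

# Count requests per hour and plot a bar chart.
counts = np.bincount(hours.astype(int), minlength=24)
plt.bar(np.arange(24), counts)
plt.xlabel("Hour of day")
plt.ylabel("Requests")
plt.savefig("requests_per_hour.png")
```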

GraphFrames PySpark Example : Learn Data Science

GraphFrames PySpark Example

In this post, a GraphFrames PySpark example is discussed using the shortest path problem. GraphFrames is a Spark package that provides DataFrame-based graphs in Spark. Spark version 1.6.2 is used for all examples. Including the package with the PySpark shell: pyspark --packages graphframes:graphframes:0.1.0-spark1.6 Code: from pyspark import SparkContext from pyspark.sql import SQLContext sc = SparkContext […]
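To show the idea without assuming a running Spark cluster, here is the computation GraphFrames' g.shortestPaths(landmarks=[...]) performs, sketched in plain Python with a breadth-first search over a tiny hand-made graph:

```python
from collections import deque

def hops_to_landmark(edges, landmark):
    """Number of directed hops from each reachable vertex to `landmark`,
    found by BFS over reversed edges. This mirrors the per-vertex distances
    GraphFrames' shortestPaths returns for a single landmark."""
    reverse = {}
    for src, dst in edges:
        reverse.setdefault(dst, []).append(src)
    dist = {landmark: 0}
    queue = deque([landmark])
    while queue:
        v = queue.popleft()
        for u in reverse.get(v, []):
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist

edges = [("a", "b"), ("b", "c"), ("a", "c")]
print(hops_to_landmark(edges, "c"))  # {'c': 0, 'b': 1, 'a': 1}
```

GraphFrames runs the same kind of propagation, but distributed over DataFrames of vertices and edges.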

Logistic Regression with Spark : Learn Data Science

Logistic Regression with Spark

Logistic regression with Spark is achieved using MLlib. Logistic regression returns binary class labels, that is, “0” or “1”. In this example, we consider a data set consisting of only one variable, “study hours”, where the class label is whether the student passed (1) or did not pass (0). from pyspark import SparkContext […]
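The model MLlib fits can be illustrated without a cluster. Below is a plain-Python gradient-descent sketch on a single "study hours" feature; the data values are made up for the example:

```python
import math

# Made-up data: study hours and pass (1) / not pass (0) labels,
# separable around 3.5 hours.
hours  = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
passed = [0, 0, 0, 1, 1, 1]

# Stochastic gradient descent on the logistic loss.
w, b, lr = 0.0, 0.0, 0.1
for _ in range(5000):
    for x, y in zip(hours, passed):
        p = 1.0 / (1.0 + math.exp(-(w * x + b)))  # sigmoid
        w -= lr * (p - y) * x
        b -= lr * (p - y)

def predict(x):
    """Return the binary class label, 0 or 1."""
    return 1 if 1.0 / (1.0 + math.exp(-(w * x + b))) >= 0.5 else 0

print(predict(1.5), predict(5.5))
```

MLlib's logistic regression does essentially this, but distributes the gradient computation across the cluster.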

k-Means Clustering Spark Tutorial : Learn Data Science

k-Means Clustering Spark

k-Means clustering with Spark is easy to understand. MLlib comes bundled with a k-Means implementation (KMeans), which can be imported from the pyspark.mllib.clustering package. Here is a very simple example of clustering data with height and weight attributes. Arguments to KMeans.train: k is the number of desired clusters; maxIterations is the maximum number of iterations to run. […]

Data Mining : Intuitive Partitioning of Data or 3-4-5 Rule

Intuitive Partitioning

Intuitive partitioning, or natural partitioning, is used in data discretization. Data discretization is the process of converting the continuous values of an attribute into categorical data, partitions, or intervals. Discretization helps reduce data size by reducing the number of possible values: instead of storing every observation, we can store only the partition range in which each observation […]
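A simplified sketch of the 3-4-5 rule in Python: it counts the distinct values in the most significant digit of the range and picks 3, 4, or 5 equal-width intervals accordingly. This version assumes the low and high values are already rounded to the same power of 10, and approximates the special 2-3-2 split for 7 with 3 equal intervals.

```python
import math

def three_four_five(low, high):
    """Split [low, high] into 'natural' equal-width intervals via the 3-4-5 rule.
    Simplified: assumes low/high are already rounded to the same power of 10."""
    width = high - low
    msd_unit = 10 ** int(math.floor(math.log10(width)))
    msd_count = round(width / msd_unit)  # distinct most-significant-digit values
    if msd_count in (3, 6, 9):
        k = 3
    elif msd_count in (2, 4, 8):
        k = 4
    elif msd_count in (1, 5, 10):
        k = 5
    else:  # 7 gets a 2-3-2 split in the full rule; approximated here with 3
        k = 3
    step = width / k
    return [low + i * step for i in range(k + 1)]

print(three_four_five(0, 5000))  # [0.0, 1000.0, 2000.0, 3000.0, 4000.0, 5000.0]
```

The range 0 to 5000 spans 5 "thousands", so the rule chooses 5 intervals of width 1000 each, which is what a human would naturally pick.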

k-means Clustering Algorithm with Python : Learn Data Science

k-means clustering Python

The k-means clustering algorithm is used to group samples (items) into k clusters; k is specified by the user. The method works by assigning each sample to the cluster with the nearest centroid and recomputing each centroid as the mean of its assigned samples, hence the name k-means clustering. Euclidean distance is used as the distance measure. See the references for more information on the algorithm. This article describes k-means […]
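The assign-then-recompute loop described above can be written in plain Python. This is a minimal sketch on made-up 2-D points, not an optimized implementation:

```python
import random

def kmeans(samples, k, iterations=100, seed=0):
    """Basic k-means: assign each sample to the nearest centroid
    (squared Euclidean distance), then recompute each centroid as
    the mean of its cluster. Repeat for a fixed number of iterations."""
    rng = random.Random(seed)
    centroids = rng.sample(samples, k)  # initialize from random samples
    for _ in range(iterations):
        # Assignment step: each sample joins its nearest centroid's cluster.
        clusters = [[] for _ in range(k)]
        for s in samples:
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(s, centroids[i])),
            )
            clusters[nearest].append(s)
        # Update step: each centroid becomes the mean of its cluster.
        centroids = [
            tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

samples = [(1.0, 1.0), (1.2, 0.8), (8.0, 8.0), (8.2, 7.9)]
centroids, clusters = kmeans(samples, k=2)
print(sorted(centroids))
```

On these four points the two clusters separate cleanly, with centroids near (1.1, 0.9) and (8.1, 7.95).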