Category: Big Data

Data Mining : Intuitive Partitioning of Data or 3-4-5 Rule


Introduction: Intuitive partitioning, or natural partitioning, is used in data discretization. Data discretization is the process of converting the continuous values of an attribute into categorical data, partitions, or intervals. This helps reduce data size by reducing the number of possible values: instead of storing every observation, we store the partition range in which each observation falls. One of...
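To make the idea of discretization concrete, here is a minimal sketch of plain equal-width binning. This is a generic illustration and not the 3-4-5 rule itself; the function name `equal_width_bins` and the sample `ages` list are made up for this example.

```python
def equal_width_bins(values, n_bins):
    """Map each value to the (low, high) interval of equal width it falls in."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        # All values are identical, so there is only one possible interval.
        return [(lo, hi)] * len(values)
    labels = []
    for v in values:
        # Clamp the index so the maximum value lands in the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        labels.append((lo + idx * width, lo + (idx + 1) * width))
    return labels


# Hypothetical sample data: seven observations reduced to three intervals.
ages = [0, 5, 10, 15, 20, 25, 30]
bins = equal_width_bins(ages, 3)
```

Each observation is replaced by the interval that contains it, so the seven distinct values above collapse into three partitions.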

k-means Clustering Algorithm with Python : Learn Data Science


The k-means clustering algorithm is used to group samples (items) into k clusters, where k is specified by the user. The method assigns each sample to its nearest cluster centroid and then recomputes each centroid as the mean of the samples assigned to it, hence the name k-means clustering. Euclidean distance is used as the distance measure. See references for more information on the algorithm. This article describes k-means Clustering Algorithm with...
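As a rough sketch of the procedure described above, the following pure-Python version alternates between assigning samples to the nearest centroid (Euclidean distance) and recomputing centroids as cluster means. It is not the code from the linked article; the function name `kmeans`, its parameters, and the sample `points` are assumptions made for illustration, and it requires Python 3.8+ for `math.dist`.

```python
import math
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Basic k-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Start from k distinct samples chosen at random.
    centroids = rng.sample(points, k)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid's cluster.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid becomes the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        new_centroids = [
            tuple(sum(c) / len(members) for c in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # Centroids stopped moving, so the assignment is stable.
        centroids = new_centroids
    return centroids, clusters


# Hypothetical sample data: two well-separated groups of 2-D points.
points = [(1, 1), (1, 2), (2, 1), (9, 9), (9, 10), (10, 9)]
centroids, clusters = kmeans(points, k=2)
```

The result depends on the random starting centroids in general, which is why the sketch takes a `seed`; real data usually calls for several restarts or a smarter initialization.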

Commonly Used HDFS Commands : Learn Data Science


The Hadoop Distributed File System, or HDFS, is the underlying storage for all Hadoop applications. HDFS can be manipulated through APIs such as the Java API or the REST API, but the HDFS shell is the most commonly used option. Below is a list of ten commonly used HDFS commands. 1. Invoking the file system: The HDFS shell supports various file systems, not just HDFS. This means you can invoke file systems...

