GraphFrames PySpark Example : Learn Data Science


This post walks through a GraphFrames example in PySpark, ending with a shortest path query. GraphFrames is a Spark package that provides DataFrame-based graphs in Spark. All examples assume Spark version 1.6.2.

Start the PySpark shell with the package included:

pyspark --packages graphframes:graphframes:0.1.0-spark1.6


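# Note: the pyspark shell already defines sc and sqlContext,
# so the next four lines are only needed in a standalone script
from pyspark import SparkContext
from pyspark.sql import SQLContext
sc = SparkContext()
sqlContext = SQLContext(sc)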

# create vertex DataFrame for users with id and name attributes
v = sqlContext.createDataFrame([
  ("a", "Alice"),
  ("b", "Bob"),
  ("c", "Charlie"),
], ["id", "name"])

# create edge DataFrame with "src" and "dst" attributes
e = sqlContext.createDataFrame([
  ("a", "b", "friends"),
  ("b", "c", "follow"),
  ("c", "b", "follow"),
], ["src", "dst", "relationship"])

# create a GraphFrame with v, e
from graphframes import *
g = GraphFrame(v, e)

# example: getting the in-degree of each vertex
g.inDegrees.show()

+---+--------+
| id|inDegree|
+---+--------+
|  b|       2|
|  c|       1|
+---+--------+
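
The in-degree counts above can be checked by hand without Spark. The snippet below is only a plain-Python sketch over the same edge list, not how GraphFrames computes the result; the variable names are chosen here for illustration.

```python
from collections import Counter

# Same edges as the edge DataFrame: (src, dst, relationship)
edges = [
    ("a", "b", "friends"),
    ("b", "c", "follow"),
    ("c", "b", "follow"),
]

# The in-degree of a vertex is the number of edges whose dst is that vertex
in_degrees = Counter(dst for _src, dst, _rel in edges)

for vertex_id, degree in sorted(in_degrees.items()):
    print(vertex_id, degree)
```

Vertex "a" has no incoming edges, which is why it does not appear in the table.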

# example: counting the "follow" relationships in the graph
g.edges.filter("relationship = 'follow'").count()


# getting shortest paths to "a" from each vertex
results = g.shortestPaths(landmarks=["a"])
results.select("id", "distances").show()


+---+-----------+
| id|  distances|
+---+-----------+
|  a|Map(a -> 0)|
|  b|      Map()|
|  c|      Map()|
+---+-----------+
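
The empty maps for "b" and "c" follow from edge direction: no edge points into "a", so no other vertex can reach it. A rough way to see this is a breadth-first search run backwards from the landmark. The sketch below is plain Python and assumes unweighted, directed edges; the helper distances_to_landmark is a hypothetical name, not part of GraphFrames, and this is not the distributed algorithm the package runs.

```python
from collections import deque

vertices = ["a", "b", "c"]
edges = [("a", "b"), ("b", "c"), ("c", "b")]  # (src, dst)

def distances_to_landmark(vertices, edges, landmark):
    """Return {vertex: {landmark: hops}}, following edges from src to dst."""
    # Reverse adjacency: for each vertex, the vertices that point to it
    incoming = {v: [] for v in vertices}
    for src, dst in edges:
        incoming[dst].append(src)

    # Breadth-first search backwards from the landmark
    hops = {landmark: 0}
    queue = deque([landmark])
    while queue:
        current = queue.popleft()
        for prev in incoming[current]:
            if prev not in hops:
                hops[prev] = hops[current] + 1
                queue.append(prev)

    # Vertices that cannot reach the landmark get an empty map
    return {v: ({landmark: hops[v]} if v in hops else {}) for v in vertices}

to_a = distances_to_landmark(vertices, edges, "a")
to_c = distances_to_landmark(vertices, edges, "c")
print(to_a)
print(to_c)
```

With "c" as the landmark instead, every vertex has a path (a to b to c), so none of the maps would be empty.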

Feel free to ask your questions in the comments section!
