This post walks through a GraphFrames PySpark example built around the shortest path problem. GraphFrames is a Spark package that provides DataFrame-based graphs in Spark. All examples use Spark version 1.6.2. The package has to be included when the PySpark shell is launched (for example, through the --packages option). The setup for the examples is as follows:
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext()
sqlContext = SQLContext(sc)

# create vertex DataFrame for users with id and name attributes
v = sqlContext.createDataFrame([
    ("a", "Alice"),
    ("b", "Bob"),
    ("c", "Charlie"),
], ["id", "name"])

# create edge DataFrame with "src" and "dst" attributes
e = sqlContext.createDataFrame([
    ("a", "b", "friends"),
    ("b", "c", "follow"),
    ("c", "b", "follow"),
], ["src", "dst", "relationship"])

# create a GraphFrame with v, e
from graphframes import *
g = GraphFrame(v, e)

# example : getting in-degrees of each vertex
g.inDegrees.show()
Output:
id    inDegree
b     2
c     1
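To make it clearer what the in-degree output represents, here is a minimal plain-Python sketch that counts incoming edges for the same edge list, without Spark. The helper name in_degrees is hypothetical and is not part of GraphFrames; it only illustrates the counting idea, not how GraphFrames computes the result internally.

```python
from collections import Counter

# Same edges as the edge DataFrame above: (src, dst, relationship)
edges = [
    ("a", "b", "friends"),
    ("b", "c", "follow"),
    ("c", "b", "follow"),
]

def in_degrees(edge_list):
    """Count incoming edges per destination vertex.

    Hypothetical helper for illustration only; not a GraphFrames API.
    """
    return dict(Counter(dst for _src, dst, _rel in edge_list))

print(in_degrees(edges))  # {'b': 2, 'c': 1}
```

Vertex "a" has no incoming edges, so it does not appear in the result, which matches the output shown above.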
example : getting “follow” relationships in the graph