GraphFrames PySpark Example : Learn Data Science

This post walks through a GraphFrames example in PySpark, ending with a shortest-path query. GraphFrames is a Spark package that provides DataFrame-based graphs in Spark. All examples use Spark 1.6.2.

Include the package when starting the PySpark shell:

pyspark --packages graphframes:graphframes:0.1.0-spark1.6

Code:

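from pyspark import SparkContext
from pyspark.sql import SQLContext
# reuse the existing SparkContext if there is one (the pyspark shell already defines sc)
sc = SparkContext.getOrCreate()
sqlContext = SQLContext(sc)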
 
# create vertex DataFrame for users with id and name attributes
v = sqlContext.createDataFrame([
  ("a", "Alice"),
  ("b", "Bob"),
  ("c", "Charlie"),
], ["id", "name"])
 
# create edge DataFrame with "src" and "dst" attributes
e = sqlContext.createDataFrame([
  ("a", "b", "friends"),
  ("b", "c", "follow"),
  ("c", "b", "follow"),
], ["src", "dst", "relationship"])
 
# create a GraphFrame with v, e
from graphframes import GraphFrame
g = GraphFrame(v, e)
 
# example: get the in-degree of each vertex
# (vertices with no incoming edges, such as "a", do not appear in the result)
g.inDegrees.show()

Output:

+---+--------+
| id|inDegree|
+---+--------+
|  b|       2|
|  c|       1|
+---+--------+
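
As a quick cross-check of the table above, the same in-degrees can be recomputed in plain Python without Spark. This is only an illustrative sketch over a hand-copied edge list; the names edges and in_degrees are not part of GraphFrames.

```python
from collections import Counter

# Same rows as the edge DataFrame: (src, dst, relationship)
edges = [
    ("a", "b", "friends"),
    ("b", "c", "follow"),
    ("c", "b", "follow"),
]

# The in-degree of a vertex is the number of edges whose dst is that vertex
in_degrees = Counter(dst for _src, dst, _rel in edges)

for vertex, degree in sorted(in_degrees.items()):
    print(vertex, degree)
```

Vertex "a" never appears as a dst, so it has no entry here, which matches its absence from the inDegrees output.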
# example: count the "follow" relationships in the graph
g.edges.filter("relationship = 'follow'").count()

Output:

2
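
The count can be sanity-checked the same way. The sketch below tallies every relationship type over a hand-copied edge list in plain Python; relationship_counts is a name made up for this illustration, not a GraphFrames call.

```python
from collections import Counter

# Same rows as the edge DataFrame: (src, dst, relationship)
edges = [
    ("a", "b", "friends"),
    ("b", "c", "follow"),
    ("c", "b", "follow"),
]

# Tally edges by their relationship attribute
relationship_counts = Counter(rel for _src, _dst, rel in edges)

print(relationship_counts["follow"])
```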
# example: shortest paths from each vertex to the landmark "a"
# (edges are directed, so a vertex with no path to "a" gets an empty map)
results = g.shortestPaths(landmarks=["a"])
results.select("id", "distances").show()

Output:

+---+-----------+
| id|  distances|
+---+-----------+
|  a|Map(a -> 0)|
|  b|      Map()|
|  c|      Map()|
+---+-----------+
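
The empty maps for "b" and "c" follow from edge direction: the only edge touching "a" is a -> b, so nothing leads back into "a". The plain-Python sketch below reproduces that reasoning with a breadth-first search over reversed edges. It is not how GraphFrames computes the result internally, and distances_to_landmark is a hypothetical helper written for this illustration.

```python
from collections import deque

vertices = ["a", "b", "c"]
# Directed edges (src, dst) copied from the edge DataFrame
edges = [("a", "b"), ("b", "c"), ("c", "b")]


def distances_to_landmark(vertices, edges, landmark):
    """Hop count from each vertex to `landmark`, following edge direction."""
    # Reverse adjacency: for each vertex, the vertices that point at it
    predecessors = {v: [] for v in vertices}
    for src, dst in edges:
        predecessors[dst].append(src)

    # BFS backwards from the landmark; unreachable vertices never get an entry
    dist = {landmark: 0}
    queue = deque([landmark])
    while queue:
        current = queue.popleft()
        for prev in predecessors[current]:
            if prev not in dist:
                dist[prev] = dist[current] + 1
                queue.append(prev)
    return dist


to_a = distances_to_landmark(vertices, edges, "a")
to_c = distances_to_landmark(vertices, edges, "c")
print(to_a)
print(to_c)
```

With "a" as the landmark, only "a" itself gets a distance, which mirrors the table above. With "c" as the landmark, the sketch gives a distance for every vertex (a -> b -> c is two hops), so the choice of landmark matters a great deal in a directed graph.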

Feel free to ask your questions in the comments section!
