public abstract class Graph<VD,ED>
extends Object
implements scala.Serializable
GraphOps contains additional convenience operations and graph algorithms.
| Modifier and Type | Method and Description |
|---|---|
| <A> VertexRDD<A> | aggregateMessages(scala.Function1<EdgeContext<VD,ED,A>,scala.runtime.BoxedUnit> sendMsg, scala.Function2<A,A,A> mergeMsg, TripletFields tripletFields, scala.reflect.ClassTag<A> evidence$11): Aggregates values from the neighboring edges and vertices of each vertex. |
| static <VD,ED> Graph<VD,ED> | apply(RDD<scala.Tuple2<Object,VD>> vertices, RDD<Edge<ED>> edges, VD defaultVertexAttr, StorageLevel edgeStorageLevel, StorageLevel vertexStorageLevel, scala.reflect.ClassTag<VD> evidence$18, scala.reflect.ClassTag<ED> evidence$19): Construct a graph from a collection of vertices and edges with attributes. |
| abstract Graph<VD,ED> | cache(): Caches the vertices and edges associated with this graph at the previously-specified target storage levels, which default to MEMORY_ONLY. |
| abstract void | checkpoint(): Mark this Graph for checkpointing. |
| abstract EdgeRDD<ED> | edges(): An RDD containing the edges and their associated attributes. |
| static <VD,ED> Graph<VD,ED> | fromEdges(RDD<Edge<ED>> edges, VD defaultValue, StorageLevel edgeStorageLevel, StorageLevel vertexStorageLevel, scala.reflect.ClassTag<VD> evidence$16, scala.reflect.ClassTag<ED> evidence$17): Construct a graph from a collection of edges. |
| static <VD> Graph<VD,Object> | fromEdgeTuples(RDD<scala.Tuple2<Object,Object>> rawEdges, VD defaultValue, scala.Option<PartitionStrategy> uniqueEdges, StorageLevel edgeStorageLevel, StorageLevel vertexStorageLevel, scala.reflect.ClassTag<VD> evidence$15): Construct a graph from a collection of edges encoded as vertex id pairs. |
| abstract scala.collection.Seq<String> | getCheckpointFiles(): Gets the names of the files to which this Graph was checkpointed. |
| static <VD,ED> GraphOps<VD,ED> | graphToGraphOps(Graph<VD,ED> g, scala.reflect.ClassTag<VD> evidence$20, scala.reflect.ClassTag<ED> evidence$21): Implicitly extracts the GraphOps member from a graph. |
| abstract Graph<VD,ED> | groupEdges(scala.Function2<ED,ED,ED> merge): Merges multiple edges between two vertices into a single edge. |
| abstract boolean | isCheckpointed(): Return whether this Graph has been checkpointed or not. |
| <ED2> Graph<VD,ED2> | mapEdges(scala.Function1<Edge<ED>,ED2> map, scala.reflect.ClassTag<ED2> evidence$4): Transforms each edge attribute in the graph using the map function. |
| abstract <ED2> Graph<VD,ED2> | mapEdges(scala.Function2<Object,scala.collection.Iterator<Edge<ED>>,scala.collection.Iterator<ED2>> map, scala.reflect.ClassTag<ED2> evidence$5): Transforms each edge attribute using the map function, passing it a whole partition at a time. |
| <ED2> Graph<VD,ED2> | mapTriplets(scala.Function1<EdgeTriplet<VD,ED>,ED2> map, scala.reflect.ClassTag<ED2> evidence$6): Transforms each edge attribute using the map function, passing it the adjacent vertex attributes as well. |
| <ED2> Graph<VD,ED2> | mapTriplets(scala.Function1<EdgeTriplet<VD,ED>,ED2> map, TripletFields tripletFields, scala.reflect.ClassTag<ED2> evidence$7): Transforms each edge attribute using the map function, passing it the adjacent vertex attributes as well. |
| abstract <ED2> Graph<VD,ED2> | mapTriplets(scala.Function2<Object,scala.collection.Iterator<EdgeTriplet<VD,ED>>,scala.collection.Iterator<ED2>> map, TripletFields tripletFields, scala.reflect.ClassTag<ED2> evidence$8): Transforms each edge attribute a partition at a time using the map function, passing it the adjacent vertex attributes as well. |
| abstract <VD2> Graph<VD2,ED> | mapVertices(scala.Function2<Object,VD,VD2> map, scala.reflect.ClassTag<VD2> evidence$3, scala.Predef.$eq$colon$eq<VD,VD2> eq): Transforms each vertex attribute in the graph using the map function. |
| abstract <VD2,ED2> Graph<VD,ED> | mask(Graph<VD2,ED2> other, scala.reflect.ClassTag<VD2> evidence$9, scala.reflect.ClassTag<ED2> evidence$10): Restricts the graph to only the vertices and edges that are also in other, but keeps the attributes from this graph. |
| GraphOps<VD,ED> | ops(): The associated GraphOps object. |
| abstract <U,VD2> Graph<VD2,ED> | outerJoinVertices(RDD<scala.Tuple2<Object,U>> other, scala.Function3<Object,VD,scala.Option<U>,VD2> mapFunc, scala.reflect.ClassTag<U> evidence$13, scala.reflect.ClassTag<VD2> evidence$14, scala.Predef.$eq$colon$eq<VD,VD2> eq): Joins the vertices with entries in the table RDD and merges the results using mapFunc. |
| abstract Graph<VD,ED> | partitionBy(PartitionStrategy partitionStrategy): Repartitions the edges in the graph according to partitionStrategy. |
| abstract Graph<VD,ED> | partitionBy(PartitionStrategy partitionStrategy, int numPartitions): Repartitions the edges in the graph according to partitionStrategy. |
| abstract Graph<VD,ED> | persist(StorageLevel newLevel): Caches the vertices and edges associated with this graph at the specified storage level, ignoring any target storage levels previously set. |
| abstract Graph<VD,ED> | reverse(): Reverses all edges in the graph. |
| abstract Graph<VD,ED> | subgraph(scala.Function1<EdgeTriplet<VD,ED>,Object> epred, scala.Function2<Object,VD,Object> vpred): Restricts the graph to only the vertices and edges satisfying the predicates. |
| abstract RDD<EdgeTriplet<VD,ED>> | triplets(): An RDD containing the edge triplets, which are edges along with the vertex data associated with the adjacent vertices. |
| abstract Graph<VD,ED> | unpersist(boolean blocking): Uncaches both vertices and edges of this graph. |
| abstract Graph<VD,ED> | unpersistVertices(boolean blocking): Uncaches only the vertices of this graph, leaving the edges alone. |
| abstract VertexRDD<VD> | vertices(): An RDD containing the vertices and their associated attributes. |
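As an illustration of the `apply` factory listed in the table, here is a minimal sketch of building a graph from a vertex RDD and an edge RDD. The local `SparkContext` setup, the names `users` and `follows`, and the sample data are assumptions made for this example only.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD

// Assumption: a local Spark context; in spark-shell this reuses the existing one.
val sc = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[2]").setAppName("graph-apply-sketch"))

// Hypothetical data: vertex 3 is referenced by an edge but has no vertex entry.
val users: RDD[(VertexId, String)] =
  sc.parallelize(Seq((1L, "alice"), (2L, "bob")))
val follows: RDD[Edge[Int]] =
  sc.parallelize(Seq(Edge(1L, 2L, 7), Edge(2L, 3L, 1)))

// Graph.apply assigns the default attribute to vertices found only in edges.
val graph: Graph[String, Int] = Graph(users, follows, "unknown")

val vertexAttrs = graph.vertices.collect().toMap
val edgeCount = graph.edges.count()
```

In Scala the storage-level and `ClassTag` parameters shown in the Java signature are filled in by default arguments and implicits, so only the vertices, edges, and default attribute need to be passed.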
public static <VD> Graph<VD,Object> fromEdgeTuples(RDD<scala.Tuple2<Object,Object>> rawEdges, VD defaultValue, scala.Option<PartitionStrategy> uniqueEdges, StorageLevel edgeStorageLevel, StorageLevel vertexStorageLevel, scala.reflect.ClassTag<VD> evidence$15)
Construct a graph from a collection of edges encoded as vertex id pairs.
Parameters:
rawEdges - a collection of edges in (src, dst) form
defaultValue - the vertex attributes with which to create vertices referenced by the edges
uniqueEdges - if multiple identical edges are found they are combined and the edge attribute is set to the sum. Otherwise duplicate edges are treated as separate. To enable uniqueEdges, a PartitionStrategy must be provided.
edgeStorageLevel - the desired storage level at which to cache the edges if necessary
vertexStorageLevel - the desired storage level at which to cache the vertices if necessary
evidence$15 - (undocumented)
Returns:
a graph with edge attributes containing either the count of duplicate edges or 1 (if uniqueEdges is None) and vertex attributes containing the total degree of each vertex.

public static <VD,ED> Graph<VD,ED> fromEdges(RDD<Edge<ED>> edges, VD defaultValue, StorageLevel edgeStorageLevel, StorageLevel vertexStorageLevel, scala.reflect.ClassTag<VD> evidence$16, scala.reflect.ClassTag<ED> evidence$17)
Construct a graph from a collection of edges.
Parameters:
edges - the RDD containing the set of edges in the graph
defaultValue - the default vertex attribute to use for each vertex
edgeStorageLevel - the desired storage level at which to cache the edges if necessary
vertexStorageLevel - the desired storage level at which to cache the vertices if necessary
evidence$16 - (undocumented)
evidence$17 - (undocumented)
Returns:
a graph with edge attributes described by edges and vertices given by all vertices in edges with value defaultValue
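The two edge-based factory methods can be compared side by side. The following is a sketch under assumptions: the local `SparkContext`, the sample edges, and the choice of `PartitionStrategy.RandomVertexCut` are illustrative and not prescribed by this page.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, PartitionStrategy}

// Assumption: a local Spark context; in spark-shell this reuses the existing one.
val sc = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[2]").setAppName("graph-from-edges-sketch"))

// fromEdges: every vertex mentioned by an edge gets the default attribute 0.
val labelled = Graph.fromEdges(
  sc.parallelize(Seq(Edge(1L, 2L, "follows"), Edge(2L, 3L, "blocks"))),
  defaultValue = 0)
val labelledVertices = labelled.vertices.collect().toMap

// fromEdgeTuples: plain (src, dst) pairs; the pair (1, 2) is listed twice on purpose.
val rawEdges = sc.parallelize(Seq((1L, 2L), (1L, 2L), (2L, 3L)))

// Without uniqueEdges, duplicate edges stay separate.
val separate = Graph.fromEdgeTuples(rawEdges, defaultValue = 0)

// With a PartitionStrategy, identical edges are combined and their attributes summed.
val combined = Graph.fromEdgeTuples(rawEdges, defaultValue = 0,
  uniqueEdges = Some(PartitionStrategy.RandomVertexCut))

val separateCount = separate.edges.count()
val combinedAttrs =
  combined.edges.map(e => ((e.srcId, e.dstId), e.attr)).collect().toMap
```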
public static <VD,ED> Graph<VD,ED> apply(RDD<scala.Tuple2<Object,VD>> vertices, RDD<Edge<ED>> edges, VD defaultVertexAttr, StorageLevel edgeStorageLevel, StorageLevel vertexStorageLevel, scala.reflect.ClassTag<VD> evidence$18, scala.reflect.ClassTag<ED> evidence$19)
Construct a graph from a collection of vertices and edges with attributes.
Parameters:
vertices - the "set" of vertices and their attributes
edges - the collection of edges in the graph
defaultVertexAttr - the default vertex attribute to use for vertices that are mentioned in edges but not in vertices
edgeStorageLevel - the desired storage level at which to cache the edges if necessary
vertexStorageLevel - the desired storage level at which to cache the vertices if necessary
evidence$18 - (undocumented)
evidence$19 - (undocumented)

public static <VD,ED> GraphOps<VD,ED> graphToGraphOps(Graph<VD,ED> g, scala.reflect.ClassTag<VD> evidence$20, scala.reflect.ClassTag<ED> evidence$21)
Implicitly extracts the GraphOps member from a graph.
To improve modularity the Graph type only contains a small set of basic operations. All the convenience operations are defined in the GraphOps class, which may be shared across multiple graph implementations.
Parameters:
g - (undocumented)
evidence$20 - (undocumented)
evidence$21 - (undocumented)

public abstract VertexRDD<VD> vertices()
An RDD containing the vertices and their associated attributes.
public abstract EdgeRDD<ED> edges()
An RDD containing the edges and their associated attributes.
See Also:
Edge for the edge type, Graph#triplets to get an RDD which contains all the edges along with their vertex data.
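public abstract RDD<EdgeTriplet<VD,ED>> triplets()
An RDD containing the edge triplets, which are edges along with the vertex data associated with the adjacent vertices. Use edges instead if the vertex data are not needed, i.e. if only the edge data and adjacent vertex ids are needed.
import org.apache.spark.graphx.{Graph, GraphLoader}
// Count the edges whose two endpoints carry the same color; sc is an existing SparkContext.
type Color = Int
val graph: Graph[Color, Int] = GraphLoader.edgeListFile(sc, "hdfs://file.tsv")
val numInvalid = graph.triplets.map(e => if (e.srcAttr == e.dstAttr) 1 else 0).sum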
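public abstract Graph<VD,ED> persist(StorageLevel newLevel)
Caches the vertices and edges associated with this graph at the specified storage level, ignoring any target storage levels previously set.
Parameters:
newLevel - the level at which to cache the graph.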
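A minimal sketch of choosing a storage level for a graph follows. It assumes a local `SparkContext` and sample edges, and it deliberately passes the same level both as the target level at construction and to `persist`, so that no RDD is asked to change a level it may already have been assigned.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph}
import org.apache.spark.storage.StorageLevel

// Assumption: a local Spark context; in spark-shell this reuses the existing one.
val sc = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[2]").setAppName("graph-persist-sketch"))

val level = StorageLevel.MEMORY_AND_DISK

// Assumption: the same level is set as the target storage level at construction.
val graph = Graph.fromEdges(
  sc.parallelize(Seq(Edge(1L, 2L, 1), Edge(2L, 3L, 1))),
  defaultValue = 0,
  edgeStorageLevel = level,
  vertexStorageLevel = level)

val persisted = graph.persist(level)

// An action materializes the data at the requested level.
persisted.vertices.count()

val vertexLevel = persisted.vertices.getStorageLevel
val edgeLevel = persisted.edges.getStorageLevel
```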
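public abstract Graph<VD,ED> cache()
Caches the vertices and edges associated with this graph at the previously-specified target storage levels, which default to MEMORY_ONLY. This is used to pin a graph in memory, enabling multiple queries to reuse the same construction process.

public abstract void checkpoint()
Mark this Graph for checkpointing.

public abstract boolean isCheckpointed()
Return whether this Graph has been checkpointed or not.

public abstract scala.collection.Seq<String> getCheckpointFiles()
Gets the names of the files to which this Graph was checkpointed.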
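The three checkpointing methods are typically used together. The sketch below assumes a local `SparkContext` and a temporary local checkpoint directory; a cluster deployment would normally point `setCheckpointDir` at a reliable store such as HDFS.

```scala
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph}

// Assumption: a local Spark context; in spark-shell this reuses the existing one.
val sc = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[2]").setAppName("graph-checkpoint-sketch"))

// Assumption: a throwaway local directory is acceptable for this demonstration.
sc.setCheckpointDir(Files.createTempDirectory("graphx-checkpoint").toString)

val graph = Graph.fromEdges(
  sc.parallelize(Seq(Edge(1L, 2L, 1), Edge(2L, 3L, 1))), 0)

// checkpoint() only marks the graph; nothing is written yet.
graph.checkpoint()

// Actions on the vertex and edge RDDs trigger the actual checkpoint.
graph.vertices.count()
graph.edges.count()

val checkpointed = graph.isCheckpointed
val files = graph.getCheckpointFiles
```

Calling `checkpoint()` before the first action on the graph keeps the lineage from being computed without the checkpoint mark in place.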
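public abstract Graph<VD,ED> unpersist(boolean blocking)
Uncaches both vertices and edges of this graph.
Parameters:
blocking - (undocumented)

public abstract Graph<VD,ED> unpersistVertices(boolean blocking)
Uncaches only the vertices of this graph, leaving the edges alone.
Parameters:
blocking - (undocumented)

public abstract Graph<VD,ED> partitionBy(PartitionStrategy partitionStrategy)
Repartitions the edges in the graph according to partitionStrategy.
Parameters:
partitionStrategy - the partitioning strategy to use when partitioning the edges in the graph.

public abstract Graph<VD,ED> partitionBy(PartitionStrategy partitionStrategy, int numPartitions)
Repartitions the edges in the graph according to partitionStrategy.
Parameters:
partitionStrategy - the partitioning strategy to use when partitioning the edges in the graph.
numPartitions - the number of edge partitions in the new graph.

public abstract <VD2> Graph<VD2,ED> mapVertices(scala.Function2<Object,VD,VD2> map, scala.reflect.ClassTag<VD2> evidence$3, scala.Predef.$eq$colon$eq<VD,VD2> eq)
Transforms each vertex attribute in the graph using the map function.
Parameters:
map - the function from a vertex object to a new vertex value
evidence$3 - (undocumented)
eq - (undocumented)
import org.apache.spark.graphx.{Graph, GraphLoader, VertexId}
// Initialize a breadth-first search: 0 at the root, Int.MaxValue elsewhere; sc is an existing SparkContext.
val rawGraph: Graph[Int, Int] = GraphLoader.edgeListFile(sc, "hdfs://file")
val root: VertexId = 42L
var bfsGraph = rawGraph.mapVertices[Int]((vid, data) => if (vid == root) 0 else Int.MaxValue)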
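public <ED2> Graph<VD,ED2> mapEdges(scala.Function1<Edge<ED>,ED2> map, scala.reflect.ClassTag<ED2> evidence$4)
Transforms each edge attribute in the graph using the map function. If the adjacent vertex values are needed, use mapTriplets.
Parameters:
map - the function from an edge object to a new edge value.
evidence$4 - (undocumented)

public abstract <ED2> Graph<VD,ED2> mapEdges(scala.Function2<Object,scala.collection.Iterator<Edge<ED>>,scala.collection.Iterator<ED2>> map, scala.reflect.ClassTag<ED2> evidence$5)
Transforms each edge attribute using the map function, passing it a whole partition at a time. If the adjacent vertex values are needed, use mapTriplets.
Parameters:
map - a function that takes a partition id and an iterator over all the edges in the partition, and must return an iterator over the new values for each edge in the order of the input iterator
evidence$5 - (undocumented)

public <ED2> Graph<VD,ED2> mapTriplets(scala.Function1<EdgeTriplet<VD,ED>,ED2> map, scala.reflect.ClassTag<ED2> evidence$6)
Transforms each edge attribute using the map function, passing it the adjacent vertex attributes as well. If the adjacent vertex values are not needed, use mapEdges instead.
Parameters:
map - the function from an edge triplet to a new edge value.
evidence$6 - (undocumented)
// someLoadFunction() is a placeholder for any code that builds a Graph[Int, Int].
val rawGraph: Graph[Int, Int] = someLoadFunction()
val graph = rawGraph.mapTriplets[Int](edge =>
  edge.srcAttr - edge.dstAttr)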
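public <ED2> Graph<VD,ED2> mapTriplets(scala.Function1<EdgeTriplet<VD,ED>,ED2> map, TripletFields tripletFields, scala.reflect.ClassTag<ED2> evidence$7)
Transforms each edge attribute using the map function, passing it the adjacent vertex attributes as well. If the adjacent vertex values are not needed, use mapEdges instead.
Parameters:
map - the function from an edge triplet to a new edge value.
tripletFields - which fields should be included in the edge triplet passed to the map function. If not all fields are needed, specifying this can improve performance.
evidence$7 - (undocumented)
import org.apache.spark.graphx.{Graph, TripletFields}
// someLoadFunction() is a placeholder for any code that builds a Graph[Int, Int].
val rawGraph: Graph[Int, Int] = someLoadFunction()
val graph = rawGraph.mapTriplets[Int](edge =>
  edge.srcAttr - edge.dstAttr, TripletFields.All)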
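To illustrate restricting the triplet fields, here is a small sketch in which the map function reads only the source vertex attribute. The local `SparkContext` and the two-vertex sample graph are assumptions made for the example.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, TripletFields}

// Assumption: a local Spark context; in spark-shell this reuses the existing one.
val sc = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[2]").setAppName("graph-maptriplets-sketch"))

// Hypothetical graph: vertex 1 has attribute 10, vertex 2 has attribute 3.
val graph = Graph(
  sc.parallelize(Seq((1L, 10), (2L, 3))),
  sc.parallelize(Seq(Edge(1L, 2L, 0))))

// Only srcAttr is read, so TripletFields.Src is sufficient here.
val weighted = graph.mapTriplets[Int](t => t.srcAttr * 2, TripletFields.Src)

val newAttrs = weighted.edges.map(_.attr).collect().toSeq
```

Declaring fewer fields than the function actually reads would leave the undeclared attributes unset, so the declared fields should always cover everything the map function touches.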
public abstract <ED2> Graph<VD,ED2> mapTriplets(scala.Function2<Object,scala.collection.Iterator<EdgeTriplet<VD,ED>>,scala.collection.Iterator<ED2>> map, TripletFields tripletFields, scala.reflect.ClassTag<ED2> evidence$8)
Transforms each edge attribute a partition at a time using the map function, passing it the adjacent vertex attributes as well. If the adjacent vertex values are not needed, use mapEdges instead.
Parameters:
map - the iterator transform
tripletFields - which fields should be included in the edge triplet passed to the map function. If not all fields are needed, specifying this can improve performance.
evidence$8 - (undocumented)

public abstract Graph<VD,ED> reverse()
Reverses all edges in the graph.
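A short sketch of `reverse` on a two-edge graph follows; the local `SparkContext` and the sample edges are assumptions made for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph}

// Assumption: a local Spark context; in spark-shell this reuses the existing one.
val sc = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[2]").setAppName("graph-reverse-sketch"))

val graph = Graph.fromEdges(
  sc.parallelize(Seq(Edge(1L, 2L, "a"), Edge(2L, 3L, "b"))), 0)

// Each edge keeps its attribute but swaps source and destination.
val reversed = graph.reverse

val reversedEdges =
  reversed.edges.map(e => (e.srcId, e.dstId, e.attr)).collect().toSet
```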
public abstract Graph<VD,ED> subgraph(scala.Function1<EdgeTriplet<VD,ED>,Object> epred, scala.Function2<Object,VD,Object> vpred)
Restricts the graph to only the vertices and edges satisfying the predicates. The resulting subgraph satisfies:
V' = {v : for all v in V where vpred(v)}
E' = {(u,v): for all (u,v) in E where epred((u,v)) && vpred(u) && vpred(v)}
Parameters:
epred - the edge predicate, which takes a triplet and evaluates to true if the edge is to remain in the subgraph. Note that only edges where both vertices satisfy the vertex predicate are considered.
vpred - the vertex predicate, which takes a vertex object and evaluates to true if the vertex is to be included in the subgraph
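The set definitions for V' and E' can be checked on a tiny graph. This sketch assumes a local `SparkContext`, and the vertex attributes (ages) and predicates are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph}

// Assumption: a local Spark context; in spark-shell this reuses the existing one.
val sc = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[2]").setAppName("graph-subgraph-sketch"))

// Hypothetical data: the vertex attribute is an age.
val graph = Graph(
  sc.parallelize(Seq((1L, 25), (2L, 17), (3L, 40))),
  sc.parallelize(Seq(Edge(1L, 2L, "x"), Edge(1L, 3L, "y"), Edge(3L, 1L, "z"))))

// Vertex predicate only: vertex 2 is dropped, and so is every edge touching it.
val adults = graph.subgraph(vpred = (id, age) => age >= 18)

// Both predicates: an edge must also point from a younger to an older vertex.
val ascending = graph.subgraph(
  epred = t => t.srcAttr < t.dstAttr,
  vpred = (id, age) => age >= 18)

val adultIds = adults.vertices.map(_._1).collect().toSet
val adultEdges = adults.edges.map(e => (e.srcId, e.dstId)).collect().toSet
val ascendingEdges = ascending.edges.map(e => (e.srcId, e.dstId)).collect().toSet
```

Both predicates have default values that accept everything, so either one can be omitted by using named arguments.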
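public abstract <VD2,ED2> Graph<VD,ED> mask(Graph<VD2,ED2> other, scala.reflect.ClassTag<VD2> evidence$9, scala.reflect.ClassTag<ED2> evidence$10)
Restricts the graph to only the vertices and edges that are also in other, but keeps the attributes from this graph.
Parameters:
other - the graph to project this graph onto
evidence$9 - (undocumented)
evidence$10 - (undocumented)
Returns:
a graph with vertices and edges that exist in both the current graph and other, with vertex and edge data from the current graph

public abstract Graph<VD,ED> groupEdges(scala.Function2<ED,ED,ED> merge)
Merges multiple edges between two vertices into a single edge. For correct results, the graph must have been partitioned using partitionBy.
Parameters:
merge - the user-supplied commutative associative function to merge edge attributes for duplicate edges.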
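Because `groupEdges` relies on a prior `partitionBy`, the two calls are usually chained. The sketch below assumes a local `SparkContext`, sample edges, and `PartitionStrategy.EdgePartition2D` as one possible strategy.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, PartitionStrategy}

// Assumption: a local Spark context; in spark-shell this reuses the existing one.
val sc = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[2]").setAppName("graph-groupedges-sketch"))

// Hypothetical data: the edge 1 -> 2 appears twice with different weights.
val edges = sc.parallelize(
  Seq(Edge(1L, 2L, 1), Edge(1L, 2L, 4), Edge(2L, 3L, 2)), 2)
val graph = Graph.fromEdges(edges, 0)

// partitionBy co-locates identical edges; groupEdges then merges them by summing.
val merged = graph
  .partitionBy(PartitionStrategy.EdgePartition2D)
  .groupEdges(_ + _)

val mergedAttrs =
  merged.edges.map(e => ((e.srcId, e.dstId), e.attr)).collect().toMap
```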
public <A> VertexRDD<A> aggregateMessages(scala.Function1<EdgeContext<VD,ED,A>,scala.runtime.BoxedUnit> sendMsg, scala.Function2<A,A,A> mergeMsg, TripletFields tripletFields, scala.reflect.ClassTag<A> evidence$11)
Aggregates values from the neighboring edges and vertices of each vertex. The user-supplied sendMsg function is invoked on each edge of the graph, generating 0 or more messages to be sent to either vertex in the edge. The mergeMsg function is then used to combine all messages destined to the same vertex.
Parameters:
sendMsg - runs on each edge, sending messages to neighboring vertices using the EdgeContext.
mergeMsg - used to combine messages from sendMsg destined to the same vertex. This combiner should be commutative and associative.
tripletFields - which fields should be included in the EdgeContext passed to the sendMsg function. If not all fields are needed, specifying this can improve performance.
evidence$11 - (undocumented)
import org.apache.spark.graphx.{Graph, GraphLoader, VertexId}
import org.apache.spark.rdd.RDD
// Compute the in-degree of each vertex; sc is an existing SparkContext.
val rawGraph: Graph[Int, Int] = GraphLoader.edgeListFile(sc, "twittergraph")
val inDeg: RDD[(VertexId, Int)] =
  rawGraph.aggregateMessages[Int](ctx => ctx.sendToDst(1), _ + _)
public abstract <U,VD2> Graph<VD2,ED> outerJoinVertices(RDD<scala.Tuple2<Object,U>> other, scala.Function3<Object,VD,scala.Option<U>,VD2> mapFunc, scala.reflect.ClassTag<U> evidence$13, scala.reflect.ClassTag<VD2> evidence$14, scala.Predef.$eq$colon$eq<VD,VD2> eq)
Joins the vertices with entries in the table RDD other and merges the results using mapFunc.
The input table should contain at most one entry for each vertex. If no entry in other is provided for a particular vertex in the graph, the map function receives None.
Parameters:
other - the table to join with the vertices in the graph. The table should contain at most one entry for each vertex.
mapFunc - the function used to compute the new vertex values. The map function is invoked for all vertices, even those that do not have a corresponding entry in the table.
evidence$13 - (undocumented)
evidence$14 - (undocumented)
eq - (undocumented)
import org.apache.spark.graphx.{Graph, GraphLoader, VertexId}
import org.apache.spark.rdd.RDD
// Replace each vertex attribute with its out-degree, using 0 where none is recorded; sc is an existing SparkContext.
val rawGraph: Graph[Int, Int] = GraphLoader.edgeListFile(sc, "webgraph")
val outDeg: RDD[(VertexId, Int)] = rawGraph.outDegrees
val graph = rawGraph.outerJoinVertices(outDeg) {
  (vid, data, optDeg) => optDeg.getOrElse(0)
}