org.apache.spark.graphx
Class PartitionStrategy.EdgePartition2D$
Object
org.apache.spark.graphx.PartitionStrategy.EdgePartition2D$
- All Implemented Interfaces:
- java.io.Serializable, PartitionStrategy, scala.Equals, scala.Product
- Enclosing interface:
- PartitionStrategy
public static class PartitionStrategy.EdgePartition2D$
- extends Object
- implements PartitionStrategy, scala.Product, scala.Serializable
Assigns edges to partitions using a 2D partitioning of the sparse edge adjacency matrix,
guaranteeing a 2 * sqrt(numParts) - 1
bound on vertex replication.
Suppose we have a graph with 12 vertices that we want to partition
over 9 machines. We can use the following sparse matrix representation:
__________________________________
v0 | P0 * | P1 | P2 * |
v1 | **** | * | |
v2 | ******* | ** | **** |
v3 | ***** | * * | * |
----------------------------------
v4 | P3 * | P4 *** | P5 ** * |
v5 | * * | * | |
v6 | * | ** | **** |
v7 | * * * | * * | * |
----------------------------------
v8 | P6 * | P7 * | P8 * *|
v9 | * | * * | |
v10 | * | ** | * * |
v11 | * <-E | *** | ** |
----------------------------------
The edge denoted by E
connects v11
with v1
and is assigned to processor P6
. To get the
processor number we divide the matrix into sqrt(numParts)
by sqrt(numParts)
blocks. Notice
that edges adjacent to v11
can only be in the first column of blocks (P0, P3,
P6)
or the last
row of blocks (P6, P7, P8)
. As a consequence we can guarantee that v11
will need to be
replicated to at most 2 * sqrt(numParts) - 1
machines.
Notice that P0
has many edges and as a consequence this partitioning would lead to poor work
balance. To improve balance we first multiply each vertex id by a large prime to shuffle the
vertex locations.
One of the limitations of this approach is that the number of machines must either be a
perfect square. We partially address this limitation by computing the machine assignment to
the next
largest perfect square and then mapping back down to the actual number of machines.
Unfortunately, this can also lead to work imbalance and so it is suggested that a perfect
square is used.
- See Also:
- Serialized Form
Method Summary |
int |
getPartition(long src,
long dst,
int numParts)
Returns the partition number for a given edge. |
Methods inherited from class Object |
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Methods inherited from interface scala.Product |
productArity, productElement, productIterator, productPrefix |
Methods inherited from interface scala.Equals |
canEqual, equals |
MODULE$
public static final PartitionStrategy.EdgePartition2D$ MODULE$
- Static reference to the singleton instance of this Scala object.
PartitionStrategy.EdgePartition2D$
public PartitionStrategy.EdgePartition2D$()
getPartition
public int getPartition(long src,
long dst,
int numParts)
- Description copied from interface:
PartitionStrategy
- Returns the partition number for a given edge.
- Specified by:
getPartition
in interface PartitionStrategy