Type Parameters:
T - the data type.
S - the quantiles sketch that implements both QuantilesGenericAPI and PartitioningFeature.

public class Partitioner<T,S extends QuantilesGenericAPI<T> & PartitioningFeature<T>> extends Object
The code included here works well for moderate-sized partitioning tasks. For example, with the test code in the test branch, splitting a data set of 1 billion items into 324 partitions of about 3M items each completed in under 3 minutes on a single CPU. For much larger partitioning tasks, it is recommended that this code be adapted to run in a parallel processing environment.
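The 324 figure is consistent with a two-pass split of 18 parts per pass (18 × 18 = 324, or about 3.09M items per partition). The Java below is a hypothetical reconstruction of that arithmetic: the formulas in numLevels and partsPerLevel, and the maxPartsPerPass value of 100, are assumptions chosen for illustration and are not taken from this page.

```java
// Hypothetical reconstruction of the partition-count arithmetic; the formulas
// are assumptions for illustration, not the library's code.
final class PartitionPlan {
    // Rough number of partitions needed to reach the target partition size.
    static double guessNumParts(long n, long tgtPartitionSize) {
        return Math.max(1.0, Math.ceil((double) n / tgtPartitionSize));
    }

    // Fewest levels (passes) such that maxPartsPerPass^levels >= guessNumParts.
    static int numLevels(long n, long tgtPartitionSize, int maxPartsPerPass) {
        double levels = Math.log(guessNumParts(n, tgtPartitionSize)) / Math.log(maxPartsPerPass);
        return (int) Math.max(1, Math.ceil(levels));
    }

    // Parts per level chosen so that partsPerLevel^levels is close to guessNumParts.
    static int partsPerLevel(long n, long tgtPartitionSize, int maxPartsPerPass) {
        int levels = numLevels(n, tgtPartitionSize, maxPartsPerPass);
        long parts = Math.round(Math.pow(guessNumParts(n, tgtPartitionSize), 1.0 / levels));
        return (int) Math.min(parts, maxPartsPerPass);
    }

    public static void main(String[] args) {
        long n = 1_000_000_000L;    // items in the data set
        long tgt = 3_000_000L;      // target partition size
        int maxPartsPerPass = 100;  // hypothetical setting
        int levels = numLevels(n, tgt, maxPartsPerPass);
        int parts = partsPerLevel(n, tgt, maxPartsPerPass);
        long total = 1;
        for (int i = 0; i < levels; i++) {
            total *= parts;  // partitions produced after each level
        }
        // prints: levels=2 partsPerLevel=18 totalPartitions=324 avgSize=3086419
        System.out.printf("levels=%d partsPerLevel=%d totalPartitions=%d avgSize=%d%n",
            levels, parts, total, n / total);
    }
}
```

With the same inputs and a hypothetical maxPartsPerPass of 10, this arithmetic gives 3 levels of 7 parts (343 partitions), illustrating the trade-off described for maxPartsPerPass: fewer parts per pass means more passes over the source data.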
Modifier and Type | Class | Description |
---|---|---|
static class | Partitioner.PartitionBoundsRow<T> | Defines a row for a List of PartitionBounds. |
static class | Partitioner.StackElement<T> | Holds data for a Stack element. |
Constructor | Description |
---|---|
Partitioner(long tgtPartitionSize, int maxPartsPerPass, SketchFillRequest<T,S> fillReq) | This constructor assumes a QuantileSearchCriteria of INCLUSIVE. |
Partitioner(long tgtPartitionSize, int maxPartsPerSk, SketchFillRequest<T,S> fillReq, QuantileSearchCriteria criteria) | This constructor takes the QuantileSearchCriteria as the criteria parameter. |
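The two constructors differ only in whether the QuantileSearchCriteria is fixed to INCLUSIVE or supplied by the caller. As a minimal sketch of what that choice means for a rank query, the Java below uses a local enum as a hypothetical stand-in for QuantileSearchCriteria and assumes the usual rank semantics: INCLUSIVE counts items less than or equal to the query item, EXCLUSIVE counts only items strictly less than it. How Partitioner applies the criteria to its partition boundaries is not described on this page.

```java
// Hypothetical stand-in for QuantileSearchCriteria; not the DataSketches enum.
final class SearchCriteriaDemo {
    enum Criteria { INCLUSIVE, EXCLUSIVE }

    // Normalized rank of q: INCLUSIVE counts items <= q, EXCLUSIVE counts items < q.
    static double rank(long[] items, long q, Criteria criteria) {
        int count = 0;
        for (long v : items) {
            boolean counted = (criteria == Criteria.INCLUSIVE) ? v <= q : v < q;
            if (counted) {
                count++;
            }
        }
        return (double) count / items.length;
    }

    public static void main(String[] args) {
        long[] data = {10, 20, 20, 30, 40};
        System.out.println(rank(data, 20, Criteria.INCLUSIVE)); // prints 0.6
        System.out.println(rank(data, 20, Criteria.EXCLUSIVE)); // prints 0.2
    }
}
```

The difference only shows up when items equal the query item: both copies of 20 are counted under INCLUSIVE and neither is counted under EXCLUSIVE.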
Modifier and Type | Method | Description |
---|---|---|
List<Partitioner.PartitionBoundsRow<T>> | partition(S sk) | This initiates the partitioning process. |
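The nested StackElement class suggests that partition(S sk) works through the data with an explicit stack, re-reading the source for each sub-range through the user's fillReq call-back. The Java below is a hypothetical, exact in-memory model of that idea, not the library's implementation: a sorted long[] stands in for each sketch, RangeFill stands in for SketchFillRequest, and StackItem and BoundsRow stand in for StackElement and PartitionBoundsRow. It assumes distinct item values and a fixed number of parts per level.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-in for SketchFillRequest: returns the items in a range, sorted.
interface RangeFill {
    long[] getRange(long lower, boolean includeLower, long upper);
}

// Hypothetical stand-ins for StackElement and PartitionBoundsRow.
record StackItem(long lower, boolean includeLower, long upper, int level) {}
record BoundsRow(long lower, boolean includeLower, long upper, int numItems) {}

final class StackPartitionDemo {
    // sk: sorted items of the whole data set (stand-in for a sketch of the entire data set).
    static List<BoundsRow> partition(long[] sk, RangeFill fillReq, int partsPerLevel, int numLevels) {
        List<BoundsRow> rows = new ArrayList<>();
        Deque<StackItem> stack = new ArrayDeque<>();
        stack.push(new StackItem(sk[0], true, sk[sk.length - 1], 1));
        while (!stack.isEmpty()) {
            StackItem e = stack.pop();
            // Level 1 uses the full-data sketch; deeper levels re-read the source via the call-back.
            long[] items = (e.level() == 1) ? sk : fillReq.getRange(e.lower(), e.includeLower(), e.upper());
            int m = items.length;
            List<StackItem> children = new ArrayList<>();
            for (int j = 0; j < partsPerLevel; j++) {
                int start = (int) ((long) j * m / partsPerLevel);
                int end = (int) ((long) (j + 1) * m / partsPerLevel);
                if (start == end) {
                    continue; // range too small for this many parts
                }
                // First part keeps the parent's lower bound; later parts start just above the previous upper bound.
                long lo = (start == 0) ? e.lower() : items[start - 1];
                boolean inclLo = (start == 0) && e.includeLower();
                long hi = items[end - 1]; // upper bound is inclusive
                if (e.level() == numLevels) {
                    rows.add(new BoundsRow(lo, inclLo, hi, end - start));
                } else {
                    children.add(new StackItem(lo, inclLo, hi, e.level() + 1));
                }
            }
            // Push in reverse so the lowest sub-range is popped first and rows stay sorted.
            for (int j = children.size() - 1; j >= 0; j--) {
                stack.push(children.get(j));
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        long[] source = new long[1000];
        for (int i = 0; i < source.length; i++) {
            source[i] = (i * 7919L) % 1000; // a shuffled permutation of 0..999
        }
        long[] sk = Arrays.stream(source).sorted().toArray();
        RangeFill fillReq = (lower, includeLower, upper) -> Arrays.stream(source)
            .filter(v -> (includeLower ? v >= lower : v > lower) && v <= upper)
            .sorted()
            .toArray();
        partition(sk, fillReq, 4, 2).forEach(System.out::println); // 16 rows covering 0..999
    }
}
```

Each level-2 range is rebuilt from the source data rather than read from the level-1 sketch, which mirrors why a smaller number of parts per pass trades fewer items per request for more passes.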

public Partitioner(long tgtPartitionSize, int maxPartsPerPass, SketchFillRequest<T,S> fillReq)
Parameters:
tgtPartitionSize - the target size of the resulting partitions, in number of items.
maxPartsPerPass - the maximum number of partitions to request from the sketch. The smaller this number is, the smaller the variance of the resulting partition sizes, but the more passes over the source data set are required.
fillReq - an implementation of the SketchFillRequest call-back interface, supplied by the user.

public Partitioner(long tgtPartitionSize, int maxPartsPerSk, SketchFillRequest<T,S> fillReq, QuantileSearchCriteria criteria)
Parameters:
tgtPartitionSize - the target size of the resulting partitions, in number of items.
maxPartsPerSk - the maximum number of partitions to request from the sketch. The smaller this number is, the smaller the variance of the resulting partition sizes, but the more passes over the source data set are required.
fillReq - an implementation of the SketchFillRequest call-back interface, supplied by the user.
criteria - the QuantileSearchCriteria to use.

public List<Partitioner.PartitionBoundsRow<T>> partition(S sk)
Parameters:
sk - a sketch of the entire data set.

Copyright © 2015–2024 The Apache Software Foundation. All rights reserved.