Type Parameters:
T - The specific Java type for this sketch

public final class ReservoirItemsUnion<T> extends Object
For efficiency reasons, the unioning process picks one of the two sketches to use as the base. As a result, we provide only a stateful union. Using the same approach for a stateless merge would modify whichever input sketch was chosen as the base, which would cause unpredictable side effects on the underlying sketches.
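To make the base-selection idea concrete, here is a toy model in plain Java. It is not the DataSketches implementation. The classes `ToyReservoir` and `ToyReservoirUnion` are hypothetical, maxK handling is left out, and the weighted merge of two sampling-mode inputs is omitted. The toy union keeps a private copy of its base, so the caller's sketches are never mutated. When the base is still in exact mode and the incoming sketch is in sampling mode, the two swap roles: the exact items are streamed into a copy of the incoming sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical toy reservoir (not the DataSketches class): k = capacity, n = items seen so far.
final class ToyReservoir<T> {
    final int k;
    long n;
    final List<T> samples = new ArrayList<>();
    private final Random rng;

    ToyReservoir(int k, Random rng) {
        this.k = k;
        this.rng = rng;
    }

    // Exact mode: every item seen so far is still retained.
    boolean isExact() {
        return n <= k;
    }

    // Algorithm R: after n updates, each item is retained with probability min(1, k/n).
    // Toy assumption: n stays below 2^31 so it fits Random.nextInt.
    void update(T item) {
        n++;
        if (samples.size() < k) {
            samples.add(item);
        } else {
            int j = rng.nextInt((int) n); // uniform index in [0, n)
            if (j < k) {
                samples.set(j, item);
            }
        }
    }

    // Copies the sample list so the original is never mutated through the copy.
    ToyReservoir<T> copy() {
        ToyReservoir<T> c = new ToyReservoir<>(k, rng);
        c.n = n;
        c.samples.addAll(samples);
        return c;
    }
}

// Hypothetical toy stateful union: it owns its base ("gadget") and never mutates an input.
final class ToyReservoirUnion<T> {
    private ToyReservoir<T> gadget; // null until the first non-empty input arrives

    void update(ToyReservoir<T> in) {
        if (in == null || in.n == 0) {
            return; // null or empty input contributes nothing
        }
        if (gadget == null) {
            gadget = in.copy(); // first input: a copy of it becomes the base
        } else if (in.isExact()) {
            // An exact-mode input holds every item it saw, so stream them into the current base.
            for (T item : in.samples) {
                gadget.update(item);
            }
        } else if (gadget.isExact()) {
            // Swap roles: a copy of the sampling-mode input becomes the base,
            // and the old base's complete item list is streamed into it.
            ToyReservoir<T> old = gadget;
            gadget = in.copy();
            for (T item : old.samples) {
                gadget.update(item);
            }
        } else {
            throw new UnsupportedOperationException("toy: weighted merge of two sampling-mode inputs omitted");
        }
    }

    // Returns a copy so that later updates do not change a result already handed out.
    ToyReservoir<T> getResult() {
        return gadget == null ? null : gadget.copy();
    }
}
```

The union copies an input before mutating it, so the same input sketch can be presented to several unions without interference. A stateless merge that reused one input as its base would break exactly this guarantee.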
A union object is created with a maximum value of k, represented using the ReservoirSize class. The unioning process may cause the actual number of samples to fall below that maximum value, but never to exceed it. The result of a union is a reservoir in which each item from the global input has a uniform probability of selection, but no claims are made about higher-order statistics. For instance, in general not all permutations of the global input are equally likely.
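One way a union can respect its maximum k is to downsample any input sample that exceeds it. The sketch below uses plain Java lists and a hypothetical `ReservoirDownsample` helper. It keeps a uniformly random subset of the sample via a partial Fisher-Yates shuffle. For a sampling-mode input of n items, each original item is then kept with probability (k/n)·(maxK/k) = maxK/n, so uniformity is preserved. This only illustrates the cap; it is not how DataSketches implements it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper (not part of DataSketches): caps a uniform sample at maxK items.
final class ReservoirDownsample {
    private ReservoirDownsample() {}

    // Returns a uniformly random subset of `samples` of size min(maxK, samples.size()),
    // using a partial Fisher-Yates shuffle; the caller's list is left untouched.
    static <T> List<T> downsample(List<T> samples, int maxK, Random rng) {
        List<T> work = new ArrayList<>(samples);
        int target = Math.min(maxK, work.size());
        for (int i = 0; i < target; i++) {
            int j = i + rng.nextInt(work.size() - i); // uniform index in [i, size)
            Collections.swap(work, i, j);
        }
        return new ArrayList<>(work.subList(0, target));
    }
}
```

A sample that is already at or below maxK comes back unchanged in size, which matches "may fall below that maximum value, but never exceed it."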
When taking the union of two reservoirs with different sizes k_1 and k_2, the output sample will contain no more than MIN(k_1, k_2) samples.
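The MIN(k_1, k_2) bound arises naturally from one standard way to merge two sampling-mode reservoirs. The merge draws k = min(|s1|, |s2|) items without replacement from the combined stream. At each step it picks side 1 with probability proportional to how many of that side's stream items are still undrawn, which is a sequential hypergeometric draw. It then takes each side's items from a random permutation of that side's sample. The `ReservoirMerge.merge` helper below is a hypothetical plain-Java sketch of this technique, not the DataSketches merge. It assumes `s1` and `s2` are uniform samples of n1 and n2 stream items.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper (not the DataSketches algorithm): merges two uniform reservoir samples.
final class ReservoirMerge {
    private ReservoirMerge() {}

    // s1 is a uniform sample of n1 stream items and s2 a uniform sample of n2 stream items.
    // Returns a uniform sample of the n1 + n2 items whose size is min(|s1|, |s2|).
    static <T> List<T> merge(long n1, List<T> s1, long n2, List<T> s2, Random rng) {
        int k = Math.min(s1.size(), s2.size());
        List<T> p1 = new ArrayList<>(s1);
        List<T> p2 = new ArrayList<>(s2);
        // Random order, so any prefix of each list is a uniform subset of that sample.
        Collections.shuffle(p1, rng);
        Collections.shuffle(p2, rng);
        List<T> out = new ArrayList<>(k);
        int used1 = 0;
        int used2 = 0;
        long left1 = n1; // stream-1 items not yet drawn
        long left2 = n2; // stream-2 items not yet drawn
        for (int i = 0; i < k; i++) {
            // Without-replacement draw from the combined stream:
            // side 1 is chosen with probability left1 / (left1 + left2).
            double p = (double) left1 / (left1 + left2);
            if (rng.nextDouble() < p) {
                out.add(p1.get(used1++));
                left1--;
            } else {
                out.add(p2.get(used2++));
                left2--;
            }
        }
        return out;
    }
}
```

In general, drawing more than min(|s1|, |s2|) items would, in some outcomes, require more items from one side than its sample holds. That is why the result cannot exceed the smaller reservoir. Applying this helper to a small exact-mode input would shrink the result needlessly, so exact-mode inputs are better streamed in item by item, as in a standard reservoir update.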
Modifier and Type | Method | Description |
---|---|---|
int | getMaxK() | Returns the maximum allowed reservoir capacity in this union. |
ReservoirItemsSketch<T> | getResult() | Returns a sketch representing the current state of the union. |
static <T> ReservoirItemsUnion<T> | heapify(org.apache.datasketches.memory.Memory srcMem, ArrayOfItemsSerDe<T> serDe) | Instantiates a Union from Memory. |
static <T> ReservoirItemsUnion<T> | newInstance(int maxK) | Creates an empty Union with a maximum reservoir capacity of size k. |
byte[] | toByteArray(ArrayOfItemsSerDe<T> serDe) | Returns a byte array representation of this union. |
byte[] | toByteArray(ArrayOfItemsSerDe<T> serDe, Class<?> clazz) | Returns a byte array representation of this union. |
String | toString() | Returns a human-readable summary of the sketch, without items. |
void | update(long n, int k, ArrayList<T> input) | Present this union with raw elements of a sketch. |
void | update(org.apache.datasketches.memory.Memory mem, ArrayOfItemsSerDe<T> serDe) | Union the given Memory image of the sketch. |
void | update(ReservoirItemsSketch<T> sketchIn) | Union the given sketch. |
void | update(T datum) | Present this union with a single item to be added to the union. |

public static <T> ReservoirItemsUnion<T> newInstance(int maxK)
Type Parameters:
T - The type of item this sketch contains
Parameters:
maxK - The maximum allowed reservoir capacity for any sketches in the union

public static <T> ReservoirItemsUnion<T> heapify(org.apache.datasketches.memory.Memory srcMem, ArrayOfItemsSerDe<T> serDe)
Type Parameters:
T - The type of item this sketch contains
Parameters:
srcMem - Memory object containing a serialized union
serDe - An instance of ArrayOfItemsSerDe

public int getMaxK()

public void update(ReservoirItemsSketch<T> sketchIn)
Parameters:
sketchIn - The incoming sketch.

public void update(org.apache.datasketches.memory.Memory mem, ArrayOfItemsSerDe<T> serDe)
This method can be repeatedly called. If the given sketch is null, it is interpreted as an empty sketch.
Parameters:
mem - Memory image of sketch to be merged
serDe - An instance of ArrayOfItemsSerDe

public void update(T datum)
Parameters:
datum - The given datum of type T.

public void update(long n, int k, ArrayList<T> input)
Parameters:
n - Total items seen
k - Reservoir size
input - Reservoir samples

public ReservoirItemsSketch<T> getResult()

public byte[] toByteArray(ArrayOfItemsSerDe<T> serDe)
Parameters:
serDe - An instance of ArrayOfItemsSerDe

public String toString()

public byte[] toByteArray(ArrayOfItemsSerDe<T> serDe, Class<?> clazz)
Parameters:
serDe - An instance of ArrayOfItemsSerDe
clazz - A class to which the items are cast before serialization

Copyright © 2015–2021 The Apache Software Foundation. All rights reserved.