T - the item class type
public final class EbppsItemsSketch<T> extends Object
From: "Exact PPS Sampling with Bounded Sample Size", B. Hentschel, P. J. Haas, Y. Tian. Information Processing Letters, 2023.
This sketch samples items from a stream with probability proportional to the weight of each item. It guarantees that the probability of an item appearing in the result is proportional to that item's share of the total weight seen by the sketch, and it returns a sample of at most k items.
The sample may be smaller than k, and its size can include a probabilistic component, so the size of the returned sample is not always constant.
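To make the proportionality guarantee concrete, the following self-contained sketch computes per-item inclusion probabilities for a small weighted stream. It does not call the library: the class `PpsInclusionExample` and its methods are hypothetical, and the formula C = min(k, W / w_max) for the expected sample size is an assumption drawn from the cited paper rather than something this page states.

```java
import java.util.Arrays;

/** Hypothetical illustration; not part of the DataSketches API. */
public class PpsInclusionExample {

  /** Assumed expected sample size: C = min(k, totalWeight / maxWeight). */
  public static double expectedSampleSize(int k, double[] weights) {
    double total = Arrays.stream(weights).sum();
    double max = Arrays.stream(weights).max().orElse(0.0);
    if (max <= 0.0) {
      return 0.0; // empty stream: nothing to sample
    }
    return Math.min(k, total / max);
  }

  /** Inclusion probability of item i, assumed to be C * w_i / W. */
  public static double[] inclusionProbabilities(int k, double[] weights) {
    double total = Arrays.stream(weights).sum();
    double c = expectedSampleSize(k, weights);
    double[] p = new double[weights.length];
    for (int i = 0; i < weights.length; i++) {
      p[i] = (total > 0.0) ? c * weights[i] / total : 0.0;
    }
    return p;
  }

  public static void main(String[] args) {
    double[] weights = {1.0, 1.0, 2.0, 4.0};
    int k = 3;
    System.out.println("C = " + expectedSampleSize(k, weights));
    System.out.println("p = " + Arrays.toString(inclusionProbabilities(k, weights)));
  }
}
```

With these weights the heaviest item holds half of the total weight, so under the assumed formula the expected sample size is 2 even though k is 3. This is one way a sample can come out smaller than k while every inclusion probability stays proportional to weight and no larger than 1.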
Constructor and Description |
---|
EbppsItemsSketch(int k): Constructor. |
Modifier and Type | Method and Description |
---|---|
double | getC(): Returns the expected number of samples returned upon a call to getResult(). |
double | getCumulativeWeight(): Returns the cumulative weight of items processed by the sketch. |
int | getK(): Returns the configured maximum sample size. |
long | getN(): Returns the number of items processed by the sketch, regardless of item weight. |
ArrayList<T> | getResult(): Returns a copy of the current sample. |
int | getSerializedSizeBytes(ArrayOfItemsSerDe<? super T> serDe): Returns the size of a byte array representation of this sketch. |
int | getSerializedSizeBytes(ArrayOfItemsSerDe<? super T> serDe, Class<?> clazz): Returns the length of a byte array representation of this sketch. |
static <T> EbppsItemsSketch<T> | heapify(org.apache.datasketches.memory.Memory srcMem, ArrayOfItemsSerDe<T> serDe): Returns a sketch instance of this class from the given srcMem, which must be a Memory representation of this sketch class. |
boolean | isEmpty(): Returns true if the sketch is empty. |
void | merge(EbppsItemsSketch<T> other): Merges the provided sketch into the current one. |
void | reset(): Resets the sketch to its default, empty state. |
byte[] | toByteArray(ArrayOfItemsSerDe<? super T> serDe): Returns a byte array representation of this sketch. |
byte[] | toByteArray(ArrayOfItemsSerDe<? super T> serDe, Class<?> clazz): Returns a byte array representation of this sketch. |
String | toString(): Provides a human-readable summary of the sketch. |
void | update(T item): Updates this sketch with the given data item with weight 1.0. |
void | update(T item, double weight): Updates this sketch with the given data item with the given weight. |
public EbppsItemsSketch(int k)
k - The maximum number of samples to retain
public static <T> EbppsItemsSketch<T> heapify(org.apache.datasketches.memory.Memory srcMem, ArrayOfItemsSerDe<T> serDe)
T - The type of item this sketch contains
srcMem - a Memory representation of a sketch of this class. See Memory
serDe - An instance of ArrayOfItemsSerDe
public void update(T item)
item - an item from a stream of items
public void update(T item, double weight)
item - an item from a stream of items
weight - the weight of the item
public void merge(EbppsItemsSketch<T> other)
other - the sketch to merge into the current object
public ArrayList<T> getResult()
public String toString()
public int getK()
public long getN()
public double getCumulativeWeight()
public double getC()
The value C should be no larger than the sketch's configured value of k, although numerical precision limitations mean it may exceed k by double precision floating point error margins in certain cases.
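Because C is an expected size and need not be an integer, one way to picture a sample whose size averages C is to return floor(C) items and add one more with probability equal to the fractional part of C. The sketch below simulates only that size draw. It is an assumption about how a fractional expected size can be realized, not a description of the library's internals, and `SampleSizeExample` and `drawSampleSize` are hypothetical names. It also clamps C to k so that a tiny floating-point overshoot never yields more than k items.

```java
import java.util.Random;

/** Hypothetical illustration; not part of the DataSketches API. */
public class SampleSizeExample {

  /** Draws an integer sample size whose expected value is min(c, k). */
  public static int drawSampleSize(double c, int k, Random rng) {
    double clamped = Math.min(c, k);       // absorb tiny overshoot above k
    int base = (int) Math.floor(clamped);  // items that are always returned
    double frac = clamped - base;          // chance of one extra item
    return base + (rng.nextDouble() < frac ? 1 : 0);
  }

  public static void main(String[] args) {
    Random rng = new Random(42L);
    int trials = 100_000;
    long sum = 0;
    for (int i = 0; i < trials; i++) {
      sum += drawSampleSize(2.3, 3, rng);
    }
    // The mean of the drawn sizes should be close to 2.3.
    System.out.println("mean size ~ " + (double) sum / trials);
  }
}
```

Each draw is either 2 or 3 for C = 2.3, which matches the earlier statement that the returned sample size is not always constant.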
public boolean isEmpty()
public void reset()
public int getSerializedSizeBytes(ArrayOfItemsSerDe<? super T> serDe)
serDe - An instance of ArrayOfItemsSerDe
public int getSerializedSizeBytes(ArrayOfItemsSerDe<? super T> serDe, Class<?> clazz)
serDe - An instance of ArrayOfItemsSerDe
clazz - The class represented by <T>
public byte[] toByteArray(ArrayOfItemsSerDe<? super T> serDe)
serDe - An instance of ArrayOfItemsSerDe
public byte[] toByteArray(ArrayOfItemsSerDe<? super T> serDe, Class<?> clazz)
serDe - An instance of ArrayOfItemsSerDe
clazz - The class represented by <T>
Copyright © 2015–2024 The Apache Software Foundation. All rights reserved.