T
- The type of object held in the reservoir.
public final class ReservoirItemsSketch<T> extends Object
Modifier and Type | Method | Description |
---|---|---|
SampleSubsetSummary | estimateSubsetSum(Predicate<T> predicate) | Computes an estimated subset sum from the entire stream for objects matching a given predicate. |
int | getK() | Returns the sketch's value of k, the maximum number of samples stored in the reservoir. |
long | getN() | Returns the number of items processed from the input stream. |
int | getNumSamples() | Returns the current number of items in the reservoir, which may be smaller than the reservoir capacity. |
T[] | getSamples() | Returns a copy of the items in the reservoir, or null if empty. |
T[] | getSamples(Class<?> clazz) | Returns a copy of the items in the reservoir as members of Class clazz, or null if empty. |
static <T> ReservoirItemsSketch<T> | heapify(org.apache.datasketches.memory.Memory srcMem, ArrayOfItemsSerDe<T> serDe) | Returns a sketch instance of this class from the given srcMem, which must be a Memory representation of this sketch class. |
static <T> ReservoirItemsSketch<T> | newInstance(int k) | Construct a mergeable sampling sketch with up to k samples using the default resize factor (8). |
static <T> ReservoirItemsSketch<T> | newInstance(int k, ResizeFactor rf) | Construct a mergeable sampling sketch with up to k samples using a specified resize factor. |
void | reset() | Resets this sketch to the empty state, but retains the original value of k. |
byte[] | toByteArray(ArrayOfItemsSerDe<? super T> serDe) | Returns a byte array representation of this sketch. |
byte[] | toByteArray(ArrayOfItemsSerDe<? super T> serDe, Class<?> clazz) | Returns a byte array representation of this sketch. |
String | toString() | Returns a human-readable summary of the sketch, without items. |
static String | toString(byte[] byteArr) | Returns a human-readable string of the preamble of a byte array image of a ReservoirItemsSketch. |
static String | toString(org.apache.datasketches.memory.Memory mem) | Returns a human-readable string of the preamble of a Memory image of a ReservoirItemsSketch. |
void | update(T item) | Randomly decide whether or not to include an item in the sample set. |
public static <T> ReservoirItemsSketch<T> newInstance(int k)
T
- The type of object held in the reservoir.
k
- Maximum size of sampling. Allocated size may be smaller until reservoir fills.
Unlike many sketches in this package, this value does not need to be a
power of 2.
public static <T> ReservoirItemsSketch<T> newInstance(int k, ResizeFactor rf)
T
- The type of object held in the reservoir.
k
- Maximum size of sampling. Allocated size may be smaller until reservoir fills.
Unlike many sketches in this package, this value does not need to be a
power of 2.
rf
- See Resize Factor
public static <T> ReservoirItemsSketch<T> heapify(org.apache.datasketches.memory.Memory srcMem, ArrayOfItemsSerDe<T> serDe)
T
- The type of item this sketch contains
srcMem
- a Memory representation of a sketch of this class.
See Memory
serDe
- An instance of ArrayOfItemsSerDe
public int getK()
public long getN()
public int getNumSamples()
public void update(T item)
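The random inclusion decision made by update(T item) can be illustrated with the classic reservoir sampling scheme (often called Algorithm R). The following is a minimal sketch in plain Java, not the library's implementation: the class SimpleReservoir, its seed parameter, and its List-based storage are hypothetical and exist only for illustration. It mirrors how getK(), getN(), and getNumSamples() relate to one another.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Illustrative reservoir sampler (classic Algorithm R); hypothetical, not the library class. */
final class SimpleReservoir<T> {
    private final int k;            // maximum number of samples retained
    private final List<T> samples;  // current reservoir contents
    private final Random random;    // seeded so runs are repeatable (an assumption of this sketch)
    private long n = 0;             // number of items seen so far

    SimpleReservoir(int k, long seed) {
        if (k < 1) {
            throw new IllegalArgumentException("k must be at least 1");
        }
        this.k = k;
        this.samples = new ArrayList<>();
        this.random = new Random(seed);
    }

    /** Offers one unit-weight item to the reservoir. */
    void update(T item) {
        n++;
        if (samples.size() < k) {
            // The reservoir is not yet full, so every item is kept.
            samples.add(item);
        } else {
            // Pick a slot uniformly in [0, n); the item is kept only if the slot
            // falls inside the reservoir, which happens with probability k/n.
            long slot = (long) (random.nextDouble() * n);
            if (slot < k) {
                samples.set((int) slot, item);
            }
        }
    }

    int getK() { return k; }

    long getN() { return n; }

    int getNumSamples() { return samples.size(); }

    /** Returns a copy of the current samples. */
    List<T> getSamples() { return new ArrayList<>(samples); }
}
```

In this sketch, the number of samples grows with the stream until it reaches k and then stays at k, while n keeps counting every item offered.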
item
- a unit-weight (equivalently, unweighted) item of the set being sampled from
public void reset()
public T[] getSamples()
To allocate an array of generic type T, this method uses the class of the first item in
the array. It may throw an ArrayStoreException if the
reservoir stores instances of a polymorphic base class.
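The hazard described for getSamples() can be reproduced with plain Java, independent of the library. The sketch below assumes the allocation strategy stated above (the array type is taken from the first item); the class SampleArrays and both helper methods are hypothetical names used only for illustration. In standard Java, storing an element of the wrong runtime type into such an array surfaces as java.lang.ArrayStoreException.

```java
import java.lang.reflect.Array;
import java.util.List;

/** Hypothetical helpers showing why the array's runtime type matters. */
final class SampleArrays {

    /** Allocates the result array from the runtime class of the first item. */
    @SuppressWarnings("unchecked")
    static <T> T[] copyUsingFirstItemClass(List<T> items) {
        if (items.isEmpty()) {
            return null;
        }
        Class<?> clazz = items.get(0).getClass();
        // Fails with ArrayStoreException if a later item is not an instance of clazz.
        return items.toArray((T[]) Array.newInstance(clazz, 0));
    }

    /** Allocates the result array from an explicitly supplied class. */
    @SuppressWarnings("unchecked")
    static <T> T[] copyUsingClass(List<T> items, Class<?> clazz) {
        if (items.isEmpty()) {
            return null;
        }
        return items.toArray((T[]) Array.newInstance(clazz, 0));
    }
}
```

For a List<Number> holding an Integer followed by a Double, the first helper builds an Integer[] and fails when copying the Double, while passing Number.class to the second helper succeeds. This is the situation in which supplying the class explicitly, as getSamples(Class<?> clazz) allows, is the safer choice.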
public T[] getSamples(Class<?> clazz)
This method allocates an array of class clazz, which must either match or extend T. This method should be used when objects in the array are all instances of T but are not necessarily instances of the base class.
clazz
- A class to which the items are cast before returning
public String toString()
public static String toString(byte[] byteArr)
byteArr
- the given byte array
public static String toString(org.apache.datasketches.memory.Memory mem)
mem
- the given Memory
public byte[] toByteArray(ArrayOfItemsSerDe<? super T> serDe)
serDe
- An instance of ArrayOfItemsSerDe
public byte[] toByteArray(ArrayOfItemsSerDe<? super T> serDe, Class<?> clazz)
serDe
- An instance of ArrayOfItemsSerDe
clazz
- The class represented by <T>
public SampleSubsetSummary estimateSubsetSum(Predicate<T> predicate)
This is technically a heuristic method, and tries to err on the conservative side.
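As a rough illustration of the idea behind a subset-sum estimate over unit-weight items, the sketch below scales the fraction of sampled items that match a predicate by the total stream length. This is an assumption made for illustration only: the source does not describe how the library computes its estimate or its bounds, and the class SubsetEstimate and the method estimateSubsetCount are hypothetical. It produces a point estimate only and omits the conservative bounds that a SampleSubsetSummary would carry.

```java
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical point estimate of how many stream items match a predicate. */
final class SubsetEstimate {

    /**
     * @param samples   a uniform random sample of the stream (for example, reservoir contents)
     * @param n         total number of items seen in the stream
     * @param predicate selects the items of interest
     * @return the estimated number of matching items in the whole stream
     */
    static <T> double estimateSubsetCount(List<T> samples, long n, Predicate<T> predicate) {
        if (samples.isEmpty()) {
            return 0.0;
        }
        long matches = samples.stream().filter(predicate).count();
        // Fraction of the sample that matches, scaled up to the full stream length.
        return ((double) matches / samples.size()) * n;
    }
}
```

With a sample of 4 items of which 2 match, drawn from a stream of 100 items, this sketch returns 50.0. When the reservoir has not yet filled (the sample is the whole stream), the result is the exact count.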
predicate
- A predicate to use when identifying items.
Copyright © 2015–2024 The Apache Software Foundation. All rights reserved.