public class ReqSketch extends Object
This implementation differs from the algorithm described in the paper in the following:
This implementation provides a number of capabilities not discussed in the paper or provided in the Python prototype.
Modifier and Type | Method and Description |
---|---|
static ReqSketchBuilder | builder() - Returns a new ReqSketchBuilder. |
double[] | getCDF(float[] splitPoints) - Returns an approximation to the Cumulative Distribution Function (CDF), which is the cumulative analog of the PMF, of the input stream given a set of splitPoints (values). |
boolean | getHighRankAccuracy() - If true, the high ranks are prioritized for better accuracy. |
float | getMaxValue() - Gets the largest value seen by this sketch. |
float | getMinValue() - Gets the smallest value seen by this sketch. |
long | getN() - Gets the total number of items offered to the sketch. |
double[] | getPMF(float[] splitPoints) - Returns an approximation to the Probability Mass Function (PMF) of the input stream given a set of splitPoints (values). |
float | getQuantile(double normRank) - Gets the approximate quantile of the given normalized rank based on the lteq criterion. |
float[] | getQuantiles(double[] normRanks) - Gets an array of quantiles that correspond to the given array of normalized ranks. |
double | getRank(float value) - Computes the normalized rank of the given value in the stream. |
double | getRankLowerBound(double rank, int numStdDev) - Returns an approximate lower bound rank of the given normalized rank. |
double[] | getRanks(float[] values) - Gets an array of normalized ranks that correspond to the given array of values. |
double | getRankUpperBound(double rank, int numStdDev) - Returns an approximate upper bound rank of the given normalized rank. |
int | getRetainedItems() - Gets the number of retained items of this sketch. |
double | getRSE(int k, double rank, boolean hra, long totalN) - Returns an a priori estimate of relative standard error (RSE, expressed as a number in [0,1]). |
int | getSerializationBytes() - Gets the number of bytes when serialized. |
static ReqSketch | heapify(org.apache.datasketches.memory.Memory mem) - Returns a ReqSketch on the heap from a Memory image of the sketch. |
boolean | isEmpty() - Returns true if this sketch is empty. |
boolean | isEstimationMode() - Returns true if this sketch is in estimation mode. |
boolean | isLessThanOrEqual() - Returns the current comparison criterion. |
ReqIterator | iterator() - Returns an iterator for all the items in this sketch. |
ReqSketch | merge(ReqSketch other) - Merges the other sketch into this one. |
ReqSketch | reset() - Resets this sketch by removing all data and setting all data related variables to their virgin state. |
ReqSketch | setLessThanOrEqual(boolean ltEq) - Sets the chosen criterion for value comparison. |
byte[] | toByteArray() - Returns a byte array representation of this sketch. |
String | toString() - Returns a summary of the key parameters of the sketch. |
void | update(float item) - Updates this sketch with the given item. |
String | viewCompactorDetail(String fmt, boolean allData) - A detailed, human readable view of the sketch compactors and their data. |
public static ReqSketch heapify(org.apache.datasketches.memory.Memory mem)
mem - The Memory object holding a valid image of a ReqSketch
public static final ReqSketchBuilder builder()
public double[] getCDF(float[] splitPoints)
The resulting approximations have a probabilistic guarantee that can be obtained, a priori,
from the getRSE(int, double, boolean, long) function.
If the sketch is empty this returns null.
splitPoints - an array of m unique, monotonically increasing float values
that divide the real number line into m+1 consecutive disjoint intervals.
The definition of an "interval" is inclusive of the left splitPoint (or minimum value) and
exclusive of the right splitPoint, with the exception that the last interval will include
the maximum value.
It is not necessary to include either the min or max values in these split points.
public boolean getHighRankAccuracy()
public float getMaxValue()
public float getMinValue()
public long getN()
public double[] getPMF(float[] splitPoints)
The resulting approximations have a probabilistic guarantee that can be obtained, a priori,
from the getRSE(int, double, boolean, long) function.
If the sketch is empty this returns null.
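The interval convention documented for splitPoints (left edge inclusive, right edge exclusive, last interval including the maximum) can be made concrete with an exact, brute-force computation over a small array. This is only an illustration of the definitions, not the sketch's implementation: the class ExactDistribution and its methods are hypothetical names, and the assumption that the result has m+1 entries with a final CDF value of 1.0 follows from the m+1 intervals described for splitPoints.

```java
/** Hypothetical helper: exact PMF and CDF over split points, for illustration only. */
final class ExactDistribution {

  /** Returns m+1 cumulative fractions for m split points; the last entry is 1.0. */
  static double[] cdf(float[] items, float[] splitPoints) {
    if (items.length == 0) { return null; } // mirrors the documented empty-sketch behavior
    double[] result = new double[splitPoints.length + 1];
    for (int i = 0; i < splitPoints.length; i++) {
      long below = 0;
      for (float item : items) {
        if (item < splitPoints[i]) { below++; } // the right edge of an interval is exclusive
      }
      result[i] = (double) below / items.length;
    }
    result[splitPoints.length] = 1.0; // the last interval includes the maximum value
    return result;
  }

  /** Returns the mass in each of the m+1 intervals by differencing the CDF. */
  static double[] pmf(float[] items, float[] splitPoints) {
    double[] cumulative = cdf(items, splitPoints);
    if (cumulative == null) { return null; }
    double[] result = new double[cumulative.length];
    result[0] = cumulative[0];
    for (int i = 1; i < cumulative.length; i++) {
      result[i] = cumulative[i] - cumulative[i - 1];
    }
    return result;
  }

  public static void main(String[] args) {
    float[] items = {1f, 2f, 3f, 4f, 5f, 6f, 7f, 8f, 9f, 10f};
    float[] splitPoints = {3f, 7f}; // intervals: [min, 3), [3, 7), [7, max]
    System.out.println(java.util.Arrays.toString(cdf(items, splitPoints)));
    System.out.println(java.util.Arrays.toString(pmf(items, splitPoints)));
  }
}
```

With the ten items above, two fall in the first interval and four in each of the other two. A sketch in estimation mode would return approximations of these fractions rather than the exact values.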
splitPoints - an array of m unique, monotonically increasing float values
that divide the real number line into m+1 consecutive disjoint intervals.
The definition of an "interval" is inclusive of the left splitPoint (or minimum value) and
exclusive of the right splitPoint, with the exception that the last interval will include
the maximum value.
It is not necessary to include either the min or max values in these split points.
public float getQuantile(double normRank)
normRank - the given normalized rank
public float[] getQuantiles(double[] normRanks)
normRanks - the given array of normalized ranks.
See Also: getQuantile(double)
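To show what a quantile for a normalized rank means, here is a minimal exact computation on a sorted copy of the data. It assumes one common convention consistent with a less-than-or-equal criterion: the result is the smallest value whose fraction of items less than or equal to it is at least normRank. The class ExactQuantile is a hypothetical name, and the sketch's own approximate answer may differ from this exact one.

```java
/** Hypothetical helper: exact quantile lookup on a sorted copy, for illustration only. */
final class ExactQuantile {

  /** Returns the smallest item whose "less than or equal" rank is at least normRank. */
  static float quantile(float[] items, double normRank) {
    if (items.length == 0) {
      throw new IllegalArgumentException("items must not be empty");
    }
    if (normRank < 0.0 || normRank > 1.0) {
      throw new IllegalArgumentException("normRank must be in [0, 1]");
    }
    float[] sorted = items.clone(); // leave the caller's array untouched
    java.util.Arrays.sort(sorted);
    int index = (int) Math.ceil(normRank * sorted.length) - 1;
    index = Math.max(0, Math.min(sorted.length - 1, index)); // clamp for normRank == 0
    return sorted[index];
  }

  /** Array form, analogous in shape to a getQuantiles(double[]) call. */
  static float[] quantiles(float[] items, double[] normRanks) {
    float[] result = new float[normRanks.length];
    for (int i = 0; i < normRanks.length; i++) {
      result[i] = quantile(items, normRanks[i]);
    }
    return result;
  }

  public static void main(String[] args) {
    float[] items = {40f, 10f, 30f, 20f};
    System.out.println(java.util.Arrays.toString(
        quantiles(items, new double[] {0.0, 0.5, 1.0})));
  }
}
```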
public double getRank(float value)
value - the given value
public double[] getRanks(float[] values)
values - the given array of values.
See Also: getRank(float)
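The normalized rank of a value depends on the comparison criterion in force: strictly less-than (<) or less-than-or-equal (≤). The following exact computation shows how the two criteria differ when the stream contains duplicates. ExactRank is a hypothetical helper written for this illustration; it counts every item, whereas the sketch estimates the same quantity from its retained items.

```java
/** Hypothetical helper: exact normalized rank under both comparison criteria. */
final class ExactRank {

  /**
   * Returns the fraction of items that are < value (ltEq == false)
   * or <= value (ltEq == true). Returns NaN for empty input, which is
   * a choice made for this illustration only.
   */
  static double normalizedRank(float[] items, float value, boolean ltEq) {
    if (items.length == 0) { return Double.NaN; }
    long count = 0;
    for (float item : items) {
      if (ltEq ? item <= value : item < value) { count++; }
    }
    return (double) count / items.length;
  }

  public static void main(String[] args) {
    float[] stream = {10f, 20f, 20f, 30f, 40f};
    // Strictly less than 20: only the item 10 qualifies (1 of 5).
    System.out.println(normalizedRank(stream, 20f, false));
    // Less than or equal to 20: items 10, 20, 20 qualify (3 of 5).
    System.out.println(normalizedRank(stream, 20f, true));
  }
}
```

The gap between the two results equals the fraction of items equal to the queried value, so the choice of criterion only matters for values that actually occur in the stream.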
public double getRankLowerBound(double rank, int numStdDev)
rank - the given rank, a value between 0 and 1.0.
numStdDev - the number of standard deviations. Must be 1, 2, or 3.
public double getRankUpperBound(double rank, int numStdDev)
rank - the given rank, a value between 0 and 1.0.
numStdDev - the number of standard deviations. Must be 1, 2, or 3.
public int getRetainedItems()
public double getRSE(int k, double rank, boolean hra, long totalN)
k - the given value of k
rank - the given normalized rank, a number in [0,1].
hra - if true, High Rank Accuracy mode is being selected; otherwise, Low Rank Accuracy.
totalN - an estimate of the total number of items submitted to the sketch.
public int getSerializationBytes()
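The hra flag of getRSE and the numStdDev argument of getRankLowerBound and getRankUpperBound can be related through a simplified model of relative error. The sketch below assumes that the standard error at a rank is proportional to the rank itself in Low Rank Accuracy mode and to 1 - rank in High Rank Accuracy mode. The class RelativeRankBounds and its relErr parameter are hypothetical: relErr is an assumed per-unit error factor, not the value returned by getRSE, and the library's actual bound computation may differ.

```java
/** Hypothetical model of rank bounds under relative error, for illustration only. */
final class RelativeRankBounds {

  /**
   * Returns {lowerBound, upperBound} for a normalized rank.
   * relErr is an assumed relative error factor, not a library output.
   */
  static double[] bounds(double rank, double relErr, int numStdDev, boolean hra) {
    if (rank < 0.0 || rank > 1.0) {
      throw new IllegalArgumentException("rank must be in [0, 1]");
    }
    if (numStdDev < 1 || numStdDev > 3) {
      throw new IllegalArgumentException("numStdDev must be 1, 2, or 3");
    }
    // Error shrinks toward the prioritized end: rank 1.0 for HRA, rank 0.0 for LRA.
    double scale = hra ? 1.0 - rank : rank;
    double halfWidth = numStdDev * relErr * scale;
    double lower = Math.max(0.0, rank - halfWidth); // a rank cannot be below 0
    double upper = Math.min(1.0, rank + halfWidth); // or above 1
    return new double[] {lower, upper};
  }

  public static void main(String[] args) {
    // Same rank and error factor, different accuracy modes.
    double[] high = bounds(0.99, 0.01, 2, true);
    double[] low = bounds(0.99, 0.01, 2, false);
    System.out.println("HRA: [" + high[0] + ", " + high[1] + "]");
    System.out.println("LRA: [" + low[0] + ", " + low[1] + "]");
  }
}
```

Under this model, a query near rank 0.99 gets a much tighter interval in High Rank Accuracy mode than in Low Rank Accuracy mode, which is the practical reason to choose the mode that matches the tail of interest.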
public boolean isEmpty()
public boolean isEstimationMode()
public boolean isLessThanOrEqual()
public ReqIterator iterator()
public ReqSketch merge(ReqSketch other)
other - sketch to be merged into this one.
public ReqSketch reset()
public ReqSketch setLessThanOrEqual(boolean ltEq)
ltEq - (Less-than-or-Equals) If true, the sketch will use the ≤ criterion for comparing
values. Otherwise, the criterion is strictly <, the default.
This can be set anytime prior to a getRank(float) or getQuantile(double) or equivalent query.
public byte[] toByteArray()
public String toString()
public void update(float item)
item - the given item
public String viewCompactorDetail(String fmt, boolean allData)
fmt - the format string for the data items; example: "%4.0f".
allData - if true, all the retained items for the sketch will be output by
compactor level. Otherwise, just a summary will be output.
Copyright © 2015–2021 The Apache Software Foundation. All rights reserved.