public abstract class DoublesSketch extends Object implements QuantilesDoublesAPI
A k of 128 produces a normalized rank error of about 1.7%. For example, the median returned from getQuantile(0.5) will be between the actual quantiles from the hypothetically sorted array of input quantiles at normalized ranks of 0.483 and 0.517, with a confidence of about 99%.
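The interval arithmetic in this example can be sketched in plain Java. This is a minimal illustration, not part of the DataSketches API: the class RankErrorInterval and its methods are hypothetical, and the epsilon of 0.017 is simply the approximate error quoted above for k = 128.

```java
/** Hypothetical helper (not a DataSketches class): rank interval implied by a rank error. */
final class RankErrorInterval {

  /** Lower end of the normalized-rank interval, clamped so it never drops below 0.0. */
  static double lower(double rank, double epsilon) {
    return Math.max(0.0, rank - epsilon);
  }

  /** Upper end of the normalized-rank interval, clamped so it never exceeds 1.0. */
  static double upper(double rank, double epsilon) {
    return Math.min(1.0, rank + epsilon);
  }
}
```

With rank = 0.5 and epsilon = 0.017, lower and upper give approximately 0.483 and 0.517, matching the ranks in the example above.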
Table Guide for DoublesSketch Size in Bytes and Approximate Error (the ~ Error row gives the approximate error for each K; each N row gives the size in bytes):

N | K = 16 | K = 32 | K = 64 | K = 128 | K = 256 | K = 512 | K = 1,024 |
---|---|---|---|---|---|---|---|
~ Error | 12.145% | 6.359% | 3.317% | 1.725% | 0.894% | 0.463% | 0.239% |
0 | 8 | 8 | 8 | 8 | 8 | 8 | 8 |
1 | 72 | 72 | 72 | 72 | 72 | 72 | 72 |
3 | 72 | 72 | 72 | 72 | 72 | 72 | 72 |
7 | 104 | 104 | 104 | 104 | 104 | 104 | 104 |
15 | 168 | 168 | 168 | 168 | 168 | 168 | 168 |
31 | 296 | 296 | 296 | 296 | 296 | 296 | 296 |
63 | 424 | 552 | 552 | 552 | 552 | 552 | 552 |
127 | 552 | 808 | 1,064 | 1,064 | 1,064 | 1,064 | 1,064 |
255 | 680 | 1,064 | 1,576 | 2,088 | 2,088 | 2,088 | 2,088 |
511 | 808 | 1,320 | 2,088 | 3,112 | 4,136 | 4,136 | 4,136 |
1,023 | 936 | 1,576 | 2,600 | 4,136 | 6,184 | 8,232 | 8,232 |
2,047 | 1,064 | 1,832 | 3,112 | 5,160 | 8,232 | 12,328 | 16,424 |
4,095 | 1,192 | 2,088 | 3,624 | 6,184 | 10,280 | 16,424 | 24,616 |
8,191 | 1,320 | 2,344 | 4,136 | 7,208 | 12,328 | 20,520 | 32,808 |
16,383 | 1,448 | 2,600 | 4,648 | 8,232 | 14,376 | 24,616 | 41,000 |
32,767 | 1,576 | 2,856 | 5,160 | 9,256 | 16,424 | 28,712 | 49,192 |
65,535 | 1,704 | 3,112 | 5,672 | 10,280 | 18,472 | 32,808 | 57,384 |
131,071 | 1,832 | 3,368 | 6,184 | 11,304 | 20,520 | 36,904 | 65,576 |
262,143 | 1,960 | 3,624 | 6,696 | 12,328 | 22,568 | 41,000 | 73,768 |
524,287 | 2,088 | 3,880 | 7,208 | 13,352 | 24,616 | 45,096 | 81,960 |
1,048,575 | 2,216 | 4,136 | 7,720 | 14,376 | 26,664 | 49,192 | 90,152 |
2,097,151 | 2,344 | 4,392 | 8,232 | 15,400 | 28,712 | 53,288 | 98,344 |
4,194,303 | 2,472 | 4,648 | 8,744 | 16,424 | 30,760 | 57,384 | 106,536 |
8,388,607 | 2,600 | 4,904 | 9,256 | 17,448 | 32,808 | 61,480 | 114,728 |
16,777,215 | 2,728 | 5,160 | 9,768 | 18,472 | 34,856 | 65,576 | 122,920 |
33,554,431 | 2,856 | 5,416 | 10,280 | 19,496 | 36,904 | 69,672 | 131,112 |
67,108,863 | 2,984 | 5,672 | 10,792 | 20,520 | 38,952 | 73,768 | 139,304 |
134,217,727 | 3,112 | 5,928 | 11,304 | 21,544 | 41,000 | 77,864 | 147,496 |
268,435,455 | 3,240 | 6,184 | 11,816 | 22,568 | 43,048 | 81,960 | 155,688 |
536,870,911 | 3,368 | 6,440 | 12,328 | 23,592 | 45,096 | 86,056 | 163,880 |
1,073,741,823 | 3,496 | 6,696 | 12,840 | 24,616 | 47,144 | 90,152 | 172,072 |
2,147,483,647 | 3,624 | 6,952 | 13,352 | 25,640 | 49,192 | 94,248 | 180,264 |
4,294,967,295 | 3,752 | 7,208 | 13,864 | 26,664 | 51,240 | 98,344 | 188,456 |
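The sizes in the table above follow a regular pattern, which the sketch below tries to capture. It is a model fitted to the table entries only, not the library's implementation: the class SketchSizeTableModel, the 40-byte overhead, and the growth rule for the base buffer are all assumptions inferred from the listed values, and the library's own size methods may return different numbers.

```java
/** Hypothetical model fitted to the size table; not a DataSketches class. */
final class SketchSizeTableModel {

  // Assumption inferred from the table: each non-empty entry is
  // 40 bytes of overhead plus 8 bytes per allocated double slot.
  static final int OVERHEAD_BYTES = 40;

  /** Size in bytes that the table lists for the given k and n. */
  static long sizeBytes(int k, long n) {
    if (n == 0) {
      return 8; // every k column lists 8 bytes for an empty sketch
    }
    long twoK = 2L * k;
    // Base buffer: appears to grow in powers of two from 4 slots up to 2k slots.
    long baseBufferSlots = Math.min(twoK, Math.max(4L, ceilingPowerOf2(n)));
    // Levels: one k-sized level per bit needed to represent n / 2k.
    long q = n / twoK;
    int levels = (q == 0) ? 0 : 64 - Long.numberOfLeadingZeros(q);
    return OVERHEAD_BYTES + 8L * (baseBufferSlots + (long) k * levels);
  }

  /** Smallest power of two that is greater than or equal to n (for n >= 1). */
  static long ceilingPowerOf2(long n) {
    return (n <= 1) ? 1 : Long.highestOneBit(n - 1) << 1;
  }
}
```

For example, sizeBytes(16, 63) gives 424 and sizeBytes(128, 1023) gives 4,136, which agree with the corresponding table cells.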
Fields inherited from interface QuantilesAPI:
EMPTY_MSG, MEM_REQ_SVR_NULL_MSG, NOT_SINGLE_ITEM_MSG, SELF_MERGE_MSG, TGT_IS_READ_ONLY_MSG, UNSUPPORTED_MSG
Modifier and Type | Method and Description |
---|---|
static DoublesSketchBuilder | builder(): Returns a new builder. |
DoublesSketch | downSample(DoublesSketch srcSketch, int smallerK, org.apache.datasketches.memory.WritableMemory dstMem): From a source sketch, create a new sketch that must have a smaller K. |
double[] | getCDF(double[] splitPoints, QuantileSearchCriteria searchCrit): Returns an approximation to the Cumulative Distribution Function (CDF) of the input stream as a monotonically increasing array of double ranks (or cumulative probabilities) on the interval [0.0, 1.0], given a set of splitPoints. |
static int | getCompactSerialiedSizeBytes(int k, long n): Returns the number of bytes a DoublesSketch would require to store in compact form given k and n. |
int | getCurrentCompactSerializedSizeBytes(): Returns the current number of bytes this sketch would require to store in the compact Memory Format. |
int | getCurrentUpdatableSerializedSizeBytes(): Returns the current number of bytes this sketch would require to store in the updatable Memory Format. |
int | getK(): Gets the user configured parameter k, which controls the accuracy of the sketch and its memory space usage. |
static int | getKFromEpsilon(double epsilon, boolean pmf): Gets the approximate k to use given epsilon, the normalized rank error. |
abstract double | getMaxItem(): Returns the maximum item of the stream. |
abstract double | getMinItem(): Returns the minimum item of the stream. |
abstract long | getN(): Gets the length of the input stream offered to the sketch. |
double | getNormalizedRankError(boolean pmf): Gets the approximate rank error of this sketch normalized as a fraction between zero and one. |
static double | getNormalizedRankError(int k, boolean pmf): Gets the normalized rank error given k and pmf. |
int | getNumRetained(): Gets the number of quantiles retained by the sketch. |
double[] | getPMF(double[] splitPoints, QuantileSearchCriteria searchCrit): Returns an approximation to the Probability Mass Function (PMF) of the input stream as an array of probability masses as doubles on the interval [0.0, 1.0], given a set of splitPoints. |
double | getQuantile(double rank, QuantileSearchCriteria searchCrit): Gets the approximate quantile of the given normalized rank and the given search criterion. |
double | getQuantileLowerBound(double rank): Gets the lower bound of the quantile confidence interval in which the quantile of the given rank exists. |
double[] | getQuantiles(double[] ranks, QuantileSearchCriteria searchCrit): Gets an array of quantiles from the given array of normalized ranks. |
double | getQuantileUpperBound(double rank): Gets the upper bound of the quantile confidence interval in which the true quantile of the given rank exists. |
double | getRank(double quantile, QuantileSearchCriteria searchCrit): Gets the normalized rank corresponding to the given quantile. |
double | getRankLowerBound(double rank): Gets the lower bound of the rank confidence interval in which the true rank of the given rank exists. |
double[] | getRanks(double[] quantiles, QuantileSearchCriteria searchCrit): Gets an array of normalized ranks corresponding to the given array of quantiles and the given search criterion. |
double | getRankUpperBound(double rank): Gets the upper bound of the rank confidence interval in which the true rank of the given rank exists. |
int | getSerializedSizeBytes(): Returns the current number of bytes this Sketch would require if serialized. |
DoublesSketchSortedView | getSortedView(): Gets the sorted view of this sketch. |
static int | getUpdatableStorageBytes(int k, long n): Returns the number of bytes a sketch would require to store in updatable form. |
abstract boolean | hasMemory(): Returns true if this sketch's data structure is backed by Memory or WritableMemory. |
static DoublesSketch | heapify(org.apache.datasketches.memory.Memory srcMem): Heapify takes the sketch image in Memory and instantiates an on-heap Sketch. |
abstract boolean | isDirect(): Returns true if this sketch's data structure is off-heap (a.k.a., Direct or Native memory). |
boolean | isEmpty(): Returns true if this sketch is empty. |
boolean | isEstimationMode(): Returns true if this sketch is in estimation mode. |
abstract boolean | isReadOnly(): Returns true if this sketch is read only. |
boolean | isSameResource(org.apache.datasketches.memory.Memory that): Returns true if the backing resource of this is identical with the backing resource of that. |
QuantilesDoublesSketchIterator | iterator(): Gets the iterator for this sketch, which is not sorted. |
void | putMemory(org.apache.datasketches.memory.WritableMemory dstMem): Puts the current sketch into the given Memory in compact form if there is sufficient space, otherwise, it throws an error. |
void | putMemory(org.apache.datasketches.memory.WritableMemory dstMem, boolean compact): Puts the current sketch into the given Memory if there is sufficient space, otherwise, throws an error. |
abstract void | reset(): Resets this sketch to the empty state. |
byte[] | toByteArray(): Returns a byte array representation of this sketch. |
byte[] | toByteArray(boolean compact): Serialize this sketch in a byte array form. |
String | toString(): Returns human readable summary information about this sketch. |
String | toString(boolean withLevels, boolean withLevelsAndItems): Returns human readable summary information about this sketch. |
static String | toString(byte[] byteArr): Returns a human readable string of the preamble of a byte array image of a DoublesSketch. |
static String | toString(org.apache.datasketches.memory.Memory mem): Returns a human readable string of the preamble of a Memory image of a DoublesSketch. |
static DoublesSketch | wrap(org.apache.datasketches.memory.Memory srcMem): Wrap this sketch around the given Memory image of a DoublesSketch, compact or updatable. |
Methods inherited from class Object:
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface QuantilesDoublesAPI:
getCDF, getPMF, getQuantile, getQuantiles, getRank, getRanks, update
public static final DoublesSketchBuilder builder()
public static DoublesSketch heapify(org.apache.datasketches.memory.Memory srcMem)
srcMem
- a Memory image of a Sketch.
See Memory
public static DoublesSketch wrap(org.apache.datasketches.memory.Memory srcMem)
srcMem
- the given Memory image of a DoublesSketch that may have data.
public double[] getCDF(double[] splitPoints, QuantileSearchCriteria searchCrit)
QuantilesDoublesAPI
The resulting approximations have a probabilistic guarantee that can be obtained from the getNormalizedRankError(false) function.
getCDF
in interface QuantilesDoublesAPI
splitPoints
- an array of m unique, monotonically increasing items
(of the same type as the input items)
that divide the item input domain into m+1 overlapping intervals.
The start of each interval is below the lowest item retained by the sketch corresponding to a zero rank or zero probability, and the end of the interval is the rank or cumulative probability corresponding to the split point.
The (m+1)th interval represents 100% of the distribution represented by the sketch and consistent with the definition of a cumulative probability distribution, thus the (m+1)th rank or probability in the returned array is always 1.0.
If a split point exactly equals a retained item of the sketch and the search criterion is:
It is not recommended to include either the minimum or maximum items of the input stream.
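As a rough illustration of the array shape described for splitPoints, the following computes an exact CDF over a small in-memory array rather than over a sketch. ExactCdf and its boolean inclusive flag are hypothetical stand-ins for the sketch and for QuantileSearchCriteria. Counting items ≤ the split point for INCLUSIVE and < the split point for EXCLUSIVE is an assumed reading, and the input array is assumed to be non-empty.

```java
/** Hypothetical exact CDF over an array; illustrates the output shape only. */
final class ExactCdf {

  /**
   * Returns m + 1 cumulative probabilities for m split points.
   * Entry i is the fraction of items at or below (inclusive) or strictly
   * below (exclusive) splitPoints[i]; the last entry is always 1.0.
   */
  static double[] cdf(double[] items, double[] splitPoints, boolean inclusive) {
    int m = splitPoints.length;
    double[] out = new double[m + 1];
    for (int i = 0; i < m; i++) {
      int count = 0;
      for (double item : items) {
        if (inclusive ? item <= splitPoints[i] : item < splitPoints[i]) {
          count++;
        }
      }
      out[i] = (double) count / items.length;
    }
    out[m] = 1.0; // the (m+1)th interval covers the whole distribution
    return out;
  }
}
```

For the items 1 through 10 and split points {3, 7}, this yields approximately {0.3, 0.7, 1.0} when inclusive and {0.2, 0.6, 1.0} when exclusive. A sketch returns approximations of these values, within its rank error.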
searchCrit
- the desired search criteria.
public abstract double getMaxItem()
QuantilesDoublesAPI
getMaxItem
in interface QuantilesDoublesAPI
public abstract double getMinItem()
QuantilesDoublesAPI
getMinItem
in interface QuantilesDoublesAPI
public double[] getPMF(double[] splitPoints, QuantileSearchCriteria searchCrit)
QuantilesDoublesAPI
The resulting approximations have a probabilistic guarantee that can be obtained from the getNormalizedRankError(true) function.
getPMF
in interface QuantilesDoublesAPI
splitPoints
- an array of m unique, monotonically increasing items
(of the same type as the input items)
that divide the item input domain into m+1 consecutive, non-overlapping intervals.
Each interval except for the end intervals starts with a split point and ends with the next split point in sequence.
The first interval starts below the lowest item retained by the sketch corresponding to a zero rank or zero probability, and ends with the first split point.
The last (m+1)th interval starts with the last split point and ends after the last item retained by the sketch corresponding to a rank or probability of 1.0.
The sum of the probability masses of all (m+1) intervals is 1.0.
If the search criterion is:
It is not recommended to include either the minimum or maximum items of the input stream.
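The interval layout described for splitPoints can be illustrated with an exact computation over a small array instead of a sketch. ExactPmf and its boolean inclusive flag are hypothetical stand-ins for the sketch and for QuantileSearchCriteria. The handling of an item equal to a split point (it falls in the interval that ends there when inclusive, and in the interval that starts there when exclusive) is an assumed reading, and the input array is assumed to be non-empty.

```java
/** Hypothetical exact PMF over an array; illustrates the m + 1 interval layout only. */
final class ExactPmf {

  /** Returns the probability mass of each of the m + 1 intervals defined by m split points. */
  static double[] pmf(double[] items, double[] splitPoints, boolean inclusive) {
    int m = splitPoints.length;
    long[] counts = new long[m + 1];
    for (double item : items) {
      int bucket = 0;
      // Skip every split point that lies below the item
      // (or equal to it, when the criterion is exclusive).
      while (bucket < m
          && (inclusive ? item > splitPoints[bucket] : item >= splitPoints[bucket])) {
        bucket++;
      }
      counts[bucket]++;
    }
    double[] out = new double[m + 1];
    for (int i = 0; i <= m; i++) {
      out[i] = (double) counts[i] / items.length;
    }
    return out;
  }
}
```

For the items 1 through 10 and split points {3, 7}, the masses are approximately {0.3, 0.4, 0.3} when inclusive and {0.2, 0.4, 0.4} when exclusive. In both cases they sum to 1.0.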
searchCrit
- the desired search criteria.
public double getQuantile(double rank, QuantileSearchCriteria searchCrit)
QuantilesDoublesAPI
getQuantile
in interface QuantilesDoublesAPI
rank
- the given normalized rank, a double in the range [0.0, 1.0].
searchCrit
- If INCLUSIVE, the given rank includes all quantiles ≤
the quantile directly corresponding to the given rank.
If EXCLUSIVE, the given rank includes all quantiles <
the quantile directly corresponding to the given rank.
See also: QuantileSearchCriteria
public double[] getQuantiles(double[] ranks, QuantileSearchCriteria searchCrit)
QuantilesDoublesAPI
getQuantiles
in interface QuantilesDoublesAPI
ranks
- the given array of normalized ranks, each of which must be
in the interval [0.0, 1.0].
searchCrit
- if INCLUSIVE, the given ranks include all quantiles ≤
the quantile directly corresponding to each rank.
See also: QuantileSearchCriteria
public double getQuantileLowerBound(double rank)
Although it is possible to estimate the probability that the true quantile exists within the quantile confidence interval specified by the upper and lower quantile bounds, it is not possible to guarantee the width of the quantile confidence interval as an additive or multiplicative percent of the true quantile.
The approximate probability that the true quantile is within the confidence interval specified by the upper and lower quantile bounds for this sketch is 0.99.
getQuantileLowerBound
in interface QuantilesDoublesAPI
rank
- the given normalized rank.
public double getQuantileUpperBound(double rank)
Although it is possible to estimate the probability that the true quantile exists within the quantile confidence interval specified by the upper and lower quantile bounds, it is not possible to guarantee the width of the quantile confidence interval as an additive or multiplicative percent of the true quantile.
The approximate probability that the true quantile is within the confidence interval specified by the upper and lower quantile bounds for this sketch is 0.99.
getQuantileUpperBound
in interface QuantilesDoublesAPI
rank
- the given normalized rank.
public double getRank(double quantile, QuantileSearchCriteria searchCrit)
QuantilesDoublesAPI
getRank
in interface QuantilesDoublesAPI
quantile
- the given quantile
searchCrit
- if INCLUSIVE, the given quantile is included in the rank.
See also: QuantileSearchCriteria
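The effect of the search criterion on a rank is easiest to see on exact, sorted data with duplicates. The following is a minimal sketch using a binary search over a sorted array. SortedRank and its boolean inclusive flag are hypothetical and are not part of the DataSketches API, the array is assumed to be non-empty and sorted ascending, and a sketch would return only an approximation of these values.

```java
/** Hypothetical exact rank over a sorted array; not a DataSketches class. */
final class SortedRank {

  /**
   * Normalized rank of the given quantile in a non-empty, ascending array.
   * Inclusive counts items <= quantile; exclusive counts items < quantile.
   */
  static double rank(double[] sorted, double quantile, boolean inclusive) {
    int lo = 0;
    int hi = sorted.length;
    // Binary search for the number of leading items that satisfy the criterion.
    while (lo < hi) {
      int mid = (lo + hi) >>> 1;
      boolean counted = inclusive ? sorted[mid] <= quantile : sorted[mid] < quantile;
      if (counted) {
        lo = mid + 1;
      } else {
        hi = mid;
      }
    }
    return (double) lo / sorted.length;
  }
}
```

For the sorted items {10, 20, 20, 30, 40} and the quantile 20, the inclusive rank is 3/5 and the exclusive rank is 1/5, because the two items equal to 20 are counted only in the inclusive case.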
public double getRankLowerBound(double rank)
getRankLowerBound
in interface QuantilesAPI
rank
- the given normalized rank.
public double getRankUpperBound(double rank)
getRankUpperBound
in interface QuantilesAPI
rank
- the given normalized rank.
public double[] getRanks(double[] quantiles, QuantileSearchCriteria searchCrit)
QuantilesDoublesAPI
getRanks
in interface QuantilesDoublesAPI
quantiles
- the given array of quantiles
searchCrit
- if INCLUSIVE, the given quantiles include the rank directly corresponding to each quantile.
See also: QuantileSearchCriteria
public int getK()
QuantilesAPI
getK
in interface QuantilesAPI
public abstract long getN()
QuantilesAPI
getN
in interface QuantilesAPI
public double getNormalizedRankError(boolean pmf)
QuantilesAPI
getNormalizedRankError
in interface QuantilesAPI
pmf
- if true, returns the "double-sided" normalized rank error for the getPMF() function.
Otherwise, it is the "single-sided" normalized rank error for all the other queries.
public static double getNormalizedRankError(int k, boolean pmf)
k
- the configuration parameter
pmf
- if true, returns the "double-sided" normalized rank error for the getPMF() function.
Otherwise, it is the "single-sided" normalized rank error for all the other queries.
public static int getKFromEpsilon(double epsilon, boolean pmf)
epsilon
- the normalized rank error between zero and one.
pmf
- if true, this function returns k assuming the input epsilon
is the desired "double-sided" epsilon for the getPMF() function. Otherwise, this function
returns k assuming the input epsilon is the desired "single-sided"
epsilon for all the other queries.
public abstract boolean hasMemory()
QuantilesAPI
hasMemory
in interface QuantilesAPI
public abstract boolean isDirect()
QuantilesAPI
isDirect
in interface QuantilesAPI
public boolean isEmpty()
QuantilesAPI
isEmpty
in interface QuantilesAPI
public boolean isEstimationMode()
QuantilesAPI
isEstimationMode
in interface QuantilesAPI
public abstract boolean isReadOnly()
QuantilesAPI
isReadOnly
in interface QuantilesAPI
public boolean isSameResource(org.apache.datasketches.memory.Memory that)
that
- A different non-null object
public byte[] toByteArray()
QuantilesDoublesAPI
toByteArray
in interface QuantilesDoublesAPI
public byte[] toByteArray(boolean compact)
compact
- if true the sketch will be serialized in compact form.
DirectCompactDoublesSketch can wrap() only a compact byte array;
DirectUpdateDoublesSketch can wrap() only an updatable byte array.
public String toString()
toString
in interface QuantilesAPI
toString
in class Object
public String toString(boolean withLevels, boolean withLevelsAndItems)
withLevels
- if true, includes summary information about the sketch levels array
withLevelsAndItems
- if true, includes details of the levels array and the items array together
public static String toString(byte[] byteArr)
byteArr
- the given byte array
public static String toString(org.apache.datasketches.memory.Memory mem)
mem
- the given Memory
public DoublesSketch downSample(DoublesSketch srcSketch, int smallerK, org.apache.datasketches.memory.WritableMemory dstMem)
srcSketch
- the source sketch
smallerK
- the new sketch's K that must be smaller than this K.
It is required that this.getK() = smallerK * 2^(nonnegative integer).
dstMem
- the destination Memory. It must not overlap the Memory of this sketch.
If null, a heap sketch will be returned, otherwise it will be off-heap.
public int getNumRetained()
QuantilesAPI
getNumRetained
in interface QuantilesAPI
public int getCurrentCompactSerializedSizeBytes()
public static int getCompactSerialiedSizeBytes(int k, long n)
k
- the size configuration parameter for the sketch
n
- the number of quantiles input into the sketch
public int getSerializedSizeBytes()
QuantilesDoublesAPI
getSerializedSizeBytes
in interface QuantilesDoublesAPI
public int getCurrentUpdatableSerializedSizeBytes()
public static int getUpdatableStorageBytes(int k, long n)
k
- the size configuration parameter for the sketch
n
- the number of quantiles input into the sketch
public void putMemory(org.apache.datasketches.memory.WritableMemory dstMem)
dstMem
- the given memory.
public void putMemory(org.apache.datasketches.memory.WritableMemory dstMem, boolean compact)
dstMem
- the given memory.
compact
- if true, compacts and sorts the base buffer, which optimizes merge
performance at the cost of slightly increased serialization time.
public QuantilesDoublesSketchIterator iterator()
QuantilesDoublesAPI
iterator
in interface QuantilesDoublesAPI
public abstract void reset()
The parameter k will not change.
reset
in interface QuantilesAPI
public final DoublesSketchSortedView getSortedView()
QuantilesDoublesAPI
getSortedView
in interface QuantilesDoublesAPI
Copyright © 2015–2024 The Apache Software Foundation. All rights reserved.