public abstract class Sketch extends Object implements MemoryStatus
Modifier and Type | Method and Description |
---|---|
CompactSketch | compact(): Converts this sketch to an ordered CompactSketch. |
abstract CompactSketch | compact(boolean dstOrdered, org.apache.datasketches.memory.WritableMemory dstMem): Convert this sketch to a CompactSketch. |
abstract int | getCompactBytes(): Returns the number of storage bytes required for this Sketch if its current state were compacted. |
static int | getCompactSketchMaxBytes(int lgNomEntries): Returns the maximum number of storage bytes required for a CompactSketch given the configured log_base2 of the number of nominal entries, which is a power of 2. |
int | getCountLessThanThetaLong(long thetaLong): Gets the number of hash values less than the given theta expressed as a long. |
abstract int | getCurrentBytes(): Returns the number of storage bytes required for this sketch in its current state. |
abstract double | getEstimate(): Gets the unique count estimate. |
abstract Family | getFamily(): Returns the Family that this sketch belongs to. |
double | getLowerBound(int numStdDev): Gets the approximate lower error bound given the specified number of Standard Deviations. |
static int | getMaxCompactSketchBytes(int numberOfEntries): Returns the maximum number of storage bytes required for a CompactSketch with the given number of actual entries. |
static int | getMaxUpdateSketchBytes(int nomEntries): Returns the maximum number of storage bytes required for an UpdateSketch with the given number of nominal entries (power of 2). |
int | getRetainedEntries(): Returns the number of valid entries that have been retained by the sketch. |
abstract int | getRetainedEntries(boolean valid): Returns the number of entries that have been retained by the sketch. |
static int | getSerializationVersion(org.apache.datasketches.memory.Memory mem): Returns the serialization version from the given Memory. |
double | getTheta(): Gets the value of theta as a double with a value between zero and one. |
abstract long | getThetaLong(): Gets the value of theta as a long. |
double | getUpperBound(int numStdDev): Gets the approximate upper error bound given the specified number of Standard Deviations. |
static Sketch | heapify(org.apache.datasketches.memory.Memory srcMem): Heapify takes the sketch image in Memory and instantiates an on-heap Sketch. |
static Sketch | heapify(org.apache.datasketches.memory.Memory srcMem, long expectedSeed): Heapify takes the sketch image in Memory and instantiates an on-heap Sketch. |
abstract boolean | isCompact(): Returns true if this sketch is in compact form. |
abstract boolean | isEmpty() |
boolean | isEstimationMode(): Returns true if the sketch is in Estimation Mode (as opposed to Exact Mode). |
abstract boolean | isOrdered(): Returns true if the internal cache is ordered. |
abstract HashIterator | iterator(): Returns a HashIterator that can be used to iterate over the retained hash values of the Theta sketch. |
abstract byte[] | toByteArray(): Serialize this sketch to a byte array form. |
String | toString(): Returns a human readable summary of the sketch. |
String | toString(boolean sketchSummary, boolean dataDetail, int width, boolean hexMode): Gets a human readable listing of contents and summary of the given sketch. |
static String | toString(byte[] byteArr): Returns a human readable string of the preamble of a byte array image of a Theta Sketch. |
static String | toString(org.apache.datasketches.memory.Memory mem): Returns a human readable string of the preamble of a Memory image of a Theta Sketch. |
static Sketch | wrap(org.apache.datasketches.memory.Memory srcMem): Wrap takes the sketch image in the given Memory and refers to it directly. |
static Sketch | wrap(org.apache.datasketches.memory.Memory srcMem, long expectedSeed): Wrap takes the sketch image in the given Memory and refers to it directly. |
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface MemoryStatus: hasMemory, isDirect, isSameResource
public static Sketch heapify(org.apache.datasketches.memory.Memory srcMem)
The resulting sketch will not retain any link to the source Memory.
For Update Sketches this method checks if the Default Update Seed was used to create the source Memory image. For Compact Sketches this method assumes that the sketch image was created with the correct hash seed, so it is not checked.
srcMem - an image of a Sketch. See Memory.
public static Sketch heapify(org.apache.datasketches.memory.Memory srcMem, long expectedSeed)
The resulting sketch will not retain any link to the source Memory.
For Update and Compact Sketches this method checks if the given expectedSeed was used to create the source Memory image. However, SerialVersion 1 sketches cannot be checked.
srcMem - an image of a Sketch that was created using the given expectedSeed. See Memory.
expectedSeed - the seed used to validate the given Memory image. See Update Hash Seed. Compact sketches store a 16-bit hash of the seed, but not the seed itself.
public static Sketch wrap(org.apache.datasketches.memory.Memory srcMem)
Only "Direct" Serialization Version 3 (i.e., OpenSource) sketches that have been explicitly stored as direct sketches can be wrapped. Wrapping earlier serial version sketches will result in an on-heap CompactSketch where all data will be copied to the heap. These early versions were never designed to "wrap".
Wrapping any subclass of this class that is empty or contains only a single item will result in on-heap equivalent forms of empty and single item sketch respectively. This is actually faster and consumes less overall memory.
For Update Sketches this method checks if the Default Update Seed was used to create the source Memory image. For Compact Sketches this method assumes that the sketch image was created with the correct hash seed, so it is not checked.
srcMem - an image of a Sketch. See Memory.
public static Sketch wrap(org.apache.datasketches.memory.Memory srcMem, long expectedSeed)
Only "Direct" Serialization Version 3 (i.e., OpenSource) sketches that have been explicitly stored as direct sketches can be wrapped. Wrapping earlier serial version sketches will result in an on-heap CompactSketch where all data will be copied to the heap. These early versions were never designed to "wrap".
Wrapping any subclass of this class that is empty or contains only a single item will result in on-heap equivalent forms of empty and single item sketch respectively. This is actually faster and consumes less overall memory.
For Update and Compact Sketches this method checks if the given expectedSeed was used to create the source Memory image. However, SerialVersion 1 sketches cannot be checked.
srcMem - an image of a Sketch. See Memory.
expectedSeed - the seed used to validate the given Memory image. See Update Hash Seed.
public CompactSketch compact()
If this.isCompact() == true this method returns this; otherwise, this method is equivalent to compact(true, null).
A CompactSketch is always immutable.
public abstract CompactSketch compact(boolean dstOrdered, org.apache.datasketches.memory.WritableMemory dstMem)
If this sketch is a type of UpdateSketch, the compacting process converts the hash table of the UpdateSketch to a simple list of the valid hash values. Any hash values that are zero, or greater than or equal to theta, will be discarded. The number of valid values remaining in the CompactSketch depends on a number of factors, but may be larger or smaller than Nominal Entries (or k). It will never exceed 2k. If it is critical to always limit the size to no more than k, then rebuild() should be called on the UpdateSketch prior to calling this method.
A CompactSketch is always immutable.
A new CompactSketch object is created:
Otherwise, this operation returns this.
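The filtering step described above (drop zero hashes and hashes at or above theta, then optionally order the survivors) can be illustrated with a small standalone sketch. This is not the library's implementation: the class name CompactionSketch and the method compactHashes are hypothetical, and the hash table is assumed to be a plain long[] in which zero marks an empty slot.

```java
import java.util.Arrays;

/** Standalone illustration only; not part of the DataSketches API. */
final class CompactionSketch {

  /**
   * Keeps only the valid hashes, i.e. those strictly greater than zero and
   * strictly less than thetaLong, and sorts them ascending if dstOrdered is true.
   * Assumption: zero marks an empty slot in the source hash table.
   */
  static long[] compactHashes(long[] hashTable, long thetaLong, boolean dstOrdered) {
    long[] valid = Arrays.stream(hashTable)
        .filter(h -> h > 0 && h < thetaLong)
        .toArray();
    if (dstOrdered) {
      Arrays.sort(valid);
    }
    return valid;
  }

  public static void main(String[] args) {
    long[] table = {0L, 42L, 0L, 7L, 100L, 0L, 19L, 250L};
    // With thetaLong = 100, the entries 0, 100 and 250 are discarded.
    long[] compact = compactHashes(table, 100L, true);
    System.out.println(Arrays.toString(compact)); // prints [7, 19, 42]
  }
}
```

The result is a dense list with no empty slots, which is why a compact form needs less storage than the update form it came from.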
dstOrdered - assumed true if this sketch is empty or has only one value. See Destination Ordered.
dstMem - See Destination Memory.
public abstract int getCompactBytes()
See getCurrentBytes().
public int getCountLessThanThetaLong(long thetaLong)
thetaLong - the given theta as a long between zero and Long.MAX_VALUE.
public abstract int getCurrentBytes()
public abstract double getEstimate()
public abstract Family getFamily()
public double getLowerBound(int numStdDev)
numStdDev - See Number of Standard Deviations.
public static int getMaxCompactSketchBytes(int numberOfEntries)
numberOfEntries - the actual number of retained entries stored in the sketch.
public static int getCompactSketchMaxBytes(int lgNomEntries)
lgNomEntries - the log_base2 of Nominal Entries.
public static int getMaxUpdateSketchBytes(int nomEntries)
nomEntries - Nominal Entries. If this value is not a power of 2, it will be rounded up to the next power of 2.
public int getRetainedEntries()
public abstract int getRetainedEntries(boolean valid)
valid - if true, returns the number of valid entries, which are less than theta and used for estimation. Otherwise, returns the number of all entries, valid or not, that are currently in the internal sketch cache.
public static int getSerializationVersion(org.apache.datasketches.memory.Memory mem)
mem - the sketch Memory
public double getTheta()
public abstract long getThetaLong()
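The two representations of theta, a double between zero and one and a long between zero and Long.MAX_VALUE, are related by a simple scaling. The standalone sketch below illustrates that relationship together with the standard theta-sketch estimator (valid retained entries divided by theta). It does not call the library: the class ThetaSketchMath and its methods are hypothetical names, and the scaling by Long.MAX_VALUE is an assumption consistent with the ranges stated on this page.

```java
/** Standalone illustration only; not part of the DataSketches API. */
final class ThetaSketchMath {

  /** Maps thetaLong in [0, Long.MAX_VALUE] to a fraction in [0.0, 1.0]. */
  static double thetaAsDouble(long thetaLong) {
    return thetaLong / (double) Long.MAX_VALUE;
  }

  /**
   * Standard theta-sketch estimator: valid retained entries divided by theta.
   * With theta == 1.0 (exact mode) the estimate equals the retained count.
   */
  static double estimate(int validRetainedEntries, long thetaLong) {
    if (validRetainedEntries == 0) {
      return 0.0; // nothing retained, nothing to scale
    }
    return validRetainedEntries / thetaAsDouble(thetaLong);
  }

  public static void main(String[] args) {
    System.out.println(thetaAsDouble(Long.MAX_VALUE));        // prints 1.0
    System.out.println(estimate(1000, Long.MAX_VALUE / 2));   // prints 2000.0
  }
}
```

In this picture, a sketch that retains 1000 valid hashes while theta has dropped to one half has sampled roughly half of the hash space, so the unique count is estimated at about 2000.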
public double getUpperBound(int numStdDev)
numStdDev - See Number of Standard Deviations.
public abstract boolean isCompact()
public abstract boolean isEmpty()
public boolean isEstimationMode()
public abstract boolean isOrdered()
public abstract HashIterator iterator()
public abstract byte[] toByteArray()
public String toString()
public String toString(boolean sketchSummary, boolean dataDetail, int width, boolean hexMode)
sketchSummary - If true the sketch summary will be output at the end.
dataDetail - If true, includes all valid hash values in the sketch.
width - The number of columns of hash values. Default is 8.
hexMode - If true, hashes will be output in hex.
public static String toString(byte[] byteArr)
byteArr - the given byte array
public static String toString(org.apache.datasketches.memory.Memory mem)
mem - the given Memory object
Copyright © 2015–2024 The Apache Software Foundation. All rights reserved.