public final class BloomFilterBuilder extends Object
This class provides methods to help estimate the correct parameters when creating a Bloom filter, and methods to create the filter using those values.
The underlying math is described in the Wikipedia article on Bloom filters.
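
As a quick reference, the standard results from that article can be summarized as below. Here n stands for maxDistinctItems, p for targetFalsePositiveProb, m for the number of filter bits, and k for the number of hash functions. These are the textbook formulas only; how this class rounds or bounds the results is not stated on this page, so treat that part as an assumption.

$$
\begin{aligned}
p &\approx \left(1 - e^{-kn/m}\right)^{k} && \text{(false positive probability after } n \text{ insertions)} \\
m &= -\frac{n \ln p}{(\ln 2)^{2}} && \text{(optimal number of bits)} \\
k &= \frac{m}{n} \ln 2 = -\log_{2} p && \text{(optimal number of hash functions)}
\end{aligned}
$$
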
Constructor and Description |
---|
BloomFilterBuilder() |
Modifier and Type | Method and Description |
---|---|
static BloomFilter | createByAccuracy(long maxDistinctItems, double targetFalsePositiveProb) - Creates a new BloomFilter with an optimal number of bits and hash functions for the given inputs, using a random base seed for the hash function. |
static BloomFilter | createByAccuracy(long maxDistinctItems, double targetFalsePositiveProb, long seed) - Creates a new BloomFilter with an optimal number of bits and hash functions for the given inputs, using the provided base seed for the hash function. |
static BloomFilter | createBySize(long numBits, int numHashes) - Creates a BloomFilter with the given number of bits and number of hash functions, using a random base seed for the hash function. |
static BloomFilter | createBySize(long numBits, int numHashes, long seed) - Creates a BloomFilter with the given number of bits and number of hash functions, using the provided base seed for the hash function. |
static long | getSerializedFilterSize(long numBits) - Returns the minimum memory size, in bytes, needed for a serialized BloomFilter with the given number of bits. |
static long | getSerializedFilterSizeByAccuracy(long maxDistinctItems, double targetFalsePositiveProb) - Returns the minimum memory size, in bytes, needed for a serialized BloomFilter with an optimal number of bits and hash functions for the given inputs. |
static BloomFilter | initializeByAccuracy(long maxDistinctItems, double targetFalsePositiveProb, long seed, org.apache.datasketches.memory.WritableMemory dstMem) - Creates a new BloomFilter with an optimal number of bits and hash functions for the given inputs, using the provided base seed for the hash function and writing into the provided WritableMemory. |
static BloomFilter | initializeByAccuracy(long maxDistinctItems, double targetFalsePositiveProb, org.apache.datasketches.memory.WritableMemory dstMem) - Creates a new BloomFilter with an optimal number of bits and hash functions for the given inputs, using a random base seed for the hash function and writing into the provided WritableMemory. |
static BloomFilter | initializeBySize(long numBits, int numHashes, long seed, org.apache.datasketches.memory.WritableMemory dstMem) - Initializes a BloomFilter with the given number of bits and number of hash functions, using the provided base seed for the hash function and writing into the provided WritableMemory. |
static BloomFilter | initializeBySize(long numBits, int numHashes, org.apache.datasketches.memory.WritableMemory dstMem) - Initializes a BloomFilter with the given number of bits and number of hash functions, using a random base seed for the hash function and writing into the provided WritableMemory. |
static long | suggestNumFilterBits(long maxDistinctItems, double targetFalsePositiveProb) - Returns the optimal number of bits to use in a Bloom Filter given a target number of distinct items and a target false positive probability. |
static short | suggestNumHashes(double targetFalsePositiveProb) - Returns the optimal number of hash functions to achieve a target false positive probability. |
static short | suggestNumHashes(long maxDistinctItems, long numFilterBits) - Returns the optimal number of hash functions to use given a target number of distinct items and the BloomFilter size in bits. |
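
To make the parameter estimates in the summary above more concrete, here is a minimal, self-contained sketch of the textbook calculations that the suggestNumFilterBits and suggestNumHashes methods are described as performing. The class BloomParamsSketch and its method names are hypothetical and not part of the library; it uses only java.lang.Math, returns int rather than short, and throws IllegalArgumentException on bad input. The library's actual rounding, limits, and exception types may differ.

```java
// Hypothetical helper, NOT part of the DataSketches API. It applies the
// textbook Bloom filter formulas so the arithmetic can be checked by hand.
final class BloomParamsSketch {

  private static final double LN2 = Math.log(2.0);

  // Optimal bit count: m = ceil(-n * ln(p) / (ln 2)^2)
  static long optimalNumBits(final long maxDistinctItems, final double targetFalsePositiveProb) {
    checkItems(maxDistinctItems);
    checkProb(targetFalsePositiveProb);
    return (long) Math.ceil(-maxDistinctItems * Math.log(targetFalsePositiveProb) / (LN2 * LN2));
  }

  // Optimal hash count for a fixed size: k = ceil((m / n) * ln 2), never below 1
  static int optimalNumHashes(final long maxDistinctItems, final long numFilterBits) {
    checkItems(maxDistinctItems);
    if (numFilterBits <= 0) {
      throw new IllegalArgumentException("numFilterBits must be positive");
    }
    final double k = Math.ceil(((double) numFilterBits / maxDistinctItems) * LN2);
    return (int) Math.max(1.0, k);
  }

  // Optimal hash count for a target accuracy: k = ceil(-log2(p))
  static int optimalNumHashes(final double targetFalsePositiveProb) {
    checkProb(targetFalsePositiveProb);
    return (int) Math.ceil(-Math.log(targetFalsePositiveProb) / LN2);
  }

  private static void checkItems(final long n) {
    if (n <= 0) {
      throw new IllegalArgumentException("maxDistinctItems must be positive");
    }
  }

  private static void checkProb(final double p) {
    if (!(p > 0.0 && p < 1.0)) {
      throw new IllegalArgumentException("targetFalsePositiveProb must be in (0, 1)");
    }
  }

  public static void main(final String[] args) {
    final long n = 1_000_000L;  // expected distinct items
    final double p = 0.01;      // target false positive probability
    final long m = optimalNumBits(n, p);
    System.out.println("bits = " + m + ", hashes = " + optimalNumHashes(n, m));
  }
}
```

For one million distinct items and a 1% target false positive probability, these formulas give roughly 9.6 million bits (about 1.2 MB) and 7 hash functions. In real code the createByAccuracy and initializeByAccuracy methods listed above take the same two inputs directly, so a calculation like this is mainly useful for capacity planning.
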
public static short suggestNumHashes(long maxDistinctItems, long numFilterBits)
maxDistinctItems - The maximum expected number of distinct items to add to the filter
numFilterBits - The intended size of the Bloom Filter in bits

public static short suggestNumHashes(double targetFalsePositiveProb)
targetFalsePositiveProb - A desired false positive probability per item

public static long suggestNumFilterBits(long maxDistinctItems, double targetFalsePositiveProb)
maxDistinctItems - The maximum expected number of distinct items to add to the filter
targetFalsePositiveProb - A desired false positive probability per item

public static long getSerializedFilterSizeByAccuracy(long maxDistinctItems, double targetFalsePositiveProb)
maxDistinctItems - The maximum expected number of distinct items to add to the filter
targetFalsePositiveProb - A desired false positive probability per item

public static long getSerializedFilterSize(long numBits)
numBits - The number of bits in the target BloomFilter's bit array.

public static BloomFilter createByAccuracy(long maxDistinctItems, double targetFalsePositiveProb)
maxDistinctItems - The maximum expected number of distinct items to add to the filter
targetFalsePositiveProb - A desired false positive probability per item

public static BloomFilter createByAccuracy(long maxDistinctItems, double targetFalsePositiveProb, long seed)
maxDistinctItems - The maximum expected number of distinct items to add to the filter
targetFalsePositiveProb - A desired false positive probability per item
seed - A base hash seed

public static BloomFilter createBySize(long numBits, int numHashes)
numBits - The size of the BloomFilter, in bits
numHashes - The number of hash functions to apply to items

public static BloomFilter createBySize(long numBits, int numHashes, long seed)
numBits - The size of the BloomFilter, in bits
numHashes - The number of hash functions to apply to items
seed - A base hash seed

public static BloomFilter initializeByAccuracy(long maxDistinctItems, double targetFalsePositiveProb, org.apache.datasketches.memory.WritableMemory dstMem)
maxDistinctItems - The maximum expected number of distinct items to add to the filter
targetFalsePositiveProb - A desired false positive probability per item
dstMem - A WritableMemory to hold the initialized filter

public static BloomFilter initializeByAccuracy(long maxDistinctItems, double targetFalsePositiveProb, long seed, org.apache.datasketches.memory.WritableMemory dstMem)
maxDistinctItems - The maximum expected number of distinct items to add to the filter
targetFalsePositiveProb - A desired false positive probability per item
seed - A base hash seed
dstMem - A WritableMemory to hold the initialized filter

public static BloomFilter initializeBySize(long numBits, int numHashes, org.apache.datasketches.memory.WritableMemory dstMem)
numBits - The size of the BloomFilter, in bits
numHashes - The number of hash functions to apply to items
dstMem - A WritableMemory to hold the initialized filter

public static BloomFilter initializeBySize(long numBits, int numHashes, long seed, org.apache.datasketches.memory.WritableMemory dstMem)
numBits - The size of the BloomFilter, in bits
numHashes - The number of hash functions to apply to items
seed - A base hash seed
dstMem - A WritableMemory to hold the initialized filter

Copyright © 2015–2024 The Apache Software Foundation. All rights reserved.