public abstract class CompressedSizeEstimator extends Object
Modifier and Type | Method and Description |
---|---|
List<CompressedSizeInfoColGroup> | computeCompressedSizeInfos(Collection<int[]> columnLists): Compression size info from a list of specified columns. |
List<CompressedSizeInfoColGroup> | computeCompressedSizeInfos(Collection<int[]> columnLists, int k): Multi-threaded version of extracting compression size info from a list of specified columns. |
CompressedSizeInfo | computeCompressedSizeInfos(int k): Multi-threaded version of extracting compression size info. |
CompressedSizeInfoColGroup | estimateCompressedColGroupSize(): Method used for compressing into one type of colGroup. |
EstimationFactors | estimateCompressedColGroupSize(ABitmap ubm, int[] colIndexes): Method used to extract the CompressedSizeEstimationFactors from a constructed UncompressedBitmap. |
static EstimationFactors | estimateCompressedColGroupSize(ABitmap ubm, int[] colIndexes, int nrRows, CompressionSettings cs) |
CompressedSizeInfoColGroup | estimateCompressedColGroupSize(int[] colIndexes): Method for extracting compressed size info of the specified columns, together in a single ColGroup. |
abstract CompressedSizeInfoColGroup | estimateCompressedColGroupSize(int[] colIndexes, int estimate, int nrUniqueUpperBound): A method to extract the compressed size info for a given list of columns. This method further limits the estimated number of unique values, since in some cases the estimated number of uniques is higher than the number estimated in sub groups of the given colIndexes. |
CompressedSizeInfoColGroup | estimateJoinCompressedSize(CompressedSizeInfoColGroup g1, CompressedSizeInfoColGroup g2): Join two analyzed column groups together. |
CompressedSizeInfoColGroup | estimateJoinCompressedSize(int[] joined, CompressedSizeInfoColGroup g1, CompressedSizeInfoColGroup g2): Join two analyzed column groups together. |
MatrixBlock | getData() |
int | getNumColumns() |
int | getNumRows() |
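
The overloads that take k are described as multi-threaded versions of the same extraction. The following is a minimal sketch of that general pattern using only java.util.concurrent, assuming each column list can be estimated independently. ParallelEstimateSketch, estimateAll, and the placeholder estimateOne (which only returns the number of columns in a list) are hypothetical stand-ins and not part of SystemDS; how the library actually schedules the work is not stated here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelEstimateSketch {
    // Hypothetical placeholder for a per-group estimate: returns the group width.
    static int estimateOne(int[] colIndexes) {
        return colIndexes.length;
    }

    // Submits one task per column list to a fixed pool of k threads.
    // Results are collected in the iteration order of columnLists.
    static List<Integer> estimateAll(Collection<int[]> columnLists, int k) {
        ExecutorService pool = Executors.newFixedThreadPool(k);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int[] cols : columnLists) {
                Callable<Integer> task = () -> estimateOne(cols);
                futures.add(pool.submit(task));
            }
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> f : futures) {
                try {
                    results.add(f.get()); // blocks until this task is done
                } catch (InterruptedException | ExecutionException e) {
                    throw new RuntimeException(e);
                }
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<int[]> columnLists =
            Arrays.asList(new int[] {0, 1}, new int[] {2}, new int[] {3, 4, 5});
        System.out.println(estimateAll(columnLists, 2)); // prints [2, 1, 3]
    }
}
```

Collecting the futures in submission order keeps the output aligned with the input column lists regardless of which thread finishes first.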
public int getNumRows()
public int getNumColumns()
public MatrixBlock getData()
public CompressedSizeInfo computeCompressedSizeInfos(int k)
k - The concurrency degree.
public List<CompressedSizeInfoColGroup> computeCompressedSizeInfos(Collection<int[]> columnLists, int k)
columnLists - The specified columns to extract.
k - The parallelization degree.
public List<CompressedSizeInfoColGroup> computeCompressedSizeInfos(Collection<int[]> columnLists)
columnLists - The specified columns to extract.
public CompressedSizeInfoColGroup estimateCompressedColGroupSize()
public CompressedSizeInfoColGroup estimateCompressedColGroupSize(int[] colIndexes)
colIndexes - The columns to group together inside a ColGroup.
public abstract CompressedSizeInfoColGroup estimateCompressedColGroupSize(int[] colIndexes, int estimate, int nrUniqueUpperBound)
colIndexes - The columns to extract compression information from.
estimate - An estimate of the number of unique elements in these columns.
nrUniqueUpperBound - The upper bound on the number of unique elements allowed in the estimate. It can be calculated by multiplying together the numbers of unique elements estimated in the sub columns. The bound is flexible: if the sample is small, it can be adjusted manually, as in CoCodeCostMatrixMult.
public final CompressedSizeInfoColGroup estimateJoinCompressedSize(CompressedSizeInfoColGroup g1, CompressedSizeInfoColGroup g2)
g1 - First group.
g2 - Second group.
public CompressedSizeInfoColGroup estimateJoinCompressedSize(int[] joined, CompressedSizeInfoColGroup g1, CompressedSizeInfoColGroup g2)
joined - The joined column indexes.
g1 - First group.
g2 - Second group.
public EstimationFactors estimateCompressedColGroupSize(ABitmap ubm, int[] colIndexes)
ubm - The UncompressedBitmap, either extracted from a sample or from the entire dataset.
colIndexes - The columns that are compressed together.
public static EstimationFactors estimateCompressedColGroupSize(ABitmap ubm, int[] colIndexes, int nrRows, CompressionSettings cs)
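
The nrUniqueUpperBound parameter and the joined argument of estimateJoinCompressedSize can be illustrated together. The following is a minimal sketch, assuming both groups hold sorted, disjoint column indexes and that the product of the sub group unique counts is additionally capped at the number of rows (the cap is an assumption, not something stated above). JoinBoundSketch, joinIndexes, uniqueUpperBound, and limitEstimate are hypothetical names, not SystemDS API.

```java
import java.util.Arrays;

public class JoinBoundSketch {
    // Merges two sorted, disjoint column index arrays into one sorted array.
    static int[] joinIndexes(int[] a, int[] b) {
        int[] joined = new int[a.length + b.length];
        int i = 0, j = 0, o = 0;
        while (i < a.length && j < b.length) {
            joined[o++] = a[i] < b[j] ? a[i++] : b[j++];
        }
        while (i < a.length) joined[o++] = a[i++];
        while (j < b.length) joined[o++] = b[j++];
        return joined;
    }

    // Product of the unique counts of the two sub groups.
    // Capping at nrRows is an assumption made for this sketch.
    static int uniqueUpperBound(int uniqueA, int uniqueB, int nrRows) {
        long product = (long) uniqueA * uniqueB; // long avoids int overflow
        return (int) Math.min(product, nrRows);
    }

    // Limits an estimated unique count to the given upper bound.
    static int limitEstimate(int estimate, int nrUniqueUpperBound) {
        return Math.min(estimate, nrUniqueUpperBound);
    }

    public static void main(String[] args) {
        int[] joined = joinIndexes(new int[] {0, 3}, new int[] {1, 2});
        System.out.println(Arrays.toString(joined)); // prints [0, 1, 2, 3]

        int bound = uniqueUpperBound(10, 20, 1000);
        System.out.println(bound); // prints 200

        // An estimate of 350 exceeds the bound of 200 and is limited to it.
        System.out.println(limitEstimate(350, bound)); // prints 200
    }
}
```

The bound follows from counting: two groups with 10 and 20 distinct values can produce at most 200 distinct value combinations when joined, so a sample-based estimate above that number is known to be too high.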
Copyright © 2022 The Apache Software Foundation. All rights reserved.