![]() |
STAG Python
2.0.2
Spectral Toolkit of Algorithms for Graphs
|
A CKNS Gaussian KDE data structure.
This data structure implements the CKNS algorithm for kernel density estimation. Given data \(x_1, \ldots, x_n \in \mathbb{R}^d\), in matrix format, this data structure will preprocess the data in \(O(\epsilon^{-2} n^{1.25})\) time, such that for any query point, a \((1 + \epsilon)\)-approximate kernel density estimate can be returned in \(O(\epsilon^{-2} n^{0.25})\) time.
Public Member Functions | |
def | __init__ (self, stag.utility.DenseMat data, float a, float eps=1, float min_mu=None, int k1=None, float k2_constant=None, int sampling_offset=0) |
Initialise a new KDE data structure with the given dataset. | |
Union[float, np.ndarray] | query (self, Union[stag.utility.DenseMat, stag.data.DataPoint] q) |
Calculate the KDE estimate for the given query points. | |
Public Attributes | |
internal_ckns | |
def stag.kde.CKNSGaussianKDE.__init__ | ( | self, | |
stag.utility.DenseMat | data, | ||
float | a, | ||
float | eps = 1 , |
||
float | min_mu = None , |
||
int | k1 = None , |
||
float | k2_constant = None , |
||
int | sampling_offset = 0 |
||
) |
Initialise a new KDE data structure with the given dataset.
Data should be a stag.utility.DenseMat matrix \(X \in \mathbb{R}^{n \times d}\) where each row represents a data point.
The \(\epsilon\) parameter is used to control the error guarantee of the CKNS data structure. A lower value of \(\epsilon\) will give a more accurate estimate, at the cost of a higher processing time. The data structure should produce estimates which are within a \((1 \pm \epsilon)\) factor of the true kernel density.
The initialisation time complexity of the data structure is \(O(\epsilon^{-2} n^{1.25} \log^2(n))\) and the query time for each query point is \((O(\epsilon^{-2} n^{0.25} \log^2(n))\).
Usually, the data structure can be initialised with only the dataset and the parameter \(a\). If more control is needed, the eps and min_mu parameters offer a trade-off between running time and accuracy.
For those familiar with the inner workings of the CKNS algorithm, the k1, k2_constant, and sampling_offset optional parameters offer even more fine-grained control over the performance of the algorithm.
data | the \((n \times d)\) matrix containing the dataset. |
a | the parameter \(a\) of the Gaussian kernel function. |
eps | (optional) the error parameter \(\epsilon\) of the KDE data structure. Default is 1 (in practice this normally gives a good result). |
min_mu | (optional) the minimum kernel density value of any query point. A smaller number will give longer preprocessing and query time complexity. If a query point has a kernel density smaller than this value, then the data structure may not return the correct result. Default is 1 / n. |
k1 | (optional) the number of copies of the data structure to create in parallel. This parameter controls the variance of the estimator returned by the algorithm. Default is \(\epsilon^{-2} \cdot \log(n)\). |
k2_constant | (optional) controls the collision probability of each of the E2LSH hash tables used within the data structure. A higher value will give more accurate estimates at the cost of higher memory and time complexity. Default is \(0.1 \log(n)\). |
sampling_offset | (optional) the CKNS algorithm samples the dataset with various sampling probabilities. Setting a sampling offset of \(k\) will further subsample the data by a factor of \(1/2^k\). This will speed up the algorithm at the cost of some accuracy. Default is \(0\). |
Union[float, np.ndarray] stag.kde.CKNSGaussianKDE.query | ( | self, | |
Union[stag.utility.DenseMat, stag.data.DataPoint] | q | ||
) |
Calculate the KDE estimate for the given query points.
The parameter q can be either a stag.data.DataPoint object to query one data point, or a stag.utility.DenseMat matrix with the query points as rows in order to query many data points.
When querying many data points, passing a DenseMat will be more efficient than calling this method once for each data point.
q | the query data point(s) |
stag.kde.CKNSGaussianKDE.internal_ckns |