STAG Python  2.0.2
Spectral Toolkit of Algorithms for Graphs
Loading...
Searching...
No Matches
stag.kde.CKNSGaussianKDE Class Reference

Description

A CKNS Gaussian KDE data structure.

This data structure implements the CKNS algorithm for kernel density estimation. Given data \(x_1, \ldots, x_n \in \mathbb{R}^d\), in matrix format, this data structure will preprocess the data in \(O(\epsilon^{-2} n^{1.25})\) time, such that for any query point, a \((1 + \epsilon)\)-approximate kernel density estimate can be returned in \(O(\epsilon^{-2} n^{0.25})\) time.

References
Charikar, Moses, et al. "Kernel density estimation through density constrained near neighbor search." 2020 IEEE 61st Annual Symposium on Foundations of Computer Science (FOCS). IEEE, 2020.

Public Member Functions

def __init__ (self, stag.utility.DenseMat data, float a, float eps=1, float min_mu=None, int k1=None, float k2_constant=None, int sampling_offset=0)
 Initialise a new KDE data structure with the given dataset.
 
Union[float, np.ndarray] query (self, Union[stag.utility.DenseMat, stag.data.DataPoint] q)
 Calculate the KDE estimate for the given query points.
 

Public Attributes

 internal_ckns
 

Constructor & Destructor Documentation

◆ __init__()

def stag.kde.CKNSGaussianKDE.__init__ (   self,
stag.utility.DenseMat  data,
float  a,
float   eps = 1,
float   min_mu = None,
int   k1 = None,
float   k2_constant = None,
int   sampling_offset = 0 
)

Initialise a new KDE data structure with the given dataset.

Data should be a stag.utility.DenseMat matrix \(X \in \mathbb{R}^{n \times d}\) where each row represents a data point.

The \(\epsilon\) parameter is used to control the error guarantee of the CKNS data structure. A lower value of \(\epsilon\) will give a more accurate estimate, at the cost of a higher processing time. The data structure should produce estimates which are within a \((1 \pm \epsilon)\) factor of the true kernel density.

The initialisation time complexity of the data structure is \(O(\epsilon^{-2} n^{1.25} \log^2(n))\) and the query time for each query point is \((O(\epsilon^{-2} n^{0.25} \log^2(n))\).

Usually, the data structure can be initialised with only the dataset and the parameter \(a\). If more control is needed, the eps and min_mu parameters offer a trade-off between running time and accuracy.

For those familiar with the inner workings of the CKNS algorithm, the k1, k2_constant, and sampling_offset optional parameters offer even more fine-grained control over the performance of the algorithm.

Parameters
datathe \((n \times d)\) matrix containing the dataset.
athe parameter \(a\) of the Gaussian kernel function.
eps(optional) the error parameter \(\epsilon\) of the KDE data structure. Default is 1 (in practice this normally gives a good result).
min_mu(optional) the minimum kernel density value of any query point. A smaller number will give longer preprocessing and query time complexity. If a query point has a kernel density smaller than this value, then the data structure may not return the correct result. Default is 1 / n.
k1(optional) the number of copies of the data structure to create in parallel. This parameter controls the variance of the estimator returned by the algorithm. Default is \(\epsilon^{-2} \cdot \log(n)\).
k2_constant(optional) controls the collision probability of each of the E2LSH hash tables used within the data structure. A higher value will give more accurate estimates at the cost of higher memory and time complexity. Default is \(0.1 \log(n)\).
sampling_offset(optional) the CKNS algorithm samples the dataset with various sampling probabilities. Setting a sampling offset of \(k\) will further subsample the data by a factor of \(1/2^k\). This will speed up the algorithm at the cost of some accuracy. Default is \(0\).

Member Function Documentation

◆ query()

Union[float, np.ndarray] stag.kde.CKNSGaussianKDE.query (   self,
Union[stag.utility.DenseMat, stag.data.DataPoint q 
)

Calculate the KDE estimate for the given query points.

The parameter q can be either a stag.data.DataPoint object to query one data point, or a stag.utility.DenseMat matrix with the query points as rows in order to query many data points.

When querying many data points, passing a DenseMat will be more efficient than calling this method once for each data point.

Parameters
qthe query data point(s)
Returns
the KDE estimate(s) for the given query point(s), either as a float (for one data point) or a numpy array.

Member Data Documentation

◆ internal_ckns

stag.kde.CKNSGaussianKDE.internal_ckns