STAG C++  2.0.0
Spectral Toolkit of Algorithms for Graphs
Loading...
Searching...
No Matches
stag::CKNSGaussianKDE Class Reference

#include <kde.h>

Description

A CKNS Gaussian KDE data structure.

This data structure implements the CKNS algorithm for kernel density estimation. Given data \(x_1, \ldots, x_n \in \mathbb{R}^d\), in matrix format, this data structure will preprocess the data in \(O(\epsilon^{-2} n^{1.25})\) time, such that for any query point, a \((1 + \epsilon)\)-approximate kernel density estimate can be returned in \(O(\epsilon^{-2} n^{0.25})\) time.

References
Charikar, Moses, et al. "Kernel density estimation through density constrained near neighbor search." 2020 IEEE 61st Annual Symposium on Foundations of Computer Science (FOCS). IEEE, 2020.

Public Member Functions

 CKNSGaussianKDE (DenseMat *data, StagReal a, StagReal eps, StagReal min_mu)
 
 CKNSGaussianKDE (DenseMat *data, StagReal a, StagReal eps)
 
 CKNSGaussianKDE (DenseMat *data, StagReal a)
 
 CKNSGaussianKDE (DenseMat *data, StagReal a, StagReal min_mu, StagInt K1, StagReal K2_constant, StagInt sampling_offset)
 
 ExactGaussianKDE (DenseMat *data, StagReal a)
 
std::vector< StagRealquery (DenseMat *q)
 
StagReal query (const stag::DataPoint &q)
 

Constructor & Destructor Documentation

◆ CKNSGaussianKDE() [1/4]

stag::CKNSGaussianKDE::CKNSGaussianKDE ( DenseMat data,
StagReal  a,
StagReal  eps,
StagReal  min_mu 
)

Initialise a new KDE data structure with the given dataset.

Data should be a pointer to a matrix \(X \in \mathbb{R}^{n \times d}\) where each row represents a data point.

The \(\epsilon\) parameter is used to control the error guarantee of the CKNS data structure. A lower value of \(\epsilon\) will give a more accurate estimate, at the cost of a higher processing time. The data structure should produce estimates which are within a \((1 \pm \epsilon)\) factor of the true kernel density.

The initialisation time complexity of the data structure is \(O(\epsilon^{-2} n^{1.25} \log^2(n))\) and the query time for each query point is \((O(\epsilon^{-2} n^{0.25} \log^2(n))\).

Parameters
datapointer to an \((n \times d)\) matrix containing the dataset.
athe parameter \(a\) of the Gaussian kernel function.
eps(optional) the error parameter \(\epsilon\) of the KDE data structure. Default is 0.5.
min_mu(optional) the minimum kernel density value of any query point. A smaller number will give longer preprocessing and query time complexity. If a query point has a kernel density smaller than this value, then the data structure may not return the correct result. Default is 1 / n.

◆ CKNSGaussianKDE() [2/4]

stag::CKNSGaussianKDE::CKNSGaussianKDE ( DenseMat data,
StagReal  a,
StagReal  eps 
)

This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.

◆ CKNSGaussianKDE() [3/4]

stag::CKNSGaussianKDE::CKNSGaussianKDE ( DenseMat data,
StagReal  a 
)

This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.

◆ CKNSGaussianKDE() [4/4]

stag::CKNSGaussianKDE::CKNSGaussianKDE ( DenseMat data,
StagReal  a,
StagReal  min_mu,
StagInt  K1,
StagReal  K2_constant,
StagInt  sampling_offset 
)

Initialise a new KDE data structure with the given dataset.

Data should be a pointer to a matrix \(X \in \mathbb{R}^{n \times d}\) where each row represents a data point.

This constructor gives fine-grained control over the constants used by the data structure to control the accuracy and variance of the estimator. For casual application, the simpler constructors will be sufficient, and will select sensible default values for the constants.

Parameters
datapointer to an \((n \times d)\) matrix containing the dataset.
athe parameter \(a\) of the Gaussian kernel function.
min_muthe minimum kernel density value of any query point. A smaller number will give longer preprocessing and query time complexity. If a query point has a kernel density smaller than this value, then the data structure may not return the correct result.
K1the number of copies of the data structure to create in parallel. This parameter controls the variance of the estimator returned by the algorithm. Is is usually set to \(\epsilon^{-2} \cdot \log(n)\).
K2_constantcontrols the collision probability of each of the E2LSH hash tables used within the data structure. A higher value will give more accurate estimates at the cost of higher memory and time complexity. It is usually set to \(0.1 \log(n)\).
sampling_offsetthe CKNS algorithm samples the dataset with various sampling probabilities. Setting a sampling offset of \(k\) will further subsample the data by a factor of \(1/2^k\). This will speed up the algorithm at the cost of some accuracy. It is usually set to \(0\).

Member Function Documentation

◆ ExactGaussianKDE()

stag::CKNSGaussianKDE::ExactGaussianKDE ( DenseMat data,
StagReal  a 
)

Initialise the data structure with the given dataset and Gaussian kernel parameter \(a\).

The initialisation time for this data structure is \(O(1)\).

Parameters
data
a

◆ query() [1/2]

std::vector< StagReal > stag::CKNSGaussianKDE::query ( DenseMat q)

Calculate the KDE value for each of the data points in the query matrix.

The query time complexity for each query point is \(O(n)\).

Parameters
q
Returns
a vector of KDE values.

◆ query() [2/2]

StagReal stag::CKNSGaussianKDE::query ( const stag::DataPoint q)

Calculate the exact kernel density for the given query point.

Note
If you would like to obtain a kernel density for many query points, you should use the other query method to pass them all simultaneously.
Parameters
qthe query data point
Returns
the kernel density for the given query point