![]() |
STAG C++
2.0.0
Spectral Toolkit of Algorithms for Graphs
|
#include <kde.h>
A CKNS Gaussian KDE data structure.
This data structure implements the CKNS algorithm for kernel density estimation. Given data \(x_1, \ldots, x_n \in \mathbb{R}^d\), in matrix format, this data structure will preprocess the data in \(O(\epsilon^{-2} n^{1.25})\) time, such that for any query point, a \((1 + \epsilon)\)-approximate kernel density estimate can be returned in \(O(\epsilon^{-2} n^{0.25})\) time.
Public Member Functions | |
CKNSGaussianKDE (DenseMat *data, StagReal a, StagReal eps, StagReal min_mu) | |
CKNSGaussianKDE (DenseMat *data, StagReal a, StagReal eps) | |
CKNSGaussianKDE (DenseMat *data, StagReal a) | |
CKNSGaussianKDE (DenseMat *data, StagReal a, StagReal min_mu, StagInt K1, StagReal K2_constant, StagInt sampling_offset) | |
ExactGaussianKDE (DenseMat *data, StagReal a) | |
std::vector< StagReal > | query (DenseMat *q) |
StagReal | query (const stag::DataPoint &q) |
stag::CKNSGaussianKDE::CKNSGaussianKDE | ( | DenseMat * | data, |
StagReal | a, | ||
StagReal | eps, | ||
StagReal | min_mu | ||
) |
Initialise a new KDE data structure with the given dataset.
Data should be a pointer to a matrix \(X \in \mathbb{R}^{n \times d}\) where each row represents a data point.
The \(\epsilon\) parameter is used to control the error guarantee of the CKNS data structure. A lower value of \(\epsilon\) will give a more accurate estimate, at the cost of a higher processing time. The data structure should produce estimates which are within a \((1 \pm \epsilon)\) factor of the true kernel density.
The initialisation time complexity of the data structure is \(O(\epsilon^{-2} n^{1.25} \log^2(n))\) and the query time for each query point is \((O(\epsilon^{-2} n^{0.25} \log^2(n))\).
data | pointer to an \((n \times d)\) matrix containing the dataset. |
a | the parameter \(a\) of the Gaussian kernel function. |
eps | (optional) the error parameter \(\epsilon\) of the KDE data structure. Default is 0.5. |
min_mu | (optional) the minimum kernel density value of any query point. A smaller number will give longer preprocessing and query time complexity. If a query point has a kernel density smaller than this value, then the data structure may not return the correct result. Default is 1 / n. |
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.
stag::CKNSGaussianKDE::CKNSGaussianKDE | ( | DenseMat * | data, |
StagReal | a, | ||
StagReal | min_mu, | ||
StagInt | K1, | ||
StagReal | K2_constant, | ||
StagInt | sampling_offset | ||
) |
Initialise a new KDE data structure with the given dataset.
Data should be a pointer to a matrix \(X \in \mathbb{R}^{n \times d}\) where each row represents a data point.
This constructor gives fine-grained control over the constants used by the data structure to control the accuracy and variance of the estimator. For casual application, the simpler constructors will be sufficient, and will select sensible default values for the constants.
data | pointer to an \((n \times d)\) matrix containing the dataset. |
a | the parameter \(a\) of the Gaussian kernel function. |
min_mu | the minimum kernel density value of any query point. A smaller number will give longer preprocessing and query time complexity. If a query point has a kernel density smaller than this value, then the data structure may not return the correct result. |
K1 | the number of copies of the data structure to create in parallel. This parameter controls the variance of the estimator returned by the algorithm. Is is usually set to \(\epsilon^{-2} \cdot \log(n)\). |
K2_constant | controls the collision probability of each of the E2LSH hash tables used within the data structure. A higher value will give more accurate estimates at the cost of higher memory and time complexity. It is usually set to \(0.1 \log(n)\). |
sampling_offset | the CKNS algorithm samples the dataset with various sampling probabilities. Setting a sampling offset of \(k\) will further subsample the data by a factor of \(1/2^k\). This will speed up the algorithm at the cost of some accuracy. It is usually set to \(0\). |
Initialise the data structure with the given dataset and Gaussian kernel parameter \(a\).
The initialisation time for this data structure is \(O(1)\).
data | |
a |
Calculate the KDE value for each of the data points in the query matrix.
The query time complexity for each query point is \(O(n)\).
q |
StagReal stag::CKNSGaussianKDE::query | ( | const stag::DataPoint & | q | ) |
Calculate the exact kernel density for the given query point.
q | the query data point |