![]() |
STAG C++
2.0.0
Spectral Toolkit of Algorithms for Graphs
|
#include <lsh.h>
A Euclidean locality-sensitive hash function.
A function drawn at random from the standard family of Euclidean locality-sensitive hash functions. This function is defined by a random vector \(a \in \mathbb{R}^d\) drawn from a Gaussian distribution, and a random offset \(b \in [0, 4]\) drawn uniformly at random. Applying the function to some data point \(x \in \mathbb{R}^d\) is equivalent to computing
\[ h = \left\lfloor \frac{\langle a, x \rangle + b}{4} \right\rfloor. \]
Given two data points \(x_1\) and \(x_2\), the probability that they are hashed to the same value is given by
\[ p\left(\|x_1 - x_2\|_2\right) = \int_0^4 \frac{1}{\|x_1 - x_2\|_2} f\left(\frac{t}{\|x_1 - x_2\|_2}\right)\left(1 - \frac{t}{4}\right) \mathrm{dt}, \]
where \(f(\cdot)\) is the probability density function of the Gaussian distribution. The stag::LSHFunction::collision_probability function computes this value.
Typical STAG users will use the stag::E2LSH hash table rather than these the LSHFunction class directly.
Public Member Functions | |
LSHFunction (StagUInt dimension) | |
StagInt | apply (const DataPoint &point) |
Static Public Member Functions | |
static StagReal | collision_probability (StagReal distance) |
|
explicit |
Initialise a random LSH function with the given dimension.
dimension | the dimensionality of the data |
Apply this hash function to the given data point.
point | a data point to be hashed |
For two points at a given distance \(c\), compute the probability that they will collide in a random Euclidean LSH function.
This probability is given by
\[ p(c) = \int_0^4 \frac{1}{c} f\left(\frac{t}{c}\right)\left(1 - \frac{t}{4}\right) \mathrm{dt}, \]
where \(f(\cdot)\) is the probability density function of the Gaussian distribution. This is equivalent to
\[ p(c)=-\frac{1}{2\sqrt{2\pi}}\left( c e^{-\frac{8}{c^2}} \right) \left( e^{\frac{8}{c^2}} -1 \right) + \mathrm{erf}\left(\frac{2\sqrt{2}}{c}\right), \]
where \(\mathrm{erf}(\cdot)\) is the error function.
distance | the distance \(c\). |