STAG C++  2.0.0
Spectral Toolkit of Algorithms for Graphs
Loading...
Searching...
No Matches
stag::LSHFunction Class Reference

#include <lsh.h>

Description

A Euclidean locality-sensitive hash function.

A function drawn at random from the standard family of Euclidean locality-sensitive hash functions. This function is defined by a random vector \(a \in \mathbb{R}^d\) drawn from a Gaussian distribution, and a random offset \(b \in [0, 4]\) drawn uniformly at random. Applying the function to some data point \(x \in \mathbb{R}^d\) is equivalent to computing

\[ h = \left\lfloor \frac{\langle a, x \rangle + b}{4} \right\rfloor. \]

Given two data points \(x_1\) and \(x_2\), the probability that they are hashed to the same value is given by

\[ p\left(\|x_1 - x_2\|_2\right) = \int_0^4 \frac{1}{\|x_1 - x_2\|_2} f\left(\frac{t}{\|x_1 - x_2\|_2}\right)\left(1 - \frac{t}{4}\right) \mathrm{dt}, \]

where \(f(\cdot)\) is the probability density function of the Gaussian distribution. The stag::LSHFunction::collision_probability function computes this value.

Typical STAG users will use the stag::E2LSH hash table rather than these the LSHFunction class directly.

Public Member Functions

 LSHFunction (StagUInt dimension)
 
StagInt apply (const DataPoint &point)
 

Static Public Member Functions

static StagReal collision_probability (StagReal distance)
 

Constructor & Destructor Documentation

◆ LSHFunction()

stag::LSHFunction::LSHFunction ( StagUInt  dimension)
explicit

Initialise a random LSH function with the given dimension.

Parameters
dimensionthe dimensionality of the data

Member Function Documentation

◆ apply()

StagInt stag::LSHFunction::apply ( const DataPoint point)

Apply this hash function to the given data point.

Parameters
pointa data point to be hashed
Returns
an integer indicating which hash bucket this data point was hashed to

◆ collision_probability()

static StagReal stag::LSHFunction::collision_probability ( StagReal  distance)
static

For two points at a given distance \(c\), compute the probability that they will collide in a random Euclidean LSH function.

This probability is given by

\[ p(c) = \int_0^4 \frac{1}{c} f\left(\frac{t}{c}\right)\left(1 - \frac{t}{4}\right) \mathrm{dt}, \]

where \(f(\cdot)\) is the probability density function of the Gaussian distribution. This is equivalent to

\[ p(c)=-\frac{1}{2\sqrt{2\pi}}\left( c e^{-\frac{8}{c^2}} \right) \left( e^{\frac{8}{c^2}} -1 \right) + \mathrm{erf}\left(\frac{2\sqrt{2}}{c}\right), \]

where \(\mathrm{erf}(\cdot)\) is the error function.

Parameters
distancethe distance \(c\).
Returns
the collision probability of two points at distance \(c\).