STAG Python  2.0.2
Spectral Toolkit of Algorithms for Graphs
Loading...
Searching...
No Matches
stag.lsh.LSHFunction Class Reference

Description

A Euclidean locality-sensitive hash function.

A function drawn at random from the standard family of Euclidean locality-sensitive hhash functions. This function is defined by a random vector \(a \in \mathbb{R}^d\) drawn from a Gaussian distribution, and a random offset \(b \in [0, 4]\) drawn uniformly at random. Applying the function to some data point \(x \in \mathbb{R}^d\) is equivalent to computing

\[ h = \left\lfloor \frac{\langle a, x \rangle + b}{4} \right\rfloor. \]

Given two data points \(x_1\) and \(x_2\), the probability that they are hashed to the same value is given by

\[ p\left(\|x_1 - x_2\|_2\right) = \int_0^4 \frac{1}{\|x_1 - x_2\|_2} f\left(\frac{t}{\|x_1 - x_2\|_2}\right)\left(1 - \frac{t}{4}\right) \mathrm{dt}, \]

where \(f(\cdot)\) is the probability density function of the Gaussian distribution. The stag.lsh.LSHFunction.collision_probability function computes this value.

Typical STAG users will use the E2LSH hash table rather than these the LSHFunction class directly.

Public Member Functions

def __init__ (self, int dimension)
 Initialise a random LSH function with the given dimension.
 
int apply (self, Union[np.ndarray, data.DataPoint] point)
 Apply this hash function to the given data point.
 

Static Public Member Functions

float collision_probability (float distance)
 For two points at a given distance \(c\), compute the probability that they will collide in a random Euclidean LSH function.
 

Constructor & Destructor Documentation

◆ __init__()

def stag.lsh.LSHFunction.__init__ (   self,
int  dimension 
)

Initialise a random LSH function with the given dimension.

Parameters
dimensionthe dimensionality of the data

Member Function Documentation

◆ apply()

int stag.lsh.LSHFunction.apply (   self,
Union[np.ndarray, data.DataPoint point 
)

Apply this hash function to the given data point.

:param point: a numpy array or stag.data.DataPoint :return: the hashed value of this point

◆ collision_probability()

float stag.lsh.LSHFunction.collision_probability ( float  distance)
static

For two points at a given distance \(c\), compute the probability that they will collide in a random Euclidean LSH function.

This probability is given by

\[ p(c) = \int_0^4 \frac{1}{c} f\left(\frac{t}{c}\right)\left(1 - \frac{t}{4}\right) \mathrm{dt}, \]

where \(f(\cdot)\) is the probability density function of the Gaussian distribution. This is equivalent to

\[ p(c)=-\frac{1}{2\sqrt{2\pi}}\left( c e^{-\frac{8}{c^2}} \right) \left( e^{\frac{8}{c^2}} -1 \right) + \mathrm{erf}\left(\frac{2\sqrt{2}}{c}\right), \]

where \(\mathrm{erf}(\cdot)\) is the error function.

Parameters
distancethe distance \(c\).
Returns
the collision probability of two points at distance \(c\).