![]() |
STAG C++
2.0.0
Spectral Toolkit of Algorithms for Graphs
|
#include <lsh.h>
A Euclidean locality sensitive hash table.
The E2LSH hash table is constructed with some set of data points, which are hashed with several copies of the stag::LSHFunction.
Then, for any query point, the data structure returns the points in the original dataset which are close to the query point. The probability that a given point \(x\) in the data set is returned for query \(q\) is dependent on the distance between \(q\) and \(x\).
The E2LSH hash table takes two parameters, K and L, which control the probability that two points will collide in the hash table. For query point \(q\), a data point \(x\) at distance \(c \in \mathbb{R}\) from \(q\) is returned with probability
\[ 1 - (1 - p(c)^K)^L, \]
where \(p(c)\) is the probability that a single stag::LSHFunction will hash \(q\) and \(x\) to the same value. This probability can be computed with the stag::E2LSH::collision_probability method.
Larger values of K and L will increase both the construction and query time of the hash table.
Public Member Functions | |
E2LSH (StagUInt K, StagUInt L, std::vector< DataPoint > &dataSet) | |
std::vector< DataPoint > | get_near_neighbors (const DataPoint &query) |
StagReal | collision_probability (StagReal distance) |
Static Public Member Functions | |
static StagReal | collision_probability (StagUInt K, StagUInt L, StagReal distance) |
Initialise the E2LSH hash table.
K | parameter K of the hash table |
L | parameter L of the hash table |
dataSet | a pointer to the dataSet to be hashed into the hash table. The actual data should be stored and controlled by the calling code, and this vector of data point pointers will be used by the LSH table. |
Query the LSH table to find the near neighbors of a given query point.
Each point in the dataset will be returned with some probability dependent on the distance to the query point and the parameters K and L.
query | the data point to be queried. |
Compute the probability that a data point at a given distance from a query point will be returned by this hash table.
distance | the distance between a query point and data point |
|
static |
For a given value of K and L, return the probability that a point at a given distance from a query point will be returned by an E2LSH table with the given parameters.
K | the parameter K of the ELSH table |
L | the parameter L of the E2LSH table |
distance | the distance between a query point and data point |