STAG C++  1.2.0
Spectral Toolkit of Algorithms for Graphs
cluster.h File Reference

Description

Algorithms for finding clusters in graphs.

The two key clustering methods provided by this module are stag::spectral_cluster and stag::local_cluster.

Functions

std::vector< stag_int > stag::spectral_cluster (stag::Graph *graph, stag_int k)
 
std::vector< stag_int > stag::local_cluster (stag::LocalGraph *graph, stag_int seed_vertex, double target_volume)
 
std::vector< stag_int > stag::local_cluster_acl (stag::LocalGraph *graph, stag_int seed_vertex, double locality, double error)
 
std::vector< stag_int > stag::local_cluster_acl (stag::LocalGraph *graph, stag_int seed_vertex, double locality)
 
std::tuple< SprsMat, SprsMat > stag::approximate_pagerank (stag::LocalGraph *graph, SprsMat &seed_vector, double alpha, double epsilon)
 
std::vector< stag_int > stag::sweep_set_conductance (stag::LocalGraph *graph, SprsMat &vec)
 
double stag::adjusted_rand_index (std::vector< stag_int > &gt_labels, std::vector< stag_int > &labels)
 
double stag::conductance (stag::LocalGraph *graph, std::vector< stag_int > &cluster)
 

Function Documentation

◆ spectral_cluster()

std::vector< stag_int > stag::spectral_cluster ( stag::Graph * graph,
stag_int  k 
)

Spectral clustering algorithm.

This is a simple graph clustering method, which provides a clustering of the entire graph. To use spectral clustering, simply pass a stag::Graph object and the number of clusters you would like to find.

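#include <iostream>
#include <vector>
#include <stag/graph.h>
#include <stag/cluster.h>

int main() {
  stag::Graph myGraph = stag::barbell_graph(10);

  // Split the graph into 2 clusters; the result holds one label per vertex.
  std::vector<stag_int> clusters = stag::spectral_cluster(&myGraph, 2);

  // Print the cluster label assigned to each vertex.
  for (auto c : clusters) {
    std::cout << c << ", ";
  }
  std::cout << std::endl;
  return 0;
}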

The spectral clustering algorithm has the following steps.

  • Compute the \(k\) smallest eigenvectors of the normalised Laplacian matrix.
  • Embed the vertices into \(\mathbb{R}^k\) according to the eigenvectors.
  • Cluster the vertices into \(k\) clusters using a \(k\)-means clustering algorithm.
Parameters
graph: the graph object to be clustered
k: the number of clusters to find. Should be less than \(n/2\).
Returns
a vector giving the cluster membership for each vertex in the graph
References
A. Ng, M. Jordan, Y. Weiss. On spectral clustering: Analysis and an algorithm. NeurIPS'01
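The membership vector returned by spectral_cluster holds one cluster label per vertex. As a minimal sketch of how such a vector might be consumed, the helper below regroups it into one vertex list per cluster. The name group_by_cluster is hypothetical and not part of STAG, std::int64_t stands in for stag_int, and the labels are assumed to lie in \([0, k)\).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of STAG): convert a per-vertex membership
// vector into one list of vertex indices per cluster.
// Assumption: every label lies in [0, k); a label outside that range makes
// .at() throw std::out_of_range.
std::vector<std::vector<std::int64_t>> group_by_cluster(
    const std::vector<std::int64_t>& labels, std::int64_t k) {
  std::vector<std::vector<std::int64_t>> clusters(
      static_cast<std::size_t>(k));
  for (std::size_t v = 0; v < labels.size(); ++v) {
    clusters.at(static_cast<std::size_t>(labels[v]))
        .push_back(static_cast<std::int64_t>(v));
  }
  return clusters;
}
```

For example, the labels 0, 0, 1, 0, 1 with k = 2 are regrouped into the vertex lists {0, 1, 3} and {2, 4}.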

◆ local_cluster()

std::vector< stag_int > stag::local_cluster ( stag::LocalGraph * graph,
stag_int  seed_vertex,
double  target_volume 
)

Local clustering algorithm based on personalised Pagerank.

Given a graph and starting vertex, return a cluster which is close to the starting vertex.

This method uses the ACL local clustering algorithm.

Parameters
graph: a graph object implementing the LocalGraph interface
seed_vertex: the starting vertex in the graph
target_volume: the approximate volume of the cluster you would like to find
Returns
a vector containing the indices of the vertices considered to be in the same cluster as the seed_vertex.
References
R. Andersen, F. Chung, K. Lang. Local graph partitioning using pagerank vectors. FOCS'06

◆ local_cluster_acl() [1/2]

std::vector< stag_int > stag::local_cluster_acl ( stag::LocalGraph * graph,
stag_int  seed_vertex,
double  locality,
double  error 
)

The ACL local clustering algorithm. Given a graph and starting vertex, return a cluster close to the starting vertex, constructed in a local way.

The locality parameter is passed as the alpha parameter in the personalised Pagerank calculation.

Parameters
graph: a graph object implementing the LocalGraph interface
seed_vertex: the starting vertex in the graph
locality: a value in \([0, 1]\) indicating how 'local' the cluster should be. A value of \(1\) will return only the seed vertex, and a value of \(0\) will explore the whole graph.
error: (optional) the acceptable error in the calculation of the approximate Pagerank. Default \(0.001\).
Returns
a vector containing the indices of the vertices considered to be in the same cluster as the seed_vertex.
References
R. Andersen, F. Chung, K. Lang. Local graph partitioning using pagerank vectors. FOCS'06

◆ local_cluster_acl() [2/2]

std::vector< stag_int > stag::local_cluster_acl ( stag::LocalGraph * graph,
stag_int  seed_vertex,
double  locality 
)

This is an overloaded function, provided for convenience. It differs from the above function only in omitting the error argument, which then takes its default value of \(0.001\).

◆ approximate_pagerank()

std::tuple< SprsMat, SprsMat > stag::approximate_pagerank ( stag::LocalGraph * graph,
SprsMat & seed_vector,
double  alpha,
double  epsilon 
)

Compute the approximate Pagerank vector.

The parameters seed_vector, alpha, and epsilon are used as described in the ACL paper.

Note that the dimension of the returned vectors may not match the number of vertices in the provided graph, since the approximate Pagerank is computed locally.

Parameters
graph: a stag::LocalGraph object
seed_vector: the seed vector of the personalised Pagerank
alpha: the locality parameter of the personalised Pagerank
epsilon: the error parameter of the personalised Pagerank
Returns
A tuple of sparse column vectors corresponding to
  • p: the approximate Pagerank vector
  • r: the residual vector

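By the definition of approximate Pagerank, it holds that p + ppr(r, alpha) = ppr(s, alpha), where s is the seed vector and ppr(x, alpha) denotes the exact personalised Pagerank vector with starting vector x.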

Exceptions
std::invalid_argument: if the provided seed_vector is not a column vector.
References
R. Andersen, F. Chung, K. Lang. Local graph partitioning using pagerank vectors. FOCS'06

◆ sweep_set_conductance()

std::vector< stag_int > stag::sweep_set_conductance ( stag::LocalGraph * graph,
SprsMat & vec
)

Find the sweep set of the given vector with the minimum conductance.

First, sort the vector such that \(v_1 \le \ldots \le v_n\). Then let

\[ S_i = \{v_j : j \le i\} \]

and return the set of original indices corresponding to

\[ \mathrm{argmin}_i \phi(S_i) \]

where \(\phi(S)\) is the conductance of \(S\).

This method is expected to be run on vectors whose support is much smaller than the total size of the graph. If the total volume of the support of vec is larger than half of the volume of the entire graph, then this method may return unexpected results.

Note that the caller is responsible for any required normalisation of the input vector. In particular, this method does not normalise the vector by the node degrees.

Parameters
graph: a stag::LocalGraph object
vec: the vector to sweep over
Returns
a vector containing the indices of vec which give the minimum conductance in the given graph
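The sweep itself can be sketched independently of how the vertices were ordered. The hypothetical helper below (best_sweep_prefix, not part of STAG) takes the vertices already listed in sweep order and returns the prefix \(S_i\) with the smallest value of \(w(S_i, V \setminus S_i) / \mathrm{vol}(S_i)\), updating the cut and the volume incrementally. It assumes an unweighted, undirected graph without self-loops, stored as adjacency lists, and an order with no repeated vertices.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Assumption of this sketch: an unweighted, undirected graph without
// self-loops, stored as adjacency lists.
using AdjList = std::vector<std::vector<std::size_t>>;

// Given the vertices in sweep order, return the prefix of that order with
// the smallest conductance cut(S) / vol(S).
std::vector<std::size_t> best_sweep_prefix(
    const AdjList& adj, const std::vector<std::size_t>& order) {
  std::vector<bool> in_set(adj.size(), false);
  double vol = 0.0;  // sum of degrees of the vertices added so far
  double cut = 0.0;  // number of edges leaving the current prefix
  double best_phi = std::numeric_limits<double>::infinity();
  std::size_t best_len = 0;

  for (std::size_t i = 0; i < order.size(); ++i) {
    const std::size_t v = order[i];
    in_set[v] = true;
    vol += static_cast<double>(adj[v].size());
    for (std::size_t u : adj[v]) {
      // An edge to a vertex already in the prefix stops crossing the cut;
      // any other edge of v starts crossing it.
      cut += in_set[u] ? -1.0 : 1.0;
    }
    if (vol > 0.0 && cut / vol < best_phi) {
      best_phi = cut / vol;
      best_len = i + 1;
    }
  }
  return std::vector<std::size_t>(
      order.begin(), order.begin() + static_cast<std::ptrdiff_t>(best_len));
}
```

Because the denominator is \(\mathrm{vol}(S_i)\) alone, a sweep that reaches every vertex ends with a cut of zero and therefore wins trivially, which illustrates why the input is expected to have small support.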

◆ adjusted_rand_index()

double stag::adjusted_rand_index ( std::vector< stag_int > &  gt_labels,
std::vector< stag_int > &  labels 
)

Compute the Adjusted Rand Index between two label vectors.

Parameters
gt_labels: the ground truth labels for the dataset
labels: the candidate labels whose ARI should be calculated
Returns
the ARI between the two label vectors
References
W. M. Rand. Objective criteria for the evaluation of clustering methods. Journal of the American Statistical Association. 66 (336): 846–850. 1971.

◆ conductance()

double stag::conductance ( stag::LocalGraph * graph,
std::vector< stag_int > &  cluster 
)

Compute the conductance of the given cluster in a graph.

Given a graph \(G = (V, E)\), the conductance of \(S \subseteq V\) is defined to be

\[ \phi(S) = \frac{w(S, V \setminus S)}{\mathrm{vol}(S)}, \]

where \(\mathrm{vol}(S) = \sum_{v \in S} \mathrm{deg}(v)\) is the volume of \(S\) and \(w(S, V \setminus S)\) is the total weight of edges crossing the cut between \(S\) and \(V \setminus S\).

Parameters
graph: a stag::LocalGraph object representing \(G\).
cluster: a vector of node IDs in \(S\).
Returns
the conductance \(\phi_G(S)\).