STAG Python  2.0.2
Spectral Toolkit of Algorithms for Graphs
stag.cluster Namespace Reference

Functions

np.ndarray spectral_cluster (graph.Graph g, int k)
 Spectral clustering algorithm.
 
np.ndarray cheeger_cut (graph.Graph g)
 Find the Cheeger cut in a graph.
 
np.ndarray local_cluster (graph.LocalGraph g, int seed_vertex, float target_volume)
 Local clustering algorithm based on personalised Pagerank.
 
np.ndarray local_cluster_acl (graph.LocalGraph g, int seed_vertex, float locality, float error=0.001)
 The ACL local clustering algorithm.
 
Tuple[utility.SprsMat, utility.SprsMat] approximate_pagerank (graph.LocalGraph g, utility.SprsMat seed_vector, float alpha, float epsilon)
 Compute the approximate pagerank vector.
 
np.ndarray sweep_set_conductance (graph.LocalGraph g, utility.SprsMat v)
 Find the sweep set of the given vector with the minimum conductance.
 
np.ndarray connected_component (graph.LocalGraph g, int v)
 Return the vertex indices of every vertex in the same connected component as the specified vertex.
 
List[np.ndarray] connected_components (graph.Graph g)
 Return a list of the connected components in the specified graph.
 
float adjusted_rand_index (np.ndarray gt_labels, np.ndarray labels)
 Compute the Adjusted Rand Index between two label vectors.
 
float mutual_information (np.ndarray gt_labels, np.ndarray labels)
 Compute the Mutual Information between two label vectors.
 
float normalised_mutual_information (np.ndarray gt_labels, np.ndarray labels)
 Compute the Normalised Mutual Information between two label vectors.
 
float conductance (graph.LocalGraph g, np.ndarray cluster)
 Compute the conductance of the given cluster in a graph.
 
np.ndarray symmetric_difference (np.ndarray s, np.ndarray t)
 Compute the symmetric difference of two sets of integers.
 
graph.Graph approximate_similarity_graph (utility.DenseMat data, float a)
 Construct an approximate similarity graph for the given dataset.
 
graph.Graph similarity_graph (utility.DenseMat data, float a)
 Construct a complete similarity graph for the given dataset.
 

Function Documentation

◆ spectral_cluster()

np.ndarray stag.cluster.spectral_cluster ( graph.Graph  g,
int  k 
)

Spectral clustering algorithm.

This is a simple graph clustering method, which provides a clustering of the entire graph. To use spectral clustering, simply pass a stag.graph.Graph object and the number of clusters you would like to find.

import stag.graph
import stag.cluster

# Build a barbell graph and split it into two clusters.
myGraph = stag.graph.Graph.barbell_graph(10)
labels = stag.cluster.spectral_cluster(myGraph, 2)
print(labels)

The spectral clustering algorithm has the following steps.

  • Compute the eigenvectors corresponding to the \(k\) smallest eigenvalues of the normalised Laplacian matrix.
  • Embed the vertices into \(\mathbb{R}^k\) according to the eigenvectors.
  • Cluster the vertices into \(k\) clusters using a \(k\)-means clustering algorithm.
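The three steps above can be sketched with NumPy alone. This is an illustrative sketch rather than STAG's implementation: the names `spectral_cluster_sketch` and `barbell`, the dense adjacency-matrix input, and the deterministic farthest-point initialisation of \(k\)-means are all assumptions made for the example.

```python
import numpy as np

def spectral_cluster_sketch(adj, k, iters=20):
    """Illustrative only: cluster a graph given as a dense adjacency matrix."""
    n = len(adj)
    deg = adj.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)  # assumes there are no isolated vertices
    # Normalised Laplacian: I - D^{-1/2} A D^{-1/2}
    lap = np.eye(n) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    # Step 1: eigh returns eigenvalues (and their eigenvectors) in ascending order.
    _, vecs = np.linalg.eigh(lap)
    # Step 2: row i of emb is the embedding of vertex i in R^k.
    emb = vecs[:, :k]
    # Step 3: Lloyd's k-means, started from a farthest-point initialisation.
    centres = [emb[0]]
    while len(centres) < k:
        dists = np.min([((emb - c) ** 2).sum(axis=1) for c in centres], axis=0)
        centres.append(emb[np.argmax(dists)])
    centres = np.array(centres)
    for _ in range(iters):
        dists = ((emb[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centres[c] = emb[labels == c].mean(axis=0)
    return labels

def barbell(m):
    """Two m-cliques joined by a single edge (hypothetical helper)."""
    adj = np.zeros((2 * m, 2 * m))
    adj[:m, :m] = 1.0
    adj[m:, m:] = 1.0
    np.fill_diagonal(adj, 0.0)
    adj[m - 1, m] = adj[m, m - 1] = 1.0
    return adj

labels = spectral_cluster_sketch(barbell(5), 2)
print(labels)
```

On this small barbell graph the two cliques end up with different labels; which clique receives label 0 depends on the initialisation.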
Parameters
g: the graph object to be clustered
k: the number of clusters to find. Should be less than \(n/2\).
Returns
an array of ints giving the cluster membership for each vertex in the graph
References
A. Ng, M. Jordan, Y. Weiss. On spectral clustering: Analysis and an algorithm. NeurIPS'01

◆ cheeger_cut()

np.ndarray stag.cluster.cheeger_cut ( graph.Graph  g)

Find the Cheeger cut in a graph.

Let \(G = (V, E)\) be a graph and \(\mathcal{L}\) be its normalised Laplacian matrix with eigenvalues \(0 = \lambda_1 \leq \lambda_2 \leq \ldots \leq \lambda_n\). Then, Cheeger's inequality states that

\[ \frac{\lambda_2}{2} \leq \Phi_G \leq \sqrt{2 \lambda_2}, \]

where

\[ \Phi_G = \min_{S \subset V} \phi(S) \]

is the conductance of \(G\). The proof of Cheeger's inequality is constructive: by computing the eigenvector corresponding to \(\lambda_2\), and performing the sweep set operation, we are able to find a set \(S\) with conductance close to the optimal. The partition returned by this algorithm is called the 'Cheeger cut' of the graph.
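The constructive argument can be illustrated with a short NumPy sketch. It is not STAG's implementation: the name `cheeger_cut_sketch` and the dense adjacency-matrix input are assumptions, and the sketch takes \(\phi(S)\) with denominator \(\min(\mathrm{vol}(S), \mathrm{vol}(V \setminus S))\), which is the convention under which Cheeger's inequality is usually stated.

```python
import numpy as np

def cheeger_cut_sketch(adj):
    """Illustrative only: sweep over the second eigenvector of the normalised Laplacian."""
    n = len(adj)
    deg = adj.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)  # assumes there are no isolated vertices
    lap = np.eye(n) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(lap)  # ascending eigenvalues
    lam2 = vals[1]
    # Order the vertices by the degree-normalised second eigenvector.
    order = np.argsort(vecs[:, 1] * d_inv_sqrt)
    total_vol = deg.sum()
    best_phi, best_size = np.inf, 0
    for i in range(1, n):
        mask = np.zeros(n, dtype=bool)
        mask[order[:i]] = True
        cut = adj[mask][:, ~mask].sum()   # weight crossing the cut
        vol = deg[mask].sum()
        phi = cut / min(vol, total_vol - vol)
        if phi < best_phi:
            best_phi, best_size = phi, i
    return lam2, order[:best_size], best_phi

# Two 5-cliques joined by one edge.
adj = np.zeros((10, 10))
adj[:5, :5] = 1.0
adj[5:, 5:] = 1.0
np.fill_diagonal(adj, 0.0)
adj[4, 5] = adj[5, 4] = 1.0

lam2, cut_set, phi = cheeger_cut_sketch(adj)
print(sorted(cut_set.tolist()), phi, lam2 / 2, np.sqrt(2 * lam2))
```

The set found by the sweep is one of the two cliques, and its conductance lies between the two sides of Cheeger's inequality.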

Parameters
g: the graph object to be partitioned
Returns
An array giving the cluster membership for each vertex in the graph. Each entry in the array is either \(0\) or \(1\) to indicate which side of the cut the vertex belongs to.

◆ local_cluster()

np.ndarray stag.cluster.local_cluster ( graph.LocalGraph  g,
int  seed_vertex,
float  target_volume 
)

Local clustering algorithm based on personalised Pagerank.

Given a graph and starting vertex, return a cluster which is close to the starting vertex.

This method uses the ACL local clustering algorithm.

Parameters
g: a graph object implementing the LocalGraph interface
seed_vertex: the starting vertex in the graph
target_volume: the approximate volume of the cluster you would like to find
Returns
an array containing the indices of vertices considered to be in the same cluster as the seed_vertex.
References
R. Andersen, F. Chung, K. Lang. Local graph partitioning using pagerank vectors. FOCS'06

◆ local_cluster_acl()

np.ndarray stag.cluster.local_cluster_acl ( graph.LocalGraph  g,
int  seed_vertex,
float  locality,
float   error = 0.001 
)

The ACL local clustering algorithm.

Given a graph and starting vertex, returns a cluster close to the starting vertex, constructed in a local way.

The locality parameter is passed as the alpha parameter in the personalised pagerank calculation.

Parameters
g: a graph object implementing the LocalGraph interface
seed_vertex: the starting vertex in the graph
locality: a value in \([0, 1]\) indicating how 'local' the cluster should be. A value of \(1\) will return only the seed vertex and a value of \(0\) will explore the whole graph.
error: (optional) the acceptable error in the calculation of the approximate pagerank. Default \(0.001\).
Returns
an array containing the indices of vertices considered to be in the same cluster as the seed_vertex.
References
R. Andersen, F. Chung, K. Lang. Local graph partitioning using pagerank vectors. FOCS'06

◆ approximate_pagerank()

Tuple[utility.SprsMat, utility.SprsMat] stag.cluster.approximate_pagerank ( graph.LocalGraph  g,
utility.SprsMat  seed_vector,
float  alpha,
float  epsilon 
)

Compute the approximate pagerank vector.

The parameters seed_vector (denoted s below), alpha, and epsilon are used as described in the ACL paper.

Note that the dimension of the returned vectors may not match the true number of vertices in the graph provided since the approximate pagerank is computed locally.

Parameters
g: a stag.graph.LocalGraph object
seed_vector: the seed vector of the personalised pagerank
alpha: the locality parameter of the personalised pagerank
epsilon: the error parameter of the personalised pagerank
Returns
A tuple of sparse column vectors corresponding to
  • p: the approximate pagerank vector
  • r: the residual vector

By the definition of approximate pagerank, it is the case that p + ppr(r, alpha) = ppr(s, alpha).
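For intuition, the push procedure from the ACL paper can be sketched in pure Python. This is a sketch under stated assumptions, not STAG's implementation: the name `approximate_pagerank_sketch` is hypothetical, the graph is an unweighted adjacency-list dictionary, and the seed vector is taken to be the indicator vector of a single vertex. Only vertices touched by the computation appear in the returned dictionaries, which mirrors the remark above about the dimension of the returned vectors.

```python
from collections import deque

def approximate_pagerank_sketch(adj, seed, alpha, epsilon):
    """adj: dict mapping each vertex to a list of neighbours (undirected, unweighted)."""
    p, r = {}, {seed: 1.0}
    queue = deque([seed])
    while queue:
        u = queue.popleft()
        deg_u = len(adj[u])
        if r.get(u, 0.0) < epsilon * deg_u:
            continue  # residual at u is already small enough
        # Push: move an alpha fraction of r[u] into p[u], keep half of the
        # remainder at u, and spread the other half over the neighbours.
        mass = r[u]
        p[u] = p.get(u, 0.0) + alpha * mass
        r[u] = (1 - alpha) * mass / 2
        for v in adj[u]:
            r[v] = r.get(v, 0.0) + (1 - alpha) * mass / (2 * deg_u)
            if r[v] >= epsilon * len(adj[v]):
                queue.append(v)
        if r[u] >= epsilon * deg_u:
            queue.append(u)
    return p, r

# Two triangles joined by the edge (2, 3).
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
epsilon = 1e-3
p, r = approximate_pagerank_sketch(adj, seed=0, alpha=0.1, epsilon=epsilon)
print(sum(p.values()) + sum(r.values()))
```

Each push conserves the total mass, so the entries of p and r together sum to 1, and on termination every residual entry satisfies r[v] < epsilon * deg(v).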

Exceptions
argument_error: if the provided seed_vector is not a column vector.
References
R. Andersen, F. Chung, K. Lang. Local graph partitioning using pagerank vectors. FOCS'06

◆ sweep_set_conductance()

np.ndarray stag.cluster.sweep_set_conductance ( graph.LocalGraph  g,
utility.SprsMat  v 
)

Find the sweep set of the given vector with the minimum conductance.

First, sort the entries of the vector and write them in sorted order as \(v_1, \ldots, v_n\). Then let

\[ S_i = \{v_j : j \leq i\} \]

and return the set of original indices corresponding to

\[ \mathrm{argmin}_i \phi(S_i) \]

where \(\phi(S)\) is the conductance of \(S\).

This method is expected to be run on vectors whose support is much smaller than the total size of the graph. If the total volume of the support of v is larger than half of the total volume of the graph, then this method may return unexpected results.

Note that the caller is responsible for any required normalisation of the input vector. In particular, this method does not normalise the vector by the node degrees.
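A sweep of this kind can be sketched in pure Python as follows. The function name `sweep_set_sketch` and the dictionary-based graph and vector are assumptions for illustration, and the sketch sorts entries from largest to smallest, which is the usual choice for pagerank vectors but is an assumption about the ordering here. The cut weight is updated incrementally as each vertex joins the set, so only the support of the vector is visited.

```python
def sweep_set_sketch(adj, vec):
    """adj: dict vertex -> dict neighbour -> weight; vec: dict vertex -> value."""
    order = sorted(vec, key=vec.get, reverse=True)  # assumed: largest value first
    in_set = set()
    cut = 0.0
    vol = 0.0
    best_phi, best_len = float("inf"), 0
    for i, u in enumerate(order, start=1):
        for v, w in adj[u].items():
            # An edge into the current set stops crossing the cut;
            # every other edge at u starts crossing it.
            cut += -w if v in in_set else w
        vol += sum(adj[u].values())
        in_set.add(u)
        phi = cut / vol
        if phi < best_phi:
            best_phi, best_len = phi, i
    return order[:best_len], best_phi

# Two triangles joined by the edge (2, 3), all weights 1.
adj = {
    0: {1: 1.0, 2: 1.0},
    1: {0: 1.0, 2: 1.0},
    2: {0: 1.0, 1: 1.0, 3: 1.0},
    3: {2: 1.0, 4: 1.0, 5: 1.0},
    4: {3: 1.0, 5: 1.0},
    5: {3: 1.0, 4: 1.0},
}
vec = {0: 0.5, 1: 0.3, 2: 0.15, 3: 0.05}
best_set, best_phi = sweep_set_sketch(adj, vec)
print(best_set, best_phi)
```

Here the best sweep set is the first triangle, with conductance 1/7.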

Parameters
g: a stag.graph.LocalGraph object
v: the vector to sweep over
Returns
a vector containing the indices of v which give the minimum conductance in the given graph

◆ connected_component()

np.ndarray stag.cluster.connected_component ( graph.LocalGraph  g,
int  v 
)

Return the vertex indices of every vertex in the same connected component as the specified vertex.

The running time of this method is proportional to the size of the returned connected component.

The returned array is not sorted.
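A breadth-first search is one way to obtain a running time of this kind, since it only ever queries the neighbours of vertices it has already reached. The sketch below is illustrative and is not taken from STAG: `connected_component_sketch` is a hypothetical name and the graph is exposed only through a `neighbours` callable, standing in for local access to the graph.

```python
from collections import deque

def connected_component_sketch(neighbours, v):
    """neighbours: callable returning the neighbours of a vertex."""
    seen = {v}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        for w in neighbours(u):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return list(seen)  # no particular order is guaranteed

# A path 0-1-2 and a separate edge 3-4.
adj = {0: [1], 1: [0, 2], 2: [1], 3: [4], 4: [3]}
component = connected_component_sketch(adj.__getitem__, 0)
print(sorted(component))
```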

Parameters
g: a stag.graph.LocalGraph object
v: a vertex of the graph
Returns
an array containing the vertex ids of every vertex in the connected component corresponding to v

◆ connected_components()

List[np.ndarray] stag.cluster.connected_components ( graph.Graph  g)

Return a list of the connected components in the specified graph.

Parameters
g: a stag.graph.Graph object
Returns
a list containing the connected components of the graph

◆ adjusted_rand_index()

float stag.cluster.adjusted_rand_index ( np.ndarray  gt_labels,
np.ndarray  labels 
)

Compute the Adjusted Rand Index between two label vectors.

Parameters
gt_labels: the ground truth labels for the dataset
labels: the candidate labels whose ARI should be calculated
Returns
the ARI between the two label vectors
References
W. M. Rand. Objective criteria for the evaluation of clustering methods. Journal of the American Statistical Association. 66 (336): 846–850. 1971.
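As a worked illustration, the ARI can be computed from pair counts in the contingency table of the two labellings. The sketch below uses the standard pair-counting formula and only the Python standard library; the name `adjusted_rand_index_sketch` is hypothetical, and the sketch makes no claim about how STAG computes the value. It does not handle the degenerate case in which the denominator is zero.

```python
from collections import Counter
from math import comb

def adjusted_rand_index_sketch(gt_labels, labels):
    n = len(gt_labels)
    # Pairs of points placed together in both labellings, in the ground
    # truth only, and in the candidate labelling only.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(gt_labels, labels)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(gt_labels).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    maximum = (sum_rows + sum_cols) / 2
    return (sum_cells - expected) / (maximum - expected)

same = adjusted_rand_index_sketch([0, 0, 1, 1], [1, 1, 0, 0])
partial = adjusted_rand_index_sketch([0, 0, 1, 1], [0, 0, 1, 2])
print(same, partial)
```

Relabelling the clusters does not change the score, so the first call gives 1, while splitting one ground truth cluster in the second call gives 4/7.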

◆ mutual_information()

float stag.cluster.mutual_information ( np.ndarray  gt_labels,
np.ndarray  labels 
)

Compute the Mutual Information between two label vectors.

Parameters
gt_labels: the ground truth labels for the dataset
labels: the candidate labels whose MI should be calculated
Returns
the MI between the two label vectors

◆ normalised_mutual_information()

float stag.cluster.normalised_mutual_information ( np.ndarray  gt_labels,
np.ndarray  labels 
)

Compute the Normalised Mutual Information between two label vectors.

Parameters
gt_labels: the ground truth labels for the dataset
labels: the candidate labels whose NMI should be calculated
Returns
the NMI between the two label vectors
References
Vinh, Epps, and Bailey (2009). Information theoretic measures for clusterings comparison. 26th Annual International Conference on Machine Learning (ICML '09).

◆ conductance()

float stag.cluster.conductance ( graph.LocalGraph  g,
np.ndarray  cluster 
)

Compute the conductance of the given cluster in a graph.

Given a graph \(G = (V, E)\), the conductance of \(S \subseteq V\) is defined to be

\[ \phi(S) = \frac{w(S, V \setminus S)}{\mathrm{vol}(S)}, \]

where \(\mathrm{vol}(S) = \sum_{v \in S} \mathrm{deg}(v)\) is the volume of \(S\) and \(w(S, V \setminus S)\) is the total weight of edges crossing the cut between \(S\) and \(V \setminus S\).
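The definition translates directly into a few lines of Python. This is a minimal sketch, assuming a weighted undirected graph without self-loops stored as nested dictionaries; `conductance_sketch` is a hypothetical name and not part of STAG.

```python
def conductance_sketch(adj, cluster):
    """adj: dict vertex -> dict neighbour -> weight; cluster: iterable of vertices."""
    s = set(cluster)
    # w(S, V \ S): total weight of edges with exactly one endpoint in S.
    cut = sum(w for u in s for v, w in adj[u].items() if v not in s)
    # vol(S): sum of the weighted degrees of the vertices in S.
    vol = sum(sum(adj[u].values()) for u in s)
    return cut / vol

# Two triangles joined by the edge (2, 3), all weights 1.
adj = {
    0: {1: 1.0, 2: 1.0},
    1: {0: 1.0, 2: 1.0},
    2: {0: 1.0, 1: 1.0, 3: 1.0},
    3: {2: 1.0, 4: 1.0, 5: 1.0},
    4: {3: 1.0, 5: 1.0},
    5: {3: 1.0, 4: 1.0},
}
phi_triangle = conductance_sketch(adj, [0, 1, 2])
phi_single = conductance_sketch(adj, [0])
print(phi_triangle, phi_single)
```

For the first triangle one edge of weight 1 crosses the cut and the volume is 2 + 2 + 3 = 7, so the conductance is 1/7; a single vertex always has conductance 1.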

Parameters
g: a stag.graph.LocalGraph object representing \(G\).
cluster: an array of node IDs in \(S\).
Returns
the conductance \(\phi(S)\).

◆ symmetric_difference()

np.ndarray stag.cluster.symmetric_difference ( np.ndarray  s,
np.ndarray  t 
)

Compute the symmetric difference of two sets of integers.

Given sets \(S\) and \(T\), the symmetric difference \(S \triangle T\) is defined to be

\[ S \triangle T = (S \setminus T) \cup (T \setminus S). \]

Although \(S\) and \(T\) are provided as lists, they are treated as sets and any duplicates will be ignored.
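The set semantics, including the treatment of duplicates, can be illustrated with Python's built-in sets. The name `symmetric_difference_sketch` is hypothetical, and the result is sorted here only to make the output easy to read.

```python
def symmetric_difference_sketch(s, t):
    # Converting to sets discards duplicates; ^ is the symmetric difference.
    return sorted(set(s) ^ set(t))

result = symmetric_difference_sketch([1, 2, 2, 3], [3, 4, 4])
print(result)
```

The value 3 appears in both inputs and is dropped, while the repeated 2 and 4 each appear once in the result.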

Parameters
s: an array containing the first set of integers
t: an array containing the second set of integers
Returns
an array containing the integers in the symmetric difference of \(S\) and \(T\).

◆ approximate_similarity_graph()

graph.Graph stag.cluster.approximate_similarity_graph ( utility.DenseMat  data,
float  a 
)

Construct an approximate similarity graph for the given dataset.

Given data points \(x_1, \ldots, x_n \in \mathbb{R}^d\) and a parameter \(a\), the similarity between two data points is given by

\[ k(x_i, x_j) = \mathrm{exp}\left(- a \|x_i - x_j\|^2 \right). \]

Then, the similarity graph of the data is a complete graph on \(n\) vertices such that the weight between vertex \(i\) and \(j\) is given by \(k(x_i, x_j)\). However, the complete similarity graph requires \(O(n^2)\) time and space to construct.

This method implements an algorithm which approximates the similarity graph with a sparse graph, while preserving any cluster structure of the graph. This algorithm has running time \(\widetilde{O}(n^{1.25})\).

Parameters
data: an \(n \times d\) matrix representing the dataset.
a: the parameter of the similarity kernel.
Returns
a stag.graph.Graph object representing the similarity of the data
References
Peter Macgregor and He Sun, Fast Approximation of Similarity Graphs with Kernel Density Estimation. In NeurIPS'23.

◆ similarity_graph()

graph.Graph stag.cluster.similarity_graph ( utility.DenseMat  data,
float  a 
)

Construct a complete similarity graph for the given dataset.

Given data points \(x_1, \ldots, x_n \in \mathbb{R}^d\) and a parameter \(a\), the similarity between two data points is given by

\[ k(x_i, x_j) = \mathrm{exp}\left(- a \|x_i - x_j\|^2 \right). \]

Then, the similarity graph of the data is a complete graph on \(n\) vertices such that the weight between vertex \(i\) and \(j\) is given by \(k(x_i, x_j)\).
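The edge weights of the complete similarity graph can be written down directly with NumPy. The sketch below is illustrative only: `similarity_weights_sketch` is a hypothetical name, it returns a dense weight matrix rather than a stag.graph.Graph object, and setting the diagonal to zero (no self-loops) is an assumption.

```python
import numpy as np

def similarity_weights_sketch(data, a):
    """data: n x d array of data points; returns the n x n matrix of kernel weights."""
    data = np.asarray(data, dtype=float)
    # Pairwise squared Euclidean distances via broadcasting.
    diff = data[:, None, :] - data[None, :, :]
    sq_dists = (diff ** 2).sum(axis=2)
    weights = np.exp(-a * sq_dists)
    np.fill_diagonal(weights, 0.0)  # assumption: the graph has no self-loops
    return weights

data = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
weights = similarity_weights_sketch(data, a=0.5)
print(weights)
```

The matrix has \(n^2\) entries, each of which requires one kernel evaluation.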

Note that the time and space complexity of this method is \(O(n^2)\). For a faster, approximate method, consider using stag.cluster.approximate_similarity_graph.

Parameters
data: an \(n \times d\) matrix representing the dataset.
a: the parameter of the similarity kernel.
Returns
a stag.graph.Graph object representing the similarity of the data