Analysing Militarised Interstate Dispute Dataset
While a typical application of local graph clustering is to find a set of vertices that loosely connects to the rest of an graph, these algorithms can be applied to find vertex sets with other other properties as well. This page demonstrates another application of local graph clustering, in which two sets of vertices highly connected to each other can be found in a local manner as well.
The input behind this demo is the Correlates of War Dyadic Militarised Interstae Dispute Dataset. The dataset contains a record of very militarised dispute between 1816 and 2014, and includes information of involved countries and the severity of the dispute. The dataset can be found
Date range: 1900 - 1950
Cluster size factor: 0.5
The data range specifies the time interval during which the conflict data is used in the algorithm; the cluster size factor is part of the algorithm's input, and roughly defines the size of clusters to be returned: a lower value of cluster size factor would allow the algorithm to find more localised clusters.
In addition to setting up these two parameters, you need to click a country in the map, and this picked country (coloured as dark blue) is served as the seed of the algorithm. The output of the algorithm is two sets of countries that tend to have conflicts during the chosen time range; these two sets are coloured in blue and yellow respectively, and your picked country is part of the blue cluster.
the selected country
the countries in the same group as the selected country
the countries in the other group
Our implemented algorithm for this application is based on Local Graph Clustering. Specifically, we construct a graph based on the militarised disputes within the chosen date range: every country is represented by a vertex in the graph, and there is an edge between each pair of countries weighted according to the severity of their military disputes. By performing random walks on the graph, our algoirthm returns two sets of countries which tend to enter conflicts with each other. The cluster size factor is used to control the length of the random walks, and shorter walks result in smaller clusters which are more closely related to the selected country.
Our implemented algorithm is based on the
approximate_pagerank method of the
STAG Library in order to compute the stationary distribution of the personalised Pagerank on
the constructed graph. The source code of this application can be found here.
Why is there no clustering result for some selected country and data range?
Due to the nature of this dataset, our constructed graph might not be a fully connected graph, and could contain many isolated vertices for a certain data range. If this occurs, then you wouldn't see the two sets of coloured countries.
Not every country is coloured even when I set the cluster size factor to be 1. Why?
For a certain date range, there may be countries for which the data set contains no information. These countries cannot be classified by the algorithm into either of the returned sets of countries and are left uncoloured.
Peter Macgregor, He Sun: Local Algorithms for finding densely connected clusters. ICML 2021: 7268-7278