Publications and PreprintsDetection and evaluation of clusters within sequential data. With Albert Senen-Cerda, Gianluca Kosmella, and Jaron Sanders.
Motivated by theoretical advancements in dimensionality reduction techniques we use a recent model, called Block Markov Chains, to conduct a practical study of clustering in real-world sequential data. Clustering algorithms for Block Markov Chains possess theoretical optimality guarantees and can be deployed in sparse data regimes. Despite these favorable theoretical properties, a thorough evaluation of these algorithms in realistic settings has been lacking.
We address this issue and investigate the suitability of these clustering algorithms in exploratory data analysis of real-world sequential data. In particular, our sequential data is derived from human DNA, written text, animal movement data and financial markets. In order to evaluate the determined clusters, and the associated Block Markov Chain model, we further develop a set of evaluation tools. These tools include benchmarking, spectral noise analysis and statistical model selection tools. An efficient implementation of the clustering algorithm and the new evaluation tools is made available together with this paper.
Practical challenges associated to real-world data are encountered and discussed. It is ultimately found that the Block Markov Chain model assumption, together with the tools developed here, can indeed produce meaningful insights in exploratory data analyses despite the complexity and sparsity of real-world data.
A block Markov chain is a Markov chain whose state space can be partitioned into a finite number of clusters such that the transition probabilities only depend on the clusters. Block Markov chains thus serve as a model for Markov chains with communities. This paper establishes limiting laws for the singular value distributions of the empirical transition matrix and empirical frequency matrix associated to a sample path of the block Markov chain whenever the length of the sample path is Θ(n²) with n the size of the state space.
The proof approach is split into two parts. First, we introduce a class of symmetric random matrices with dependence called approximately uncorrelated random matrices with variance profile. We establish their limiting eigenvalue distributions by means of the moment method. Second, we develop a coupling argument to show that this general-purpose result applies to block Markov chains.
We give estimates for the zero loci of Bernstein-Sato ideals. An upper bound is proved as a multivariate generalisation of the upper bound by Lichtin for the roots of Bernstein-Sato polynomials. The lower bounds generalise the fact that log-canonical thresholds, small jumping numbers of multiplier ideals, and their real versions provide roots of Bernstein-Sato polynomials.
PresentationsTalk: Clusters within Markov chains: Detection, evaluation, and spectral fingerprints at 14th NETWORKS training week. (2022) [slides] Talk: Bulk of random matrices generated by Markov chains with community structure at fourth ZiF summer school on randomness in physics and mathematics. (2022) [slides] Talk: Spectra of random matrices with Markovian dependence and non-constant variance profile at 50th Saint-Flour probability summer school. (2022) [abstract] Poster: Singular value distribution of random matrices with block Markovian dependence at XVII Brunel-Bielefeld Workshop. (2021) [poster]
They say a figure speaks a thousand words. The following figures are taken from the papers and preprints above.