Preprints
Exact cospectrality probabilities for uniform random matrices. arXiv preprint (2026) [pdf] [arXiv] [supplement]▷Abstract
We study the conjugation action of orthogonal matrices on symmetric random matrices. Given a fixed orthogonal matrix over an algebraic number field and a random matrix with entries sufficiently uniform in the ring of integers, we wonder what the probability is that the conjugate is again integral. Our main result establishes an exact formula for this probability in terms of the Smith ideals associated to the orthogonal matrix.
As an illustrative application, we establish exact formulas for the expected number of rational orthogonal matrices that preserve the integrality of a random matrix for every fixed denominator in dimensions two and three. Notably, the dependence on the denominator turns out to be non-monotone due to number-theoretic fluctuations. We also prove bounds on the probability of rational cospectrality with bounded but arbitrarily large denominator.
▷Abstract
Sufficient conditions for a simple graph to be characterized up to isomorphism given its spectrum and the spectrum of its complement graph are known due to Wang and Xu. This note establishes a related sufficient condition in the presence of loops: if the walk matrix has square-free determinant, then the graph is characterized by its generalized spectrum. The proof includes a general result about symmetric integral matrices.
▷Abstract
We study the mixing properties of a Brownian motion whose movements are hindered by semipermeable barriers. Our setting assumes that the process takes values in a smooth planar domain and that the barriers are one-dimensional closed curves. We establish an upper bound on the mixing time and a lower bound on the stationary distribution in terms of geometric parameters. These worst-case bounds decay at an exponential rate as the domain grows large, and we give examples that show that exponential decay is necessary in our worst-case setting.
▷Abstract
We study the recovery of one-dimensional semipermeable barriers for a stochastic process in a planar domain. The considered process acts like Brownian motion when away from the barriers and is reflected upon contact until a sufficient but random amount of interaction has occurred, determined by the permeability, after which it passes through. Given a sequence of samples, we wonder when one can determine the location and shape of the barriers.
This paper identifies several different recovery regimes, determined by the available observation period and the time between samples, with qualitatively different behavior. The observation period T dictates if the full barriers or only certain pieces can be recovered, and the sampling rate significantly influences the convergence rate as T→∞. This rate turns out polynomial for fixed-frequency data, but exponentially fast in a high-frequency regime.
Further, the environment’s impact on the difficulty of the problem is quantified using interpretable parameters in the recovery guarantees, and is found to also be regime-dependent. For instance, the curvature of the barriers affects the convergence rate for fixed-frequency data, but becomes irrelevant when T→∞ with high-frequency data.
The results are accompanied by explicit algorithms, and we conclude by illustrating the application to real-life data.
▷Abstract
We prove that a sum of random matrices generated by a ψ-mixing Markov chain has similar spectral properties to a Gaussian matrix with the same mean and covariance structure. This nonasymptotic universality principle enables sharp concentration inequalities when combined with recent advances in the Gaussian literature. We illustrate the theory with examples, showing how it enables polynomial dimensional improvements relative to previous Markovian matrix concentration results when applied to Wigner-type matrices, and how one can recover sharp limiting values for a model used to study of spectral clustering techniques.
A key challenge in the proof is that techniques based only on classical cumulants, which can be used when summands are independent, are not sufficient on their own for efficient estimates in a Markovian setting. Our approach exploits Boolean cumulants and a change–of–measure argument.
Published works
Detection and evaluation of clusters within sequential data. With Albert Senen-Cerda, Gianluca Kosmella, and Jaron Sanders. Data Mining and Knowledge Discovery (2025). [pdf] [arXiv] [Journal] [GitLab] [PyPi]▷Abstract
Sequential data is ubiquitous—it is routinely gathered to gain insights into complex processes such as behavioral, biological, or physical processes. Challengingly, such data not only has dependencies within the observed sequences, but the observa- tions are also often high-dimensional, sparse, and noisy. These are all difficulties that obscure the inner workings of the complex process under study. One solution is to calculate a low-dimensional representation that describes (characteristics of) the complex process. This representation can then serve as a proxy to gain insight into the original process. However, uncovering such low-dimensional representa- tion within sequential data is nontrivial due to the dependencies, and an algorithm specifically made for sequences is needed to guarantee estimator consistency. For- tunately, recent theoretical advancements on Block Markov Chains have resulted in new clustering algorithms that can provably do just this in synthetic sequential data. This paper presents a first field study of these new algorithms in real-world sequen- tial data; a wide empirical study of clustering within a range of data sequences. We investigate broadly whether, when given sparse high-dimensional sequential data of real-life complex processes, useful low-dimensional representations can in fact be extracted using these algorithms. Concretely, we examine data sequences containing GPS coordinates describing animal movement, strands of human DNA, texts from English writing, and daily yields in a financial market. The low-dimensional rep- resentations we uncover are shown to not only successfully encode the sequential structure of the data, but also to enable gaining new insights into the underlying complex processes.
▷Abstract
The walk matrix associated to an n×n integer matrix X and an integer vector b is defined by W:=(b,Xb,…,Xn−1b). We study limiting laws for the cokernel of W in the scenario where X is a random matrix with independent entries and b is deterministic. Our first main result provides a formula for the distribution of the pm-torsion part of the cokernel, as a group, when X has independent entries from a specific distribution. The second main result relaxes the distributional assumption and concerns the Z[x]-module structure.
The motivation for this work arises from an open problem in spectral graph theory which asks to show that random graphs are often determined up to isomorphism by their (generalized) spectrum. Sufficient conditions for generalized spectral determinacy can namely be stated in terms of the cokernel of a walk matrix. Extensions of our results could potentially be used to determine how often those conditions are satisfied. Some remaining challenges for such extensions are outlined in the paper.
▷Abstract
We give estimates for the zero loci of Bernstein-Sato ideals. An upper bound is proved as a multivariate generalisation of the upper bound by Lichtin for the roots of Bernstein-Sato polynomials. The lower bounds generalise the fact that log-canonical thresholds, small jumping numbers of multiplier ideals, and their real versions provide roots of Bernstein-Sato polynomials.
▷Abstract
A block Markov chain is a Markov chain whose state space can be partitioned into a finite number of clusters such that the transition probabilities only depend on the clusters. Block Markov chains thus serve as a model for Markov chains with communities. This paper establishes limiting laws for the singular value distributions of the empirical transition matrix and empirical frequency matrix associated to a sample path of the block Markov chain whenever the length of the sample path is Θ(n²) with n the size of the state space.
The proof approach is split into two parts. First, we introduce a class of symmetric random matrices with dependence called approximately uncorrelated random matrices with variance profile. We establish their limiting eigenvalue distributions by means of the moment method. Second, we develop a coupling argument to show that this general-purpose result applies to block Markov chains.
Theses
Latent structures in Markovian processes and associated random matrices. Awarded with honors (top 5% of PhD graduates), Eindhoven University of Technology (2025). [pdf] [TU/e] [Honors distinction]▷Summary
Sequential data is essential for the understanding of processes occurring in various
areas, ranging from natural language processing to ecology and microbiology. However, deciphering the meaning hidden within the sequential order of data points
can be difficult in practice as observations are often noisy and high-dimensional,
and the underlying processes are typically complex. Adopting a mathematical
perspective, this thesis pursues fundamental understanding of algorithms that can
uncover latent structures encoding the key dynamics of such processes. Chapter 1
introduces the main themes and summarizes our contributions.
The first part of this thesis investigates a stochastic process whose movements
are hindered by barriers in the environment, referred to as Brownian motion with
semipermeable barriers. Given an observed sequence of samples from the process,
we consider the problem of recovering the shape and location of the barriers. Our
goal is to understand how the difficulty of this recovery problem depends on the
structure of the environment, including the geometry of the barriers and their
permeability, and how it depends on the nature of the data determined by the
intersample time and the total observation period. We start these investigations
in Chapter 2 by proving basic properties related to the well-definedness of the
process, and by giving key proof techniques used in the subsequent chapters.
Our results regarding the recovery problem are rigorously stated and proved in
Chapters 3 and 4 and reveal that the problem has qualitatively different behavior
depending on the nature of the available data. The observation period dictates if
one can recover the full barriers or only certain pieces. The sampling rate is found
to influence the rate of decay for the approximation error significantly: while
this rate is polynomial in a fixed-frequency regime, it is exponentially fast given
high frequency data. How the environment enters the problem is also found to
be regime-dependent. For instance, the curvature of the barriers turns out to be
key in the convergence rate for fixed-frequency data but becomes asymptotically
irrelevant for the convergence rate in a high-frequency regime; see Theorems 3.1
and 4.3. Two of the parameters used in the results depend on the mixing properties
of the process, and their direct dependence on the environment is not necessarily
transparent. We hence investigate to what extent the mixing properties can be
controlled in terms of geometric and permeability-based quantities in Chapter 5
where we establish worst-case estimates.
The second part of this thesis investigates spectral methods for uncovering
structure from data, including clustering algorithms for sequential data and the
spectral characterization of networks. The term “spectral” indicates that such methods exploit the eigendecomposition or singular value decomposition of a matrix constructed based on the data. When the data is random to some extent,
the constructed matrix will also be so. A fundamental challenge in understanding
spectral methods is hence to analyze the properties of random matrices. While
the properties of matrices constructed from independent data are well-understood,
however, this is not so for matrices constructed from sequential data because dependencies make the mathematical analysis of a system more difficult. Throughout
Chapters 6 to 10 we develop new proof techniques applicable to random matrices
in the presence of sequential dependencies.
The motivation for Chapters 6 to 9 arises from block Markov chains, a mathematical model for Markov chains with ground-truth clusters in their state space.
Understanding the properties of random matrices is here essential for the analysis
of spectral clustering algorithms which recover the underlying clusters based on
a sequence of observations. Chapter 6 characterizes the limiting singular value
distributions in a regime where the sample path is sufficiently long to construct a
dense matrix. The analysis of spectral algorithms in particular requires bounds
on the distance of the constructed random matrix to its expected value measured
in the operator norm. Chapter 7 hence establishes general-purpose matrix concentration results for random matrices generated by Markov chains. Those results
are illustrated by example with block Markov chains in Chapter 8 to get a concentration estimate whose leading-order term is asymptotically sharp, and we there
also improve the results from Chapter 6 regarding singular value distributions to
permit sparse matrices. Aside from our focus on proof-technical challenges coming
from the dependence, Chapter 9 also briefly explores the application of clustering
methods and our theoretical results to real-world sequential data.
Finally, the results in Chapter 10 are motivated by an open problem in spectral
graph theory which asks whether eigenvalues of associated matrices typically
provide enough information to characterize a network uniquely. Deterministic
sufficient conditions for a graph to be characterized by its generalized spectrum
can be stated in terms of the number-theoretic properties of the walk matrix of
the graph, but it is not known how often those conditions are satisfied. To make
progress towards the latter question, we develop new proof techniques for studying
the walk matrix of a random graph, exploiting that the columns of this matrix
can be interpreted as a stochastic process in the analysis.
▷Summary
This masters’ thesis concerned topics in algebraic geometry surrounding D-modules and Bernstein-Sato ideals. After some preliminaries on derived category theory and spectral sequences, the first two chapters established new estimates for the zero loci of Bernstein-Sato ideals that generalized previous monovariate results from the literature. (These results have since been published in PRIMS; see also arXiv:2111.03334.) The final chapter explored how a notion from from D-module theory known as holonomicity can be used to derive recursions for the moments of a random variable.
Figures
They say a figure speaks thousand words. The following figures illustrate some of my research from the papers and preprints above.






