Luca Ambrogioni
Assistant Professor - Donders Institute for Brain, Cognition and Behaviour
I am an assistant professor in AI. My areas of expertise are probabilistic machine learning and theoretical neuroscience. In my work, I design probabilistic models of the human brain based on deep neural networks. I am also active in pure machine learning research, especially in the fields of variational inference and optimal transport.
Raya, G., & Ambrogioni, L. (2024). Spontaneous symmetry breaking in generative diffusion models. Advances in Neural Information Processing Systems, 36.
Abstract taken from Google Scholar:
Generative diffusion models have recently emerged as a leading approach for generating high-dimensional data. In this paper, we show that the dynamics of these models exhibit a spontaneous symmetry breaking that divides the generative dynamics into two distinct phases: 1) a linear steady-state dynamics around a central fixed point and 2) an attractor dynamics directed towards the data manifold. These two "phases" are separated by the change in stability of the central fixed point, with the resulting window of instability being responsible for the diversity of the generated samples. Using both theoretical and empirical evidence, we show that an accurate simulation of the early dynamics does not significantly contribute to the final generation, since early fluctuations are reverted to the central fixed point. To leverage this insight, we propose a Gaussian late initialization scheme, which significantly improves model performance, achieving up to 3x FID improvements on fast samplers, while also increasing sample diversity (e.g., the racial composition of generated CelebA images). Our work offers a new way to understand the generative dynamics of diffusion models that has the potential to bring about higher performance and less biased fast samplers.
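As a rough illustration of the late initialization idea, the following minimal sketch (not the paper's code; the dataset, the constant noise schedule beta, and the start times are illustrative choices) runs an exact-score reverse diffusion on a toy one-dimensional dataset of two points at +1 and -1, once from t = 1.0 and once from a late Gaussian start at t = 0.4, and shows that the two produce essentially the same bimodal samples:

    import numpy as np

    # Toy VP diffusion on a 1D dataset of two points at +1 and -1. The
    # time-t marginal is a two-component Gaussian mixture, so the exact
    # score is available in closed form and no network is needed.
    beta = 8.0                                  # constant noise schedule

    def alpha(t):                               # signal scale at time t
        return np.exp(-0.5 * beta * t)

    def score(x, t):                            # exact score of the mixture
        a = alpha(t)
        v = 1.0 - a ** 2
        lp = -(x - a) ** 2 / (2 * v)
        ln = -(x + a) ** 2 / (2 * v)
        m = np.maximum(lp, ln)                  # numerically stable weights
        wp, wn = np.exp(lp - m), np.exp(ln - m)
        post_mean = a * (wp - wn) / (wp + wn)   # scaled posterior mean of x_0
        return (post_mean - x) / v

    def generate(t_start, n=20000, dt=0.0025, seed=0):
        rng = np.random.default_rng(seed)
        # Gaussian initialization at t_start: the data variance is 1 here,
        # so the time-t marginal has unit variance at every t.
        x, t = rng.standard_normal(n), t_start
        for _ in range(int(round(t_start / dt))):  # reverse-time Euler-Maruyama
            drift = -0.5 * beta * x - beta * score(x, t)
            x = x - drift * dt + np.sqrt(beta * dt) * rng.standard_normal(n)
            t -= dt
        return x

    for t0 in (1.0, 0.4):                       # full-length vs late start
        s = generate(t0)
        print(f"t_start={t0}: near +/-1: {np.mean(np.abs(np.abs(s) - 1) < 0.3):.3f}, "
              f"positive fraction: {np.mean(s > 0):.3f}")

Both runs recover a balanced bimodal distribution, consistent with the observation that fluctuations before the stability change of the central fixed point do not affect the final samples.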
Ambrogioni, L. (2023). In search of dispersed memories: Generative diffusion models are associative memory networks. arXiv preprint arXiv:2309.17290.
Abstract taken from Google Scholar:
Hopfield networks are widely used in neuroscience as simplified theoretical models of biological associative memory. The original Hopfield networks store memories by encoding patterns of binary associations, which results in a synaptic learning mechanism known as the Hebbian learning rule. Modern Hopfield networks can achieve exponential capacity scaling by using highly non-linear energy functions. However, the energy function of these newer models cannot be straightforwardly compressed into binary synaptic couplings, and it does not directly provide new synaptic learning rules. In this work, we show that generative diffusion models can be interpreted as energy-based models and that, when trained on discrete patterns, their energy function is equivalent to that of modern Hopfield networks. This equivalence allows us to interpret the supervised training of diffusion models as a synaptic learning process that encodes the associative dynamics of a modern Hopfield network in the weight structure of a deep neural network. Accordingly, in our experiments we show that the storage capacity of a continuous modern Hopfield network is identical to the capacity of a diffusion model. Our results establish a strong link between generative modeling and the theoretical neuroscience of memory, which provides a powerful computational foundation for the reconstructive theory of memory, where creative generation and memory recall can be seen as parts of a unified continuum.
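The claimed equivalence can be made concrete with a small sketch. Under a Gaussian-kernel approximation of the score, the diffusion energy reduces to the log-sum-exp energy of a continuous modern Hopfield network, and gradient descent on it performs associative recall. The pattern count, dimensionality, and sigma below are illustrative, and this is not the paper's experimental code:

    import numpy as np

    rng = np.random.default_rng(1)
    patterns = rng.choice([-1.0, 1.0], size=(5, 64))   # 5 stored binary patterns

    def recall(cue, sigma=0.3, steps=50, lr=0.1):
        # Gradient descent on E(x) = -sigma^2 * logsumexp(-|x - xi|^2 / (2 sigma^2)),
        # the energy shared by a Gaussian-kernel diffusion model and a
        # continuous modern Hopfield network.
        x = cue.copy()
        for _ in range(steps):
            d2 = ((x - patterns) ** 2).sum(axis=1)     # squared distances to memories
            w = np.exp(-(d2 - d2.min()) / (2 * sigma ** 2))
            w /= w.sum()                               # softmax attention over memories
            x += lr * (w @ patterns - x)               # -grad E: move toward soft recall
        return x

    cue = patterns[0] + 0.8 * rng.standard_normal(64)  # corrupted version of memory 0
    out = recall(cue)
    print("overlap with memory 0:", np.sign(out) @ np.sign(patterns[0]) / 64)

The corrupted cue relaxes onto the stored pattern, which is exactly the associative recall dynamics that the paper identifies inside trained diffusion models.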
Ambrogioni, L., & Ólafsdóttir, H. (2023). Rethinking the hippocampal cognitive map as a meta-learning computational module. Trends in Cognitive Sciences.
Abstract taken from Google Scholar:
A hallmark of biological intelligence is the ability to adaptively draw on past experience to guide behaviour under novel situations. Yet, the neurobiological principles that underlie this form of meta-learning remain relatively unexplored. In this Opinion, we review the existing literature on hippocampal spatial representations and reinforcement learning theory and describe a novel theoretical framework that aims to account for biological meta-learning. We conjecture that so-called hippocampal cognitive maps of familiar environments are part of a larger meta-representation (meta-map) that encodes information states and sources, which support exploration and provide a foundation for learning. We also introduce concrete hypotheses on how these generic states can be encoded using a principle of superposition.
Silvestri, G., Roos, D., & Ambrogioni, L. (2023). Deterministic training of generative autoencoders using invertible layers. International Conference on Learning Representations.
Abstract taken from Google Scholar:
In this work, we provide a deterministic alternative to the stochastic variational training of generative autoencoders. We refer to these new generative autoencoders as AutoEncoders within Flows (AEF), since the encoder and decoder are defined as affine layers of an overall invertible architecture. This results in a deterministic encoding of the data, as opposed to the stochastic encoding of VAEs. The paper introduces two related families of AEFs. The first family relies on a partition of the ambient space and is trained by exact maximum likelihood. The second family exploits a deterministic expansion of the ambient space and is trained by maximizing the log-probability in this extended space. The latter case leaves complete freedom in the choice of encoder, decoder and prior architectures, making it a drop-in replacement for the training of existing VAEs and VAE-style models. We show that these AEFs can have strikingly higher performance than architecturally identical VAEs in terms of log-likelihood and sample quality, especially for low-dimensional latent spaces. Importantly, we show that AEF samples are substantially sharper than VAE samples.
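The exact maximum-likelihood training referred to above rests on the change-of-variables formula for invertible layers. The following sketch (toy placeholder "networks", numpy only, not the AEF architecture itself) shows how a single affine coupling layer yields an exact per-sample log-likelihood that can be maximized directly:

    import numpy as np

    # One affine coupling layer z = f(x): keep x1, transform x2 affinely with
    # parameters computed from x1. Because f is invertible with triangular
    # Jacobian, the exact log-likelihood follows by change of variables:
    #   log p(x) = log N(f(x); 0, I) + log |det J_f(x)|, log|det J| = sum(s(x1)).

    def s_net(x1):                      # toy scale "network" (placeholder)
        return np.tanh(x1 @ W_s)

    def t_net(x1):                      # toy shift "network" (placeholder)
        return x1 @ W_t

    def exact_log_likelihood(x):
        x1, x2 = x[:, :2], x[:, 2:]
        s, t = s_net(x1), t_net(x1)
        z = np.concatenate([x1, x2 * np.exp(s) + t], axis=1)
        log_prior = -0.5 * (z ** 2).sum(axis=1) - 0.5 * z.shape[1] * np.log(2 * np.pi)
        log_det = s.sum(axis=1)         # triangular Jacobian determinant
        return log_prior + log_det

    rng = np.random.default_rng(0)
    W_s = rng.standard_normal((2, 2)) * 0.1
    W_t = rng.standard_normal((2, 2)) * 0.1
    x = rng.standard_normal((8, 4))
    print(exact_log_likelihood(x))      # per-sample exact log p(x); maximize to train

Training then reduces to ordinary gradient ascent on this quantity, with no variational lower bound and no stochastic encoder.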
Ambrogioni, L. (2023). The statistical thermodynamics of generative diffusion models. arXiv preprint arXiv:2310.17467.
Abstract taken from Google Scholar:
Generative diffusion models have achieved spectacular performance in many areas of generative modeling. While the fundamental ideas behind these models come from non-equilibrium physics, in this paper we show that many aspects of these models can be understood using the tools of equilibrium statistical mechanics. Using this reformulation, we show that generative diffusion models undergo second-order phase transitions corresponding to symmetry-breaking phenomena. We argue that this leads to a form of instability that lies at the heart of their generative capabilities and that can be described by a set of mean-field critical exponents. We conclude by analyzing recent work connecting diffusion models and associative memory networks in view of these thermodynamic formulations.
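For intuition, the phase transition can be reproduced in the simplest possible setting: a dataset of two points at +1 and -1, where the denoiser fixed points solve a Curie-Weiss-like self-consistency equation. The sketch below (an illustrative toy, not the paper's general derivation; beta is an arbitrary schedule constant) iterates that equation across the transition:

    import numpy as np

    # Two-point dataset {+1, -1} under a variance-preserving diffusion:
    # a(t) = signal scale, v(t) = 1 - a(t)^2 = noise variance. The posterior-mean
    # ("denoiser") fixed points solve the self-consistency equation
    #     m = a * tanh(a * m / v),
    # which has nonzero solutions only when a^2 / v > 1: a pitchfork
    # bifurcation, i.e. a mean-field second-order phase transition.
    beta = 8.0
    for t in np.linspace(0.02, 0.16, 8):
        a = np.exp(-0.5 * beta * t)
        v = 1.0 - a ** 2
        m = 1.0                                 # iterate the fixed-point map
        for _ in range(200):
            m = a * np.tanh(a * m / v)
        print(f"t={t:.2f}  a^2/v={a * a / v:5.2f}  order parameter m={m:.3f}")

The order parameter m vanishes continuously as a^2 / v crosses 1 (at t = ln(2)/beta in this toy), mirroring the symmetry-breaking transition analyzed in the paper.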
Ambrogioni, L. (2023). Stationarity without mean reversion: Improper Gaussian process regression and improper kernels. arXiv preprint arXiv:2310.02877.
Abstract taken from Google Scholar:
Gaussian process (GP) regression has gained substantial popularity in machine learning applications. The behavior of a GP regression depends on the choice of covariance function. Stationary covariance functions are favored in machine learning applications. However, (non-periodic) stationary covariance functions are always mean reverting and can therefore exhibit pathological behavior when applied to data that does not relax to a fixed global mean value. In this paper, we show that it is possible to use improper GP priors with infinite variance to define processes that are stationary but not mean reverting. To this aim, we introduce a large class of improper kernels that can only be defined in this improper regime. Specifically, we introduce the Smooth Walk kernel, which produces infinitely smooth samples, and a family of improper Matérn kernels, which can be defined to be k-times differentiable for any integer k. The resulting posterior distributions can be computed analytically, and they involve a simple correction of the usual formulas. By analyzing both synthetic and real data, we demonstrate that these improper kernels solve some known pathologies of mean-reverting GP regression while retaining most of the favourable properties of ordinary smooth stationary kernels.
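The Smooth Walk and improper Matérn kernels themselves, and the corrected posterior formulas, are given in the paper. As a numerical stand-in for the improper limit, one can add a constant c to an ordinary stationary kernel and let c grow: the prior variance diverges and the posterior no longer reverts to a fixed prior mean of zero, but to a level inferred from the data. A minimal sketch with an RBF base kernel, where all numbers are illustrative:

    import numpy as np

    def rbf(x, y, ell=0.5):
        return np.exp(-0.5 * (x[:, None] - y[None, :]) ** 2 / ell ** 2)

    def gp_posterior_mean(x_train, y_train, x_test, c, noise=1e-4):
        # k = c + RBF: stationary, with prior variance c + 1 -> infinity as c grows.
        K = c + rbf(x_train, x_train) + noise * np.eye(len(x_train))
        Ks = c + rbf(x_test, x_train)
        return Ks @ np.linalg.solve(K, y_train)

    x_tr = np.array([0.0, 0.2, 0.4])
    y_tr = np.array([3.0, 3.1, 2.9])          # data far from the prior mean of 0
    x_te = np.array([5.0])                    # far from all observations
    for c in (0.0, 1e6):
        m = gp_posterior_mean(x_tr, y_tr, x_te, c)[0]
        print(f"c={c:g}: posterior mean at x=5 -> {m:.3f}")

With c = 0 the prediction far from the data reverts to zero (the pathology described in the abstract); with very large c it stays at the data level of about 3, the qualitative behavior that the improper kernels provide analytically.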
Jonge, M., Wubben, N., Kaam, C., Frenzel, T., Hoedemaekers, C., Ambrogioni, L., Hoeven, J., Boogaard, M., & Zegers, M. (2022). Optimizing an existing prediction model for quality of life one‐year post‐intensive care unit: An exploratory analysis. Acta Anaesthesiologica Scandinavica, 66(10), 1228-1236.
Abstract taken from Google Scholar:
This study aimed to improve the PREPARE model, an existing linear regression prediction model for the long-term quality of life (QoL) of intensive care unit (ICU) survivors, by incorporating additional ICU data from patients' electronic health records (EHR) and bedside monitors. A total of 1308 adult ICU patients, aged ≥16, admitted between July 2016 and January 2019 were included. Several regression-based machine learning models were fitted on a combination of patient-reported data and expert-selected EHR variables and bedside monitor data to predict change in QoL 1 year after ICU admission. Predictive performance was compared to a five-feature linear regression prediction model using only 24-hour data (R2 = 0.54, mean square error (MSE) = 0.031, mean absolute error (MAE) = 0.128). Of the included ICU survivors, 67.9% were male and the median age was 65.0 [IQR: 57 …
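As a generic illustration of the evaluation protocol (not the study's code or data; the features, ridge penalty, and split are synthetic placeholders), the following sketch fits a ridge regression on tabular features and reports the three metrics used in the comparison:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((1308, 5))                 # e.g. 5 baseline features
    y = X @ np.array([0.4, -0.3, 0.2, 0.1, -0.2]) + 0.15 * rng.standard_normal(1308)
    n_train = 1000
    Xtr, ytr, Xte, yte = X[:n_train], y[:n_train], X[n_train:], y[n_train:]

    lam = 1.0                                          # ridge penalty
    w = np.linalg.solve(Xtr.T @ Xtr + lam * np.eye(5), Xtr.T @ ytr)
    pred = Xte @ w

    mse = np.mean((yte - pred) ** 2)                   # mean square error
    mae = np.mean(np.abs(yte - pred))                  # mean absolute error
    r2 = 1 - mse / np.var(yte)                         # explained variance
    print(f"R2={r2:.2f}  MSE={mse:.3f}  MAE={mae:.3f}")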
Silvestri, G., Fertig, E., Moore, D., & Ambrogioni, L. (2022). Embedded-model flows: Combining the inductive biases of model-free deep learning and explicit probabilistic modeling. International Conference on Learning Representations.
Abstract taken from Google Scholar:
Normalizing flows have shown great success as general-purpose density estimators. However, many real-world applications require the use of domain-specific knowledge, which normalizing flows cannot readily incorporate. We propose embedded-model flows (EMF), which alternate general-purpose transformations with structured layers that embed domain-specific inductive biases. These layers are automatically constructed by converting user-specified differentiable probabilistic models into equivalent bijective transformations. We also introduce gated structured layers, which allow bypassing the parts of the models that fail to capture the statistics of the data. We demonstrate that EMFs can be used to induce desirable properties such as multimodality, hierarchical coupling and continuity. Furthermore, we show that EMFs enable a high-performance form of variational inference where the structure of the prior model is embedded in the variational architecture. In our experiments, we show that this approach outperforms state-of-the-art methods in common structured inference problems.
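A minimal instance of the construction, assuming a two-level hierarchical Gaussian as the user-specified model (the paper's machinery handles general differentiable probabilistic programs; a and s2 below are illustrative parameters): the model is rewritten as an exactly invertible layer with a tractable log-determinant, so it can be stacked with general-purpose flow layers.

    import numpy as np

    # A hierarchical Gaussian model  z1 ~ N(0, 1),  z2 ~ N(a * z1, s2^2)
    # rewritten as a bijection T(u1, u2) = (u1, a * u1 + s2 * u2) from
    # standard normal noise to model samples.
    a, s2 = 0.8, 0.5

    def structured_layer(u):
        # Bijective layer embedding the hierarchical model; returns the
        # sample and log|det Jacobian| (triangular, so it is just log s2).
        z1 = u[:, 0]
        z2 = a * z1 + s2 * u[:, 1]
        log_det = np.full(len(u), np.log(s2))
        return np.stack([z1, z2], axis=1), log_det

    def inverse(z):
        u1 = z[:, 0]
        u2 = (z[:, 1] - a * z[:, 0]) / s2
        return np.stack([u1, u2], axis=1)

    rng = np.random.default_rng(0)
    u = rng.standard_normal((5, 2))
    z, ld = structured_layer(u)
    print(np.allclose(inverse(z), u), ld[0])   # exact invertibility + log-det

The layer induces the hierarchical coupling of the prior by construction, which is the kind of inductive bias that a generic coupling layer would have to learn from data.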
Hinne, M., Leeftink, D., Gerven, M., & Ambrogioni, L. (2022). Bayesian model averaging for nonparametric discontinuity design. PLoS ONE, 17(6), e0270310.
Abstract taken from Google Scholar:
Quasi-experimental research designs, such as regression discontinuity and interrupted time series, allow for causal inference in the absence of a randomized controlled trial, at the cost of additional assumptions. In this paper, we provide a framework for discontinuity-based designs using Bayesian model averaging and Gaussian process regression, which we refer to as ‘Bayesian nonparametric discontinuity design’, or BNDD for short. BNDD addresses the two major shortcomings in most implementations of such designs: overconfidence due to implicit conditioning on the alleged effect, and model misspecification due to reliance on overly simplistic regression models. With the appropriate Gaussian process covariance function, our approach can detect discontinuities of any order, and in spectral features. We demonstrate the usage of BNDD in simulations, and apply the framework to determine the effect of running for political positions on longevity, the effect of an alleged historical phantom border in the Netherlands on Dutch voting behaviour, and the effect of Kundalini Yoga meditation on heart rate.
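The model-averaging step can be sketched in a few lines: compute the GP marginal likelihood of a continuous model and of a discontinuous one in which covariance is cut across the candidate threshold, then normalize into posterior model probabilities. The kernel, noise level, threshold, and synthetic data below are illustrative choices, not the paper's setup:

    import numpy as np

    def rbf(x, y, ell=0.5):
        return np.exp(-0.5 * (x[:, None] - y[None, :]) ** 2 / ell ** 2)

    def log_marginal(K, y, noise=0.05):
        # Standard GP log marginal likelihood via the Cholesky factor.
        K = K + noise * np.eye(len(y))
        L = np.linalg.cholesky(K)
        alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
        return (-0.5 * y @ alpha - np.log(np.diag(L)).sum()
                - 0.5 * len(y) * np.log(2 * np.pi))

    rng = np.random.default_rng(0)
    x = np.sort(rng.uniform(-1, 1, 60))
    y = np.sin(2 * x) + 0.7 * (x > 0) + 0.1 * rng.standard_normal(60)  # jump at 0

    K_cont = rbf(x, x)                                     # continuous model
    side = x > 0
    K_disc = rbf(x, x) * (side[:, None] == side[None, :])  # no coupling across x=0

    lm = np.array([log_marginal(K_cont, y), log_marginal(K_disc, y)])
    p = np.exp(lm - lm.max()); p /= p.sum()
    print(f"P(continuous)={p[0]:.3f}  P(discontinuity)={p[1]:.3f}")

Because both hypotheses are scored by their marginal likelihoods rather than by conditioning on an assumed effect, the comparison does not suffer from the overconfidence the abstract describes.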
Berezutskaya, J., Ambrogioni, L., Ramsey, N., & Gerven, M. (2022). Towards naturalistic speech decoding from intracranial brain data. IEEE.
Abstract taken from Google Scholar:
Speech decoding from brain activity can enable the development of brain-computer interfaces (BCIs) to restore naturalistic communication in paralyzed patients. Previous work has focused on the development of decoding models from isolated speech data with a clean background and multiple repetitions of the material. In this study, we describe a novel approach to speech decoding that relies on a generative adversarial network (GAN) to reconstruct speech from brain data recorded during a naturalistic speech listening task (watching a movie). We compared the GAN-based approach, where reconstruction was done from the compressed latent representation of sound decoded from the brain, with several baseline models that reconstructed the sound spectrogram directly. We show that the novel approach provides more accurate reconstructions compared to the baselines. These results underscore the potential of GAN …