Optimal Transport and Machine Learning

NeurIPS 2021 Workshop. 13th of December 2021


The workshop will alternate between invited speakers surveying the state of the art and contributed talks presenting recent advances. The presentations will highlight the interplay between recent theoretical advances, innovative and efficient numerical solvers, and successful applications in ML.

Time (CET) | Time (EST) | What? Who?
14:00–14:45 | 08:00–08:45 | Plenary Speaker: Caroline Uhler
Optimal Transport in the Biomedical Sciences: Challenges and Opportunities
14:45–15:00 | 08:45–09:00 | Oral: Danilo J. Rezende, Sebastien Racaniere
Implicit Riemannian Concave Potential Maps
We are interested in the challenging problem of modelling densities on Riemannian manifolds with a known symmetry group using normalising flows. This has many potential applications in the physical sciences, such as molecular dynamics and quantum simulations. In this work we combine ideas from implicit neural layers and optimal transport theory to propose Implicit Riemannian Concave Potential Maps (IRCPMs), a generalisation of existing work on exponential-map flows. IRCPMs have some nice properties, such as the ease of incorporating knowledge about symmetries, and are less expensive than ODE flows. We provide an initial theoretical analysis of their properties and lay out sufficient conditions for stable optimisation. Finally, we illustrate the properties of IRCPMs with density-learning experiments on tori and spheres.
15:00–16:10 | 09:00–10:10 | Plenary Speaker: Alessio Figalli
Regularity Theory of Optimal Transport Maps
In optimal transport, understanding the regularity of optimal maps is an important topic. This lecture aims to present the regularity theory for optimal maps, explain the connection to Monge-Ampère type equations, and overview the most fundamental results available.
16:10–16:35 | 10:10–10:35 | Keynote Speaker: Beatrice Acciaio
Generative Adversarial Learning with Adapted Distances
Generative Adversarial Networks (GANs) have proven to be a powerful framework for learning to draw samples from complex distributions. In this talk I will discuss the challenge of learning sequential data via GANs. This notably requires the choice of a loss function that reflects the discrepancy between (measures on) paths. To take on this task, we employ adapted versions of optimal transport distances, which result from imposing a temporal causality constraint on classical transport problems. This constraint provides a natural framework to parameterize the cost function learned by the discriminator as a robust (worst-case) distance. We then employ a modification of the empirical measure to ensure consistency of the estimators. Following Genevay et al. (2018), we also include an entropic penalization term which allows for the use of the Sinkhorn algorithm when computing the optimal transport cost.
16:35–17:15 | 10:35–11:15 | Spotlights
17:00–17:45 | 11:00–11:45 | Poster Session
17:45–18:30 | 11:45–12:30 | Plenary Speaker: Lenaic Chizat
Entropic Regularization of Optimal Transport as a Statistical Regularization
The squared 2-Wasserstein distance is a natural loss to compare probability distributions in generative models or density-fitting tasks thanks to its "informative" gradient, but it suffers from poor sample and computational complexity compared to alternative losses such as kernel MMD. Adding an entropic regularization and debiasing the resulting quantity (yielding the Sinkhorn divergence) mitigates these downsides but also degrades the discriminative power of the loss and the quality of its gradients. In order to understand the trade-offs at play, we propose to study entropic regularization as one typically studies regularization in machine learning: by discussing the optimization, estimation and approximation errors, and their trade-offs, covering in passing a variety of recent works in the field. The analysis, complemented with numerical experiments, suggests that entropic regularization actually improves the quality and efficiency of the estimation of the squared 2-Wasserstein distance, compared to the plug-in (i.e., unregularized) estimator.
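As a concrete reference point for the quantities compared in this abstract, here is a minimal sketch (not code from the talk; the function names, uniform weights, and 1D setup are illustrative assumptions) of the plug-in entropic cost computed with Sinkhorn's algorithm and its debiased Sinkhorn-divergence variant:

```python
import numpy as np

def sinkhorn_cost(x, y, eps=0.1, iters=500):
    """Plug-in entropic OT cost <P, C> between the uniform empirical
    measures on the 1D samples x and y, via Sinkhorn's algorithm."""
    C = (x[:, None] - y[None, :]) ** 2        # squared-distance cost matrix
    n, m = C.shape
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    K = np.exp(-C / eps)                      # Gibbs kernel
    u, v = np.ones(n), np.ones(m)
    for _ in range(iters):                    # alternating marginal scalings
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]           # entropic transport plan
    return float(np.sum(P * C))

def sinkhorn_divergence(x, y, eps=0.1):
    """Debiased cost: OT_eps(x, y) - (OT_eps(x, x) + OT_eps(y, y)) / 2."""
    return (sinkhorn_cost(x, y, eps)
            - 0.5 * sinkhorn_cost(x, x, eps)
            - 0.5 * sinkhorn_cost(y, y, eps))
```

By construction the divergence of a sample with itself vanishes, which is exactly the debiasing step mentioned above; the raw entropic cost alone does not have this property.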
18:30–18:55 | 12:30–12:55 | Keynote Speaker: Chin-Wei Huang
Optimal Transport and Probability Flows
In this talk, I will present some recent work at the intersection of optimal transport (OT) and probability flows. Optimal transport is an elegant theory that has diverse downstream applications. For likelihood estimation in particular, there has been a recent interest in using parametric invertible models (aka normalizing flows) to approximate the data distribution of interest. I will present my recent work on parameterizing flows using a neural convex potential, which is inspired by Brenier's theorem. In addition, I will cover a few other recently proposed probability flow models related to OT.
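The Brenier perspective mentioned above can be made concrete in one dimension with an invented toy example (not code from the talk): for the squared cost, the optimal map is the gradient of a convex potential, and a simple convex quadratic potential already transports one Gaussian onto another.

```python
import numpy as np

# Brenier's theorem: for the squared cost, the optimal transport map is
# the gradient of a convex potential. In 1D, phi(x) = m*x + s*x**2/2 is
# convex for s > 0, and its derivative T(x) = m + s*x pushes the
# standard normal N(0, 1) forward to N(m, s**2).
def brenier_map(x, m=1.0, s=2.0):
    return m + s * x                      # T = phi'

rng = np.random.default_rng(0)
z = rng.standard_normal(100_000)          # samples from N(0, 1)
t = brenier_map(z)                        # approximately N(1, 4) samples
```

Parameterizing a flow by a neural convex potential generalizes this picture: the learned map inherits invertibility from the strict convexity of the potential.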
18:55–19:20 | 12:55–13:20 | Keynote Speaker: Yongxin Chen
Graphical Optimal Transport and its Applications
Multi-marginal optimal transport (MOT) is a generalization of optimal transport theory to settings with possibly more than two marginals. The computation of solutions to MOT problems has been a longstanding challenge. In this talk, we introduce graphical optimal transport, a special class of MOT problems. We consider MOT problems from a probabilistic graphical model perspective and point out an elegant connection between the two when the underlying cost for optimal transport admits a graph structure. In particular, an entropy-regularized MOT is equivalent to a Bayesian marginal inference problem for probabilistic graphical models with the additional requirement that some of the marginal distributions are specified. On the one hand, this relation extends both optimal transport and probabilistic graphical model theory; on the other hand, it leads to fast algorithms for MOT by leveraging the well-developed algorithms of Bayesian inference. We will cover recent developments in graphical optimal transport, in both theory and algorithms, and go over several applications in aggregate filtering and mean-field games.
19:20–20:00 | 13:20–14:00 | Poster Session
20:00–20:25 | 14:00–14:25 | Keynote Speaker: Pinar Demetci
Enabling integrated analysis of single-cell multi-omic datasets with optimal transport
In this work, I will present an application of optimal transport to integrate multi-modal biological datasets. Cells in multicellular organisms specialize to carry out different functions despite having the same genetic material. This is thanks to cell-type-specific gene regulation; misregulation of genes can result in disease. With today’s sequencing technologies, we can take measurements at the single-cell resolution and probe different aspects of the genome that influence gene regulation, such as chemical modifications on the DNA, its 3D structure, etc. Jointly studying these measurements will give a holistic view of the regulatory mechanisms. However, with a few exceptions, applying multiple technologies to the same single cell is not possible. Computational integration of separately taken multi-modal genomic ("multi-omic") measurements is therefore crucial to enable joint analyses. This task requires an unsupervised approach due to the lack of correspondences known a priori. We present an algorithm, Single-Cell alignment with Optimal Transport (SCOT), that relies on Gromov-Wasserstein optimal transport to align single-cell multi-omic datasets. We show that SCOT yields alignments competitive with the state of the art and, unlike previous methods, can approximately self-tune its hyperparameters by tracking the Gromov-Wasserstein distance between the aligned datasets. With its unbalanced multi-modal extension, it can integrate more than two datasets and yields quality alignments under different scenarios of disproportionate cell-type representation across measurements.
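The Gromov-Wasserstein alignment at the core of this line of work can be sketched as follows. This is not SCOT itself (its self-tuning and unbalanced extension are omitted, and all names here are illustrative); it is a minimal version of the iterated-Sinkhorn linearization scheme of Peyré et al. (2016) for the entropic square-loss GW problem:

```python
import numpy as np

def logsumexp(z, axis):
    m = z.max(axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.exp(z - m).sum(axis=axis))

def sinkhorn_plan(C, a, b, eps, iters=200):
    """Entropic OT plan for cost C via stable log-domain Sinkhorn updates."""
    f, g = np.zeros_like(a), np.zeros_like(b)
    for _ in range(iters):
        f = eps * (np.log(a) - logsumexp((g[None, :] - C) / eps, axis=1))
        g = eps * (np.log(b) - logsumexp((f[:, None] - C) / eps, axis=0))
    return np.exp((f[:, None] + g[None, :] - C) / eps)

def entropic_gw(Dx, Dy, eps=1.0, outer=30):
    """Entropic Gromov-Wasserstein alignment of two metric spaces:
    repeatedly linearize the square-loss GW objective around the current
    plan and re-solve an entropic OT problem. The terms dropped from the
    linearization depend only on the marginals, so they do not change
    the optimal plan."""
    n, m = len(Dx), len(Dy)
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    P = np.outer(a, b)                    # independent coupling as a start
    for _ in range(outer):
        C = -2.0 * Dx @ P @ Dy            # linearization of the GW cost
        P = sinkhorn_plan(C, a, b, eps)
    return P
```

Only the marginal constraints of the returned plan are guaranteed; on toy inputs whose pairwise-distance profiles are distinct, the plan typically concentrates near the ground-truth correspondence, which is the behavior SCOT exploits for unsupervised alignment.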
20:25–20:40 | 14:25–14:40 | Oral: Aram-Alexandre Pooladian
Entropic Estimation of Optimal Transport Maps
We develop a computationally tractable method for estimating the optimal map between two distributions, with rigorous finite-sample guarantees. Leveraging an entropic version of Brenier's theorem, we show that our estimator, the barycentric projection of the optimal entropic plan, is easy to compute using Sinkhorn's algorithm. As a result, unlike current approaches for map estimation, which are slow to evaluate when the dimension or number of samples is large, our approach is parallelizable and extremely efficient even for massive data sets. Under smoothness assumptions on the optimal map, we show that our estimator enjoys statistical performance comparable to other estimators in the literature, but at much lower computational cost. We showcase the efficacy of our proposed estimator through numerical examples. Our proofs are based on a modified duality principle for entropic optimal transport and on a method for approximating optimal entropic plans due to Pal (2019).
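A small sketch of the estimator described here, under illustrative assumptions (1D data, uniform weights, a plain dense Sinkhorn loop; not the authors' code): compute the entropic plan with Sinkhorn, then take the barycentric projection T(x_i) = sum_j P_ij y_j / sum_j P_ij.

```python
import numpy as np

def entropic_map(x, y, eps=0.05, iters=2000):
    """Entropic estimate of the OT map from samples x to samples y as
    the barycentric projection of the entropic plan computed by Sinkhorn:
    T(x_i) = sum_j P_ij y_j / sum_j P_ij."""
    C = (x[:, None] - y[None, :]) ** 2        # squared-distance cost
    n, m = len(x), len(y)
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    K = np.exp(-C / eps)
    u, v = np.ones(n), np.ones(m)
    for _ in range(iters):                    # Sinkhorn scalings
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]           # entropic plan
    return (P @ y) / P.sum(axis=1)            # barycentric projection

x = np.linspace(0.0, 1.0, 50)
y = x + 0.5                       # target: the source shifted by 0.5
T = entropic_map(x, y)            # close to x + 0.5 away from the edges
```

Each map value is a weighted average over the target samples, so the whole estimator is a handful of matrix-vector products; this is what makes the approach parallelizable and cheap at scale.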
20:40–20:55 | 14:40–14:55 | Oral: Zaid Harchaoui, Lang Liu
Discrete Schrödinger Bridges with Applications to Two-Sample Homogeneity Testing
We introduce an entropy-regularized statistic that defines a divergence between probability distributions. The statistic is the transport cost of a coupling which admits an expression as a weighted average of Monge couplings with respect to a Gibbs measure. This coupling is related to the static Schrödinger bridge with a finite number of particles. We establish the asymptotic consistency of the statistic as the sample size goes to infinity and show that the population limit is the solution of Föllmer's entropy-regularized optimal transport problem. The proof technique relies on a chaos decomposition for paired samples. We illustrate the approach on the two-sample homogeneity testing problem.
20:55–21:20 | 14:55–15:20 | Keynote Speaker: Yunan Yang
Benefits of using Optimal Transport in Computational Learning and Inversion
Understanding generalization capacity has been a central topic in mathematical machine learning. In this talk, I will present a generalized weighted least-squares optimization method for computational learning and inversion with noisy data. In particular, using the Wasserstein metric as the objective function and implementing the Wasserstein gradient flow (or Wasserstein natural gradient descent method) both fall within this framework. The weighting scheme encodes both a priori knowledge about the object to be learned and a strategy for weighting the contribution of different data points in the loss function. We will see that appropriate weighting based on prior knowledge can greatly improve the generalization capability of the learned model.
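The weighted least-squares idea can be illustrated with a toy linear model (an invented example with hand-picked weights, not the talk's method): rescaling each residual by the square root of its weight reduces the problem to ordinary least squares, and down-weighting an observation we distrust a priori suppresses its influence on the fit.

```python
import numpy as np

def weighted_lsq(A, b, w):
    """Minimize sum_i w_i * (A x - b)_i ** 2 by rescaling each row of the
    system by sqrt(w_i) and solving the resulting ordinary LS problem."""
    sw = np.sqrt(w)
    return np.linalg.lstsq(A * sw[:, None], b * sw, rcond=None)[0]

# Toy line fit: one corrupted observation, down-weighted from prior knowledge.
t = np.arange(5.0)
A = np.column_stack([np.ones(5), t])        # intercept + slope design
b = 2.0 + 3.0 * t                           # clean data from y = 2 + 3t
b_noisy = b.copy()
b_noisy[4] += 10.0                          # corrupt the last observation
w = np.array([1.0, 1.0, 1.0, 1.0, 1e-6])    # distrust the last data point
coef = weighted_lsq(A, b_noisy, w)          # close to [2, 3]
```

With uniform weights the estimator reduces to ordinary least squares, so the weighting scheme is strictly a generalization.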
21:20–21:25 | 15:20–15:25 | Closing Remarks
21:25–22:00 | 15:25–16:00 | Poster Session