Reflected Diffusion Models Diffusion models are trained to reverse a stochastic process through score matching. However, many diffusion models rely on a small but critical implementation detail called thresholding. Thresholding projects the sampling process back onto the data support after each discretized diffusion step, stabilizing generation at the cost of breaking the theoretical framework. Interestingly, as the number of steps goes to infinity, thresholding converges to a reflected stochastic differential equation. In this blog post, we discuss our recent work on Reflected Diffusion Models, which explores this connection to develop a new class of diffusion models that trains correctly for thresholded sampling and respects general boundary constraints.
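To make the thresholding detail concrete, here is a minimal sketch of one discretized sampling step followed by a projection onto the data support; the drift, noise scale, and `score_fn` are placeholders for illustration, not the paper's exact sampler.

```python
import torch

def thresholded_step(x, score_fn, t, dt, low=-1.0, high=1.0):
    # Toy Euler-Maruyama-style reverse step (illustrative drift and scale).
    drift = -0.5 * x - score_fn(x, t)
    x = x + drift * dt + (dt ** 0.5) * torch.randn_like(x)
    # Thresholding: project the iterate back onto the data support [low, high].
    return x.clamp(low, high)
```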
Hyena Hierarchy: Towards Larger Convolutional Language Models Attention is amazing, but it does have some drawbacks. Attention is fundamentally a quadratic operation, as it compares each pair of points in a sequence. This quadratic runtime has limited the amount of context that our models can process. We're excited to share our latest work on Hyena, a subquadratic-time layer that has the potential to significantly increase context length in sequence models.
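For reference, the pairwise comparison that makes attention quadratic looks like this (a bare-bones sketch, ignoring heads and masking):

```python
import torch

def attention_scores(q, k):
    # q, k: (seq_len, d). The score matrix is (seq_len, seq_len), so time and
    # memory grow quadratically with sequence length -- the cost a subquadratic
    # layer like Hyena is designed to avoid.
    return torch.softmax(q @ k.T / q.shape[-1] ** 0.5, dim=-1)
```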
Learning to Imitate An overview of our NeurIPS 2021 Spotlight paper, "IQ-Learn: Inverse Q-Learning for Imitation". Learning by imitating experts is a powerful paradigm for building better AI systems for decision-making. Nevertheless, current methods are highly data-inefficient or rely on unstable adversarial training. We present a novel framework for simple, stable, and data-efficient learning from a small number of experts that scales well to complex environments. It obtains state-of-the-art results, often outperforming existing methods by more than 3x.
A Mechanism Design Alternative to Individual Calibration Here is an overview of our AISTATS 2021 oral paper, "Right Decisions from Wrong Predictions: A Mechanism Design Alternative to Individual Calibration". We consider how a prediction service can implement an “insurance” against misprediction, such that its customers can use each prediction as if it were perfectly correct. The insurance can be implemented at provably no cost to the prediction service (in the long run).
Probabilistic Circuits for Variational Inference in Discrete Graphical Models Here is an overview of our NeurIPS 2020 paper, "Probabilistic Circuits for Variational Inference in Discrete Graphical Models". We consider the problem of variational inference in high-dimensional discrete settings, which is challenging for stochastic/sampling-based optimization methods. Instead, we would like to optimize using analytic (exact) gradients, but this has typically been limited to simple mean-field variational families. In this work, we extend this direction of analytic optimization by proposing an expressive variational family based on tools from probabilistic circuits (i.e., arithmetic circuits, sum-product networks). By combining expressivity with the tractability of computing exact gradients, our method is well suited to inference in high-dimensional discrete settings.
Bias and Generalization in Deep Generative Models Highlights from our NeurIPS 2018 spotlight paper, "Bias and Generalization in Deep Generative Models." In high-dimensional settings, density estimation algorithms rely crucially on their inductive bias. Despite recent empirical success, the inductive bias of deep generative models is not well understood. In this paper, we propose a framework to systematically investigate bias and generalization in deep generative models of images. Inspired by experimental methods from cognitive psychology, we probe each learning algorithm with carefully designed training datasets to characterize when and how existing models generate novel attributes and their combinations. We identify similarities to human psychology and verify that these patterns are consistent across commonly used models and architectures.
Sliced Score Matching: A Scalable Approach to Density and Score Estimation An overview of our UAI 2019 paper on Sliced Score Matching. We show how to use random projections to scale up score matching—a classic method to learn unnormalized probabilistic models—to high-dimensional data. Theoretically, sliced score matching produces a consistent and asymptotically normal estimator under some regularity conditions. We apply sliced score matching to training deep energy-based models, learning VAEs with implicit encoders, and training Wasserstein Auto-Encoders (WAEs).
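As a rough sketch of the idea (with placeholder names, and without the paper's variance-reduction variant), the sliced objective replaces the full Jacobian trace of standard score matching with Jacobian-vector products along random directions:

```python
import torch

def sliced_score_matching_loss(score_fn, x, n_projections=1):
    # x: (batch, d); score_fn returns the model score with the same shape.
    x = x.clone().requires_grad_(True)
    loss = 0.0
    for _ in range(n_projections):
        v = torch.randn_like(x)                                       # random projection
        s = score_fn(x)
        sv = (s * v).sum()
        grad_sv = torch.autograd.grad(sv, x, create_graph=True)[0]    # (ds/dx)^T v
        loss = loss + (grad_sv * v).sum(dim=-1) + 0.5 * (s * v).sum(dim=-1) ** 2
    return (loss / n_projections).mean()
```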
Controllable Fairness in Machine Learning An overview of our AISTATS 2019 paper, "Learning Controllable Fair Representations". We consider an urgent question: how do we control the fairness of machine learning systems? In response, we introduce a theoretically grounded method for learning controllable fair representations. Using our method, a party who is concerned with fairness (like a data collector, community organizer, or regulatory body) can convert data to representations with controllable limits on unfairness, then release only the representations. This makes it much more difficult for any downstream machine learning models to discriminate.
Uncertainty Autoencoders: Learning Compressed Representations via Variational Information Maximization An overview of our AISTATS 2019 paper introducing Uncertainty Autoencoders (UAE). Compressed sensing techniques enable efficient acquisition and recovery of sparse, high-dimensional data signals via low-dimensional projections. In a UAE, we treat the low-dimensional projections as noisy latent representations of an autoencoder and directly learn both the acquisition (i.e., encoding) and amortized recovery (i.e., decoding) procedures via a tractable variational information maximization objective. UAEs provide a unified treatment of several lines of related work in dimensionality reduction, compressed sensing, and generative modeling.
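A minimal sketch of this setup, with illustrative module names and a Gaussian noise model (under which the variational information-maximization bound reduces, up to constants, to a reconstruction term):

```python
import torch
import torch.nn as nn

class UAESketch(nn.Module):
    # Illustrative sizes and decoder; the real model choices will differ.
    def __init__(self, data_dim, num_measurements, noise_std=0.1):
        super().__init__()
        self.measure = nn.Linear(data_dim, num_measurements, bias=False)   # learned acquisition
        self.decode = nn.Sequential(nn.Linear(num_measurements, 256), nn.ReLU(),
                                    nn.Linear(256, data_dim))              # amortized recovery
        self.noise_std = noise_std

    def loss(self, x):
        y = self.measure(x)
        y = y + self.noise_std * torch.randn_like(y)    # noisy low-dimensional projection
        x_hat = self.decode(y)
        # Gaussian observation model: the info-max bound becomes a reconstruction error.
        return ((x - x_hat) ** 2).mean()
```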
Tile2Vec: Unsupervised representation learning for spatially distributed data Highlights from our AAAI 2019 paper, "Tile2Vec: Unsupervised representation learning for spatially distributed data". We extend the distributional hypothesis from NLP to spatial data and use it to learn representations of satellite imagery that can be used for different kinds of downstream tasks. We show that the features learned by Tile2Vec, without any labeled data, achieve performance comparable to fully supervised CNNs trained on 50k+ labeled examples on a difficult land cover classification task.
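One simple way to act on that hypothesis (a sketch in the spirit of the method, not its exact loss) is to pull embeddings of spatially nearby tiles together and push far-away tiles apart with a triplet-style margin loss:

```python
import torch
import torch.nn.functional as F

def tile_triplet_loss(z_anchor, z_neighbor, z_distant, margin=1.0):
    # Geographically close tiles (anchor/neighbor) should embed nearby;
    # tiles sampled far away (distant) should be at least `margin` farther.
    d_pos = (z_anchor - z_neighbor).norm(dim=-1)
    d_neg = (z_anchor - z_distant).norm(dim=-1)
    return F.relu(d_pos - d_neg + margin).mean()
```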
Accelerating Natural Gradient with Higher-Order Invariance An overview of our ICML 2018 paper, Accelerating Natural Gradient with Higher-Order Invariance. The natural gradient update loses its invariance due to the finite step size. In this paper, we study the invariance of natural gradient from the perspective of Riemannian geometry and propose several new update rules to improve its invariance. Empirical results show that better invariance can result in faster convergence in several supervised learning, unsupervised learning, and reinforcement learning applications.
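For reference, the standard natural gradient update with Fisher information matrix F and step size h is

```latex
\theta_{t+1} \;=\; \theta_t \;-\; h\, F(\theta_t)^{-1} \nabla_\theta \mathcal{L}(\theta_t),
```

and its invariance to reparameterization holds exactly only in the limit of a vanishing step size, which is why corrections for finite h matter.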
A-NICE-MC: Adversarial Training for MCMC A brief introduction to our NIPS paper, A-NICE-MC: Adversarial Training for MCMC. In this paper, we introduce a method that allows us to train domain-specific MCMC proposals that are more efficient than hand-crafted ones.
Variational Rejection Sampling An overview of our upcoming AISTATS 2018 paper, Variational Rejection Sampling. Variational learning of intractable posteriors is computationally efficient, but the resulting approximations can be poor. We propose a general approach to increase the flexibility of any variational family by augmenting a parameterized variational posterior with a differentiable and tunable accept-reject step. The rejection rate of the resulting resampled posterior can be adaptively controlled to monotonically trade off computation for statistical accuracy, with the resampled posterior matching the true posterior in the limit.
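A rough sketch of such a resampling step, using an illustrative smooth acceptance rule (not necessarily the paper's exact one): proposals whose importance weight p(x, z) / q(z | x) falls well below a tunable threshold tend to be rejected, so tightening the threshold trades extra computation for a resampled posterior closer to the true one.

```python
import math
import random

def resampled_draw(sample_q, log_p_joint, log_q, log_threshold, max_tries=1000):
    z = None
    for _ in range(max_tries):
        z = sample_q()                              # propose from the variational posterior
        log_w = log_p_joint(z) - log_q(z)           # log importance weight
        # Smooth, threshold-dependent acceptance probability (capped to avoid overflow).
        accept_prob = 1.0 / (1.0 + math.exp(min(log_threshold - log_w, 700.0)))
        if random.random() < accept_prob:
            return z
    return z  # fall back to the last proposal if nothing was accepted
```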
Approximate Inference via Weighted Rademacher Complexity Highlights from our AAAI 2018 paper, "Approximate Inference via Weighted Rademacher Complexity." In this work we consider the challenging problem of computing the sum of more numbers than can be explicitly enumerated. This sum arises in various contexts, such as the partition function of a graphical model, the permanent of a matrix, or the number of satisfying assignments of a propositional formula. By establishing a novel connection with Rademacher complexity, we show how this sum can be estimated and bounded by solving an optimization problem: finding the largest number in the sum after random perturbations have been applied.
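A classical identity in the same spirit (the Gumbel-max trick, stated here only for intuition rather than as the paper's Rademacher-based bound) expresses a log-sum as the expected maximum of randomly perturbed terms:

```latex
\log \sum_{i} w_i \;=\; \mathbb{E}\Big[\max_i \big(\log w_i + \gamma_i\big)\Big],
\qquad \gamma_i \ \text{i.i.d. zero-mean Gumbel noise.}
```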
Flow-GAN: Combining Maximum Likelihood and Adversarial Learning in Generative Models An overview of our upcoming AAAI 2018 paper, Flow-GAN: Combining Maximum Likelihood and Adversarial Learning in Generative Models. We propose a new approach to evaluate, compare, and interpolate between objectives based on maximum likelihood estimation and adversarial training for learning generative models. We find that even though adversarial training generates visually appealing samples, it obtains log-likelihoods that are orders of magnitude worse than maximum likelihood -- even a trivial Gaussian mixture model baseline that memorizes the data can obtain better likelihoods (and beautiful samples)! But why?