May 29–30, 2026 · Columbia University, New York

Day 1 · May 29 · SSW Bldg · Room C03

8:30 Breakfast
9:00–9:10 Opening remarks, Tian Zheng (Columbia)
9:10–9:50 Extended Talk A
Jennifer Hill (Professor of Applied Statistics, NYU)
Multiple comparisons: Why even Bayesians may need to worry
Researchers asking causal questions are often interested not only in the average treatment effect but also in subgroup-specific treatment effects that allow for a more nuanced understanding of who benefits from an intervention. However, this pursuit can lead to issues with unwarranted "researcher degrees of freedom" and failure to properly adjust for multiple comparisons. While previous research has demonstrated that Bayesian methods with regularizing prior distributions are more conservative than their frequentist counterparts and can lead to more appropriate assessments of uncertainty than either ignoring the issue or using corrections (Bonferroni, FDR), the extent to which Bayesian methods eliminate the problem of multiple comparisons still depends critically on the context and specifics of the method. Critically, we demonstrate a setting common in social sciences where standard Bayesian regularizing priors are not sufficient to control false positive claims and additionally can lead to sign errors. We characterize this setting as dominated by “shrinkage to the wrong place” and discuss potential remedies.
Chair: Ben Goodrich (Columbia)
10:00–11:00 Session 1: The Talented Mr. P
Stephen Ansolabehere (Frank G. Thompson Professor of Government, Harvard)
TBD
Yajuan Si (Research Associate Professor, Michigan)
MRP is turning 30: Are we entering the golden age, or just getting started?
Developed by Gelman and Little (1997), Multilevel Regression and Poststratification (MRP) was designed to obtain desirable subgroup estimates using complex sample survey data. In many ways, MRP perfectly embodies Andrew’s favorite maxims: "fit multilevel models as defaults," "statisticians cannot avoid adjustment," and notably, "survey weighting is a mess." Over the past 30 years, MRP has proven successful in practice across disciplines. Yet MRP is frequently critiqued for lacking theoretical guarantees like design consistency or double robustness. As a statistical model that accounts for data collection, MRP is subject to model misspecification and cannot fix low data quality or bad study design. In this talk, I will trace the evolution of MRP, discussing recent methodological improvements and current efforts to extend the framework into the AI era, ensuring it continues to yield reproducible and reliable findings in an increasingly complex data landscape.
Qixuan Chen (Associate Professor of Biostatistics, Columbia University)
Predictive Inference for Non-Probability Samples Using Bayesian Machine Learning
Probability surveys are the gold standard for population inference but are increasingly costly and subject to declining response rates. Non-probability samples, though more accessible, raise concerns about generalizability. In this talk, I present Bayesian predictive inference methods that integrate non-probability samples with administrative data or electronic health records in data-rich settings with high-dimensional auxiliary information.
We first consider estimation of population means using non-probability surveys and then extend the framework to generalizability and transportability of causal effects from randomized trials. Our methods model high-dimensional covariates via Bayesian Additive Regression Trees and incorporate the propensity score for sample inclusion using natural cubic splines, along with a balancing transformation to better align propensity score distributions between trial and target populations. Simulation studies show improved performance over existing methods, with smaller root mean squared error and coverage closer to nominal levels. We illustrate the approaches with real-world applications.
Chair: Shigeo Hirano (Columbia)
11:20–12:20 Session 2: 0.234 — Theory in Computing
Charles Margossian (Assistant Professor, University of British Columbia)
TBD
Collin Cademartori (Assistant Professor, Wake Forest)
Can Probabilistic Programming Make Workflows Work?
Probabilistic programming has represented a major step in the separation of concerns between Bayesian modeling and inference, with corresponding gains in the efficiency and flexibility of model building. As model building has become easier, a growing literature has emerged to tackle the central question of workflow: When so many models can be easily built, how do we decide what to build, how to build it, and how to evaluate the end product? This literature has offered up numerous tools to handle pieces of this question, including tools for validation of computation (SBC), specification of priors (elicitation), evaluation of model predictions (LOO), and assessment of model fit (PPCs), among other tasks. While such tools are often designed to fit into a coherent workflow pipeline, substantial hurdles remain for integrating them in practice. Different tools leverage different pieces of the underlying model (e.g. log likelihoods for LOO, hyperparameters for elicitation). Currently, furnishing the right metadata and gluing the pieces together often requires manual work from the user and repetition of intent across different blocks of code. Existing probabilistic programming languages can offer little in the way of automation for these tasks. Languages like Stan do not guarantee any concrete model structure until runtime, while more restricted languages define some model properties statically, but unnecessarily limit the class of models which can be implemented. Both of these are poor fits for integrating a variety of workflow tools, each with their own requirements of the underlying model.
This talk aspirationally considers possible future developments in probabilistic programming to address the integration of disparate tooling into customizable workflow pipelines, with the aim of defining a flexible software platform which current and future methodologies can plug into.
Matt Hoffman
Running Markov Chain Monte Carlo on Modern Hardware and Software
Today, cheap numerical hardware offers huge amounts of parallel computing power, much of which is used for the task of fitting and applying neural networks. Adoption of this hardware to accelerate statistical Markov chain Monte Carlo (MCMC) applications has been slower. We suggest some patterns for speeding up MCMC workloads using the hardware (e.g., GPUs, TPUs) and software (e.g., PyTorch, JAX) that have driven progress in deep learning over the last fifteen years or so. We offer some intuitions for why these new systems are so well suited to MCMC, and show some examples where we use them to achieve dramatic speedups over a CPU-based workflow. Finally, we discuss some potential pitfalls to watch out for.
Chair: Ruobin Gong (Rutgers)
12:20–2:00 Lunch
2:00–3:00 Session 3: The Science of Defaults
Susan Gelman (Heinz Werner Distinguished University Professor of Psychology and Linguistics, University of Michigan)
Andrew Gelman: The Early Years (and Beyond)
This talk will share personal reflections on Andrew's early years and his important influence on the field of psychology, from my perspective as a psychologist who studies cognitive development, and as Andrew's sister.
Upmanu Lall (Arizona State/Columbia)
TBD
Tian Zheng (Professor of Statistics, Columbia University)
Statistical Thinking and AI Education
As AI becomes more common in education and practice, statistical thinking remains essential. In my talk, I will discuss the importance of core ideas such as uncertainty, model validation, and data interpretation in AI education across disciplines. Integrating these concepts helps students move beyond using tools to understanding how and why models work. This approach supports more reliable, transparent, and responsible use of AI, and highlights the role of statisticians in shaping effective AI education.
Chair: Rahul Dodhia (Microsoft Research)
3:10–4:10 Session 4: Regression and Other Stories
Jonathan Auerbach (Assistant Professor, George Mason)
How temperature regimes near the equinox synchronize spring biological events?
Many biological processes, including plant leafout and flowering, occur once cumulative temperatures reach a threshold (the thermal-sum model). In this way, temperatures are thought to coordinate the timing of biological events. But growing evidence suggests that as climates warm, both the advancement of spring has slowed (declining sensitivity) and the variance in the timing of spring events has increased (declining synchrony), raising questions about the resilience of temperature-based coordination to anthropogenic climate change. To answer these questions, researchers have complicated the thermal-sum model, introducing additional factors and mechanisms. We consider whether such complexity is necessary. Using results from the theory of stopped random walks, we show that sensitivity and synchrony are exactly as predicted by the basic thermal-sum model. The theory suggests a nonlinear relationship between temperatures and both the timing and synchrony of biological events. In particular, it predicts that as temperatures increase and springtime events shift from the equinox toward the solstice, the events themselves become less coordinated and more variable. We verify these predictions using experimental and real-world data, including 10,000 observations of common lilacs (United States, 1956-2025). We conclude that the theory provides a powerful tool for understanding the thermal-sum model, particularly when considering additional complexity.
Rob Trangucci (Assistant Professor, Oregon State University)
Identified vaccine efficacy for post-infection outcomes
In order to meet regulatory approval, a new vaccine must show that it reduces the risk of an outcome like symptomatic disease, severe illness, or death in a randomized clinical trial. Because infection is necessary for these outcomes, one may be interested in the causal effect on a post-infection outcome, namely an outcome conditional on infection. Conditioning on a post-treatment outcome affected by the treatment leads to selection bias, but one can use principal stratification to do valid causal inference; this method partitions the total causal effect of vaccination into two causal effects: vaccine efficacy against infection, and the principal effect of vaccine efficacy on post-infection outcomes in patients who would be infected under both placebo and vaccination. Despite the importance of such principal effects to policymakers, these estimands are generally unidentifiable, even under strong assumptions that are rarely satisfied in real-world trials. We develop a novel method to point identify these principal effects while eliminating the monotonicity assumption and allowing for measurement error. Furthermore, our results allow for multiple treatments, and are general enough to be applicable outside of vaccine efficacy. Our method relies on the fact that many vaccine trials are run at geographically disparate health centers, and measure biologically relevant categorical pretreatment covariates. We show that our method can be applied to a variety of clinical trial settings where vaccine efficacy against infection and a post-infection outcome can be jointly inferred. This methodology can yield new insights from existing vaccine efficacy trial data and will aid researchers in designing new multi-arm clinical trials.
David Rothschild (Economist, Microsoft Research)
Survey Research from MRP to AI: Applying What We Learned from the Last Disruption to Guiding the Next
Early work on multilevel regression and poststratification (MRP), including collaborations using non-probability data such as Xbox samples, demonstrated that credible population inference could be recovered from unconventional data through modeling. This work shifted attention from sampling to the full survey workflow. In this talk, I reflect on how decomposing surveys into ideation, design, target population, administration, processing, and reporting reveals where assumptions enter and where error accumulates. I also discuss how, as these methods became mainstream through the persistence of many in this room, they not only transformed survey research directly but also reshaped how the field responds to disruption. That shift is now driving how survey research confronts the new transformation driven by AI.
Chair: Shira Mitchell (Blue Rose Research)
4:20–5:00 Extended Talk B
Sophia Rabe-Hesketh (Professor of Educational Statistics and Biostatistics, UC Berkeley)
Simple suggestions for missing data and the DIC
Missing data methods that ignore the missingness process, such as multiple imputation or joint modeling of the response variable(s) and partially observed covariates, assume that data are missing at random (MAR). My first simple suggestion is to “make” the missingness MAR under certain MAR violations by deleting more data (Rabe-Hesketh & Skrondal, Psychometrika, 2023). The deviance information criterion (DIC) is not invariant to reparameterization and can be unstable with a negative effective number of parameters, for instance in finite mixture models. My second simple suggestion is to define a new version of the DIC that does not suffer from these problems (Xiao & Rabe-Hesketh, in progress), making use of an alternative definition of the effective number of parameters (Gelman, Hwang, & Vehtari, Stat Comput, 2014).
5:00–6:00 Light refreshments (Faculty House)
6:00–9:30 Dinner banquet (Faculty House)

Day 2 · May 30 · SSW Bldg · Room C03

8:30 Breakfast
9:00–9:40 Extended Talk C
Aki Vehtari (Academy Professor in computational Bayesian modeling, Aalto University)
PSIS-LOO and loo package: 10+ years on
I'll provide a brief history of PSIS-LOO and the loo R package for Bayesian cross-validation, proceed to review recent advances in Bayesian cross-validation for estimating and comparing predictive performance and model checking, and conclude with some practical advice on model selection.
Chair: Charles Margossian
10:00–11:00 Session 5: The Folk Theorem of Applied Statistics
Tom Belin (Professor of Biostatistics, UCLA)
TBD
Yair Ghitza (Chief Scientist, Catalist)
Andrew Gelman's Influence on Real-World Campaigns
Over the past 15 years, Andrew's work has quietly become the basis of a large portion of actual campaign analysis. His views on uncertainty and humility, his development of MRP and associated tools, and his contributions to our understanding of electoral forecasts have become daily drivers of much of the political work happening in the real world. I'll trace through this influence, highlighting some of our joint work alongside parallel work across the ecosystem.
Yuling Yao (Assistant Professor, UT Austin)
I cannot believe you have not used MRP in importance weighting
Vanilla importance sampling is the building block of modern statistical computing. The estimate is prone to high or infinite variance, especially in a high-dimensional target distribution. Inspired by the heuristics of Multilevel Regression and Poststratification (MRP) in survey sampling, we propose Post-stratified importance sampling (P-SIS) as an orthogonal alternative to vanilla importance sampling, applicable to the generic setting of computing expectations using proposal draws. Our approach rearranges the target expectation by first conditioning on the importance ratios. We estimate the conditional expectation via Gaussian process regression and the marginal distribution via a score-based heavy-tailed normalizing flow. We show in both simulation and theory that P-SIS achieves lower error than existing tools. We apply P-SIS to various applications, including cross-validation.
Chair: Chuanhai Liu (Purdue)
11:20–12:20 Session 6: The Garden of Forking Paths
Tyler McCormick (Professor of Statistics and Sociology, University of Washington)
Through the Woods and Under the Gate: Navigating the Rashomon Set of Forking Paths
The Rashomon Effect, which refers to the existence of multiple models near the statistically optimal one (e.g., the maximum a posteriori model), means that several forking paths can lead to models that are strongly supported by the data, but have vastly different substantive implications. In this talk, we introduce Rashomon Partition Sets (RPSs) as a Bayesian approach to navigate this model multiplicity. We will demonstrate how to exhaustively enumerate the "Rashomon set" of all near-optimal models, identifying statistically indistinguishable ways to partition the data. By exploring this diverse set of highly plausible explanations, we offer a framework for drawing more robust conclusions and confidently navigating the forking paths of data analysis.
Masanao Yajima (Professor, Boston University; Associate Director of Statistics, Takeda Pharmaceuticals)
Walking the Forking Paths Together: A Collaborative Workflow for Robust Inference
The “garden of forking paths” is a systemic challenge in applied statistics: even when we report a single final analysis, our conclusions are shaped by many reasonable, data-contingent choices. In this talk, I describe a practical response developed in Boston University’s M.S. in Statistical Practice (MSSP) Statistics Practicum consulting model. Instead of policing scientists, we “walk the path together” with clients by making help accessible and free of charge. The secret sauce lies in the educational win-win relationship we create. The model is inspired by the Applied Consulting Center at Columbia University and by hands-on experience with Andrew.
Dan Simpson
TBD
Chair: Maria Grazia Pittau (Sapienza)
12:20–2:00 Lunch
2:00–3:00 Session 7: Panel — 10 Things I (still) Hate About Stan
Daniel Lee (Bayesian Ops)
Edward Roualdes (Cal State)
Matthijs Vákár (Utrecht)
Mitzi Morris (GoldbeltFed)
Steve Bronder (Flatiron)
3:20–4:20 Session 8: Blue Piranha, Red Piranha
Ben Goodrich (Columbia)
Table Fusion: Estimating the joint distribution of population characteristics
Gelman's Multilevel Regression and Poststratification framework requires a population table to poststratify the predictions of the multilevel regression from a non-representative sample. While the U.S. Census Bureau makes it relatively easy to construct a population table at the state level crossed with race, sex, and age, as more demographic variables are added to the joint distribution, the process becomes more difficult and it is even harder to stratify by geographic areas that are smaller than states. Table Fusion is an estimation method that takes advantage of the properties of the Dirichlet-multinomial distribution to extend a population table so that it includes one or more additional variables where we observe data on SOME conditional probability but not THE probability conditional on the variables whose joint distribution is known in the population. The data-generating process for all the variables jointly is assumed to be Dirichlet-multinomial, whose unknown parameters imply a log-probability of observing various incomplete tables, such as those produced by the American Community Survey. Draws from the posterior distribution of these unknown parameters can be obtained from Stan and used to obtain a predictive distribution for all the variables jointly in the population.
Sharad Goel (Professor of Public Policy, Harvard Kennedy School)
Teaching Statistics with AI: A Bag of Tricks
Inspired by Andrew's "Teaching Statistics: A Bag of Tricks" (co-authored with Deborah Nolan), I'll present some of the ways I've been experimenting with AI in my own introductory stats classes. I'll also describe ways we're using AI to improve foundational numeracy among K-12 students around the world.
Along the way, I'll discuss some of the big questions on the future of teaching and learning.
Douglas Rivers (Professor of Political Science, Stanford; Chief Scientist, YouGov)
The Evidence of Your Eyes and Ears
Real-world experiments that expose subjects to information are challenging because each subject brings their own biases and stock of private prior information, which can be difficult to measure. When subjects have strong prior beliefs and information is widespread, the effects of incremental information are likely to be small. Yet, it is in these situations that the effects are of greatest interest. Using a survey experiment conducted in the immediate aftermath of the shooting of Alex Pretti in Minneapolis, we estimate the effect of exposure to new information. The information consists of vivid, shocking videos that were often inconsistent with subjects’ prior beliefs. Controlling for baseline attitudes and prior exposure, we find small average treatment effects that mask heterogeneous responses. Specifically, among those not previously exposed to the videos, we find large changes in factual beliefs, with smaller but still significant effects on attitudes. However, these are mediated by prior attitudes, with effects often moving in opposite directions.
Chair: Robert S. Erikson (Columbia)
4:30–5:30 Closing Talk
Andrew Gelman (Higgins Professor of Statistics, Professor of Political Science, Columbia)
Four Decades of Bad Ideas, Blind Alleys, Misunderstandings, and Errors
I will review some of the many errors and oversights I've made over the years, from false theorems and impossible algorithms, to modeling and data coding errors, to avoidable lapses in communication, with the goal of understanding how we can more effectively learn and recover from our mistakes. I would also like to thank my many collaborators on these projects, because there's no way I could've made all these wrong turns on my own!
Introduced by Caroline Gelman (CUNY Hunter College)