Schedule | Gelman 60-ish Workshop

May 29–30, 2026 · Columbia University, New York

Day 1 · May 29 · SSW Bldg · Room C03

8:30	Breakfast
9:00–9:10	Opening remarks, Tian Zheng (Columbia)
9:10–9:50	Extended Talk A
	Jennifer Hill (Professor of Applied Statistics, NYU)
	Multiple comparisons: Why even Bayesians may need to worry
	Researchers asking causal questions are often interested not only in the average treatment effect but also in subgroup-specific treatment effects that allow for a more nuanced understanding of who benefits from an intervention. However, this pursuit can lead to issues with unwarranted "researcher degrees of freedom" and failure to properly adjust for multiple comparisons. While previous research has demonstrated that Bayesian methods with regularizing prior distributions are more conservative than their frequentist counterparts and can lead to more appropriate assessments of uncertainty than either ignoring the issue or using corrections (Bonferroni, FDR), the extent to which Bayesian methods eliminate the problem of multiple comparisons still depends critically on the context and specifics of the method. In particular, we demonstrate a setting common in the social sciences where standard Bayesian regularizing priors are not sufficient to control false positive claims and can additionally lead to sign errors. We characterize this setting as dominated by “shrinkage to the wrong place” and discuss potential remedies.
	Chair: Ben Goodrich (Columbia)
10:00–11:00	Session 1: The Talented Mr. P
	Ben Goodrich (Columbia)
	Table fusion: Estimating the joint distribution of population characteristics
	Gelman's Multilevel Regression and Poststratification framework requires a population table to poststratify the predictions of the multilevel regression from a non-representative sample. While the U.S. Census Bureau makes it relatively easy to construct a population table at the state level crossed with race, sex, and age, as more demographic variables are added to the joint distribution, the process becomes more difficult, and it is even harder to stratify by geographic areas that are smaller than states. Table Fusion is an estimation method that takes advantage of the properties of the Dirichlet-multinomial distribution to extend a population table so that it includes one or more additional variables where we observe data on SOME conditional probability but not THE probability conditional on the variables whose joint distribution is known in the population. The data-generating process for all the variables jointly is assumed to be Dirichlet-multinomial, whose unknown parameters imply a log-probability of observing various incomplete tables, such as those produced by the American Community Survey. Draws from the posterior distribution of these unknown parameters can be obtained from Stan and used to obtain a predictive distribution for all the variables jointly in the population.
	Yajuan Si (Research Associate Professor, Michigan)
	MRP is turning 30: Are we entering the golden age, or just getting started?
	Developed by Gelman and Little (1997), Multilevel Regression and Poststratification (MRP) was designed to obtain desirable subgroup estimates using complex sample survey data. In many ways, MRP perfectly embodies Andrew’s favorite maxims: "fit multilevel models as defaults," "statisticians cannot avoid adjustment," and notably, "survey weighting is a mess."
	Over the past 30 years, MRP has proven successful in practice across disciplines. Yet MRP is frequently critiqued for lacking theoretical guarantees like design consistency or double robustness. As a statistical model that accounts for data collection, MRP is subject to model misspecification and cannot fix low data quality or bad study design. In this talk, I will trace the evolution of MRP, discussing recent methodological improvements and current efforts to extend the framework into the AI era, ensuring it continues to yield reproducible and reliable findings in an increasingly complex data landscape.
	Qixuan Chen (Associate Professor of Biostatistics, Columbia University)
	Predictive inference for non-probability samples using Bayesian machine learning
	Probability surveys are the gold standard for population inference but are increasingly costly and subject to declining response rates. Non-probability samples, though more accessible, raise concerns about generalizability. In this talk, I present Bayesian predictive inference methods that integrate non-probability samples with administrative data or electronic health records in data-rich settings with high-dimensional auxiliary information. We first consider estimation of population means using non-probability surveys and then extend the framework to generalizability and transportability of causal effects from randomized trials. Our methods model high-dimensional covariates via Bayesian Additive Regression Trees and incorporate the propensity score for sample inclusion using natural cubic splines, along with a balancing transformation to better align propensity score distributions between trial and target populations. Simulation studies show improved performance over existing methods, with smaller root mean squared error and coverage closer to nominal levels. We illustrate the approaches with real-world applications.
	Chair: Shigeo Hirano (Columbia)
11:20–12:20	Session 2: 0.234 — Theory in Computing
	Charles Margossian (Assistant Professor, University of British Columbia)
	Variational inference in the presence of symmetry
	Variational inference (VI) approximates a target density p by the best match q in a family of tractable distributions. The best variational approximation is found by minimizing a divergence between distributions, and several divergences have been proposed as objective functions of VI, with different choices leading to different approximations. In this talk, I show that even when these divergences have different minimizers, the resulting approximations all abide by certain symmetry-matching principles. Under favorable conditions, these principles lead to exact recovery of the mean and correlation matrix. I also explore cases where the target density p only exhibits symmetry along some but not all of its coordinates. These partial symmetries arise naturally in Bayesian hierarchical models, where the prior induces a challenging geometry but still possesses axes of symmetry.
	Collin Cademartori (Assistant Professor, Wake Forest)
	Can probabilistic programming make workflows work?
	Probabilistic programming has represented a major step in the separation of concerns between Bayesian modeling and inference, with corresponding gains in the efficiency and flexibility of model building. As model building has become easier, a growing literature has emerged to tackle the central question of workflow: When so many models can be easily built, how do we decide what to build, how to build it, and how to evaluate the end product? This literature has offered up numerous tools to handle pieces of this question, including tools for validation of computation (SBC), specification of priors (elicitation), evaluation of model predictions (LOO), and assessment of model fit (PPCs), among other tasks.
	While such tools are often designed to fit into a coherent workflow pipeline, substantial hurdles remain for integrating them in practice. Different tools leverage different pieces of the underlying model (e.g. log likelihoods for LOO, hyperparameters for elicitation). Currently, furnishing the right metadata and gluing the pieces together often requires manual work from the user and repetition of intent across different blocks of code. Existing probabilistic programming languages can offer little in the way of automation for these tasks. Languages like Stan do not guarantee any concrete model structure until runtime, while more restricted languages define some model properties statically, but unnecessarily limit the class of models which can be implemented. Both of these are poor fits for integrating a variety of workflow tools, each with their own requirements of the underlying model. This talk aspirationally considers possible future developments in probabilistic programming to address the integration of disparate tooling into customizable workflow pipelines, with the aim of defining a flexible software platform which current and future methodologies can plug into.
	Matt Hoffman
	Nested R hat: Assessing the convergence of Markov chain Monte Carlo when running many short chains
	The Gelman-Rubin (1992) potential scale reduction factor R̂ is an extremely popular Markov chain Monte Carlo (MCMC) convergence diagnostic, but it can require a long sampling phase to work well. This is fine if we are only running a few chains (say, four), and need to run them for a long time to get enough samples to get low-variance estimates. But recent developments in parallel MCMC algorithms allow us to run thousands of chains almost as quickly as a single chain, using hardware accelerators such as GPUs.
	While each chain still needs to forget its initial point during a warmup phase, we no longer need a long sampling phase to reduce variance, since we're already averaging across many chains. But how can we determine if these short chains are reliable? We propose nested R̂, a generalization of the classic Gelman-Rubin statistic that works well in the many-short-chains regime. The derivation (based on a simple application of the law of total variance) and accompanying theory also offer some insights into the logic of the original R̂.
	Chair: Ruobin Gong (Rutgers)
12:20–2:00	Lunch
2:00–3:00	Session 3: The Science of Defaults
	Susan Gelman (Heinz Werner Distinguished University Professor of Psychology and Linguistics, University of Michigan)
	Andrew Gelman: The early years (and beyond)
	This talk will share personal reflections on Andrew's early years and his important influence on the field of psychology, from my perspective as a psychologist who studies cognitive development, and as Andrew's sister.
	Upmanu Lall (Arizona State/Columbia)
	Weather Jiu-Jitsu: From predictions to weather control using AI-enabled spatio-temporal simulators
	Spatio-temporal prediction of weather extremes has been a scientific challenge for a long time, and a variety of methods have emerged. This talk goes a step further and asks: can we control or modify emerging extreme weather trajectories? The paradigm is Weather Jiu-Jitsu: can small perturbations introduced in situations with chaotic instabilities (positive Lyapunov exponents in the flow) be amplified by leveraging the energy of the atmosphere to defuse the extreme or shift it away from where it would cause impact? I will present some idealized examples and also some illustrations using AI-based weather forecast models to show the potential of moving tropical cyclones, atmospheric rivers, freezes, and heat waves in some conditions.
	Tian Zheng (Professor of Statistics, Columbia University)
	Statistical thinking and AI education
	As AI becomes more common in education and practice, statistical thinking remains essential. In my talk, I will discuss the importance of core ideas such as uncertainty, model validation, and data interpretation in AI education across disciplines. Integrating these concepts helps students move beyond using tools to understanding how and why models work.
	This approach supports more reliable, transparent, and responsible use of AI, and highlights the role of statisticians in shaping effective AI education.
	Chair: Rahul Dodhia (Microsoft Research)
3:20–4:20	Session 4: Regression and Other Stories
	Jonathan Auerbach (Assistant Professor, George Mason)
	How temperature regimes near the equinox synchronize spring biological events?
	Many biological processes, including plant leafout and flowering, occur once cumulative temperatures reach a threshold (the thermal-sum model). In this way, temperatures are thought to coordinate the timing of biological events. But growing evidence suggests that as climates warm, the advancement of spring has slowed (declining sensitivity) and the variance in the timing of spring events has increased (declining synchrony), raising questions about the resilience of temperature-based coordination to anthropogenic climate change. To answer these questions, researchers have complicated the thermal-sum model, introducing additional factors and mechanisms. We consider whether such complexity is necessary. Using results from the theory of stopped random walks, we show that sensitivity and synchrony are exactly as predicted by the basic thermal-sum model. The theory suggests a nonlinear relationship between temperatures and both the timing and synchrony of biological events. In particular, it predicts that as temperatures increase and springtime events shift from the equinox toward the solstice, the events themselves become less coordinated and more variable. We verify these predictions using experimental and real-world data, including 10,000 observations of common lilacs (United States, 1956-2025). We conclude that the theory provides a powerful tool for understanding the thermal-sum model, particularly when considering additional complexity.
	Rob Trangucci (Assistant Professor, Oregon State University)
	Identified vaccine efficacy for post-infection outcomes
	In order to meet regulatory approval, a new vaccine must show that it reduces the risk of an outcome like symptomatic disease, severe illness, or death in a randomized clinical trial. Because infection is necessary for these outcomes, one may be interested in the causal effect on a post-infection outcome, namely an outcome conditional on infection. Conditioning on a post-treatment outcome affected by the treatment leads to selection bias, but one can use principal stratification to do valid causal inference; this method partitions the total causal effect of vaccination into two causal effects: vaccine efficacy against infection, and the principal effect of vaccine efficacy on post-infection outcomes in patients who would be infected under both placebo and vaccination. Despite the importance of such principal effects to policymakers, these estimands are generally unidentifiable, even under strong assumptions that are rarely satisfied in real-world trials. We develop a novel method to point identify these principal effects while eliminating the monotonicity assumption and allowing for measurement error. Furthermore, our results allow for multiple treatments, and are general enough to be applicable outside of vaccine efficacy. Our method relies on the fact that many vaccine trials are run at geographically disparate health centers, and measure biologically relevant categorical pretreatment covariates. We show that our method can be applied to a variety of clinical trial settings where vaccine efficacy against infection and a post-infection outcome can be jointly inferred. This methodology can yield new insights from existing vaccine efficacy trial data and will aid researchers in designing new multi-arm clinical trials.
	David Rothschild (Economist, Microsoft Research)
	Survey research from MRP to AI: Applying What We Learned from the Last Disruption to Guiding the Next
	Early work on multilevel regression and poststratification (MRP), including collaborations using non-probability data such as Xbox samples, demonstrated that credible population inference could be recovered from unconventional data through modeling. This work shifted attention from sampling to the full survey workflow. In this talk, I reflect on how decomposing surveys into ideation, design, target population, administration, processing, and reporting reveals where assumptions enter and where error accumulates. I also discuss how, as these methods became mainstream through the persistence of many in this room, they not only transformed survey research directly but also reshaped how the field responds to disruption. That shift is now driving how survey research confronts the new transformation driven by AI.
	Chair: Shira Mitchell (Blue Rose Research)
4:40–5:20	Extended Talk B
	Sophia Rabe-Hesketh (Professor of Educational Statistics and Biostatistics, UC Berkeley)
	Simple suggestions for missing data and the DIC
	Missing data methods that ignore the missingness process, such as multiple imputation or joint modeling of the response variable(s) and partially observed covariates, assume that data are missing at random (MAR). My first simple suggestion is to “make” the missingness MAR under certain MAR violations by deleting more data (Rabe-Hesketh & Skrondal, Psychometrika, 2023). The deviance information criterion (DIC) is not invariant to reparameterization and can be unstable with a negative effective number of parameters, for instance in finite mixture models. My second simple suggestion is to define a new version of the DIC that does not suffer from these problems (Xiao & Rabe-Hesketh, in progress), making use of an alternative definition of the effective number of parameters (Gelman, Hwang, & Vehtari, Stat Comput, 2014).
	Chair: Yajuan Si (Michigan)
5:45–6:45	Light refreshments (Faculty House)
6:45–9:30	Dinner banquet (Faculty House)

Day 2 · May 30 · SSW Bldg · Room C03

8:30	Breakfast
9:00–9:40	Extended Talk C
	Aki Vehtari (Academy Professor in computational Bayesian modeling, Aalto University)
	PSIS-LOO and loo package: 10+ years on
	I'll provide a brief history of PSIS-LOO and the loo R package for Bayesian cross-validation, then review recent advances in Bayesian cross-validation for estimating and comparing predictive performance and for model checking, and conclude with some practical advice on model selection.
	Chair: Charles Margossian (UBC)
10:00–11:00	Session 5: The Folk Theorem of Applied Statistics
	Tom Belin (Professor of Biostatistics, UCLA)
	What would a major-league scholar recommend?
	Andrew Gelman has had a transforming impact on the field of political science by bringing a statistical perspective to discussions of how political systems operate and by clarifying where ethics and value judgments belong in the decision-making process. Although the impact of partisanship in the redistricting process was not so prominent a few decades ago, in recent years, and especially in the past year, congressional redistricting in the United States has become chaotic owing to greater detail in available data, advances in information technology, breakdowns of traditional norms that had served as a buffer against aggressive partisanship, and legal decisions that have relaxed redistricting rules to allow more aggressive partisanship. With analogies to rules made in sports leagues, where the lack of competitiveness tolerated in the political realm would never be tolerated by sports fans, this presentation will discuss redistricting in the U.S. in the spirit of Andrew Gelman's contributions to the field, anticipating how different rules might affect how systems would operate and aiming to clarify ways that innovations in the legal realm (potentially including a constitutional amendment) channeling important value judgments could lead to better long-run governance.
	Yair Ghitza (Chief Scientist, Catalist)
	Andrew Gelman's influence on real-world campaigns
	Over the past 15 years, Andrew's work has quietly become the basis of a large portion of actual campaign analysis. His views on uncertainty and humility, his development of MRP and associated tools, and his contributions to our understanding of electoral forecasts have become daily drivers of much of the political work happening in the real world.
	I'll trace through this influence, highlighting some of our joint work alongside parallel work across the ecosystem.
	Yuling Yao (Assistant Professor, UT Austin)
	I cannot believe you have not used MRP in importance weighting
	Vanilla importance sampling is the building block of modern statistical computing. The estimate is prone to high or infinite variance, especially in a high-dimensional target distribution. Inspired by the heuristics of Multilevel Regression and Poststratification (MRP) in survey sampling, we propose Post-stratified importance sampling (P-SIS) as an orthogonal alternative to vanilla importance sampling, applicable to the generic setting of computing expectations using proposal draws. Our approach rearranges the target expectation by first conditioning on the importance ratios. We estimate the conditional expectation via Gaussian process regression and the marginal distribution via a score-based heavy-tailed normalizing flow. We show in both simulation and theory that P-SIS achieves lower error than existing tools. We apply P-SIS to various applications, including cross-validation.
	Chair: Chuanhai Liu (Purdue)
11:20–12:20	Session 6: The Garden of Forking Paths
	Tyler McCormick (Professor of Statistics and Sociology, University of Washington)
	Through the woods and under the gate: Navigating the Rashomon set of forking paths
	The Rashomon Effect, which refers to the existence of multiple models near the statistically optimal one (e.g., the maximum a posteriori model), means that several forking paths can lead to models that are strongly supported by the data, but have vastly different substantive implications. In this talk, we introduce Rashomon Partition Sets (RPSs) as a Bayesian approach to navigate this model multiplicity. We will demonstrate how to exhaustively enumerate the "Rashomon set" of all near-optimal models, identifying statistically indistinguishable ways to partition the data. By exploring this diverse set of highly plausible explanations, we offer a framework for drawing more robust conclusions and confidently navigating the forking paths of data analysis.
	Masanao Yajima (Professor, Boston University; Associate Director of Statistics, Takeda Pharmaceuticals)
	Walking the forking paths together: A collaborative workflow for robust inference
	The “garden of forking paths” is a systemic challenge in applied statistics: even when we report a single final analysis, our conclusions are shaped by many reasonable, data-contingent choices. In this talk, I describe a practical response developed in Boston University’s M.S. in Statistical Practice (MSSP) Statistics Practicum consulting model. Instead of policing scientists, we “walk the path together” with clients by making help accessible and free of charge. The secret sauce lies in the educational win-win relationship we create. The model is inspired by the Applied Consulting Center at Columbia University and by hands-on experience with Andrew.
	Dan Simpson
	TBD
	Chair: Maria Grazia Pittau (Sapienza)
12:20–2:00	Lunch
2:00–3:00	Session 7: Panel — 10 Things I (still) Hate About Stan
	Daniel Lee (Bayesian Ops)
	Edward Roualdes (Professor of Statistics, Cal State Chico)
	Matthijs Vákár (Associate Professor of Computer Science, Utrecht)
	Mitzi Morris (GoldbeltFed)
	Steve Bronder (Flatiron)
3:20–4:20	Session 8: Blue Piranha, Red Piranha
	Rajiv Sethi (Professor of Economics at Barnard College, Columbia University)
	Political prediction markets
	Political prediction markets have been around for several decades but have attracted renewed attention recently with the growth of two new entrants, Polymarket and Kalshi. In this talk I'll discuss the power and perils of such markets, focusing on the assessment of accuracy, insiders and spoofing, reflexivity, market manipulation, identity verification, and interaction across ideological boundaries.
	Sharad Goel (Professor of Public Policy, Harvard Kennedy School)
	Teaching statistics with AI: A bag of tricks
	Inspired by Andrew's "Teaching Statistics: A Bag of Tricks" (co-authored with Deborah Nolan), I'll present some of the ways I've been experimenting with AI in my own introductory stats classes. I'll also describe ways we're using AI to improve foundational numeracy among K-12 students around the world. Along the way, I'll discuss some of the big questions on the future of teaching and learning.
	Douglas Rivers (Professor of Political Science, Stanford; Chief Scientist, YouGov)
	The evidence of your eyes and ears
	Real-world experiments that expose subjects to information are challenging because each subject brings their own biases and stock of private prior information, which can be difficult to measure. When subjects have strong prior beliefs and information is widespread, the effects of incremental information are likely to be small. Yet it is in these situations that the effects are of greatest interest. Using a survey experiment conducted in the immediate aftermath of the shooting of Alex Pretti in Minneapolis, we estimate the effect of exposure to new information. The information consists of vivid, shocking videos that were often inconsistent with subjects’ prior beliefs.
	Controlling for baseline attitudes and prior exposure, we find small average treatment effects that mask heterogeneous responses. Specifically, among those not previously exposed to the videos, we find large changes in factual beliefs, with smaller but still significant effects on attitudes. However, these are mediated by prior attitudes, with effects often moving in opposite directions.
	Chair: Robert S. Erikson (Columbia)
4:30–5:30	Closing Talk: Four Decades of Bad Ideas, Blind Alleys, Misunderstandings, and Errors
	Andrew Gelman (Higgins Professor of Statistics, Professor of Political Science, Columbia)
	I will review some of the many errors and oversights I've made over the years, from false theorems and impossible algorithms, to modeling and data coding errors, to avoidable lapses in communication, with the goal of understanding how we can more effectively learn and recover from our mistakes. I would also like to thank my many collaborators on these projects, because there's no way I could've made all these wrong turns on my own!
	Introduced by Caroline Gelman (CUNY Hunter College)