Calendar

Module I — The Generalization Gap in Biohealth: Why “Scale” Fails

Objective: Understand why predictive success often fails to translate to deployment for two distinct reasons — generalization failure (models break under distribution shift) and identification failure (models capture associations, not causal mechanisms).

Week 1: Predictive Success vs. Causal Validity

Mar 30

Lecture 1: Why causality matters in biohealth

Course framing: identification and generalization as scientific validity in real-world clinical and biological practice
“World models” vs. shortcut predictors; predictive accuracy vs. counterfactual validity; getting to mechanistic validity
Canonical failure modes across domains: proxy learning, selection/measurement bias, and feedback loops
Course overview and logistics

Apr 1

Lecture 2: Dataset shift and identification, reframed causally

Shift taxonomy (covariate/label/concept): what changed in the data-generating process?
Selection vs. sampling; collider bias and selection; Berkson’s bias
Feedback and performativity: when deployment changes the data-generating process
Identification failure within a single population
RCTs as the gold standard for identification: what they solve (confounding) and what they don’t (transportability, mechanism)

Week 2: Foundations & Aspirations

Apr 6

Lecture 3 (TA-led): Causal inference primer

DAGs, the do-operator, confounding, d-separation, backdoor criterion
Identification strategies: adjustment, instrumental variables, front-door criterion
ATE, ATT, CATE: the estimands that matter in biomedicine

Apr 8

Lecture 4 (Guest lecture): Virtual cell models — hope vs. hype

Objectives and evaluation of “virtual cells” / “digital twins”
Interpolation vs. extrapolation in perturbation space
Evaluation beyond reconstruction: interventional prediction, transport across labs, mechanistic sanity checks

Module II — Mechanistic & Hybrid Models

Objective: Integrate mechanistic knowledge with ML to improve both identification (constraining models toward causal mechanisms) and generalization (enabling extrapolation beyond the training distribution, e.g., to new interventions and contexts).

Week 3: Inductive Bias and the Hybrid Modeling Toolkit

Apr 13

Lecture 5: Inductive bias taxonomy through case studies

Taxonomy: architectural / regularization / data / evaluation, with biomedical examples (equivariance, pathway priors, biological data augmentation, benchmark leakage)
Cautionary tales: mechanism-aligned bias vs. “bias toward the wrong story”
How bias choice connects to both failure modes: shift-robust features and mechanism-aligned representations
Student presentation: inductive bias in biomedical ML (e.g., equivariance in molecular models, graph-structured priors, or evaluation-as-bias)

Apr 15

Lecture 6: The hybrid modeling toolkit

The hybrid spectrum: pure mechanistic → gray-box → pure data-driven
Neural ODEs, universal differential equations, physics-informed neural networks
Case studies: glucose dynamics (CGM), pharmacokinetics, wearable biosignals
When hybrids help (extrapolation, sample efficiency, identifiability, interpretability) vs. when they mislead (compensating errors)
Student presentation: hybrid modeling (e.g., neural ODE for clinical trajectories, PK/PD, mechanistic pathway integration, or gray-box approaches in biological systems)

Module III — Causal Representations & Learning from Interventions

Objective: Learn representations that capture causal structure rather than associational shortcuts; leverage interventional data to validate and improve them.

Week 4: Causal Representation Learning

Apr 20

Lecture 7: From pixels and counts to causal state

Why representation is the bottleneck for both generalization and identification
Invariance across environments; identifiability of latent causal variables
Causal disentanglement; representations as hypotheses tested by interventional and OOD probes
Student presentation: hybrid or mechanistic modeling (e.g., structured dynamics, physics-informed approaches to clinical data, or domain-knowledge-constrained learning)

Apr 22

Lecture 8 (Student presentations): Causal representation learning (3 papers)

Invariant/causal representations across environments, or causal foundation models
Non-identifiability, nuisance leakage, or representation failure
Causal disentanglement, independent mechanism analysis, or identifiability in single cells

Week 5: Learning from Interventional Data — Perturbation Biology as Causal Inference

Apr 27

Lecture 9: Perturbation biology, multimodal representations, and interpretability

Estimands in perturbation biology
Perturbation screens as the biological analogue of RCTs, with their own identification challenges (batch/plate and CRISPR non-targeting confounders)
CRISPR as “intent-to-treat”: PerturbVI
Multimodal learning from unpaired data
Counterfactual inference in single cells; the benchmarking challenge (linear baselines vs. deep models)
Student presentation: perturbation biology (e.g., response prediction, counterfactual inference, or benchmarking)

Apr 29

Lecture 10 (Student presentations): Perturbation biology, counterfactual inference & causal discovery (3 papers)

Perturbation response prediction or counterfactual inference in single cells
Causal structure learning from interventional data
Experimental design or active learning for perturbation screens

May 1

Project proposal due

1-page proposal (teams of up to 2)

Week 6: Foundation Models, Generative Approaches, and Evaluation

May 4

Lecture 11 (Guest lecture): CellFlux — flow matching for perturbation prediction

CellFlux: flow matching for modeling morphological responses to perturbations
SDE extension with Bayesian treatment for improved generalization and OOD detection
CellFluxRL: RL-based post-training with biologically anchored rewards
Student presentation: generative modeling or flow matching for biological data

May 6

Lecture 12 (Student presentations): Foundation models, evaluation & benchmarking (3 papers)

Foundation models for single-cell or perturbation data
Evaluation methodology and benchmarking
Multimodal biological learning or mechanistic interpretability

Module IV — Decision-Making and Moving Models Across Domains

Objective: Learn and evaluate treatment policies from observational data; formalize when and how causal effects transfer across populations and biological systems.

Week 7: Policy Learning — Off-Policy Evaluation & Treatment Decisions

May 11

Lecture 13: Estimating the value of a policy you’ve never run

The decision problem: learning a treatment policy from observational data
Why naive evaluation fails; inverse propensity weighting and its instability
Doubly robust estimation; learning individualized treatment rules
Biomedical applications: adaptive treatment strategies, personalized dosing
Student presentation: clinical policy learning or off-policy evaluation

May 13

Lecture 14 (Student presentations): Policy learning & experimental design (3 papers)

Clinical policy learning or off-policy evaluation
Active learning or Bayesian experimental design
Treatment effect estimation or confounding-robust evaluation

Week 8: Causal Transportability

May 18

Lecture 15: When can you trust a model trained elsewhere?

Pearl’s transportability framework vs. domain adaptation; selection diagrams as a tool for reasoning about what must be invariant
Two failure modes at the transport level: distribution shift vs. misidentified mechanism
The biological evidence ladder as a transportability problem: cell lines → organoids → animal models → patients; transportability across cellular contexts
Practical transportability across institutions and populations: what target-site data and operational constraints are needed
Student presentations: cross-site/cross-population transfer; external validity across cellular contexts or populations

May 18

Project midway (stress test) report due

One negative control + one domain shift / robustness experiment

May 20

Lecture 16 (Guest lecture): Transportability in clinical development

Synthetic control arms, real-world evidence (RWE), bridging RCTs and observational data
FDA’s evolving stance on external controls; “virtual twin” approaches
Student presentation: synthetic control arms, RWE, or external validity in clinical trials

Module V — Frontiers & Course Wrap-up

Objective: Evaluate foundation models, AI agents, and “world models” as scientific tools in biohealth; synthesize the course’s dual “identification + generalization” framework into a practical audit checklist.

Week 9: Agentic AI and Scientific Reasoning

May 25

No class (Memorial Day)

May 27

Lecture 17 (Guest lecture): Can LLMs and AI agents reason causally about biology?

Where foundation models help: representation, multimodal alignment, hypothesis generation, protocol writing
Where they fail: hallucination, implicit selection bias, weak causal grounding
Evaluation: stress tests under shift, counterfactual probes, calibration of scientific claims
Student presentation: AI agents for science, or evaluation of foundation models in biomedicine

Week 10: Course Synthesis & Final Presentations

Jun 1

Lecture 18: Integrative synthesis

What we learned about inductive bias, state representation, interventions, and transport
The dual thesis: every model claim stress-tested against identification and generalization
A “checklist for mechanistic generalization claims” to carry into research
Open problems and where the field is headed
Student presentations: LLMs and AI agents for causal reasoning in biology (2 papers)

Jun 3

Final project presentations

Short talks or poster session (TBD)

Week 11: Final Report Submission

Jun 8

Final project report due

8-page report (plus references), including a “generalization and identification contract” section