Chronosight
Case Study · Nature · September 2025

Delphi-2M: Learning the natural history of human disease with generative transformers

Nature, Volume 647, pages 248–256 · Published September 2025
Read the full article at Nature →
0.76 · Average AUC (internal validation)
1,000+ · Diseases modelled
1.9M · External validation cohort (Denmark)
20 years · Trajectory horizon

Summary

Decision-making in healthcare relies on understanding patients' past and current health states to predict and, ultimately, change their future course. In this study, the authors modified the GPT (generative pretrained transformer) architecture to model the progression and competing nature of human diseases.

The resulting model, Delphi-2M, was trained on data from 0.4 million UK Biobank participants and validated on 1.9 million Danish individuals with no change in parameters. Delphi-2M predicts the rates of more than 1,000 diseases conditional on each individual's past disease history, with accuracy comparable to existing single-disease models.

Its generative nature enables sampling of synthetic future health trajectories, providing meaningful estimates of potential disease burden for up to 20 years. Explainable AI methods reveal clusters of co-morbidities and time-dependent consequences on future health, while also highlighting biases learnt from training data.

Main findings

Key figures from the study

The following summaries describe the main figures in the Nature paper. View the full figures and extended data in the original article.

Figure 1 — Delphi model architecture

Schematic of health trajectories (ICD-10 diagnoses, lifestyle and padding tokens at distinct ages), data splits (UK Biobank and Danish registries), and the modified GPT-2 architecture with age encoding, causal attention, and an exponential waiting-time head. Includes scaling laws and ablation results showing the contribution of architectural changes.

View Figure 1 in Nature →
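The two architectural changes named above can be illustrated with a short NumPy sketch. It assumes that (a) age enters the model through a sinusoidal encoding of a continuous age, in the style of the standard Transformer positional encoding, and (b) each output logit is read as a log-rate, so that the time to the next event follows competing exponential hazards. The function names and the `d_model` and `base` parameters are hypothetical and chosen for illustration. This is not the authors' implementation.

```python
import numpy as np


def age_encoding(age_days, d_model=16, base=10000.0):
    """Sinusoidal encoding of a continuous age, used in place of token-index positions.

    Hypothetical parameters: d_model (embedding width, must be even) and base
    (controls the slowest frequency), borrowed from the standard Transformer recipe.
    """
    age_days = np.asarray(age_days, dtype=float)
    half = d_model // 2
    freqs = base ** (-np.arange(half) / half)      # geometric ladder of frequencies
    angles = age_days[..., None] * freqs           # shape (..., half)
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)


def waiting_time_nll(logits, next_token, dt):
    """Negative log-likelihood of 'next event is token k, after waiting time dt'.

    Each logit is read as a log-rate: lambda_k = exp(logit_k). Under competing
    exponential hazards the joint density is lambda_k * exp(-Lambda * dt),
    where Lambda = sum_k lambda_k is the total rate.
    """
    logits = np.asarray(logits, dtype=float)        # shape (n, vocab)
    next_token = np.asarray(next_token, dtype=int)  # shape (n,)
    dt = np.asarray(dt, dtype=float)                # shape (n,), same time unit as the rates
    # Numerically stable log(Lambda) via log-sum-exp over the vocabulary.
    m = logits.max(axis=-1)
    log_total = m + np.log(np.exp(logits - m[:, None]).sum(axis=-1))
    log_rate_k = logits[np.arange(len(next_token)), next_token]
    return -(log_rate_k - np.exp(log_total) * dt)


# Toy usage: three tokens with rates 0.01, 0.02 and 0.07 per year.
enc = age_encoding([0.0, 60 * 365.25])             # shape (2, 16)
logits = np.log([[0.01, 0.02, 0.07]])
nll = waiting_time_nll(logits, [2], [1.5])         # token 2 observed after 1.5 years
```

Under these assumptions, log λ_k − Λ·dt = (log λ_k − log Λ) + (log Λ − Λ·dt). The loss therefore splits into an ordinary next-token cross-entropy over the softmax of the logits, plus an exponential log-likelihood for the waiting time. A standard GPT-2 output head would need only that one extra term.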

Figure 2 — Disease rate predictions

Predicted rates for nine exemplary diagnoses and for death as a function of age; comparison with sex- and age-stratified incidence; average AUC by number of training occurrences and by ICD-10 chapter; ROC curves versus clinical and machine-learning comparators; and comparison with the biomarker-based MILTON model.

View Figure 2 in Nature →
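Disease incidence depends strongly on age and sex, so a model could score a high AUC simply by ranking older individuals above younger ones. One common safeguard is to compute the AUC within age and sex strata and then average the results. The sketch below follows that approach, using the rank-based (Mann–Whitney) definition of AUC. The binning, the unweighted averaging, and the function names are assumptions made for illustration, and the paper's exact evaluation protocol may differ.

```python
import numpy as np
from collections import defaultdict


def auc(scores, labels):
    """Rank-based (Mann-Whitney) AUC: P(score_case > score_control), ties count 0.5."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    if len(pos) == 0 or len(neg) == 0:
        return np.nan                              # undefined without both classes
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


def stratified_auc(scores, labels, age_bins, sexes):
    """Unweighted mean of per-stratum AUCs over (age bin, sex) strata with both classes."""
    groups = defaultdict(list)
    for i, key in enumerate(zip(age_bins, sexes)):
        groups[key].append(i)
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    per_stratum = [auc(scores[idx], labels[idx]) for idx in map(np.array, groups.values())]
    per_stratum = [a for a in per_stratum if not np.isnan(a)]
    return float(np.mean(per_stratum)) if per_stratum else float("nan")
```

The pairwise comparison is O(n_cases × n_controls). That cost is fine for a sketch, but a sort-based rank computation would suit large cohorts better.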

Figure 3 — Generative modelling

Design for simulating trajectories from age 60; modelled vs observed disease rates at 70–75 years; fraction of correctly predicted diagnoses over time; simulated vs observed fold changes for smoking, alcohol and BMI; and AUC of models trained on synthetic vs real data.

View Figure 3 in Nature →
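Simulating future trajectories from a model that outputs per-token rates can be sketched with competing exponentials. At each step, draw an exponential waiting time for every token, keep the earliest, append it, and repeat until death or the age horizon is reached. For independent exponentials, the earliest time is distributed as Exp(Σλ) and token k wins with probability λ_k / Σλ, which matches an exponential waiting-time head. In this sketch, `rate_fn` is a hypothetical stand-in for the trained model, and the toy rates are invented. Delphi-2M's actual vocabulary, its padding and no-event tokens, and its sampler details are not reproduced here.

```python
import numpy as np


def sample_trajectory(rate_fn, history, start_age, max_age, death_token, rng):
    """Extend a list of (age, token) events from start_age until death or max_age.

    rate_fn(trajectory) -> 1-D array of strictly positive per-year rates, one per
    token. It is a hypothetical stand-in for the trained model's output head.
    """
    trajectory = list(history)
    age = start_age
    while True:
        rates = np.asarray(rate_fn(trajectory), dtype=float)
        if np.any(rates <= 0):
            raise ValueError("rates must be strictly positive")
        # Competing exponentials: one waiting time per token, the earliest wins.
        waits = rng.exponential(1.0 / rates)
        token = int(np.argmin(waits))
        age += float(waits[token])
        if age > max_age:                      # horizon reached without another event
            return trajectory
        trajectory.append((age, token))
        if token == death_token:               # death ends the trajectory
            return trajectory


# Toy usage: condition on events up to age 60, then simulate to age 80.
rng = np.random.default_rng(0)
toy_rates = lambda traj: np.array([0.05, 0.02, 0.01])  # token 2 plays "death" (invented)
history = [(52.3, 0)]
runs = [sample_trajectory(toy_rates, history, 60.0, 80.0, 2, rng) for _ in range(1000)]
# Monte Carlo estimate of the share of simulated individuals who develop token 1.
frac_with_token1 = np.mean([any(t == 1 for _, t in r[len(history):]) for r in runs])
```

Averaging over many sampled trajectories in this way is one route to burden estimates of the kind described in the summary. The simulated rates could then be compared with observed rates in a held-out age window.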

Figure 4 — Explainable AI

UMAP projection of token embeddings (diseases cluster by ICD-10 chapter); SHAP contributions for individual trajectories (e.g. pancreatic cancer risk and mortality); SHAP effect matrix across diseases and chapters; and rate of mortality over time after selected diagnoses.

View Figure 4 in Nature →

Figure 5 — External validation and bias

AUC comparison between UK Biobank and Danish data; mortality estimates versus national data from the ONS (Office for National Statistics), revealing an immortality bias; data source distribution and missingness; and a SHAP matrix by dominant data source showing learnt biases (e.g. hospital-record exclusivity).

View Figure 5 in Nature →

Global Media Coverage

Altmetric score: 1,714 · 50+ confirmed outlets · 6+ languages · 5 continents

The paper attracted worldwide press attention on publication day and beyond. It was covered in institutional releases from DKFZ, EMBL and UK Biobank, by Nature News & Podcast, Scientific American, Handelsblatt and Videnskab.dk, and by 40+ further outlets across six languages.

View Full Media Coverage → Nature News Article →

Citation

Shmatko, A., Jung, A.W., Gaurav, K. et al. Learning the natural history of human disease with generative transformers. Nature 647, 248–256 (2025). https://doi.org/10.1038/s41586-025-09529-3