Apples-to-Apples: Age-Sex Standardisation of Public Chest X-ray Datasets is a dataset published in Cureus (2025). On theSindex it has a DataRank of 0, placing it in the top 100% of the data-sharing corpus. Its calibrated FAIR score is 41/100.
Background: Public chest radiograph datasets are widely used for model development and benchmarking, but differences in patient demographics can inflate apparent between-dataset differences in disease label prevalence.
Objective: To quantify the proportion of NIH ChestX-ray14 versus CheXpert prevalence differences that is explained by age and sex alone.
Methods: A cross-sectional analysis of the NIH ChestX-ray14 (n=112,120 studies) and CheXpert (n=223,413) datasets was performed. Sex was harmonised to Male/Female and age was categorised as 0-17, 18-39, 40-59, 60-79, and ≥80 years. Five shared labels were assessed: consolidation, atelectasis, pleural effusion, edema, and cardiomegaly. For CheXpert, label uncertainty (-1) was treated as negative in the primary analysis. For each label, we calculated crude prevalence with Wilson 95% confidence intervals and compared datasets using a two-proportion z-test. We then performed direct standardisation by reweighting CheXpert age-sex strata to the NIH age-sex distribution and reported the reduction in the crude prevalence gap attributable to age-sex adjustment.
Results: Crude prevalence was higher in CheXpert than NIH for all labels (all p<0.001). After age-sex standardisation, CheXpert prevalence decreased for every label, indicating that demographics account for a substantial share of between-dataset differences. For consolidation, the crude gap of 1.96 percentage points (6.12% vs 4.16%) decreased to a standardised gap of 1.47 percentage points (CheXpert standardised 5.63% vs NIH 4.16%), a reduction of approximately 25%. For atelectasis, the gap declined from 4.85 to 2.84 percentage points (approximately 41% reduction). For pleural effusion, the gap declined from 28.10 to 19.03 percentage points (approximately 32% reduction). For edema, the gap declined from 21.70 to 14.78 percentage points (approximately 32% reduction). For cardiomegaly, the gap declined from 9.45 to 6.55 percentage points (approximately 31% reduction). Across labels, age-sex standardisation explained approximately 25% to 41% of the crude prevalence differences.
Conclusion: A simple age-sex standardisation step explains a large proportion of apparent label prevalence differences between NIH ChestX-ray14 and CheXpert. Routine reporting of standardised prevalence alongside crude estimates and demographic composition can improve fairness and interpretability in cross-dataset benchmarking and reduce the risk of attributing demographic composition effects to labelling or model performance.
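The statistical steps named in the Methods (Wilson interval, two-proportion z-test, direct standardisation, and the gap-reduction percentage) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names and the dictionary-based stratum layout are assumptions made here, and only the consolidation figures in the final lines come from the abstract.

```python
import math

Z_95 = 1.959963984540054  # standard normal quantile for a two-sided 95% interval


def wilson_interval(successes, n, z=Z_95):
    """Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z statistic with a two-sided p-value."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


def direct_standardise(stratum_prevalence, reference_counts):
    """Weight each stratum's prevalence by the reference population's share.

    stratum_prevalence: {stratum: prevalence in the dataset being adjusted}
    reference_counts:   {stratum: count in the reference dataset}
    The stratum keys (for example ("Male", "40-59")) are a hypothetical layout.
    """
    total = sum(reference_counts.values())
    return sum(
        stratum_prevalence[s] * reference_counts[s] / total
        for s in reference_counts
    )


def gap_reduction(crude_a, crude_b, standardised_a):
    """Fraction of the crude gap (a minus b) removed by standardising a to b."""
    crude_gap = crude_a - crude_b
    standardised_gap = standardised_a - crude_b
    return (crude_gap - standardised_gap) / crude_gap


# Consolidation figures reported in the abstract (percentages):
# CheXpert crude 6.12, NIH crude 4.16, CheXpert standardised 5.63.
print(f"{gap_reduction(6.12, 4.16, 5.63):.0%}")  # prints 25%
```

Because the NIH distribution is the reference, its crude prevalence is unchanged by standardisation, which is why only the CheXpert value moves in the gap calculation.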
FAIR checklist signals are shown for context only and do not affect DataRank scoring.
The calibrated FAIR score is a parallel quality metric, independent of the DataRank citation score.