Demo corpus. Scores are computed on a select set of biomedical paper/datasets and may be inaccurate for papers outside this corpus — DataRank relies on network effects that improve with scale. We aim to expand this into a fully open resource pending additional funding.

Apples-to-Apples: Age-Sex Standardisation of Public Chest X-ray Datasets

Cureus(2025)10.7759/cureus.97260Source: DataRank Database

Apples-to-Apples: Age-Sex Standardisation of Public Chest X-ray Datasets is a dataset published in Cureus (2025). On theSindex it has a DataRank of 0, placing it in the top 100% of the data-sharing corpus. Its calibrated FAIR score is 41/100.

Top 100%percentile

0DataRank

0Top 100%

Dataset Open Access0 citations · base score 0

Cite:

datarank_citation_only_1hop_v6· scope data_onlyMethodology

Abstract

Background Public chest radiograph datasets are widely used for model development and benchmarking, but differences in patient demographics can inflate apparent between-dataset differences in disease label prevalence. Objective To quantify the proportion of NIH ChestX-ray14 versus CheXpert prevalence differences that is explained by age and sex alone. Methods A cross-sectional analysis of NIH ChestX-ray14 (n=112,120 studies) and CheXpert (n=223,413) databases was performed. Sex was harmonised to Male/Female and age was categorised as 0-17, 18-39, 40-59, 60-79, and ≥80 years. Five shared labels were assessed: consolidation, atelectasis, pleural effusion, edema, and cardiomegaly. For CheXpert, label uncertainty (-1) was treated as negative in the primary analysis. For each label, we calculated crude prevalence with Wilson 95% confidence intervals and compared datasets using a two-proportion z-test. We then performed direct standardisation by reweighting CheXpert age-sex strata to the NIH age-sex distribution and reported the reduction in the crude prevalence gap attributable to age-sex adjustment. Results Crude prevalence was higher in CheXpert than NIH for all labels (all p<0.001). After age-sex standardisation, CheXpert prevalence decreased for every label, indicating that demographics account for a substantial share of between-dataset differences. For consolidation, the crude gap of 1.96 percentage points (6.12% vs 4.16%) decreased to a standardised gap of 1.47 percentage points (CheXpert standardised 5.63% vs NIH 4.16%), representing approximately a 25% reduction. For atelectasis, the gap declined from 4.85 to 2.84 percentage points (41% reduction approx.). For pleural effusion, the gap declined from 28.10 to 19.03 percentage points (32% reduction approx.). For edema, the gap declined from 21.70 to 14.78 percentage points (32% reduction approx.). For cardiomegaly, the gap declined from 9.45 to 6.55 percentage points (31% reduction approx.). Across labels, age-sex standardisation explained approximately 25% to 40% of the crude prevalence differences. Conclusion A simple age-sex standardisation step explains a large proportion of apparent label prevalence differences between NIH ChestX-ray14 and CheXpert. Routine reporting of standardised prevalence alongside crude estimates and demographic composition can improve fairness and interpretability in cross-dataset benchmarking and reduce the risk of attributing demographic composition effects to labelling or model performance.

›Data sources & pipeline

Pipeline:MetadataData-paper checkEnrichmentCitation networkScoring

Enrichment:Pending

FAIR Checklist

Context only (not used in score)

Findable (1/2)

Has DOI

Accessible (1/2)

Open Access

Interoperable (0/2)

Reusable (1/3)

Dataset classification

FAIR checklist signals are shown for context only and do not affect DataRank scoring.

41FAIR score

F Findable

A Accessible

I Interoperable

R Reusable

Top 79% by FAIRLLM-assessed✓ full text read

Calibrated FAIR score — a parallel quality metric, independent of the DataRank citation score. See the full evaluation →

Authors (5)

Maiar ElhariryORCID,Amrit Chirrimar,Ashrit Chohan,Ahmed BadawyORCID,Amr Badawy