Demo corpus. Scores are computed on a select set of biomedical papers and datasets and may be inaccurate for papers outside this corpus, because DataRank relies on network effects that improve with scale. We aim to expand this into a fully open resource pending additional funding.

CheXpert: A Large Chest Radiograph Dataset with Uncertainty Labels and Expert Comparison

Proceedings of the AAAI Conference on Artificial Intelligence (2019) · DOI: 10.1609/aaai.v33i01.3301590 · Source: DataRank Database

CheXpert: A Large Chest Radiograph Dataset with Uncertainty Labels and Expert Comparison is a dataset published in Proceedings of the AAAI Conference on Artificial Intelligence (2019). On the S-index it has a DataRank of 13.1, placing it in the top 15.3% of the data-sharing corpus. It has been cited 1,710 times, with 189 citing works in its 1-hop citation network. Its calibrated FAIR score is 40/100.

DataRank: 13.1 · Top 15% (percentile)

Dataset · Open Access · 1,710 citations · base score 6.0

datarank_citation_only_1hop_v6 · scope data_only · Methodology

Abstract

Large, labeled datasets have driven deep learning methods to achieve expert-level performance on a variety of medical imaging tasks. We present CheXpert, a large dataset that contains 224,316 chest radiographs of 65,240 patients. We design a labeler to automatically detect the presence of 14 observations in radiology reports, capturing uncertainties inherent in radiograph interpretation. We investigate different approaches to using the uncertainty labels for training convolutional neural networks that output the probability of these observations given the available frontal and lateral radiographs. On a validation set of 200 chest radiographic studies which were manually annotated by 3 board-certified radiologists, we find that different uncertainty approaches are useful for different pathologies. We then evaluate our best model on a test set composed of 500 chest radiographic studies annotated by a consensus of 5 board-certified radiologists, and compare the performance of our model to that of 3 additional radiologists in the detection of 5 selected pathologies. On Cardiomegaly, Edema, and Pleural Effusion, the model ROC and PR curves lie above all 3 radiologist operating points. We release the dataset to the public as a standard benchmark to evaluate performance of chest radiograph interpretation models.

Data sources & pipeline

Pipeline: Metadata → Data-paper check → Enrichment → Citation network → Scoring

Enrichment: Pending

FAIR Checklist

Context only (not used in score)

Findable (1/2)

Has DOI

Accessible (1/2)

Open Access

Interoperable (0/2)

Reusable (1/3)

Dataset classification

FAIR checklist signals are shown for context only and do not affect DataRank scoring.

FAIR score: 40/100

F Findable

A Accessible

I Interoperable

R Reusable

Top 80% by FAIR · LLM-assessed · ✓ full text read

Calibrated FAIR score — a parallel quality metric, independent of the DataRank citation score. See the full evaluation →

DataRank Breakdown

Base Score 7% · Citation Network 93%

Base Score Contribution

0.896

From this paper's citation signal

Citation Network Contribution

12.2

From 189 citing papers with measurable signal

Learn more about DataRank methodology →

Top 5 citers driving the network score

Ranked by citation count — the same ordering the engine uses when summing log1p(C_q) over citers.

Ultra-lightweight uncertainty-aware ensemble for large-scale multi-class medical MRI diagnosis
Frontiers in Radiology · 2025 · 2 citations · DataRank 0.165
Generalized Nesterov-Boosted Adversarial Data Augmentation Framework for Multi-Label Chest X-Ray Image Classification
2025 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) · 2025 · 0 citations · DataRank 0
DeViDe: Faceted Medical Knowledge to Enhance Vision Foundation Model Pretraining for Radiology
2025 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) · 2025 · 0 citations · DataRank 0
Classifying Pulmonary Diseases Using Small Multimodal Model on X-Ray Images and Reports
2025 RIVF International Conference on Computing and Communication Technologies (RIVF) · 2025 · 0 citations · DataRank 0
An Interpretable Chest X-ray Classification Framework Using Prototype Memory and Counterfactual Consistency
Cureus · 2026 · 0 citations · DataRank 0

Why this DataRank?

DataRank blends this paper's own citation count with the influence of the papers that cite it. Here, roughly 7% comes from its base citations and 93% from the citation network (189 citing papers contributed measurable signal).
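
As a quick arithmetic check of the split quoted here, the two contribution values shown in the DataRank Breakdown (0.896 and 12.2) can be summed and converted to shares. This only checks the displayed, rounded numbers against each other; the variable names are illustrative and not part of DataRank.

```python
# Values as displayed on the page (already weighted by their share of the score).
base_contribution = 0.896     # "Base Score Contribution"
network_contribution = 12.2   # "Citation Network Contribution"

total = base_contribution + network_contribution
base_share = base_contribution / total
network_share = network_contribution / total

print(f"total = {total:.1f}")                    # total = 13.1
print(f"base share = {base_share:.0%}")          # base share = 7%
print(f"network share = {network_share:.0%}")    # network share = 93%
```

The sum matches the reported DataRank of 13.1, and the shares match the 7% / 93% split.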

Base score B(p): log1p(citation_count) — grows sub-linearly, so a paper with 1,000 citations is not 10× a paper with 100.
Network N(p): Σ over citers of log1p(C_q) ÷ max(outdegree_q, 1). Being cited by a highly-cited paper with few references counts most.
Damping factor d = 0.85: DataRank = (1−d)·B(p) + d·N(p) — the two cards above are each already multiplied by their share.
Self-citations excluded: Citers sharing any OpenAlex author ID with this paper are filtered out before the network sum.
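
The four rules above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the engine's actual code: the `Citer` record, its field names, and the function names are hypothetical, the natural-log `log1p` is assumed, and the toy inputs are invented. It does not attempt to reproduce the values shown on this page, since the engine's exact inputs are not listed here.

```python
from dataclasses import dataclass
from math import log1p
from typing import FrozenSet, Iterable, Tuple

D = 0.85  # damping factor stated on the page


@dataclass(frozen=True)
class Citer:
    """Hypothetical record for one citing work (field names are assumptions)."""
    cited_by_count: int           # C_q: citations received by the citer
    outdegree: int                # number of references the citer makes
    author_ids: FrozenSet[str]    # OpenAlex author IDs of the citer


def base_score(citation_count: int) -> float:
    """B(p) = log1p(citation_count)."""
    return log1p(citation_count)


def network_score(citers: Iterable[Citer], paper_author_ids: FrozenSet[str]) -> float:
    """N(p) = sum over citers of log1p(C_q) / max(outdegree_q, 1)."""
    total = 0.0
    for q in citers:
        # Self-citation filter: skip citers sharing any author ID with the paper.
        if q.author_ids & paper_author_ids:
            continue
        total += log1p(q.cited_by_count) / max(q.outdegree, 1)
    return total


def datarank(citation_count: int, citers: Iterable[Citer],
             paper_author_ids: FrozenSet[str], d: float = D) -> Tuple[float, float, float]:
    """Return (base part, network part, total) for (1-d)*B(p) + d*N(p)."""
    base_part = (1 - d) * base_score(citation_count)
    network_part = d * network_score(citers, paper_author_ids)
    return base_part, network_part, base_part + network_part


# Toy example: the third citer shares author "A9" with the paper and is dropped.
toy_citers = [
    Citer(cited_by_count=99, outdegree=10, author_ids=frozenset({"A1"})),
    Citer(cited_by_count=0, outdegree=0, author_ids=frozenset({"A2"})),
    Citer(cited_by_count=50, outdegree=5, author_ids=frozenset({"A9"})),
]
base_part, network_part, score = datarank(100, toy_citers, frozenset({"A9"}))
print(base_part, network_part, score)
```

Dividing by `max(outdegree, 1)` avoids a division by zero for citers with no recorded references, and is why a highly cited citer with a short reference list contributes the most.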

Citers are pulled from OpenAlex sorted by cited_by_count:desc and capped per paper, so when the cap binds we keep the highest-signal references and the score is reproducible across reruns.
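
A request of this kind might look like the sketch below, which only builds the URL and makes no network call. It assumes the public OpenAlex works endpoint with its `cites:` filter, `sort`, and `per-page` parameters. The function name, the placeholder work ID, and the cap value are hypothetical, since the page does not state the cap DataRank actually uses.

```python
from urllib.parse import urlencode

OPENALEX_WORKS = "https://api.openalex.org/works"


def citers_url(openalex_work_id: str, cap: int = 200) -> str:
    """Build a URL listing works that cite `openalex_work_id`, most-cited first.

    `cap` is an assumed per-paper limit; OpenAlex allows at most 200 results per page.
    """
    params = {
        "filter": f"cites:{openalex_work_id}",
        "sort": "cited_by_count:desc",
        "per-page": min(cap, 200),
    }
    # Keep ':' unescaped so the filter and sort values stay readable.
    return f"{OPENALEX_WORKS}?{urlencode(params, safe=':')}"


# "W0000000000" is a placeholder, not the real OpenAlex ID of this dataset paper.
print(citers_url("W0000000000", cap=50))
```

Sorting before truncating is what makes the capped set stable: the same top-cited citers are kept on each run as long as their citation counts do not change.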

Read the full methodology →
