🏆 Finalist: NIH Data Sharing Index (“S-Index”) Challenge
Demo corpus. Scores are computed on a selected set of biomedical papers and datasets and may be inaccurate for papers outside this corpus, because DataRank relies on network effects that improve with scale. We aim to expand this into a fully open resource, pending additional funding.

MIMIC-IV, a freely accessible electronic health record dataset

Scientific Data (2023) · DOI: 10.1038/s41597-022-01899-x · Source: DataRank Database

“MIMIC-IV, a freely accessible electronic health record dataset” is a dataset published in Scientific Data (2023). On the S-Index it has a DataRank of 13.0, placing it in the top 15.4% of the data-sharing corpus. It has been cited 2,640 times, with 190 citing works in its 1-hop citation network. Its calibrated FAIR score is 59/100.

DataRank 13.0 · Top 15%
Dataset · Open Access · 2,640 citations · base score 7.7
Scoring model: datarank_citation_only_1hop_v6 · scope: data_only

Abstract

Digital data collection during routine clinical practice is now ubiquitous within hospitals. The data contains valuable information on the care of patients and their response to treatments, offering exciting opportunities for research. Typically, data are stored within archival systems that are not intended to support research. These systems are often inaccessible to researchers and structured for optimal storage, rather than interpretability and analysis. Here we present MIMIC-IV, a publicly available database sourced from the electronic health record of the Beth Israel Deaconess Medical Center. Information available includes patient measurements, orders, diagnoses, procedures, treatments, and deidentified free-text clinical notes. MIMIC-IV is intended to support a wide array of research studies and educational material, helping to reduce barriers to conducting clinical research.

Data sources & pipeline
Pipeline: Metadata → Data-paper check → Enrichment → Citation network → Scoring
Enrichment: Pending

FAIR Checklist

Context only (not used in score)
Findable (2/2)
  • Has DOI
  • Indexed in repositories
Accessible (1/2)
  • Open Access
Interoperable (2/2)
  • DataCite relations
  • Linked datasets
Reusable (1/3)
  • Dataset classification

FAIR checklist signals are shown for context only and do not affect DataRank scoring.

FAIR score: 59/100
  • Findable (F): 53
  • Accessible (A): 68
  • Interoperable (I): 75
  • Reusable (R): 42
Top 8% by FAIR · LLM-assessed · full text read

The calibrated FAIR score is a parallel quality metric, independent of the DataRank citation score.

DataRank Breakdown

Base score: 9% · Citation network: 91%

Base score contribution: 1.2 (from this paper's own citation signal)

Citation network contribution: 11.9 (from 190 citing papers with measurable signal)
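As a rough arithmetic check, the 9%/91% split can be recovered from the two contributions. This sketch assumes the base contribution is (1 − d)·B(p) with d = 0.85 and the base score of 7.7 listed on this page; because 11.9 is a rounded display value, the shares and the implied unweighted network score N(p) are approximate.

```python
# Recover the base/network shares from the displayed contributions (approximate).
D = 0.85            # damping factor d from the methodology
BASE_SCORE = 7.7    # base score B(p) listed on this page

base_contribution = (1 - D) * BASE_SCORE   # about 1.155, displayed as 1.2
network_contribution = 11.9                # displayed value, already weighted by d
total = base_contribution + network_contribution

base_share = base_contribution / total
network_share = network_contribution / total
implied_network_score = network_contribution / D   # unweighted N(p)

# prints: base 9%, network 91%, N(p) ≈ 14.0
print(f"base {base_share:.0%}, network {network_share:.0%}, N(p) ≈ {implied_network_score:.1f}")
```

The small gap between 1.155 + 11.9 and the displayed DataRank of 13.0 is consistent with rounding of the displayed network contribution.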


Top 5 citers driving the network score

Ranked by citation count, the same ordering the engine uses when summing log1p(C_q) over citers.

  1. MIMIC-III, a freely accessible critical care database
    Scientific Data (2016) · 8,052 citations · DataRank 18.2 · Top 8%
  2. Best Practices for Scientific Computing
    PLoS Biology (2014) · 710 citations · DataRank 0.985
  3. The MIMIC Code Repository: enabling reproducibility in critical care research
    Journal of the American Medical Informatics Association (2017) · 454 citations · DataRank 13.7 · Top 15%
  4. Automated de-identification of free-text medical records
    BMC Medical Informatics and Decision Making (2008) · 417 citations · DataRank 0.905

Why this DataRank?

DataRank blends this paper's own citation count with the influence of the papers that cite it. Here, roughly 9% comes from its base citations and 91% from the citation network (190 citing papers contributed measurable signal).

Base score B(p)
B(p) = log1p(citation_count). It grows sub-linearly: a paper with 1,000 citations scores about 6.91, roughly 1.5× the 4.62 of a paper with 100, not 10×.
Network N(p)
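N(p) = Σ_q log1p(C_q) / max(outdegree_q, 1), summed over citers q, where C_q is the citer's citation count and outdegree_q is the number of works it references. Being cited by a highly cited paper with few references counts most.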
Damping factor d = 0.85
DataRank = (1 − d)·B(p) + d·N(p). The base and network contributions shown above are each already multiplied by their weight, 0.15 and 0.85 respectively.
Self-citations excluded
Citers sharing any OpenAlex author ID with this paper are filtered out before the network sum.
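The definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the engine's code: the `Paper` record, its field names, and the helper functions are hypothetical, and citation counts, reference counts, and OpenAlex author IDs are taken as given rather than retrieved.

```python
import math
from dataclasses import dataclass, field

D = 0.85  # damping factor d

@dataclass
class Paper:
    # Hypothetical record; fields mirror the quantities named in the methodology.
    citation_count: int
    outdegree: int                                       # number of works it references
    author_ids: set[str] = field(default_factory=set)    # OpenAlex author IDs

def base_score(p: Paper) -> float:
    """B(p) = log1p(citation_count), which grows sub-linearly."""
    return math.log1p(p.citation_count)

def network_score(p: Paper, citers: list[Paper]) -> float:
    """N(p) = sum of log1p(C_q) / max(outdegree_q, 1) over non-self citers q."""
    total = 0.0
    for q in citers:
        if q.author_ids & p.author_ids:  # shares an author ID: self-citation, skip
            continue
        total += math.log1p(q.citation_count) / max(q.outdegree, 1)
    return total

def datarank(p: Paper, citers: list[Paper]) -> float:
    """DataRank = (1 - d) * B(p) + d * N(p)."""
    return (1 - D) * base_score(p) + D * network_score(p, citers)

# Usage: with no qualifying citers, the score reduces to (1 - d) * B(p).
print(round(datarank(Paper(710, outdegree=0), []), 3))  # 0.985
print(round(datarank(Paper(417, outdegree=0), []), 3))  # 0.905
```

The two usage lines reproduce the DataRank values displayed for "Best Practices for Scientific Computing" (710 citations, 0.985) and "Automated de-identification of free-text medical records" (417 citations, 0.905), which is consistent with those papers receiving only the base term.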

Citers are pulled from OpenAlex sorted by cited_by_count:desc and capped per paper, so when the cap binds we keep the highest-signal citing works and the score is reproducible across reruns.
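A sketch of that retrieval step, assuming the public OpenAlex works endpoint with its `cites:` filter, `sort=cited_by_count:desc`, and `per-page` parameter (at most 200 per page). The function names, the default cap of 200, and the local tie-break by work ID are hypothetical choices for illustration; the actual cap used by DataRank is not stated here.

```python
import json
import urllib.parse
import urllib.request

OPENALEX_WORKS = "https://api.openalex.org/works"

def citers_url(work_id: str, cap: int = 200) -> str:
    """Build an OpenAlex query for works citing `work_id`, most-cited first."""
    params = {
        "filter": f"cites:{work_id}",
        "sort": "cited_by_count:desc",
        "per-page": str(min(cap, 200)),  # OpenAlex returns at most 200 per page
    }
    return f"{OPENALEX_WORKS}?{urllib.parse.urlencode(params)}"

def top_citers(works: list[dict], cap: int) -> list[dict]:
    """Keep the `cap` most-cited works; ties broken by ID (an assumption) for stable reruns."""
    ranked = sorted(works, key=lambda w: (-w.get("cited_by_count", 0), w.get("id", "")))
    return ranked[:cap]

def fetch_citers(work_id: str, cap: int = 200) -> list[dict]:
    """Fetch a single page of citers (enough when cap <= 200) and apply the cap."""
    with urllib.request.urlopen(citers_url(work_id, cap), timeout=30) as resp:
        data = json.load(resp)
    return top_citers(data["results"], cap)
```

Sorting again locally after the API sort makes the truncation deterministic even if the server returns equal-count works in varying order.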


[Figure: citation network graph. Node color: center, data paper, data + open access, non-data, selected and links. Node size: percentile rank.]

Authors (13)

Lucas Bulgarelli, Lu Shen, Alvin Gayles, Ayad Shammout, Steven Horng