About DataRank
The metric data sharing has been waiting for
DataRank measures the impact of a scientific paper by combining its own citation count with the citations of the papers that cite it. FAIR and DataCite metadata sit alongside each score for context; they aren't baked into the number.
Multi-source corpus
NIH-funded papers, enriched with OpenAlex citation data and DataCite/FAIR repository signals for context.
DataRank engine
Each paper's own citations plus a one-step propagation through the papers that cite it. Heavily-cited citers carry more weight.
Percentile ranking
Data papers mapped to a 0–100 percentile against the rest of the data-paper corpus. The 99th percentile is the top 1%.
The pipeline
From DOI to DataRank
Six steps from raw DOI metadata to a percentile-ranked score.
Aggregate metadata (DOIphin)
DOIphin, our federated aggregator, cross-walks each DOI across 14+ scholarly APIs (CrossRef, OpenAlex, DataCite, Zenodo, Dryad, and more) into one unified record, and builds the citation/link graph that DataRank scores. FAIR/DataCite signals are kept as context.
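As a rough illustration of what cross-walking into one unified record can involve, the sketch below normalizes DOIs and merges per-source records field by field. It is not DOIphin's actual code: the function names, the record shape, and the "first non-empty value wins" rule are all assumptions made for this example.

```python
def normalize_doi(raw: str) -> str:
    """Lowercase a DOI and strip common resolver/prefix forms so records line up."""
    doi = raw.strip().lower()
    prefixes = (
        "https://doi.org/",
        "http://doi.org/",
        "https://dx.doi.org/",
        "http://dx.doi.org/",
        "doi:",
    )
    for prefix in prefixes:
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi


def merge_records(records):
    """Merge (source_name, record) pairs into one record per DOI.

    Hypothetical merge rule: the first non-empty value seen for a field is
    kept, and every contributing source is listed under "sources".
    """
    unified = {}
    for source, rec in records:
        doi = normalize_doi(rec["doi"])
        entry = unified.setdefault(doi, {"doi": doi, "sources": []})
        entry["sources"].append(source)
        for field, value in rec.items():
            if field == "doi" or value in (None, "", []):
                continue  # skip the key itself and empty values
            entry.setdefault(field, value)
    return unified
```

Keeping the list of contributing sources on each record makes it possible to trace a field back to where it came from, which helps when two APIs disagree.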
How DOIphin works
Identify data papers (DrPaper)
Our DrPaper classifier, a fine-tuned SciBERT model, reads each paper and decides whether its main contribution is a dataset (cohort, atlas, benchmark, database) rather than a method, theory, or review. Only data papers receive a percentile ranking.
How DrPaper works
FAIR checklist
Each paper is evaluated against the FAIR principles (Findable, Accessible, Interoperable, Reusable). The results are shown alongside scores for transparency; they are not part of the score itself.
Build the citation graph
For each paper we fetch its citers (the papers that cite it) from OpenAlex to measure downstream influence.
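The sketch below shows one way the citer lookup and the resulting graph could be represented. The URL uses the `cites:` filter on the public OpenAlex works endpoint as we understand it. The helper names and the in-memory edge format are assumptions for illustration, and paging and the HTTP calls are left out.

```python
from collections import defaultdict

OPENALEX_WORKS = "https://api.openalex.org/works"


def citers_url(work_id: str) -> str:
    """URL listing the works that cite `work_id` (an OpenAlex ID such as W123)."""
    return f"{OPENALEX_WORKS}?filter=cites:{work_id}"


def build_citers(edges):
    """Turn (citing, cited) pairs into a map: cited paper -> set of its citers.

    Hypothetical helper: pairs where a paper cites itself are dropped,
    and duplicate edges collapse because citers are stored in a set.
    """
    citers = defaultdict(set)
    for citing, cited in edges:
        if citing != cited:
            citers[cited].add(citing)
    return dict(citers)
```

Storing citers per cited paper matches how the score is described: each paper only needs a one-step view of who cites it.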
Compute DataRank
Combine the paper's own citation count with a one-step propagation through its citers, weighted so heavily-cited citers count more. Self-citations are removed.
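To make the idea concrete, here is a minimal sketch of a one-step score. The exact DataRank formula is documented in the methodology and is not reproduced here. The log damping, the `alpha` weight, and the rule that a self-citation is any citer sharing an author with the paper are all assumptions chosen for illustration.

```python
import math


def datarank_sketch(paper, citation_counts, citers, authors, alpha=0.5):
    """Illustrative one-hop score; NOT the published DataRank formula.

    citation_counts: paper id -> number of citations that paper has received
    citers:          paper id -> set of paper ids that cite it
    authors:         paper id -> set of author ids
    alpha:           hypothetical weight on the propagated term
    """
    own_authors = authors.get(paper, set())
    # Assumed self-citation rule: drop citers that share an author with the paper.
    external = [
        c for c in citers.get(paper, ())
        if not (authors.get(c, set()) & own_authors)
    ]
    own = len(external)  # the paper's own citations, self-citations removed
    # Heavily cited citers contribute more; log1p keeps one huge citer from dominating.
    propagated = sum(math.log1p(citation_counts.get(c, 0)) for c in external)
    return own + alpha * propagated
```

With this shape, two papers with the same citation count can still differ: the one cited by widely cited work scores higher.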
Rank as a percentile
Sort data papers by DataRank and map each to a percentile in [0, 100] within the data-paper corpus. A paper at the 99th percentile is in the top 1% by impact.
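The mapping from scores to percentiles can be sketched as follows. This assumes one particular convention (the share of papers with a strictly lower score, scaled so the lowest paper gets 0 and the highest gets 100, with ties sharing a percentile). The production ranking may handle ties and endpoints differently.

```python
from bisect import bisect_left


def percentile_ranks(scores):
    """Map paper id -> percentile in [0, 100] within the given corpus.

    scores: paper id -> DataRank-style score (higher is better).
    Assumed convention: percentile = 100 * (papers scoring strictly lower) / (n - 1).
    """
    ordered = sorted(scores.values())
    n = len(ordered)
    if n == 1:
        # A single-paper corpus has nothing to compare against; treat it as the top.
        return {paper: 100.0 for paper in scores}
    return {
        paper: 100.0 * bisect_left(ordered, score) / (n - 1)
        for paper, score in scores.items()
    }
```

Because the ranking is computed only over data papers, a percentile says how a dataset compares with other datasets, not with methods or review papers.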
Go deeper
Explore the methodology, meet the team, or browse the open-source tools we've built.
Methodology
Full technical details on the citation-only 1-hop scoring system (v6.0) and percentile ranking.
Read more
Team
Meet the computational scientists and open-science advocates behind DataRank.
Meet the team
Resources
Open-source notebooks, datasets, APIs, and models, all freely available.
View artifacts