🏆 Finalist: NIH Data Sharing Index (“S-Index”) Challenge

About DataRank

The metric data sharing has been waiting for

DataRank measures the impact of a scientific paper by combining its own citation count with the citations of the papers that cite it. FAIR and DataCite metadata sit alongside each score for context; they aren't baked into the number.

01

Multi-source corpus

NIH-funded papers, enriched with OpenAlex citation data and DataCite/FAIR repository signals for context.

02

DataRank engine

Each paper's own citations plus a one-step propagation through the papers that cite it. Heavily cited citers carry more weight.

03

Percentile ranking

Data papers mapped to a 0–100 percentile against the rest of the data-paper corpus. The 99th percentile is the top 1%.

The pipeline

From DOI to DataRank

Six steps from raw DOI metadata to a percentile-ranked score.

01

Aggregate metadata (🐬 DOIphin)

DOIphin, our federated aggregator, cross-walks each DOI across 14+ scholarly APIs (CrossRef, OpenAlex, DataCite, Zenodo, Dryad, and more) into one unified record, and builds the citation/link graph that DataRank scores. FAIR/DataCite signals are kept as context.
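As an illustration of what a cross-walk into one unified record can look like, here is a minimal sketch. It is not DOIphin's actual code: the function names, the "first non-empty value wins" rule, the record layout, and the example DOI are all assumptions made for this example.

```python
from collections import defaultdict


def normalize_doi(raw: str) -> str:
    """Lower-case a DOI and strip common URL/scheme prefixes."""
    doi = raw.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi


def merge_records(records):
    """Merge (source, record) pairs into one unified dict per DOI.

    Assumed policy (hypothetical): the first non-empty value seen for a
    field wins, and `sources` remembers which API supplied it.
    """
    unified = defaultdict(lambda: {"fields": {}, "sources": {}})
    for source, record in records:
        entry = unified[normalize_doi(record["doi"])]
        for key, value in record.items():
            # Skip the join key itself and empty placeholders.
            if key == "doi" or value in (None, "", []):
                continue
            if key not in entry["fields"]:
                entry["fields"][key] = value
                entry["sources"][key] = source
    return dict(unified)


# Made-up records for a made-up DOI, as two sources might report it.
records = [
    ("crossref", {"doi": "https://doi.org/10.1234/ABC", "title": "T", "license": None}),
    ("datacite", {"doi": "10.1234/abc", "title": "Other", "license": "CC-BY-4.0"}),
]
merged = merge_records(records)
```

Keeping a per-field `sources` map is one way to preserve provenance, so a reader can see which API each value came from.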

02

Identify data papers (DrPaper)

Our DrPaper classifier, a fine-tuned SciBERT model, reads each paper and decides whether its main contribution is a dataset (cohort, atlas, benchmark, database) or a method, theory, or review. Only data papers receive a percentile ranking.

03

FAIR checklist

Each paper is evaluated against the FAIR principles (Findable, Accessible, Interoperable, Reusable). The results are shown alongside scores for transparency and are not part of the score itself.

04

Build the citation graph

For each paper we fetch its citers (the papers that cite it) from OpenAlex to measure downstream influence.
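A minimal sketch of how one page of citers could be requested and parsed is shown below. It assumes the public OpenAlex works endpoint with its `cites:` filter, `select` parameter, and cursor paging; the helper names and the work ID `W123` are hypothetical, and the request itself is left to the caller.

```python
from urllib.parse import urlencode

OPENALEX_WORKS = "https://api.openalex.org/works"


def citers_url(work_id: str, cursor: str = "*", per_page: int = 200) -> str:
    """Build a request URL for the works that cite `work_id`."""
    params = {
        "filter": f"cites:{work_id}",      # works citing the given work
        "select": "id,cited_by_count",     # only the fields needed downstream
        "per-page": per_page,
        "cursor": cursor,                  # "*" asks for the first page
    }
    return f"{OPENALEX_WORKS}?{urlencode(params)}"


def parse_citers(payload: dict):
    """Return ([(citer_id, citer_citation_count)], next_cursor) for one page."""
    citers = [
        (work["id"], work.get("cited_by_count", 0))
        for work in payload.get("results", [])
    ]
    next_cursor = payload.get("meta", {}).get("next_cursor")
    return citers, next_cursor


url = citers_url("W123")  # hypothetical OpenAlex work ID
```

A caller would fetch `url`, pass the decoded JSON to `parse_citers`, and repeat with the returned cursor until it comes back as `None`.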

05

Compute DataRank

Combine the paper's own citation count with a one-step propagation through its citers, weighted so heavily cited citers count more. Self-citations are removed.
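The exact formula is not given here, so the sketch below only illustrates the shape of such a score. The logarithmic weight, the `alpha` mixing factor, and the rule that a citer sharing any author with the paper is a self-citation are all assumptions for this example, not the published DataRank definition.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Citer:
    authors: frozenset   # author identifiers of the citing paper
    cited_by_count: int  # how often the citing paper is itself cited


def datarank_sketch(paper_authors, citers, alpha=0.5):
    """Own citations plus a one-step, citation-weighted term over citers.

    Assumed form (hypothetical): score = n + alpha * sum(log(1 + c_j)),
    where n is the number of non-self citers and c_j is the citation
    count of citer j.
    """
    own = frozenset(paper_authors)
    # Drop self-citations: any citer that shares an author with the paper.
    kept = [c for c in citers if not (c.authors & own)]
    # Heavily cited citers contribute more, with diminishing returns.
    propagated = sum(math.log1p(c.cited_by_count) for c in kept)
    return len(kept) + alpha * propagated


example = datarank_sketch(
    {"a1"},
    [
        Citer(frozenset({"a1"}), 100),  # self-citation, ignored
        Citer(frozenset({"b1"}), 0),
        Citer(frozenset({"b2"}), 9),
    ],
)
```

The logarithm is one common way to let a highly cited citer count for more without letting a single blockbuster paper dominate the score.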

06

Rank as a percentile

Sort data papers by DataRank and map each to a percentile in [0, 100] within the data-paper corpus. A paper at the 99th percentile is in the top 1% by impact.
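One way to map scores to percentiles in [0, 100] is sketched below. The tie handling (tied papers share the lower rank), the `n - 1` denominator, and the single-paper convention are assumptions for this example; other percentile definitions would shift the numbers slightly.

```python
from bisect import bisect_left


def percentiles(scores):
    """Map each score to the percentage of the other papers it strictly beats."""
    ordered = sorted(scores)
    n = len(ordered)
    if n < 2:
        # Hypothetical convention: a lone paper sits at the top.
        return [100.0 for _ in scores]
    # bisect_left counts how many scores are strictly lower than s.
    return [100.0 * bisect_left(ordered, s) / (n - 1) for s in scores]


ranks = percentiles([3.0, 1.0, 2.0, 2.0, 10.0])
# ranks == [75.0, 0.0, 25.0, 25.0, 100.0]
```

Under this definition the lowest-scoring paper lands at 0 and the highest at 100, so the full closed range is used.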