🏆 Finalist: NIH Data Sharing Index ("S-Index") Challenge

About DataRank

The metric data sharing has been waiting for

theSindex.org computes a DataRank score for each scientific paper using a citation-only, 1-hop model that combines the paper's own (seed) citation count with the citation signal of the papers that cite it. FAIR and DataCite metadata are preserved for context and transparency.

01

Multi-Source Dataset

Papers from the NIH DOI metadata repository, enriched with OpenAlex citation graphs plus FAIR/DataCite context metadata.

02

DataRank Engine

A citation-only, 1-hop approximation with damping factor d = 0.85: a base score from the seed paper's citations plus signal propagated from the citations of its citers.

03

Percentile Ranking

Papers are ranked by DataRank percentile and placed into five tiers (S1–S5), giving researchers, funders, and institutions a clear benchmark.

The pipeline

From DOI to DataRank

Five stages transform raw metadata into a percentile-ranked citation-network score with explicit FAIR/DataCite context.

01

Data Ingestion

We parse DOI metadata and enrich each paper with DataCite/FAIR/repository context for interpretation and UI display.

02

FAIR Checklist

Each paper is evaluated against a FAIR checklist for context. FAIR signals are surfaced to users but are not used in v4 scoring.

03

Citation Graph

We fetch the citer neighbourhood from OpenAlex, building a directed graph that captures the flow of scholarly influence.
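As a rough illustration of this stage, the sketch below builds a directed citer graph from an already-fetched list of (citer, cited) pairs. It does not call OpenAlex. The function name `build_citer_graph`, the input shapes, and the self-citation rule (drop an edge when the two papers share an author ID) are assumptions made for the example, not the site's documented behaviour.

```python
from collections import defaultdict


def build_citer_graph(edges, authors):
    """Build a directed citation graph from (citer_id, cited_id) pairs.

    edges   -- iterable of (citer_id, cited_id) tuples
    authors -- dict mapping paper id -> set of author ids (hypothetical input)

    Returns (citers_of, outdeg):
      citers_of[p] is the set of papers citing p,
      outdeg[q] is the number of distinct papers q cites within this graph.
    """
    citers_of = defaultdict(set)
    cites = defaultdict(set)
    for citer, cited in edges:
        if citer == cited:
            continue  # drop self-loops
        # Assumed self-citation rule: skip edges between papers sharing an author.
        if authors.get(citer, set()) & authors.get(cited, set()):
            continue
        citers_of[cited].add(citer)
        cites[citer].add(cited)
    outdeg = {q: len(targets) for q, targets in cites.items()}
    return dict(citers_of), outdeg
```

Storing both the reverse adjacency (who cites a paper) and the out-degree of each citer keeps the later scoring step a simple lookup per citer.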

04

DataRank Computation

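Citation-only 1-hop model: DataRank(p) = (1 - d) * log1p(C_p) + d * Σ_q [log1p(C_q) / outdeg(q)], with d = 0.85 and self-citation filtering. Here C_p is the citation count of the seed paper p, the sum runs over the papers q that cite p, C_q is the citation count of citer q, and outdeg(q) is the out-degree of q in the citation graph.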
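The formula can be sketched in a few lines of Python. This is a minimal illustration, not the production engine: the function name `datarank` and the input shape (a list of `(citations, outdeg)` pairs for the citers that remain after self-citation filtering) are assumptions, as is the choice to skip citers with an out-degree of zero.

```python
import math


def datarank(seed_citations, citers, d=0.85):
    """Citation-only 1-hop score for one paper.

    seed_citations -- citation count C_p of the seed paper
    citers         -- iterable of (C_q, outdeg_q) pairs, one per citing paper q,
                      assumed to be already filtered for self-citations
    d              -- damping factor
    """
    base = (1.0 - d) * math.log1p(seed_citations)
    # Each citer contributes its own log-scaled citation count,
    # divided by the number of papers it cites.
    propagated = sum(
        math.log1p(c_q) / outdeg_q
        for c_q, outdeg_q in citers
        if outdeg_q > 0  # assumption: ignore citers with no recorded out-degree
    )
    return base + d * propagated
```

For example, `datarank(10, [(5, 2), (0, 1)])` adds a base term of 0.15 * log1p(10) to a propagated term of 0.85 * log1p(5) / 2. The second citer contributes nothing because log1p(0) is 0.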

05

Percentile & Ranking

Papers are ranked by their DataRank percentile and assigned to tiers (S1–S5), placing each paper in context against all others in the corpus.
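One way to picture this stage is the sketch below, which turns raw scores into percentiles and then into tiers. The tier cut-offs are not stated here, so the example assumes five equal-width percentile bands with S1 as the top band. The function names `percentile_ranks` and `tier`, the band boundaries, and the tier ordering are all hypothetical.

```python
from bisect import bisect_right


def percentile_ranks(scores):
    """Map paper id -> percentile in (0, 100].

    A paper's percentile is the share of papers in the corpus whose
    score is less than or equal to its own.
    """
    ordered = sorted(scores.values())
    n = len(ordered)
    return {pid: 100.0 * bisect_right(ordered, s) / n for pid, s in scores.items()}


def tier(percentile):
    """Assign a tier label, assuming equal 20-point bands with S1 highest."""
    if percentile > 80:
        return "S1"
    if percentile > 60:
        return "S2"
    if percentile > 40:
        return "S3"
    if percentile > 20:
        return "S4"
    return "S5"
```

Because percentiles are computed against the whole corpus, a paper's tier can change when the corpus is refreshed even if its own score does not.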