🏆 Finalist: NIH Data Sharing Index (“S-Index”) Challenge

Open Source

Resources & Artifacts

Every component of the DataRank pipeline is open. Explore our notebooks, datasets, APIs, and models, all freely available for research and educational use.

4 open-source artifacts

4,000+ labeled papers

14 metadata sources

MIT license

Notebook

Data Paper Classification Notebook

SciBERT fine-tuning notebook for data paper classification. Trains a binary classifier to distinguish data papers from regular publications.
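
As a rough illustration of what such a fine-tuning run can look like, here is a minimal sketch using the Hugging Face Transformers library and the public `allenai/scibert_scivocab_uncased` checkpoint. It is not the notebook's actual code: the `Paper` record, the label names, the `to_text` helper, and the hyperparameters are all assumptions made for this example.

```python
from dataclasses import dataclass

# Public SciBERT checkpoint on the Hugging Face Hub.
BASE_MODEL = "allenai/scibert_scivocab_uncased"
# Hypothetical label names; the real dataset may encode labels differently.
LABELS = {"regular": 0, "data_paper": 1}


@dataclass
class Paper:
    """Hypothetical record shape for one labeled paper."""
    title: str
    abstract: str
    label: str


def to_text(paper):
    """Join title and abstract into the single string fed to the tokenizer."""
    return f"{paper.title}. {paper.abstract}".strip()


def fine_tune(papers, output_dir="scibert-data-paper"):
    """Fine-tune SciBERT as a two-class classifier on a list of Paper records."""
    # Heavy dependencies are imported here so the helpers above stay lightweight.
    import torch
    from transformers import (
        AutoModelForSequenceClassification,
        AutoTokenizer,
        Trainer,
        TrainingArguments,
    )

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
    model = AutoModelForSequenceClassification.from_pretrained(
        BASE_MODEL, num_labels=len(LABELS)
    )

    encodings = tokenizer(
        [to_text(p) for p in papers],
        truncation=True,
        padding=True,
        max_length=512,
    )
    labels = [LABELS[p.label] for p in papers]

    class PaperDataset(torch.utils.data.Dataset):
        def __len__(self):
            return len(labels)

        def __getitem__(self, idx):
            item = {k: torch.tensor(v[idx]) for k, v in encodings.items()}
            item["labels"] = torch.tensor(labels[idx])
            return item

    # Illustrative hyperparameters only, not tuned values.
    args = TrainingArguments(
        output_dir=output_dir,
        num_train_epochs=3,
        per_device_train_batch_size=16,
    )
    Trainer(model=model, args=args, train_dataset=PaperDataset()).train()
    return model, tokenizer
```

Calling `fine_tune(papers)` downloads the base checkpoint and trains on whatever records are passed in; a real run would also hold out a validation split to measure precision and recall.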

Dataset

4K Papers Labeled Dataset

More than 4,000 papers, each labeled as a data paper or a regular publication, used to train the SciBERT classifier. Curated from GigaScience, Dryad, and PubMed.

API

DOI Metadata API

Open-source API to fetch metadata from 14 sources (CrossRef, OpenAlex, DataCite, Zenodo, Dryad, and more) given a DOI.
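
The idea of resolving one DOI against several metadata sources can be sketched as follows. This is not the DataRank API itself: it calls the public REST endpoints of three of the named sources (CrossRef, DataCite, OpenAlex) directly, and the function names `normalize_doi`, `build_urls`, and `fetch_all` are hypothetical helpers introduced for illustration.

```python
import json
import urllib.request
from urllib.parse import quote

# Public lookup endpoints for three of the sources named above.
SOURCES = {
    "crossref": "https://api.crossref.org/works/{doi}",
    "datacite": "https://api.datacite.org/dois/{doi}",
    "openalex": "https://api.openalex.org/works/https://doi.org/{doi}",
}


def normalize_doi(raw):
    """Strip common prefixes and lowercase the DOI (DOIs are case-insensitive)."""
    doi = raw.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi.lower()


def build_urls(raw_doi):
    """Return one lookup URL per source for the given DOI."""
    doi = quote(normalize_doi(raw_doi), safe="/")
    return {name: template.format(doi=doi) for name, template in SOURCES.items()}


def fetch_all(raw_doi, timeout=10):
    """Query every source; a failing source yields an error entry, not an exception."""
    results = {}
    for name, url in build_urls(raw_doi).items():
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                results[name] = json.load(resp)
        except (OSError, ValueError) as exc:  # network/HTTP errors, bad JSON
            results[name] = {"error": str(exc)}
    return results
```

`build_urls("10.1234/example")` (a made-up DOI) only constructs the URLs, while `fetch_all` performs the network requests. Collecting per-source errors instead of raising keeps one unavailable source from hiding the metadata returned by the others.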

Model

SciBERT Data Paper Classifier

Fine-tuned SciBERT model for binary classification of data papers. Achieves high precision on the 4K labeled dataset.
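
For readers who want to check precision on their own held-out split, the metric is the fraction of papers predicted as data papers that really are data papers, TP / (TP + FP). The sketch below computes it alongside recall in plain Python; the function name `precision_recall` is a hypothetical helper, and the labels in the usage note are made-up toy values, not results from the model.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the positive class.

    y_true and y_pred are equal-length sequences of class labels.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    fn = sum(1 for t, p in pairs if p != positive and t == positive)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall
```

With toy labels `y_true = [1, 1, 0, 0, 1]` and `y_pred = [1, 0, 0, 1, 1]` there are two true positives, one false positive, and one false negative, so both precision and recall come out to 2/3.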

How to cite

If you use any of these resources in your research, please cite our work.

@misc{thesindex2026,
  title   = {DataRank v4.0: Citation-Only 1-Hop Scoring for Scholarly Influence},
  author  = {Korkusuz, Zehra and Huang, Kuan-lin and Edmunds, Scott C.},
  year    = {2026},
  url     = {https://thesindex.org}
}

Build with DataRank

Use our open API, datasets, and models to power your own research tools and analyses.