Open Source
Resources & Artifacts
Every component of the DataRank pipeline is open. Explore our notebooks, datasets, APIs, and models, all freely available for research and educational use.
Open-source artifacts: 4
Labeled papers: 4,000+
Metadata sources: 14
License: MIT
Data Paper Classification Notebook
SciBERT fine-tuning notebook for data paper classification. Trains a binary classifier to distinguish data papers from regular publications.
4K Papers Labeled Dataset
4,000+ papers labeled as data paper or not, used for SciBERT training. Curated from GigaScience, Dryad, and PubMed sources.
DOI Metadata API
Open-source API to fetch metadata from 14 sources (CrossRef, OpenAlex, DataCite, Zenodo, Dryad, and more) given a DOI.
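As a rough illustration of DOI-based lookup, the sketch below normalizes a DOI and builds request URLs for three of the sources named above, using their public REST endpoints. It is not the DataRank API itself: the names `ENDPOINTS`, `normalize_doi`, `metadata_urls`, and `fetch_json` are hypothetical, and the DOI `10.1234/example` is a placeholder.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Public endpoint patterns for three of the listed sources.
# The other sources are omitted here; this is an illustrative subset.
ENDPOINTS = {
    "crossref": "https://api.crossref.org/works/{doi}",
    "openalex": "https://api.openalex.org/works/https://doi.org/{doi}",
    "datacite": "https://api.datacite.org/dois/{doi}",
}


def normalize_doi(doi: str) -> str:
    """Strip common prefixes and lowercase the DOI (DOIs are case-insensitive)."""
    doi = doi.strip()
    lowered = doi.lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if lowered.startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi.lower()


def metadata_urls(doi: str) -> dict:
    """Return one request URL per source for the given DOI."""
    safe = quote(normalize_doi(doi), safe="/")
    return {name: pattern.format(doi=safe) for name, pattern in ENDPOINTS.items()}


def fetch_json(url: str, timeout: float = 10.0) -> dict:
    """Fetch and decode a JSON response (performs a network request)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Placeholder DOI; substitute a real one before calling fetch_json.
    for source, url in metadata_urls("doi:10.1234/example").items():
        print(source, url)
```

Each source returns its own JSON schema, so a real aggregator would still need a per-source mapping step to merge the records.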
SciBERT Data Paper Classifier
Fine-tuned SciBERT model for binary classification of data papers. Achieves high precision on the 4K labeled dataset.
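A minimal sketch of how a binary sequence classifier like this could be applied with the Hugging Face `transformers` library is shown below. Several things are assumptions: `model_path` is a hypothetical location of the fine-tuned checkpoint (this page does not name one), label index 1 is assumed to mean "data paper", and the 0.5 threshold is illustrative.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def is_data_paper(logits, threshold=0.5):
    """Assumes index 1 is the 'data paper' class; the threshold is illustrative."""
    return softmax(logits)[1] >= threshold


def classify(text, model_path):
    """Run a fine-tuned checkpoint on one title or abstract.

    model_path is a hypothetical local path or hub ID for the checkpoint.
    Requires the torch and transformers packages.
    """
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForSequenceClassification.from_pretrained(model_path)
    model.eval()

    # BERT-style models accept at most 512 tokens, so longer text is truncated.
    inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    return is_data_paper(logits)
```

Keeping the thresholding separate from model loading makes it easy to trade precision against recall without touching the inference code.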
How to cite
If you use any of these resources in your research, please cite our work.
@misc{thesindex2026,
title = {DataRank v4.0: Citation-Only 1-Hop Scoring for Scholarly Influence},
author = {Korkusuz, Zehra and Huang, Kuan-lin and Edmunds, Scott C.},
year = {2026},
url = {https://thesindex.org}
}
Build with DataRank
Use our open API, datasets, and models to power your own research tools and analyses.