Demo corpus. Scores are computed on a select set of biomedical papers and datasets and may be inaccurate for papers outside this corpus, because DataRank relies on network effects that improve with scale. We aim to expand this into a fully open resource pending additional funding.

UniProt: a hub for protein information

Nucleic Acids Research (2014) · DOI: 10.1093/nar/gku989 · Source: DataRank Database

"UniProt: a hub for protein information" is a dataset published in Nucleic Acids Research (2014). On the S index it has a DataRank of 23.2, placing it in the top 2.9% of the data-sharing corpus. It has been cited 5,274 times, with 200 citing works in its 1-hop citation network. Its calibrated FAIR score is 59/100.

DataRank: 23.2 · Top 3% (percentile)

Dataset · Open Access · 5,274 citations · base score 8.6


datarank_citation_only_1hop_v6 · scope data_only · Methodology

Abstract

UniProt is an important collection of protein sequences and their annotations, which has doubled in size to 80 million sequences during the past year. This growth in sequences has prompted an extension of UniProt accession number space from 6 to 10 characters. An increasing fraction of new sequences are identical to a sequence that already exists in the database with the majority of sequences coming from genome sequencing projects. We have created a new proteome identifier that uniquely identifies a particular assembly of a species and strain or subspecies to help users track the provenance of sequences. We present a new website that has been designed using a user-experience design process. We have introduced an annotation score for all entries in UniProt to represent the relative amount of knowledge known about each protein. These scores will be helpful in identifying which proteins are the best characterized and most informative for comparative analysis. All UniProt data is provided freely and is available on the web at http://www.uniprot.org/.

Data sources & pipeline

Pipeline: Metadata → Data-paper check → Enrichment → Citation network → Scoring

Enrichment: Pending

FAIR Checklist

Context only (not used in score)

Findable (1/2)

Has DOI

Accessible (1/2)

Open Access

Interoperable (0/2)

Reusable (1/3)

Dataset classification

FAIR checklist signals are shown for context only and do not affect DataRank scoring.

FAIR score: 59


Top 8% by FAIR · LLM-assessed · ✓ full text read

Calibrated FAIR score — a parallel quality metric, independent of the DataRank citation score. See the full evaluation →

DataRank Breakdown

Base Score 6% · Citation Network 94%

Base Score Contribution

1.3

From this paper's citation signal

Citation Network Contribution

21.9

From 200 citing papers with measurable signal

Learn more about DataRank methodology →

Top 5 citers driving the network score

Ranked by citation count — the same ordering the engine uses when summing log1p(C_q) over citers.

1. The FAIR Guiding Principles for scientific data management and stewardship
   Scientific Data (2016) · 17,221 citations · DataRank 1.5
2. The Perseus computational platform for comprehensive analysis of (prote)omics data
   Nature Methods (2016) · 8,788 citations · DataRank 12.5 · Top 16%
3. KEGG as a reference resource for gene and protein annotation
   Nucleic Acids Research (2015) · 7,796 citations · DataRank 15.9 · Top 11%
4. NCBI prokaryotic genome annotation pipeline
   Nucleic Acids Research (2016) · 6,925 citations · DataRank 14.6 · Top 13%
5. A Tissue-Mapped Axolotl De Novo Transcriptome Enables Identification of Limb Regeneration Factors
   Cell Reports (2017) · 1,085 citations · DataRank 8.1 · Top 25%

Why this DataRank?

DataRank blends this paper's own citation count with the influence of the papers that cite it. Here, roughly 6% comes from its base citations and 94% from the citation network (200 citing papers contributed measurable signal).

Base score B(p): log1p(citation_count) — grows sub-linearly, so a paper with 1,000 citations is not 10× a paper with 100.
Network N(p): Σ over citers of log1p(C_q) ÷ max(outdegree_q, 1). Being cited by a highly-cited paper with few references counts most.
Damping factor d = 0.85: DataRank = (1−d)·B(p) + d·N(p) — the two cards above are each already multiplied by their share.
Self-citations excluded: Citers sharing any OpenAlex author ID with this paper are filtered out before the network sum.
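
The formulas above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the DataRank engine itself: the `Citer` record, its field names, and the function names are hypothetical, and the three example citers are invented values chosen only to exercise the code. The one figure taken from this page is the citation count of 5,274, which should reproduce the base score of 8.6 and the base contribution of 1.3 shown above.

```python
import math
from dataclasses import dataclass

DAMPING = 0.85  # d, as stated on this page


@dataclass
class Citer:
    """Hypothetical record for one citing work (field names are assumptions)."""
    cited_by_count: int                  # C_q: how often the citer is itself cited
    outdegree: int                       # number of references the citer makes
    author_ids: frozenset = frozenset()  # author IDs of the citer


def base_score(citation_count):
    """B(p) = log1p(citation_count)."""
    return math.log1p(citation_count)


def network_score(citers, paper_author_ids):
    """N(p) = sum over citers of log1p(C_q) / max(outdegree_q, 1).

    Citers sharing any author ID with the paper are skipped first.
    """
    total = 0.0
    for q in citers:
        if q.author_ids & paper_author_ids:
            continue  # self-citation: excluded from the sum
        total += math.log1p(q.cited_by_count) / max(q.outdegree, 1)
    return total


def datarank(citation_count, citers, paper_author_ids, d=DAMPING):
    """Return (base part, network part, total) of (1 - d) * B(p) + d * N(p)."""
    base_part = (1 - d) * base_score(citation_count)
    network_part = d * network_score(citers, paper_author_ids)
    return base_part, network_part, base_part + network_part


# Invented citers, for illustration only.
citers = [
    Citer(cited_by_count=1000, outdegree=20, author_ids=frozenset({"A1"})),
    Citer(cited_by_count=50, outdegree=0, author_ids=frozenset({"A2"})),
    Citer(cited_by_count=9999, outdegree=5, author_ids=frozenset({"A9"})),  # shares A9: skipped
]
base_part, network_part, total = datarank(5274, citers, paper_author_ids={"A9"})

print(round(base_score(5274), 1))  # 8.6, the base score shown on this page
print(round(base_part, 1))         # 1.3, the Base Score Contribution card
```

If the Citation Network Contribution card is d·N(p) as stated, the raw network sum for this record would be about 21.9 ÷ 0.85 ≈ 25.8. The sub-linear growth is also easy to check: log1p(1000) is only about 1.5 times log1p(100).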

Citers are pulled from OpenAlex sorted by cited_by_count:desc and capped per paper, so when the cap binds we keep the highest-signal citers and the score is reproducible across reruns.
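
A small sketch of that selection step, using in-memory records rather than a live OpenAlex query. The function name, the dict keys, the cap value, and the tie-break on `id` are all assumptions for illustration; this page only states that citers are sorted by `cited_by_count:desc` and capped per paper.

```python
def select_citers(citers, cap):
    """Keep the `cap` most-cited citers in a deterministic order.

    Sorting by (-cited_by_count, id) makes the result independent of the
    input order, so reruns select the same citers. The tie-break on id is
    an assumption, not documented behavior.
    """
    ranked = sorted(citers, key=lambda c: (-c["cited_by_count"], c["id"]))
    return ranked[:cap]


# Invented records, for illustration only.
citers = [
    {"id": "W3", "cited_by_count": 40},
    {"id": "W1", "cited_by_count": 900},
    {"id": "W2", "cited_by_count": 40},
    {"id": "W4", "cited_by_count": 7},
]
kept = select_citers(citers, cap=3)
print([c["id"] for c in kept])  # ['W1', 'W2', 'W3']
```

Without a fixed tie-break, two citers with equal counts could swap places at the cap boundary between runs, which would change the network sum.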

Read the full methodology →

