🏆 Finalist — NIH Data Sharing Index (“S-Index”) Challenge
Demo corpus. Scores are computed on a select set of biomedical papers and datasets and may be inaccurate for papers outside this corpus, because DataRank relies on network effects that improve with scale. We aim to expand this into a fully open resource pending additional funding.

AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models

Nucleic Acids Research (2021) · DOI: 10.1093/nar/gkab1061 · Source: DataRank Database

AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models is a dataset published in Nucleic Acids Research (2021). On the S-Index it has a DataRank of 14.9, placing it in the top 12.9% of the data-sharing corpus. It has been cited 8,161 times, with 191 citing works in its 1-hop citation network. Its calibrated FAIR score is 58/100.

DataRank 14.9 · Top 13%
Dataset · Open Access · 8,161 citations · base score 9.0
datarank_citation_only_1hop_v6 · scope data_only · Methodology

Abstract

The AlphaFold Protein Structure Database (AlphaFold DB, https://alphafold.ebi.ac.uk) is an openly accessible, extensive database of high-accuracy protein-structure predictions. Powered by AlphaFold v2.0 of DeepMind, it has enabled an unprecedented expansion of the structural coverage of the known protein-sequence space. AlphaFold DB provides programmatic access to and interactive visualization of predicted atomic coordinates, per-residue and pairwise model-confidence estimates and predicted aligned errors. The initial release of AlphaFold DB contains over 360,000 predicted structures across 21 model-organism proteomes, which will soon be expanded to cover most of the (over 100 million) representative sequences from the UniRef90 data set.

Data sources & pipeline
Pipeline: Metadata → Data-paper check → Enrichment → Citation network → Scoring
Enrichment: Pending

FAIR Checklist

Context only (not used in score)
Findable (1/2)
  • Has DOI
Accessible (1/2)
  • Open Access
Interoperable (0/2)
Reusable (1/3)
  • Dataset classification

    FAIR checklist signals are shown for context only and do not affect DataRank scoring.

    FAIR score: 58/100
    Findable (F): 65
    Accessible (A): 80
    Interoperable (I): 38
    Reusable (R): 50
    Top 8% by FAIR · LLM-assessed · ✓ full text read

    Calibrated FAIR score — a parallel quality metric, independent of the DataRank citation score. See the full evaluation →

    DataRank Breakdown

    Base Score 9% · Citation Network 91%

    Base Score Contribution

    1.3

    From this paper's citation signal

    Citation Network Contribution

    13.5

    From 191 citing papers with measurable signal

    Learn more about DataRank methodology →

    Top 5 citers driving the network score

    Ranked by citation count, the same ordering the engine uses when summing log1p(C_q) over citers.

    1. Ensembl 2023
      Nucleic Acids Research · 2023 · 1,082 citations · DataRank 7.6 · Top 24%
    2. A partnership between the lipid scramblase XK and the lipid transfer protein VPS13A at the plasma membrane
      Proceedings of the National Academy of Sciences · 2022 · 74 citations · DataRank 2.2
    3. Protein amyloid aggregate: Structure and function
      Aggregate · 2023 · 48 citations · DataRank 1.3
    Why this DataRank?

    DataRank blends this paper's own citation count with the influence of the papers that cite it. Here, roughly 9% comes from its base citations and 91% from the citation network (191 citing papers contributed measurable signal).
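The split quoted above can be checked against the figures shown on this page. The sketch below assumes the damping factor d = 0.85 and the base-score definition log1p(citation_count) given in the methodology notes on this page. The variable names are illustrative and are not taken from the DataRank engine.

```python
import math

# Figures quoted on this page.
citation_count = 8161
network_contribution = 13.5   # "Citation Network Contribution" card
d = 0.85                      # damping factor stated on this page

# Base score B(p) = log1p(citation_count); the page reports 9.0.
base_score = math.log1p(citation_count)

# The base card is already multiplied by its share, (1 - d).
base_contribution = (1 - d) * base_score

total = base_contribution + network_contribution
base_share = base_contribution / total
network_share = network_contribution / total

print(f"base score        {base_score:.2f}")
print(f"base contribution {base_contribution:.2f}")
print(f"DataRank          {total:.2f}")
print(f"shares            {base_share:.0%} base, {network_share:.0%} network")
```

The result lands close to the 9% / 91% split and the DataRank of 14.9 shown on the page. Small differences are expected because the card values are displayed rounded.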

    Base score B(p)
    log1p(citation_count) — grows sub-linearly, so a paper with 1,000 citations is not 10× a paper with 100.
    Network N(p)
    Σ over citers q of log1p(C_q) ÷ max(outdegree_q, 1). Being cited by a highly-cited paper with few references counts most.
    Damping factor d = 0.85
    DataRank = (1−d)·B(p) + d·N(p) — the two cards above are each already multiplied by their share.
    Self-citations excluded
    Citers sharing any OpenAlex author ID with this paper are filtered out before the network sum.
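The definitions above can be sketched as a small function. This is a minimal illustration under stated assumptions, not the DataRank engine's code: the `Citer` record, its field names, and the toy values are hypothetical, and C_q is taken to mean the citing work's own citation count, as the "ranked by citation count" note on this page suggests.

```python
import math
from dataclasses import dataclass, field

D = 0.85  # damping factor stated on this page


@dataclass
class Citer:
    """Hypothetical record for one citing work (field names are assumptions)."""
    citation_count: int   # C_q: citations received by the citer
    outdegree: int        # number of references the citer makes
    author_ids: set = field(default_factory=set)


def datarank(citation_count, author_ids, citers, d=D):
    """Blend base and network scores: (1 - d) * B(p) + d * N(p)."""
    # Base score B(p): sub-linear in the paper's own citation count.
    base = math.log1p(citation_count)

    # Drop self-citations: any citer sharing an author ID with the paper.
    external = [c for c in citers if not (c.author_ids & author_ids)]

    # Network score N(p): each citer's log1p(C_q) divided by its outdegree,
    # with the divisor floored at 1 to avoid division by zero.
    network = sum(
        math.log1p(c.citation_count) / max(c.outdegree, 1) for c in external
    )
    return (1 - d) * base + d * network


# Toy example with made-up numbers.
paper_authors = {"A1", "A2"}
citers = [
    Citer(citation_count=1000, outdegree=10, author_ids={"A9"}),
    Citer(citation_count=50, outdegree=0, author_ids={"A7"}),
    Citer(citation_count=500, outdegree=5, author_ids={"A2"}),  # self-citation
]
score = datarank(100, paper_authors, citers)
print(round(score, 3))
```

In the toy data the third citer shares author `A2` with the paper, so it is filtered out before the sum and contributes nothing to the score.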

    Citers are pulled from OpenAlex sorted by cited_by_count:desc and capped per paper, so when the cap binds we keep the highest-signal references and the score is reproducible across reruns.
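A sketch of how such a query and cap might look. The `filter=cites:`, `sort=cited_by_count:desc`, and `per-page` parameters belong to the public OpenAlex works API. The work ID `W0000000000`, the cap of 200, the sample records, and the tie-break on work ID are placeholders or assumptions, since this page does not state the engine's actual values.

```python
from urllib.parse import urlencode

OPENALEX_WORKS = "https://api.openalex.org/works"


def citers_url(work_id, cap):
    """Build an OpenAlex query for works citing `work_id`, most cited first."""
    params = {
        "filter": f"cites:{work_id}",
        "sort": "cited_by_count:desc",
        "per-page": cap,  # OpenAlex allows at most 200 results per page
    }
    # Keep ":" unescaped so the URL stays readable.
    return f"{OPENALEX_WORKS}?{urlencode(params, safe=':')}"


def cap_citers(citers, cap):
    """Keep the `cap` most-cited citers, in a deterministic order.

    The secondary sort on the work ID is an assumption added here so that
    ties in cited_by_count always resolve the same way.
    """
    ranked = sorted(citers, key=lambda c: (-c["cited_by_count"], c["id"]))
    return ranked[:cap]


# Placeholder work ID and cap; neither is taken from the DataRank engine.
url = citers_url("W0000000000", cap=200)

# Made-up records shaped like OpenAlex work objects.
sample = [
    {"id": "W3", "cited_by_count": 48},
    {"id": "W1", "cited_by_count": 1082},
    {"id": "W2", "cited_by_count": 74},
]
top_two = cap_citers(sample, cap=2)
print(url)
print([c["id"] for c in top_two])
```

Sorting on a fixed key before truncating is what makes the capped set stable: the same input always yields the same citers, whichever order the API returned them in.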

    Read the full methodology →


    Authors (28)