🏆 Finalist — NIH Data Sharing Index (“S-Index”) Challenge
Demo corpus: scores are computed on a selected set of biomedical papers and datasets and may be inaccurate for papers outside this corpus, because DataRank relies on network effects that improve with scale. We aim to expand this into a fully open resource pending additional funding.

The PRIDE database and related tools and resources in 2019: improving support for quantification data

Nucleic Acids Research (2018) · 10.1093/nar/gky1106 · Source: DataRank Database

The PRIDE database and related tools and resources in 2019: improving support for quantification data is a dataset published in Nucleic Acids Research (2018). On the S-Index it has a DataRank of 15.0, placing it in the top 12.5% of the data-sharing corpus. It has been cited 7,380 times, with 188 citing works in its 1-hop citation network. Its calibrated FAIR score is 37/100.

DataRank: 15.0 (top 13%)
Dataset · Open Access · 7,380 citations · base score 8.9
datarank_citation_only_1hop_v6 · scope: data_only

Abstract

The PRoteomics IDEntifications (PRIDE) database (https://www.ebi.ac.uk/pride/) is the world's largest data repository of mass spectrometry-based proteomics data, and is one of the founding members of the global ProteomeXchange (PX) consortium. In this manuscript, we summarize the developments in PRIDE resources and related tools since the previous update manuscript was published in Nucleic Acids Research in 2016. In the last 3 years, public data sharing through PRIDE (as part of PX) has definitely become the norm in the field. In parallel, data re-use of public proteomics data has increased enormously, with multiple applications. We first describe the new architecture of PRIDE Archive, the archival component of PRIDE. PRIDE Archive and the related data submission framework have been further developed to support the increase in submitted data volumes and additional data types. A new scalable and fault tolerant storage backend, Application Programming Interface and web interface have been implemented, as a part of an ongoing process. Additionally, we emphasize the improved support for quantitative proteomics data through the mzTab format. At last, we outline key statistics on the current data contents and volume of downloads, and how PRIDE data are starting to be disseminated to added-value resources including Ensembl, UniProt and Expression Atlas.

Data sources & pipeline
Pipeline: Metadata → Data-paper check → Enrichment → Citation network → Scoring
Enrichment: Pending

FAIR Checklist

Context only (not used in score)
Findable (1/2)
  • Has DOI
Accessible (1/2)
  • Open Access
Interoperable (0/2)
Reusable (1/3)
  • Dataset classification

    FAIR checklist signals are shown for context only and do not affect DataRank scoring.

FAIR score: 37/100
  • F (Findable): 53
  • A (Accessible): 55
  • I (Interoperable): 25
  • R (Reusable): 17
Top 81% by FAIR · LLM-assessed · ✓ full text read

The calibrated FAIR score is a parallel quality metric, independent of the DataRank citation score.

    DataRank Breakdown

Base score: 9% · Citation network: 91%

Base score contribution: 1.3 (from this paper's citation signal)

Citation network contribution: 13.7 (from 188 citing papers with measurable signal)
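As a consistency check, assuming the displayed values are rounded to one decimal, the breakdown can be reproduced from the 7,380 citations and the damping factor d = 0.85 given in the methodology section. The value N(p) ≈ 16.1 is back-derived here and is not shown on the page.

$$
\begin{aligned}
B(p) &= \log(1 + 7380) \approx 8.907 \\
(1-d)\,B(p) &= 0.15 \times 8.907 \approx 1.34 \approx 1.3 \\
d\,N(p) &\approx 15.0 - 1.34 \approx 13.7, \qquad N(p) \approx 13.7 / 0.85 \approx 16.1 \\
\text{base share} &\approx 1.34 / 15.0 \approx 9\%, \qquad \text{network share} \approx 91\%
\end{aligned}
$$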


    Top 5 citers driving the network score

Ranked by citation count, the same ordering the engine uses to select citers for the log1p(C_q) sum.

1. GENCODE 2021
  Nucleic Acids Research (2020) · 1,452 citations · DataRank 9.7 · Top 22%
2. The proteome landscape of the kingdoms of life
  Nature (2020) · 222 citations · DataRank 5.8 · Top 27%
3. A high-stringency blueprint of the human proteome
  Nature Communications (2020) · 217 citations · DataRank 4.3 · Top 28%
    Why this DataRank?

    DataRank blends this paper's own citation count with the influence of the papers that cite it. Here, roughly 9% comes from its base citations and 91% from the citation network (188 citing papers contributed measurable signal).

Base score B(p)
$B(p) = \log(1 + c_p)$ (log1p), where $c_p$ is the paper's citation count. It grows sub-linearly: a paper with 1,000 citations scores about 1.5× a paper with 100, not 10×.
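Network N(p)
$N(p) = \sum_{q \in \mathrm{citers}(p)} \dfrac{\log(1 + C_q)}{\max(\mathrm{outdeg}_q,\ 1)}$, where $C_q$ is citer q's citation count and $\mathrm{outdeg}_q$ is the number of references q makes. Being cited by a highly cited paper with few references counts most.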
Damping factor d = 0.85
$\mathrm{DataRank}(p) = (1-d)\,B(p) + d\,N(p)$. The two contribution cards above are each already multiplied by their weight.
    Self-citations excluded
    Citers sharing any OpenAlex author ID with this paper are filtered out before the network sum.
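The three definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the engine's actual code: the `Work` record and its fields are hypothetical stand-ins, and the per-paper citer cap and the "measurable signal" threshold are not modeled.

```python
import math
from dataclasses import dataclass

DAMPING = 0.85  # d, the damping factor stated on this page


@dataclass(frozen=True)
class Work:
    """Hypothetical minimal record for a paper; not the engine's real schema."""
    work_id: str
    citation_count: int       # c_p for the scored paper, C_q for a citer
    outdegree: int            # number of references this work makes
    author_ids: frozenset     # author IDs (e.g. OpenAlex IDs) for self-citation checks


def base_score(paper: Work) -> float:
    """B(p) = log(1 + c_p), computed with log1p."""
    return math.log1p(paper.citation_count)


def network_score(paper: Work, citers: list) -> float:
    """N(p): sum of log(1 + C_q) / max(outdeg_q, 1) over non-self citers."""
    total = 0.0
    for q in citers:
        # Self-citation filter: skip any citer sharing an author ID with the paper.
        if paper.author_ids & q.author_ids:
            continue
        total += math.log1p(q.citation_count) / max(q.outdegree, 1)
    return total


def datarank(paper: Work, citers: list, d: float = DAMPING) -> float:
    """DataRank(p) = (1 - d) * B(p) + d * N(p)."""
    return (1 - d) * base_score(paper) + d * network_score(paper, citers)


if __name__ == "__main__":
    paper = Work("P", 7380, 50, frozenset({"A1"}))
    # Base contribution alone reproduces the 1.3 shown above.
    print(round((1 - DAMPING) * base_score(paper), 2))  # 1.34
```

The `max(outdegree, 1)` guard mirrors the formula and keeps a citer with no recorded references from causing a division by zero.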

Citers are pulled from OpenAlex sorted by cited_by_count:desc and capped per paper, so when the cap binds, the most-cited citers are kept and the score is reproducible across reruns.
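A sketch of the capping step, assuming citer records are plain dicts carrying the OpenAlex work fields `id` and `cited_by_count` (IDs shortened here for readability). The cap value is not given on this page, so `cap` is a hypothetical parameter, and breaking ties by work ID is an added assumption to make the selection independent of input order.

```python
def select_citers(citers: list, cap: int) -> list:
    """Keep at most `cap` citers, highest cited_by_count first.

    Ties are broken by work ID (an assumption, not stated on the page) so the
    selection does not depend on the order in which records arrived.
    """
    if cap < 0:
        raise ValueError("cap must be non-negative")
    ranked = sorted(citers, key=lambda w: (-w["cited_by_count"], w["id"]))
    return ranked[:cap]


if __name__ == "__main__":
    sample = [
        {"id": "W3", "cited_by_count": 5},
        {"id": "W1", "cited_by_count": 10},
        {"id": "W2", "cited_by_count": 10},
        {"id": "W4", "cited_by_count": 1},
    ]
    # Same selection regardless of input order.
    print([w["id"] for w in select_citers(sample, cap=2)])        # ['W1', 'W2']
    print([w["id"] for w in select_citers(sample[::-1], cap=2)])  # ['W1', 'W2']
```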


Citation network graph (node colors: center paper, data paper, data + open access, non-data; node size = percentile rank).

    Authors (26)

Attila Csordas, Jingwen Bai, Manuel Bernal-Llinares, Suresh Hewapathirana (ORCID), Deepti J Kundu (ORCID)

    Related Papers (10)

  • Nucleic Acids Research (2019) · co-cited, same journal · 10.1093/nar/gky1049
  • Bioinformatics (2013) · co-cited · 10.1093/bioinformatics/bts635
  • Array programming with NumPy · Nature (2020) · co-cited · 10.1038/s41586-020-2649-2 · N/A · DataRank 1.5 · unranked