Demo corpus. Scores are computed on a select set of biomedical papers and datasets and may be inaccurate for papers outside this corpus, because DataRank relies on network effects that improve with scale. We aim to expand this into a fully open resource pending additional funding.

Functional Coverage of the Human Genome by Existing Structures, Structural Genomics Targets, and Homology Models

PLoS Computational Biology (2005) · DOI: 10.1371/journal.pcbi.0010031 · Source: DataRank Database

Functional Coverage of the Human Genome by Existing Structures, Structural Genomics Targets, and Homology Models is a dataset published in PLoS Computational Biology (2005). On the Sindex, it has a DataRank of 3.5, placing it in the top 31.5% of the data-sharing corpus. It has been cited 65 times, with 58 citing works in its 1-hop citation network. Its calibrated FAIR score is 43/100.

DataRank 3.5 · Top 31%

Dataset · Open Access · 65 citations · base score 4.2

datarank_citation_only_1hop_v6 · scope data_only · Methodology

Abstract

The bias in protein structure and function space resulting from experimental limitations and targeting of particular functional classes of proteins by structural biologists has long been recognized, but never continuously quantified. Using the Enzyme Commission and the Gene Ontology classifications as a reference frame, and integrating structure data from the Protein Data Bank (PDB), target sequences from the structural genomics projects, structure homology derived from the SUPERFAMILY database, and genome annotations from Ensembl and NCBI, we provide a quantified view, both at the domain and whole-protein levels, of the current and projected coverage of protein structure and function space relative to the human genome. Protein structures currently provide at least one domain that covers 37% of the functional classes identified in the genome; whole structure coverage exists for 25% of the genome. If all the structural genomics targets were solved (twice the current number of structures in the PDB), it is estimated that structures of one domain would cover 69% of the functional classes identified and complete structure coverage would be 44%. Homology models from existing experimental structures extend the 37% coverage to 56% of the genome as single domains and 25% to 31% for complete structures. Coverage from homology models is not evenly distributed by protein family, reflecting differing degrees of sequence and structure divergence within families. While these data provide coverage, conversely, they also systematically highlight functional classes of proteins for which structures should be determined. Current key functional families without structure representation are highlighted here; updated information on the "most wanted list" that should be solved is available on a weekly basis from http://function.rcsb.org:8080/pdb/function_distribution/index.html.

Data sources & pipeline

Pipeline: Metadata → Data-paper check → Enrichment → Citation network → Scoring

Enrichment: Pending

FAIR Checklist

Context only (not used in score)

Findable (1/2)

Has DOI

Accessible (1/2)

Open Access

Interoperable (0/2)

Reusable (1/3)

Dataset classification

FAIR checklist signals are shown for context only and do not affect DataRank scoring.

FAIR score: 43/100

Top 79% by FAIR · LLM-assessed · ✓ full text read

Calibrated FAIR score: a parallel quality metric, independent of the DataRank citation score.

DataRank Breakdown

Base Score 18% · Citation Network 82%

Base Score Contribution

0.628

From this paper's citation signal

Citation Network Contribution

2.8

From 54 citing papers with measurable signal

Top 5 citers driving the network score

Ranked by citation count — the same ordering the engine uses when summing log1p(C_q) over citers.

STRING v11: protein–protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets
Nucleic Acids Research · 2018 · 19,062 citations · DataRank 13.8 · Top 14%

Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes
Journal of Molecular Biology · 2001 · 12,937 citations · DataRank 1.4

Structure-based systems biology for analyzing off-target binding
Current Opinion in Structural Biology · 2011 · 152 citations · DataRank 5.4

A unified statistical model to support local sequence order independent similarity searching for ligand-binding sites and its application to genome-based drug discovery
Bioinformatics · 2009 · 96 citations · DataRank 3.4

Structural Evolution of the Protein Kinase-Like Superfamily
PLoS Computational Biology · 2005 · 3 citations · DataRank 0.208

Why this DataRank?

DataRank blends this paper's own citation count with the influence of the papers that cite it. Here, roughly 18% comes from its base citations and 82% from the citation network (54 citing papers contributed measurable signal).
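As a rough consistency check on the figures shown for this record, the sketch below recomputes the base contribution and the two shares. It assumes the base score is log1p of the 65 citations and that the damping factor is 0.85, as stated in the methodology notes on this page. The network contribution of 2.8 is taken from the card as displayed, so it is already rounded and the total is only approximate. All variable names are illustrative.

```python
import math

D = 0.85                      # damping factor stated on this page
CITATIONS = 65                # citation count shown for this record
NETWORK_CONTRIBUTION = 2.8    # d * N(p) as displayed on the card (rounded)

base_score = math.log1p(CITATIONS)          # B(p)
base_contribution = (1 - D) * base_score    # (1 - d) * B(p)
total = base_contribution + NETWORK_CONTRIBUTION

base_share = base_contribution / total
network_share = NETWORK_CONTRIBUTION / total

print(f"base score B(p):       {base_score:.1f}")         # 4.2
print(f"base contribution:     {base_contribution:.3f}")  # 0.628
print(f"base share:            {base_share:.0%}")         # 18%
print(f"citation network share: {network_share:.0%}")     # 82%
```

The recomputed total lands near, though not exactly on, the displayed DataRank of 3.5, which would be consistent with the network card value having been rounded for display.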

Base score B(p): log1p(citation_count) — grows sub-linearly, so a paper with 1,000 citations is not 10× a paper with 100.
Network N(p): Σ over citers of log1p(C_q) ÷ max(outdegree_q, 1). Being cited by a highly-cited paper with few references counts most.
Damping factor d = 0.85: DataRank = (1−d)·B(p) + d·N(p) — the two cards above are each already multiplied by their share.
Self-citations excluded: Citers sharing any OpenAlex author ID with this paper are filtered out before the network sum.
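The four rules above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the engine's actual code: the `Citer` record, its field names, and the function names are hypothetical, and the sketch assumes citation counts, outdegrees, and author IDs have already been fetched.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Citer:
    """Hypothetical record for one citing paper q."""
    citation_count: int                      # C_q
    outdegree: int                           # number of references q makes
    author_ids: frozenset = field(default_factory=frozenset)


def base_score(citation_count: int) -> float:
    """B(p) = log1p(citation_count)."""
    return math.log1p(citation_count)


def network_score(citers, paper_author_ids=frozenset()) -> float:
    """N(p) = sum over citers of log1p(C_q) / max(outdegree_q, 1)."""
    total = 0.0
    for q in citers:
        # Skip self-citations: any shared author ID excludes the citer.
        if q.author_ids & paper_author_ids:
            continue
        total += math.log1p(q.citation_count) / max(q.outdegree, 1)
    return total


def datarank(citation_count, citers, paper_author_ids=frozenset(), d=0.85):
    """DataRank = (1 - d) * B(p) + d * N(p)."""
    b = base_score(citation_count)
    n = network_score(citers, paper_author_ids)
    return (1 - d) * b + d * n
```

For example, `datarank(65, [])` returns only the base contribution, `0.15 * log1p(65)`, because an empty citer list contributes nothing to the network term.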

Citers are pulled from OpenAlex sorted by cited_by_count:desc and capped per paper, so when the cap binds we keep the highest-signal references and the score is reproducible across reruns.
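One way to picture the sort-and-cap step is the sketch below, applied to citer records that have already been downloaded. The dictionary keys, the `top_citers` name, and the cap value are assumptions for illustration. The tie-break on `id` is also an assumption, added here so that equal citation counts always come out in the same order; the page does not say how the engine handles ties.

```python
def top_citers(citers, cap):
    """Keep the `cap` most-cited citers in a deterministic order."""
    ordered = sorted(
        citers,
        # Highest cited_by_count first; ties broken by id (assumed rule).
        key=lambda c: (-c["cited_by_count"], c["id"]),
    )
    return ordered[:cap]


sample = [
    {"id": "W3", "cited_by_count": 96},
    {"id": "W1", "cited_by_count": 19062},
    {"id": "W4", "cited_by_count": 96},
    {"id": "W2", "cited_by_count": 152},
]
kept = top_citers(sample, cap=3)
print([c["id"] for c in kept])  # ['W1', 'W2', 'W3']
```

Because the ordering depends only on the records themselves, rerunning the step on the same input gives the same capped list.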
