Demo corpus. Scores are computed on a select set of biomedical papers and datasets and may be inaccurate for papers outside this corpus, since DataRank relies on network effects that improve with scale. We aim to expand this into a fully open resource pending additional funding.

Deep generative models of protein structure uncover distant relationships across a continuous fold space

Nature Communications (2024) · 10.1038/s41467-024-52020-2 · Source: DataRank Database

Deep generative models of protein structure uncover distant relationships across a continuous fold space is a research paper published in Nature Communications (2024). On the Sindex it has a DataRank of 0.544. It has been cited 11 times, with 10 citing works in its 1-hop citation network. Its calibrated FAIR score is 55/100.

DataRank 0.544 · unranked

Open Access · 11 citations · base score 2.5

datarank_citation_only_1hop_v6 · scope data_only

Abstract

Our views of fold space implicitly rest upon many assumptions that impact how we analyze, interpret and understand protein structure, function and evolution. For instance, is there an optimal granularity in viewing protein structural similarities (e.g., architecture, topology or some other level)? Similarly, the discrete/continuous dichotomy of fold space is central, but remains unresolved. Discrete views of fold space bin similar folds into distinct, non-overlapping groups; unfortunately, such binning can miss remote relationships. While hierarchical systems like CATH are indispensable resources, less heuristic and more conceptually flexible approaches could enable more nuanced explorations of fold space. Building upon an Urfold model of protein structure, here we present a deep generative modeling framework, termed DeepUrfold, for analyzing protein relationships at scale. DeepUrfold's learned embeddings occupy high-dimensional latent spaces that can be distilled for a given protein in terms of an amalgamated representation uniting sequence, structure and biophysical properties. This approach is structure-guided, versus being purely structure-based, and DeepUrfold learns representations that, in a sense, define superfamilies. Deploying DeepUrfold with CATH reveals evolutionarily-remote relationships that evade existing methodologies, and suggests a mostly-continuous view of fold space, a view that extends beyond simple geometric similarity, towards the realm of integrated sequence ↔ structure ↔ function properties.

Data sources & pipeline

Pipeline: Metadata → Data-paper check → Enrichment → Citation network → Scoring

Enrichment: Pending

FAIR Checklist

Context only (not used in score)

Findable (1/2)

Has DOI

Accessible (1/2)

Open Access

Interoperable (0/2)

Reusable (0/3)

FAIR checklist signals are shown for context only and do not affect DataRank scoring.

55 FAIR score

F Findable

100

A Accessible

I Interoperable

R Reusable

Top 20% by FAIR · deterministic · ✓ full text read

Calibrated FAIR score: a parallel quality metric, independent of the DataRank citation score.

DataRank Breakdown

Base Score 69% · Citation Network 31%

Base Score Contribution

0.373

From this paper's citation signal

Citation Network Contribution

0.171

From 6 citing papers with measurable signal

Top 5 citers driving the network score

Ranked by citation count — the same ordering the engine uses when summing log1p(C_q) over citers.

Deep learning
Nature · 2015 · 79,092 citations · DataRank 1.7
MUSCLE: multiple sequence alignment with high accuracy and high throughput
Nucleic Acids Research · 2004 · 46,160 citations · DataRank 1.6
Search and clustering orders of magnitude faster than BLAST
Bioinformatics · 2010 · 21,556 citations · DataRank 1.5
Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences
Proceedings of the National Academy of Sciences · 2021 · 2,999 citations · DataRank 1.2
The curse of the protein ribbon diagram
PLOS Biology · 2022 · 7 citations · DataRank 0.340

Why this DataRank?

DataRank blends this paper's own citation count with the influence of the papers that cite it. Here, roughly 69% comes from its base citations and 31% from the citation network (6 citing papers contributed measurable signal).

Base score B(p): log1p(citation_count) — grows sub-linearly, so a paper with 1,000 citations is not 10× a paper with 100.
Network N(p): Σ over citers of log1p(C_q) ÷ max(outdegree_q, 1). Being cited by a highly-cited paper with few references counts most.
Damping factor d = 0.85: DataRank = (1−d)·B(p) + d·N(p) — the two cards above are each already multiplied by their share.
Self-citations excluded: Citers sharing any OpenAlex author ID with this paper are filtered out before the network sum.
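
The formulas above can be checked against the numbers on this page. The following is a minimal sketch, not the DataRank engine itself: the function names, the dictionary keys (`author_ids`, `cited_by_count`, `outdegree`), and the toy citers are all hypothetical. Only d = 0.85, the 11 citations, and the reported 0.171 network contribution come from the page.

```python
import math

DAMPING = 0.85  # d, as stated in the text


def base_score(citation_count):
    """B(p) = log1p(citation_count)."""
    return math.log1p(citation_count)


def network_score(paper_author_ids, citers):
    """N(p): sum of log1p(C_q) / max(outdegree_q, 1) over non-self-citing citers.

    Each citer is a dict with hypothetical keys:
    author_ids (set), cited_by_count (int), outdegree (int).
    """
    total = 0.0
    for citer in citers:
        if paper_author_ids & citer["author_ids"]:
            continue  # self-citation: shares at least one author ID
        total += math.log1p(citer["cited_by_count"]) / max(citer["outdegree"], 1)
    return total


def datarank(b, n, d=DAMPING):
    """Return (base contribution, network contribution, total)."""
    base_part = (1 - d) * b
    network_part = d * n
    return base_part, network_part, base_part + network_part


# Base score for this paper: 11 citations.
b = base_score(11)             # about 2.485, shown on the page as 2.5

# N(p) is not shown directly; back it out of the reported 0.171 contribution.
n_implied = 0.171 / DAMPING    # about 0.201

base_part, network_part, total = datarank(b, n_implied)
base_share = base_part / total

print(round(base_part, 3))     # 0.373
print(round(total, 3))         # 0.544
print(round(base_share * 100)) # 69

# Toy citers (hypothetical) to show the self-citation filter.
toy_citers = [
    {"author_ids": {"A1"}, "cited_by_count": 99, "outdegree": 50},
    {"author_ids": {"A9", "A2"}, "cited_by_count": 500, "outdegree": 10},  # filtered
]
toy_n = network_score({"A9"}, toy_citers)  # only the first citer counts
```

With the page's own inputs, the sketch reproduces the 0.373 base contribution, the 0.544 total, and the 69% base share.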

Citers are pulled from OpenAlex sorted by cited_by_count:desc and capped per paper, so when the cap binds we keep the highest-signal references and the score is reproducible across reruns.
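
The sort-then-cap step can be illustrated locally. This is a sketch under assumptions, not the actual pipeline: `select_citers`, the `id` field, the work IDs, and the cap of 3 are hypothetical, and breaking ties by ID is an added assumption to keep the order stable. The citation counts are those of the five citers listed on this page.

```python
def select_citers(citers, cap):
    """Keep the `cap` most-cited citers in a reproducible order.

    Sorting is by cited_by_count descending; ties are broken by the
    (hypothetical) `id` field so the result does not depend on input order.
    """
    ranked = sorted(citers, key=lambda c: (-c["cited_by_count"], c["id"]))
    return ranked[:cap]


# Citation counts taken from the top-5 list on this page; IDs are made up.
citers = [
    {"id": "W5", "cited_by_count": 7},
    {"id": "W2", "cited_by_count": 46160},
    {"id": "W4", "cited_by_count": 2999},
    {"id": "W1", "cited_by_count": 79092},
    {"id": "W3", "cited_by_count": 21556},
]

kept = select_citers(citers, cap=3)
kept_counts = [c["cited_by_count"] for c in kept]
print(kept_counts)  # [79092, 46160, 21556]

# Feeding the same citers in a different order gives the same selection.
kept_again = select_citers(list(reversed(citers)), cap=3)
```

Because the selection depends only on the sort key, rerunning it on the same set of citers yields the same capped list.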

[Citation network graph. Node colors: Center, Data Paper, Data + Open Access, Non-data, Selected & links. Node size = percentile rank.]