Deep Generative Models of Protein Structure Uncover Distant Relationships Across a Continuous Fold Space is a research paper (2022). On theSindex it has a DataRank of 0.512. It has been cited 8 times, with 4 citing works in its 1-hop citation network.
Our views of fold space implicitly rest upon many assumptions that impact how we analyze, interpret and understand biological systems, from protein structure comparison and classification to function prediction and evolutionary analyses. For instance, is there an optimal granularity at which to view protein structural similarities (e.g., architecture, topology or some other level)? If so, how does it vary with the type of question being asked? Similarly, the discrete/continuous dichotomy of fold space is central in structural bioinformatics, but remains unresolved. Discrete views of fold space bin 'similar' folds into distinct, non-overlapping groups; unfortunately, such binning may inherently miss many remote relationships. While hierarchical systems like CATH, SCOP and ECOD represent major steps forward in protein classification, a scalable, objective and conceptually flexible method, with less reliance on assumptions and heuristics, could enable a more systematic and nuanced exploration of fold space, particularly as regards evolutionarily distant relationships. Building upon a recent 'Urfold' model of protein structure, we have developed a new approach to analyze protein interrelationships. This framework, termed 'DeepUrfold', is rooted in deep generative modeling via variational Bayesian inference, and we find it to be useful for comparative analysis across the protein universe. Critically, DeepUrfold leverages its deep generative model's learned embeddings, which occupy high-dimensional latent spaces and can be distilled for a given protein in terms of an amalgamated representation that unites sequence, structure, biophysical and phylogenetic properties. Notably, DeepUrfold is structure-guided, versus being purely structure-based, and its architecture allows each trained model to learn protein features (structural and otherwise) that, in a sense, 'define' different superfamilies.
Deploying DeepUrfold with CATH suggests a new, mostly-continuous view of fold space—a view that extends beyond simple 3D structural/geometric similarity, towards the realm of integrated sequence ↔ structure ↔ function properties. We find that such an approach can quantitatively represent and detect evolutionarily-remote relationships that evade existing methods.
FAIR checklist signals are shown for context only and do not affect DataRank scoring.
Base Score Contribution: 0.330 (from this paper's citation signal)
Citation Network Contribution: 0.183 (from 4 citing papers with measurable signal)
Citing papers are ranked by citation count, the same ordering the engine uses when summing log1p(Cq) over citers.
DataRank blends this paper's own citation count with the influence of the papers that cite it. Here, roughly 64% comes from its base citations and 36% from the citation network (4 citing papers contributed measurable signal).
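As a quick check of the split described above, the two displayed components can be summed and each divided by the total. This minimal sketch assumes DataRank is simply the sum of the base and network contributions, which the page implies but does not state as a formula. Both components are rounded to three decimals, so their sum (0.513) differs from the displayed DataRank of 0.512 in the last digit.

```python
# Component values as displayed on this page (already rounded to 3 decimals).
BASE_CONTRIBUTION = 0.330     # from the paper's own citation signal
NETWORK_CONTRIBUTION = 0.183  # from the 4 citing papers with measurable signal


def contribution_shares(base: float, network: float) -> tuple[float, float, float]:
    """Return (total, base share, network share).

    Assumption: DataRank is the plain sum of the two components.
    """
    total = base + network
    return total, base / total, network / total


total, base_share, network_share = contribution_shares(
    BASE_CONTRIBUTION, NETWORK_CONTRIBUTION
)
print(f"total ~ {total:.3f}")                   # total ~ 0.513
print(f"base share ~ {base_share:.0%}")         # base share ~ 64%
print(f"network share ~ {network_share:.0%}")   # network share ~ 36%
```

The shares match the "roughly 64%" and "36%" quoted above; the small rounding gap in the total does not change them at whole-percent precision.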
Citers are pulled from OpenAlex sorted by cited_by_count:desc and capped per paper, so when the cap binds we keep the highest-signal references and the score is reproducible across reruns.
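The selection and summation steps can be sketched as follows. This is a hypothetical illustration, not the engine's code: the `Citer` record, the `cap` value, the tie-break on identifier and the toy citation counts are all assumptions, and the page does not say how the raw log1p sum is rescaled into the displayed Citation Network Contribution.

```python
import math
from dataclasses import dataclass


@dataclass
class Citer:
    work_id: str         # hypothetical identifier for a citing work
    cited_by_count: int  # Cq: the citing work's own citation count


def select_citers(citers: list[Citer], cap: int) -> list[Citer]:
    """Keep the `cap` highest-signal citers, ordered by cited_by_count descending.

    Ties are broken by work_id (an assumption) so repeated runs select the same set.
    """
    ordered = sorted(citers, key=lambda c: (-c.cited_by_count, c.work_id))
    return ordered[:cap]


def raw_network_signal(citers: list[Citer], cap: int) -> float:
    """Sum log1p(Cq) over the selected citers (unscaled; rescaling is unknown)."""
    return sum(math.log1p(c.cited_by_count) for c in select_citers(citers, cap))


# Toy data for illustration only; these are not this paper's actual citers.
toy = [Citer("W1", 10), Citer("W2", 0), Citer("W3", 3), Citer("W4", 10)]
print([c.work_id for c in select_citers(toy, cap=3)])
print(raw_network_signal(toy, cap=3))
```

Using log1p rather than a raw count means an uncited citer adds exactly zero (log1p(0) = 0) while a heavily cited one contributes with diminishing returns, so a single very popular citer cannot dominate the network term.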