Demo corpus. Scores are computed on a select set of biomedical papers and datasets and may be inaccurate for papers outside this corpus, because DataRank relies on network effects that improve with scale. We aim to expand this into a fully open resource pending additional funding.

Scalable and universal prediction of cellular phenotypes enables in silico experiments

(2024) · DOI: 10.1101/2024.08.12.607533 · Source: DataRank Database

Scalable and universal prediction of cellular phenotypes enables in silico experiments is a research paper (2024). On the Sindex, it has a DataRank of 0.527. It has been cited 12 times, with 11 citing works in its 1-hop citation network.

DataRank 0.527 · unranked

Open Access · 12 citations · base score 2.6

Methodology: datarank_citation_only_1hop_v6 · scope data_only

Abstract

Biological systems can be interrogated by perturbing individual components and observing the consequences across molecular, cellular, and phenotypic levels. The vast combinatorial space of possible perturbations and responses makes exhaustive experimentation infeasible. Recent advances in machine learning have shown that training on diverse datasets enables transfer learning across tasks, capturing patterns that generalize and improving performance on previously unseen problems. Inspired by this principle, we present Prophet, a transformer-based model pretrained on a vast, heterogeneous collection of perturbation experiments. This pretraining allows Prophet to predict the outcomes of untested genetic or chemical perturbations in novel cellular contexts, spanning phenotypes such as gene expression, viability, and morphology. By leveraging shared structure across apparently disconnected assays, Prophet provides a scalable framework for large-scale virtual screening and prioritization of informative experiments. Prophet consistently outperforms baseline models, including those trained on single phenotypes, showing that transfer learning between phenotypes not only is possible but improves predictive accuracy. Its capabilities extend to in vivo developmental systems, where it recapitulates known lineage biology and proposes new candidates. In a large-scale in silico screen for melanoma, Prophet identified and experimentally validated compounds with selective activity that mirrored clinically approved therapies, demonstrating its ability to transform perturbation biology into a predictive and scalable engine for therapeutic discovery.

Data sources & pipeline

Pipeline: Metadata → Data-paper check → Enrichment → Citation network → Scoring

Enrichment: Pending

FAIR Checklist

Context only (not used in score)

Findable (1/2): Has DOI

Accessible (1/2): Open Access

Interoperable (0/2)

Reusable (0/3)

FAIR checklist signals are shown for context only and do not affect DataRank scoring.

DataRank Breakdown

Base Score 73% · Citation Network 27%

Base Score Contribution: 0.385 (from this paper's citation signal)

Citation Network Contribution: 0.142 (from 7 citing papers with measurable signal)

Top 5 citers driving the network score

Ranked by citation count — the same ordering the engine uses when summing log1p(C_q) over citers.

STRING v11: protein–protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets
Nucleic Acids Research · 2018 · 19,062 citations · DataRank 13.8 · Top 14%
UMAP: Uniform Manifold Approximation and Projection
Journal of Open Source Software · 2018 · 9,287 citations · DataRank 1.4
Evolutionary-scale prediction of atomic-level protein structure with a language model
Science · 2023 · 4,605 citations · DataRank 1.3
Community assessment to advance computational prediction of cancer drug combinations in a pharmacogenomic screen
Nature Communications · 2019 · 356 citations · DataRank 5.4 · Top 28%
Scalable genetic screening for regulatory circuits using compressed Perturb-seq
Nature Biotechnology · 2023 · 96 citations · DataRank 2.2
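
To illustrate how the log1p(C_q) term compresses these citation counts, here is a minimal Python sketch that uses the five counts listed above. The short labels are shorthand introduced for this example. The page does not list each citer's outdegree, so the sketch covers only the log1p term and not the full network contribution.

```python
import math

# (short label, citation count) taken from the list above; the labels are
# abbreviations chosen for this sketch, not identifiers used by DataRank.
top_citers = [
    ("STRING v11", 19062),
    ("UMAP", 9287),
    ("Evolutionary-scale structure prediction", 4605),
    ("Drug combination community assessment", 356),
    ("Compressed Perturb-seq", 96),
]

# log1p(C_q) for each citer, in the same citation-count order as the list.
log_terms = [(label, math.log1p(count)) for label, count in top_citers]

for label, value in log_terms:
    print(f"{label}: log1p(C_q) = {value:.2f}")
```

The raw counts of the first and last entries differ by a factor of roughly 200, while their log1p terms differ by a factor of only a little over 2.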

Why this DataRank?

DataRank blends this paper's own citation count with the influence of the papers that cite it. Here, roughly 73% comes from its base citations and 27% from the citation network (7 citing papers contributed measurable signal).

Base score B(p): log1p(citation_count) — grows sub-linearly, so a paper with 1,000 citations is not 10× a paper with 100.
Network N(p): Σ over citers of log1p(C_q) ÷ max(outdegree_q, 1). Being cited by a highly-cited paper with few references counts most.
Damping factor d = 0.85: DataRank = (1−d)·B(p) + d·N(p) — the two cards above are each already multiplied by their share.
Self-citations excluded: Citers sharing any OpenAlex author ID with this paper are filtered out before the network sum.
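
The formulas above can be sketched in a few lines of Python. This is an illustrative reading of the stated definitions, not the engine's actual code. The function names, the `(cited_by_count, outdegree)` tuple layout, and the two example citer pairs are assumptions made for the sketch; only the citation count of 12 and d = 0.85 come from this page.

```python
import math

DAMPING = 0.85  # d, as stated in the text above


def base_score(citation_count):
    """B(p) = log1p(citation_count)."""
    return math.log1p(citation_count)


def network_score(citers):
    """N(p): sum of log1p(C_q) / max(outdegree_q, 1) over citers.

    `citers` is an iterable of (cited_by_count, outdegree) pairs with
    self-citations already removed; this layout is an assumption.
    """
    return sum(math.log1p(c_q) / max(outdegree_q, 1) for c_q, outdegree_q in citers)


def datarank_parts(citation_count, citers, d=DAMPING):
    """Return (base contribution, network contribution, DataRank)."""
    base_part = (1 - d) * base_score(citation_count)
    network_part = d * network_score(citers)
    return base_part, network_part, base_part + network_part


# This paper has 12 citations; the citer pairs below are made-up placeholders.
base_part, network_part, total = datarank_parts(12, [(96, 60), (356, 45)])
print(round(base_part, 3))  # 0.385, matching the Base Score Contribution card
```

With the page's reported network contribution of 0.142, the base share is 0.385 / (0.385 + 0.142), or about 73%. The sub-linear growth is also easy to check: `base_score(1000) / base_score(100)` is about 1.5, not 10.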

Citers are pulled from OpenAlex sorted by cited_by_count:desc and capped per paper, so when the cap binds we keep the highest-signal references and the score is reproducible across reruns.
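
A minimal sketch of this selection step, working on records that have already been fetched, might look as follows. The record shape (`id`, `cited_by_count`, `author_ids`), the function name `select_citers`, and the cap value are hypothetical, since the page does not state them. The tie-break on `id` is an added assumption that keeps the order deterministic when two citers have equal counts.

```python
def select_citers(citers, source_author_ids, cap):
    """Drop self-citations, sort by cited_by_count descending, apply the cap.

    `citers` is a list of dicts with hypothetical keys "id",
    "cited_by_count", and "author_ids" (a simplified record shape).
    """
    source = set(source_author_ids)
    # Self-citation filter: discard citers sharing any author ID with the paper.
    kept = [c for c in citers if source.isdisjoint(c["author_ids"])]
    # Highest citation count first; ties broken by id (assumption, for stability).
    kept.sort(key=lambda c: (-c["cited_by_count"], c["id"]))
    return kept[:cap]


# Made-up example records; A1 is an author of the paper being scored.
example = [
    {"id": "W3", "cited_by_count": 96, "author_ids": ["A9"]},
    {"id": "W1", "cited_by_count": 356, "author_ids": ["A7"]},
    {"id": "W2", "cited_by_count": 500, "author_ids": ["A1", "A8"]},
    {"id": "W4", "cited_by_count": 96, "author_ids": ["A5"]},
]
selected = select_citers(example, source_author_ids=["A1"], cap=2)
print([c["id"] for c in selected])  # ['W1', 'W3']
```

Because the order depends only on the records themselves, rerunning the selection on the same input returns the same capped list.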
