Demo corpus. Scores are computed on a select set of biomedical papers and datasets and may be inaccurate for papers outside this corpus, because DataRank relies on network effects that improve with scale. We aim to expand this into a fully open resource, pending additional funding.

Signals in the Cells: Multimodal and Contextualized Machine Learning Foundations for Therapeutics

(2024) · DOI: 10.1101/2024.06.12.598655 · Source: DataRank Database

Signals in the Cells: Multimodal and Contextualized Machine Learning Foundations for Therapeutics is a dataset (2024). On the Sindex it has a DataRank of 0.532, placing it in the top 48.1% of the data-sharing corpus. It has been cited 12 times, with 11 citing works in its 1-hop citation network. Its calibrated FAIR score is 50/100.

Dataset · Open Access · 12 citations · base score 2.3

datarank_citation_only_1hop_v6 · scope data_only · Methodology

Abstract

Drug discovery AI datasets and benchmarks have not traditionally included single-cell analysis biomarkers. While benchmarking efforts in single-cell analysis have recently released collections of single-cell tasks, they have yet to comprehensively release datasets, models, and benchmarks that integrate a broad range of therapeutic discovery tasks with cell-type-specific biomarkers. Therapeutics Commons (TDC-2) presents datasets, tools, models, and benchmarks integrating cell-type-specific contextual features with ML tasks across therapeutics. We present four tasks for contextual learning at single-cell resolution: drug-target nomination, genetic perturbation response prediction, chemical perturbation response prediction, and protein-peptide interaction prediction. We introduce datasets, models, and benchmarks for these four tasks. Finally, we detail the advancements and challenges in machine learning and biology that drove the implementation of TDC-2 and how they are reflected in its architecture, datasets and benchmarks, and foundation model tooling.

Data sources & pipeline

Pipeline: Metadata → Data-paper check → Enrichment → Citation network → Scoring

Enrichment: Pending

FAIR Checklist

Findable (1/2): Has DOI
Accessible (1/2): Open Access
Interoperable (0/2)
Reusable (1/3): Dataset classification

FAIR checklist signals are shown for context only and do not affect DataRank scoring.

FAIR score: 50

Top 22% by FAIR · LLM-assessed · full text read

Calibrated FAIR score: a parallel quality metric, independent of the DataRank citation score.

DataRank Breakdown

Base Score 65% · Citation Network 35%

Base Score Contribution: 0.345 (from this paper's citation signal)

Citation Network Contribution: 0.187 (from 6 citing papers with measurable signal)
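The displayed numbers can be cross-checked with a few lines of arithmetic. The sketch below only uses values shown on this page (base score 2.3, contributions 0.345 and 0.187, damping factor d = 0.85 from the methodology notes); the variable names are illustrative, and the implied network score is derived from rounded figures, so it is approximate.

```python
# Values as displayed on this page (already rounded).
d = 0.85                      # damping factor from the methodology notes
base_score = 2.3              # "base score 2.3"
base_contribution = 0.345     # Base Score Contribution card
network_contribution = 0.187  # Citation Network Contribution card

# The two cards sum to the headline DataRank.
total = base_contribution + network_contribution

# Each card's share of the total, as shown in the 65% / 35% bar.
base_share = base_contribution / total
network_share = network_contribution / total

# The base card equals (1 - d) * B(p); the network card equals d * N(p).
expected_base_contribution = (1 - d) * base_score
implied_network_score = network_contribution / d  # approximate N(p)

print(f"DataRank = {total:.3f}")
print(f"base share = {base_share:.0%}, network share = {network_share:.0%}")
print(f"(1 - d) * B(p) = {expected_base_contribution:.3f}")
print(f"implied N(p) = {implied_network_score:.2f}")
```

The 65% and 35% figures are therefore shares of the final score, not the weights (1 − d) and d themselves.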

Top 3 citers driving the network score

Ranked by citation count — the same ordering the engine uses when summing log1p(C_q) over citers.

The Tabula Sapiens: A multiple-organ, single-cell transcriptomic atlas of humans
Science · 2022 · 979 citations · DataRank 9.0 · Top 23%
scGen predicts single-cell perturbation responses
Nature Methods · 2019 · 666 citations · DataRank 0.975
Defining and benchmarking open problems in single-cell analysis
Nature Biotechnology · 2025 · 29 citations · DataRank 0.937

Why this DataRank?

DataRank blends this paper's own citation count with the influence of the papers that cite it. Here, roughly 65% comes from its base citations and 35% from the citation network (6 citing papers contributed measurable signal).

Base score B(p): log1p(citation_count) — grows sub-linearly, so a paper with 1,000 citations is not 10× a paper with 100.
Network N(p): Σ over citers of log1p(C_q) ÷ max(outdegree_q, 1). Being cited by a highly-cited paper with few references counts most.
Damping factor d = 0.85: DataRank = (1−d)·B(p) + d·N(p) — the two cards above are each already multiplied by their share.
Self-citations excluded: Citers sharing any OpenAlex author ID with this paper are filtered out before the network sum.
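
The four rules above can be sketched in Python. This is a minimal illustration under stated assumptions, not the DataRank engine itself: the `Citer` record, its field names, the function names, and all example values are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Citer:
    """A citing work q (hypothetical record layout)."""
    cited_by_count: int    # C_q: how often q itself is cited
    outdegree: int         # number of references q makes
    author_ids: frozenset  # author IDs of q, used for the self-citation filter


def base_score(citation_count: int) -> float:
    """B(p) = log1p(citation_count); grows sub-linearly."""
    return math.log1p(citation_count)


def network_score(citers, paper_author_ids) -> float:
    """N(p) = sum over non-self citers of log1p(C_q) / max(outdegree_q, 1)."""
    total = 0.0
    for q in citers:
        # Skip citers that share any author ID with the scored paper.
        if q.author_ids & paper_author_ids:
            continue
        total += math.log1p(q.cited_by_count) / max(q.outdegree, 1)
    return total


def datarank(citation_count, citers, paper_author_ids, d=0.85) -> float:
    """DataRank = (1 - d) * B(p) + d * N(p)."""
    return (1 - d) * base_score(citation_count) + d * network_score(
        citers, paper_author_ids
    )


# Made-up example: one ordinary citer, one self-citation, one uncited citer.
paper_authors = frozenset({"A1"})
citers = [
    Citer(cited_by_count=100, outdegree=20, author_ids=frozenset({"A7"})),
    Citer(cited_by_count=500, outdegree=10, author_ids=frozenset({"A1", "A9"})),
    Citer(cited_by_count=0, outdegree=0, author_ids=frozenset({"A3"})),
]
print(round(network_score(citers, paper_authors), 4))
print(round(datarank(12, citers, paper_authors), 4))
```

In this example only the first citer adds signal: the second is dropped as a self-citation, and the third contributes log1p(0) = 0. The `max(outdegree, 1)` guard avoids division by zero for citers with no recorded references.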

Citers are pulled from OpenAlex sorted by cited_by_count:desc and capped per paper, so when the cap binds we keep the highest-signal references and the score is reproducible across reruns.
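
The sort-then-cap step can be sketched as follows. This is an illustration only: the `top_citers` helper and the example records are hypothetical, and breaking ties by work ID is an assumption added here to make the ordering fully deterministic, not something the page states.

```python
def top_citers(citers, cap):
    """Keep the `cap` most-cited citers in a stable, repeatable order."""
    # Sort by cited_by_count descending; ties fall back to the ID
    # (assumed tie-break) so any input order yields the same result.
    ranked = sorted(citers, key=lambda c: (-c["cited_by_count"], c["id"]))
    return ranked[:cap]


# Made-up citer records with hypothetical IDs.
example_citers = [
    {"id": "W3", "cited_by_count": 29},
    {"id": "W1", "cited_by_count": 979},
    {"id": "W4", "cited_by_count": 29},
    {"id": "W2", "cited_by_count": 666},
]

kept = top_citers(example_citers, cap=3)
print([c["id"] for c in kept])
```

Because the ordering depends only on the records themselves, reversing or shuffling the input does not change which citers survive the cap.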

Authors (9)

Xiang Lin, Michelle M. Li, Kexin Huang, Wenhao Gao, Tianfan Fu