Demo corpus. Scores are computed on a select set of biomedical papers and datasets and may be inaccurate for papers outside this corpus, because DataRank relies on network effects that improve with scale. We aim to expand this into a fully open resource pending additional funding.

Population-level integration of single-cell datasets enables multi-scale analysis across samples

Nature Methods (2023) · 10.1038/s41592-023-02035-2 · Source: DataRank Database

Population-level integration of single-cell datasets enables multi-scale analysis across samples is a research paper published in Nature Methods (2023). On the Sindex it has a DataRank of 2.0. It has been cited 108 times, with 86 citing works in its 1-hop citation network.


DataRank 2.0 · unranked


Open Access · 108 citations · base score 4.7


datarank_citation_only_1hop_v6 · scope data_only · Methodology

Abstract

The increasing generation of population-level single-cell atlases has the potential to link sample metadata with cellular data. Constructing such references requires integration of heterogeneous cohorts with varying metadata. Here we present single-cell population level integration (scPoli), an open-world learner that incorporates generative models to learn sample and cell representations for data integration, label transfer and reference mapping. We applied scPoli on population-level atlases of lung and peripheral blood mononuclear cells, the latter consisting of 7.8 million cells across 2,375 samples. We demonstrate that scPoli can explain sample-level biological and technical variations using sample embeddings revealing genes associated with batch effects and biological effects. scPoli is further applicable to single-cell sequencing assay for transposase-accessible chromatin and cross-species datasets, offering insights into chromatin accessibility and comparative genomics. We envision scPoli becoming an important tool for population-level single-cell data integration facilitating atlas use but also interpretation by means of multi-scale analyses.

Data sources & pipeline

Pipeline: Metadata, Data-paper check, Enrichment, Citation network, Scoring

Enrichment: Pending

FAIR Checklist

Context only (not used in score)

Findable (1/2)

Has DOI

Accessible (1/2)

Open Access

Interoperable (0/2)

Reusable (0/3)

FAIR checklist signals are shown for context only and do not affect DataRank scoring.


DataRank Breakdown

Base Score 36% · Citation Network 64%

Base Score Contribution

0.704

From this paper's citation signal

Citation Network Contribution

1.3

From 59 citing papers with measurable signal


Top 5 citers driving the network score

Ranked by citation count — the same ordering the engine uses when summing log1p(C_q) over citers.

limma powers differential expression analyses for RNA-sequencing and microarray studies
Nucleic Acids Research · 2015 · 42,254 citations · DataRank 1.6
Comprehensive Integration of Single-Cell Data
Cell · 2019 · 16,515 citations · DataRank 1.5
Auto-Encoding Variational Bayes
2013 · 15,586 citations · DataRank 1.4
Integrated analysis of multimodal single-cell data
Cell · 2021 · 15,542 citations · DataRank 1.4
Fast, sensitive and accurate integration of single-cell data with Harmony
Nature Methods · 2019 · 10,108 citations · DataRank 1.4
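
As a rough illustration of the ordering described above, the sketch below sorts the five listed papers by citation count and prints log1p(C_q) for each. The citation counts are copied from the list; the function names are hypothetical, and the per-citer outdegree that the engine divides by is not shown on this page, so only the log term is computed.

```python
import math

# (title, citation count C_q) pairs copied from the list above,
# deliberately shuffled so the sort has something to do.
top_citers = [
    ("Comprehensive Integration of Single-Cell Data", 16515),
    ("Fast, sensitive and accurate integration of single-cell data with Harmony", 10108),
    ("limma powers differential expression analyses for RNA-sequencing and microarray studies", 42254),
    ("Integrated analysis of multimodal single-cell data", 15542),
    ("Auto-Encoding Variational Bayes", 15586),
]


def rank_by_citations(citers):
    """Return citers sorted by citation count, highest first (hypothetical helper)."""
    return sorted(citers, key=lambda citer: citer[1], reverse=True)


def log_signal(citation_count):
    """The log1p(C_q) term of the network sum, before dividing by outdegree."""
    return math.log1p(citation_count)


ranked = rank_by_citations(top_citers)
for title, count in ranked:
    print(f"{log_signal(count):6.2f}  {count:>7,}  {title}")
```

Because the signal is logarithmic, the roughly fourfold gap in citations between the first and fifth entries shrinks to a difference of about 15% in log1p terms.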

Why this DataRank?

DataRank blends this paper's own citation count with the influence of the papers that cite it. Here, roughly 36% comes from its base citations and 64% from the citation network (59 citing papers contributed measurable signal).

Base score B(p): log1p(citation_count) — grows sub-linearly, so a paper with 1,000 citations is not 10× a paper with 100.
Network N(p): Σ over citers of log1p(C_q) ÷ max(outdegree_q, 1). Being cited by a highly-cited paper with few references counts most.
Damping factor d = 0.85: DataRank = (1−d)·B(p) + d·N(p) — the two cards above are each already multiplied by their share.
Self-citations excluded: Citers sharing any OpenAlex author ID with this paper are filtered out before the network sum.
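
The four rules above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the engine's actual code: the `Citer` record, its field names, and the function names are hypothetical, and only the formulas (log1p base score, outdegree-normalised network sum, d = 0.85, self-citation filter on shared author IDs) come from this page.

```python
import math
from dataclasses import dataclass

DAMPING = 0.85  # d, as stated in the methodology list above


@dataclass(frozen=True)
class Citer:
    """Hypothetical record for one citing paper q."""
    citation_count: int    # C_q
    outdegree: int         # number of references q makes
    author_ids: frozenset  # OpenAlex author IDs of q


def base_score(citation_count):
    """B(p) = log1p(citation_count)."""
    return math.log1p(citation_count)


def network_score(citers, paper_author_ids):
    """N(p) = sum over non-self citers of log1p(C_q) / max(outdegree_q, 1)."""
    total = 0.0
    for q in citers:
        if q.author_ids & paper_author_ids:
            continue  # shares an author ID with the paper: treated as a self-citation
        total += math.log1p(q.citation_count) / max(q.outdegree, 1)
    return total


def datarank(citation_count, citers, paper_author_ids, d=DAMPING):
    """Return (base contribution, network contribution, DataRank)."""
    base_part = (1 - d) * base_score(citation_count)
    network_part = d * network_score(citers, paper_author_ids)
    return base_part, network_part, base_part + network_part


# Base contribution for this paper's 108 citations; the citer list is left empty
# because per-citer outdegrees are not published on this page.
base_part, network_part, total = datarank(108, [], frozenset())
print(round(base_part, 3))
```

With 108 citations the base contribution rounds to the 0.704 shown in the breakdown card. The network contribution cannot be reproduced from this page alone, since it depends on each citer's outdegree.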

Citers are pulled from OpenAlex sorted by cited_by_count:desc and capped per paper, so when the cap binds we keep the highest-signal references and the score is reproducible across reruns.
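
A request of this shape could be built as follows. This is a sketch only: the page does not state the actual cap or the exact query the pipeline sends, so `ASSUMED_CAP`, the `cites:` filter, the function name, and the placeholder work ID are all assumptions. Only the `cited_by_count:desc` sort comes from the text above.

```python
from urllib.parse import urlencode

OPENALEX_WORKS = "https://api.openalex.org/works"
ASSUMED_CAP = 200  # assumption: the real per-paper cap is not given on this page


def citers_query_url(openalex_work_id, cap=ASSUMED_CAP):
    """Build an OpenAlex works query for papers citing the given work,
    sorted by citation count (highest first) and limited to one page."""
    params = {
        "filter": f"cites:{openalex_work_id}",
        "sort": "cited_by_count:desc",
        "per-page": min(cap, 200),  # OpenAlex allows at most 200 results per page
    }
    # safe=":" keeps the filter and sort values readable in the URL
    return f"{OPENALEX_WORKS}?{urlencode(params, safe=':')}"


# "W0000000000" is a placeholder, not this paper's real OpenAlex ID.
url = citers_query_url("W0000000000")
print(url)
```

Sorting before truncating is what makes a capped pull favour the most-cited citers, which matches the "highest-signal" behaviour described above.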


[Interactive citation network graph. Node colors: Center, Data Paper, Data + Open Access, Non-data, Selected & links. Node size = percentile rank.]