Population-level integration of single-cell datasets enables multi-scale analysis across samples
Population-level integration of single-cell datasets enables multi-scale analysis across samples is a research paper published in Nature Methods (2023). On theSindex it has a DataRank of 2.0. It has been cited 108 times, with 86 citing works in its 1-hop citation network.
Abstract
The increasing generation of population-level single-cell atlases has the potential to link sample metadata with cellular data. Constructing such references requires integration of heterogeneous cohorts with varying metadata. Here we present single-cell population level integration (scPoli), an open-world learner that incorporates generative models to learn sample and cell representations for data integration, label transfer and reference mapping. We applied scPoli on population-level atlases of lung and peripheral blood mononuclear cells, the latter consisting of 7.8 million cells across 2,375 samples. We demonstrate that scPoli can explain sample-level biological and technical variations using sample embeddings revealing genes associated with batch effects and biological effects. scPoli is further applicable to single-cell sequencing assay for transposase-accessible chromatin and cross-species datasets, offering insights into chromatin accessibility and comparative genomics. We envision scPoli becoming an important tool for population-level single-cell data integration facilitating atlas use but also interpretation by means of multi-scale analyses.
FAIR Checklist
Context only (not used in score)
- Has DOI
- Open Access
FAIR checklist signals are shown for context only and do not affect DataRank scoring.
DataRank Breakdown
Base Score Contribution: 0.704 (from this paper's citation signal)
Citation Network Contribution: 1.3 (from 59 citing papers with measurable signal)
Top 5 citers driving the network score
Ranked by citation count, the same ordering the engine uses when summing log1p(C_q) over citers.
- limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Research, 2015. 42,254 citations. DataRank 1.6
- Comprehensive Integration of Single-Cell Data. Cell, 2019. 16,515 citations. DataRank 1.5
- Auto-Encoding Variational Bayes. 2013. 15,586 citations. DataRank 1.4
- Integrated analysis of multimodal single-cell data. Cell, 2021. 15,542 citations. DataRank 1.4
- Fast, sensitive and accurate integration of single-cell data with Harmony. Nature Methods, 2019. 10,108 citations. DataRank 1.4
Why this DataRank?
DataRank blends this paper's own citation count with the influence of the papers that cite it. Here, roughly 36% comes from its base citations and 64% from the citation network (59 citing papers contributed measurable signal).
- Base score B(p): log1p(citation_count). It grows sub-linearly, so a paper with 1,000 citations is not 10× a paper with 100.
- Network N(p): Σ over citers q of log1p(C_q) ÷ max(outdegree_q, 1). Being cited by a highly-cited paper with few references counts most.
- Damping factor d = 0.85: DataRank = (1−d)·B(p) + d·N(p). The two cards above are each already multiplied by their share.
- Self-citations excluded: citers sharing any OpenAlex author ID with this paper are filtered out before the network sum.
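The formula above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the engine's actual code: the `Citer` record, its field names, and the function `datarank_parts` are hypothetical, and only the arithmetic (log1p, the outdegree floor of 1, d = 0.85, and the shared-author filter) comes from the description.

```python
from dataclasses import dataclass
from math import log1p


@dataclass(frozen=True)
class Citer:
    """Hypothetical record for one citing paper q."""
    cited_by_count: int    # C_q: how often the citer is itself cited
    outdegree: int         # number of references the citer makes
    author_ids: frozenset  # author IDs on the citer


def datarank_parts(citation_count, paper_author_ids, citers, d=0.85):
    """Return (base contribution, network contribution, DataRank)."""
    base = log1p(citation_count)  # B(p)
    network = sum(                # N(p)
        log1p(c.cited_by_count) / max(c.outdegree, 1)
        for c in citers
        # Self-citation filter: skip citers sharing any author ID.
        if not (c.author_ids & paper_author_ids)
    )
    base_part = (1 - d) * base
    network_part = d * network
    return base_part, network_part, base_part + network_part


# With the 108 citations reported for this paper and no citers supplied,
# the base contribution comes out at about 0.704, matching the card above.
base_part, _, _ = datarank_parts(108, frozenset(), [])
```

The per-citer inputs behind the 1.3 network contribution are not listed on this page, so the sketch does not try to reproduce that number.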
Citers are pulled from OpenAlex sorted by cited_by_count:desc and capped per paper, so when the cap binds we keep the highest-signal references and the score is reproducible across reruns.
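The sort-and-cap step might look like the sketch below. The function name `select_top_citers`, the dictionary keys, and the `cap` parameter are hypothetical; the actual cap value is not stated here. Breaking ties on the work ID is an added assumption so that equal citation counts always come out in the same order.

```python
def select_top_citers(citers, cap):
    """Keep at most `cap` citers, highest cited_by_count first.

    `citers` is a list of dicts with hypothetical keys "id" and
    "cited_by_count". Ties are broken by "id" (an assumption) so the
    selection does not depend on the order of the input list.
    """
    ranked = sorted(citers, key=lambda c: (-c["cited_by_count"], c["id"]))
    return ranked[:cap]
```

Because the ordering is total, the same set of citers yields the same capped subset on every run, which is the property the reproducibility claim relies on.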