Demo corpus. Scores are computed on a select set of biomedical papers and datasets and may be inaccurate for papers outside this corpus, because DataRank relies on network effects that improve with scale. We aim to expand this into a fully open resource pending additional funding.

Delineating the effective use of self-supervised learning in single-cell genomics

Nature Machine Intelligence (2024) · DOI: 10.1038/s42256-024-00934-3 · Source: DataRank Database

Delineating the effective use of self-supervised learning in single-cell genomics is a research paper published in Nature Machine Intelligence (2024). On the Sindex it has a DataRank of 0.595. It has been cited 21 times, with 17 citing works in its 1-hop citation network.

DataRank: 0.595 · unranked

Open Access · 21 citations · base score 3.1

Methodology: datarank_citation_only_1hop_v6 · scope data_only

Abstract

Self-supervised learning (SSL) has emerged as a powerful method for extracting meaningful representations from vast, unlabelled datasets, transforming computer vision and natural language processing. In single-cell genomics (SCG), representation learning offers insights into the complex biological data, especially with emerging foundation models. However, identifying scenarios in SCG where SSL outperforms traditional learning methods remains a nuanced challenge. Furthermore, selecting the most effective pretext tasks within the SSL framework for SCG is a critical yet unresolved question. Here we address this gap by adapting and benchmarking SSL methods in SCG, including masked autoencoders with multiple masking strategies and contrastive learning methods. Models trained on over 20 million cells were examined across multiple downstream tasks, including cell-type prediction, gene-expression reconstruction, cross-modality prediction and data integration. Our empirical analyses underscore the nuanced role of SSL, namely, in transfer learning scenarios leveraging auxiliary data or analysing unseen datasets. Masked autoencoders excel over contrastive methods in SCG, diverging from computer vision trends. Moreover, our findings reveal the notable capabilities of SSL in zero-shot settings and its potential in cross-modality prediction and data integration. In summary, we study SSL methods in SCG on fully connected networks and benchmark their utility across key representation learning scenarios.

Data sources & pipeline

Pipeline: Metadata → Data-paper check → Enrichment → Citation network → Scoring

Enrichment: Pending

FAIR Checklist

Context only (not used in score)

Findable (1/2)

Has DOI

Accessible (1/2)

Open Access

Interoperable (0/2)

Reusable (0/3)

FAIR checklist signals are shown for context only and do not affect DataRank scoring.

DataRank Breakdown

Base Score 78% · Citation Network 22%

Base Score Contribution

0.464

From this paper's citation signal

Citation Network Contribution

0.132

From 11 citing papers with measurable signal

Top 5 citers driving the network score

Ranked by citation count — the same ordering the engine uses when summing log1p(C_q) over citers.

Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles
Proceedings of the National Academy of Sciences · 2005 · 55,906 citations · DataRank 1.6
The Molecular Signatures Database Hallmark Gene Set Collection
Cell Systems · 2015 · 14,282 citations · DataRank 17.1 · Top 9%
Molecular signatures database (MSigDB) 3.0
Bioinformatics · 2011 · 7,680 citations · DataRank 18.0 · Top 8%
Benchmarking atlas-level data integration in single-cell genomics
Nature Methods · 2021 · 1,376 citations · DataRank 10.3 · Top 21%
Best practices for single-cell analysis across modalities
Nature Reviews Genetics · 2023 · 1,013 citations · DataRank 1.0

Why this DataRank?

DataRank blends this paper's own citation count with the influence of the papers that cite it. Here, roughly 78% comes from its base citations and 22% from the citation network (11 citing papers contributed measurable signal).
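The 78% / 22% split can be checked from the two contribution values shown in the breakdown (0.464 and 0.132). The snippet below is only an arithmetic sketch; the variable names are illustrative and are not part of DataRank.

```python
# Contribution values as displayed in the DataRank Breakdown cards.
base_contribution = 0.464
network_contribution = 0.132

total = base_contribution + network_contribution
base_share = base_contribution / total
network_share = network_contribution / total

print(f"total         = {total:.3f}")        # 0.596
print(f"base share    = {base_share:.0%}")   # 78%
print(f"network share = {network_share:.0%}")  # 22%
```

The sum of the two displayed values (0.596) differs from the displayed DataRank (0.595) in the last digit, presumably because each card is rounded separately.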

Base score B(p): log1p(citation_count) — grows sub-linearly, so a paper with 1,000 citations is not 10× a paper with 100.
Network N(p): Σ over citers of log1p(C_q) ÷ max(outdegree_q, 1). Being cited by a highly-cited paper with few references counts most.
Damping factor d = 0.85: DataRank = (1−d)·B(p) + d·N(p) — the two cards above are each already multiplied by their share.
Self-citations excluded: Citers sharing any OpenAlex author ID with this paper are filtered out before the network sum.
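The four rules above can be put together in a few lines. This is a minimal sketch of the formulas as stated on this page, not the engine's actual code: the function names and the dictionary keys `outdegree` and `author_ids` are assumptions made for illustration.

```python
import math

DAMPING = 0.85  # d, as stated in the methodology text


def base_score(citation_count):
    """B(p) = log1p(citation_count)."""
    return math.log1p(citation_count)


def exclude_self_citations(paper_author_ids, citers):
    """Drop citers that share at least one author ID with the paper."""
    paper_author_ids = set(paper_author_ids)
    return [c for c in citers if not paper_author_ids & set(c["author_ids"])]


def network_score(citers):
    """N(p) = sum over citers q of log1p(C_q) / max(outdegree_q, 1)."""
    return sum(
        math.log1p(c["cited_by_count"]) / max(c["outdegree"], 1) for c in citers
    )


def datarank(citation_count, paper_author_ids, citers, d=DAMPING):
    """DataRank = (1 - d) * B(p) + d * N(p), after removing self-citations."""
    kept = exclude_self_citations(paper_author_ids, citers)
    return (1 - d) * base_score(citation_count) + d * network_score(kept)


# Base term for this paper: 21 citations.
print(round(base_score(21), 1))                  # 3.1
print(round((1 - DAMPING) * base_score(21), 3))  # 0.464
```

With 21 citations the base score is log1p(21) ≈ 3.09, and its weighted share (1 − 0.85) × 3.09 ≈ 0.464 matches the Base Score Contribution shown in the breakdown.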

Citers are pulled from OpenAlex sorted by cited_by_count:desc and capped per paper, so when the cap binds we keep the highest-signal references and the score is reproducible across reruns.
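The sort-then-cap step can be sketched locally as follows. The cap value is not given on this page, so it is left as a parameter. The tie-break on `id` is an assumption added here to make the ordering deterministic, and the IDs in the sample are placeholders.

```python
def select_citers(citers, cap):
    """Return at most `cap` citers, highest cited_by_count first.

    Ties are broken by `id` (an assumption, not documented behaviour) so
    that repeated runs return the same subset.
    """
    ranked = sorted(citers, key=lambda c: (-c["cited_by_count"], c["id"]))
    return ranked[:cap]


# Citation counts taken from the "Top 5 citers" list; the IDs are placeholders.
sample = [
    {"id": "W-d", "cited_by_count": 1376},
    {"id": "W-a", "cited_by_count": 55906},
    {"id": "W-e", "cited_by_count": 1013},
    {"id": "W-c", "cited_by_count": 7680},
    {"id": "W-b", "cited_by_count": 14282},
]

top3 = select_citers(sample, cap=3)
print([c["cited_by_count"] for c in top3])  # [55906, 14282, 7680]
```

Sorting before truncating means the kept subset does not depend on the order in which records arrive.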

[Interactive citation network graph. Node colors: Center, Data Paper, Data + Open Access, Non-data, Selected & links. Node size = percentile rank.]