Demo corpus. Scores are computed on a select set of biomedical papers and datasets and may be inaccurate for papers outside this corpus, because DataRank relies on network effects that improve with scale. We aim to expand this into a fully open resource pending additional funding.

Probabilistic PCA of censored data: accounting for uncertainties in the visualization of high-throughput single-cell qPCR data

Bioinformatics (2014) · DOI: 10.1093/bioinformatics/btu134 · Source: DataRank Database

Probabilistic PCA of censored data: accounting for uncertainties in the visualization of high-throughput single-cell qPCR data is a research paper published in Bioinformatics (2014). On the S-index, it has a DataRank of 1.4. It has been cited 25 times, with 22 citing works in its 1-hop citation network.


DataRank: 1.4 · unranked

Open Access · 25 citations · base score 3.3


Methodology: datarank_citation_only_1hop_v6 · scope: data_only

Abstract

Motivation

High-throughput single-cell quantitative real-time polymerase chain reaction (qPCR) is a promising technique allowing for new insights in complex cellular processes. However, the PCR reaction can be detected only up to a certain detection limit, whereas failed reactions could be due to low or absent expression, and the true expression level is unknown. Because this censoring can occur for high proportions of the data, it is one of the main challenges when dealing with single-cell qPCR data. Principal component analysis (PCA) is an important tool for visualizing the structure of high-dimensional data as well as for identifying subpopulations of cells. However, to date it is not clear how to perform a PCA of censored data. We present a probabilistic approach that accounts for the censoring and evaluate it for two typical datasets containing single-cell qPCR data.

Results

We use the Gaussian process latent variable model framework to account for censoring by introducing an appropriate noise model and allowing a different kernel for each dimension. We evaluate this new approach for two typical qPCR datasets (of mouse embryonic stem cells and blood stem/progenitor cells, respectively) by performing linear and non-linear probabilistic PCA. Taking the censoring into account results in a 2D representation of the data, which better reflects its known structure: in both datasets, our new approach results in a better separation of known cell types and is able to reveal subpopulations in one dataset that could not be resolved using standard PCA.

Availability and implementation

The implementation was based on the existing Gaussian process latent variable model toolbox (https://github.com/SheffieldML/GPmat); extensions for noise models and kernels accounting for censoring are available at http://icb.helmholtz-muenchen.de/censgplvm.
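The abstract does not spell out the noise model. As a rough illustration only, the sketch below shows a generic censored-Gaussian (Tobit-style) log-likelihood of the kind such a model can build on; it is an assumption, not the authors' implementation (their extensions are in the censgplvm package linked above). It assumes an expression scale where larger values mean higher expression, so a failed reaction (a non-detect, written here as `None`) only tells us that the true value lies at or below a hypothetical `limit`. Detected values contribute a Gaussian density; non-detects contribute the probability mass below the limit.

```python
import math
from typing import Optional, Sequence


def log_normal_pdf(y: float, mu: float, sigma: float) -> float:
    """Log density of N(mu, sigma^2) evaluated at y."""
    z = (y - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def log_normal_cdf(z: float) -> float:
    """log Phi(z), using Phi(z) = 0.5 * erfc(-z / sqrt(2)).

    Not tail-safe: for very negative z the value underflows to 0.
    """
    return math.log(0.5 * math.erfc(-z / math.sqrt(2.0)))


def censored_loglik(y: Sequence[Optional[float]], mu: Sequence[float],
                    sigma: float, limit: float) -> float:
    """Sum of per-entry log-likelihoods under a left-censored Gaussian.

    y[i] is None for a non-detect (true value assumed <= limit).
    mu[i] is the model's predicted mean for that entry, e.g. from a
    latent variable model. sigma is a shared noise standard deviation
    (a simplifying assumption).
    """
    total = 0.0
    for yi, mi in zip(y, mu):
        if yi is None:
            # Censored entry: P(true value <= limit) = Phi((limit - mu) / sigma)
            total += log_normal_cdf((limit - mi) / sigma)
        else:
            # Detected entry: ordinary Gaussian density
            total += log_normal_pdf(yi, mi, sigma)
    return total


if __name__ == "__main__":
    y = [3.2, None, 1.5, None]   # two detected values, two non-detects
    mu = [3.0, 0.4, 1.4, 2.5]    # hypothetical model predictions
    print(censored_loglik(y, mu, sigma=0.5, limit=1.0))
```

The design point is that the censored term penalizes a prediction only for lying above the limit, whereas plugging the limit in as if it were a measured value would also penalize predictions far below it.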

Data sources & pipeline

Pipeline: Metadata → Data-paper check → Enrichment → Citation network → Scoring

Enrichment: Pending

FAIR Checklist

Findable (1/2): Has DOI

Accessible (1/2): Open Access

Interoperable (0/2)

Reusable (0/3)

FAIR checklist signals are shown for context only and do not affect DataRank scoring.


DataRank Breakdown

Base Score 36% · Citation Network 64%

Base score contribution: 0.489 (from this paper's citation signal)

Citation network contribution: 0.868 (from 19 citing papers with measurable signal)
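The figures in this breakdown follow from the formula given under "Why this DataRank?" (B(p) = log1p(citation_count), d = 0.85, with the network card already equal to d·N(p)). The short Python sketch below reproduces them as a sanity check; the function name `breakdown` is hypothetical, and because the page shows only the damped network value (0.868), N(p) is backed out from it rather than recomputed from citer data.

```python
import math

D = 0.85  # damping factor d, as stated in the methodology on this page


def breakdown(citations: int, network_contrib: float, d: float = D) -> dict:
    """Recompute the displayed breakdown from the citation count and the
    already-damped network contribution d * N(p)."""
    base = math.log1p(citations)          # B(p) = log1p(citation_count)
    base_contrib = (1 - d) * base         # (1 - d) * B(p)
    total = base_contrib + network_contrib
    return {
        "B": base,
        "base_contrib": base_contrib,
        "N": network_contrib / d,         # back out N(p) from d * N(p)
        "total": total,
        "base_share": base_contrib / total,
        "network_share": network_contrib / total,
    }


if __name__ == "__main__":
    r = breakdown(citations=25, network_contrib=0.868)
    print(f"B(p) = {r['B']:.3f}, base contribution = {r['base_contrib']:.3f}")
    print(f"N(p) = {r['N']:.3f}, DataRank = {r['total']:.3f}")
    print(f"shares: {r['base_share']:.0%} base / {r['network_share']:.0%} network")
```

With these inputs the sketch gives B(p) ≈ 3.26 (shown as 3.3), a base contribution of about 0.489, N(p) ≈ 1.02, and a total of about 1.357, which rounds to the displayed 1.4 with a 36%/64% split.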


Top 5 citers driving the network score

Ranked by citation count — the same ordering the engine uses when summing log1p(C_q) over citers.

Diffusion maps for high-dimensional single-cell analysis of differentiation data
Bioinformatics · 2015 · 709 citations · DataRank 0.985
Revealing the vectors of cellular identity with single-cell genomics
Nature Biotechnology · 2016 · 701 citations · DataRank 0.983
Data exploration, quality control and testing in single-cell qPCR-based gene expression experiments
Bioinformatics · 2012 · 441 citations · DataRank 0.914
Characterization of transcriptional networks in blood stem and progenitor cells using high-throughput single-cell gene expression analysis
Nature Cell Biology · 2013 · 279 citations · DataRank 0.845
A novel approach for resolving differences in single-cell gene expression patterns from zygote to blastocyst
Bioinformatics · 2012 · 67 citations · DataRank 2.6

Why this DataRank?

DataRank blends this paper's own citation count with the influence of the papers that cite it. Here, roughly 36% comes from its base citations and 64% from the citation network (19 citing papers contributed measurable signal).

Base score B(p): log1p(citation_count) — grows sub-linearly, so a paper with 1,000 citations is not 10× a paper with 100.
Network N(p): Σ over citers of log1p(C_q) ÷ max(outdegree_q, 1). Being cited by a highly-cited paper with few references counts most.
Damping factor d = 0.85: DataRank = (1−d)·B(p) + d·N(p) — the two cards above are each already multiplied by their share.
Self-citations excluded: Citers sharing any OpenAlex author ID with this paper are filtered out before the network sum.
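A minimal Python sketch of the three formulas above, including the self-citation filter. The `Citer` record and its fields (cited-by count, number of references, author IDs) are hypothetical stand-ins for whatever the engine reads from OpenAlex; the sketch takes them as plain in-memory data and makes no API calls.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Citer:
    # Hypothetical record for one citing work q.
    cited_by_count: int          # C_q
    outdegree: int               # number of references q makes
    author_ids: frozenset[str]   # OpenAlex author IDs of q


def base_score(citation_count: int) -> float:
    """B(p) = log1p(citation_count); grows sub-linearly."""
    return math.log1p(citation_count)


def network_score(paper_authors: frozenset[str], citers: list[Citer]) -> float:
    """N(p) = sum over non-self citers of log1p(C_q) / max(outdegree_q, 1)."""
    total = 0.0
    for q in citers:
        if q.author_ids & paper_authors:  # shares an author ID: self-citation, skip
            continue
        total += math.log1p(q.cited_by_count) / max(q.outdegree, 1)
    return total


def datarank(citation_count: int, paper_authors: frozenset[str],
             citers: list[Citer], d: float = 0.85) -> float:
    """DataRank = (1 - d) * B(p) + d * N(p)."""
    return (1 - d) * base_score(citation_count) + d * network_score(paper_authors, citers)


if __name__ == "__main__":
    me = frozenset({"A1"})
    toy = [
        Citer(700, 40, frozenset({"A9"})),  # widely cited, many references
        Citer(50, 5, frozenset({"A7"})),    # fewer citations, few references
        Citer(900, 10, frozenset({"A1"})),  # shares author A1: excluded
    ]
    print(round(network_score(me, toy), 3))
    print(round(base_score(1000) / base_score(100), 2))  # about 1.5, not 10
```

The last line checks the sub-linearity claim: log1p(1000) / log1p(100) is about 1.5, so ten times the citations yields only about one and a half times the base score.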

Citers are pulled from OpenAlex sorted by cited_by_count:desc and capped per paper, so when the cap binds we keep the highest-signal references and the score is reproducible across reruns.
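The retrieval step can be sketched offline as well. In the OpenAlex API this ordering corresponds to requesting citing works with `sort=cited_by_count:desc`; the sketch below instead applies the same ordering and a cap to records already in memory. The cap value is not stated on this page, so `cap` is a hypothetical parameter, and the tie-break on work ID is an added assumption that keeps the result deterministic when two citers have equal counts.

```python
from __future__ import annotations

from typing import NamedTuple


class Work(NamedTuple):
    # Hypothetical minimal citer record; OpenAlex works expose cited_by_count.
    id: str
    cited_by_count: int


def select_citers(citers: list[Work], cap: int) -> list[Work]:
    """Keep the `cap` most-cited citers, ordered by cited_by_count descending.

    Ties are broken by id (an assumption) so repeated runs return the same list.
    """
    ordered = sorted(citers, key=lambda w: (-w.cited_by_count, w.id))
    return ordered[:cap]


if __name__ == "__main__":
    # Counts mirror the top citers listed above; the IDs are made up.
    pool = [Work("W5", 67), Work("W1", 709), Work("W3", 441),
            Work("W2", 701), Work("W4", 279)]
    print([w.id for w in select_citers(pool, cap=3)])  # ['W1', 'W2', 'W3']
```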

