🏆 Finalist — NIH Data Sharing Index (“S-Index”) Challenge
Demo corpus. Scores are computed on a select set of biomedical papers and datasets and may be inaccurate for papers outside this corpus, because DataRank relies on network effects that improve with scale. We aim to expand this into a fully open resource pending additional funding.

Conditional out-of-distribution generation for unpaired data using transfer VAE

Bioinformatics (2020) · DOI: 10.1093/bioinformatics/btaa800 · Source: DataRank Database

Conditional out-of-distribution generation for unpaired data using transfer VAE is a research paper published in Bioinformatics (2020). On the S-Index it has a DataRank of 3.1. It has been cited 139 times, with 112 citing works in its 1-hop citation network.

DataRank 3.1 · unranked
Open Access · 139 citations · base score 4.9
datarank_citation_only_1hop_v6 · scope data_only · Methodology

Abstract

Motivation: While generative models have shown great success in sampling high-dimensional samples conditional on low-dimensional descriptors (stroke thickness in MNIST, hair color in CelebA, speaker identity in WaveNet), their generation out-of-distribution poses fundamental problems due to the difficulty of learning compact joint distribution across conditions. The canonical example of the conditional variational autoencoder (CVAE), for instance, does not explicitly relate conditions during training and, hence, has no explicit incentive of learning such a compact representation.

Results: We overcome the limitation of the CVAE by matching distributions across conditions using maximum mean discrepancy in the decoder layer that follows the bottleneck. This introduces a strong regularization both for reconstructing samples within the same condition and for transforming samples across conditions, resulting in much improved generalization. As this amounts to solving a style-transfer problem, we refer to the model as transfer VAE (trVAE). Benchmarking trVAE on high-dimensional image and single-cell RNA-seq, we demonstrate higher robustness and higher accuracy than existing approaches. We also show qualitatively improved predictions by tackling previously problematic minority classes and multiple conditions in the context of cellular perturbation response to treatment and disease based on high-dimensional single-cell gene expression data. For generic tasks, we improve Pearson correlations of high-dimensional estimated means and variances with their ground truths from 0.89 to 0.97 and 0.75 to 0.87, respectively. We further demonstrate that trVAE learns cell-type-specific responses after perturbation and improves the prediction of most cell-type-specific genes by 65%.

Availability and implementation: The trVAE implementation is available via github.com/theislab/trvae. The results of this article can be reproduced via github.com/theislab/trvae_reproducibility.

Data sources & pipeline
Pipeline: Metadata → Data-paper check → Enrichment → Citation network → Scoring
Enrichment: Pending

FAIR Checklist

Context only (not used in score)
Findable (1/2)
  • Has DOI
Accessible (1/2)
  • Open Access
Interoperable (0/2)
Reusable (0/3)

      FAIR checklist signals are shown for context only and do not affect DataRank scoring.

      DataRank Breakdown

      Base Score 24% · Citation Network 76%

      Base Score Contribution

      0.741

      From this paper's citation signal

      Citation Network Contribution

      2.4

      From 83 citing papers with measurable signal


      Top 5 citers driving the network score

      Ranked by citation count, the same ordering the engine uses when summing log1p(C_q) over citers.

      1. A single-cell survey of the small intestinal epithelium
        Nature · 2017 · 1,850 citations · DataRank 1.1
      2. Benchmarking atlas-level data integration in single-cell genomics
        Nature Methods · 2021 · 1,376 citations · DataRank 10.3 · Top 21%
      3. Single-cell RNA-seq denoising using a deep count autoencoder
        Nature Communications · 2019 · 1,140 citations · DataRank 1.1
      4. scGen predicts single-cell perturbation responses
        Nature Methods · 2019 · 666 citations · DataRank 0.975

      Why this DataRank?

      DataRank blends this paper's own citation count with the influence of the papers that cite it. Here, roughly 24% comes from its base citations and 76% from the citation network (83 citing papers contributed measurable signal).
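
The 24% / 76% split can be checked by hand from the figures shown on this page. The sketch below assumes the damping factor d = 0.85 and the base-score definition log1p(citation_count) stated in the methodology notes on this page; the network contribution is taken as the rounded value displayed (2.4), so the result is only accurate to the displayed precision. The function name is illustrative, not part of any DataRank API.

```python
import math

D = 0.85  # damping factor as stated on this page


def base_contribution(citation_count: int, d: float = D) -> float:
    """Share-weighted base score: (1 - d) * log1p(citation_count)."""
    return (1 - d) * math.log1p(citation_count)


base = base_contribution(139)   # 139 citations, from this page
network = 2.4                   # network contribution as displayed (rounded)
total = base + network

print(f"base contribution: {base:.3f}")
print(f"DataRank (approx): {total:.1f}")
print(f"base share: {base / total:.0%}, network share: {network / total:.0%}")
```

With these inputs the base contribution comes out at about 0.741 and the total at about 3.1, which matches the cards above.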

      Base score B(p)
      log1p(citation_count) — grows sub-linearly, so a paper with 1,000 citations is not 10× a paper with 100.
      Network N(p)
      Σ over citers q of log1p(C_q) ÷ max(outdegree_q, 1). Being cited by a highly-cited paper with few references counts most.
      Damping factor d = 0.85
      DataRank = (1−d)·B(p) + d·N(p) — the two cards above are each already multiplied by their share.
      Self-citations excluded
      Citers sharing any OpenAlex author ID with this paper are filtered out before the network sum.
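
The four definitions above can be put together in a few lines. This is a minimal sketch under stated assumptions, not the DataRank engine itself: it reads C_q as the citation count of citer q and outdegree_q as the number of references q makes, and the `Paper` class, its field names, and the example values are hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Paper:
    # Hypothetical record; field names are illustrative only.
    citation_count: int
    outdegree: int = 0                      # number of references the paper makes
    author_ids: set = field(default_factory=set)


def base_score(paper: Paper) -> float:
    """B(p) = log1p(citation_count)."""
    return math.log1p(paper.citation_count)


def network_score(paper: Paper, citers: list) -> float:
    """N(p) = sum over citers q of log1p(C_q) / max(outdegree_q, 1)."""
    total = 0.0
    for q in citers:
        # Drop self-citations: any shared author ID excludes the citer.
        if paper.author_ids & q.author_ids:
            continue
        total += math.log1p(q.citation_count) / max(q.outdegree, 1)
    return total


def datarank(paper: Paper, citers: list, d: float = 0.85) -> float:
    """DataRank = (1 - d) * B(p) + d * N(p)."""
    return (1 - d) * base_score(paper) + d * network_score(paper, citers)


# Made-up example: one independent citer and one self-citing citer.
target = Paper(citation_count=139, author_ids={"A1"})
citers = [
    Paper(citation_count=1850, outdegree=50, author_ids={"A2"}),
    Paper(citation_count=666, outdegree=40, author_ids={"A1", "A3"}),  # filtered out
]
print(round(datarank(target, citers), 3))
```

Dividing by `max(outdegree, 1)` guards against citers with no recorded references, so a missing reference list does not cause a division by zero.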

      Citers are pulled from OpenAlex sorted by cited_by_count:desc and capped per paper, so when the cap binds we keep the highest-signal references and the score is reproducible across reruns.
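
The sort-then-cap step can be illustrated locally without calling OpenAlex. In this sketch the cap value, the record layout, and the tie-break on ID are all assumptions made for illustration; the page does not state the actual cap or how ties are resolved. The citation counts are the four shown in the citer list above, attached to made-up IDs.

```python
def select_citers(citers: list, cap: int) -> list:
    """Keep the `cap` most-cited citers, in a deterministic order.

    Sorting by descending cited_by_count mirrors `cited_by_count:desc`;
    the secondary sort on `id` is an assumed tie-break that makes the
    selection independent of input order.
    """
    ranked = sorted(citers, key=lambda c: (-c["cited_by_count"], c["id"]))
    return ranked[:cap]


# Hypothetical IDs paired with the citation counts listed on this page.
citers = [
    {"id": "W4", "cited_by_count": 666},
    {"id": "W2", "cited_by_count": 1376},
    {"id": "W1", "cited_by_count": 1850},
    {"id": "W3", "cited_by_count": 1140},
]

kept = select_citers(citers, cap=2)
print([c["id"] for c in kept])
```

Because the ordering depends only on the records themselves, rerunning the selection on the same citers in any input order yields the same capped set.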


      [Citation network graph. Node colors: Center, Data Paper, Data + Open Access, Non-data, Selected & links. Node size = percentile rank.]