Computationally Efficient Assembly of Pseudomonas aeruginosa Gene Expression Compendia is a dataset published in mSystems (2023). On the Sindex it has a DataRank of 0.342, placing it in the top 51.4% of the data-sharing corpus. It has been cited 6 times, with 2 citing works in its 1-hop citation network. Its calibrated FAIR score is 50/100.
Thousands of Pseudomonas aeruginosa RNA sequencing (RNA-seq) gene expression profiles are publicly available via the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA). In this work, the transcriptional profiles from hundreds of studies performed by over 75 research groups were reanalyzed in aggregate to create a powerful tool for hypothesis generation and testing. Raw sequence data were uniformly processed using the Salmon pseudoaligner, and this read mapping method was validated by comparison to a direct alignment method. We developed filtering criteria to exclude samples with aberrant levels of housekeeping gene expression or an unexpected number of genes with no reported values and normalized the filtered compendia using the ratio-of-medians method. The filtering and normalization steps greatly improved gene expression correlations for genes within the same operon or regulon across the 2,333 samples. Since the RNA-seq data were generated using diverse strains, we report the effects of mapping samples to noncognate reference genomes by separately analyzing all samples mapped to cDNA reference genomes for strains PAO1 and PA14, two divergent strains that were used to generate most of the samples. Finally, we developed an algorithm to incorporate new data as they are deposited into the SRA. Our processing and quality control methods provide a scalable framework for taking advantage of the troves of biological information hibernating in the depths of microbial gene expression data and yield useful tools for P. aeruginosa RNA-seq data to be leveraged for diverse research goals.
IMPORTANCE Pseudomonas aeruginosa is a causative agent of a wide range of infections, including chronic infections associated with cystic fibrosis. These P. aeruginosa infections are difficult to treat and often have negative outcomes. To aid in the study of this problematic pathogen, we mapped, filtered for quality, and normalized thousands of P. aeruginosa RNA-seq gene expression profiles that were publicly available via the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA). The resulting compendia facilitate analyses across experiments, strains, and conditions. Ultimately, the workflow that we present could be applied to analyses of other microbial species.
FAIR checklist signals are shown for context only and do not affect DataRank scoring.
Calibrated FAIR score: a parallel quality metric, independent of the DataRank citation score.
Base Score Contribution: 0.292 (from this paper's citation signal)
Citation Network Contribution: 0.0499 (from 2 citing papers with measurable signal)
Ranked by citation count — the same ordering the engine uses when summing log1p(Cq) over citers.
DataRank blends this paper's own citation count with the influence of the papers that cite it. Here, roughly 85% comes from its base citations and 15% from the citation network (2 citing papers contributed measurable signal).
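The split quoted above can be checked with a few lines of arithmetic. The sketch below assumes the DataRank is simply the sum of the two displayed contributions. That assumption matches the figures on this page (0.292 + 0.0499 ≈ 0.342), but the page does not state it as the engine's formula. The function name `blend_shares` is hypothetical.

```python
# Displayed contributions, copied from this page.
base_contribution = 0.292      # from this paper's own citation signal
network_contribution = 0.0499  # from the 2 citing papers

def blend_shares(base, network):
    """Return (total, base share, network share).

    Assumes the total score is the plain sum of the two parts;
    this is an illustration, not the engine's documented formula.
    """
    total = base + network
    return total, base / total, network / total

total, base_share, network_share = blend_shares(
    base_contribution, network_contribution
)
print(f"total ~ {total:.3f}")
print(f"base ~ {base_share:.0%}, network ~ {network_share:.0%}")
```

Under that assumption the shares round to the "roughly 85%" and "15%" figures given in the text.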
Citers are pulled from OpenAlex sorted by cited_by_count:desc and capped per paper, so when the cap binds we keep the highest-signal references and the score is reproducible across reruns.
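A minimal sketch of the selection step described here: sort citers by citation count in descending order, keep at most a fixed number, and sum log1p over the ones kept. Several details are assumptions and not taken from the engine: that Cq is the citer's own `cited_by_count`, the cap value, the tie-break on `id`, the placeholder work IDs, and the function names. How the raw sum is scaled into the displayed network contribution is not stated on this page, so the sketch stops at the sum.

```python
import math

def select_citers(citers, cap):
    """Keep the `cap` most-cited citers, in descending order.

    Ties are broken on the work id (an assumption) so the same
    input set always yields the same selection.
    """
    ranked = sorted(citers, key=lambda c: (-c["cited_by_count"], c["id"]))
    return ranked[:cap]

def network_signal(citers, cap):
    """Sum log1p(Cq) over the selected citers (unscaled)."""
    kept = select_citers(citers, cap)
    return sum(math.log1p(c["cited_by_count"]) for c in kept)

# Hypothetical citers; ids and counts are made up for illustration.
example = [
    {"id": "W_b", "cited_by_count": 3},
    {"id": "W_a", "cited_by_count": 10},
    {"id": "W_c", "cited_by_count": 0},
]

kept_ids = [c["id"] for c in select_citers(example, cap=2)]
signal = network_signal(example, cap=2)
print(kept_ids, round(signal, 4))
```

Because the ordering is deterministic, reordering the input list does not change which citers survive the cap, which is the reproducibility property the text refers to.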