Companion data deposit of manuscript: Evaluating and improving the representation of bacterial contents in long-read metagenome assemblies is a research paper published in arXiv (Cornell University) (2022). On theSindex it has a DataRank of 0.165. It has been cited 2 times.
Background: In the metagenome assembly of a microbiome community, we may think abundant species would be easier to assemble due to their deeper coverage. However, this conjecture is rarely tested. We often do not know how many abundant species we are missing and do not have an approach to recover these species. Results: Here we proposed k-mer based and 16S RNA based methods to measure the completeness of metagenome assembly. We showed that even with PacBio High-Fidelity (HiFi) reads, abundant species are often not assembled, as high strain diversity may lead to fragmented contigs. We developed a novel algorithm to recover abundant metagenome-assembled genomes (MAGs) by identifying circular assembly subgraphs. Our algorithm is reference-free and complementary to standard metagenome binning. Evaluated on 14 real datasets, it rescued many abundant species that would be missing with existing methods. Conclusions: Our work stresses the importance of metagenome completeness, which has often been overlooked. Our algorithm generates more circular MAGs and moves a step closer to the complete representation of microbiome communities.
FAIR checklist signals are shown for context only and do not affect DataRank scoring.
Base Score Contribution
0.165
From this paper's citation signal
Citation Network Contribution
0
Citation network not refreshed for this result
This paper's DataRank is currently driven only by its base citation score. Citation network data was not refreshed for this result.
DataRank blends this paper's own citation count with the influence of the papers that cite it. Here, roughly 100% comes from its base citations and 0% from the citation network.
Citers are pulled from OpenAlex sorted by cited_by_count:desc and capped per paper, so when the cap binds we keep the highest-signal references and the score is reproducible across reruns.
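
The sort-and-cap step described above can be sketched as follows. This is a minimal illustration, not the site's actual pipeline: the function names, the cap value, the placeholder work ID, and the tie-break on work ID are all assumptions made for the example. The only pieces taken from the text are the OpenAlex source and the cited_by_count:desc ordering.

```python
from urllib.parse import urlencode


def citers_url(work_id: str, cap: int) -> str:
    """Build an OpenAlex query for works citing `work_id`, most-cited first.

    `work_id` and `cap` are hypothetical inputs; the cap actually used by
    DataRank is not stated in the source.
    """
    params = {
        "filter": f"cites:{work_id}",   # works that cite the given work
        "sort": "cited_by_count:desc",  # highest-signal citers first
        "per-page": cap,                # request no more than the cap
    }
    # Keep ':' unescaped so the query stays readable.
    return "https://api.openalex.org/works?" + urlencode(params, safe=":")


def top_citers(citers: list[dict], cap: int) -> list[dict]:
    """Return at most `cap` citers, ordered by citation count descending.

    Ties are broken by work ID (an assumption) so that the selection does
    not depend on the order in which the citers arrived.
    """
    ranked = sorted(citers, key=lambda w: (-w["cited_by_count"], w["id"]))
    return ranked[:cap]


if __name__ == "__main__":
    sample = [
        {"id": "W3", "cited_by_count": 5},
        {"id": "W1", "cited_by_count": 9},
        {"id": "W2", "cited_by_count": 5},
    ]
    print(citers_url("W0000000000", cap=2))  # placeholder work ID
    print([w["id"] for w in top_citers(sample, cap=2)])
```

Sorting on a fixed key before truncating is what makes the capped set stable: the same citers are kept on every rerun, regardless of the order in which the records were returned.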