Identifying centromeric satellites with dna-brnn
Identifying centromeric satellites with dna-brnn is a research paper published in Bioinformatics (2019). On theSindex it has a DataRank of 1.3. It has been cited 34 times, with 25 citing works in its 1-hop citation network. Its calibrated FAIR score is 74/100.
Abstract
Summary: Human alpha satellite and satellite 2/3 contribute to several percent of the human genome. However, identifying these sequences with traditional algorithms is computationally intensive. Here we develop dna-brnn, a recurrent neural network to learn the sequences of the two classes of centromeric repeats. It achieves high similarity to RepeatMasker and is times faster. Dna-brnn explores a novel application of deep learning and may accelerate the study of the evolution of the two repeat classes.
Availability and implementation: https://github.com/lh3/dna-nn.
FAIR Checklist
Context only (not used in score)
- Has DOI
- Open Access
FAIR checklist signals are shown for context only and do not affect DataRank scoring.
Calibrated FAIR score: a parallel quality metric, independent of the DataRank citation score.
DataRank Breakdown
Base Score Contribution
0.533
From this paper's citation signal
Citation Network Contribution
0.775
From 23 citing papers with measurable signal
Top 5 citers driving the network score
Ranked by citation count, the same ordering the engine uses when summing log1p(C_q) over citers.
- Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Research, 1999. 9,791 citations. DataRank 1.4.
- The Simons Genome Diversity Project: 300 genomes from 142 diverse populations. Nature, 2016. 1,762 citations. DataRank 8.5 (Top 24%).
- Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly. Genome Research, 2017. 1,276 citations. DataRank 10.2 (Top 21%).
- The complete sequence of a human Y chromosome. Nature, 2023. 452 citations. DataRank 5.2 (Top 28%).
- A Draft Human Pangenome Reference. 2022. 74 citations. DataRank 2.9 (Top 33%).
Why this DataRank?
DataRank blends this paper's own citation count with the influence of the papers that cite it. Here, roughly 41% comes from its base citations and 59% from the citation network (23 citing papers contributed measurable signal).
- Base score B(p): log1p(citation_count). It grows sub-linearly, so a paper with 1,000 citations is not 10× a paper with 100.
- Network N(p): Σ over citers q of log1p(C_q) ÷ max(outdegree_q, 1). Being cited by a highly-cited paper with few references counts most.
- Damping factor d = 0.85: DataRank = (1−d)·B(p) + d·N(p). The two cards above are each already multiplied by their share.
- Self-citations excluded: citers sharing any OpenAlex author ID with this paper are filtered out before the network sum.
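The scoring rules listed above can be sketched in a few lines of Python. This is a minimal illustration built only from the formulas on this page, not the engine's actual code: the `Citer` record, the function names, and the example citers in the usage note are all hypothetical.

```python
from dataclasses import dataclass
from math import log1p

D = 0.85  # damping factor d, as stated on this page


@dataclass(frozen=True)
class Citer:
    """Hypothetical record for one citing paper q."""
    cited_by_count: int                 # C_q
    outdegree: int                      # number of references made by q
    author_ids: frozenset = frozenset() # OpenAlex author IDs of q


def base_score(citation_count):
    """B(p) = log1p(citation_count)."""
    return log1p(citation_count)


def network_score(citers, paper_author_ids=frozenset()):
    """N(p) = sum over citers q of log1p(C_q) / max(outdegree_q, 1).

    Citers that share any author ID with the paper are skipped
    (self-citation filter).
    """
    total = 0.0
    for q in citers:
        if q.author_ids & paper_author_ids:
            continue  # self-citation: excluded from the network sum
        total += log1p(q.cited_by_count) / max(q.outdegree, 1)
    return total


def datarank(citation_count, citers, paper_author_ids=frozenset(), d=D):
    """Return (base contribution, network contribution, DataRank)."""
    base_part = (1 - d) * base_score(citation_count)
    network_part = d * network_score(citers, paper_author_ids)
    return base_part, network_part, base_part + network_part
```

With this paper's 34 citations, `datarank(34, [])[0]` evaluates 0.15 · log1p(34), which is about 0.533 and agrees with the Base Score Contribution card. Adding the two cards gives 0.533 + 0.775 ≈ 1.3, and 0.533 ÷ 1.308 ≈ 41%, matching the split quoted above. The network term cannot be recomputed here because the per-citer outdegrees are not listed on this page.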
Citers are pulled from OpenAlex sorted by cited_by_count:desc and capped per paper, so when the cap binds we keep the highest-signal citers and the score is reproducible across reruns.
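As a rough sketch of what such a query and cap could look like, the snippet below builds an OpenAlex works URL using the API's `cites:` filter and `cited_by_count:desc` sort, and applies the same ordering to records already in memory. The cap of 50, the placeholder work ID, the function names, and the tie-break on `id` are assumptions for illustration; the page does not state the real cap or how ties are resolved.

```python
from urllib.parse import urlencode

OPENALEX_WORKS = "https://api.openalex.org/works"


def citers_url(work_id, cap=50):
    """Build a URL listing works that cite `work_id`, most-cited first.

    `cap` is a hypothetical per-paper limit, sent as the page size.
    """
    params = {
        "filter": f"cites:{work_id}",
        "sort": "cited_by_count:desc",
        "per-page": cap,
    }
    # Keep ':' unescaped so the filter and sort values stay readable.
    return f"{OPENALEX_WORKS}?{urlencode(params, safe=':')}"


def top_citers(records, cap=50):
    """Keep the `cap` most-cited records from a list of dicts.

    Ties are broken by `id` (an assumption) so that reruns over the
    same data always return the same citers in the same order.
    """
    ordered = sorted(records, key=lambda r: (-r["cited_by_count"], r["id"]))
    return ordered[:cap]
```

For example, `citers_url("W0000000000")` (a placeholder ID, not this paper's) returns `https://api.openalex.org/works?filter=cites:W0000000000&sort=cited_by_count:desc&per-page=50`.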