Demo corpus. Scores are computed on a select set of biomedical papers and datasets and may be inaccurate for papers outside this corpus, because DataRank relies on network effects that improve with scale. We aim to expand this into a fully open resource pending additional funding.

Identifying centromeric satellites with dna-brnn

Bioinformatics (2019) · DOI: 10.1093/bioinformatics/btz264 · Source: DataRank Database

Identifying centromeric satellites with dna-brnn is a research paper published in Bioinformatics (2019). On the Sindex it has a DataRank of 1.3. It has been cited 34 times, with 25 citing works in its 1-hop citation network. Its calibrated FAIR score is 74/100.

DataRank: 1.3 · unranked

Open Access · 34 citations · base score 3.6

datarank_citation_only_1hop_v6 · scope data_only · Methodology

Abstract

Summary: Human alpha satellite and satellite 2/3 contribute to several percent of the human genome. However, identifying these sequences with traditional algorithms is computationally intensive. Here we develop dna-brnn, a recurrent neural network to learn the sequences of the two classes of centromeric repeats. It achieves high similarity to RepeatMasker and is times faster. Dna-brnn explores a novel application of deep learning and may accelerate the study of the evolution of the two repeat classes.

Availability and implementation: https://github.com/lh3/dna-nn

Data sources & pipeline

Pipeline: Metadata → Data-paper check → Enrichment → Citation network → Scoring

Enrichment: Pending

FAIR Checklist

Context only (not used in score)

Findable (1/2)

Has DOI

Accessible (1/2)

Open Access

Interoperable (0/2)

Reusable (0/3)

FAIR checklist signals are shown for context only and do not affect DataRank scoring.

FAIR score: 74

F Findable

100

A Accessible

I Interoperable

100

R Reusable

Top 1% by FAIR · deterministic · ⚠ abstract only

Estimated from the abstract only. The agent couldn't read this paper's full text, so body-dependent criteria (data-availability statement, formats, license) are inferred.

Calibrated FAIR score — a parallel quality metric, independent of the DataRank citation score. See the full evaluation →

DataRank Breakdown

Base Score 41% · Citation Network 59%

Base Score Contribution

0.533

From this paper's citation signal

Citation Network Contribution

0.775

From 23 citing papers with measurable signal

Top 5 citers driving the network score

Ranked by citation count — the same ordering the engine uses when summing log1p(C_q) over citers.

Tandem repeats finder: a program to analyze DNA sequences
Nucleic Acids Research · 1999 · 9,791 citations · DataRank 1.4
The Simons Genome Diversity Project: 300 genomes from 142 diverse populations
Nature · 2016 · 1,762 citations · DataRank 8.5 · Top 24%
Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly
Genome Research · 2017 · 1,276 citations · DataRank 10.2 · Top 21%
The complete sequence of a human Y chromosome
Nature · 2023 · 452 citations · DataRank 5.2 · Top 28%
A Draft Human Pangenome Reference
2022 · 74 citations · DataRank 2.9 · Top 33%

Why this DataRank?

DataRank blends this paper's own citation count with the influence of the papers that cite it. Here, roughly 41% comes from its base citations and 59% from the citation network (23 citing papers contributed measurable signal).

Base score B(p): log1p(citation_count) — grows sub-linearly, so a paper with 1,000 citations is not 10× a paper with 100.
Network N(p): Σ over citers of log1p(C_q) ÷ max(outdegree_q, 1). Being cited by a highly-cited paper with few references counts most.
Damping factor d = 0.85: DataRank = (1−d)·B(p) + d·N(p) — the two cards above are each already multiplied by their share.
Self-citations excluded: Citers sharing any OpenAlex author ID with this paper are filtered out before the network sum.
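The four rules above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the DataRank engine itself: the `Citer` record, its field names, and the function names are hypothetical, and only the formulas (log1p base score, outdegree-normalised network sum, d = 0.85, author-overlap filter) come from the text.

```python
import math
from dataclasses import dataclass, field

DAMPING = 0.85  # d, as stated in the methodology text


@dataclass
class Citer:
    """Hypothetical record for one citing work (field names are assumptions)."""
    citation_count: int   # C_q: how often the citing work is itself cited
    outdegree: int        # number of references the citing work makes
    author_ids: frozenset = field(default_factory=frozenset)


def base_score(citation_count):
    """B(p) = log1p(citation_count)."""
    return math.log1p(citation_count)


def network_score(citers, paper_author_ids=frozenset()):
    """N(p) = sum over citers of log1p(C_q) / max(outdegree_q, 1)."""
    total = 0.0
    for q in citers:
        # Skip self-citations: any shared author ID excludes the citer.
        if q.author_ids & paper_author_ids:
            continue
        total += math.log1p(q.citation_count) / max(q.outdegree, 1)
    return total


def datarank(citation_count, citers, paper_author_ids=frozenset(), d=DAMPING):
    """Return (base contribution, network contribution, DataRank)."""
    base_part = (1 - d) * base_score(citation_count)
    network_part = d * network_score(citers, paper_author_ids)
    return base_part, network_part, base_part + network_part
```

With this paper's 34 citations, `datarank(34, [])` gives a base contribution of 0.15 × log1p(34), which rounds to the 0.533 shown in the breakdown. Reproducing the 0.775 network contribution would need the per-citer citation counts and outdegrees, which this page does not list.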

Citers are pulled from OpenAlex sorted by cited_by_count:desc and capped per paper, so when the cap binds we keep the highest-signal references and the score is reproducible across reruns.
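As a rough sketch of that retrieval step, the snippet below builds an OpenAlex works query for the citers of one paper, sorted by `cited_by_count:desc`, and applies a cap locally. The cap value, the work ID, the function names, and the tie-break on `id` are assumptions made for illustration; the page does not state the actual cap or how ties are resolved.

```python
from urllib.parse import urlencode

OPENALEX_WORKS = "https://api.openalex.org/works"


def citers_url(openalex_work_id, cap=200):
    """Build a query for works citing `openalex_work_id`, most-cited first.

    `cap` is a hypothetical per-paper limit, passed as the page size.
    """
    params = {
        "filter": f"cites:{openalex_work_id}",
        "sort": "cited_by_count:desc",
        "per-page": cap,
    }
    # Keep ':' unescaped so the filter and sort values stay readable.
    return f"{OPENALEX_WORKS}?{urlencode(params, safe=':')}"


def cap_citers(citers, cap):
    """Keep the `cap` most-cited citers from a list of dicts.

    Ties are broken by `id` (an assumption) so reruns give the same list.
    """
    ordered = sorted(citers, key=lambda c: (-c["cited_by_count"], c["id"]))
    return ordered[:cap]
```

Sorting before truncating is what makes the capped set deterministic: the same top citers are kept regardless of the order in which records arrive.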
