Demo corpus. Scores are computed on a select set of biomedical papers and datasets and may be inaccurate for papers outside this corpus, because DataRank relies on network effects that improve with scale. We aim to expand this into a fully open resource pending additional funding.

Konnector v2.0: pseudo-long reads from paired-end sequencing data

BMC Medical Genomics (2015) · DOI: 10.1186/1755-8794-8-s3-s1 · Source: DataRank Database

Konnector v2.0: pseudo-long reads from paired-end sequencing data is a research paper published in BMC Medical Genomics (2015). On the Sindex it has a DataRank of 0.730. It has been cited 23 times, with 11 citing works in its 1-hop citation network.

DataRank: 0.730 · unranked

Open Access · 23 citations · base score 3.2

datarank_citation_only_1hop_v6 · scope data_only · Methodology

Abstract

Background: Reading the nucleotides from two ends of a DNA fragment is called paired-end tag (PET) sequencing. When the fragment length is longer than the combined read length, there remains a gap of unsequenced nucleotides between read pairs. If the target in such experiments is sequenced at a level to provide redundant coverage, it may be possible to bridge these gaps using bioinformatics methods. Konnector is a local de novo assembly tool that addresses this problem. Here we report on version 2.0 of our tool.

Results: Konnector uses a probabilistic and memory-efficient data structure called Bloom filter to represent a k-mer spectrum - all possible sequences of length k in an input file, such as the collection of reads in a PET sequencing experiment. It performs look-ups to this data structure to construct an implicit de Bruijn graph, which describes (k-1) base pair overlaps between adjacent k-mers. It traverses this graph to bridge the gap between a given pair of flanking sequences.

Conclusions: Here we report the performance of Konnector v2.0 on simulated and experimental datasets, and compare it against other tools with similar functionality. We note that, representing k-mers with 1.5 bytes of memory on average, Konnector can scale to very large genomes. With our parallel implementation, it can also process over a billion bases on commodity hardware.
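The idea described in the abstract (k-mers stored in a Bloom filter, an implicit de Bruijn graph discovered by membership look-ups, and a traversal that bridges two flanking sequences) can be illustrated with a toy sketch. This is not Konnector's implementation: the class and function names (`BloomFilter`, `build_filter`, `successors`, `connect`), the hash choice, the filter size, and the breadth-first search are all assumptions made for illustration, and the sketch ignores reverse complements, sequencing errors, and false-positive handling.

```python
import hashlib
from collections import deque


class BloomFilter:
    """Toy Bloom filter over strings (illustrative, not Konnector's)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting BLAKE2b with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def build_filter(reads, k, num_bits=1 << 16, num_hashes=3):
    """Insert every k-mer of every read into a Bloom filter."""
    bf = BloomFilter(num_bits, num_hashes)
    for read in reads:
        for i in range(len(read) - k + 1):
            bf.add(read[i:i + k])
    return bf


def successors(kmer, bf):
    """Neighbors in the implicit de Bruijn graph: k-mers that overlap
    this one by k-1 bases and are reported present by the filter."""
    suffix = kmer[1:]
    return [suffix + base for base in "ACGT" if suffix + base in bf]


def connect(left, right, bf, k, max_len):
    """Breadth-first search from the last k-mer of `left` to the first
    k-mer of `right`; returns the bridged sequence or None."""
    start, goal = left[-k:], right[:k]
    queue = deque([(start, left)])
    seen = {start}
    while queue:
        kmer, path = queue.popleft()
        if kmer == goal:
            return path + right[k:]
        if len(path) >= max_len:
            continue
        for nxt in successors(kmer, bf):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + nxt[-1]))
    return None


# Toy data: a made-up 25-base fragment covered by three overlapping reads.
fragment = "ACGTTGCATGGACCTAGGATCCAGT"
reads = [fragment[0:12], fragment[6:18], fragment[12:25]]
bf = build_filter(reads, k=5)
bridged = connect(fragment[:8], fragment[-8:], bf, k=5, max_len=50)
```

Because the graph is never stored explicitly, memory use is governed by the filter size rather than by the number of edges, which is the property the abstract highlights.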

Data sources & pipeline

Pipeline: Metadata · Data-paper check · Enrichment · Citation network · Scoring

Enrichment: Pending

FAIR Checklist

Context only (not used in score)

Findable (1/2)

Has DOI

Accessible (1/2)

Open Access

Interoperable (0/2)

Reusable (0/3)

FAIR checklist signals are shown for context only and do not affect DataRank scoring.

DataRank Breakdown

Base Score 65% · Citation Network 35%

Base Score Contribution

0.477

From this paper's citation signal

Citation Network Contribution

0.253

From 10 citing papers with measurable signal
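As a rough check, the two contributions can be reproduced from the figures on this page, assuming the base score is the natural-log log1p of the 23 citations and the damping factor is the d = 0.85 stated in the methodology notes on this page. The network term N(p) is backed out from the displayed 0.253 rather than recomputed from the citers.

```latex
\begin{align*}
B(p) &= \ln(1 + 23) = \ln 24 \approx 3.178 \\
(1 - d)\,B(p) &= 0.15 \times 3.178 \approx 0.477 \\
d\,N(p) = 0.253 \;&\Rightarrow\; N(p) \approx 0.253 / 0.85 \approx 0.298 \\
\text{DataRank} &= 0.477 + 0.253 = 0.730 \\
\text{shares} &: \; 0.477 / 0.730 \approx 65\%, \quad 0.253 / 0.730 \approx 35\%
\end{align*}
```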

Top 5 citers driving the network score

Ranked by citation count — the same ordering the engine uses when summing log1p(C_q) over citers.

Fast and accurate short read alignment with Burrows–Wheeler transform
Bioinformatics · 2009 · 62,117 citations · DataRank 1.7
BEDTools: a flexible suite of utilities for comparing genomic features
Bioinformatics · 2010 · 30,023 citations · DataRank 1.5
Ultrafast and memory-efficient alignment of short DNA sequences to the human genome
Genome Biology · 2009 · 22,843 citations · DataRank 1.5
FLASH: fast length adjustment of short reads to improve genome assemblies
Bioinformatics · 2011 · 15,526 citations · DataRank 1.4
QUAST: quality assessment tool for genome assemblies
Bioinformatics · 2013 · 11,134 citations · DataRank 1.4

Why this DataRank?

DataRank blends this paper's own citation count with the influence of the papers that cite it. Here, roughly 65% comes from its base citations and 35% from the citation network (10 citing papers contributed measurable signal).

Base score B(p): log1p(citation_count) — grows sub-linearly, so a paper with 1,000 citations is not 10× a paper with 100.
Network N(p): Σ over citers of log1p(C_q) ÷ max(outdegree_q, 1). Being cited by a highly-cited paper with few references counts most.
Damping factor d = 0.85: DataRank = (1−d)·B(p) + d·N(p) — the two cards above are each already multiplied by their share.
Self-citations excluded: Citers sharing any OpenAlex author ID with this paper are filtered out before the network sum.
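The formulas in the list above can be sketched in a few lines of Python. This is a minimal reading of the stated definitions, not the engine's code: the function names, the `(citation_count, outdegree)` pair representation, and the two example citers are hypothetical, and the sketch assumes each citer's log1p term is divided by that citer's own outdegree.

```python
import math

DAMPING = 0.85  # d, as stated in the text


def base_score(citation_count):
    """B(p) = log1p(citation_count)."""
    return math.log1p(citation_count)


def network_score(citers):
    """N(p): sum over citers of log1p(C_q) / max(outdegree_q, 1).

    `citers` is an iterable of (citation_count, outdegree) pairs.
    """
    return sum(math.log1p(c) / max(outdegree, 1) for c, outdegree in citers)


def datarank(citation_count, citers, d=DAMPING):
    """Return (base contribution, network contribution, total)."""
    base_part = (1 - d) * base_score(citation_count)
    network_part = d * network_score(citers)
    return base_part, network_part, base_part + network_part


# 23 citations as on this page; the two citers below are made-up values.
hypothetical_citers = [(62117, 40), (30023, 40)]
base_part, network_part, total = datarank(23, hypothetical_citers)
```

With 23 citations the base contribution rounds to the 0.477 shown in the breakdown. The network contribution here will not match the page's 0.253, since the real citers' outdegrees are not listed.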

Citers are pulled from OpenAlex sorted by cited_by_count:desc and capped per paper, so when the cap binds we keep the highest-signal references and the score is reproducible across reruns.
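The selection step described here (drop self-citations, order by citation count descending, then cap) might look like the sketch below. It operates on already-fetched records and makes no OpenAlex calls. The record layout (`id`, `cited_by_count`, `author_ids`), the function name `select_citers`, the sample records, and the tie-break on `id` are all assumptions added for illustration; the tie-break is one way to keep the result stable across reruns, not a documented rule.

```python
def select_citers(paper_author_ids, candidates, cap):
    """Filter out self-citations, sort by citation count, apply the cap.

    `candidates` is a list of dicts with hypothetical keys
    'id', 'cited_by_count', and 'author_ids'.
    """
    paper_authors = set(paper_author_ids)
    # A citer sharing any author ID with the paper counts as a self-citation.
    kept = [c for c in candidates if paper_authors.isdisjoint(c["author_ids"])]
    # Highest cited_by_count first; ties broken by id (assumed) for determinism.
    kept.sort(key=lambda c: (-c["cited_by_count"], c["id"]))
    return kept[:cap]


# Made-up records for illustration only.
candidates = [
    {"id": "W3", "cited_by_count": 50, "author_ids": ["A9"]},
    {"id": "W1", "cited_by_count": 500, "author_ids": ["A1", "A7"]},
    {"id": "W2", "cited_by_count": 50, "author_ids": ["A8"]},
    {"id": "W4", "cited_by_count": 900, "author_ids": []},
    {"id": "W5", "cited_by_count": 5, "author_ids": ["A6"]},
]
selected = select_citers(["A1", "A2"], candidates, cap=3)
selected_ids = [c["id"] for c in selected]
```

In this toy run W1 is dropped because it shares author A1 with the paper, and W5 falls outside the cap of 3.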

[Interactive citation network graph. Node colors: Center, Data Paper, Data + Open Access, Non-data, Selected & links. Node size = percentile rank.]