Demo corpus. Scores are computed on a select set of biomedical papers and datasets and may be inaccurate for papers outside this corpus, because DataRank relies on network effects that improve with scale. We aim to expand this into a fully open resource pending additional funding.

The complete sequence of a human Y chromosome

(2022) · DOI: 10.1101/2022.12.01.518724 · Source: DataRank Database

The complete sequence of a human Y chromosome is a dataset (2022). On the Sindex it has a DataRank of 0.931, placing it in the top 42.6% of the data-sharing corpus. It has been cited 42 times, with 25 citing works in its 1-hop citation network. Its calibrated FAIR score is 48/100.

DataRank: 0.931 · Top 43% (percentile)

Dataset · Open Access · 42 citations · base score 3.8


datarank_citation_only_1hop_v6 · scope data_only · Methodology

Abstract

The human Y chromosome has been notoriously difficult to sequence and assemble because of its complex repeat structure including long palindromes, tandem repeats, and segmental duplications [1–3]. As a result, more than half of the Y chromosome is missing from the GRCh38 reference sequence and it remains the last human chromosome to be finished [4, 5]. Here, the Telomere-to-Telomere (T2T) consortium presents the complete 62,460,029 base pair sequence of a human Y chromosome from the HG002 genome (T2T-Y) that corrects multiple errors in GRCh38-Y and adds over 30 million base pairs of sequence to the reference, revealing the complete ampliconic structures of TSPY, DAZ, and RBMY gene families; 41 additional protein-coding genes, mostly from the TSPY family; and an alternating pattern of human satellite 1 and 3 blocks in the heterochromatic Yq12 region. We have combined T2T-Y with a prior assembly of the CHM13 genome [4] and mapped available population variation, clinical variants, and functional genomics data to produce a complete and comprehensive reference sequence for all 24 human chromosomes.

Data sources & pipeline

Pipeline: Metadata → Data-paper check → Enrichment → Citation network → Scoring

Enrichment: Pending

FAIR Checklist

Context only (not used in score)

Findable (1/2)

Has DOI

Accessible (1/2)

Open Access

Interoperable (0/2)

Reusable (1/3)

Dataset classification

FAIR checklist signals are shown for context only and do not affect DataRank scoring.

FAIR score: 48/100

F: Findable · A: Accessible · I: Interoperable · R: Reusable

Top 55% by FAIR · LLM-assessed · ✓ full text read

Calibrated FAIR score — a parallel quality metric, independent of the DataRank citation score. See the full evaluation →

DataRank Breakdown

Base Score 61% · Citation Network 39%

Base Score Contribution

0.564

From this paper's citation signal

Citation Network Contribution

0.367

From 19 citing papers with measurable signal


Top 5 citers driving the network score

Ranked by citation count — the same ordering the engine uses when summing log1p(C_q) over citers.

Basic local alignment search tool
Journal of Molecular Biology · 1990 · 78,740 citations · DataRank 1.7

Fast and accurate short read alignment with Burrows–Wheeler transform
Bioinformatics · 2009 · 62,117 citations · DataRank 1.7

Fast gapped-read alignment with Bowtie 2
Nature Methods · 2012 · 59,681 citations · DataRank 1.6

RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies
Bioinformatics · 2014 · 33,987 citations · DataRank 1.6

BEDTools: a flexible suite of utilities for comparing genomic features
Bioinformatics · 2010 · 30,023 citations · DataRank 1.5

Why this DataRank?

DataRank blends this paper's own citation count with the influence of the papers that cite it. Here, roughly 61% comes from its base citations and 39% from the citation network (19 citing papers contributed measurable signal).
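
The split can be checked against the numbers reported on this page. The sketch below is a worked check, not the engine's code. It assumes the damping factor d = 0.85 and the base score log1p(citation_count) given in the methodology notes, and it takes the network contribution of 0.367 as reported instead of recomputing it. All variable names are illustrative.

```python
import math

D = 0.85                      # damping factor, as stated in the methodology notes
citation_count = 42           # citations reported for this dataset
network_contribution = 0.367  # taken from the page, not recomputed here

# Base contribution: (1 - d) * log1p(citation_count)
base_contribution = (1 - D) * math.log1p(citation_count)

# The DataRank is the sum of the two contributions.
total = base_contribution + network_contribution
base_share = base_contribution / total
network_share = network_contribution / total

print(f"base contribution: {base_contribution:.3f}")
print(f"DataRank:          {total:.3f}")
print(f"shares:            {base_share:.0%} base, {network_share:.0%} network")
```

Under those assumptions the base contribution comes to 0.564 and the total to 0.931, a 61% / 39% split, which agrees with the figures shown above.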

Base score B(p): log1p(citation_count) — grows sub-linearly, so a paper with 1,000 citations is not 10× a paper with 100.
Network N(p): Σ over citers of log1p(C_q) ÷ max(outdegree_q, 1). Being cited by a highly-cited paper with few references counts most.
Damping factor d = 0.85: DataRank = (1−d)·B(p) + d·N(p) — the two cards above are each already multiplied by their share.
Self-citations excluded: Citers sharing any OpenAlex author ID with this paper are filtered out before the network sum.
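
The four rules above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the DataRank engine. The `Citer` record, its field names, and the toy author IDs are hypothetical. Only the formulas (log1p base score, log1p(C_q) ÷ max(outdegree_q, 1) per citer, d = 0.85, and the shared-author filter) come from the text.

```python
import math
from dataclasses import dataclass

D = 0.85  # damping factor from the text


@dataclass(frozen=True)
class Citer:
    """Hypothetical record for one citing work q."""
    cited_by_count: int    # C_q: how often the citer is itself cited
    outdegree: int         # number of references the citer makes
    author_ids: frozenset  # author IDs attached to the citer


def base_score(citation_count: int) -> float:
    """B(p) = log1p(citation_count)."""
    return math.log1p(citation_count)


def network_score(citers, paper_author_ids) -> float:
    """N(p): sum of log1p(C_q) / max(outdegree_q, 1), skipping self-citations."""
    total = 0.0
    for q in citers:
        if q.author_ids & paper_author_ids:
            continue  # shares at least one author ID with the paper
        total += math.log1p(q.cited_by_count) / max(q.outdegree, 1)
    return total


def datarank(citation_count, citers, paper_author_ids) -> float:
    """DataRank = (1 - d) * B(p) + d * N(p)."""
    return (1 - D) * base_score(citation_count) + D * network_score(citers, paper_author_ids)


# Toy data: the third citer shares author "A2" with the paper and is dropped.
paper_authors = frozenset({"A1", "A2"})
citers = [
    Citer(cited_by_count=1000, outdegree=20, author_ids=frozenset({"A9"})),
    Citer(cited_by_count=50, outdegree=0, author_ids=frozenset({"A8"})),
    Citer(cited_by_count=500, outdegree=10, author_ids=frozenset({"A2"})),
]
print(round(datarank(42, citers, paper_authors), 3))
```

The `max(outdegree, 1)` guard keeps a citer with no recorded references from causing a division by zero. It also means such a citer passes on its full log1p(C_q).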

Citers are pulled from OpenAlex sorted by cited_by_count:desc and capped per paper, so when the cap binds we keep the highest-signal references and the score is reproducible across reruns.
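
A sorted-and-capped fetch of this kind might look like the sketch below. It only builds the request URL and trims a local list, so it makes no network call. The work ID, the cap value, and the tie-break on ID are assumptions for illustration, since the page does not state them. OpenAlex also limits page size, so a large cap would need pagination, which is omitted here.

```python
from urllib.parse import urlencode


def citers_url(work_id: str, cap: int) -> str:
    """Build an OpenAlex works query for the citers of work_id, most-cited first."""
    params = {
        "filter": f"cites:{work_id}",
        "sort": "cited_by_count:desc",
        "per-page": cap,
    }
    # Keep ':' unescaped so the filter and sort values stay readable.
    return "https://api.openalex.org/works?" + urlencode(params, safe=":")


def top_citers(citers, cap):
    """Keep the cap most-cited citers; ties break on ID so reruns give the same order."""
    ranked = sorted(citers, key=lambda c: (-c["cited_by_count"], c["id"]))
    return ranked[:cap]


# "W0000000000" is a placeholder, not this dataset's real OpenAlex ID.
print(citers_url("W0000000000", 50))

sample = [
    {"id": "W3", "cited_by_count": 10},
    {"id": "W1", "cited_by_count": 10},
    {"id": "W2", "cited_by_count": 99},
]
print([c["id"] for c in top_citers(sample, 2)])
```

Sorting before capping is what makes the result stable: the same highest-count citers survive on every run, whatever order the records arrive in.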


Citation network graph. Node colors: Center, Data Paper, Data + Open Access, Non-data, Selected & links. Node size = percentile rank.

Authors (84)

Sergey Nurk, Savannah J. Hoyt, Dylan J. Taylor, Nicolas Altemose, Paul W. Hook