The complete sequence of a human Y chromosome
The complete sequence of a human Y chromosome is a dataset (2022). On the Sindex it has a DataRank of 0.931, placing it in the top 42.6% of the data-sharing corpus. It has been cited 42 times, with 25 citing works in its 1-hop citation network. Its calibrated FAIR score is 48/100.
Abstract
The human Y chromosome has been notoriously difficult to sequence and assemble because of its complex repeat structure including long palindromes, tandem repeats, and segmental duplications 1–3. As a result, more than half of the Y chromosome is missing from the GRCh38 reference sequence and it remains the last human chromosome to be finished 4, 5. Here, the Telomere-to-Telomere (T2T) consortium presents the complete 62,460,029 base pair sequence of a human Y chromosome from the HG002 genome (T2T-Y) that corrects multiple errors in GRCh38-Y and adds over 30 million base pairs of sequence to the reference, revealing the complete ampliconic structures of TSPY, DAZ, and RBMY gene families; 41 additional protein-coding genes, mostly from the TSPY family; and an alternating pattern of human satellite 1 and 3 blocks in the heterochromatic Yq12 region. We have combined T2T-Y with a prior assembly of the CHM13 genome 4 and mapped available population variation, clinical variants, and functional genomics data to produce a complete and comprehensive reference sequence for all 24 human chromosomes.
FAIR Checklist
Context only (not used in score)
- Has DOI
- Open Access
- Dataset classification
FAIR checklist signals are shown for context only and do not affect DataRank scoring.
Calibrated FAIR score: a parallel quality metric, independent of the DataRank citation score.
DataRank Breakdown
Base Score Contribution
0.564
From this paper's citation signal
Citation Network Contribution
0.367
From 19 citing papers with measurable signal
Top 5 citers driving the network score
Ranked by citation count — the same ordering the engine uses when summing log1p(Cq) over citers.
- Basic local alignment search tool. Journal of Molecular Biology, 1990. 78,740 citations. DataRank 1.7
- Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics, 2009. 62,117 citations. DataRank 1.7
- Fast gapped-read alignment with Bowtie 2. Nature Methods, 2012. 59,681 citations. DataRank 1.6
- RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics, 2014. 33,987 citations. DataRank 1.6
- BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics, 2010. 30,023 citations. DataRank 1.5
Why this DataRank?
DataRank blends this paper's own citation count with the influence of the papers that cite it. Here, roughly 61% comes from its base citations and 39% from the citation network (19 citing papers contributed measurable signal).
- Base score B(p): log1p(citation_count). It grows sub-linearly, so a paper with 1,000 citations does not score 10× a paper with 100.
- Network N(p): Σ over citers q of log1p(Cq) ÷ max(outdegree_q, 1), where Cq is the citation count of citer q and outdegree_q is its number of references. Being cited by a highly cited paper with few references counts most.
- Damping factor d = 0.85: DataRank = (1−d)·B(p) + d·N(p). The two cards above are each already multiplied by their share.
- Self-citations excluded: citers sharing any OpenAlex author ID with this paper are filtered out before the network sum.
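The formula above can be sketched in a few lines of Python. This is only an illustration of the stated definitions, not the engine's actual code: the function names and the `(cited_by_count, outdegree)` tuple layout are assumptions made for the example.

```python
import math

DAMPING = 0.85  # d, as stated in the breakdown above


def base_score(citation_count: int) -> float:
    """B(p) = log1p(citation_count)."""
    return math.log1p(citation_count)


def network_score(citers) -> float:
    """N(p) = sum over citers of log1p(Cq) / max(outdegree_q, 1).

    `citers` is an iterable of (cited_by_count, outdegree) pairs;
    this tuple layout is a hypothetical choice for the sketch.
    """
    return sum(math.log1p(cq) / max(outdeg, 1) for cq, outdeg in citers)


def datarank(citation_count: int, citers, d: float = DAMPING):
    """Return (base contribution, network contribution, total DataRank)."""
    base_part = (1 - d) * base_score(citation_count)
    network_part = d * network_score(citers)
    return base_part, network_part, base_part + network_part
```

With the 42 citations reported for this dataset, the base contribution evaluates to 0.15 · ln(43) ≈ 0.564, which matches the Base Score Contribution card. The shares quoted above follow the same way: 0.564 / 0.931 ≈ 61% and 0.367 / 0.931 ≈ 39%.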
Citers are pulled from OpenAlex sorted by cited_by_count:desc and capped per paper, so when the cap binds we keep the highest-signal references and the score is reproducible across reruns.
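A minimal sketch of that selection step, assuming citers have already been fetched into plain dictionaries. The keys `id` and `author_ids`, the `cap` parameter, and the tie-break on `id` are hypothetical choices for the example; the page does not state the cap value or how ties are ordered.

```python
def select_citers(citers, paper_author_ids, cap):
    """Drop self-citations, sort by citation count descending, apply the cap.

    Each citer is a dict with hypothetical keys:
      "id" (str), "cited_by_count" (int), "author_ids" (list of str).
    """
    paper_authors = set(paper_author_ids)

    # Self-citation filter: exclude any citer sharing an author ID with the paper.
    independent = [
        c for c in citers if not paper_authors & set(c["author_ids"])
    ]

    # Highest cited first; ties broken by id so reruns give the same order
    # (the tie-break rule is an assumption, not taken from the page).
    independent.sort(key=lambda c: (-c["cited_by_count"], c["id"]))

    return independent[:cap]
```

Sorting before truncating is what makes a binding cap keep the highest-signal citers rather than an arbitrary subset.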