🏆 Finalist — NIH Data Sharing Index (“S-Index”) Challenge
Demo corpus. Scores are computed on a selected set of biomedical papers and datasets and may be inaccurate for papers outside this corpus, because DataRank relies on network effects that improve with scale. We aim to expand this into a fully open resource pending additional funding.

The complete sequence of a human Y chromosome

(2022) · DOI: 10.1101/2022.12.01.518724 · Source: DataRank Database

The complete sequence of a human Y chromosome is a dataset (2022). On the S-Index it has a DataRank of 0.931, placing it in the top 42.6% of the data-sharing corpus. It has been cited 42 times, with 25 citing works in its 1-hop citation network. Its calibrated FAIR score is 48/100.

DataRank: 0.931
Percentile: Top 43%
Dataset · Open Access · 42 citations · base score 3.8
datarank_citation_only_1hop_v6 · scope data_only · Methodology

Abstract

The human Y chromosome has been notoriously difficult to sequence and assemble because of its complex repeat structure including long palindromes, tandem repeats, and segmental duplications [1–3]. As a result, more than half of the Y chromosome is missing from the GRCh38 reference sequence and it remains the last human chromosome to be finished [4, 5]. Here, the Telomere-to-Telomere (T2T) consortium presents the complete 62,460,029 base pair sequence of a human Y chromosome from the HG002 genome (T2T-Y) that corrects multiple errors in GRCh38-Y and adds over 30 million base pairs of sequence to the reference, revealing the complete ampliconic structures of TSPY, DAZ, and RBMY gene families; 41 additional protein-coding genes, mostly from the TSPY family; and an alternating pattern of human satellite 1 and 3 blocks in the heterochromatic Yq12 region. We have combined T2T-Y with a prior assembly of the CHM13 genome [4] and mapped available population variation, clinical variants, and functional genomics data to produce a complete and comprehensive reference sequence for all 24 human chromosomes.

Data sources & pipeline
Pipeline: Metadata → Data-paper check → Enrichment → Citation network → Scoring
Enrichment: Pending

FAIR Checklist

Context only (not used in score)
Findable (1/2)
  • Has DOI
Accessible (1/2)
  • Open Access
Interoperable (0/2)
Reusable (1/3)
  • Dataset classification

    FAIR checklist signals are shown for context only and do not affect DataRank scoring.

    FAIR score: 48/100
    F Findable: 45
    A Accessible: 68
    I Interoperable: 38
    R Reusable: 42
    Top 55% by FAIR · LLM-assessed · ✓ full text read

    Calibrated FAIR score — a parallel quality metric, independent of the DataRank citation score. See the full evaluation →

    DataRank Breakdown

    Base Score 61% · Citation Network 39%

    Base Score Contribution

    0.564

    From this paper's citation signal

    Citation Network Contribution

    0.367

    From 19 citing papers with measurable signal
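The percentage split in the breakdown bar can be reproduced from the two contribution cards. The sketch below assumes each share is simply that contribution divided by the sum of both; the function name is illustrative and not part of DataRank.

```python
def contribution_shares(base: float, network: float) -> tuple[float, float, float]:
    """Return (total, base share, network share) for two additive contributions."""
    total = base + network
    if total == 0:
        return 0.0, 0.0, 0.0
    return total, base / total, network / total


# Values shown on the two cards for this dataset.
total, base_share, network_share = contribution_shares(0.564, 0.367)
print(f"DataRank      = {total:.3f}")          # DataRank      = 0.931
print(f"Base share    = {base_share:.0%}")     # Base share    = 61%
print(f"Network share = {network_share:.0%}")  # Network share = 39%
```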

    Learn more about DataRank methodology →

    Top 5 citers driving the network score

    Ranked by citation count, the same ordering the engine uses when summing log1p(C_q) over citers.

    1. Basic local alignment search tool
      Journal of Molecular Biology · 1990 · 78,740 citations · DataRank 1.7
    2. Fast gapped-read alignment with Bowtie 2
      Nature Methods · 2012 · 59,681 citations · DataRank 1.6

    Why this DataRank?

    DataRank blends this paper's own citation count with the influence of the papers that cite it. Here, roughly 61% comes from its base citations and 39% from the citation network (19 citing papers contributed measurable signal).

    Base score B(p)
    log1p(citation_count) — grows sub-linearly, so a paper with 1,000 citations is not 10× a paper with 100.
    Network N(p)
    Σ over citers q of log1p(C_q) ÷ max(outdegree_q, 1). Being cited by a highly-cited paper with few references counts most.
    Damping factor d = 0.85
    DataRank = (1−d)·B(p) + d·N(p) — the two cards above are each already multiplied by their share.
    Self-citations excluded
    Citers sharing any OpenAlex author ID with this paper are filtered out before the network sum.
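Taken together, the four definitions above can be sketched as a single scoring function. This is a minimal illustration under stated assumptions, not the engine's code: the `Citer` record, its field names, and the function names are hypothetical, and `log1p` is taken to be the natural-log version from Python's `math` module.

```python
import math
from dataclasses import dataclass, field

DAMPING = 0.85  # d, as stated on this page


@dataclass
class Citer:
    """Hypothetical record for one citing work q."""
    cited_by_count: int   # C_q
    outdegree: int        # number of references made by q
    author_ids: set[str] = field(default_factory=set)


def base_score(citation_count: int) -> float:
    """B(p) = log1p(citation_count)."""
    return math.log1p(citation_count)


def network_score(citers: list[Citer], paper_author_ids: set[str]) -> float:
    """N(p) = sum over non-self citers of log1p(C_q) / max(outdegree_q, 1)."""
    total = 0.0
    for q in citers:
        if q.author_ids & paper_author_ids:
            continue  # self-citation: shares at least one author ID
        total += math.log1p(q.cited_by_count) / max(q.outdegree, 1)
    return total


def datarank(citation_count: int, citers: list[Citer],
             paper_author_ids: set[str], d: float = DAMPING) -> float:
    """DataRank = (1 - d) * B(p) + d * N(p)."""
    return (1 - d) * base_score(citation_count) + d * network_score(citers, paper_author_ids)


# Base contribution for this dataset (42 citations); agrees with the 0.564 card.
print(round((1 - DAMPING) * base_score(42), 3))  # 0.564
```

Under this reading, the base score of 3.8 shown near the top of the page is `log1p(42)` rounded to one decimal place.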

    Citers are pulled from OpenAlex sorted by cited_by_count:desc and capped per paper, so when the cap binds we keep the highest-signal citers and the score is reproducible across reruns.
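The retrieval step described above can be sketched as follows. The OpenAlex `works` endpoint accepts a `cites:` filter and `sort=cited_by_count:desc`; everything else here is an assumption made for illustration: the cap value (the page does not state it), the function names, and the tie-break on work ID, which is one possible way to keep the selection deterministic when citation counts are equal.

```python
from urllib.parse import urlencode

OPENALEX_WORKS = "https://api.openalex.org/works"
MAX_CITERS = 50  # hypothetical cap; the real value is not given on this page


def citers_query_url(openalex_id: str, per_page: int = MAX_CITERS) -> str:
    """Build a request URL for works citing `openalex_id`, most-cited first."""
    params = {
        "filter": f"cites:{openalex_id}",
        "sort": "cited_by_count:desc",
        "per-page": per_page,
    }
    # Keep ':' unescaped so the filter and sort values stay readable.
    return f"{OPENALEX_WORKS}?{urlencode(params, safe=':')}"


def cap_citers(works: list[dict], cap: int = MAX_CITERS) -> list[dict]:
    """Keep the `cap` most-cited works, breaking ties by ID for reproducibility."""
    ranked = sorted(works, key=lambda w: (-w["cited_by_count"], w["id"]))
    return ranked[:cap]


# Example with made-up records shaped like OpenAlex results.
sample = [
    {"id": "W3", "cited_by_count": 5},
    {"id": "W1", "cited_by_count": 9},
    {"id": "W2", "cited_by_count": 5},
]
print([w["id"] for w in cap_citers(sample, cap=2)])  # ['W1', 'W2']
```

Sorting locally as well as in the request means the same citers are kept even if the API returns equal-count works in a different order between runs.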



    Authors (84)