🏆 Finalist — NIH Data Sharing Index (“S-Index”) Challenge
Demo corpus. Scores are computed on a selected set of biomedical papers and datasets and may be inaccurate for papers outside this corpus, because DataRank relies on network effects that improve with scale. We aim to expand this into a fully open resource pending additional funding.

A global reference for human genetic variation

Nature (2015) · DOI: 10.1038/nature15393 · Source: DataRank Database

A global reference for human genetic variation is a dataset published in Nature (2015). On the S-Index it has a DataRank of 11.1, placing it in the top 18.9% of the data-sharing corpus. It has been cited 19,823 times, with 109 citing works in its 1-hop citation network. Its calibrated FAIR score is 72/100.

DataRank: 11.1 (top 19%)
Dataset · Open Access · 19,823 citations · base score 9.9
datarank_citation_only_1hop_v6 · scope data_only

Abstract

The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.

Data sources & pipeline
Pipeline: Metadata → Data-paper check → Enrichment → Citation network → Scoring
Enrichment: Pending

FAIR Checklist

Context only (not used in score)
Findable (2/2)
  • Has DOI
  • Indexed in repositories
Accessible (1/2)
  • Open Access
Interoperable (2/2)
  • DataCite relations
  • Linked datasets
Reusable (1/3)
  • Dataset classification

FAIR checklist signals are shown for context only and do not affect DataRank scoring.

FAIR score: 72/100
Findable: 100
Accessible: 70
Interoperable: 50
Reusable: 67
Top 2% by FAIR · deterministic · ✓ full text read

Calibrated FAIR score: a parallel quality metric, independent of the DataRank citation score.

DataRank Breakdown

Base Score 13% · Citation Network 87%

Base Score Contribution

1.5

From this paper's citation signal

Citation Network Contribution

9.7

From 109 citing papers with measurable signal


Top 5 citers driving the network score

Ranked by citation count, the same ordering the engine uses when summing log1p(C_q) over citers.

  1. An integrated encyclopedia of DNA elements in the human genome
    Nature · 2012 · 19,311 citations · DataRank 23.8 · Top 3%
  2. Twelve years of SAMtools and BCFtools
    GigaScience · 2021 · 15,177 citations · DataRank 1.4
  3. Analysis of protein-coding genetic variation in 60,706 humans
    Nature · 2016 · 10,291 citations · DataRank 15.2 · Top 12%
  4. An integrated map of genetic variation from 1,092 human genomes
    Nature · 2012 · 8,207 citations · DataRank 25.2 · Top 2%

Why this DataRank?

DataRank blends this paper's own citation count with the influence of the papers that cite it. Here, roughly 13% comes from its base citations and 87% from the citation network (109 citing papers contributed measurable signal).
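The quoted shares can be checked against the two contribution values shown on this page. The snippet below is a small arithmetic sketch; the variable names are illustrative, and the inputs are the rounded values displayed on the cards, not the engine's internal numbers.

```python
# Rounded values as displayed on the two contribution cards.
base_contribution = 1.5      # Base Score Contribution
network_contribution = 9.7   # Citation Network Contribution

total = base_contribution + network_contribution
base_share = base_contribution / total
network_share = network_contribution / total

print(f"total ~ {total:.1f}")
print(f"base share ~ {base_share:.0%}, network share ~ {network_share:.0%}")
```

The shares come out at roughly 13% and 87%. The sum of the two rounded cards is 11.2 rather than the reported 11.1, which is presumably a rounding effect in the displayed values.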

Base score B(p)
log1p(citation_count) — grows sub-linearly, so a paper with 1,000 citations is not 10× a paper with 100.
Network N(p)
Σ over citers q of log1p(C_q) ÷ max(outdegree_q, 1). Being cited by a highly-cited paper with few references counts most.
Damping factor d = 0.85
DataRank = (1−d)·B(p) + d·N(p) — the two cards above are each already multiplied by their share.
Self-citations excluded
Citers sharing any OpenAlex author ID with this paper are filtered out before the network sum.
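The definitions above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the engine's actual code: the `Citer` record, its field names, and the function names are hypothetical, and a plain set intersection stands in for the OpenAlex author-ID check.

```python
import math
from dataclasses import dataclass, field

D = 0.85  # damping factor d, as stated above


@dataclass(frozen=True)
class Citer:
    """Hypothetical record for one citing paper q."""
    cited_by_count: int   # C_q: citations received by q
    outdegree: int        # number of references q makes
    author_ids: frozenset = field(default_factory=frozenset)


def base_score(citation_count: int) -> float:
    """B(p) = log1p(citation_count)."""
    return math.log1p(citation_count)


def network_score(citers, paper_author_ids=frozenset()) -> float:
    """N(p) = sum over citers q of log1p(C_q) / max(outdegree_q, 1).

    Citers sharing any author ID with the paper are skipped first.
    """
    total = 0.0
    for q in citers:
        if q.author_ids & paper_author_ids:
            continue  # self-citation: excluded before the sum
        total += math.log1p(q.cited_by_count) / max(q.outdegree, 1)
    return total


def datarank(citation_count, citers, paper_author_ids=frozenset()) -> float:
    """DataRank = (1 - d) * B(p) + d * N(p)."""
    b = base_score(citation_count)
    n = network_score(citers, paper_author_ids)
    return (1 - D) * b + D * n
```

With the 19,823 citations reported for this dataset, `base_score(19823)` is about 9.89 and `(1 - D) * base_score(19823)` is about 1.48, consistent with the displayed base score of 9.9 and base contribution of 1.5 after rounding.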

Citers are pulled from OpenAlex sorted by cited_by_count:desc and capped per paper, so when the cap binds we keep the highest-signal references and the score is reproducible across reruns.
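The sort-then-cap step can be illustrated locally, without calling OpenAlex. This is a sketch only: the dictionary keys, the `cap` value, and the tie-break on `id` are assumptions made for the example, and the actual cap and tie-breaking rule are not stated on this page.

```python
def select_citers(citers, cap):
    """Keep the `cap` most-cited citers in a deterministic order.

    `citers` is a list of dicts with hypothetical keys "id" and
    "cited_by_count". Ties are broken by "id" (an assumption) so that
    reruns over the same input always keep the same papers.
    """
    ranked = sorted(citers, key=lambda c: (-c["cited_by_count"], c["id"]))
    return ranked[:cap]


sample = [
    {"id": "W3", "cited_by_count": 50},
    {"id": "W1", "cited_by_count": 900},
    {"id": "W2", "cited_by_count": 50},
]
top = select_citers(sample, cap=2)
print([c["id"] for c in top])  # ['W1', 'W2']
```

Sorting before truncating is what makes the cap keep the highest-signal citers. The explicit tie-break is one way to make the result independent of input order.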



Authors (521)

Gonçalo R. Abecasis, Richard M. Durbin, Panagiotis Deloukas, Aravinda Chakravarti, Peter Donnelly