Demo corpus. Scores are computed on a selected set of biomedical papers and datasets and may be inaccurate for papers outside this corpus, because DataRank relies on network effects that improve with scale. We aim to expand this into a fully open resource, pending additional funding.

Performance of Genotype Imputations Using Data from the 1000 Genomes Project

Human Heredity (2012) · DOI: 10.1159/000334084 · Source: DataRank Database

Performance of Genotype Imputations Using Data from the 1000 Genomes Project is a research paper published in Human Heredity (2012). On the S index, it has a DataRank of 2.5. It has been cited 39 times, with 35 citing works in its 1-hop citation network.


DataRank 2.5 · unranked

39 citations · base score 3.7


datarank_citation_only_1hop_v6 · scope: data_only · Methodology

Abstract

Genotype imputations based on 1000 Genomes (1KG) Project data have the advantage of imputing many more SNPs than imputations based on HapMap data. They also provide an opportunity to discover associations with relatively rare variants. Recent investigations are increasingly using 1KG data for genotype imputations, but only limited evaluations of the performance of this approach are available. In this paper, we empirically evaluated imputation performance using 1KG data by comparing imputation results to those obtained with the widely used HapMap Phase II data. We used three reference panels: two CEU panels, each consisting of 120 haplotypes, from HapMap II and from 1KG data (June 2010 release), and the EUR panel consisting of 566 haplotypes, also from 1KG data (August 2010 release). We used 324,607 autosomal SNPs genotyped on an Illumina platform in 501 individuals of European ancestry. Our most important finding was that both 1KG reference panels provided much higher imputation yield than the HapMap II panel: there were more than twice as many successfully imputed SNPs as with the HapMap II panel (6.7 million vs. 2.5 million). Our second most important finding was that accuracy using both 1KG panels was high and almost identical to accuracy using the HapMap II panel. Furthermore, after removing SNPs with MACH Rsq < 0.3, accuracy for both rare and low-frequency SNPs was very high and almost identical to accuracy for common SNPs. We found that imputation using the 1KG-EUR panel had advantages in successfully imputing rare, low-frequency and common variants. Our findings suggest that 1KG-based imputation can increase the opportunity to discover significant associations for SNPs across the allele frequency spectrum. Because the 1KG Project is still underway, we expect that later versions will provide even better imputation performance.

Data sources & pipeline

Pipeline: Metadata → Data-paper check → Enrichment → Citation network → Scoring

Enrichment: Pending

FAIR Checklist


Findable (2/2)

Has DOI
Indexed in repositories

Accessible (0/2)

Interoperable (2/2)

DataCite relations
Linked datasets

Reusable (0/3)

FAIR checklist signals are shown for context only and do not affect DataRank scoring.


DataRank Breakdown

Base Score 22% · Citation Network 78%

Base Score Contribution

0.553

From this paper's citation signal

Citation Network Contribution

2.0

From 31 citing papers with measurable signal
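The breakdown figures can be checked by hand. A minimal sketch, assuming the natural logarithm for log1p and the damping factor d = 0.85 given in this page's methodology notes (DataRank = (1−d)·B(p) + d·N(p), with B(p) = log1p(citation_count)). The network contribution is taken as the displayed, rounded 2.0, since the raw network sum is not shown.

```python
import math

D = 0.85                     # damping factor from the methodology notes
CITATIONS = 39               # this paper's citation count
NETWORK_CONTRIBUTION = 2.0   # d * N(p) as displayed (rounded)

base_score = math.log1p(CITATIONS)         # B(p) = ln(1 + 39)
base_contribution = (1 - D) * base_score   # (1 - d) * B(p)
total = base_contribution + NETWORK_CONTRIBUTION

base_share = base_contribution / total
network_share = NETWORK_CONTRIBUTION / total

print(f"B(p) = {base_score:.3f}")
print(f"base contribution = {base_contribution:.3f}")
print(f"shares: base {base_share:.0%}, network {network_share:.0%}")
```

This reproduces the base score of 3.7, the base contribution of 0.553, and the 22% / 78% split. Because the displayed values are rounded, their sum only approximately matches the headline DataRank of 2.5.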


Top 5 citers driving the network score

Ranked by citation count — the same ordering the engine uses when summing log1p(C_q) over citers.

Performance of Genotype Imputation for Low Frequency and Rare Variants from the 1000 Genomes
PLOS ONE (2015) · 70 citations · DataRank 2.9
Improving accuracy of rare variant imputation with a two-step imputation approach
European Journal of Human Genetics (2015) · 60 citations · DataRank 2.1
Assessment of Genotype Imputation Performance Using 1000 Genomes in African American Studies
PLoS ONE (2012) · 52 citations · DataRank 2.3
Genotype Imputation for Latinos Using the HapMap and 1000 Genomes Project Reference Panels
Frontiers in Genetics (2012) · 21 citations · DataRank 1.2
Genotype Imputation for African Americans Using Data From HapMap Phase II Versus 1000 Genomes Projects
Genetic Epidemiology (2012) · 14 citations · DataRank 0.932

Why this DataRank?

DataRank blends this paper's own citation count with the influence of the papers that cite it. Here, roughly 22% comes from its base citations and 78% from the citation network (31 citing papers contributed measurable signal).

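Base score B(p): log1p(citation_count), which grows sub-linearly, so a paper with 1,000 citations scores only about 1.5× (not 10×) a paper with 100.
Network N(p): the sum over citers q of log1p(C_q) ÷ max(outdegree_q, 1), where C_q is the citer's own citation count and outdegree_q is its number of references. Being cited by a highly cited paper with few references counts most.
Damping factor d = 0.85: DataRank = (1−d)·B(p) + d·N(p). The Base Score and Citation Network contributions shown above are each already multiplied by their weight (0.15 and 0.85, respectively).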
Self-citations excluded: Citers sharing any OpenAlex author ID with this paper are filtered out before the network sum.
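The four rules above can be expressed in a few lines. This is a minimal illustration, not the engine's code: the `Citer` record, its field names, and the `datarank` function are hypothetical, natural log is assumed for log1p, and the self-citation filter is modeled as a non-empty intersection of author-ID sets.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Citer:
    # Hypothetical record for one citing paper q.
    citation_count: int                             # C_q
    outdegree: int                                  # number of references q makes
    author_ids: set[str] = field(default_factory=set)


def datarank(citation_count: int, author_ids: set[str],
             citers: list[Citer], d: float = 0.85) -> float:
    # Base score B(p): sub-linear in the paper's own citations.
    base = math.log1p(citation_count)
    # Network N(p): drop citers sharing any author ID (self-citations), then
    # weight each remaining citer's log1p(C_q) by 1 / max(outdegree_q, 1).
    network = sum(
        math.log1p(q.citation_count) / max(q.outdegree, 1)
        for q in citers
        if not (q.author_ids & author_ids)
    )
    # Blend the two parts with damping factor d.
    return (1 - d) * base + d * network
```

For example, `datarank(39, {"A1"}, [Citer(70, 40, {"A1"}), Citer(10, 5, {"B7"})])` ignores the first citer as a self-citation and counts the second as ln(11) ÷ 5. The toy inputs here are illustrative, not the real citers listed above. Dividing by outdegree means a citer that references many papers spreads its influence thinly across them.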

Citers are pulled from OpenAlex sorted by cited_by_count:desc and capped per paper, so when the cap binds, the most-cited citers are kept and the score is reproducible across reruns.
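The capping rule amounts to a sort-then-truncate step. A sketch under stated assumptions: the `CiterRecord` type, the `select_citers` function, and the `cap` value are hypothetical (the page does not state the cap), and ties on citation count are broken by an assumed work ID so that equal counts always come out in the same order. How OpenAlex itself orders ties is not described here.

```python
from typing import NamedTuple


class CiterRecord(NamedTuple):
    # Hypothetical minimal view of a work that cites this paper.
    work_id: str
    cited_by_count: int


def select_citers(citers: list[CiterRecord], cap: int) -> list[CiterRecord]:
    # Highest cited_by_count first, mirroring cited_by_count:desc;
    # work_id is an assumed tie-breaker that makes the order deterministic.
    ordered = sorted(citers, key=lambda c: (-c.cited_by_count, c.work_id))
    # When the cap binds, only the top `cap` citers are kept.
    return ordered[:cap]
```

Because the sort key is total over the records, the same input set yields the same selection regardless of the order in which citers arrive, which is what makes reruns reproducible.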

Read the full methodology →

Citation network graph (not reproduced here). Node colors: center, data paper, data + open access, non-data, selected & links; node size = percentile rank.