Demo corpus. Scores are computed on a select set of biomedical papers and datasets and may be inaccurate for papers outside this corpus, because DataRank relies on network effects that improve with scale. We aim to expand this into a fully open resource pending additional funding.

False positive findings during genome-wide association studies with imputation: influence of allele frequency and imputation accuracy

Human Molecular Genetics (2021) · DOI: 10.1093/hmg/ddab203 · Source: DataRank Database

"False positive findings during genome-wide association studies with imputation: influence of allele frequency and imputation accuracy" is a research paper published in Human Molecular Genetics (2021). On the S-index it has a DataRank of 0.860. It has been cited 25 times, with 25 citing works in its 1-hop citation network. Its calibrated FAIR score is 49/100.

DataRank: 0.860 · unranked

25 citations · base score 3.3

Methodology: datarank_citation_only_1hop_v6 · scope data_only

Abstract

Genotype imputation is widely used in genetic studies to boost the power of GWAS, to combine multiple studies for meta-analysis and to perform fine mapping. With advances of imputation tools and large reference panels, genotype imputation has become mature and accurate. However, the uncertain nature of imputed genotypes can cause bias in the downstream analysis. Many studies have compared the performance of popular imputation approaches, but few investigated bias characteristics of downstream association analyses. Herein, we showed that the imputation accuracy is diminished if the real genotypes contain minor alleles. Although these genotypes are less common, which is particularly true for loci with low minor allele frequency, a large discordance between imputed and observed genotypes significantly inflated the association results, especially in data with a large portion of uncertain SNPs. The significant discordance of P-values happened as the P-value approached 0 or the imputation quality was poor. Although elimination of poorly imputed SNPs can remove false positive (FP) SNPs, it sacrificed, sometimes, more than 80% true positive (TP) SNPs. For top ranked SNPs, removing variants with moderate imputation quality cannot reduce the proportion of FP SNPs, and increasing sample size in reference panels did not greatly benefit the results as well. Additionally, samples with a balanced ratio between cases and controls can dramatically improve the number of TP SNPs observed in the imputation based GWAS. These results raise concerns about results from analysis of association studies when rare variants are studied, particularly when case–control studies are unbalanced.

Data sources & pipeline

Pipeline: Metadata · Data-paper check · Enrichment · Citation network · Scoring

Enrichment: Pending

FAIR Checklist

Context only (not used in score)

Findable (1/2)

Has DOI

Accessible (0/2)

Interoperable (0/2)

Reusable (0/3)

FAIR checklist signals are shown for context only and do not affect DataRank scoring.

FAIR score: 49

F Findable: 100

A Accessible

I Interoperable

R Reusable

Top 55% by FAIR · deterministic · ⚠ abstract only

Estimated from the abstract only. The agent couldn't read this paper's full text, so body-dependent criteria (data-availability statement, formats, license) are inferred.

Calibrated FAIR score: a parallel quality metric, independent of the DataRank citation score.

DataRank Breakdown

Base Score 57% · Citation Network 43%

Base Score Contribution

0.489

From this paper's citation signal

Citation Network Contribution

0.371

From 14 citing papers with measurable signal

Top 2 citers driving the network score

Ranked by citation count — the same ordering the engine uses when summing log1p(C_q) over citers.

A global reference for human genetic variation
Nature · 2015 · 19,823 citations · DataRank 11.1 · Top 19%
Robust relationship inference in genome-wide association studies
Bioinformatics · 2010 · 3,850 citations · DataRank 1.2

Why this DataRank?

DataRank blends this paper's own citation count with the influence of the papers that cite it. Here, roughly 57% comes from its base citations and 43% from the citation network (14 citing papers contributed measurable signal).

Base score B(p): log1p(citation_count) — grows sub-linearly, so a paper with 1,000 citations is not 10× a paper with 100.
Network N(p): Σ over citers of log1p(C_q) ÷ max(outdegree_q, 1). Being cited by a highly-cited paper with few references counts most.
Damping factor d = 0.85: DataRank = (1−d)·B(p) + d·N(p) — the two cards above are each already multiplied by their share.
Self-citations excluded: Citers sharing any OpenAlex author ID with this paper are filtered out before the network sum.
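
The formulas listed above can be sketched in a few lines of Python. This is a minimal illustration, not the engine's actual code: the record fields `author_ids` and `outdegree`, and the function names, are hypothetical stand-ins for whatever the pipeline stores per citer. Only the formulas and d = 0.85 come from the page.

```python
import math

D = 0.85  # damping factor stated on the page


def base_score(citation_count: int) -> float:
    """B(p) = log1p(citation_count)."""
    return math.log1p(citation_count)


def network_score(citers: list[dict], paper_author_ids: set[str]) -> float:
    """N(p) = sum over citers q of log1p(C_q) / max(outdegree_q, 1).

    Citers sharing any author ID with the paper are skipped first
    (self-citation filter). Field names here are assumptions.
    """
    total = 0.0
    for q in citers:
        if paper_author_ids & set(q["author_ids"]):
            continue  # self-citation, excluded from the sum
        total += math.log1p(q["cited_by_count"]) / max(q["outdegree"], 1)
    return total


def datarank(citation_count: int, citers: list[dict],
             paper_author_ids: set[str]) -> tuple[float, float, float]:
    """Return (base contribution, network contribution, DataRank)."""
    base_part = (1 - D) * base_score(citation_count)
    net_part = D * network_score(citers, paper_author_ids)
    return base_part, net_part, base_part + net_part


# With the 25 citations reported for this paper:
print(round(base_score(25), 1))            # 3.3
print(round((1 - D) * base_score(25), 3))  # 0.489
```

Under these assumptions the base contribution reproduces the 0.489 shown on the page. The network contribution of 0.371 cannot be recomputed here, because the per-citer counts and outdegrees are not listed.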

Citers are pulled from OpenAlex sorted by cited_by_count:desc and capped per paper, so when the cap binds we keep the highest-signal references and the score is reproducible across reruns.
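
A rough sketch of why sorting before capping makes the selection reproducible. The page does not state the cap value or how ties are broken, so the `cap` parameter and the tie-break on `id` are assumptions made for illustration, and `select_citers` is a hypothetical name.

```python
def select_citers(citers: list[dict], cap: int) -> list[dict]:
    """Keep the `cap` most-cited citers in a deterministic order.

    Sorting by descending cited_by_count mirrors the ordering named on
    the page. The secondary sort on `id` is an assumed tie-break so that
    equal counts always resolve the same way, whatever the input order.
    """
    ordered = sorted(citers, key=lambda c: (-c["cited_by_count"], c["id"]))
    return ordered[:cap]


citers = [
    {"id": "W3", "cited_by_count": 10},
    {"id": "W1", "cited_by_count": 50},
    {"id": "W2", "cited_by_count": 10},
]
kept = select_citers(citers, cap=2)
print([c["id"] for c in kept])  # ['W1', 'W2']
```

Because the result depends only on the records themselves and not on the order they arrive in, rerunning the selection on the same data gives the same citers and therefore the same network sum.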

Authors (5)

Xiangjun Xiao, Wen Zhou, Dakai Zhu, Christopher I Amos, Zhihui Zhang