Demo corpus. Scores are computed on a select set of biomedical papers and datasets and may be inaccurate for papers outside this corpus, because DataRank relies on network effects that improve with scale. We aim to expand this into a fully open resource pending additional funding.

Required sample size and nonreplicability thresholds for heterogeneous genetic associations

Proceedings of the National Academy of Sciences (2008) · DOI: 10.1073/pnas.0705554105 · Source: DataRank Database

Required sample size and nonreplicability thresholds for heterogeneous genetic associations is a research paper published in Proceedings of the National Academy of Sciences (2008). On the S index it has a DataRank of 4.3. It has been cited 105 times, with 85 citing works in its 1-hop citation network.

DataRank 4.3 · unranked

Open Access · 105 citations · base score 4.7

datarank_citation_only_1hop_v6 · scope data_only

Abstract

Many gene-disease associations proposed to date have not been consistently replicated across different populations. Nonreplication often reflects false positives in the original claims. However, occasionally, nonreplication may be due to heterogeneity due to biases or even genuine diversity of the genetic effects in different populations. Here, we propose methods for estimating the required sample size to replicate an association across many studies with different amounts of between-study heterogeneity, when data are summarized through metaanalysis. We demonstrate thresholds of between-study heterogeneity (τ₀²) above which one cannot reach adequate power to replicate a proposed association at a specified level of statistical significance when k studies are performed (regardless of how large these studies are). Based on empirical evidence from 91 proposed gene-disease associations (50 on candidate genes and 41 from genome-wide association efforts), the observed between-study heterogeneity is often close to or even surpasses nonreplicability thresholds. With more modest between-study heterogeneity, the required sample size increases considerably compared with when no between-study heterogeneity exists. Increases are steep as τ₀² is approached. Therefore, some true associations may not be practically possible to replicate with consistency, no matter how large studies are conducted. Efforts should be made to minimize between-study heterogeneity in targeted genetic effects.

Data sources & pipeline

Pipeline: Metadata → Data-paper check → Enrichment → Citation network → Scoring

Enrichment: Pending

FAIR Checklist

Context only (not used in score)

Findable (1/2)

Has DOI

Accessible (1/2)

Open Access

Interoperable (0/2)

Reusable (0/3)

FAIR checklist signals are shown for context only and do not affect DataRank scoring.

DataRank Breakdown

Base Score 16% · Citation Network 84%

Base Score Contribution

0.700

From this paper's citation signal

Citation Network Contribution

3.6

From 74 citing papers with measurable signal

Top 5 citers driving the network score

Ranked by citation count — the same ordering the engine uses when summing log1p(C_q) over citers.

Meta-analysis in clinical trials
Controlled Clinical Trials · 1986 · 38,994 citations · DataRank 1.6
Principal components analysis corrects for stratification in genome-wide association studies
Nature Genetics · 2006 · 10,642 citations · DataRank 1.4
Genome-wide association studies for complex traits: consensus, uncertainty and challenges
Nature Reviews Genetics · 2008 · 2,954 citations · DataRank 1.2
Plea for routinely presenting prediction intervals in meta-analysis
BMJ Open · 2016 · 2,088 citations · DataRank 1.1
Can trial sequential monitoring boundaries reduce spurious inferences from meta-analyses?
International Journal of Epidemiology · 2008 · 862 citations · DataRank 1.0

Why this DataRank?

DataRank blends this paper's own citation count with the influence of the papers that cite it. Here, roughly 16% comes from its base citations and 84% from the citation network (74 citing papers contributed measurable signal).

Base score B(p): log1p(citation_count) — grows sub-linearly, so a paper with 1,000 citations is not 10× a paper with 100.
Network N(p): Σ over citers of log1p(C_q) ÷ max(outdegree_q, 1). Being cited by a highly-cited paper with few references counts most.
Damping factor d = 0.85: DataRank = (1−d)·B(p) + d·N(p) — the two cards above are each already multiplied by their share.
Self-citations excluded: Citers sharing any OpenAlex author ID with this paper are filtered out before the network sum.
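
The four rules above can be put together in a few lines. What follows is a minimal sketch and not the DataRank engine itself: the function names, the dictionary fields (`cited_by_count`, `outdegree`, `author_ids`), and the example citers are assumptions made for illustration. Only the formulas, d = 0.85, and this paper's citation count of 105 come from the page.

```python
import math

DAMPING = 0.85  # d, as stated in the list above


def base_score(citation_count):
    """B(p) = log1p(citation_count)."""
    return math.log1p(citation_count)


def drop_self_citations(citers, paper_author_ids):
    """Remove citers that share any author ID with the paper."""
    own = set(paper_author_ids)
    return [q for q in citers if own.isdisjoint(q["author_ids"])]


def network_score(citers):
    """N(p) = sum over citers of log1p(C_q) / max(outdegree_q, 1)."""
    return sum(
        math.log1p(q["cited_by_count"]) / max(q["outdegree"], 1)
        for q in citers
    )


def datarank(citation_count, citers, paper_author_ids, d=DAMPING):
    """Return (base part, network part, total), each part already weighted."""
    kept = drop_self_citations(citers, paper_author_ids)
    base_part = (1 - d) * base_score(citation_count)
    network_part = d * network_score(kept)
    return base_part, network_part, base_part + network_part


# Made-up citers (hypothetical values, not this paper's real citers).
example_citers = [
    {"cited_by_count": 1000, "outdegree": 50, "author_ids": {"A2"}},
    {"cited_by_count": 10, "outdegree": 0, "author_ids": {"A3"}},
    {"cited_by_count": 500, "outdegree": 20, "author_ids": {"A1"}},  # self-citation
]
base_part, network_part, total = datarank(105, example_citers, {"A1"})
print(round(base_part, 3), round(base_part / total, 2))
```

With 105 citations the weighted base part rounds to 0.700, which agrees with the Base Score Contribution card. The network part and the share printed here depend entirely on the made-up citers, so they will not match the 3.6 and 16% reported for this paper.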

Citers are pulled from OpenAlex sorted by cited_by_count:desc and capped per paper, so when the cap binds we keep the highest-signal references and the score is reproducible across reruns.
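
The sort-then-cap step can be sketched locally as below. This is an illustration under assumptions: `select_citers`, the `id` field, and the cap value are hypothetical, and the tie-break on `id` is added here to make the order deterministic; the page does not say how the engine breaks ties. The citation counts are the five top citers listed above.

```python
def select_citers(citers, cap):
    """Return the `cap` most-cited citers in a deterministic order."""
    # Sort by citation count, highest first; ties fall back to the ID
    # (an assumed tie-break, so repeated runs give the same selection).
    ranked = sorted(citers, key=lambda q: (-q["cited_by_count"], q["id"]))
    return ranked[:cap]


# Hypothetical IDs paired with the citation counts of the top 5 citers.
citers = [
    {"id": "W3", "cited_by_count": 2954},
    {"id": "W1", "cited_by_count": 38994},
    {"id": "W5", "cited_by_count": 862},
    {"id": "W2", "cited_by_count": 10642},
    {"id": "W4", "cited_by_count": 2088},
]
top3 = select_citers(citers, cap=3)
print([q["cited_by_count"] for q in top3])  # [38994, 10642, 2954]
```

Because the selection depends only on the sorted order and not on the order in which citers arrive, shuffling the input leaves the result unchanged.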
