Demo corpus. Scores are computed on a select set of biomedical papers and datasets and may be inaccurate for papers outside this corpus, because DataRank relies on network effects that improve with scale. We aim to expand this into a fully open resource pending additional funding.

Examining Population Stratification via Individual Ancestry Estimates versus Self-Reported Race

Cancer Epidemiology, Biomarkers & Prevention (2005) · DOI: 10.1158/1055-9965.epi-04-0832 · Source: DataRank Database

Examining Population Stratification via Individual Ancestry Estimates versus Self-Reported Race is a research paper published in Cancer Epidemiology, Biomarkers & Prevention (2005). On the S index it has a DataRank of 5.2. It has been cited 83 times, with 74 citing works in its 1-hop citation network.

DataRank 5.2 · unranked

83 citations · base score 4.4

datarank_citation_only_1hop_v6 · scope data_only

Abstract

Population stratification has the potential to affect the results of genetic marker studies. Estimating individual ancestry provides a continuous measure to assess population structure in case-control studies of complex disease, instead of using self-reported racial groups. We estimate individual ancestry using the Federal Bureau of Investigation CODIS Core short tandem repeat set of 13 loci using two different analysis methods in a case-control study of early-onset lung cancer. Individual ancestry proportions were estimated for “European” and “West African” groups using published allele frequencies. The majority of Caucasian, non-Hispanics had >50% European ancestry, whereas the majority of African Americans had <20% European ancestry, regardless of ancestry estimation method, although significant overlap by self-reported race and ancestry also existed. When we further investigated the effect of ancestry and self-reported race on the frequency of a lung cancer risk genotype, we found that the frequency of the GSTM1 null genotype varies by individual European ancestry and case-control status within self-reported race (particularly for African Americans). Genetic risk models showed that adjusting for individual European ancestry provided a better fit to the data compared with the model with no group adjustment or adjustment for self-reported race. This study suggests that significant population substructure differences exist that self-reported race alone does not capture and that individual ancestry may be confounded with disease status and/or a candidate gene risk genotype.

Data sources & pipeline

Pipeline: Metadata → Data-paper check → Enrichment → Citation network → Scoring

Enrichment: Pending

FAIR Checklist

Context only (not used in score)

Findable (1/2)

Has DOI

Accessible (0/2)

Interoperable (0/2)

Reusable (0/3)

FAIR checklist signals are shown for context only and do not affect DataRank scoring.

DataRank Breakdown

Base Score 13% · Citation Network 87%

Base Score Contribution

0.665

From this paper's citation signal

Citation Network Contribution

4.5

From 66 citing papers with measurable signal
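
The percentages in the breakdown can be recovered from the two contribution values shown on the cards. The short check below is only a sketch using the rounded figures displayed on this page (0.665 and 4.5), so the results are approximate; the variable names are illustrative.

```python
# Contributions as displayed on the page (each already weighted by its share).
base_contribution = 0.665
network_contribution = 4.5

# The DataRank is the sum of the two weighted contributions.
total = base_contribution + network_contribution

# Each share is that contribution's fraction of the total.
base_share = base_contribution / total
network_share = network_contribution / total

print(f"DataRank ~ {total:.1f}")
print(f"Base share ~ {base_share:.0%}, network share ~ {network_share:.0%}")
```

With the displayed values this gives a total that rounds to 5.2 and shares of about 13% and 87%, matching the figures above.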

Top 5 citers driving the network score

Ranked by citation count — the same ordering the engine uses when summing log1p(C_q) over citers.

Inference of Population Structure Using Multilocus Genotype Data
Genetics · 2000 · 34,058 citations · DataRank 1.6
Inference of Population Structure Using Multilocus Genotype Data: Linked Loci and Correlated Allele Frequencies
Genetics · 2003 · 8,028 citations · DataRank 1.3
The effects of human population structure on large genetic association studies
Nature Genetics · 2004 · 898 citations · DataRank 1.0
Skin pigmentation, biogeographical ancestry and admixture mapping
Human Genetics · 2003 · 551 citations · DataRank 14.5
Polymorphisms in CYP1A1, GSTM1, GSTT1 and lung cancer below the age of 45 years
International Journal of Epidemiology · 2003 · 111 citations · DataRank 5.0

Why this DataRank?

DataRank blends this paper's own citation count with the influence of the papers that cite it. Here, roughly 13% comes from its base citations and 87% from the citation network (66 citing papers contributed measurable signal).

Base score B(p): log1p(citation_count) — grows sub-linearly, so a paper with 1,000 citations is not 10× a paper with 100.
Network N(p): Σ over citers of log1p(C_q) ÷ max(outdegree_q, 1). Being cited by a highly-cited paper with few references counts most.
Damping factor d = 0.85: DataRank = (1−d)·B(p) + d·N(p) — the two cards above are each already multiplied by their share.
Self-citations excluded: Citers sharing any OpenAlex author ID with this paper are filtered out before the network sum.
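
The four rules above can be sketched in a few lines of Python. This is a minimal illustration of the formulas as stated on this page, not the engine's actual code: the `Citer` class, its field names, and the function names are hypothetical, and details such as the per-paper citer cap are left out.

```python
from __future__ import annotations

import math
from dataclasses import dataclass

DAMPING = 0.85  # d, as stated on this page


@dataclass(frozen=True)
class Citer:
    """A citing work q (hypothetical container, not a DataRank type)."""
    cited_by_count: int         # C_q: citations received by q
    outdegree: int              # number of references q makes
    author_ids: frozenset[str]  # OpenAlex author IDs of q


def base_score(citation_count: int) -> float:
    """B(p) = log1p(citation_count)."""
    return math.log1p(citation_count)


def network_score(citers: list[Citer], paper_author_ids: frozenset[str]) -> float:
    """N(p) = sum over citers of log1p(C_q) / max(outdegree_q, 1)."""
    total = 0.0
    for q in citers:
        # Skip self-citations: any shared author ID excludes the citer.
        if q.author_ids & paper_author_ids:
            continue
        total += math.log1p(q.cited_by_count) / max(q.outdegree, 1)
    return total


def datarank(citation_count: int, citers: list[Citer],
             paper_author_ids: frozenset[str]) -> float:
    """DataRank = (1 - d) * B(p) + d * N(p)."""
    return ((1 - DAMPING) * base_score(citation_count)
            + DAMPING * network_score(citers, paper_author_ids))


# Base contribution for this paper's 83 citations.
print(round((1 - DAMPING) * base_score(83), 3))  # 0.665
```

For 83 citations, `base_score` gives about 4.4 and the weighted base contribution is about 0.665, in line with the values shown on this page. The network term cannot be reproduced here because the per-citer reference counts are not listed.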

Citers are pulled from OpenAlex sorted by cited_by_count:desc and capped per paper, so when the cap binds we keep the highest-signal references and the score is reproducible across reruns.
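
A request of this shape could be built against the public OpenAlex works endpoint as sketched below. The function name, the placeholder work ID, and the cap of 100 are assumptions made for illustration; the page does not state the actual cap or how the engine issues its requests. The sketch only constructs the URL and makes no network call.

```python
from urllib.parse import urlencode

OPENALEX_WORKS = "https://api.openalex.org/works"


def citers_query_url(openalex_work_id: str, cap: int = 100) -> str:
    """Build a URL listing works that cite the given work, most-cited first.

    `cap` is a hypothetical per-paper limit; a cap above the API's page
    size would require paging, which this sketch does not handle.
    """
    params = {
        "filter": f"cites:{openalex_work_id}",  # works citing this ID
        "sort": "cited_by_count:desc",          # highest-cited citers first
        "per-page": cap,                        # keep only the top `cap`
    }
    # Keep ':' unescaped so the filter and sort values stay readable.
    return f"{OPENALEX_WORKS}?{urlencode(params, safe=':')}"


# Placeholder ID, not the real OpenAlex ID of this paper.
print(citers_query_url("W0000000000"))
```

Sorting by a fixed key before truncating is what makes the capped list deterministic: the same top citers are selected on every rerun, as long as the upstream citation counts have not changed.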

Authors (4)

Ranajit Chakraborty, Thomas A. Sellers, Ann G. Schwartz, Jill S. Barnholtz-Sloan

Related Papers (1)

Admixture-matched case-control study: a practical approach for genetic association studies in admixed populations

Human Genetics (2006) · DataRank 2.4 · unranked · co-cited · DOI: 10.1007/s00439-005-0080-2