Disease Risk Factors Identified Through Shared Genetic Architecture and Electronic Medical Records
Disease Risk Factors Identified Through Shared Genetic Architecture and Electronic Medical Records is a research paper published in Science Translational Medicine (2014). On theSindex it has a DataRank of 2.8. It has been cited 57 times, with 49 citing works in its 1-hop citation network.
Abstract
Genome-wide association studies have identified genetic variants for thousands of diseases and traits. We evaluated the relationships between specific risk factors (for example, blood cholesterol level) and diseases on the basis of their shared genetic architecture in a comprehensive human disease-single-nucleotide polymorphism association database (VARIMED), analyzing the findings from 8962 published association studies. Similarity between traits and diseases was statistically evaluated on the basis of their association with shared gene variants. We identified 120 disease-trait pairs that were statistically similar, and of these, we tested and validated five previously unknown disease-trait associations by searching electronic medical records (EMRs) from three independent medical centers for evidence of the trait appearing in patients within 1 year of first diagnosis of the disease. We validated that the mean corpuscular volume is elevated before diagnosis of acute lymphoblastic leukemia; both have associated variants in the gene IKZF1. Platelet count is decreased before diagnosis of alcohol dependence; both are associated with variants in the gene C12orf51. Alkaline phosphatase level is elevated in patients with venous thromboembolism; both share variants in ABO. Similarly, we found that prostate-specific antigen and serum magnesium levels were altered before the diagnosis of lung cancer and gastric cancer, respectively. Disease-trait associations identify traits that could serve as future prognostics, if validated through EMR and subsequent prospective trials.
FAIR Checklist
Context only (not used in score)
- Has DOI
- Open Access
FAIR checklist signals are shown for context only and do not affect DataRank scoring.
DataRank Breakdown
Base Score Contribution
0.609
From this paper's citation signal
Citation Network Contribution
2.2
From 42 citing papers with measurable signal
Top 5 citers driving the network score
Ranked by citation count, the same ordering the engine uses when summing log1p(C_q) over citers.
- Seventh Report of the Joint National Committee on Prevention, Detection, Evaluation, and Treatment of High Blood Pressure (Hypertension, 2003), 13,259 citations, DataRank 1.4
- Prediction of Coronary Heart Disease Using Risk Factor Categories (Circulation, 1998), 9,594 citations, DataRank 1.4
- Potential etiologic and functional implications of genome-wide association loci for human diseases and traits (Proceedings of the National Academy of Sciences, 2009), 4,159 citations, DataRank 1.2
- Clinical assessment incorporating a personal genome (The Lancet, 2010), 676 citations, DataRank 0.978
- Network-Based Elucidation of Human Disease Similarities Reveals Common Functional Modules Enriched for Pluripotent Drug Targets (PLoS Computational Biology, 2010), 340 citations, DataRank 0.875
Why this DataRank?
DataRank blends this paper's own citation count with the influence of the papers that cite it. Here, roughly 22% comes from its base citations and 78% from the citation network (42 citing papers contributed measurable signal).
- Base score B(p): log1p(citation_count). It grows sub-linearly, so a paper with 1,000 citations is not 10× a paper with 100.
- Network N(p): Σ over citers q of log1p(C_q) ÷ max(outdegree_q, 1). Being cited by a highly-cited paper with few references counts most.
- Damping factor d = 0.85: DataRank = (1−d)·B(p) + d·N(p). The two cards above are each already multiplied by their share.
- Self-citations excluded: citers sharing any OpenAlex author ID with this paper are filtered out before the network sum.
Citers are pulled from OpenAlex sorted by cited_by_count:desc and capped per paper, so when the cap binds we keep the highest-signal references and the score is reproducible across reruns.
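The scoring rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the site's actual engine: the function names and the citer record fields (`cited_by_count`, `outdegree`, `author_ids`) are hypothetical, and the per-paper cap is not modeled. Only the formula, the damping factor of 0.85, and the figures for this paper (57 citations, a displayed network contribution of 2.2) come from the page.

```python
import math

D = 0.85  # damping factor stated on the page


def base_score(citation_count):
    """B(p) = log1p(citation_count)."""
    return math.log1p(citation_count)


def network_score(citers, paper_author_ids=frozenset()):
    """N(p): sum over citers of log1p(C_q) / max(outdegree_q, 1).

    Each citer is a dict with hypothetical keys 'cited_by_count',
    'outdegree', and 'author_ids'. Citers sharing any author ID with
    the scored paper are skipped (self-citation filter).
    """
    total = 0.0
    for q in citers:
        if paper_author_ids & set(q.get("author_ids", ())):
            continue  # self-citation: excluded before the sum
        total += math.log1p(q["cited_by_count"]) / max(q["outdegree"], 1)
    return total


def datarank(citation_count, citers, paper_author_ids=frozenset(), d=D):
    """Return (base contribution, network contribution, DataRank)."""
    base_part = (1 - d) * base_score(citation_count)
    network_part = d * network_score(citers, paper_author_ids)
    return base_part, network_part, base_part + network_part


# Check against the page: 57 citations give the displayed base contribution.
base_part, _, _ = datarank(57, [])
print(round(base_part, 3))  # 0.609

# With the displayed network contribution of 2.2, the base share is about 22%.
base_share = base_part / (base_part + 2.2)
print(round(base_share * 100))  # 22
```

The network contribution itself cannot be reproduced from the page alone, because the reference counts (outdegree) of the 42 contributing citers are not listed.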