Demo corpus. Scores are computed on a select set of biomedical papers and datasets and may be inaccurate for papers outside this corpus, because DataRank relies on network effects that improve with scale. We aim to expand this into a fully open resource pending additional funding.

ENABLING INTEGRATIVE GENOMIC ANALYSIS OF HIGH-IMPACT HUMAN DISEASES THROUGH TEXT MINING

Biocomputing 2008 (2007) · DOI: 10.1142/9789812776136_0056 · Source: DataRank Database

ENABLING INTEGRATIVE GENOMIC ANALYSIS OF HIGH-IMPACT HUMAN DISEASES THROUGH TEXT MINING is a research paper published in Biocomputing 2008 (2007). On the Sindex it has a DataRank of 1.7. It has been cited 36 times, with 22 citing works in its 1-hop citation network.

DataRank 1.7 · unranked

Open Access · 36 citations · base score 3.6


datarank_citation_only_1hop_v6 · scope data_only · Methodology

Abstract

Our limited ability to perform large-scale translational discovery and analysis of disease characterizations from public genomic data repositories remains a major bottleneck in efforts to translate genomics experiments to medicine. Through comprehensive, integrative genomic analysis of all available human disease characterizations we gain crucial insight into the molecular phenomena underlying pathogenesis as well as intra- and inter-disease differentiation. Such knowledge is crucial in the development of improved clinical diagnostics and the identification of molecular targets for novel therapeutics. In this study we build on our previous work to realize the next important step in large-scale translational discovery and analysis, which is to automatically identify those genomic experiments in which a disease state is compared to a normal control state. We present an automated text mining method that employs Natural Language Processing (NLP) techniques to automatically identify disease-related experiments in the NCBI Gene Expression Omnibus (GEO) that include measurements for both disease and normal control states. In this manner, we find that 62% of disease-related experiments contain sample subsets that can be automatically identified as normal controls. Furthermore, we calculate that the identified experiments characterize diseases that contribute to 30% of all human disease-related mortality in the United States. This work demonstrates that we now have the necessary tools and methods to initiate large-scale translational bioinformatics inquiry across the broad spectrum of high-impact human disease.

Data sources & pipeline

Pipeline: Metadata → Data-paper check → Enrichment → Citation network → Scoring

Enrichment: Pending

FAIR Checklist

Context only (not used in score)

Findable (1/2)

Has DOI

Accessible (1/2)

Open Access

Interoperable (0/2)

Reusable (0/3)

FAIR checklist signals are shown for context only and do not affect DataRank scoring.


DataRank Breakdown

Base Score 31% · Citation Network 69%

Base Score Contribution

0.542

From this paper's citation signal

Citation Network Contribution

1.2

From 18 citing papers with measurable signal
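As a quick arithmetic check, the percentage split shown above can be recovered from the two contribution values. The sketch below is illustrative only: the function name `contribution_shares` is an assumption, not part of DataRank, and the inputs are the rounded values displayed on this page.

```python
def contribution_shares(base_contribution, network_contribution):
    """Return each contribution as a fraction of their sum."""
    total = base_contribution + network_contribution
    if total == 0:
        # No signal at all: avoid dividing by zero.
        return 0.0, 0.0
    return base_contribution / total, network_contribution / total


# Rounded values shown on this page: 0.542 (base) and 1.2 (network).
base_share, network_share = contribution_shares(0.542, 1.2)
print(f"base {base_share:.0%}, network {network_share:.0%}")  # base 31%, network 69%
```

The two contributions sum to about 1.74, which is consistent with the displayed DataRank of 1.7.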


Top 1 citer driving the network score

Ranked by citation count — the same ordering the engine uses when summing log1p(C_q) over citers.

Translational Bioinformatics for Genomic Medicine
Genomic and Personalized Medicine · 2013 · 6 citations · DataRank 0.634

Why this DataRank?

DataRank blends this paper's own citation count with the influence of the papers that cite it. Here, roughly 31% comes from its base citations and 69% from the citation network (18 citing papers contributed measurable signal).

Base score B(p): log1p(citation_count) — grows sub-linearly, so a paper with 1,000 citations is not 10× a paper with 100.
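Network N(p): Σ over citers q of [log1p(C_q) ÷ max(outdegree_q, 1)], where C_q is the citer's own citation count and outdegree_q is the number of references it makes. Being cited by a highly-cited paper with few references counts most.
Damping factor d = 0.85: DataRank = (1−d)·B(p) + d·N(p). The two contribution values above are the already-weighted terms (1−d)·B(p) and d·N(p), not the raw B(p) and N(p).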
Self-citations excluded: Citers sharing any OpenAlex author ID with this paper are filtered out before the network sum.
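The formulas in the list above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the DataRank engine: the function names and the `(citation_count, outdegree)` tuple layout are hypothetical, and only the 36-citation figure comes from this page.

```python
import math

D = 0.85  # damping factor stated above


def base_score(citation_count):
    """B(p) = log1p(citation_count)."""
    return math.log1p(citation_count)


def network_score(citers):
    """N(p): sum over citers of log1p(C_q) / max(outdegree_q, 1).

    `citers` is an iterable of (C_q, outdegree_q) pairs (assumed layout).
    """
    return sum(math.log1p(c_q) / max(outdegree_q, 1) for c_q, outdegree_q in citers)


def datarank(citation_count, citers, d=D):
    """DataRank = (1 - d) * B(p) + d * N(p)."""
    return (1 - d) * base_score(citation_count) + d * network_score(citers)


# Weighted base term for this paper's 36 citations.
print(round((1 - D) * base_score(36), 3))  # 0.542

# Sub-linear growth: 1,000 citations vs 100 citations.
print(round(base_score(1000) / base_score(100), 2))  # 1.5
```

The first printed value agrees with the Base Score Contribution shown on this page, and the second shows that a tenfold increase in citations raises the base score by only about half.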

Citers are pulled from OpenAlex sorted by cited_by_count:desc and capped per paper, so when the cap binds we keep the highest-signal citers and the score is reproducible across reruns.
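The citer selection described on this page (drop self-citations, sort by cited_by_count descending, then cap) could look roughly like the sketch below. It rests on several assumptions: the dictionary keys, the `cap` parameter, and the tie-break on `id` are hypothetical choices made so that the toy example is deterministic. The page does not state the actual cap or tie-breaking rule, the records are made up, and no OpenAlex request is issued.

```python
def select_citers(paper_author_ids, citers, cap):
    """Filter out self-citations, then keep the `cap` most-cited citers."""
    paper_authors = set(paper_author_ids)
    # A citer sharing any author ID with the paper counts as a self-citation.
    kept = [c for c in citers if paper_authors.isdisjoint(c["author_ids"])]
    # Highest cited_by_count first; ties broken by id (assumed) for stable reruns.
    kept.sort(key=lambda c: (-c["cited_by_count"], c["id"]))
    return kept[:cap]


# Toy records, not real OpenAlex data.
toy_citers = [
    {"id": "W3", "cited_by_count": 6, "author_ids": ["A9"]},
    {"id": "W1", "cited_by_count": 2, "author_ids": ["A1", "A7"]},  # shares A1
    {"id": "W2", "cited_by_count": 6, "author_ids": ["A8"]},
    {"id": "W4", "cited_by_count": 0, "author_ids": ["A5"]},
]
selected = select_citers(["A1", "A2"], toy_citers, cap=2)
print([c["id"] for c in selected])  # ['W2', 'W3']
```

Sorting on a fixed key before truncating is what makes the result independent of the order in which records arrive.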


[Citation network graph. Node categories: center, data paper, data + open access, non-data. Node size = percentile rank.]