Demo corpus. Scores are computed on a select set of biomedical papers and datasets and may be inaccurate for papers outside this corpus, because DataRank relies on network effects that improve with scale. We aim to expand this into a fully open resource pending additional funding.

A unified statistical model to support local sequence order independent similarity searching for ligand-binding sites and its application to genome-based drug discovery

Bioinformatics (2009) · DOI: 10.1093/bioinformatics/btp220 · Source: DataRank Database

A unified statistical model to support local sequence order independent similarity searching for ligand-binding sites and its application to genome-based drug discovery is a research paper published in Bioinformatics (2009). On theSindex it has a DataRank of 3.4. It has been cited 96 times, with 71 citing works in its 1-hop citation network. Its calibrated FAIR score is 68/100.

DataRank 3.4 · unranked

Open Access · 96 citations · base score 4.6

datarank_citation_only_1hop_v6 · scope data_only · Methodology

Abstract

Functional relationships between proteins that do not share global structure similarity can be established by detecting their ligand-binding-site similarity. For a large-scale comparison, it is critical to accurately and efficiently assess the statistical significance of this similarity. Here, we report an efficient statistical model that supports local sequence order independent ligand-binding-site similarity searching. Most existing statistical models only take into account the matching vertices between two sites that are defined by a fixed number of points. In reality, the boundary of the binding site is not known or is dependent on the bound ligand making these approaches limited. To address these shortcomings and to perform binding-site mapping on a genome-wide scale, we developed a sequence-order independent profile-profile alignment (SOIPPA) algorithm that is able to detect local similarity between unknown binding sites a priori. The SOIPPA scoring integrates geometric, evolutionary and physical information into a unified framework. However, this imposes a significant challenge in assessing the statistical significance of the similarity because the conventional probability model that is based on fixed-point matching cannot be applied. Here we find that scores for binding-site matching by SOIPPA follow an extreme value distribution (EVD). Benchmark studies show that the EVD model performs at least two-orders faster and is more accurate than the non-parametric statistical method in the previous SOIPPA version. Efficient statistical analysis makes it possible to apply SOIPPA to genome-based drug discovery. Consequently, we have applied the approach to the structural genome of Mycobacterium tuberculosis to construct a protein-ligand interaction network. The network reveals highly connected proteins, which represent suitable targets for promiscuous drugs.

Data sources & pipeline

Pipeline: Metadata → Data-paper check → Enrichment → Citation network → Scoring

Enrichment: Pending

FAIR Checklist

Context only (not used in score)

Findable (1/2)

Has DOI

Accessible (1/2)

Open Access

Interoperable (0/2)

Reusable (0/3)

FAIR checklist signals are shown for context only and do not affect DataRank scoring.

FAIR score: 68

F Findable

100

A Accessible

I Interoperable

R Reusable

Top 5% by FAIR · deterministic · ✓ full text read

Calibrated FAIR score: a parallel quality metric, independent of the DataRank citation score.

DataRank Breakdown

Base Score 20% · Citation Network 80%

Base Score Contribution

0.686

From this paper's citation signal

Citation Network Contribution

2.7

From 63 citing papers with measurable signal

Top 5 citers driving the network score

Ranked by citation count — the same ordering the engine uses when summing log1p(C_q) over citers.

The Protein Data Bank
Nucleic Acids Research · 2000 · 39,606 citations · DataRank 32.3 · Top 1%
SuperTarget and Matador: resources for exploring drug-target relationships
Nucleic Acids Research · 2007 · 663 citations · DataRank 13.2 · Top 15%
Structural Evolution of the Protein Kinase–Like Superfamily
PLoS Computational Biology · 2005 · 269 citations · DataRank 0.840
Detecting evolutionary relationships across existing fold space, using sequence order-independent profile–profile alignments
Proceedings of the National Academy of Sciences · 2008 · 251 citations · DataRank 0.829
A Machine Learning-Based Method To Improve Docking Scoring Functions and Its Application to Drug Repurposing
Journal of Chemical Information and Modeling · 2011 · 229 citations · DataRank 0.816

Why this DataRank?

DataRank blends this paper's own citation count with the influence of the papers that cite it. Here, roughly 20% comes from its base citations and 80% from the citation network (63 citing papers contributed measurable signal).
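The 20% / 80% split can be checked against the two contribution cards (0.686 and 2.7). The sketch below only redoes that arithmetic; the variable names are illustrative and not part of DataRank, and the inputs are the rounded values displayed on this page.

```python
# Values as displayed on the two contribution cards (already rounded by the page).
base_contribution = 0.686
network_contribution = 2.7

total = base_contribution + network_contribution
base_share = base_contribution / total
network_share = network_contribution / total

print(f"total = {total:.1f}")
print(f"base share = {base_share:.0%}, network share = {network_share:.0%}")
```

The sum rounds to the displayed DataRank of 3.4, and the shares round to 20% and 80%.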

Base score B(p): log1p(citation_count) — grows sub-linearly, so a paper with 1,000 citations is not 10× a paper with 100.
Network N(p): Σ over citers of log1p(C_q) ÷ max(outdegree_q, 1). Being cited by a highly-cited paper with few references counts most.
Damping factor d = 0.85: DataRank = (1−d)·B(p) + d·N(p) — the two cards above are each already multiplied by their share.
Self-citations excluded: Citers sharing any OpenAlex author ID with this paper are filtered out before the network sum.
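The four rules above can be put together in a minimal sketch. This is not the DataRank engine: the function names and the dictionary keys (`cited_by_count`, `outdegree`, `author_ids`) are assumptions chosen for illustration, and only the formulas and d = 0.85 come from the text.

```python
import math

D = 0.85  # damping factor stated in the text


def base_score(citation_count):
    """B(p) = log1p(citation_count)."""
    return math.log1p(citation_count)


def network_score(citers, paper_author_ids):
    """N(p) = sum over citers q of log1p(C_q) / max(outdegree_q, 1).

    `citers` is a list of dicts with hypothetical keys:
    'cited_by_count' (C_q), 'outdegree' (reference count of q), 'author_ids'.
    Citers sharing any author ID with the paper are skipped (self-citations).
    """
    own_authors = set(paper_author_ids)
    total = 0.0
    for q in citers:
        if own_authors & set(q["author_ids"]):
            continue  # self-citation: excluded before the sum
        total += math.log1p(q["cited_by_count"]) / max(q["outdegree"], 1)
    return total


def datarank(citation_count, citers, paper_author_ids):
    """DataRank = (1 - d) * B(p) + d * N(p)."""
    return (1 - D) * base_score(citation_count) + D * network_score(citers, paper_author_ids)


# Toy citers (made-up data): the second one shares author "A1" and is dropped.
toy_citers = [
    {"cited_by_count": 99, "outdegree": 10, "author_ids": ["A2"]},
    {"cited_by_count": 500, "outdegree": 1, "author_ids": ["A1"]},
]
toy_network = network_score(toy_citers, ["A1"])

# Base contribution for this page's paper (96 citations).
base_contribution = (1 - D) * base_score(96)
```

With 96 citations, `base_score(96)` is about 4.57 and `base_contribution` is about 0.686, which agrees with the base score and Base Score Contribution shown on this page. The network term cannot be reproduced here because the per-citer outdegrees are not listed.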

Citers are pulled from OpenAlex sorted by cited_by_count:desc and capped per paper, so when the cap binds we keep the highest-signal references and the score is reproducible across reruns.
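A sort-then-cap step of this kind might look like the sketch below. It works on an already-fetched list rather than calling OpenAlex; the `select_citers` name, the `id` key, the tie-break on ID, and the cap value are all assumptions made for illustration, since the page does not state them.

```python
def select_citers(citers, cap):
    """Keep the `cap` most-cited citers in a deterministic order.

    Sorting is by cited_by_count descending; ties are broken by the
    (hypothetical) 'id' field so that reruns return the same subset.
    """
    ordered = sorted(citers, key=lambda q: (-q["cited_by_count"], q["id"]))
    return ordered[:cap]


# Made-up records; only the ordering logic is being illustrated.
fetched = [
    {"id": "W3", "cited_by_count": 251},
    {"id": "W1", "cited_by_count": 663},
    {"id": "W2", "cited_by_count": 663},
    {"id": "W4", "cited_by_count": 39606},
]
kept_ids = [q["id"] for q in select_citers(fetched, cap=3)]
```

Because the sort key is fully determined by the record contents, the same three records are kept regardless of the order in which they were fetched.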
