Demo corpus. Scores are computed on a select set of biomedical papers and datasets and may be inaccurate for works outside this corpus, because DataRank relies on network effects that improve with scale. We aim to expand this into a fully open resource pending additional funding.

Domain-based small molecule binding site annotation

BMC Bioinformatics (2006) · DOI: 10.1186/1471-2105-7-152 · Source: DataRank Database

Domain-based small molecule binding site annotation is a dataset published in BMC Bioinformatics (2006). On the Sindex it has a DataRank of 1.6, placing it in the top 37.3% of the data-sharing corpus. It has been cited 27 times, with 25 citing works in its 1-hop citation network. Its calibrated FAIR score is 38/100.


Dataset · Open Access · 27 citations · base score 3.3


Methodology: datarank_citation_only_1hop_v6 · scope data_only

Abstract

Background: Accurate small molecule binding site information for a protein can facilitate studies in drug docking, drug discovery and function prediction, but small molecule binding site protein sequence annotation is sparse. The Small Molecule Interaction Database (SMID), a database of protein domain-small molecule interactions, was created using structural data from the Protein Data Bank (PDB). More importantly it provides a means to predict small molecule binding sites on proteins with a known or unknown structure and unlike prior approaches, removes large numbers of false positive hits arising from transitive alignment errors, non-biologically significant small molecules and crystallographic conditions that overpredict ion binding sites.

Description: Using a set of co-crystallized protein-small molecule structures as a starting point, SMID interactions were generated by identifying protein domains that bind to small molecules, using NCBI's Reverse Position Specific BLAST (RPS-BLAST) algorithm. SMID records are available for viewing at http://smid.blueprint.org. The SMID-BLAST tool provides accurate transitive annotation of small-molecule binding sites for proteins not found in the PDB. Given a protein sequence, SMID-BLAST identifies domains using RPS-BLAST and then lists potential small molecule ligands based on SMID records, as well as their aligned binding sites. A heuristic ligand score is calculated based on E-value, ligand residue identity and domain entropy to assign a level of confidence to hits found. SMID-BLAST predictions were validated against a set of 793 experimental small molecule interactions from the PDB, of which 472 (60%) of predicted interactions identically matched the experimental small molecule and of these, 344 had greater than 80% of the binding site residues correctly identified. Further, we estimate that 45% of predictions which were not observed in the PDB validation set may be true positives.

Conclusion: By focusing on protein domain-small molecule interactions, SMID is able to cluster similar interactions and detect subtle binding patterns that would not otherwise be obvious. Using SMID-BLAST, small molecule targets can be predicted for any protein sequence, with the only limitation being that the small molecule must exist in the PDB. Validation results and specific examples within illustrate that SMID-BLAST has a high degree of accuracy in terms of predicting both the small molecule ligand and binding site residue positions for a query protein.

Data sources & pipeline

Pipeline: Metadata → Data-paper check → Enrichment → Citation network → Scoring

Enrichment: Pending

FAIR Checklist

Context only (not used in score)

Findable (1/2)

Has DOI

Accessible (1/2)

Open Access

Interoperable (0/2)

Reusable (1/3)

Dataset classification

FAIR checklist signals are shown for context only and do not affect DataRank scoring.

FAIR score: 38/100 · Top 81% by FAIR · LLM-assessed · full text read

Calibrated FAIR score: a parallel quality metric, independent of the DataRank citation score.

DataRank Breakdown

Base Score 31% · Citation Network 69%

Base Score Contribution: 0.500 (from this paper's citation signal)

Citation Network Contribution: 1.1 (from 24 citing papers with measurable signal)
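
As a rough check, the 31% / 69% split can be recovered from the two contributions shown above. The sketch below is illustrative only: the function name `contribution_shares` is hypothetical, and the inputs are the rounded values displayed on this page, so the result is approximate.

```python
def contribution_shares(base_contribution, network_contribution):
    """Return (total, base_share, network_share) for two score contributions.

    Hypothetical helper: it only re-derives the displayed percentages from
    the displayed (already rounded) contributions.
    """
    total = base_contribution + network_contribution
    return total, base_contribution / total, network_contribution / total


# Values copied from the two cards on this page.
total, base_share, network_share = contribution_shares(0.500, 1.1)
print(f"DataRank ~ {total:.1f}, base {base_share:.0%}, network {network_share:.0%}")
```

With these inputs the total rounds to the reported DataRank of 1.6, and the shares round to the reported 31% and 69%.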


Top 4 citers driving the network score

Ranked by citation count — the same ordering the engine uses when summing log1p(C_q) over citers.

Revised effective ionic radii and systematic studies of interatomic distances in halides and chalcogenides
Acta Crystallographica Section A (1976) · 63,862 citations · DataRank 1.7

van der Waals Volumes and Radii
The Journal of Physical Chemistry (1964) · 19,138 citations · DataRank 1.5

The Protein Data Bank
Acta Crystallographica Section D Biological Crystallography (2002) · 2,626 citations · DataRank 20.7 · Top 4%

Target Profiling of Small Molecules
Protein Targeting with Small Molecules (2009) · 2 citations · DataRank 0.165

Why this DataRank?

DataRank blends this paper's own citation count with the influence of the papers that cite it. Here, roughly 31% comes from its base citations and 69% from the citation network (24 citing papers contributed measurable signal).

Base score B(p): log1p(citation_count) — grows sub-linearly, so a paper with 1,000 citations is not 10× a paper with 100.
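Network N(p): Σ over citers q of [log1p(C_q) ÷ max(outdegree_q, 1)], where C_q is the citer's own citation count and outdegree_q is the number of references it makes. Being cited by a highly-cited paper with few references counts most.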
Damping factor d = 0.85: DataRank = (1−d)·B(p) + d·N(p) — the two cards above are each already multiplied by their share.
Self-citations excluded: Citers sharing any OpenAlex author ID with this paper are filtered out before the network sum.
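
The definitions above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the DataRank engine itself: the `Citer` record and its field names are hypothetical, and only the formulas (log1p base score, per-citer division by outdegree, d = 0.85, author-ID overlap filter) come from the text.

```python
import math
from dataclasses import dataclass

DAMPING = 0.85  # d, as stated in the text


@dataclass(frozen=True)
class Citer:
    """Hypothetical record for one citing work; field names are assumptions."""
    cited_by_count: int                 # C_q: the citer's own citation count
    outdegree: int                      # number of references the citer makes
    author_ids: frozenset = frozenset() # OpenAlex author IDs of the citer


def base_score(citation_count):
    """B(p) = log1p(citation_count)."""
    return math.log1p(citation_count)


def network_score(citers, paper_author_ids=frozenset()):
    """N(p) = sum over citers of log1p(C_q) / max(outdegree_q, 1).

    Citers sharing any author ID with the paper are skipped (self-citations).
    """
    total = 0.0
    for q in citers:
        if q.author_ids & paper_author_ids:
            continue  # shared author ID: treated as a self-citation
        total += math.log1p(q.cited_by_count) / max(q.outdegree, 1)
    return total


def datarank(citation_count, citers, paper_author_ids=frozenset()):
    """DataRank = (1 - d) * B(p) + d * N(p)."""
    b = base_score(citation_count)
    n = network_score(citers, paper_author_ids)
    return (1 - DAMPING) * b + DAMPING * n
```

For this dataset, `base_score(27)` is about 3.33, and multiplying by (1 − d) = 0.15 gives roughly 0.500, which matches the Base Score Contribution shown on this page. The sub-linear growth is also visible: `base_score(1000) / base_score(100)` is about 1.5, not 10.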

Citers are pulled from OpenAlex sorted by cited_by_count:desc and capped per paper, so when the cap binds we keep the highest-signal references and the score is reproducible across reruns.
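
A sketch of what such a query and cap might look like follows. The OpenAlex `cites:` filter, the `sort` parameter and the `per-page` parameter are part of the public OpenAlex API; everything else is an assumption for illustration: the cap value, the work ID `W123`, the helper names, and the tie-break on ID (added here so equal citation counts still produce a stable order) are not taken from the DataRank source.

```python
from urllib.parse import urlencode

OPENALEX_WORKS = "https://api.openalex.org/works"


def citers_url(openalex_id, cap=50):
    """Build a URL listing works that cite `openalex_id`, most-cited first.

    `cap` is a hypothetical per-paper limit; the real value is not stated.
    """
    params = {
        "filter": f"cites:{openalex_id}",
        "sort": "cited_by_count:desc",
        "per-page": cap,
    }
    # Keep ':' unescaped so the filter and sort values stay readable.
    return f"{OPENALEX_WORKS}?{urlencode(params, safe=':')}"


def cap_citers(citers, cap):
    """Keep the `cap` most-cited citers from a list of dicts.

    Ties are broken by ID (an assumption) to keep the result deterministic.
    """
    ranked = sorted(citers, key=lambda c: (-c["cited_by_count"], c["id"]))
    return ranked[:cap]


# "W123" is a placeholder work ID, not this dataset's real identifier.
print(citers_url("W123"))
```

Sorting before truncating is what makes the cap reproducible: the same top citers survive regardless of the order in which results arrive.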
