Demo corpus. Scores are computed on a select set of biomedical papers and datasets and may be inaccurate for papers outside this corpus, because DataRank relies on network effects that improve with scale. We aim to expand this into a fully open resource pending additional funding.

High‐throughput parallel proteogenomics: A bacterial case study

PROTEOMICS (2014) · 10.1002/pmic.201400185 · Source: DataRank Database

High‐throughput parallel proteogenomics: A bacterial case study is a research paper published in PROTEOMICS (2014). On the Sindex it has a DataRank of 1.1. It has been cited 22 times, with 21 citing works in its 1-hop citation network.

DataRank 1.1 · unranked

22 citations · base score 3.1

datarank_citation_only_1hop_v6 · scope data_only

Abstract

In recent years, a new paradigm for genome annotation has emerged, termed "proteogenomics," that leverages peptide MS to annotate a genome. This is achieved by mapping peptides to a six‐frame translation of a genome, including available splice databases, which may suggest refinements to gene models. Using this approach, it is possible to refine gene regions such as exon boundaries, novel genes, gene boundaries, frame shifts, reverse strands, translated UTRs, and novel splice junctions. One of the challenges of proteogenomics is how best to (1) tackle assigning confidence to any resulting annotation and (2) apply these gene model refinements, either through manual annotation or through an automated process via training gene prediction tools. This is not a straightforward process, as many gene prediction tools have their defined suitability for niche genomes (either eukaryotic or prokaryotic) trained on and refined with model organisms such as Arabidopsis thaliana and Escherichia coli, and varying degrees of features that can leverage the use of external evidence. In this study, we outline a suitable approach toward preprocessing mass spectra and optimizing the MS/MS search for a given dataset. We also discuss future challenges, which continue to pose a problem in the field of proteogenomics, and better strategies to successfully tackle them using existing tools. We use Bradyrhizobium diazoefficiens (Nitrogen‐fixing bacteria), with a 9.1 Mb genome as a case study, utilizing the latest in second‐generation proteogenomics tools with multiple gene models for cross‐validation of proteogenomics annotations.

Data sources & pipeline

Pipeline: Metadata → Data-paper check → Enrichment → Citation network → Scoring

Enrichment: Pending

FAIR Checklist

Context only (not used in score)

Findable (1/2)

Has DOI

Accessible (0/2)

Interoperable (0/2)

Reusable (0/3)

FAIR checklist signals are shown for context only and do not affect DataRank scoring.

DataRank Breakdown

Base Score 41% · Citation Network 59%

Base Score Contribution

0.470

From this paper's citation signal

Citation Network Contribution

0.675

From 19 citing papers with measurable signal

Top 5 citers driving the network score

Ranked by citation count — the same ordering the engine uses when summing log1p(C_q) over citers.

Prodigal: prokaryotic gene recognition and translation initiation site identification
BMC Bioinformatics · 2010 · 12,778 citations · DataRank 1.4

The RAST Server: Rapid Annotations using Subsystems Technology
BMC Genomics · 2008 · 11,729 citations · DataRank 1.4

MUSCLE: a multiple sequence alignment method with reduced time and space complexity
BMC Bioinformatics · 2004 · 9,309 citations · DataRank 1.4

Proteomics studies confirm the presence of alternative protein isoforms on a large scale
Genome Biology · 2008 · 69 citations · DataRank 3.3

A proteogenomic update to Yersinia: enhancing genome annotation
BMC Genomics · 2010 · 55 citations · DataRank 2.3

Why this DataRank?

DataRank blends this paper's own citation count with the influence of the papers that cite it. Here, roughly 41% comes from its base citations and 59% from the citation network (19 citing papers contributed measurable signal).

Base score B(p): log1p(citation_count) — grows sub-linearly, so a paper with 1,000 citations is not 10× a paper with 100.
Network N(p): Σ over citers of log1p(C_q) ÷ max(outdegree_q, 1). Being cited by a highly-cited paper with few references counts most.
Damping factor d = 0.85: DataRank = (1−d)·B(p) + d·N(p) — the two cards above are each already multiplied by their share.
Self-citations excluded: Citers sharing any OpenAlex author ID with this paper are filtered out before the network sum.
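
The formulas listed above can be checked against the figures shown on this page. The following is a minimal sketch, not the DataRank engine itself: the function names are illustrative, citers are assumed to arrive as (cited_by_count, outdegree) pairs with self-citations already filtered out, and the network contribution of 0.675 is taken from the page because the per-citer outdegrees are not listed here.

```python
import math

D = 0.85  # damping factor d as stated on this page


def base_score(citation_count: int) -> float:
    """B(p) = log1p(citation_count)."""
    return math.log1p(citation_count)


def network_score(citers) -> float:
    """N(p) = sum over citers of log1p(C_q) / max(outdegree_q, 1).

    `citers` is an iterable of (cited_by_count, outdegree) pairs.
    This input shape is an assumption made for the sketch.
    """
    return sum(math.log1p(c) / max(out, 1) for c, out in citers)


def contributions(citation_count: int, citers, d: float = D):
    """Return the damped base part, the damped network part, and their sum."""
    base_part = (1 - d) * base_score(citation_count)
    network_part = d * network_score(citers)
    return base_part, network_part, base_part + network_part


def base_share(base_part: float, network_part: float) -> float:
    """Fraction of the final score that comes from the base part."""
    return base_part / (base_part + network_part)


if __name__ == "__main__":
    # This paper has 22 citations, so B(p) = log1p(22).
    base_part = (1 - D) * base_score(22)
    network_part = 0.675  # network contribution as reported on this page
    print(f"base score B(p):   {base_score(22):.1f}")                       # 3.1
    print(f"base contribution: {base_part:.3f}")                            # 0.470
    print(f"DataRank:          {base_part + network_part:.1f}")             # 1.1
    print(f"base share:        {base_share(base_part, network_part):.0%}")  # 41%
```

With these inputs the sketch reproduces the base score of 3.1, the base contribution of 0.470, the 41% / 59% split, and the DataRank of 1.1 reported above.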

Citers are pulled from OpenAlex sorted by cited_by_count:desc and capped per paper, so when the cap binds we keep the highest-signal references and the score is reproducible across reruns.
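
The capping step can be illustrated locally, without calling OpenAlex. This is a sketch under stated assumptions: the `Citer` type, the `select_citers` helper, the placeholder IDs, and the cap value are all hypothetical, and breaking ties on the ID is one possible way to keep the selection stable across reruns, not something this page specifies.

```python
from typing import Iterable, List, NamedTuple


class Citer(NamedTuple):
    openalex_id: str      # placeholder IDs below, not real OpenAlex IDs
    cited_by_count: int


def select_citers(citers: Iterable[Citer], cap: int) -> List[Citer]:
    """Keep the `cap` most-cited citers in descending citation order.

    Ties are broken on the ID (an assumption of this sketch) so the
    result does not depend on the order in which citers arrive.
    """
    ordered = sorted(citers, key=lambda c: (-c.cited_by_count, c.openalex_id))
    return ordered[:cap]


if __name__ == "__main__":
    # Citation counts taken from the top-citers list on this page.
    pool = [
        Citer("W-muscle", 9309),
        Citer("W-prodigal", 12778),
        Citer("W-yersinia", 55),
        Citer("W-rast", 11729),
        Citer("W-isoforms", 69),
    ]
    for citer in select_citers(pool, cap=3):  # cap=3 is an arbitrary example
        print(citer.openalex_id, citer.cited_by_count)
```

Because the sort key is fully determined by the data, shuffling the input pool yields the same selection, which is the property the reproducibility claim above relies on.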
