Demo corpus. Scores are computed on a select set of biomedical paper/datasets and may be inaccurate for papers outside this corpus — DataRank relies on network effects that improve with scale. We aim to expand this into a fully open resource pending additional funding.

Evidence for a novel overlapping coding sequence in POLG initiated at a CUG start codon

BMC Genetics(2020)10.1186/s12863-020-0828-7Source: DataRank Database

Evidence for a novel overlapping coding sequence in POLG initiated at a CUG start codon is a research paper published in BMC Genetics (2020). On theSindex it has a DataRank of 1.1. It has been cited 41 times, with 24 citing works in its 1-hop citation network.

N/A

1.1DataRank · unranked

1.1

Open Access41 citations · base score 3.7

Cite:

datarank_citation_only_1hop_v6· scope data_onlyMethodology

Abstract

BackgroundPOLG, located on nuclear chromosome 15, encodes the DNA polymerase γ(Pol γ). Pol γ is responsible for the replication and repair of mitochondrial DNA (mtDNA). Pol γ is the only DNA polymerase found in mitochondria for most animal cells. Mutations in POLG are the most common single-gene cause of diseases of mitochondria and have been mapped over the coding region of the POLG ORF.ResultsUsing PhyloCSF to survey alternative reading frames, we found a conserved coding signature in an alternative frame in exons 2 and 3 of POLG, herein referred to as ORF-Y that arose de novo in placental mammals. Using the synplot2 program, synonymous site conservation was found among mammals in the region of the POLG ORF that is overlapped by ORF-Y. Ribosome profiling data revealed that ORF-Y is translated and that initiation likely occurs at a CUG codon. Inspection of an alignment of mammalian sequences containing ORF-Y revealed that the CUG codon has a strong initiation context and that a well-conserved predicted RNA stem-loop begins 14 nucleotides downstream. Such features are associated with enhanced initiation at near-cognate non-AUG codons. Reanalysis of the Kim et al. (2014) draft human proteome dataset yielded two unique peptides that map unambiguously to ORF-Y. An additional conserved uORF, herein referred to as ORF-Z, was also found in exon 2 of POLG. Lastly, we surveyed Clinvar variants that are synonymous with respect to the POLG ORF and found that most of these variants cause amino acid changes in ORF-Y or ORF-Z.ConclusionsWe provide evidence for a novel coding sequence, ORF-Y, that overlaps the POLG ORF. Ribosome profiling and mass spectrometry data show that ORF-Y is expressed. PhyloCSF and synplot2 analysis show that ORF-Y is subject to strong purifying selection. An abundance of disease-correlated mutations that map to exons 2 and 3 of POLG but also affect ORF-Y provides potential clinical significance to this finding.

›Data sources & pipeline

Pipeline:MetadataData-paper checkEnrichmentCitation networkScoring

Enrichment:Pending

FAIR Checklist

Context only (not used in score)

Findable (1/2)

Has DOI

Accessible (1/2)

Open Access

Interoperable (0/2)

Reusable (0/3)

FAIR checklist signals are shown for context only and do not affect DataRank scoring.

Run a calibrated FAIR evaluation for this paper →

DataRank Breakdown

Base Score 52%Citation Network 48%

Base Score Contribution

0.561

From this paper's citation signal

Citation Network Contribution

0.511

From 22 citing papers with measurable signal

Learn more about DataRank methodology →

Top 5 citers driving the network score

Ranked by citation count — the same ordering the engine uses when summing log1p(C_q) over citers.

MUSCLE: multiple sequence alignment with high accuracy and high throughput
Nucleic Acids Research200446,160 citationsDataRank 1.6
Predicting transmembrane protein topology with a hidden markov model: application to complete genomes11Edited by F. Cohen
Journal of Molecular Biology200112,937 citationsDataRank 1.4
EMBOSS: The European Molecular Biology Open Software Suite
Trends in Genetics20009,778 citationsDataRank 1.4
GENCODE 2021
Nucleic Acids Research20201,452 citationsDataRank 9.7Top 22%
PhyloCSF: a comparative genomics method to distinguish protein coding and non-coding regions
Bioinformatics20111,073 citationsDataRank 1.0

Why this DataRank?

DataRank blends this paper's own citation count with the influence of the papers that cite it. Here, roughly 52% comes from its base citations and 48% from the citation network (22 citing papers contributed measurable signal).

Base score B(p): log1p(citation_count) — grows sub-linearly, so a paper with 1,000 citations is not 10× a paper with 100.
Network N(p): Σ over citers of log1p(C_q) ÷ max(outdegree_q, 1). Being cited by a highly-cited paper with few references counts most.
Damping factor d = 0.85: DataRank = (1−d)·B(p) + d·N(p) — the two cards above are each already multiplied by their share.
Self-citations excluded: Citers sharing any OpenAlex author ID with this paper are filtered out before the network sum.

Citers are pulled from OpenAlex sorted by cited_by_count:descand capped per paper, so when the cap binds we keep the highest-signal references and the score is reproducible across reruns.

Read the full methodology →

Click a node to highlight its connections. Use scroll to zoom. Drag to pan.

Node colors:CenterData PaperData + Open AccessNon-dataSelected & links| Node size = percentile rank