Demo corpus. Scores are computed on a select set of biomedical papers and datasets and may be inaccurate for papers outside this corpus, because DataRank relies on network effects that improve with scale. We aim to expand this into a fully open resource pending additional funding.

An expanded reference catalog of translated open reading frames for biomedical research

(2025) · 10.1101/2025.07.03.662928 · Source: DataRank Database

An expanded reference catalog of translated open reading frames for biomedical research is a dataset (2025). On the S-index it has a DataRank of 0.299, placing it in the top 53% of the data-sharing corpus. It has been cited 7 times, with 3 citing works in its 1-hop citation network. Its calibrated FAIR score is 41/100.

DataRank: 0.299 · Top 53% (percentile)

Dataset · Open Access · 7 citations · base score 1.9


datarank_citation_only_1hop_v6 · scope data_only · Methodology

Abstract

Non-canonical (i.e., unannotated) open reading frames (ncORFs) have until recently been omitted from reference genome annotations, despite evidence of their translation, limiting their incorporation into biomedical research. To address this, in 2022, we initiated the TransCODE consortium and built the first community-driven consensus catalog of human ncORFs, which was openly distributed to the research community via Ensembl-GENCODE. While this catalog represented a starting point for reference ncORF annotation, major technical and scientific issues remained. In particular, this initial catalogue had no standardized framework to judge the evidence of translation for individual ncORFs. Here, we present an expanded and refined catalog of the human reference annotation of ncORFs. By incorporating more datasets and by lifting constraints on ORF length and start-codon, we define a comprehensive set of 28,359 ncORFs that is nearly four times the size of the previous catalog. Furthermore, to aid users who wish to work with ncORFs with the strongest and most reproducible signals of translation, we utilized a data-driven framework (i.e. translation signature scores) to assess the accumulated evidence for any individual ncORF. Using this approach, we derive a subset of 7,888 ncORFs with translation evidence on par with canonical protein-coding genes, which we refer to as the Primary set. This set can serve as a reliable reference for downstream analyses and validation, with a particular emphasis on high quality. Overall, this update reflects continual community-driven efforts to make ncORFs accessible and actionable to the broader research public and further iterations of the catalog will continue to expand and refine this resource.

Data sources & pipeline

Pipeline: Metadata → Data-paper check → Enrichment → Citation network → Scoring

Enrichment: Pending

FAIR Checklist

Context only (not used in score)

Findable (1/2)

Has DOI

Accessible (1/2)

Open Access

Interoperable (0/2)

Reusable (1/3)

Dataset classification

FAIR checklist signals are shown for context only and do not affect DataRank scoring.

FAIR score: 41/100 (F = Findable, A = Accessible, I = Interoperable, R = Reusable)

Top 79% by FAIR · LLM-assessed · full text read

Calibrated FAIR score — a parallel quality metric, independent of the DataRank citation score. See the full evaluation →

DataRank Breakdown

Base Score 98% · Citation Network 2%

Base Score Contribution: 0.292 (from this paper's citation signal)

Citation Network Contribution: 6.62 × 10⁻³ (from 1 citing paper with measurable signal)
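The percentage split shown for this record can be checked from the two contribution values. The snippet below is a minimal sketch of that arithmetic, not DataRank's own code. The function name `contribution_shares` is hypothetical, and the last step assumes that the Base Score Contribution equals (1−d)·B(p) with d = 0.85, as the methodology notes on this page describe.

```python
# Contribution values copied from this record's breakdown.
base_contribution = 0.292
network_contribution = 6.62e-3


def contribution_shares(base: float, network: float) -> tuple[float, float]:
    """Return each contribution as a fraction of their sum (hypothetical helper)."""
    total = base + network
    return base / total, network / total


base_share, network_share = contribution_shares(base_contribution, network_contribution)
total_score = base_contribution + network_contribution

print(f"DataRank = {total_score:.3f}")          # DataRank = 0.299
print(f"base share = {base_share:.0%}")         # base share = 98%
print(f"network share = {network_share:.0%}")   # network share = 2%

# Assumption: the base card is (1 - d) * B(p) with d = 0.85, so B(p) can be
# backed out by dividing by 0.15. This gives roughly 1.95, in line with the
# "base score 1.9" shown on the record.
implied_base_score = base_contribution / (1 - 0.85)
```

The two contributions sum to the reported DataRank of 0.299, and the rounded shares match the 98% / 2% split.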

Learn more about DataRank methodology →

Why this DataRank?

DataRank blends this paper's own citation count with the influence of the papers that cite it. Here, roughly 98% comes from its base citations and 2% from the citation network (1 citing paper contributed measurable signal).

Base score B(p): log1p(citation_count) — grows sub-linearly, so a paper with 1,000 citations is not 10× a paper with 100.
Network N(p): Σ over citers of log1p(C_q) ÷ max(outdegree_q, 1). Being cited by a highly-cited paper with few references counts most.
Damping factor d = 0.85: DataRank = (1−d)·B(p) + d·N(p) — the two cards above are each already multiplied by their share.
Self-citations excluded: Citers sharing any OpenAlex author ID with this paper are filtered out before the network sum.
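The four rules above can be sketched in a few lines of Python. This is an illustration of the stated formulas only, not the DataRank implementation: the dictionary keys (`cited_by_count`, `outdegree`, `author_ids`), the function names, and the sample citers are all assumptions made for the example, and the numbers are not meant to reproduce this record's score.

```python
import math

DAMPING = 0.85  # d, as stated in the methodology notes


def base_score(citation_count: int) -> float:
    """B(p) = log1p(citation_count): sub-linear growth in citations."""
    return math.log1p(citation_count)


def network_score(paper_author_ids: set[str], citers: list[dict]) -> float:
    """N(p): sum over citers q of log1p(C_q) / max(outdegree_q, 1).

    Citers that share any author ID with the paper are skipped
    (self-citation exclusion). Record keys are hypothetical.
    """
    total = 0.0
    for q in citers:
        if paper_author_ids & set(q["author_ids"]):
            continue  # shared author: treated as a self-citation
        total += math.log1p(q["cited_by_count"]) / max(q["outdegree"], 1)
    return total


def datarank(b: float, n: float, d: float = DAMPING) -> float:
    """DataRank = (1 - d) * B(p) + d * N(p)."""
    return (1 - d) * b + d * n


# Made-up example: two citers, one of which shares an author with the paper.
paper_authors = {"A1", "A2"}
citers = [
    {"cited_by_count": 3, "outdegree": 40, "author_ids": ["A9"]},
    {"cited_by_count": 100, "outdegree": 10, "author_ids": ["A1", "A7"]},
]
b = base_score(7)
n = network_score(paper_authors, citers)
score = datarank(b, n)
```

Because of the logarithm, `base_score(1000)` is only about 1.5 times `base_score(100)`, which is the sub-linear behaviour the first rule describes. In the example, only the first citer contributes to `n`; the second is dropped because it shares author `A1`.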

Citers are pulled from OpenAlex sorted by cited_by_count:desc and capped per paper, so when the cap binds we keep the highest-signal references and the score is reproducible across reruns.
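A request of this kind could look like the sketch below, which only builds the query URL and performs no network call. The work ID `W123` and the cap of 25 are placeholders, since the page does not state the actual cap, and the function name `citers_url` is hypothetical. The `cites` filter, the `cited_by_count:desc` sort, and the `per-page` parameter are taken from the public OpenAlex works API; whether DataRank issues exactly this request is an assumption.

```python
from urllib.parse import urlencode

OPENALEX_WORKS = "https://api.openalex.org/works"


def citers_url(work_id: str, cap: int) -> str:
    """Build an OpenAlex query for the works citing `work_id`.

    Results are sorted by citation count, highest first, and limited to
    `cap` records, so the same top citers are returned on every run as
    long as the underlying counts do not change.
    """
    params = {
        "filter": f"cites:{work_id}",   # works that cite the given work
        "sort": "cited_by_count:desc",  # highest-cited citers first
        "per-page": cap,                # per-paper cap (placeholder value)
    }
    # safe=":" keeps the colons readable instead of percent-encoding them
    return f"{OPENALEX_WORKS}?{urlencode(params, safe=':')}"


url = citers_url("W123", cap=25)
```

Sorting before capping is what makes the truncation deterministic: the cap always drops the least-cited citers rather than an arbitrary subset.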



Authors (46)

Jorge Ruiz-Orera, Jack A. S. Tierney, Jim Clauwaert, Eric W. Deutsch, M. Mar Albà