Demo corpus. Scores are computed on a select set of biomedical papers and datasets and may be inaccurate for papers outside this corpus, because DataRank relies on network effects that improve with scale. We aim to expand this into a fully open resource pending additional funding.

Methods in Comparative Genomics: Genome Correspondence, Gene Identification and Regulatory Motif Discovery

Journal of Computational Biology (2004) · DOI: 10.1089/1066527041410319 · Source: DataRank Database

Methods in Comparative Genomics: Genome Correspondence, Gene Identification and Regulatory Motif Discovery is a research paper published in Journal of Computational Biology (2004). On the Sindex it has a DataRank of 5.0. It has been cited 105 times, with 94 citing works in its 1-hop citation network. Its calibrated FAIR score is 61/100.

DataRank 5.0 · unranked

105 citations · base score 4.7

datarank_citation_only_1hop_v6 · scope data_only · Methodology

Abstract

In Kellis et al. (2003), we reported the genome sequences of S. paradoxus, S. mikatae, and S. bayanus and compared these three yeast species to their close relative, S. cerevisiae. Genomewide comparative analysis allowed the identification of functionally important sequences, both coding and noncoding. In this companion paper we describe the mathematical and algorithmic results underpinning the analysis of these genomes. (1) We present methods for the automatic determination of genome correspondence. The algorithms enabled the automatic identification of orthologs for more than 90% of genes and intergenic regions across the four species despite the large number of duplicated genes in the yeast genome. The remaining ambiguities in the gene correspondence revealed recent gene family expansions in regions of rapid genomic change. (2) We present methods for the identification of protein-coding genes based on their patterns of nucleotide conservation across related species. We observed the pressure to conserve the reading frame of functional proteins and developed a test for gene identification with high sensitivity and specificity. We used this test to revisit the genome of S. cerevisiae, reducing the overall gene count by 500 genes (10% of previously annotated genes) and refining the gene structure of hundreds of genes. (3) We present novel methods for the systematic de novo identification of regulatory motifs. The methods do not rely on previous knowledge of gene function and in that way differ from the current literature on computational motif discovery. Based on genomewide conservation patterns of known motifs, we developed three conservation criteria that we used to discover novel motifs. We used an enumeration approach to select strongly conserved motif cores, which we extended and collapsed into a small number of candidate regulatory motifs. These include most previously known regulatory motifs as well as several noteworthy novel motifs. 
The majority of discovered motifs are enriched in functionally related genes, allowing us to infer a candidate function for novel motifs. Our results demonstrate the power of comparative genomics to further our understanding of any species. Our methods are validated by the extensive experimental knowledge in yeast and will be invaluable in the study of complex genomes like that of the human.

Data sources & pipeline

Pipeline: Metadata, Data-paper check, Enrichment, Citation network, Scoring

Enrichment: Pending

FAIR Checklist

Context only (not used in score)

Findable (1/2)

Has DOI

Accessible (0/2)

Interoperable (0/2)

Reusable (0/3)

FAIR checklist signals are shown for context only and do not affect DataRank scoring.

FAIR score: 61

F Findable: 100

A Accessible

I Interoperable

R Reusable

Top 10% by FAIR · deterministic · ⚠ abstract only

Estimated from the abstract only. The agent couldn't read this paper's full text, so body-dependent criteria (data-availability statement, formats, license) are inferred.

The calibrated FAIR score is a parallel quality metric, independent of the DataRank citation score.

DataRank Breakdown

Base Score 14% · Citation Network 86%

Base Score Contribution: 0.700 (from this paper's citation signal)

Citation Network Contribution: 4.3 (from 80 citing papers with measurable signal)

Top 5 citers driving the network score

Ranked by citation count — the same ordering the engine uses when summing log1p(C_q) over citers.

Basic local alignment search tool
Journal of Molecular Biology · 1990 · 93,553 citations · DataRank 1.7
Initial sequencing and analysis of the human genome
Nature · 2001 · 24,542 citations · DataRank 17.1 · Top 10%
Initial sequencing and comparative analysis of the mouse genome
Nature · 2002 · 7,236 citations · DataRank 16.2 · Top 10%
Sequencing and comparison of yeast species to identify genes and regulatory elements
Nature · 2003 · 1,791 citations · DataRank 19.3 · Top 6%
Natural history and evolutionary principles of gene duplication in fungi
Nature · 2007 · 629 citations · DataRank 0.967

Why this DataRank?

DataRank blends this paper's own citation count with the influence of the papers that cite it. Here, roughly 14% comes from its base citations and 86% from the citation network (80 citing papers contributed measurable signal).

Base score B(p): log1p(citation_count) — grows sub-linearly, so a paper with 1,000 citations is not 10× a paper with 100.
Network N(p): Σ over citers of log1p(C_q) ÷ max(outdegree_q, 1). Being cited by a highly-cited paper with few references counts most.
Damping factor d = 0.85: DataRank = (1−d)·B(p) + d·N(p) — the two cards above are each already multiplied by their share.
Self-citations excluded: Citers sharing any OpenAlex author ID with this paper are filtered out before the network sum.
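
The formula in the list above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the engine's actual code: the function names and the `(cited_by_count, outdegree)` tuple layout are hypothetical, and the two example citers use made-up outdegrees. Only the damping factor 0.85, the citation count of 105, and the reported network contribution of 4.3 come from this page.

```python
import math

D = 0.85  # damping factor stated on this page


def base_score(citation_count: int) -> float:
    """B(p) = log1p(citation_count)."""
    return math.log1p(citation_count)


def network_score(citers) -> float:
    """N(p) = sum over citers of log1p(C_q) / max(outdegree_q, 1).

    `citers` is an iterable of (cited_by_count, outdegree) pairs;
    this tuple layout is an assumption made for the sketch.
    """
    return sum(math.log1p(c_q) / max(out_q, 1) for c_q, out_q in citers)


def datarank(citation_count: int, citers, d: float = D):
    """Return (base contribution, network contribution, total)."""
    base_part = (1 - d) * base_score(citation_count)
    net_part = d * network_score(citers)
    return base_part, net_part, base_part + net_part


# Hypothetical citers: the outdegrees (40 and 80) are invented for illustration.
base_part, net_part, total = datarank(105, [(1791, 40), (629, 80)])

# Share of the base contribution when paired with the network
# contribution reported on this page (4.3), not one recomputed here.
reported_network_part = 4.3
share = base_part / (base_part + reported_network_part)
```

With 105 citations, `base_score` gives about 4.66 (shown as 4.7 above) and the weighted base contribution is about 0.70, which matches the Base Score Contribution card. Dividing that by the sum of both contributions gives roughly 14%, the split shown in the breakdown.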

Citers are pulled from OpenAlex sorted by cited_by_count:desc and capped per paper, so when the cap binds we keep the highest-signal references and the score is reproducible across reruns.
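
The selection step described above (drop self-citations, sort by citation count descending, then cap) might look like the following. This is a sketch under assumptions: the flattened record fields `id` and `author_ids`, the function name `select_citers`, and the `cap` value are all hypothetical, since the page does not state the cap or the record layout. The tie-break on `id` is also an assumption, added so that equal citation counts still produce the same order on every run.

```python
def select_citers(paper_author_ids, citers, cap):
    """Filter self-citations, then keep the `cap` most-cited citers.

    `citers` is a list of dicts with hypothetical keys:
    "id", "cited_by_count", "author_ids".
    """
    own = set(paper_author_ids)
    # A citer sharing any author ID with the paper counts as a self-citation.
    kept = [c for c in citers if own.isdisjoint(c["author_ids"])]
    # Highest cited_by_count first; ties broken by id (an assumption).
    kept.sort(key=lambda c: (-c["cited_by_count"], c["id"]))
    return kept[:cap]


# Made-up records for illustration only.
example_citers = [
    {"id": "W3", "cited_by_count": 10, "author_ids": ["A9"]},
    {"id": "W1", "cited_by_count": 500, "author_ids": ["A1", "A7"]},
    {"id": "W4", "cited_by_count": 50, "author_ids": ["A6"]},
    {"id": "W2", "cited_by_count": 50, "author_ids": ["A5"]},
]
selected = select_citers(["A1", "A2"], example_citers, cap=2)
selected_ids = [c["id"] for c in selected]
```

Here W1 is dropped because it shares author A1 with the paper, and W3 falls outside the cap, so the result is W2 followed by W4.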
