🏆 Finalist — NIH Data Sharing Index (“S-Index”) Challenge
Demo corpus. Scores are computed on a selected set of biomedical papers and datasets and may be inaccurate for papers outside this corpus, because DataRank relies on network effects that improve with scale. We aim to expand this into a fully open resource pending additional funding.

The ENCODE (ENCyclopedia Of DNA Elements) Project

Science (2004) · DOI: 10.1126/science.1105136 · Source: DataRank Database

The ENCODE (ENCyclopedia Of DNA Elements) Project is a dataset published in Science (2004). On the S-Index it has a DataRank of 23.1, placing it in the top 3.1% of the data-sharing corpus. It has been cited 2,487 times, with 200 citing works in its 1-hop citation network.

DataRank: 23.1 · Top 3% (percentile)
Dataset · 2,487 citations · base score 7.8
datarank_citation_only_1hop_v6 · scope data_only

Abstract

The ENCyclopedia Of DNA Elements (ENCODE) Project aims to identify all functional elements in the human genome sequence. The pilot phase of the Project is focused on a specified 30 megabases (approximately 1%) of the human genome sequence and is organized as an international consortium of computational and laboratory-based scientists working to develop and apply high-throughput approaches for detecting all sequence elements that confer biological function. The results of this pilot phase will guide future efforts to analyze the entire human genome.

Data sources & pipeline
Pipeline: Metadata → Data-paper check → Enrichment → Citation network → Scoring
Enrichment: Pending

FAIR Checklist

Context only (not used in score)
Findable (1/2)
  • Has DOI
Accessible (0/2)
Interoperable (0/2)
Reusable (1/3)
  • Dataset classification

      FAIR checklist signals are shown for context only and do not affect DataRank scoring.

      DataRank Breakdown

      Base Score 5% · Citation Network 95%

      Base Score Contribution

      1.2

      From this paper's citation signal

      Citation Network Contribution

      22.0

      From 200 citing papers with measurable signal


      Top 5 citers driving the network score

      Ranked by citation count, the same ordering the engine uses when summing log1p(C_q) over citers.

      1. Initial sequencing and analysis of the human genome
        Nature (2001) · 24,542 citations · DataRank 17.1 · Top 10%
      2. An integrated encyclopedia of DNA elements in the human genome
        Nature (2012) · 19,311 citations · DataRank 23.8 · Top 3%
      3. The Sequence of the Human Genome
        Science (2001) · 13,648 citations · DataRank 18.7 · Top 7%
      4. Initial sequencing and comparative analysis of the mouse genome
        Nature (2002) · 7,236 citations · DataRank 16.2 · Top 10%
      5. A haplotype map of the human genome
        Nature (2005) · 5,917 citations · DataRank 29.2 · Top 1%

      Why this DataRank?

      DataRank blends this paper's own citation count with the influence of the papers that cite it. Here, roughly 5% comes from its base citations and 95% from the citation network (200 citing papers contributed measurable signal).
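The roughly 5% / 95% split can be sanity-checked from the numbers displayed on this page. The following is a minimal sketch, assuming `log1p` is the natural-log form and using the damping factor d = 0.85 stated in the methodology notes on this page. The network contribution of 22.0 is copied from the page rather than recomputed, and all variable names are illustrative.

```python
import math

D = 0.85           # damping factor stated in the methodology notes
citations = 2487   # citation count shown for this dataset

base = math.log1p(citations)         # base score B(p)
base_contribution = (1 - D) * base   # part of DataRank coming from B(p)
network_contribution = 22.0          # displayed value, already multiplied by d

total = base_contribution + network_contribution
base_share = base_contribution / total

print(round(base, 1))               # 7.8
print(round(base_contribution, 1))  # 1.2
print(round(base_share * 100))      # 5
```

Because the page shows rounded values, the sum of the two displayed contributions may differ from the displayed DataRank in the last digit.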

      Base score B(p)
      log1p(citation_count) — grows sub-linearly, so a paper with 1,000 citations is not 10× a paper with 100.
      Network N(p)
      Σ over citers q of log1p(C_q) ÷ max(outdegree_q, 1). Being cited by a highly-cited paper with few references counts most.
      Damping factor d = 0.85
      DataRank = (1−d)·B(p) + d·N(p) — the two cards above are each already multiplied by their share.
      Self-citations excluded
      Citers sharing any OpenAlex author ID with this paper are filtered out before the network sum.
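The four definitions above can be combined into one function. This is a sketch under stated assumptions, not the DataRank engine itself: the `Citer` record, its field names, and the `datarank` function are hypothetical, and only the formulas (log1p base score, per-citer term divided by outdegree, d = 0.85, author-overlap filter) come from the page.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Citer:
    """Hypothetical record for one citing paper q."""
    cited_by_count: int                  # C_q: citations received by the citer
    outdegree: int                       # number of references the citer makes
    author_ids: frozenset = frozenset()  # OpenAlex author IDs of the citer


def datarank(citation_count, paper_author_ids, citers, d=0.85):
    """Return (1 - d) * B(p) + d * N(p) as described on the page."""
    base = math.log1p(citation_count)    # B(p)
    own_authors = set(paper_author_ids)
    network = sum(                       # N(p)
        math.log1p(q.cited_by_count) / max(q.outdegree, 1)
        for q in citers
        # Drop self-citations: any shared author ID excludes the citer.
        if own_authors.isdisjoint(q.author_ids)
    )
    return (1 - d) * base + d * network


# Toy usage with made-up inputs: the second citer shares author "A2" and is dropped.
example_citers = [
    Citer(cited_by_count=99, outdegree=10, author_ids=frozenset({"A1"})),
    Citer(cited_by_count=500, outdegree=5, author_ids=frozenset({"A2"})),
]
example_score = datarank(0, {"A2"}, example_citers)
```

The `max(outdegree, 1)` guard keeps a citer with no recorded references from causing a division by zero.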

      Citers are pulled from OpenAlex sorted by cited_by_count:desc and capped per paper, so when the cap binds we keep the highest-signal references and the score is reproducible across reruns.
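The sort-then-cap step can be illustrated locally. This is a sketch only: `select_citers`, the dictionary keys, and the `cap` value are hypothetical (the page does not state the cap), and breaking ties by ID is an added assumption to make the ordering fully deterministic.

```python
def select_citers(citers, cap):
    """Keep the `cap` most-cited citers in a deterministic order."""
    ranked = sorted(
        citers,
        # Highest cited_by_count first; ties broken by ID (an assumption).
        key=lambda c: (-c["cited_by_count"], c["id"]),
    )
    return ranked[:cap]


# Toy usage with made-up records and a made-up cap of 2.
sample = [
    {"id": "W3", "cited_by_count": 10},
    {"id": "W1", "cited_by_count": 250},
    {"id": "W2", "cited_by_count": 10},
    {"id": "W4", "cited_by_count": 90},
]
kept = select_citers(sample, cap=2)
kept_ids = [c["id"] for c in kept]
```

Because the ordering depends only on the records themselves, rerunning the selection on the same input returns the same citers, which is the reproducibility property described above.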



      Node colors: Center · Data Paper · Data + Open Access · Non-data · Selected & links | Node size = percentile rank