DNA Walk Map

Overview

DNA Walk is a vector representation of a DNA sequence, transformed into a planar trajectory. The two pairs of complementary nucleotides (A-T, G-C) map naturally onto two dimensions: the trajectory moves up for A, down for T, right for G, and left for C, and the resulting path is drawn.
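As a minimal sketch of the mapping described above (not the viewer's actual implementation), the trajectory can be computed as a list of points. The names `STEPS` and `dna_walk` are illustrative, the y axis is assumed to point upwards, and ambiguous bases such as N are assumed to leave the position unchanged.

```python
# Unit step per nucleotide: A up, T down, G right, C left (y axis points up).
STEPS = {"A": (0, 1), "T": (0, -1), "G": (1, 0), "C": (-1, 0)}


def dna_walk(sequence):
    """Return the list of (x, y) points visited, starting at the origin."""
    x, y = 0, 0
    points = [(x, y)]
    for base in sequence.upper():
        # Assumption: characters other than A, T, G, C do not move the walk.
        dx, dy = STEPS.get(base, (0, 0))
        x, y = x + dx, y + dy
        points.append((x, y))
    return points
```

A sequence of n nucleotides yields n + 1 points, the first being the origin.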

DNA Walk makes patterns in genome sequences apparent. Clusters of repeats, palindromes, horizontally transferred genes, telomeres, and GC skew can be easily spotted with this visualization. Below is the DNA Walk of Escherichia coli, which is highly skewed along the GC vector, so the leading and lagging strands can be quickly identified from the diagram.

DNA Walk

Displayed Objects

DNA Walk

DNA Walk is drawn by moving one pixel for every nucleotide in the given sequence, in the direction shown in the following diagram.

DNA Walk directions

The origin of the DNA Walk is marked by the intersection of the gray axes, and the color changes from red to green as the position progresses along the sequence. Some regions of a genome may have unusual structures; for example, the region below in the Escherichia coli genome forms a hairpin-like structure with reversed nucleotide composition.

Region in E. coli DNA Walk
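The red-to-green coloring by sequence position could be sketched as follows. This assumes a simple linear interpolation in RGB space; the exact gradient used by the viewer is not documented here, and `position_color` is a hypothetical helper name.

```python
def position_color(index, length):
    """Return an (R, G, B) tuple of 0-255 ints for a position in the sequence.

    Assumption: the color is interpolated linearly from pure red at the
    first nucleotide to pure green at the last one.
    """
    t = index / (length - 1) if length > 1 else 0.0
    return (round(255 * (1 - t)), round(255 * t), 0)
```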

GC skew and DNA Walk

GC skew is the excess of G over C in a given region, formulated as (G-C)/(G+C). In bacterial genomes, replicational selection favors guanine over cytosine in leading strands; therefore positive GC skew values are typically observed in leading strands, and negative values in lagging strands. In fact, GC skew is often used to locate the replication origin and terminus in bacterial genomes. In many bacterial genome projects, position 1 in the genome flatfile corresponds to the putative replication origin.
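The following sketch computes GC skew for a sequence and for consecutive windows along it. It uses the convention in which an excess of G gives a positive value, consistent with the positive values described for leading strands. The function names, the default window size of 1000, and returning 0.0 for a region without G or C are all assumptions made for illustration.

```python
def gc_skew(sequence):
    """GC skew of one region: (G - C) / (G + C)."""
    s = sequence.upper()
    g, c = s.count("G"), s.count("C")
    # Assumption: a region with no G or C is reported as zero skew.
    return (g - c) / (g + c) if g + c else 0.0


def windowed_gc_skew(sequence, window=1000):
    """GC skew of consecutive non-overlapping windows along the sequence."""
    return [
        gc_skew(sequence[i:i + window])
        for i in range(0, len(sequence), window)
    ]
```

In a genome with strong skew, the sign of the windowed values would be expected to flip near the replication origin and terminus.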

DNA Walk is therefore an integrated representation of GC skew and AT skew. Conversely, GC skew can be considered the projection of the DNA Walk onto the GC vector.

Reference:

  • Lobry, J.R., 1996. Asymmetric substitution patterns in the two DNA strands of bacteria. Mol. Biol. Evol. 13, 660-665

In this way, the DNA Walk of a genome with high GC skew becomes an extremely linear, V-shaped graph, as in the following example of Clostridium perfringens,

DNA Walk of Clostridium

or highly random when GC skew is not observable, as in the genome of Gloeobacter violaceus.

Gloeobacter
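The relationship between DNA Walk and the two skews can be illustrated with running counts: the horizontal coordinate of the walk is the cumulative count of G minus C, and the vertical coordinate is the cumulative count of A minus T. Note that these are raw, unnormalized counts rather than the (G-C)/(G+C) ratio. This is a sketch with a hypothetical function name, not the viewer's code.

```python
from itertools import accumulate


def walk_projections(sequence):
    """Return (gc, at): the walk's coordinates along the GC and AT vectors.

    gc[i] is the number of G minus the number of C in sequence[:i + 1];
    at[i] is the number of A minus the number of T in sequence[:i + 1].
    """
    s = sequence.upper()
    gc = list(accumulate((b == "G") - (b == "C") for b in s))
    at = list(accumulate((b == "A") - (b == "T") for b in s))
    return gc, at
```

For a strongly skewed genome, the `gc` series rises steadily along one replication arm and falls along the other, which is what produces the V shape; without skew it wanders around zero.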

Displayed Annotations

Upon searching, results are shown as pins on the map, or as text in a collapsible window on the far right. Clicking a pin or a text result entry brings up a dialogue balloon, which shows the following information:

  • gene name
  • product description
  • Gene Ontology terms if available
  • 3D structure if PDB entry was found
  • Links to UniProt, KEGG, NCBI RefSeq, and PDB (if link was found)

Displayed Annotations

dna_walk_map_view.txt · Last modified: 2009/04/19 15:41 (external edit)