User Tools

Site Tools


gcskewanalysisenglish

This is an old revision of the document!


Introduction

As a beginning of this series of tutorial (‘GCskew Basics’), we analyzed GC skew by using only existing G-language GAE methods. In this tutorial as a next step, we are going to analyze actual GC skew by taking advantage of G-language GAE and Perl scripts.

STEP 1 - Loading G and subroutines

Let’s start with writing Perl scripts. First create new G instance as shown in previous tutorial.

  use G;
  my $gb = load("ecoli");

There will be no need to suffer from loading genome files, parsing it, or searching for which functions to use when G-language GAE is selected. The above two-line script is it. Therefore users can focus on what they want not programming itself.

Here we are going to make script with GC skew subroutine to avoid re-build same program. The subroutine returns value of GC skew as output. Setting secondary argument as window size makes script much more resourcefull.

  sub gcskew{ 
     my $gb = shift; 
     my $window = shift; 
     my @gcskew = (); 
 
     return @gcskew; 
  } 

STEP 2 - Calculation and output of GC skews

Calculation of GC skew is very simple. Divide genome into the numbers of window and for each window calculate (C-G) / (C+G) then plot each value in the window. Let’s save the output by CSV so that graphs can be drawn with other applications.

All genome sequences are stored in $gb→{SEQ} in G instance. Other genomic information are stored in same way such as $gb→{LOCUS}, $gb→{BASE_COUNT}. Utilizing for statement is suitable for executing some processes in each window. length ($gb→{SEQ}) count up whole length of genome and $a = $sequence =~ tr/a/a/ count up a number of whole adenine in the genome.

The script should look like this:

  my @location = ();  # Store window position
  my $i = 0;  # initialize counter
  for ($i = 0; $i * $window < length($gb->{SEQ}); $i ++){ # Execute while Window overflow genome size
        my $sequence = substr($gb->{SEQ}, $i * $window, $window);  # Cut out sequences within the window
        my $c = $sequence =~ tr/c/c/;   # count up Cs
        my $g = $sequence =~ tr/g/g/;   # count up Gs
        my $skew = ($c-$g)/($c+$g);     # Calculate GC skew
        push (@location, $i * $window); # Store window position
        push (@gcskew, $skew);          # Store GC skew
  } 

Following script saves the output as 'gcskew.csv'.

  my $j = 0; 
  open(OUT, '>gcskew.csv'); 
  print OUT "location,GC skew¥n"; 
  for ($j = 0; $j <= $i; $j++){ 
       print OUT $location[$j], ",", $gcskew[$j], "¥n"; 
  } 
  close(OUT); 

STEP 3 – Generating graphs

gcskewanalysisenglish.1285935064.txt.gz · Last modified: 2014/01/18 07:44 (external edit)