User Tools

Site Tools


gcskewanalysisenglish

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Next revision
Previous revision
gcskewanalysisenglish [2010/10/01 11:57]
ike created
gcskewanalysisenglish [2014/01/18 07:44] (current)
Line 2: Line 2:
  
 As a beginning of this series of tutorial (‘GCskew Basics’), we analyzed GC skew by using only existing G-language GAE methods. In this tutorial as a next step, we are going to analyze actual GC skew by taking advantage of G-language GAE and Perl scripts. As a beginning of this series of tutorial (‘GCskew Basics’), we analyzed GC skew by using only existing G-language GAE methods. In this tutorial as a next step, we are going to analyze actual GC skew by taking advantage of G-language GAE and Perl scripts.
 +====== STEP 1 - Loading G and subroutines ======
  
 +Let’s start with writing Perl scripts. First create new G instance as shown in previous tutorial.
 +<code perl>
 +  use G;
 +  my $gb = load("​ecoli"​);​
 +</​code>​
 +
 +There will be no need to suffer from loading genome files, parsing it, or searching for which functions to use when G-language GAE is selected. The above two-line script is it. Therefore users can focus on what they want not programming itself.
 +
 +Here we are going to make script with GC skew subroutine to avoid re-build same program. The subroutine returns value of GC skew as output. Setting secondary argument as window size makes script much more resourcefull.
 +
 +<code perl>
 +  sub gcskew{ ​
 +     my $gb = shift; ​
 +     my $window = shift; ​
 +     my @gcskew = (); 
 +      ​
 +     ​return @gcskew; ​
 +  } 
 +</​code>​
 +====== STEP 2 - Calculation and output of GC skews ======
 +
 +Calculation of GC skew is very simple. Divide genome into the numbers of window and for each window calculate (C-G) / (C+G) then plot each value in the window. Let’s save the output by CSV so that graphs can be drawn with other applications.
 +
 +All genome sequences are stored in $gb->​{SEQ} in G instance. Other genomic information are stored in same way such as $gb->​{LOCUS},​ $gb->​{BASE_COUNT}. Utilizing for statement is suitable for executing some processes in each window. length ($gb→{SEQ}) count up whole length of genome and $a = $sequence =~ tr/a/a/ count up a number of whole adenine in the genome.
 +
 +The script should look like this:
 +
 +<code perl>
 +  my @location = ();  # Store window position
 +  my $i = 0;  # initialize counter
 +  for ($i = 0; $i * $window < length($gb->​{SEQ});​ $i ++){ # Execute while Window overflow genome size
 +        my $sequence = substr($gb->​{SEQ},​ $i * $window, $window); ​ # Cut out sequences within the window
 +        my $c = $sequence =~ tr/​c/​c/; ​  # count up Cs
 +        my $g = $sequence =~ tr/​g/​g/; ​  # count up Gs
 +        my $skew = ($c-$g)/​($c+$g); ​    # Calculate GC skew
 +        push (@location, $i * $window); # Store window position
 +        push (@gcskew, $skew); ​         # Store GC skew
 +  } 
 +</​code>​
 +
 +Following script saves the output as '​gcskew.csv'​.
 +
 +<code perl>
 +  my $j = 0; 
 +  open(OUT, '>​gcskew.csv'​); ​
 +  print OUT "​location,​GC skew¥n"; ​
 +  for ($j = 0; $j <= $i; $j++){ ​
 +       print OUT $location[$j],​ ",",​ $gcskew[$j],​ "​¥n"; ​
 +  } 
 +  close(OUT); ​
 +</​code>​
 +====== STEP 3 – Generating graphs ======
 +
 +Primary principle of G-langugae GAE is to make analysis more efficient. So G-language GAE provides a method grapher() to automatically draw a graph which save up user’s time. 
 +
 +
 +<​code>​
 +  grapher(¥@x-axis,​ ¥@value1, ¥@value2 ...)
 +  option description
 +  x x-axis label
 +  y y-axis label
 +  x1, x2, ... @value1, @value2, ...
 +  filename Filename
 +  title title
 +</​code>​
 +
 +There are many more options available. For further options see reference.
 +This is it for generating a graph.
 +
 +
 +Following script opens a graph with much more colorfully generated by gnuplot with attached gimv viewer.
 +====== STEP 4 - Done ======
 +<code perl>
 +  use strict;
 +  use warnings;
 +  use G; 
 +   
 +  my $gb = load("​ecoli"​); ​
 +   
 +  my @gcskew = &​gcskew($gb,​ 10000); ​
 +   
 +  sub gcskew{ ​
 +     my $gb = shift; ​
 +     my $window = shift; ​
 +     my @gcskew = (); 
 +   
 +     my @location = ();  ​
 +     my $i = 0;   
 +     for ($i = 0; $i * $window < length($gb->​{SEQ});​ $i ++){      ​
 +           my $sequence = substr($gb->​{SEQ},​ $i * $window, $window); ​  
 +           my $c = $sequence =~ tr/​c/​c/;  ​
 +           my $g = $sequence =~ tr/​g/​g/; ​  
 +           my $skew = ($c-$g)/​($c+$g); ​   ​
 +           push (@location, $i * $window);  ​
 +           push (@gcskew, $skew); ​     ​
 +     ​} ​
 +   
 +     my $j = 0; 
 +     ​open(OUT,​ '>​gcskew.csv'​); ​
 +     print OUT "​location,​GC skew¥n"; ​
 +     for ($j = 0; $j <= $i; $j++){ ​
 +           print OUT $location[$j],​ ",",​ $gcskew[$j],​ "​¥n"; ​
 +     ​} ​
 +     ​close(OUT); ​
 +   
 +   
 +     ​grapher(¥@location,​ ¥@gcskew, -x=>'​bp',​ -y=>'​GC skew', ​
 +                              -title=>'​GC skew', -filename=>'​gcskew.png'​); ​
 +   
 +     ​msg_gimv("​graph/​gcskew.png"​); ​
 +     ​return @gcskew; ​
 +  } 
 +      ​
 +</​code>​
gcskewanalysisenglish.1285934224.txt.gz · Last modified: 2014/01/18 07:44 (external edit)