User Tools

Site Tools


gcskewanalysisenglish

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revision Previous revision
Next revision
Previous revision
gcskewanalysisenglish [2010/10/01 12:01]
ike
gcskewanalysisenglish [2014/01/18 07:44] (current)
Line 4: Line 4:
 ====== STEP 1 - Loading G and subroutines ====== ====== STEP 1 - Loading G and subroutines ======
  
-Let’s start with writing Perl scripts. First load new G instance as shown in previous tutorial.+Let’s start with writing Perl scripts. First create ​new G instance as shown in previous tutorial.
 <code perl> <code perl>
   use G;   use G;
Line 14: Line 14:
 Here we are going to make script with GC skew subroutine to avoid re-build same program. The subroutine returns value of GC skew as output. Setting secondary argument as window size makes script much more resourcefull. Here we are going to make script with GC skew subroutine to avoid re-build same program. The subroutine returns value of GC skew as output. Setting secondary argument as window size makes script much more resourcefull.
  
 +<code perl>
 +  sub gcskew{ ​
 +     my $gb = shift; ​
 +     my $window = shift; ​
 +     my @gcskew = (); 
 +      ​
 +     ​return @gcskew; ​
 +  } 
 +</​code>​
 +====== STEP 2 - Calculation and output of GC skews ======
 +
 +Calculation of GC skew is very simple. Divide genome into the numbers of window and for each window calculate (C-G) / (C+G) then plot each value in the window. Let’s save the output by CSV so that graphs can be drawn with other applications.
 +
 +All genome sequences are stored in $gb->​{SEQ} in G instance. Other genomic information are stored in same way such as $gb->​{LOCUS},​ $gb->​{BASE_COUNT}. Utilizing for statement is suitable for executing some processes in each window. length ($gb→{SEQ}) count up whole length of genome and $a = $sequence =~ tr/a/a/ count up a number of whole adenine in the genome.
 +
 +The script should look like this:
 +
 +<code perl>
 +  my @location = ();  # Store window position
 +  my $i = 0;  # initialize counter
 +  for ($i = 0; $i * $window < length($gb->​{SEQ});​ $i ++){ # Execute while Window overflow genome size
 +        my $sequence = substr($gb->​{SEQ},​ $i * $window, $window); ​ # Cut out sequences within the window
 +        my $c = $sequence =~ tr/​c/​c/; ​  # count up Cs
 +        my $g = $sequence =~ tr/​g/​g/; ​  # count up Gs
 +        my $skew = ($c-$g)/​($c+$g); ​    # Calculate GC skew
 +        push (@location, $i * $window); # Store window position
 +        push (@gcskew, $skew); ​         # Store GC skew
 +  } 
 +</​code>​
 +
 +Following script saves the output as '​gcskew.csv'​.
 +
 +<code perl>
 +  my $j = 0; 
 +  open(OUT, '>​gcskew.csv'​); ​
 +  print OUT "​location,​GC skew¥n"; ​
 +  for ($j = 0; $j <= $i; $j++){ ​
 +       print OUT $location[$j],​ ",",​ $gcskew[$j],​ "​¥n"; ​
 +  } 
 +  close(OUT); ​
 +</​code>​
 +====== STEP 3 – Generating graphs ======
 +
 +Primary principle of G-langugae GAE is to make analysis more efficient. So G-language GAE provides a method grapher() to automatically draw a graph which save up user’s time. 
 +
 +
 +<​code>​
 +  grapher(¥@x-axis,​ ¥@value1, ¥@value2 ...)
 +  option description
 +  x x-axis label
 +  y y-axis label
 +  x1, x2, ... @value1, @value2, ...
 +  filename Filename
 +  title title
 +</​code>​
 +
 +There are many more options available. For further options see reference.
 +This is it for generating a graph.
 +
 +
 +Following script opens a graph with much more colorfully generated by gnuplot with attached gimv viewer.
 +====== STEP 4 - Done ======
 +<code perl>
 +  use strict;
 +  use warnings;
 +  use G; 
 +   
 +  my $gb = load("​ecoli"​); ​
 +   
 +  my @gcskew = &​gcskew($gb,​ 10000); ​
 +   
 +  sub gcskew{ ​
 +     my $gb = shift; ​
 +     my $window = shift; ​
 +     my @gcskew = (); 
 +   
 +     my @location = ();  ​
 +     my $i = 0;   
 +     for ($i = 0; $i * $window < length($gb->​{SEQ});​ $i ++){      ​
 +           my $sequence = substr($gb->​{SEQ},​ $i * $window, $window); ​  
 +           my $c = $sequence =~ tr/​c/​c/;  ​
 +           my $g = $sequence =~ tr/​g/​g/; ​  
 +           my $skew = ($c-$g)/​($c+$g); ​   ​
 +           push (@location, $i * $window);  ​
 +           push (@gcskew, $skew); ​     ​
 +     ​} ​
 +   
 +     my $j = 0; 
 +     ​open(OUT,​ '>​gcskew.csv'​); ​
 +     print OUT "​location,​GC skew¥n"; ​
 +     for ($j = 0; $j <= $i; $j++){ ​
 +           print OUT $location[$j],​ ",",​ $gcskew[$j],​ "​¥n"; ​
 +     ​} ​
 +     ​close(OUT); ​
 +   
 +   
 +     ​grapher(¥@location,​ ¥@gcskew, -x=>'​bp',​ -y=>'​GC skew', ​
 +                              -title=>'​GC skew', -filename=>'​gcskew.png'​); ​
 +   
 +     ​msg_gimv("​graph/​gcskew.png"​); ​
 +     ​return @gcskew; ​
 +  } 
 +      ​
 +</​code>​
gcskewanalysisenglish.1285934494.txt.gz · Last modified: 2014/01/18 07:44 (external edit)