Introduction
As a beginning of this series of tutorial (‘GCskew Basics’), we analyzed GC skew by using only existing G-language GAE methods. In this tutorial as a next step, we are going to analyze actual GC skew by taking advantage of G-language GAE and Perl scripts.
STEP 1 - Loading G and subroutines
Let’s start with writing Perl scripts. First create new G instance as shown in previous tutorial.
use G; my $gb = load("ecoli");
There will be no need to suffer from loading genome files, parsing it, or searching for which functions to use when G-language GAE is selected. The above two-line script is it. Therefore users can focus on what they want not programming itself.
Here we are going to make script with GC skew subroutine to avoid re-build same program. The subroutine returns value of GC skew as output. Setting secondary argument as window size makes script much more resourcefull.
sub gcskew{ my $gb = shift; my $window = shift; my @gcskew = (); return @gcskew; }
STEP 2 - Calculation and output of GC skews
Calculation of GC skew is very simple. Divide genome into the numbers of window and for each window calculate (C-G) / (C+G) then plot each value in the window. Let’s save the output by CSV so that graphs can be drawn with other applications.
All genome sequences are stored in $gb→{SEQ} in G instance. Other genomic information are stored in same way such as $gb→{LOCUS}, $gb→{BASE_COUNT}. Utilizing for statement is suitable for executing some processes in each window. length ($gb→{SEQ}) count up whole length of genome and $a = $sequence =~ tr/a/a/ count up a number of whole adenine in the genome.
The script should look like this:
my @location = (); # Store window position my $i = 0; # initialize counter for ($i = 0; $i * $window < length($gb->{SEQ}); $i ++){ # Execute while Window overflow genome size my $sequence = substr($gb->{SEQ}, $i * $window, $window); # Cut out sequences within the window my $c = $sequence =~ tr/c/c/; # count up Cs my $g = $sequence =~ tr/g/g/; # count up Gs my $skew = ($c-$g)/($c+$g); # Calculate GC skew push (@location, $i * $window); # Store window position push (@gcskew, $skew); # Store GC skew }
Following script saves the output as 'gcskew.csv'.
my $j = 0; open(OUT, '>gcskew.csv'); print OUT "location,GC skew¥n"; for ($j = 0; $j <= $i; $j++){ print OUT $location[$j], ",", $gcskew[$j], "¥n"; } close(OUT);
STEP 3 – Generating graphs
Primary principle of G-langugae GAE is to make analysis more efficient. So G-language GAE provides a method grapher() to automatically draw a graph which save up user’s time.
grapher(¥@x-axis, ¥@value1, ¥@value2 ...) option description x x-axis label y y-axis label x1, x2, ... @value1, @value2, ... filename Filename title title
There are many more options available. For further options see reference. This is it for generating a graph.
Following script opens a graph with much more colorfully generated by gnuplot with attached gimv viewer.
STEP 4 - Done
use strict; use warnings; use G; my $gb = load("ecoli"); my @gcskew = &gcskew($gb, 10000); sub gcskew{ my $gb = shift; my $window = shift; my @gcskew = (); my @location = (); my $i = 0; for ($i = 0; $i * $window < length($gb->{SEQ}); $i ++){ my $sequence = substr($gb->{SEQ}, $i * $window, $window); my $c = $sequence =~ tr/c/c/; my $g = $sequence =~ tr/g/g/; my $skew = ($c-$g)/($c+$g); push (@location, $i * $window); push (@gcskew, $skew); } my $j = 0; open(OUT, '>gcskew.csv'); print OUT "location,GC skew¥n"; for ($j = 0; $j <= $i; $j++){ print OUT $location[$j], ",", $gcskew[$j], "¥n"; } close(OUT); grapher(¥@location, ¥@gcskew, -x=>'bp', -y=>'GC skew', -title=>'GC skew', -filename=>'gcskew.png'); msg_gimv("graph/gcskew.png"); return @gcskew; }