Introduction

Subroutines written by users can easily be incorporated into G-language GAE as standard functions. Repeated reimplementation of the same kinds of analysis tools holds back the development of bioinformatics, so packaging your own subroutines into G-language GAE is one good way to use it. In this tutorial, we demonstrate how to turn the previously written GC skew subroutine into a G-language GAE standard function.

STEP 0 - Learn the rules for standard functions

Standard functions follow several rules, since they are literally the standard for Perl scripts in G-language GAE. These rules are not meant to restrict you but to improve usability.

See the following documents for details.

Standard Function Manual

SubOpt API Manual

Messenger API Manual

The essentials are as follows:

1: Use the SubOpt API to handle options
2: Every option should have a default value, so that options can be omitted and the function called with a G instance as its single argument
3: Every function should accept the "output" and "filename" options
4: Use the G::Messenger API for all output
5: One function should consist of one subroutine
6: Graphs should be generated by G::Tools::Graph::grapher
7: All code should be simple, readable, resourceful, and beautiful

In this tutorial we skip rules #5, 6, and 7.

STEP 1 – Use SubOpt API to run options

SubOpt is an API specialized for acquiring options inside a subroutine. Its syntax is similar to that of the GetOpt API: an option name begins with a hyphen and its value is given in double or single quotation marks, as in -output=>"show". Using this API makes G-language GAE more flexible and keeps the whole system consistent. SubOpt is simple to use: set default values with opt_default(), then fetch the options with opt_get(). Individual option values can then be looked up with opt_val().

For the GC skew subroutine, the window size is passed as an option, so we would like to be able to write &gcskew($gb, -window=>"1000");. To achieve this, change the following script

  sub gcskew { 
    my $gb = shift; 
    my $window = shift;
  } 

to the following.

  sub gcskew { 
    my @args = opt_get(@_); 
    my $gb = shift @args; 
    my $window = opt_val("window"); 
  }

opt_get() fetches the options and returns the remaining (non-option) arguments, which are stored in the array @args. The G instance is then taken from that array, and the option value is retrieved with opt_val("window"). It is convenient to assign each option value to a local variable.

SubOpt also provides some more powerful functions. The GC skew subroutine needs only $gb->{SEQ} from the G instance. So if users want to analyze only part of a genome, or pass a very large sequence by reference to save memory, the subroutine can simply be written as follows.
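To make the mechanics concrete, here is a minimal stand-alone sketch of the kind of argument splitting described above. It is not the SubOpt implementation: my_opt_get() and my_opt_val() are hypothetical stand-ins that only mimic the behaviour this tutorial describes (a hyphen-prefixed name followed by a value is an option, everything else is passed through).

```perl
use strict;
use warnings;

my %opt;    # options collected by the most recent my_opt_get() call

# Hypothetical stand-in for opt_get(): split an argument list into
# "-name => value" pairs (stored in %opt) and the remaining arguments.
sub my_opt_get {
    my @in = @_;
    my @rest;
    %opt = ();
    while (@in) {
        my $arg = shift @in;
        if (defined $arg && !ref $arg && $arg =~ /^-([A-Za-z_]\w*)$/) {
            $opt{$1} = shift @in;    # the value follows the option name
        } else {
            push @rest, $arg;        # e.g. the G instance
        }
    }
    return @rest;
}

# Hypothetical stand-in for opt_val(): look up one collected option.
sub my_opt_val {
    my $name = shift;
    return $opt{$name};
}

# Same shape as the gcskew() skeleton, using the stand-ins.
sub demo {
    my @args   = my_opt_get(@_);
    my $gb     = shift @args;
    my $window = my_opt_val("window");
    return ($gb, $window);
}

# A plain hash with a SEQ key plays the role of the G instance here.
my ($gb, $window) = demo({ SEQ => "gattaca" }, -window => "1000");
print "window = $window, sequence = $gb->{SEQ}\n";
```

Because the option is recognized by name rather than by position, it can appear anywhere after the G instance, which is what lets later steps add more options without changing how the function is called.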

  sub gcskew { 
    my @args = opt_get(@_); 
    my $gb = opt_as_gb(shift @args); 
    my $window = opt_val("window"); 
  }

The function opt_as_gb() makes the target sequence available as $gb->{SEQ}, whether it was passed as a scalar or as a reference.
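The idea of accepting either form can be sketched in plain Perl as below. This is only an illustration under stated assumptions: as_gb_like() is a hypothetical helper, not opt_as_gb() itself, and a plain hash with a SEQ key stands in for a G instance. For simplicity the sketch copies the string, which a memory-conscious implementation would presumably avoid.

```perl
use strict;
use warnings;
use Scalar::Util qw(reftype);

# Hypothetical stand-in for opt_as_gb(): accept a hash-based object,
# a reference to a sequence string, or a plain sequence string, and
# return something that always answers to ->{SEQ}.
sub as_gb_like {
    my $in   = shift;
    my $type = reftype($in) // '';               # '' for a non-reference
    return $in               if $type eq 'HASH';   # already carries {SEQ}
    return +{ SEQ => $$in }  if $type eq 'SCALAR'; # reference to a string
    return +{ SEQ => $in };                        # plain string
}

my $seq = "gcgcatat";
for my $input ($seq, \$seq, { SEQ => $seq }) {
    my $gb = as_gb_like($input);
    print length($gb->{SEQ}), " bases\n";
}
```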

STEP 2 – All options should possess initial value to abbreviate options

In the above script, if the "-window" option is not given, the window size has no value and is treated as zero. To avoid this problem, set a default value with opt_default() as follows.

  sub gcskew{ 
    opt_default("window"=>10000); 
    my @args = opt_get(@_); 
    my $gb = opt_as_gb(shift @args); 
    my $window = opt_val("window"); 
  }

This sets the default window size to 10000 bp.
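The effect of a default can be illustrated without G-language GAE at all. The following sketch assumes nothing about how opt_default() is implemented; with_defaults() is a hypothetical helper that simply overlays the caller's options on a table of defaults.

```perl
use strict;
use warnings;

# Hypothetical helper: overlay caller-supplied options on the defaults.
# In a Perl list of key/value pairs, later pairs win, so any option the
# caller gives replaces the default and the rest are kept.
sub with_defaults {
    my ($defaults, %given) = @_;
    return (%$defaults, %given);
}

my %defaults = (window => 10000);

my %plain      = with_defaults(\%defaults);                  # nothing given
my %overridden = with_defaults(\%defaults, window => 1000);  # caller's value

print "default: $plain{window}, overridden: $overridden{window}\n";
```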

STEP 3 – All options should contain “output” and “filename”

The GC skew subroutine made in the previous tutorial has two types of output for generating the graph. G-language GAE aims at efficient programming, so every standard function has an "-output" option, which can be set to "f" to save a file in the "data/" directory, "g" to write a graph into the "graph/" directory, or "show" (the default) to display the graph automatically. The program should also be flexible enough to let the output file name be changed. So let's change some of the options and parameters in the script. With this change, the output is selected automatically according to the option.

More precisely, the new options are added to opt_default(), the values returned by opt_val() are assigned to local variables, and the file name is replaced by $filename so that the output name can be changed.

  sub gcskew{ 
     # Defaults allow the function to be called with only a G instance
     opt_default("window"=>10000, "output"=>"show", "filename"=>"gcskew.png"); 
     my @args = opt_get(@_); 
     my $gb = opt_as_gb(shift @args); 
     my $window = opt_val("window"); 
     my $output = opt_val("output"); 
     my $filename = opt_val("filename");  
     my @gcskew = (); 
     my @location = ();  
 
     # Calculate the skew of each window along the sequence
     my $i = 0;   
     for ($i = 0; $i * $window < length($gb->{SEQ}); $i ++){      
           my $sequence = substr($gb->{SEQ}, $i * $window, $window);   
           my $c = $sequence =~ tr/c/c/;  
           my $g = $sequence =~ tr/g/g/;   
           # Avoid division by zero in windows that contain no c or g
           my $skew = ($c + $g) ? ($c - $g) / ($c + $g) : 0;    
           push (@location, $i * $window);  
           push (@gcskew, $skew);      
     } 
 
     if ($output eq 'f'){ 
           # Save the values as CSV under data/
           mkdir ('data', 0777); 
           $filename =~ s/\.png$/.csv/; 
           my $j = 0; 
           open(OUT, '>data/' . $filename) or die "Cannot open data/$filename: $!"; 
           print OUT "location,GC skew\n"; 
           # $i now equals the number of windows, so stop before it
           for ($j = 0; $j < $i; $j++){ 
                print OUT $location[$j], ",", $gcskew[$j], "\n"; 
           } 
           close(OUT); 
     }elsif ($output eq 'g' || $output eq 'show'){ 
           # Draw the graph under graph/ using the requested file name
           mkdir ('graph', 0777); 
           G::Tools::Graph::grapher(\@location, \@gcskew, -x=>'bp', 
                                           -y=>'GC skew', -title=>'GC skew',        
                                           -filename=>'graph/' . $filename); 
 
           system("gimv graph/$filename"); 
     } 
 
     return @gcskew; 
  } 
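To check the windowing arithmetic by hand, the calculation can be isolated from the option and output handling. The sketch below is a stand-alone illustration, not part of G-language GAE: skew_windows() is a hypothetical name, it uses the same (C - G) / (C + G) formula as the subroutine above, and it assigns a skew of 0 to windows that contain neither c nor g (a choice made here to avoid dividing by zero).

```perl
use strict;
use warnings;

# Windowed skew over a plain sequence string.
# Returns references to the window start positions and the skew values.
sub skew_windows {
    my ($seq, $window) = @_;
    die "window must be a positive number\n" unless $window && $window > 0;
    my (@location, @skew);
    for (my $pos = 0; $pos < length($seq); $pos += $window) {
        my $chunk = lc substr($seq, $pos, $window);
        my $c = ($chunk =~ tr/c//);    # tr with an empty replacement only counts
        my $g = ($chunk =~ tr/g//);
        push @location, $pos;
        push @skew, ($c + $g) ? ($c - $g) / ($c + $g) : 0;
    }
    return (\@location, \@skew);
}

# 14 bases in windows of 4: "cccg", "ggat", "atat", "gc"
my ($loc, $skew) = skew_windows("cccgggatatatgc", 4);
printf "%d,%.2f\n", $loc->[$_], $skew->[$_] for 0 .. $#$loc;
```

For this toy sequence the four windows give 0.50, -1.00, 0.00 (no c or g at all) and 0.00 (one c and one g), and the last window is shorter than the others because the sequence length is not a multiple of the window size.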

STEP 4 - Use G::Messenger API to run all the outputs

There is one more important rule for standard functions: all output should go through the G::Messenger API. G-language GAE has multiple interfaces, for example loading a script into the compiler, executing command lines, a GUI, and web applications. The same standard functions run under each of these interfaces, and this is made possible by the G::Messenger API. When output is passed to Messenger, it identifies which interface the user is accessing and sends the output to the appropriate destination. The following are the major functions used with Messenger.

The function msg_send() writes to standard output, equivalent to "print STDOUT" in a Perl script. The function msg_error() writes to standard error, equivalent to "print STDERR" in a Perl script, or emits a system message. The function msg_gimv() displays a graph, equivalent to system("gimv "). With these changes, the GC skew subroutine becomes a standard function.

  sub gcskew{ 
     # Defaults allow the function to be called with only a G instance
     opt_default("window"=>10000, "output"=>"show", "filename"=>"gcskew.png"); 
     my @args = opt_get(@_); 
     my $gb = opt_as_gb(shift @args); 
     my $window = opt_val("window"); 
     my $output = opt_val("output"); 
     my $filename = opt_val("filename"); 
 
     my @gcskew = (); 
     my @location = ();  
 
     # Calculate the skew of each window along the sequence
     my $i = 0;   
     for ($i = 0; $i * $window < length($gb->{SEQ}); $i ++){      
           my $sequence = substr($gb->{SEQ}, $i * $window, $window);   
           my $c = $sequence =~ tr/c/c/;  
           my $g = $sequence =~ tr/g/g/;   
           # Avoid division by zero in windows that contain no c or g
           my $skew = ($c + $g) ? ($c - $g) / ($c + $g) : 0;    
           push (@location, $i * $window);  
           push (@gcskew, $skew);      
     } 
 
     if ($output eq 'f'){ 
           # Save the values as CSV under data/
           mkdir ('data', 0777); 
           $filename =~ s/\.png$/.csv/; 
           my $j = 0; 
           open(OUT, '>data/' . $filename) or die "Cannot open data/$filename: $!"; 
           print OUT "location,GC skew\n"; 
           # $i now equals the number of windows, so stop before it
           for ($j = 0; $j < $i; $j++){ 
                print OUT $location[$j], ",", $gcskew[$j], "\n"; 
           } 
           close(OUT); 
     }elsif ($output eq 'g' || $output eq 'show'){ 
           # Draw the graph, then let Messenger display it
           mkdir ('graph', 0777); 
           G::Tools::Graph::grapher(\@location, \@gcskew, -x=>'bp', 
                                              -y=>'GC skew', -title=>'GC skew',        
                                             -filename=>$filename); 
 
           msg_gimv("graph/$filename"); 
     } 
 
     return @gcskew; 
  } 
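Why routing every message through one API helps can be seen in a small stand-alone sketch. Everything in it is hypothetical: my_msg_send() and my_msg_error() are stand-ins written for this illustration, and the "shell" and "buffer" interfaces are assumed names, not the interfaces G::Messenger actually distinguishes. The point is only that the analysis code calls the same function whichever front end is active.

```perl
use strict;
use warnings;

# Text collected on behalf of a non-terminal front end (e.g. a GUI).
my %captured = (out => [], err => []);

# One handler table per (hypothetical) interface.
my %interface = (
    shell => {
        send  => sub { print STDOUT @_ },
        error => sub { print STDERR @_ },
    },
    buffer => {
        send  => sub { push @{ $captured{out} }, join('', @_) },
        error => sub { push @{ $captured{err} }, join('', @_) },
    },
);

my $current = 'shell';    # which interface the user is accessing

# Callers use these and never print directly.
sub my_msg_send  { $interface{$current}{send}->(@_) }
sub my_msg_error { $interface{$current}{error}->(@_) }

my_msg_send("GC skew finished\n");    # written to STDOUT

$current = 'buffer';
my_msg_send("GC skew finished\n");    # collected instead of printed
print scalar(@{ $captured{out} }), " message(s) buffered\n";
```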

STEP 5 - To the higher level

We have demonstrated how to turn the GC skew subroutine into a standard function, but the standard function gcskew() in G-language GAE is much more refined than this demonstration, with additional AT skew support and other technical tuning. Check the G-language GAE source code for more detail. Additional information on function standardization is also available in the manuals.

We welcome any users to join the G-language GAE project. User-made standard functions can be shared simply by sending an e-mail to glang-devel@lists.sourceforge.jp with the name of the function as the subject and the source code as the body. Enjoy!

gcskewodysseyenglish.txt · Last modified: 2014/01/18 07:44 (external edit)