lib G
Summary
G - G-language Genome Analysis Environment core interface module
Package variables
Privates (from "my" definitions)
@return;
Included modules
G::DB::BDB
G::DB::SDB
G::DynamicLoader
G::Messenger
G::Seq::Align
G::Seq::AminoAcid
G::Seq::Codon
G::Seq::Consensus
G::Seq::GCskew
G::Seq::GenomeMap
G::Seq::Operon
G::Seq::OverLapping
G::Seq::PatSearch
G::Seq::Primitive
G::Seq::Tandem
G::Seq::Util
G::Shell::EUtils
G::Shell::Help
G::Tools::Alignment
G::Tools::COGs
G::Tools::DotE
G::Tools::EMBOSS
G::Tools::GMap
G::Tools::GOA
G::Tools::GPAC
G::Tools::Graph
G::Tools::Statistics
G::Tools::WebServices
Rcmd
SubOpt
Inherit
Exporter G::IO
Synopsis
 use G;                          # Imports G-language GAE module 
   
 $gb = new G("ecoli.gbk");       # Creates a G instance as $gb 
 $gb = load("ecoli.gbk");        # Same as the line above.
                                 # Both read in ecoli.gbk, including its
                                 # annotation and sequence information.
                                 # See DESCRIPTION for details
   
 $gb->seq_info();                # Prints the basic sequence information.

 find_ori_ter($gb);              # Give $gb as the first argument to 
                                 # most of the analysis functions
Description
 The G-language GAE supports most sequence database formats.

 Stored annotation information:

 LOCUS  
         $gb->{LOCUS}->{id}              -accession number 
         $gb->{LOCUS}->{length}          -length of sequence  
         $gb->{LOCUS}->{nucleotide}      -type of sequence ex. DNA, RNA  
         $gb->{LOCUS}->{circular}        -1 when the genome is circular.
                                          otherwise 0
         $gb->{LOCUS}->{type}            -type of species ex. BCT, CON  
         $gb->{LOCUS}->{date}            -date of accession 

 HEADER  
         $gb->{HEADER}  
         $gb->{DEFINITION}
         $gb->{ACCESSION}
         $gb->{SOURCE}
         $gb->{ORGANISM}

         $gb->{TAXONOMY}->{all}          -same as $gb->{TAXONOMY}->{1}
         $gb->{TAXONOMY}->{domain}       -same as $gb->{TAXONOMY}->{2}
         $gb->{TAXONOMY}->{phylum}       -same as $gb->{TAXONOMY}->{3}
         $gb->{TAXONOMY}->{class}        -same as $gb->{TAXONOMY}->{4}
         $gb->{TAXONOMY}->{order}        -same as $gb->{TAXONOMY}->{5}
         $gb->{TAXONOMY}->{family}       -same as $gb->{TAXONOMY}->{6}
         $gb->{TAXONOMY}->{genus}
         $gb->{TAXONOMY}->{species}

 COMMENT  
         $gb->{COMMENT}  

 FEATURE  
         Each FEATURE is numbered (e.g. FEATURE1 .. FEATURE1172) and is a 
         hash structure that contains all the keys of GenBank.   
         In other words, in most cases, FEATURE$i's hash contains at least 
         the information listed below: 
         $gb->{FEATURE$i}->{start}  
         $gb->{FEATURE$i}->{end}  
         $gb->{FEATURE$i}->{direction}
         $gb->{FEATURE$i}->{join}
         $gb->{FEATURE$i}->{note}  
         $gb->{FEATURE$i}->{type}        -CDS,gene,RNA,etc.
         $gb->{FEATURE$i}->{feature}     -same as $i

         To analyze each FEATURE, write: 

         foreach my $feature ($gb->feature()){
               print $gb->{$feature}->{type}, "\n";
         }  

         In the same manner, to analyze all CDS, write:  
 
         foreach my $cds ($gb->cds()){
               print $gb->{$cds}->{gene}, "\n";
         }

         Feature or gene information can also be accessed with CDS numbers:
         $gb->{CDS$i}->{start}

         or with locus_tags or gene names (for CDS, tRNA, and rRNA)
         $gb->{thrL}->{start}
         $gb->{b0001}->{start}

 BASE COUNT  
         $gb->{BASE_COUNT}  

 SEQ  
         $gb->{SEQ}              -sequence data following "ORIGIN" 

         or
 
         $gb->seq()
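
The key layout described above can be tried out without a genome file. The following is a minimal sketch that uses a plain hash as a stand-in for a loaded G instance. The helpers feature_keys() and cds_keys() are hypothetical names written for this illustration (they are not the $gb->feature() and $gb->cds() methods), and every value in the hash is made up.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a loaded G instance: a plain hash that follows
# the key layout described above. All values are made up for illustration.
my $gb = {
    LOCUS    => { id => 'XX000001', length => 30, circular => 1 },
    FEATURE1 => { start => 1,  end => 12, direction => 'direct',     type => 'gene', feature => 1 },
    FEATURE2 => { start => 1,  end => 12, direction => 'direct',     type => 'CDS',  feature => 2 },
    FEATURE3 => { start => 15, end => 29, direction => 'complement', type => 'CDS',  feature => 3 },
};

# Return the FEATURE keys in numeric order (FEATURE1, FEATURE2, ...).
sub feature_keys {
    my ($g) = @_;
    my @numbers = sort { $a <=> $b }
                  map  { /^FEATURE(\d+)$/ ? $1 : () } keys %$g;
    return map { "FEATURE$_" } @numbers;
}

# Return only the keys of features whose type is CDS.
sub cds_keys {
    my ($g) = @_;
    return grep { $g->{$_}{type} eq 'CDS' } feature_keys($g);
}

# Print one tab-separated line per CDS: key, coordinates, strand.
for my $key (cds_keys($gb)) {
    my $f = $gb->{$key};
    printf "%s\t%d..%d\t%s\n", $key, $f->{start}, $f->{end}, $f->{direction};
}
```

With a real instance the same loop body should work on the keys returned by $gb->cds(), since each key indexes a hash with the fields listed under FEATURE.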
Methods
load
method_list
opt_list (no description)
Methods description
load
     Name: load   -   load genome databases

         This function is used to load genome databases into memory.
         The first argument is the filename of the database. The default format is
         GenBank. The database format is guessed from the file extension
         (eg. .gbk => GenBank, .fasta => FASTA, .embl => EMBL).
         Flat files can be gzipped. If the filename ends with ".gz",
         load() automatically handles it as a compressed file.
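
The extension rules above can be sketched as a small lookup. This is an illustration only, not the code load() runs: guess_format() is a hypothetical helper, it covers just the three extensions named in the text, and the fallback to GenBank for unknown extensions is an assumption drawn from the stated default.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of G: maps a file name to a format label
# following the rules stated above. Only the three extensions named in the
# documentation are covered; anything else falls back to GenBank.
sub guess_format {
    my ($filename) = @_;

    # A trailing ".gz" marks a compressed file; strip it before looking
    # at the real extension.
    my $compressed = ($filename =~ s/\.gz$//) ? 1 : 0;

    my %by_ext = (gbk => 'GenBank', fasta => 'FASTA', embl => 'EMBL');
    my ($ext) = $filename =~ /\.([^.\/]+)$/;
    my $format = (defined $ext && $by_ext{lc $ext}) || 'GenBank';

    return ($format, $compressed);
}

for my $file (qw(ecoli.gbk reads.fasta.gz entry.embl)) {
    my ($format, $compressed) = guess_format($file);
    print "$file: $format", ($compressed ? " (gzipped)" : ""), "\n";
}
```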

         There are also several sample bacterial genomes included in the system.
         $eco   = load("ecoli");    # Escherichia coli K12 MG1655 - NC_000913
         $bsub  = load("bsub");     # Bacillus subtilis           - NC_000964
         $mgen  = load("mgen");     # Mycoplasma genitalium       - NC_000908
         $cyano = load("cyano");    # Synechococcus sp.           - NC_005070
         $pyro  = load("pyro");     # Pyrococcus furiosus         - NC_003413
         $bbur  = load("bbur");     # Borrelia burgdorferi B31    - NC_001318
         $plasF = load("plasmidf"); # Plasmid F                   - NC_002483

         Data can be automatically downloaded from public databases using
         Uniform Sequence Address (USA) keys.
         http://emboss.sourceforge.net/docs/themes/UniformSequenceAddress.html

         Currently supported database keys are:
           swiss, genbank, genpept, embl, refseq

         eg.
           $gb = load("embl:xlrhodop");
           $gb = load("genbank:AY063336");
           $gb = load("swiss:ROA1_HUMAN");
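
A USA key has the form "database:entry". The sketch below shows one way such an argument could be told apart from a plain filename. parse_usa() is a hypothetical helper written for this illustration, and matching only lowercase database keys is an assumption, since the text does not say how load() treats case.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of G: splits a "database:entry" argument
# into its two parts if the database key is one of those listed above.
# Returns an empty list for anything else (e.g. a plain filename).
sub parse_usa {
    my ($arg) = @_;
    my %supported = map { $_ => 1 } qw(swiss genbank genpept embl refseq);

    if ($arg =~ /^([a-z]+):(\S+)$/ && $supported{$1}) {
        return ($1, $2);
    }
    return;
}

for my $arg ("embl:xlrhodop", "genbank:AY063336", "ecoli.gbk") {
    my ($db, $entry) = parse_usa($arg);
    if (defined $db) {
        print "$arg: database=$db entry=$entry\n";
    }
    else {
        print "$arg: not a USA key\n";
    }
}
```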
         Subsequent arguments work as options. Multiple options can be given
         in any order.

         'no msg'
             suppresses all STDOUT messages printed when loading a database,
             including the copyright info and sequence statistics.
         'no cache'
             suppresses the use of database caching. By default, databases are
             cached for optimized performance. (since v.1.6.4)
         'force cache'
             rebuilds the database cache.
         'multiple locus'
             merges multiple loci in the database and loads the information
             as a G-language instance.
         'bioperl'
             creates a G instance from a bioperl object.
             eg.
               $bp = $bp->next_seq();        # bioperl
               $gb = load($bp, "bioperl");   # G
         'longest ORF annotation'
             predicts genes with the longest ORF algorithm (longest frame from
             start codon to stop codon, with more than 17 amino acids) and
             annotates the sequence.
         'Fasta'
             loads a Fasta format database.
         'Fastq'
             loads a FastQ format database.
         'EMBL'
             loads an EMBL format database.

         - the following options require a bioperl installation -

         'swiss'
             loads a swiss format database.
         'SCF'
             loads an SCF format database.
         'PIR'
             loads a PIR format database.
         'GCG'
             loads a GCG format database.
         'raw'
             loads a raw format database.
         'ace'
             loads an ace format database.
         'net GenBank'
             loads a GenBank format database from the NCBI database. With this
             option, the first argument passed to load() is the accession
             number of the database.
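
The idea behind the 'longest ORF annotation' option (longest frame from start codon to stop codon, with more than 17 amino acids) can be sketched as follows. This is not G's implementation. longest_orfs() is a hypothetical function, and it assumes ATG as the only start codon, TAA/TAG/TGA as stop codons, forward strand only, and an amino-acid count that excludes the stop codon.

```perl
use strict;
use warnings;

# Hypothetical sketch, not G's implementation. Assumptions: ATG is the only
# start codon, TAA/TAG/TGA are stops, only the forward strand is scanned,
# and the amino-acid count excludes the stop codon.
sub longest_orfs {
    my ($seq, $min_aa) = @_;
    $min_aa = 17 unless defined $min_aa;
    $seq = uc $seq;
    my %stop = map { $_ => 1 } qw(TAA TAG TGA);
    my @orfs;

    for my $frame (0 .. 2) {
        my $start;    # 0-based position of the first ATG since the last stop
        for (my $i = $frame; $i + 3 <= length $seq; $i += 3) {
            my $codon = substr($seq, $i, 3);
            if ($stop{$codon}) {
                if (defined $start) {
                    my $aa = ($i - $start) / 3;
                    # "more than 17 amino acids" is read as strictly greater
                    push @orfs, { start => $start + 1, end => $i + 3, aa => $aa }
                        if $aa > $min_aa;
                }
                undef $start;    # a stop closes the frame
            }
            elsif ($codon eq 'ATG' && !defined $start) {
                # keep only the most upstream start: that gives the longest ORF
                $start = $i;
            }
        }
    }

    my @sorted = sort { $a->{start} <=> $b->{start} } @orfs;
    return @sorted;
}

# Made-up test sequence: ATG, 18 alanine codons, then a stop codon.
my $seq = 'ATG' . ('GCT' x 18) . 'TAA';
for my $orf (longest_orfs($seq)) {
    print "ORF $orf->{start}..$orf->{end} ($orf->{aa} aa)\n";
}
```

Coordinates are reported 1-based and inclusive of the stop codon, which is a choice made for this sketch rather than something the documentation specifies.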
method_list
   Name: method_list   -   get the list of available G-language GAE functions

   Description:
         Returns an array of available method names. 
         When 1 is supplied as an argument, returns an array of API-related
         method names.

         eg. @methods = method_list();     # contains more than 100 analysis functions
             @APImethods = method_list(1); # contains around 50 API-related methods.

   REST: 
      http://rest.g-language.org/method_list
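
The two call forms can be pictured as a filter over one list of names, using a lookup hash of API-related names like the %system hash in the source listing. The sketch below is an assumption about that behaviour, not the real function. In particular, it assumes the no-argument form excludes the API-related names. list_methods() and @all_names are hypothetical, and example_analysis is a made-up name.

```perl
use strict;
use warnings;

# API-related names, copied from a few entries of the qw// list in the
# method_list source listing.
my %system = map { $_ => 1 } qw(p puts say opt_list msg_error db_load);

# Hypothetical registry of every exported name (the real list is far longer).
# example_analysis is a made-up name used only for this illustration.
my @all_names = qw(p say find_ori_ter opt_list example_analysis db_load);

# With a true argument return the API-related names, otherwise the rest.
sub list_methods {
    my ($api_only) = @_;
    return grep { $api_only ? $system{$_} : !$system{$_} } @all_names;
}

my @analysis = list_methods();
my @api      = list_methods(1);
print "analysis: @analysis\n";
print "API:      @api\n";
```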
Methods code
load
sub load {
    return new G(@_);
}
method_list
sub method_list {
    my $opt = shift;
    my %system;

    for my $name (qw/
p puts say readFile writeFile
opt_as_gb opt_default opt_get opt_list opt_val
msg_ask_interface msg_error msg_send msg_gimv msg_interface msg_percent msg_progress msg_set_gimv msg_system_console msg_term_console
sdb_exists sdb_load sdb_save _sdb_path _set_sdb_path
db_dbi db_exists db_load db_path db_save db_set_path
pass_send pass_get
/
){ $system{$name} ++;
}
opt_list
sub opt_list {
    my $sub = shift;

    SubOpt::opt_default();
    SubOpt::set_opt_list(1);
    eval("&{$sub}");
    SubOpt::set_opt_list(0);

    return opt_val();
}
General documentation
Supported methods of G-language Genome Analysis Environment
$gb = new G("genome file")
     Name: $gb = new G("genome file")   -   create a G instance

     see "help load" for more information.
$gb->next_locus()
   Name: $gb->next_locus()   -   read the next locus and update the G instance

   Description:
         Reads the next locus.
         the G instance is then updated.

         eg. 
           do{
  
           }while($gb->next_locus());
           #  Enables multiple loci analysis.        

   REST: 
      http://rest.g-language.org/NC_000913/next_locus
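
The do/while form in the example matters because the first locus is already loaded when the instance is created, so the loop body has to run once before next_locus() is called. The sketch below shows the pattern with a mock object. MockGenome is a hypothetical class written for this illustration, and returning 0 once the loci are exhausted is an assumption about what the real method returns.

```perl
use strict;
use warnings;

# Hypothetical mock, not part of G: holds a list of loci and exposes the
# current one under the LOCUS key, like a G instance does.
package MockGenome;

sub new {
    my ($class, @loci) = @_;
    my $self = { loci => [@loci], idx => 0, LOCUS => $loci[0] };
    return bless $self, $class;
}

# Advance to the next locus. Returns 1 on success and 0 when no locus is
# left (the return values are an assumption made for this sketch).
sub next_locus {
    my ($self) = @_;
    my $next = $self->{idx} + 1;
    return 0 if $next >= @{ $self->{loci} };
    $self->{idx}   = $next;
    $self->{LOCUS} = $self->{loci}[$next];
    return 1;
}

package main;

my $gb = MockGenome->new({ id => 'A' }, { id => 'B' }, { id => 'C' });

my @seen;
do {
    # Analysis of the current locus goes here.
    push @seen, $gb->{LOCUS}{id};
} while ($gb->next_locus());

print "visited: @seen\n";
```

A plain while loop with next_locus() as its condition would skip the first locus in this mock, which is why the body comes first.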
SEE ALSO
G::IO::Handler
AUTHOR
Kazuharu Arakawa, gaou@sfc.keio.ac.jp