Summary
G - G-language Genome Analysis Environment core interface module
Package variables
No package variables defined.
Included modules
Inherit
Synopsis
use G; # Imports G-language GAE module
$gb = new G("ecoli.gbk"); # Creates a G instance as $gb and,
# at the same time, reads in ecoli.gbk
# (both the annotation and the sequence
# information).
# See DESCRIPTION for details
$gb->seq_info(); # Prints the basic sequence information.
find_ori_ter($gb); # Give $gb as the first argument to
# most of the analysis functions
Description
The G-language GAE supports most sequence database formats.
Stored annotation information:
LOCUS
$gb->{LOCUS}->{id} -accession number
$gb->{LOCUS}->{length} -length of sequence
$gb->{LOCUS}->{nucleotide} -type of sequence ex. DNA, RNA
$gb->{LOCUS}->{circular} -1 when the genome is circular,
otherwise 0
$gb->{LOCUS}->{type} -type of species ex. BCT, CON
$gb->{LOCUS}->{date} -date of accession
HEADER
$gb->{HEADER}
COMMENT
$gb->{COMMENT}
FEATURE
Each FEATURE is numbered (FEATURE1 .. FEATURE1172) and is a
hash structure that contains all the keys of GenBank.
In other words, in most cases, the hash of FEATURE$i contains
at least the information listed below:
$gb->{FEATURE$i}->{start}
$gb->{FEATURE$i}->{end}
$gb->{FEATURE$i}->{direction}
$gb->{FEATURE$i}->{join}
$gb->{FEATURE$i}->{note}
$gb->{FEATURE$i}->{type} -CDS,gene,RNA,etc.
To analyze each FEATURE, write:
foreach my $feature ($gb->feature()){
print $gb->{$feature}->{type}, "\n";
}
Each CDS is stored in a similar manner, with the following keys:
$gb->{CDS$i}->{start}
$gb->{CDS$i}->{end}
$gb->{CDS$i}->{direction}
$gb->{CDS$i}->{join}
$gb->{CDS$i}->{feature} -number $n for $gb->{FEATURE$n}
where "CDS$i" = "FEATURE$n"
In the same manner, to analyze all CDS, write:
foreach my $cds ($gb->cds()){
print $gb->{$cds}->{gene}, "\n";
}
BASE COUNT
$gb->{BASE_COUNT}
SEQ
$gb->{SEQ} -sequence data following "ORIGIN"
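The layout described above can be mimicked with a plain Perl hash, which may help when reading the method descriptions that follow. This is only a sketch: the hash below is a hand-made stand-in for a loaded G instance, every value in it is invented, and the direction value 'direct' is an assumption, not something stated in this document.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a loaded G instance: a plain hash that follows
# the key layout described above. All values are made up for illustration.
my $gb = {
    LOCUS    => { id => 'XX000001', length => 12, nucleotide => 'DNA', circular => 0 },
    SEQ      => 'atggcttaatga',
    # 'direct' is an assumed value for {direction}
    FEATURE1 => { type => 'gene', start => 1, end => 9, direction => 'direct' },
    FEATURE2 => { type => 'CDS',  start => 1, end => 9, direction => 'direct', gene => 'demo' },
    CDS1     => { start => 1, end => 9, direction => 'direct', feature => 2 },
};

# Collect the FEATURE keys in numeric order.
my @features = sort { ($a =~ /(\d+)/)[0] <=> ($b =~ /(\d+)/)[0] }
               grep { /^FEATURE\d+$/ } keys %$gb;

print "$_: $gb->{$_}{type}\n" for @features;

# CDS$i points back at its FEATURE$n through the {feature} number.
my $feature_of_cds1 = 'FEATURE' . $gb->{CDS1}{feature};
print "CDS1 is stored as $feature_of_cds1\n";
```

With a real G instance the same keys would be read from $gb directly; the sorting step is only needed here because a plain hash has no feature() method.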
Methods description
None available.
Methods code
No methods available.
General documentation
Supported methods of G-language Genome Analysis Environment
Name: $gb = new G("genome file") - create a G instance
Creates a G instance.
The first argument is the filename of the database. The default format
is GenBank. The database format is guessed from the filename extension.
(eg. .gbk => GenBank, .fasta => FASTA, .embl => EMBL)
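Extension-based format guessing of this kind can be sketched as a small lookup. The helper name guess_format and the fallback to GenBank for unknown extensions are assumptions made for illustration; only the three extension mappings and the GenBank default come from the text above, and this is not the module's own code.

```perl
use strict;
use warnings;

# Hypothetical lookup table; only these three mappings are documented above.
my %format_for = (gbk => 'GenBank', fasta => 'FASTA', embl => 'EMBL');

# Hypothetical helper, not part of G.
sub guess_format {
    my ($filename) = @_;
    my ($ext) = $filename =~ /\.([^.\/]+)$/;     # text after the last dot
    return 'GenBank' unless defined $ext;        # no extension: default format
    $ext = lc $ext;
    # Assumption: unknown extensions also fall back to the default.
    return exists $format_for{$ext} ? $format_for{$ext} : 'GenBank';
}

print guess_format('my_genome.embl'), "\n";
```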
There are also several sample bacterial genomes included in the system.
$eco = new G("ecoli"); # Escherichia coli K12 MG1655 - NC_000913
$bsub = new G("bsub"); # Bacillus subtilis - NC_000964
$mgen = new G("mgen"); # Mycoplasma genitalium - NC_000908
$cyano = new G("cyano"); # Synechococcus sp. - NC_005070
$pyro = new G("pyro"); # Pyrococcus furiosus - NC_003413
The second argument specifies detailed actions.
'no msg' suppresses all STDOUT messages printed
when loading a database, including the
copyright info and sequence statistics.
'no cache' suppresses the use of database caching.
By default, databases are cached for
optimized performance. (since v.1.6.4)
'force cache' rebuilds database cache.
'multiple locus' this option merges multiple loci in the
database and loads the information
as a G-language instance.
'bioperl' this option creates a G instance from
a bioperl object.
eg. $bp = $bp->next_seq(); # bioperl
$gb = new G($bp, "bioperl"); # G
'longest ORF annotation' this option predicts genes with longest ORF
algorithm (longest frame from start codon
to stop codon, with more than 17 amino
acids) and annotates the sequence.
'glimmer annotation' this option predicts genes using glimmer2,
a gene prediction software for microbial
genomes available from TIGR.
http://www.tigr.org/softlab/
Local installation of glimmer2 and setting
of the PATH environment variable are required.
- following options require bioperl installation -
'Fasta' this option loads a Fasta format database.
'EMBL' this option loads an EMBL format database.
'swiss' this option loads a swiss format database.
'SCF' this option loads an SCF format database.
'PIR' this option loads a PIR format database.
'GCG' this option loads a GCG format database.
'raw' this option loads a raw format database.
'ace' this option loads an ace format database.
'net GenBank' this option loads a GenBank format database from
NCBI database. With this option, the first value to
pass to new() function will be the accession
number of the database.
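The 'longest ORF annotation' option above describes its rule only briefly (longest frame from start codon to stop codon, more than 17 amino acids). A minimal sketch of that idea follows. It is not the module's implementation: the helper name longest_orfs is hypothetical, only the forward strand is scanned, and the choice of atg as the only start codon and taa/tag/tga as stop codons is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: report, for each stop codon, the longest in-frame ORF
# ending there, i.e. the one starting at the first atg since the previous stop.
# Coordinates in the result are 1-based and inclusive (GenBank style).
sub longest_orfs {
    my ($seq, $min_aa) = @_;
    $seq = lc $seq;
    my @orfs;
    for my $frame (0 .. 2) {
        my $start;                                # 0-based index of first atg
        for (my $i = $frame; $i + 3 <= length($seq); $i += 3) {
            my $codon = substr($seq, $i, 3);
            if ($codon eq 'atg') {
                $start = $i unless defined $start;
            }
            elsif ($codon =~ /^(?:taa|tag|tga)$/ && defined $start) {
                my $aa = ($i - $start) / 3;       # residues, stop codon excluded
                push @orfs, { start => $start + 1, end => $i + 3, aa => $aa }
                    if $aa > $min_aa;
                undef $start;                     # look for the next ORF
            }
        }
    }
    return @orfs;
}

my $demo = 'cc' . 'atg' . 'gct' x 3 . 'atg' . 'gct' . 'taa' . 'cc';
my @found = longest_orfs($demo, 3);               # the option above uses 17
print "$_->{start}..$_->{end} ($_->{aa} aa)\n" for @found;
```

The threshold is passed as a parameter so that the short demo sequence produces a hit; with the documented cut-off of 17 amino acids the same sequence yields nothing.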
Name: $gb->output() - output the G instance data to file
Description:
Given a filename and an option, outputs the G-language data object
to the specified file in a flat-file database of a given format.
The options are the same as those of new(). Default format is 'GenBank'.
eg. $gb->output("my_genome.embl", "EMBL");
$gb->output("my_genome.gbk"); # with GenBank you can omit the option.
Name: complement - get the complementary nucleotide sequence
Description:
Given a sequence, returns its complement.
eg. complement('atgc'); # returns 'gcat'
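Note that the documented result ('atgc' gives 'gcat') is the reverse complement, not a base-by-base complement in the original order. A pure-Perl sketch that reproduces this result is shown below; the helper name revcomp is hypothetical and the handling of only a, c, g, and t is an assumption, so this is not necessarily how the module does it.

```perl
use strict;
use warnings;

# Hypothetical helper: reverse the string, then swap each base for its pair.
sub revcomp {
    my ($seq) = @_;
    my $rc = reverse $seq;            # scalar context: reverses the string
    $rc =~ tr/acgtACGT/tgcaTGCA/;     # other characters are left untouched
    return $rc;
}

print revcomp('atgc'), "\n";
```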
Name: translate - translate a nucleotide sequence to amino acid sequence
Description:
Given a sequence, returns its translated sequence.
Regular codon table is used.
eg. translate('ctggtg'); # returns 'LV'
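For readers who want to see what translation with the standard codon table involves, here is a small self-contained sketch. The helper name translate_sketch is hypothetical, and two choices are assumptions of this sketch, not documented behaviour of translate(): stop codons are written as '*' and unrecognised codons as 'X'.

```perl
use strict;
use warnings;

# Standard genetic code, with codons enumerated in t, c, a, g order.
my @bases = qw(t c a g);
my $amino = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my %codon_table;
my $k = 0;
for my $x (@bases) {
    for my $y (@bases) {
        for my $z (@bases) {
            $codon_table{"$x$y$z"} = substr($amino, $k++, 1);
        }
    }
}

# Hypothetical helper: translate frame 1, ignoring a trailing partial codon.
sub translate_sketch {
    my ($seq) = @_;
    $seq = lc $seq;
    my $protein = '';
    for (my $i = 0; $i + 3 <= length($seq); $i += 3) {
        my $codon = substr($seq, $i, 3);
        $protein .= exists $codon_table{$codon} ? $codon_table{$codon} : 'X';
    }
    return $protein;
}

print translate_sketch('ctggtg'), "\n";
```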
Name: $gb->seq() - get the sequence data from G instance
Description:
Returns the entire sequence. Same as $gb->{SEQ};
Name: $gb->seq_info() - display basic statistics about the data
Description:
Prints the basic information of the genome to STDOUT.
Name: $gb->getseq() - get nucleotide sequence of the given positions (Perl coordinates)
Description:
Given the start and end positions (starting from 0 as in Perl),
returns the sequence specified.
eg. $gb->getseq(1,3); # returns the 2nd, 3rd, and 4th nucleotides.
Name: $gb->get_gbkseq() - get nucleotide sequence of the given positions (GenBank coordinates)
Description:
Given the start and end positions (starting from 1 as in
Genbank), returns the sequence specified.
eg. $gb->get_gbkseq(1,3); # returns the 1st, 2nd, and 3rd nucleotides.
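The difference between the two coordinate systems is easy to get wrong, so the following sketch reproduces both conventions with substr on a toy sequence. The helper names perl_slice and gbk_slice are hypothetical; both treat the end position as inclusive, which is what the two examples above imply.

```perl
use strict;
use warnings;

my $seq = 'acgtacgt';    # toy sequence, invented for illustration

# Hypothetical helper: 0-based, inclusive positions (as in getseq).
sub perl_slice {
    my ($s, $start, $end) = @_;
    return substr($s, $start, $end - $start + 1);
}

# Hypothetical helper: 1-based, inclusive positions (as in get_gbkseq).
sub gbk_slice {
    my ($s, $start, $end) = @_;
    return substr($s, $start - 1, $end - $start + 1);
}

print perl_slice($seq, 1, 3), "\n";   # 2nd, 3rd, and 4th nucleotides
print gbk_slice($seq, 1, 3), "\n";    # 1st, 2nd, and 3rd nucleotides
```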
Name: $gb->get_cdsseq() - get nucleotide sequence of the given CDS
Description:
Given a CDS ID, returns the CDS sequence.
'complement' is properly parsed.
eg. $gb->get_cdsseq('CDS1'); # returns the 'CDS1' sequence.
Name: $gb->get_geneseq() - get nucleotide sequence of the given gene
Description:
Given a CDS ID, returns the CDS sequence, or the exon sequence
if introns are present.
'complement' is properly parsed, and introns are spliced out.
eg. $gb->get_geneseq('CDS1'); # returns the 'CDS1' sequence or exon.
Name: $gb->feature() - get a list of feature IDs
Description:
Returns the array of all feature IDs.
Features are ignored when $gb->{$feature}->{on} is 0.
eg.
foreach ($gb->feature()){
print $gb->get_cdsseq($_), "\n";
}
# prints all feature sequences.
Optionally, a feature type can be supplied to return only the
specified features.
eg. $gb->feature("tRNA"); # returns feature IDs only for tRNAs
Option of "all" always returns all features regardless of the
value of $gb->{$feature}->{on}.
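The filtering rules described for feature() (skip features whose {on} is 0, optionally restrict by type, or return everything with "all") can be sketched on a mock hash as follows. The helper list_features and the mock data are hypothetical, and treating a missing {on} value as "off" is an assumption of this sketch.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a G instance; values are invented.
my $gb = {
    FEATURE1 => { type => 'CDS',  on => 1 },
    FEATURE2 => { type => 'tRNA', on => 0 },
    FEATURE3 => { type => 'tRNA', on => 1 },
};

# Hypothetical helper, not G's own code.
sub list_features {
    my ($data, $opt) = @_;
    $opt = '' unless defined $opt;
    # 'FEATURE' is 7 characters long, so substr(..., 7) is the number.
    my @ids = sort { substr($a, 7) <=> substr($b, 7) }
              grep { /^FEATURE\d+$/ } keys %$data;
    return @ids if $opt eq 'all';                  # ignore the {on} flag
    @ids = grep { $data->{$_}{on} } @ids;          # skip switched-off features
    @ids = grep { $data->{$_}{type} eq $opt } @ids if $opt ne '';
    return @ids;
}

print join(' ', list_features($gb)), "\n";
print join(' ', list_features($gb, 'tRNA')), "\n";
print join(' ', list_features($gb, 'all')), "\n";
```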
Name: $gb->cds() - get a list of CDS IDs
Description:
Returns the array of all feature IDs of CDS.
Features are ignored when $gb->{FEATURE$i}->{on} OR
$gb->{CDS$i}->{on} is 0.
!CAUTION! the object name is actually the FEATURE ID,
to enable access to all feature values. However, most of the
time you do not need to be aware of this difference.
eg.
foreach ($gb->cds()){
print $gb->get_geneseq($_), "\n";
}
# prints all gene sequences.
Option of "all" always returns all features regardless of the
value of $gb->{$feature}->{on}.
Name: $gb->tRNA() - get a list of feature IDs of tRNAs
Description:
Returns the array of all feature IDs of tRNAs.
Name: $gb->rRNA() - get a list of feature IDs of rRNAs
Description:
Returns the array of all feature IDs of rRNAs.
Name: $gb->intergenic() - get a list of IDs of intergenic regions
Description:
Returns the array of all IDs of intergenic regions.
Name: $gb->gene() - get a list of feature IDs of genes
Description:
Returns the array of all feature IDs of genes.
Name: $gb->next_feature() - get the next feature ID
Description:
Given a feature ID, returns the ID of the next feature.
Second argument can be used to specify the type of the
next feature.
eg. $gb->next_feature('FEATURE1234'); # returns 'FEATURE1235'
$gb->next_feature('FEATURE1234', 'tRNA');
# returns next feature ID whose type is 'tRNA'
Name: $gb->next_cds() - get the feature ID of next CDS
Description:
Given a feature ID, returns the ID of the next CDS.
This is the same as $gb->next_feature($featureID, 'CDS');
Name: $gb->previous_feature() - get the previous feature ID
Description:
Given a feature ID, returns the ID of the previous feature.
Second argument can be used to specify the type of the
previous feature.
eg. $gb->previous_feature('FEATURE1234'); # returns 'FEATURE1233'
$gb->previous_feature('FEATURE1234', 'tRNA');
# returns previous feature ID whose type is 'tRNA'
Name: $gb->previous_cds() - get the feature ID of previous CDS
Description:
Given a feature ID, returns the ID of the previous CDS.
This is the same as $gb->previous_feature($featureID, 'CDS');
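Stepping forwards or backwards through the numbered feature IDs, with an optional type filter, can be sketched as below. The helper step_feature and the mock data are hypothetical. The sketch assumes that feature numbers are contiguous and returns undef when it runs off either end; how the real methods behave at the ends is not stated in this document.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a G instance; values are invented.
my $gb = {
    FEATURE1 => { type => 'gene' },
    FEATURE2 => { type => 'CDS'  },
    FEATURE3 => { type => 'tRNA' },
    FEATURE4 => { type => 'CDS'  },
};

# Hypothetical helper: $step is +1 for "next" and -1 for "previous".
sub step_feature {
    my ($data, $id, $step, $type) = @_;
    my ($n) = $id =~ /^FEATURE(\d+)$/ or return undef;
    for (my $i = $n + $step; exists $data->{"FEATURE$i"}; $i += $step) {
        return "FEATURE$i"
            if !defined $type || $data->{"FEATURE$i"}{type} eq $type;
    }
    return undef;    # no such feature in that direction
}

print step_feature($gb, 'FEATURE1', 1), "\n";
print step_feature($gb, 'FEATURE2', 1, 'CDS'), "\n";
print step_feature($gb, 'FEATURE4', -1, 'tRNA'), "\n";
```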
Name: $gb->startcodon() - get the start codon of the given CDS
Description:
Given a CDS ID, returns the start codon.
eg. $gb->startcodon("FEATURE$i"); # returns 'atg'
Name: $gb->stopcodon() - get the stop codon of the given CDS
Description:
Given a CDS ID, returns the stop codon.
eg. $gb->stopcodon("FEATURE$i"); # returns 'tag'
Name: $gb->before_startcodon() - get the upstream sequence of the given CDS
Description:
Given a CDS ID and length, returns the sequence upstream of
start codon.
eg. $gb->before_startcodon('CDS1', 100);
# returns 100 bp sequence upstream of the start codon of 'CDS1'.
Name: $gb->after_startcodon() - get the sequence downstream of start codon of the given CDS
Description:
Given a CDS ID and length, returns the sequence downstream of
start codon.
eg. $gb->after_startcodon('CDS1', 100);
# returns 100 bp sequence downstream of the start codon of 'CDS1'.
Name: $gb->before_stopcodon() - get the sequence upstream of stop codon of the given CDS
Description:
Given a CDS ID and length, returns the sequence upstream of
stop codon.
eg. $gb->before_stopcodon('CDS1', 100);
# returns 100 bp sequence upstream of the stop codon of 'CDS1'.
Name: $gb->after_stopcodon() - get the downstream sequence of the given CDS
Description:
Given a CDS ID and length, returns the sequence downstream of
stop codon.
eg. $gb->after_stopcodon('CDS1', 100);
# returns 100 bp sequence downstream of the stop codon of 'CDS1'.
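The four flanking regions described above can be pictured on a toy sequence. The sketch below is not the module's code: the sequence and coordinates are invented, only a CDS on the direct strand is handled, and it assumes that the start and stop codons themselves are excluded from the returned regions. Boundary cases (a region running past either end of the sequence) are not handled.

```perl
use strict;
use warnings;

# Toy sequence: 5 bp upstream, a 12 bp CDS (atg aaa ggg tag), 5 bp downstream.
my $seq = 'ccccc' . 'atgaaagggtag' . 'ttttt';
my ($start, $end) = (6, 17);          # 1-based, inclusive CDS coordinates

# Hypothetical helper: return the four flanking regions of length $len.
sub flanks {
    my ($s, $cds_start, $cds_end, $len) = @_;
    return {
        before_start => substr($s, $cds_start - 1 - $len, $len),
        after_start  => substr($s, $cds_start - 1 + 3,    $len),
        before_stop  => substr($s, $cds_end - 3 - $len,   $len),
        after_stop   => substr($s, $cds_end,              $len),
    };
}

my $f = flanks($seq, $start, $end, 3);
print "$_: $f->{$_}\n" for sort keys %$f;
```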
Name: $gb->get_exon() - get a list of exon sequences of the given CDS
Description:
Given a CDS ID, returns the exon sequence.
'complement' is properly parsed, and introns are spliced out.
eg. $gb->get_exon('CDS1'); # returns the 'CDS1' exon.
Name: $gb->get_intron() - get a list of intron sequences of the given CDS
Description:
Given a CDS ID, returns the intron sequences as array of
sequences.
eg. $gb->get_intron('CDS1');
# returns ($1st_intron, $2nd_intron,..)
Name: $gb->pos2feature() - get a feature ID from position
Description:
Given a GenBank position (sequence starting from position 1)
returns the feature ID (ex. FEATURE123) of the feature at
the given position. If multiple features exist for the given
position, the first feature to appear is returned. Returns
NULL if no feature exists.
Name: $gb->pos2gene() - get a feature ID of CDS from position
Description:
Given a GenBank position (sequence starting from position 1)
returns the feature ID (ex. FEATURE123) of the gene at
the given position. If multiple genes exist for the given
position, the first gene to appear is returned. Returns
NULL if no gene exists.
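The two position lookups above amount to finding the first feature whose start..end range contains the position. A minimal sketch on mock data follows. The helper feature_at is hypothetical, "first to appear" is taken to mean the lowest feature number, and the sketch returns undef where the text above says NULL; all three are assumptions.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a G instance; coordinates are invented.
my $gb = {
    FEATURE1 => { type => 'gene', start => 10, end => 50  },
    FEATURE2 => { type => 'CDS',  start => 10, end => 50  },
    FEATURE3 => { type => 'tRNA', start => 80, end => 120 },
};

# Hypothetical helper: linear scan in feature-number order.
sub feature_at {
    my ($data, $pos, $type) = @_;
    my @ids = sort { substr($a, 7) <=> substr($b, 7) }
              grep { /^FEATURE\d+$/ } keys %$data;
    for my $id (@ids) {
        my $f = $data->{$id};
        next if defined $type && $f->{type} ne $type;
        return $id if $f->{start} <= $pos && $pos <= $f->{end};
    }
    return undef;    # nothing covers this position
}

print feature_at($gb, 20), "\n";
print feature_at($gb, 20, 'CDS'), "\n";
```

A linear scan is enough for illustration; for repeated lookups on a large genome an index sorted by start position would be a more sensible design.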
Name: $gb->gene2id() - get a feature ID from canonical gene name
Description:
Given a GenBank gene name, returns the feature ID (ex. FEATURE123).
Returns NULL if no gene exists.
Name: $gb->next_locus() - read the next locus and update the G instance
Description:
Reads the next locus; the G instance is then updated.
eg.
do{
# analyze the current locus held in $gb here
}while($gb->next_locus());
# Enables multiple loci analysis.
Name: $gb->clone() - create a copy of the G instance
Description:
Returns cloned G instance, which is a new G instance with
identical data.
Name: $gb->del_key() - delete a data object from G instance
Description:
Given an object, deletes it from the G instance structure.
eg. $gb->del_key('FEATURE1'); # deletes 'FEATURE1' hash