G::Seq

Codon

Included libraries Package variables General documentation Methods

Package variables top
Globals (from use vars definitions)
@EXPORT
$VERSION
@EXPORT_OK
Privates (from my definitions)
$tnc; ### total number of codons in the gene my %tbox;
$Sc; ### species richness of codon (i.e. number of distinct codons) foreach (keys %count){ if($del_key && $_ = ~ /$del_key/){ delete $count{$_}; next; } if($_ =~ /[A-Z]$/){ $tna += $count{$_}; } if($_ =~ /[A-Z][a-z]{3}/){ $tnc += $count{$_}
%rank;
%max;
%AA3LTTR = ('A'=>'Ala', 'C'=>'Cys', 'D'=>'Asp', 'E'=>'Glu', 'F'=>'Phe', 'G'=>'Gly', 'H'=>'His', 'I'=>'Ile', 'K'=>'Lys', 'L'=>'Leu', 'M'=>'Met', 'N'=>'Asn', 'P'=>'Pro', 'Q'=>'Gln', 'R'=>'Arg', 'S'=>'Ser', 'T'=>'Thr', 'V'=>'Val', 'W'=>'Trp', 'Y'=>'Tyr', '/'=>' - ', 'U'=>'Sec')
$codon = uc(substr($_, 1, 3))
$value;
$gene = ($id =~ /FEATURE/) ? $gb->{$id}->{gene} : "{$id}"
($me, $n_lt, $n_eq);
@keysAll = sort keys %{$gb->{All}->{$type}}
%Parkeretal = ('A'=> 2.1, 'C'=> 1.4, 'D'=>10, 'E'=> 7.8, 'F'=>-9.2, 'G'=> 5.7, 'H'=> 2.1, 'I'=>-8.0, 'K'=> 5.7, 'L'=>-9.2, 'M'=>-4.2, 'N'=> 7.0, 'P'=> 2.1, 'Q'=> 6.0, 'R'=> 4.2, 'S'=> 6.5, 'T'=> 5.2, 'V'=>-3.7, 'W'=>-10, 'Y'=>-1.9)
%tbox; ### total number of codons in each amino acid box my $Sc;
%HoppWoods = ('A'=>-0.5, 'C'=>-1.0, 'D'=> 3.0, 'E'=> 3.0, 'F'=>-2.5, 'G'=> 0.0, 'H'=>-0.5, 'I'=>-1.8, 'K'=> 3.0, 'L'=>-1.8, 'M'=>-1.3, 'N'=> 0.2, 'P'=> 0.0, 'Q'=> 0.2, 'R'=> 3.0, 'S'=> 0.3, 'T'=>-0.4, 'V'=>-1.5, 'W'=>-3.4, 'Y'=>-2.3)
%degeneracy;
$total;
%Table = ('gca'=>'A','gcc'=>'A','gcg'=>'A','gct'=>'A', 'tgc'=>'C','tgt'=>'C', 'gac'=>'D','gat'=>'D', 'gaa'=>'E','gag'=>'E', 'ttc'=>'F','ttt'=>'F', 'gga'=>'G','ggc'=>'G','ggg'=>'G','ggt'=>'G', 'cac'=>'H','cat'=>'H', 'ata'=>'I','atc'=>'I','att'=>'I', 'aaa'=>'K','aag'=>'K', 'cta'=>'L','ctc'=>'L','ctg'=>'L','ctt'=>'L','tta'=>'L','ttg'=>'L', 'atg'=>'M', 'aac'=>'N','aat'=>'N', 'cca'=>'P','ccc'=>'P','ccg'=>'P','cct'=>'P', 'caa'=>'Q','cag'=>'Q', 'aga'=>'R','agg'=>'R','cga'=>'R','cgc'=>'R','cgg'=>'R','cgt'=>'R', 'agc'=>'S','agt'=>'S','tca'=>'S','tcc'=>'S','tcg'=>'S','tct'=>'S', 'aca'=>'T','acc'=>'T','acg'=>'T','act'=>'T', 'gta'=>'V','gtc'=>'V','gtg'=>'V','gtt'=>'V', 'tgg'=>'W', 'tac'=>'Y','tat'=>'Y', 'taa'=>'/','tag'=>'/','tga'=>'/')
%usage = %count
%KyteDoolittle = ('A'=> 1.8, 'C'=> 2.5, 'D'=>-3.5, 'E'=>-3.5, 'F'=> 2.8, 'G'=>-0.4, 'H'=>-3.2, 'I'=> 4.5, 'K'=>-3.9, 'L'=> 3.8, 'M'=> 1.9, 'N'=>-3.5, 'P'=>-1.6, 'Q'=>-3.5, 'R'=>-4.5, 'S'=>-0.8, 'T'=>-0.7, 'V'=> 4.2, 'W'=>-0.9, 'Y'=>-1.3)
$tna; ### total number of amino acids my $tnc;
Included modulestop
G::Messenger
G::Seq::Util
GD(1)
GD(2)
SubOpt
strict
Inherit top
AutoLoader Exporter
Synopsistop
No synopsis!
Descriptiontop
No description!
Methodstop
_distance_cuDescriptionCode
codon_compilerDescriptionCode
newNo descriptionCode

Methods description

_distance_cucodetopprevnext
 _distance_cu ver.20030401
Author: Haruo Suzuki (haruo@g-language.org)
Usage: scalar = &_distance_cu(pointer HASH_1, pointer HASH_2, metric);
Options:
-usage1 pointer HASH of vectors of one group of genes
-usage2 pointer HASH of vectors of a second group of genes
-metric distance metric (default:'manhattan')
'1-r': one minus correlation coefficient between the two vectors
'euclidean': square distance between the two vectors
'manhattan': absolute distance between the two vectors
'codon': normalized absolute distance for codon usage analysis
(running from 0 to 2)
Description:
This program measures distance between genes based on the
differences in amino acid and codon usage
codon_compilercodetopprevnext
    
codon_compiler ver.20030401
Author: Haruo Suzuki (haruo@g-language.org)
Usage: pointer HASH = &codon_compiler(pointer G instance);
Options:
-output output option (default: 'show')
-filename output filename (default: 'usage.csv')
-id ID of a group of genes or a single gene (default:'All')
-translate 1 when translates using regular codon table (default: 0)
-del_key regular expression to delete amino acid and codon keys
(default:'[^ACDEFGHIKLMNPQRSTVWYatgcU]')
-type type of amino acid and codon usage (default:'C0')
'A0': Number of amino acid
'A1': Relative use of amino acid
'C0': Number of codon
'C1': Relative use of codon normalized by total number of codons
'C2': Relative use of codon normalized by each amino acid totals
'C3': Relative synonymous codon usage (RSCU)
'C4': Ratio to maximum of minor codon
'C5': Maximum (1) or minor (0) codon
-rank 1 when rank ordering (default: 0)
Description:
This program characterizes various types of amino acid and codon
usage, and inputs in the G instance. e.g. usage of Met codon ATG
will be accessible at $gb->{$id}->{$type}->{'Matg'}
Informations stored in the G instance:
$gb->{$id}->{tnc}: total number of codons excluding start & stop codons
$gb->{$id}->{Sc}: species richness of codon (i.e. number of distinct codons)
$gb->{degeneracy}->{*}: degree of codon degeneracy for amino acid '*'
$gb->{"legend_codon_compiler_$type"}
$gb->{errormsg_codon_compiler}
Normalization procedures:
Five types of normalization procedures, C1, C2, C3, C4, and C5,
are applied to describe codon usage. First, to exclude the effect
of the gene size, data are normalized for the pooled totals (C1).
Second, to exclude the effect of the amino acid composition,
data are normalized for each individual amino acid (C2). Third,
to exclude the effect of the synonymous codon number in each amino
acid box, the relative synonymous codon usage (RSCU) is calculated.
The RSCU is the observed codon frequency divided by that expected
when all codons for the same amino acid are used equally (C3).
Fourth, the relative adaptiveness (W) of each codon is calculated.
The W is the ratio of the usage of each codon, to that of the most
abundant codon for the same amino acid (C4). Fifth, binary number,
1 or 0 refering to maximum or minor codon, is assigned (C5).
History:
20030401 integration of codon_counter, amino_counter, codon_usage,
_codon_amino_printer, and _codon_usage_printer written by
Koya Mori (mory@g-language.org)

Methods code

_distance_cudescriptiontopprevnext
sub _distance_cu {
    &opt_default(usage1=>'', usage2=>'', metric=>'manhattan');
    my @args = opt_get(@_);
    my $usage1 = opt_val("usage1");
    my $usage2 = opt_val("usage2");
    my $metric = opt_val("metric");
    my $dist;

    if(!$usage1 || !$usage2){ &msg_error("ERROR: given pointer HASH is empty.\n"); return; }

    my (%key, $aa, $tot_1, $one_1, %tbox1, %tbox2, %either, %both);
    foreach (keys %{$usage1}){
	$key{$_} ++;
	$tot_1 += $usage1->{$_};
	next if(length($_) != 4);
	$one_1 ++ if($usage1->{$_} == 1);
	$tbox1{substr($_, 0, 1)} += $usage1->{$_};
    }
    foreach (keys %{$usage2}){
	$key{$_} ++;
	next if(length($_) != 4);
	$tbox2{substr($_, 0, 1)} += $usage2->{$_};
    }
    foreach (keys %key){
	next if(length($_) != 4);
	$aa = substr($_, 0, 1);
	if($tbox1{$aa} || $tbox2{$aa}){ $either{$aa} = 1; }
	if($tbox1{$aa} && $tbox2{$aa}){ $both{$aa} = 1; }
	elsif($metric =~ /!/ && int($tot_1 + 0.5) != 1){ delete $key{$_}; }
    }

    if($metric =~ /1-r/){ ## 1 - r (where r is a correlation coefficient)
my ($N, $M1, $M2, $S11, $S22, $S12); foreach (keys %key){ $M1 += $usage1->{$_}; $N ++; } foreach (keys %key){ $M2 += $usage2->{$_}; } $M1 /= $N; $M2 /= $N; foreach (keys %key){ $S11 += ($usage1->{$_} - $M1) ** 2; $S22 += ($usage2->{$_} - $M2) ** 2; $S12 += ($usage1->{$_} - $M1) * ($usage2->{$_} - $M2); } $dist = 1 - $S12 / sqrt($S11 * $S22);
} if($metric =~ /euclidean/){ ## Euclidean distance
foreach (keys %key){ $dist += ($usage1->{$_} - $usage2->{$_}) ** 2; } $dist = sqrt($dist); } if($metric =~ /manhattan/){ ## Manhattan distance
foreach (keys %key){ $dist += abs($usage1->{$_} - $usage2->{$_}); } } if($metric =~ /codon/){ ## Normalized Manhattan distance
if(int($tot_1 + 0.5) == 1) { &msg_error("ERROR: Invalid distance measure (type A1 or C1)\n"); return; } if($one_1 >= scalar keys %tbox1) { &msg_error("ERROR: Invalid distance measure (type A0, C0, C4 or C5)\n"); return; } foreach (keys %key){ $dist += abs($usage1->{$_} - $usage2->{$_}); } my $total; foreach $aa (keys %both){ $total += $tbox1{$aa}; } if($total == scalar keys %both){ ## type C2
if($metric =~ /!/){ $dist /= scalar keys %both; }
else {
$dist /= scalar keys %either; } } if($total > scalar keys %both){ $dist /= scalar keys %key; } # type C3
if(
$dist > 2){ &msg_error("ERROR: Invalid distance measure (type C0)\n"); return; }
}

return
$dist;
} =head2 shannon_cu shannon_cu ver.20030401 Author: Haruo Suzuki (haruo@g-language.org) Usage: NULL = &shannon_cu(pointer G instance); Options: -output output option (default: 'stdout') -filename output filename (default: "shannon_cu.csv") -id ID of a group of genes or a single gene (default:'All') -translate 1 when translates using regular codon table (default: 0) -del_key regular expression to delete amino acid and codon keys (default:'[^ACDEFGHIKLMNPQRSTVWYatgcU]') Description: This program calculates Shannon information-based indices of amino acid and codon usage, and inputs in the G instance. Informations stored in the G instance: $gb->{$id}->{Haau}: entropy in amino acid usage of protein $gb->{$id}->{Hcu}: entropy in codon usage $gb->{$id}->{D_}: D value proposed by Konopka (1984) $gb->{$id}->{Dsyn}: Dsyn value proposed by Konopka (1984) $gb->{"legend_shannon_cu"} Reference: Konopka (1984) J.Theor.Biol. 107:697-705. Requirements: sub codon_compiler(); =cut sub shannon_cu { &opt_default(output=>'stdout', filename=>"shannon_cu.csv", id=>'All', translate=>0, del_key=>'[^ACDEFGHIKLMNPQRSTVWYatgcU]'); my @args = opt_get(@_); my $gb = opt_as_gb(shift @args); my $output = opt_val("output"); my $filename = opt_val("filename"); my $id = opt_val("id"); my $translate = opt_val("translate"); my $del_key = opt_val("del_key"); my $aacu = &codon_compiler($gb, -output=>'n', -id=>$id, -type=>'A1C1', -translate=>$translate, -del_key=>$del_key); my $gene = ($id =~ /FEATURE/) ? $gb->{$id}->{gene} : "{$id}"; my ($Saa, $Haau, $He, $Sc, $Hcu); foreach (keys %{$aacu}){ if($_ =~ /^[A-Z]$/){ $Saa ++; $Haau += - $aacu->{$_} * log($aacu->{$_}) / log(2);
$He += - $aacu->{$_} * log($aacu->{$_} / $gb->{degeneracy}->{$_}) / log(2); } if($_ =~ /^[A-Z][a-z]{3}$/){ $Sc ++; $Hcu += - $aacu->{$_} * log($aacu->{$_}) / log(2);
} } if($Saa){ ## input in the G instance
$gb->{$id}->{Haau} = sprintf "%.4f", $Haau; $gb->{$id}->{Hcu} = sprintf "%.4f", $Hcu; $gb->{$id}->{D_} = sprintf "%.4f", (6 - $Hcu) / (6 - $Haau);
$gb->{$id}->{Dsyn} = sprintf "%.4f", ($He - $Hcu) / ($He - $Haau);
} if($output eq 'stdout'){ unless($gb->{"legend_shannon_cu"}){ msg_send("\n\n", "Shannon information theoretic computation of amino acid and codon usage\n", " Lp: protein length\n", " Saa: species richness of amino acid (i.e. number of distinct amino acids)\n", " Sc: species richness of codon (i.e. number of distinct codons)\n", " Haau: Shannon uncertainty (entropy) in amino acid usage of protein\n", " Hcu: Shannon uncertainty (entropy) in codon usage\n", " D_: Relative deviation from equiprobability in codon usage\n", " Dsyn: Relative deviation from equiprobability in synonymous codon usage\n", "-----------------------------------------------------------------------\n", "Lp\tSaa\tSc\tHaau\tHcu\tD_\tDsyn\tgene\n", "-----------------------------------------------------------------------\n"); $gb->{"legend_shannon_cu"} = 1; } msg_send( $gb->{$id}->{tnc}, "\t", "$Saa\t$Sc\t", $gb->{$id}->{Haau}, "\t", $gb->{$id}->{Hcu}, "\t", $gb->{$id}->{D_}, "\t", $gb->{$id}->{Dsyn}, "\t", $gene, "\n"); } if($output eq 'f'){ open(FILE,">>$filename"); unless($gb->{"legend_shannon_cu"}){ print FILE "\n\n", "Shannon information theoretic computation of amino acid and codon usage\n", " Lp: protein length\n", " Saa: species richness of amino acid (i.e. number of distinct amino acids)\n", " Sc: species richness of codon (i.e. number of distinct codons)\n", " Haau: Shannon uncertainty (entropy) in amino acid usage of protein\n", " Hcu: Shannon uncertainty (entropy) in codon usage\n", "Lp,Saa,Sc,Haau,Hcu,D_,Dsyn,gene,product\n"; $gb->{"legend_shannon_cu"} = 1; } print FILE $gb->{$id}->{tnc}, ",", "$Saa,$Sc,", $gb->{$id}->{Haau}, ",", $gb->{$id}->{Hcu}, ",", $gb->{$id}->{D_}, ",", $gb->{$id}->{Dsyn}, ",", $gene, ",", $gb->{$id}->{product}, "\n"; close(FILE); } } =head2 enc enc ver.20030401 Author: Haruo Suzuki (haruo@g-language.org) Usage: scalar = &enc(pointer G instance); Options: -id ID of a group of genes or a single gene (default:'All') -translate 1 when translates using regular codon table (default: 0) -del_key regular expression to delete amino acid and codon keys (default:'[^ACDEFGHIKLMNPQRSTVWYatgcU]') Description: This program calculates the 'effective number of codons' (Nc) Reference: Wright F (1990) Gene 87:23-29. Requirements: sub codon_compiler(); =cut sub enc { &opt_default(id=>'All', translate=>0, del_key=>'[^ACDEFGHIKLMNPQRSTVWYatgcU]'); my @args = opt_get(@_); my $gb = opt_as_gb(shift @args); my $id = opt_val("id"); my $translate = opt_val("translate"); my $del_key = opt_val("del_key"); my %tbox; ## total number of codons in each amino acid box
my $aa; ## amino acid
my %relative; ## relative use of each codon
my %F; ## homozygosity (F) for each aa
my %cntaa; ## Number of members (aa) for each SF type in the gene
my %numaa; ## Number of members (aa) for each SF type in all genes
my %sumF; ## sum of F value in the SF type
my %aveF; ## average F value in the SF type
my $enc; ## effective number of codons
my $count = &codon_compiler($gb, -output=>'n', -id=>$id, -type=>'C0', -translate=>$translate, -del_key=>$del_key); foreach (keys %{$count}){ $tbox{substr($_, 0, 1)} += $count->{$_}; } foreach (keys %{$count}){ $aa = substr($_, 0, 1); if($tbox{$aa} - 1 == 0){ delete $tbox{$aa}; } else { $relative{$_} = $count->{$_} / $tbox{$aa}; }
}

foreach (keys
%relative){ $F{substr($_, 0, 1)} += $relative{$_} ** 2; }
foreach
$aa (keys %F){
$F{$aa} = ($tbox{$aa} * $F{$aa} - 1) / ($tbox{$aa} - 1); } foreach $aa (keys %F){ if($F{$aa} > 0.0000001){ $sumF{$gb->{degeneracy}->{$aa}} += $F{$aa}; $cntaa{$gb->{degeneracy}->{$aa}} ++; } } foreach (values %{$gb->{degeneracy}}){ $numaa{$_} ++; } foreach (keys %numaa){ if($_ == 1){ $aveF{$_} = 1; } elsif($cntaa{$_} && $sumF{$_}){ $aveF{$_} = $sumF{$_} / $cntaa{$_};
} elsif($_ == 3 && $sumF{2} && $sumF{4}){ $aveF{3} = ($sumF{2} / $cntaa{2} + $sumF{4} / $cntaa{4}) / 2;
} else { $enc = -999; } } unless($enc){ foreach (keys %numaa){ $enc += $numaa{$_} / $aveF{$_}; }}
return (
$enc > 61) ? 61.00 : sprintf "%.2f", $enc;
} =head2 cbi cbi ver.20030401 Author: Haruo Suzuki (haruo@g-language.org) Usage: scalar = &cbi(pointer G instance); Options: -id ID of a group of genes or a single gene (default:'All') -translate 1 when translates using regular codon table (default: 0) -del_key regular expression to delete amino acid and codon keys (default:'[^ACDEFGHIKLMNPQRSTVWYatgcU]') Description: This program calculates the Codon Bias Index (CBI) Reference: Morton BR (1993) J.Mol.Evol. 37:273-280. Requirements: sub codon_compiler(); =cut sub cbi { &opt_default(id=>'All', translate=>0, del_key=>'[^ACDEFGHIKLMNPQRSTVWYatgcU]'); my @args = opt_get(@_); my $gb = opt_as_gb(shift @args); my $id = opt_val("id"); my $translate = opt_val("translate"); my $del_key = opt_val("del_key"); my $aacu = &codon_compiler($gb, -output=>'n', -id=>$id, -type=>'A1C4', -translate=>$translate, -del_key=>$del_key); my %bias; ## biases of individual amino acids
my $cbi; foreach (keys %{$gb->{All}->{A1C4}}){ my $aa = substr($_, 0, 1); $bias{$aa} += ($aacu->{$_} - 1) ** 2 / ($gb->{degeneracy}->{$aa} - 1) if($aacu->{$aa} && length($_) == 4 && $gb->{degeneracy}->{$aa} > 1);
} foreach (keys %bias){ $cbi += $aacu->{$_} * $bias{$_}; } return sprintf "%.4f", $cbi; } =head2 icdi icdi ver.20030401 Author: Haruo Suzuki (haruo@g-language.org) Usage: scalar = &icdi(pointer G instance); Options: -id ID of a group of genes or a single gene (default:'All') -translate 1 when translates using regular codon table (default: 0) -del_key regular expression to delete amino acid and codon keys (default:'[^ACDEFGHIKLMNPQRSTVWYatgcU]') Description: This program calculates the Intrinsic Codon Deviation Index (ICDI) Reference: Freire-Picos MA et al. (1994) Gene 139:43-49. Requirements: sub codon_compiler(); =cut sub icdi { &opt_default(id=>'All', translate=>0, del_key=>'[^ACDEFGHIKLMNPQRSTVWYatgcU]'); my @args = opt_get(@_); my $gb = opt_as_gb(shift @args); my $id = opt_val("id"); my $translate = opt_val("translate"); my $del_key = opt_val("del_key"); my $rscu; ## relative synonymous codon usage (RSCU) values
my %tbox; ## total in each amino acid box
my %Sk; ## Sk values of individual amino acids
my $icdi; ## intrinsic codon deviation index (ICDI)
my $Saa; ## species richness of amino acid
$rscu = &codon_compiler($gb, -output=>'n', -id=>$id, -type=>'C3', -translate=>$translate, -del_key=>$del_key); foreach (keys %{$gb->{All}->{C3}}){ $tbox{substr($_, 0, 1)} += $rscu->{$_}; } foreach (keys %{$gb->{All}->{C3}}){ my $aa = substr($_, 0, 1); $Sk{$aa} += ($rscu->{$_} - 1) ** 2 if($tbox{$aa} > 1); ## exclude absent or nondegenerative aa
} next unless(%Sk); ## e.x. NSP1 in S_cerevisiae/X.gbk
foreach (keys %Sk){ my $k = $gb->{degeneracy}->{$_}; $icdi += $Sk{$_} / ($k * ($k - 1));
$Saa ++; } return sprintf "%.4f", $icdi /= $Saa;
} =head2 fop fop ver.20030401 Author: Haruo Suzuki (haruo@g-language.org) Usage: NULL = &fop(pointer G instance); Options: -output output option (default: 'stdout') -filename output filename (default: 'fop.csv') -binary pointer HASH of vectors of binary number, 1 or 0 refering to the optimal or nonoptimal codon. By default, the optimal codons of E. coli are assumed. -translate 1 when translates using regular codon table (default: 0) Description: This program calculates the Frequency of optimal codons (Fop), and inputs in the G instance. i.e. Fop values will be accessible at $gb->{"FEATURE$i"}->{fop} Reference: Ikemura (1981) J.Mol.Biol. 151: 389-409. Requirements: none. =cut sub fop { &opt_default(output=>'stdout', filename=>'fop.csv', translate=>0); my @args = opt_get(@_); my $gb = opt_as_gb(shift @args); my $output = opt_val("output"); my $filename = opt_val("filename"); my $translate = opt_val("translate"); my %ecoli = ('Lcta'=>0,'Lctc'=>0,'Lctg'=>1,'Lctt'=>0,'Ltta'=>0,'Lttg'=>0, 'Raga'=>0,'Ragg'=>0,'Rcga'=>0,'Rcgc'=>1,'Rcgg'=>0,'Rcgt'=>1, 'Sagc'=>0,'Sagt'=>0,'Stca'=>0,'Stcc'=>0,'Stcg'=>0,'Stct'=>0, 'Agca'=>1,'Agcc'=>0,'Agcg'=>1,'Agct'=>1, 'Ggga'=>0,'Gggc'=>1,'Gggg'=>0,'Gggt'=>1, 'Pcca'=>0,'Pccc'=>0,'Pccg'=>1,'Pcct'=>0, 'Taca'=>0,'Tacc'=>1,'Tacg'=>0,'Tact'=>1, 'Vgta'=>1,'Vgtc'=>0,'Vgtg'=>1,'Vgtt'=>1, 'Iata'=>0,'Iatc'=>1,'Iatt'=>0, 'Naac'=>1,'Naat'=>0, 'Dgac'=>1,'Dgat'=>0, 'Ctgc'=>0,'Ctgt'=>0, 'Qcaa'=>0,'Qcag'=>1, 'Egaa'=>1,'Egag'=>0, 'Hcac'=>1,'Hcat'=>0, 'Kaaa'=>1,'Kaag'=>0, 'Fttc'=>1,'Fttt'=>0, 'Ytac'=>1,'Ytat'=>0); my $binary = opt_val("binary") ? opt_val("binary") :\% ecoli; my %degeneracy; foreach (keys %{$binary}){ $degeneracy{substr($_, 0, 1)} ++; } if($output eq 'stdout'){ msg_send("\n\n", "optimal (1) and nonoptimal (0) codons\n", "-------------------------------------\n" ); foreach (sort keys %$binary){ msg_send("$_\t"); } print "\n"; foreach (sort keys %$binary){ msg_send("$binary->{$_}\t"); } msg_send("\n\n", "Frequency of optimal codons (Fop; ratio of optimal codons to synonymous codons)\n", " Lp: length in amino acid residues of protein\n", " Ls: length in silent sites (number of synonymous codons)\n", "--------------------------------------------\n", "Lp\tLs\tstart\tstop\tFop\tgene\n", "--------------------------------------------\n"); } if($output eq 'f'){ open(FILE,">>$filename"); print FILE "Frequency of optimal codons (Fop; ratio of optimal codons to synonymous codons)\n", " Lp: length in amino acid residues of protein\n", " Ls: length in silent sites (number of synonymous codons)\n", "Lp,Ls,start codon,stop codon,Fop,gene,product\n"; } foreach my $cds ($gb->cds()){ my $seq = substr(lc($gb->get_geneseq($cds)), $gb->{$cds}->{codon_start} - 1 + 3, -3); my $aaseq; my $i; if($translate){ while($i <= length($seq) - 3){ my $codon = substr($seq, $i, 3); $aaseq .= ($codon =~ /[^atgc]/) ? '?' : $Table{$codon}; $i += 3; } } else { $aaseq = substr($gb->{$cds}->{translation}, 1); } my $Lp = length($aaseq); my $Ls; ## length in silent sites
my $fop; ## frequency of optimal codons
while($aaseq){ my $aa = substr($aaseq, 0, 1); substr($aaseq, 0, 1) = ''; if($degeneracy{$aa} > 1){ $Ls ++; $fop += $binary->{$aa.substr($seq, $i, 3)}; } $i += 3; } $gb->{$cds}->{fop} = sprintf "%.4f", $fop / $Ls if($Ls);
if($output eq 'stdout'){ msg_send("$Lp\t$Ls\t", $gb->startcodon($cds), "\t", $gb->stopcodon($cds), "\t", $gb->{$cds}->{fop}, "\t", $gb->{$cds}->{gene}, "\n"); } if($output eq 'f'){ print FILE "$Lp,$Ls,", $gb->startcodon($cds), ",", $gb->stopcodon($cds), ",", $gb->{$cds}->{fop}, ",", $gb->{$cds}->{gene}, ",", $gb->{$cds}->{product}, "\n"; } } close(FILE); } =head2 w_value w_value ver.20030401 Author: Haruo Suzuki (haruo@g-language.org) Usage: pointer HASH_w_values = &w_value(pointer G instance); Options: -in regular expression to include genes in a reference set a reference set in several studies are in-built 1: Nakamura & Tabata, 2: Sharp & Li, 3: Sakai et al. (default: 1) -ex regular expression to exclude genes from a reference set (default: '[Mm]itochondrial') Description: Calculates the W value necessary for CAI analysis. Returned value is a pointer HASH of W values. Reference: Sharp and Li (1987) Nucleic Acids Res. 15:1281-1295. Nakamura and Tabata (1997) Microb.Comp.Genomics 2:299-312. Sakai et al. (2001) J.Mol.Evol. 52:164-170. History: 20030401 addition of -in & -out option 20010830 first release (Author: Kazuharu "Gaou" Arakawa) Requirements: none. =cut sub w_value { &opt_default(output=>'stdout', filename=>"w_value.csv", in=>'ribosomal.*protein', ex=>'[Mm]itochondrial'); my @args=opt_get(@_); my $gb = opt_as_gb(shift @args); my $output = opt_val("output"); my $filename = opt_val("filename"); my $include; my $exclude = opt_val("ex"); &msg_error("CAI bases its measure on reference codon usage, i.e. usually that of ribosomoal proteins, and therefore is not usually applicable to organisms other than prokaryotes.") if($gb->{HEADER} !~ /Archaea|Bacteria/); if(opt_val("in") == 1){ $include = 'ribosomal.*protein'; } elsif(opt_val("in") == 2){ $include = 'ribosomal.*protein|outer membrane protein|^elongation factor'; } elsif(opt_val("in") == 3){ $include = 'ribosom.*protein|outer.*membrane|elongation|heat.*shock|RNA.*polymerase.*subunit'; } else{ $include = opt_val("in"); } if($output eq 'stdout'){ msg_send("\n", "Reference set of highly expressed genes\n", "-----------------------------------------\n", "length\tgene\tproduct\n", "-----------------------------------------\n"); } if($output eq 'f'){ open(FILE,">>$filename"); print FILE "Reference set of highly expressed genes\n", "length,gene,product\n"; } my $aaseq; my $seq; my $aa; my $codon; my %count; my $i; my %max; my %w_val; my $cnt; foreach my $cds ($gb->cds()){ next if ($gb->{$cds}->{note} !~ /$include/i && $gb->{$cds}->{product} !~ /$include/i); next if ($gb->{$cds}->{product} =~ /$exclude/); $cnt ++; $aaseq = substr($gb->{$cds}->{translation}, 1); $seq = substr(lc($gb->get_geneseq($cds)), $gb->{$cds}->{codon_start} - 1 + 3, -3); while($aaseq){ $aa = substr($aaseq, 0, 1); substr($aaseq, 0, 1) = ''; $codon = substr($seq, $i, 3); $count{$aa.$codon} ++; $i += 3; } $i = 0; if($output eq 'stdout'){ msg_send(length($gb->{$cds}->{translation}) - 1, "\t", $gb->{$cds}->{gene}, "\t", $gb->{$cds}->{product}, "\n"); } if($output eq 'f'){ print FILE length($gb->{$cds}->{translation}) - 1, ",", $gb->{$cds}->{gene}, ",", $gb->{$cds}->{product}, "\n"; } } die("\nCould not define w values. No reference proteins found.") unless($cnt); foreach (keys %count){ $aa = substr($_, 0, 1); $max{$aa} = $count{$_} if($count{$_} > $max{$aa}); } foreach (keys %count){ $w_val{$_} = sprintf "%.4f", $count{$_} / $max{substr($_, 0, 1)};
} if($output eq 'stdout'){ msg_send("\nRelative adaptiveness (W) of each codon\n", "-----------------------------------------\n", "aa\tcodon\tW value\n", "-----------------------------------------\n"); foreach (sort keys %count){ $aa = substr($_, 0, 1); $codon = substr($_, 1, 3); msg_send("$aa\t$codon\t$w_val{$_}\n"); } } if($output eq 'f'){ print FILE "\nRelative adaptiveness (W) of each codon\n", "amino acid,codon,W value\n"; foreach (sort keys %count){ $aa = substr($_, 0, 1); $codon = substr($_, 1, 3); print FILE "$aa,$codon,$w_val{$_}\n"; } close(FILE); } return\% w_val; } =head2 cai cai ver.20030401 Author: Haruo Suzuki (haruo@g-language.org) Usage: NULL = &cai(pointer G instance); Options: -output output option (default: stdout) -filename output filename (default: "cai.csv") -translate 1 when translates using regular codon table (default: 0) -w_output output option for W value (default: stdout) -w_filename filename to output the W values (default: "w_value.csv") -w_values pointer HASH of W values used in CAI calculation -w_absent W value of codons absent from a reference set (default: 0) a value of 0.5 was suggested by Sharp and Li (1987) Description: This program calculates Codon Adaptation Index (CAI) for each genes, and inputs in the G instance. i.e. CAI values will be accessible at $gb->{"FEATURE$i"}->{cai}; History: 20030401 addition of -w_absent option 20010830 first release (Author: Kazuharu "Gaou" Arakawa) Requirements: sub w_value() =cut sub cai { &opt_default(output=>'stdout', filename=>"cai.csv", translate=>0, w_output=>'stdout', w_filename=>"w_value.csv", w_absent=>0); my @args = opt_get(@_); my $gb = opt_as_gb(shift @args); my $output = opt_val("output"); my $filename = opt_val("filename"); my $translate = opt_val("translate"); my $w_output = opt_val("w_output"); my $w_fname = opt_val("w_filename"); my $w_abs = opt_val("w_absent"); my $w_val; if(opt_val("w_values")){ $w_val = opt_val("w_values"); }else{ $w_val = &w_value($gb, -output=>$w_output, -filename=>$w_fname); } if($output eq 'stdout'){ msg_send("\n", "Codon Adaptation Index (CAI; geometric mean of W values)\n", "--------------------------------------------------------\n", "length\tstart\tstop\tCAI\tgene\n", "--------------------------------------------------------\n"); } if($output eq 'f'){ open(FILE,">>$filename"); print FILE "Codon Adaptation Index (CAI; geometric mean of W values)\n", "length,start codon,stop codon,CAI,gene,product\n"; } foreach my $cds ($gb->cds()){ my $seq = substr(lc($gb->get_geneseq($cds)), $gb->{$cds}->{codon_start} - 1 + 3, -3); my $aaseq; my $i; my $codon; if($translate){ while($i <= length($seq) - 3){ $codon = substr($seq, $i, 3); $aaseq .= ($codon =~ /[^atgc]/) ? '?' : $Table{$codon}; $i += 3; } $i = 0; } else { $aaseq = substr($gb->{$cds}->{translation}, 1); } my $Lp = length($aaseq); my $cai; while($aaseq){ my $aa = substr($aaseq, 0, 1); substr($aaseq, 0, 1) = ''; $codon = substr($seq, $i * 3, 3); if($w_val->{$aa.$codon}){ $cai += log($w_val->{$aa.$codon}); } elsif($w_abs){ $cai += log($w_abs); } else { $cai = 0; last; } $i ++; } $gb->{$cds}->{cai} = ($cai) ? sprintf "%.4f", exp($cai / $Lp) : 0 if($Lp);
if($output eq 'stdout'){ msg_send("$Lp\t", $gb->startcodon($cds), "\t", $gb->stopcodon($cds), "\t", $gb->{$cds}->{cai}, "\t", $gb->{$cds}->{gene}, "\n"); } if($output eq 'f'){ print FILE "$Lp,", $gb->startcodon($cds), ",", $gb->stopcodon($cds), ",", $gb->{$cds}->{cai}, ",", $gb->{$cds}->{gene}, ",", $gb->{$cds}->{product}, "\n"; } } close(FILE); } =head2 phx phx ver.20030401 Author: Haruo Suzuki (haruo@g-language.org) Usage: NULL = &phx(pointer G instance); Options: -output output option (default: 'stdout') -filename output filename (default: "phx.csv") -usage pointer HASH of vectors of amino acid and codon usage of a set of highly expressed genes. By default, the ribosomal protein genes are used. -translate 1 when translates using regular codon table (default: 0) -del_key regular expression to delete amino acid and codon keys (default:'[^ACDEFGHIKLMNPQRSTVWYatgcU]') Description: This program calculates codon usage differences between gene classes for characterizing predicted highly expressed (PHX) genes, and inputs in the G instance. e.g. PEL values will be accessible at $gb->{"FEATURE$i"}->{pel} Informations stored in the G instance: $gb->{"FEATURE$i"}->{cudC}: B(g|C) $gb->{"FEATURE$i"}->{cudG}: B(g|G) $gb->{"FEATURE$i"}->{pel}: E(g) [B(g|C) / B(g|G)]
$gb->{"FEATURE$i"}->{phx}: 1 if a gene is PHX [i.e. E(g) > 1.05]
References:
Karlin and Mrazek (2000) J.Bacteriol. 182(18):5238-5250.
Jansen et al. (2003) Nucleic Acids Res. 31(8):2242-2251.
Requirements: sub codon_compiler()

=cut

sub phx {
&opt_default(output=>'stdout', filename=>"phx.csv", usage=>"",
translate=>0, del_key=>'[^ACDEFGHIKLMNPQRSTVWYatgcU]');
my @args = opt_get(@_); my $gb = opt_as_gb(shift @args); my $output = opt_val("output"); my $filename = opt_val("filename"); my $translate = opt_val("translate"); my $del_key = opt_val("del_key"); my $usage = opt_val("usage"); my $cds; unless($usage){ foreach $cds ($gb->cds()){ $gb->{$cds}->{RP} = $gb->{$cds}->{product} =~ /ribosomal.*protein/i; } $usage = &codon_compiler($gb, -output=>'n', -id=>'RP', -type=>'A1C2', -del_key=>$del_key); } if($output eq 'stdout'){ msg_send("\n\n", "Predicted highly expressed (PHX) genes\n", " B(g|C): codon usage difference relative to collection of all genes\n", " B(g|G): codon usage difference relative to a set of highly expressed genes\n", " E(g): predicted expression level [B(g|C) / B(g|G)]\n", " PHX = 1 if PEL > 1.05\n", "------------------------------------------------------------\n", "length\tstart\tstop\tB(g|C)\tB(g|G)\tE(g)\tPHX\tgene\n", "------------------------------------------------------------\n"); } if($output eq 'f'){ open(FILE,">>$filename"); print FILE "Predicted highly expressed (PHX) genes\n", " B(g|C): codon usage difference relative to collection of all genes\n", " B(g|G): codon usage difference relative to a set of highly expressed genes\n", " E(g): predicted expression level [B(g|C) / B(g|G)]\n", " PHX = 1 if PEL > 1.05\n", "length,start codon,stop codon,B(g|C),B(g|G),E(g),PHX,gene,product\n"; } foreach $cds ($gb->cds()){ my $aacu = &codon_compiler($gb, -output=>'n', -id=>$cds, -type=>'A1C2', -translate=>$translate, -del_key=>$del_key); ## codon usage difference (cud) between gene classes
my %boxC; my %boxG; my $cudC; ## cud relative to the collection of all genes
my $cudG; ## cud relative to the ribosomal protein genes
foreach (keys %{$gb->{All}->{A1C2}}){ if(length($_) == 4){ my $aa = substr($_, 0, 1); $boxC{$aa} += abs($aacu->{$_} - $gb->{All}->{A1C2}->{$_}); $boxG{$aa} += abs($aacu->{$_} - $usage->{$_}); } } foreach (keys %boxC){ $cudC += $aacu->{$_} * $boxC{$_}; $cudG += $aacu->{$_} * $boxG{$_}; } if($cudG){ $gb->{$cds}->{cudC} = sprintf "%.4f", $cudC; $gb->{$cds}->{cudG} = sprintf "%.4f", $cudG; $gb->{$cds}->{pel} = sprintf "%.4f", $cudC / $cudG;
$gb->{$cds}->{phx} = (($gb->{$cds}->{pel} > 1.05) ? 1 : 0); } if($output eq 'stdout'){ msg_send(length(substr($gb->{$cds}->{translation}, 1)), "\t", $gb->startcodon($cds), "\t", $gb->stopcodon($cds), "\t", $gb->{$cds}->{cudC}, "\t", $gb->{$cds}->{cudG}, "\t", $gb->{$cds}->{pel}, "\t", $gb->{$cds}->{phx}, "\t", $gb->{$cds}->{gene}, "\n"); } if($output eq 'f'){ print FILE length(substr($gb->{$cds}->{translation}, 1)), ",", $gb->startcodon($cds), ",", $gb->stopcodon($cds), ",", $gb->{$cds}->{cudC}, ",", $gb->{$cds}->{cudG}, ",", $gb->{$cds}->{pel}, ",", $gb->{$cds}->{phx}, ",", $gb->{$cds}->{gene}, ",", $gb->{$cds}->{product}, "\n"; } } close(FILE); } =head2 bui bui ver.20030401 Author: Haruo Suzuki (haruo@g-language.org) Usage: NULL = &bui(pointer G instance); Options: -output output option (default: 'stdout') -filename output filename (default: 'bui.csv') -id ID of a group of genes or a single gene (default:'All') -position codon position (default: ''); '' when assessing overall base usage of the gene '1' when assessing base usage at 1st position of codons '2' when assessing base usage at 2nd position of codons '3' when assessing base usage at 3rd position of codons '3s' when assessing base usage at 3rd position of synonymous codons (excluding nondegenerative codons) Description: Calculates indices of base usage (excluding start and stop codons), and inputs in the G instance. e.g. G+C content at 3rd position of synonymous codons will be accessible at $gb->{"FEATURE$i"}->{gcc3s} Informations stored in the G instance: $gb->{$id}->{a_}: number of adenine $gb->{$id}->{t_}: number of thymine $gb->{$id}->{g_}: number of guanine $gb->{$id}->{c_}: number of cytosine $gb->{$id}->{gcc}: G+C content [(G+C)/(A+T+G+C)]
$gb->{$id}->{gtc}: G+T content [(G+T)/(A+T+G+C)] $gb->{$id}->{gcs}: GC skew [(G-C)/(G+C)]
$gb->{$id}->{tas}: TA skew [(T-A)/(T+A)] $gb->{$id}->{ryr}: purine/pyrimidine ratio [(A+G)/(T+C)] $gb->{$id}->{Hbu}: entropy in base usage $gb->{legend_bui} Requirements: none. =cut sub bui { &opt_default(output=>'stdout',filename=>'bui.csv',id=>'All',position=>''); my @args = opt_get(@_); my $gb = opt_as_gb(shift @args); my $output = opt_val("output"); my $filename = opt_val("filename"); my $id = opt_val("id"); my $p = opt_val("position"); my $seq; if($id =~ /FEATURE/){ ## for a single gene
$seq = substr(lc($gb->get_geneseq($id)), $gb->{$id}->{codon_start} - 1 + 3, -3); ## excluding start and stop codons
if($p){ my @aaseq; if($p =~ /s/){ @aaseq = split(//, $gb->{$id}->{translation}); shift @aaseq; } my $partseq; for(my $i; $i * 3 <= length($seq) - 3; $i ++){ if($p =~ /s/){ next if($aaseq[$i] eq '' || $aaseq[$i] eq 'M' || $aaseq[$i] eq 'W' && $gb->{LOCUS}->{id} ne 'NC_000908'); } $partseq .= substr($seq, $i * 3 + $p - 1, 1); } $seq = $partseq; } } else { ## for a group of genes
foreach my $cds ($gb->cds()){ next if( $id ne 'All' && !($gb->{$cds}->{$id}) ); my $cdsseq = substr(lc($gb->get_geneseq($cds)), $gb->{$cds}->{codon_start} - 1 + 3, -3); if($p){ my @aaseq; if($p =~ /s/){ @aaseq = split(//, $gb->{$cds}->{translation}); shift @aaseq; } my $partseq; for(my $i; $i * 3 <= length($cdsseq) - 3; $i ++){ if($p =~ /s/){ next if($aaseq[$i] eq '' || $aaseq[$i] eq 'M' || $aaseq[$i] eq 'W' && $gb->{LOCUS}->{id} ne 'NC_000908'); } $partseq .= substr($cdsseq, $i * 3 + $p - 1, 1); } $cdsseq = $partseq; } $seq .= $cdsseq; } } my ($a, $t, $g, $c, $Hbu); if($seq){ $gb->{$id}->{"a_$p"} = $a = $seq =~ tr/a/a/; $gb->{$id}->{"t_$p"} = $t = $seq =~ tr/t/t/; $gb->{$id}->{"g_$p"} = $g = $seq =~ tr/g/g/; $gb->{$id}->{"c_$p"} = $c = $seq =~ tr/c/c/; $gb->{$id}->{"gcc$p"} = sprintf "%.4f", ($g + $c) / ($a+$t+$g+$c);
$gb->{$id}->{"gtc$p"} = sprintf "%.4f", ($g + $t) / ($a+$t+$g+$c);
$gb->{$id}->{"gcs$p"} = sprintf "%+.4f", ($g - $c) / ($g + $c);
$gb->{$id}->{"tas$p"} = sprintf "%+.4f", ($t - $a) / ($t + $a);
$gb->{$id}->{"ryr$p"} = sprintf "%.4f", ($a + $g) / ($t + $c);
foreach($a, $t, $g, $c){ $Hbu -= $_/($a + $t + $g + $c) * log($_/($a + $t + $g + $c)) / log(2) if($_); }
$gb->{$id}->{"Hbu$p"} = sprintf "%.4f", $Hbu;
} my $gene = ($id =~ /FEATURE/) ? $gb->{$id}->{gene} : "{$id}"; if($output eq 'stdout'){ unless($gb->{legend_bui}){ msg_send("\n\n", "Base usage indices (excluding start and stop codons)\n", " ATGC: A$p + T$p + G$p + C$p\n", " GCc: (G$p + C$p)/(A$p + T$p + G$p + C$p)\n", " GTc: (G$p + T$p)/(A$p + T$p + G$p + C$p)\n", " GCs: (G$p - C$p)/(G$p + C$p)\n", " TAs: (T$p - A$p)/(T$p + A$p)\n", " R/Y: (A$p + G$p)/(T$p + C$p)\n", " Hbu: Shannon's uncertainty (entropy) in base usage\n", "-------------------------------------------------------------\n", "ATGC\tGCc\tGTc\tGCs\tTAs\tR/Y\tHbu\tgene\n", "-------------------------------------------------------------\n"); $gb->{legend_bui} = 1; } msg_send($a + $t + $g + $c, "\t", $gb->{$id}->{"gcc$p"}, "\t", $gb->{$id}->{"gtc$p"}, "\t", $gb->{$id}->{"gcs$p"}, "\t", $gb->{$id}->{"tas$p"}, "\t", $gb->{$id}->{"ryr$p"}, "\t", $gb->{$id}->{"Hbu$p"}, "\t", $gene, "\n"); } if($output eq 'f'){ open(FILE,">>$filename"); unless($gb->{legend_bui}){ print FILE "Base usage indices (excluding start and stop codons)\n", " ATGC: A$p + T$p + G$p + C$p\n", " GCc: (G$p + C$p)/(A$p + T$p + G$p + C$p)\n", " GTc: (G$p + T$p)/(A$p + T$p + G$p + C$p)\n", " GCs: (G$p - C$p)/(G$p + C$p)\n", " TAs: (T$p - A$p)/(T$p + A$p)\n", " R/Y: (A$p + G$p)/(T$p + C$p)\n", " Hbu: Shannon's uncertainty (entropy) in base usage\n", "ATGC,GCc,GTc,GCs,TAs,R/Y,Hbu,gene\n"; $gb->{legend_bui} = 1; } print FILE $a + $t + $g + $c, ",", $gb->{$id}->{"gcc$p"}, ",", $gb->{$id}->{"gtc$p"}, ",", $gb->{$id}->{"gcs$p"}, ",", $gb->{$id}->{"tas$p"}, ",", $gb->{$id}->{"ryr$p"}, ",", $gb->{$id}->{"Hbu$p"}, ",", $gene, "\n"; close(FILE); } } =head2 aaui aaui ver.20030401 Author: Haruo Suzuki (haruo@g-language.org) Usage: NULL = &aaui(pointer G instance); Options: -output output option (default: 'stdout') -filename output filename (default: 'aaui.csv') -id ID of a group of genes or a single gene (default:'All') -index hydropathic indices of amino acids (default:1) 1: Kyte & Doolittle, 2: Hopp & Woods, 3: Parker et al Description: Calculates indices of amino acid usage (excluding formylmethionine), and inputs in the G instance. Informations stored in the G instance: $gb->{$id}->{Saa}: species richness of amino acid $gb->{$id}->{Haau}: entropy in amino acid usage of protein $gb->{$id}->{hydropathy}: mean hydropathic indices of each amino acid $gb->{$id}->{aromaticity}: frequency of aromatic amino acids $gb->{legend_bui}; Reference: Lobry and Gautier (1994) Nucleic Acids Res. 22:3174-3180. Requirements: none. =cut sub aaui { &opt_default(output=>'stdout', filename=>'aaui.csv', id=>'All', index=>1); my @args = opt_get(@_); my $gb = opt_as_gb(shift @args); my $output = opt_val("output"); my $filename = opt_val("filename"); my $id = opt_val("id"); my $index = opt_val("index"); my $aaseq; if($id =~ /FEATURE/){ ## for a single gene
$aaseq = substr($gb->{$id}->{translation}, 1); ## excluding formylMet
} else { ## for a group of genes
foreach my $cds ($gb->cds()){ next if( $id ne 'All' && !($gb->{$cds}->{$id}) ); $aaseq .= substr($gb->{$cds}->{translation}, 1); } } my $total = $aaseq =~ tr[ACDEFGHIKLMNPQRSTVWY][ACDEFGHIKLMNPQRSTVWY]; if($aaseq){ my %count; $count{A} = $aaseq =~ tr/A/A/; $count{C} = $aaseq =~ tr/C/C/; $count{D} = $aaseq =~ tr/D/D/; $count{E} = $aaseq =~ tr/E/E/; $count{F} = $aaseq =~ tr/F/F/; $count{G} = $aaseq =~ tr/G/G/; $count{H} = $aaseq =~ tr/H/H/; $count{I} = $aaseq =~ tr/I/I/; $count{K} = $aaseq =~ tr/K/K/; $count{L} = $aaseq =~ tr/L/L/; $count{M} = $aaseq =~ tr/M/M/; $count{N} = $aaseq =~ tr/N/N/; $count{P} = $aaseq =~ tr/P/P/; $count{Q} = $aaseq =~ tr/Q/Q/; $count{R} = $aaseq =~ tr/R/R/; $count{S} = $aaseq =~ tr/S/S/; $count{T} = $aaseq =~ tr/T/T/; $count{V} = $aaseq =~ tr/V/V/; $count{W} = $aaseq =~ tr/W/W/; $count{Y} = $aaseq =~ tr/Y/Y/; my ($Saa, $Haau, $hydro); foreach (qw(A C D E F G H I K L M N P Q R S T V W Y)){ next if($count{$_} == 0); $Saa ++; $Haau += - $count{$_}/$total * log($count{$_}/$total) / log(2);
$hydro += $count{$_} * $KyteDoolittle{$_} if($index == 1); $hydro += $count{$_} * $HoppWoods{$_} if($index == 2); $hydro += $count{$_} * $Parkeretal{$_} if($index == 3); } $gb->{$id}->{Saa} = $Saa; $gb->{$id}->{Haau} = sprintf "%.4f", $Haau; $gb->{$id}->{hydropathy} = sprintf "%+.4f", $hydro / $total;
$gb->{$id}->{aromaticity} = sprintf "%.4f", ($count{F} + $count{Y} + $count{W}) / $total;
} my $gene = ($id =~ /FEATURE/) ? $gb->{$id}->{gene} : "{$id}"; if($output eq 'stdout'){ unless($gb->{legend_aaui}){ msg_send("\n\n", "Indices of amino acid usage (excluding formylmethionine)\n", " Lp: protein length\n", " Saa: Species richness of amino acid (i.e. number of distinct amino acids)\n", " Haau: Shannon's uncertainty (entropy) in amino acid usage\n", "---------------------------------------------\n", "Lp\tSaa\tHaau\thydro-\taroma-\tgene\n", "\t\t\tpathy$index\tticity\n", "---------------------------------------------\n"); $gb->{legend_aaui} = 1; } msg_send( "$total\t", $gb->{$id}->{Saa}, "\t", $gb->{$id}->{Haau}, "\t", $gb->{$id}->{hydropathy}, "\t", $gb->{$id}->{aromaticity}, "\t", $gene, "\n"); } if($output eq 'f'){ open(FILE,">>$filename"); unless($gb->{legend_aaui}){ print FILE "Indices of amino acid usage (excluding formylmethionine)\n", " Lp: protein length\n", " Saa: Species richness of amino acid (i.e. number of distinct amino acids)\n", " Haau: Shannon's uncertainty (entropy) in amino acid usage\n", "Lp,Saa,Haau,hydropathy$index,aromaticity,gene\n"; $gb->{legend_aaui} = 1; } print FILE "$total,", $gb->{$id}->{Saa}, ",", $gb->{$id}->{Haau}, ",", $gb->{$id}->{hydropathy}, ",", $gb->{$id}->{aromaticity}, ",", $gene, "\n"; } close(FILE); } ## Abbreviations
## A Ala Alanine
## C Cys Cysteine
## D Asp Aspartic acid
## E Glu Glutamic acid
## F Phe Phenylalanine
## G Gly Glycine
## H His Histidine
## I Ile Isoleucine
## K Lys Lysine
## L Leu Leucine
## M Met Methionine
## N Asn Asparagine
## P Pro Proline
## Q Gln Glutamine
## R Arg Arginine
## S Ser Serine
## T Thr Threonine
##(U Sec Selenocysteine)
## V Val Valine
## W Trp Tryptophan
## Y Tyr Tyrosine
##(X Others)
##codon_counter ver.20010721-01
##scripting by Koya Mori(mory@g-language.org)
##This program counts codons in CDS.
##options::
##-CDSid CDS ID number(default:'')
##-filename output filename(default:'codon.csv')
##-option "f" for file output, "stdout" for STDOUT output(default:'stdout')
##(pointer result)=&codon_counter(pointer Genome, string key, boolean debug);
sub codon_counter{ &opt_default(CDSid=>"",filename=>'codon.csv',output=>'stdout'); my @args=opt_get(@_); my $gb=opt_as_gb(shift @args); my $key=opt_val("CDSid"); my $filename=opt_val("filename"); my $output=opt_val("output"); my %result; my $seq; my $i=1; my $q; if($key){ if(defined(%{$gb->{"$key"}})){ $seq=$gb->get_geneseq("$key"); for($q=0;$q<length($seq);$q+=3){ $result{substr($seq,$q,3)}++; } } } else{ foreach($gb->cds()){ $seq=$gb->get_geneseq("CDS$i"); for($q=0;$q<length($seq);$q+=3){ $result{substr($seq,$q,3)}++; } $i++; } } if($output eq "f"){ _codon_amino_printer(\%result,-output=>"f",-filename=>"$filename"); } if($output eq "stdout"){ _codon_amino_printer(\%result,-output=>"stdout"); } return\% result; } ##amino_counter ver.20010721-01
##scripting by Koya Mori(mory@g-language.org)
##This program counts amino acid in CDS.
##options::
##-CDSid CDS ID number(default:'')
##-filename output filename(default:'amino.csv')
##-output "f" for file output, "stdout" for STDOUT output(default:'stdout')
##(pointer result)=&amino_counter(pointer Genome, string key, boolean debug);
sub amino_counter{ &opt_default(CDSid=>"",filename=>'amino.csv',output=>'stdout'); my @args=opt_get(@_); my $gb= shift @args; my $key=opt_val("CDSid"); my $filename=opt_val("filename"); my $output=opt_val("output"); my %amino; my @tmp; my $num; if(length $key){ if(defined(%{$gb->{"$key"}})){ $num=$gb->{"$key"}->{feature}; @tmp=split(//,$gb->{"FEATURE$num"}->{translation}); foreach(@tmp){ $amino{$_}++; } } } else{ foreach my $cds ($gb->cds()){ @tmp=split(//,$gb->{$cds}->{translation}); foreach(@tmp){ $amino{$_}++; } } } if($output eq "f"){ _codon_amino_printer(\%amino,-output=>"f",-filename=>"$filename"); } if($output eq "stdout"){ _codon_amino_printer(\%amino,-output=>"stdout"); } return\% amino; } sub codon_amino_counter{ my $gb=shift; my $key=shift; my %result; my $seq; my $amino; my $codon; my $num; my $q; my @tmp; if(defined(%{$gb->{"$key"}})){ $seq=lc($gb->get_geneseq("$key")); $num=$gb->{"$key"}->{feature}; } @tmp=split(//,$gb->{"FEATURE$num"}->{translation}); for($q=0;$q<length($seq);$q+=3){ $amino=shift @tmp; $amino="/" if($amino eq ""); $codon=substr($seq,$q,3); $result{$amino}{$codon}++; } return %result; } ##codon_usage ver.20010721-01
##scripting by Koya Mori(mory@g-language.org)
##This program calculates codon usage in one CDS.
##options::
##-CDSid CDS ID number(default:'')
##-filename output filename(default:'amino.csv')
##-output "f" for file output, "stdout" for STDOUT output(default:'stdout')
##(pointer result)=&codon_usage(pointer Genome, string key, boolean debug);
sub codon_usage{ &opt_default(CDSid=>"",filename=>'codon_usage.csv',output=>'show',version=>1); my @args=opt_get(@_); my $gb=shift @args; my $key=opt_val("CDSid"); my $filename=opt_val("filename"); my $output=opt_val("output"); my $version = opt_val("version"); my $codon; my $amino; my %result; my %usage; if($key){ %usage=codon_amino_counter($gb,$key); } else{ foreach($gb->cds()){ %result=(); %result=codon_amino_counter($gb,$_); foreach $amino (keys(%result)){ foreach $codon (keys(%{$result{$amino}})){ $usage{$amino}{$codon}+=$result{$amino}{$codon}; } } } } if($output eq "f"){ _codon_usage_printer(\%usage,-output=>"f",-filename=>"$filename"); } if($output!~/[fn]/){ _codon_usage_printer(\%usage,-output=>"stdout"); } if ($version == 2){ _codon_table2(\%usage,-output=>"$output") if($output=~/(g|show)/ && $key eq ""); }else{ _codon_table(\%usage,-output=>"$output") if($output=~/(g|show)/ && $key eq ""); } return\% usage; } sub _codon_table{ &opt_default(output=>"show",filename=>"codon_table.png"); my @args=opt_get(@_); my $result=shift @args; my $filename=opt_val("filename"); my $output=opt_val("output"); my $x; my $y; my %amino; my %data; my %per; my $amino_total; my $codon; my $amino; my $v; my $h; my %exception; my $CoDoN; my %color; my $im = new GD::Image(500,550); my $white = $im->colorAllocate(255,255,255); my $black = $im->colorAllocate(0,0,0); my $red = $im->colorAllocate(255,0,0); my $yellow = $im->colorAllocate(200,200,0); my $green = $im->colorAllocate(0,150,0); my $blue = $im->colorAllocate(0,0,255); $color{D}=$yellow; $color{E}=$yellow; $color{R}=$red; $color{K}=$red; $color{H}=$red; $color{N}=$blue; $color{Q}=$blue; $color{S}=$blue; $color{T}=$blue; $color{Y}=$blue; $color{A}=$green; $color{G}=$green; $color{V}=$green; $color{L}=$green; $color{I}=$green; $color{P}=$green; $color{F}=$green; $color{M}=$green; $color{W}=$green; $color{C}=$green; $color{'/'}=$black; foreach((10,50,450,490)){ $x=$_; for($y=10;$y<450;$y++){ $im->setPixel($x,$y,$black); } } foreach((150,250,350)){ $x=$_; for($y=30;$y<450;$y++){ $im->setPixel($x,$y,$black); } } $y=30; for($x=50;$x<450;$x++){ $im->setPixel($x,$y,$black); } foreach((10,50,150,250,350,450)){ $y=$_; for($x=10;$x<490;$x++){ $im->setPixel($x,$y,$black); } } $im->string(gdSmallFont,15,25,"first",$red); $im->string(gdSmallFont,233,15,"second",$green); $im->string(gdSmallFont,455,25,"third",$blue); $im->string(gdSmallFont,30,95,"T",$red); $im->string(gdSmallFont,30,195,"C",$red); $im->string(gdSmallFont,30,295,"A",$red); $im->string(gdSmallFont,30,395,"G",$red); $im->string(gdSmallFont,100,35,"T",$green); $im->string(gdSmallFont,200,35,"C",$green); $im->string(gdSmallFont,300,35,"A",$green); $im->string(gdSmallFont,400,35,"G",$green); $im->string(gdSmallFont,470,65,"T",$blue); $im->string(gdSmallFont,470,85,"C",$blue); $im->string(gdSmallFont,470,105,"A",$blue); $im->string(gdSmallFont,470,125,"G",$blue); $im->string(gdSmallFont,470,165,"T",$blue); $im->string(gdSmallFont,470,185,"C",$blue); $im->string(gdSmallFont,470,205,"A",$blue); $im->string(gdSmallFont,470,225,"G",$blue); $im->string(gdSmallFont,470,265,"T",$blue); $im->string(gdSmallFont,470,285,"C",$blue); $im->string(gdSmallFont,470,305,"A",$blue); $im->string(gdSmallFont,470,325,"G",$blue); $im->string(gdSmallFont,470,365,"T",$blue); $im->string(gdSmallFont,470,385,"C",$blue); $im->string(gdSmallFont,470,405,"A",$blue); $im->string(gdSmallFont,470,425,"G",$blue); foreach $amino (keys(%{$result})){ $amino_total=0; foreach $codon (keys(%{$$result{$amino}})){ $amino_total+=$$result{$amino}{$codon}; } foreach $codon (keys(%{$$result{$amino}})){ if($$result{$amino}{$codon} > $data{$codon}){ if($data{$codon}!=""){ $exception{$codon}{amino}=$amino{$codon}; $exception{$codon}{per}=$per{$codon}; } $data{$codon}=$$result{$amino}{$codon}; $amino{$codon}=$amino; $per{$codon}=sprintf("%.3f",$$result{$amino}{$codon}/$amino_total);
} else{ $exception{$codon}{amino}=$amino; $exception{$codon}{per}=sprintf("%.3f",$$result{$amino}{$codon}/$amino_total);
} } } $im->string(gdSmallFont,60,65,"TTT $amino{ttt} $per{ttt}",$color{$amino{ttt}}); $im->string(gdSmallFont,60,85,"TTC $amino{ttc} $per{ttc}",$color{$amino{ttc}}); $im->string(gdSmallFont,60,105,"TTA $amino{tta} $per{tta}",$color{$amino{tta}}); $im->string(gdSmallFont,60,125,"TTG $amino{ttg} $per{ttg}",$color{$amino{ttg}}); $im->string(gdSmallFont,60,165,"CTT $amino{ctt} $per{ctt}",$color{$amino{ctt}}); $im->string(gdSmallFont,60,185,"CTC $amino{ctc} $per{ctc}",$color{$amino{ctc}}); $im->string(gdSmallFont,60,205,"CTA $amino{cta} $per{cta}",$color{$amino{cta}}); $im->string(gdSmallFont,60,225,"CTG $amino{ctg} $per{ctg}",$color{$amino{ctg}}); $im->string(gdSmallFont,60,265,"ATT $amino{att} $per{att}",$color{$amino{att}}); $im->string(gdSmallFont,60,285,"ATC $amino{atc} $per{atc}",$color{$amino{atc}}); $im->string(gdSmallFont,60,305,"ATA $amino{ata} $per{ata}",$color{$amino{ata}}); $im->string(gdSmallFont,60,325,"ATG $amino{atg} $per{atg}",$color{$amino{atg}}); $im->string(gdSmallFont,60,365,"GTT $amino{gtt} $per{gtt}",$color{$amino{gtt}}); $im->string(gdSmallFont,60,385,"GTC $amino{gtc} $per{gtc}",$color{$amino{gtc}}); $im->string(gdSmallFont,60,405,"GTA $amino{gta} $per{gta}",$color{$amino{gta}}); $im->string(gdSmallFont,60,425,"GTG $amino{gtg} $per{gtg}",$color{$amino{gtg}}); $im->string(gdSmallFont,160,65,"TCT $amino{tct} $per{tct}",$color{$amino{tct}}); $im->string(gdSmallFont,160,85,"TCC $amino{tcc} $per{tcc}",$color{$amino{tcc}}); $im->string(gdSmallFont,160,105,"TCA $amino{tca} $per{tca}",$color{$amino{tca}}); $im->string(gdSmallFont,160,125,"TCG $amino{tcg} $per{tcg}",$color{$amino{tcg}}); $im->string(gdSmallFont,160,165,"CCT $amino{cct} $per{cct}",$color{$amino{cct}}); $im->string(gdSmallFont,160,185,"CCC $amino{ccc} $per{ccc}",$color{$amino{ccc}}); $im->string(gdSmallFont,160,205,"CCA $amino{cca} $per{cca}",$color{$amino{cca}}); $im->string(gdSmallFont,160,225,"CCG $amino{ccg} $per{ccg}",$color{$amino{ccg}}); $im->string(gdSmallFont,160,265,"ACT $amino{act} $per{act}",$color{$amino{act}}); $im->string(gdSmallFont,160,285,"ACC $amino{acc} $per{acc}",$color{$amino{acc}}); $im->string(gdSmallFont,160,305,"ACA $amino{aca} $per{aca}",$color{$amino{aca}}); $im->string(gdSmallFont,160,325,"ACG $amino{acg} $per{acg}",$color{$amino{acg}}); $im->string(gdSmallFont,160,365,"GCT $amino{gct} $per{gct}",$color{$amino{gct}}); $im->string(gdSmallFont,160,385,"GCC $amino{gcc} $per{gcc}",$color{$amino{gcc}}); $im->string(gdSmallFont,160,405,"GCA $amino{gca} $per{gca}",$color{$amino{gca}}); $im->string(gdSmallFont,160,425,"GCG $amino{gcg} $per{gcg}",$color{$amino{gcg}}); $im->string(gdSmallFont,260,65,"TAT $amino{tat} $per{tat}",$color{$amino{tat}}); $im->string(gdSmallFont,260,85,"TAC $amino{tac} $per{tac}",$color{$amino{tac}}); $im->string(gdSmallFont,260,105,"TAA $amino{taa} $per{taa}",$color{$amino{taa}}); $im->string(gdSmallFont,260,125,"TAG $amino{tag} $per{tag}",$color{$amino{tag}}); $im->string(gdSmallFont,260,165,"CAT $amino{cat} $per{cat}",$color{$amino{cat}}); $im->string(gdSmallFont,260,185,"CAC $amino{cac} $per{cac}",$color{$amino{cac}}); $im->string(gdSmallFont,260,205,"CAA $amino{caa} $per{caa}",$color{$amino{caa}}); $im->string(gdSmallFont,260,225,"CAG $amino{cag} $per{cag}",$color{$amino{cag}}); $im->string(gdSmallFont,260,265,"AAT $amino{aat} $per{aat}",$color{$amino{aat}}); $im->string(gdSmallFont,260,285,"AAC $amino{aac} $per{aac}",$color{$amino{aac}}); $im->string(gdSmallFont,260,305,"AAA $amino{aaa} $per{aaa}",$color{$amino{aaa}}); $im->string(gdSmallFont,260,325,"AAG $amino{aag} $per{aag}",$color{$amino{aag}}); $im->string(gdSmallFont,260,365,"GAT $amino{gat} $per{gat}",$color{$amino{gat}}); $im->string(gdSmallFont,260,385,"GAC $amino{gac} $per{gac}",$color{$amino{gac}}); $im->string(gdSmallFont,260,405,"GAA $amino{gaa} $per{gaa}",$color{$amino{gaa}}); $im->string(gdSmallFont,260,425,"GAG $amino{gag} $per{gag}",$color{$amino{gag}}); $im->string(gdSmallFont,360,65,"TGT $amino{tgt} $per{tgt}",$color{$amino{tgt}}); $im->string(gdSmallFont,360,85,"TGC $amino{tgc} $per{tgc}",$color{$amino{tgc}}); $im->string(gdSmallFont,360,105,"TGA $amino{tga} $per{tga}",$color{$amino{tga}}); $im->string(gdSmallFont,360,125,"TGG $amino{tgg} $per{tgg}",$color{$amino{tgg}}); $im->string(gdSmallFont,360,165,"CGT $amino{cgt} $per{cgt}",$color{$amino{cgt}}); $im->string(gdSmallFont,360,185,"CGC $amino{cgc} $per{cgc}",$color{$amino{cgc}}); $im->string(gdSmallFont,360,205,"CGA $amino{cga} $per{cga}",$color{$amino{cga}}); $im->string(gdSmallFont,360,225,"CGG $amino{cgg} $per{cgg}",$color{$amino{cgg}}); $im->string(gdSmallFont,360,265,"AGT $amino{agt} $per{agt}",$color{$amino{agt}}); $im->string(gdSmallFont,360,285,"AGC $amino{agc} $per{agc}",$color{$amino{agc}}); $im->string(gdSmallFont,360,305,"AGA $amino{aga} $per{aga}",$color{$amino{aga}}); $im->string(gdSmallFont,360,325,"AGG $amino{agg} $per{agg}",$color{$amino{agg}}); $im->string(gdSmallFont,360,365,"GGT $amino{ggt} $per{ggt}",$color{$amino{ggt}}); $im->string(gdSmallFont,360,385,"GGC $amino{ggc} $per{ggc}",$color{$amino{ggc}}); $im->string(gdSmallFont,360,405,"GGA $amino{gga} $per{gga}",$color{$amino{gga}}); $im->string(gdSmallFont,360,425,"GGG $amino{ggg} $per{ggg}",$color{$amino{ggg}}); $im->string(gdSmallFont,15,465,"yellow minus charge",$yellow); $im->string(gdSmallFont,165,465,"red plus charge",$red); $im->string(gdSmallFont,285,465,"blue noncharge",$blue); $im->string(gdSmallFont,400,465,"green nonpolar",$green); $im->string(gdSmallFont,20,485,"exception",$black); $v=485; $h=100; foreach(sort keys(%exception)){ $color{$exception{$_}{amino}}=$black if($color{$exception{$_}{amino}}==""); $CoDoN=uc $_; $im->string(gdSmallFont,$h,$v,"$CoDoN $exception{$_}{amino} $exception{$_}{per}",$color{$exception{$_}{amino}}); $v+=20; if($v == 545){ $v=485; $h+=100; } } mkdir ("graph",0777); open(OUT,'>graph/'."$filename"); binmode OUT; print OUT $im->png; close(OUT); msg_gimv("graph/$filename") if($output eq "show"); } ##_codon_table2 ver.20020902-01
##scripting by Kazuharu Gaou Arakawa(gaou@g-language.org)
##Inspire interface for the codon_usage()
sub _codon_table2{ &opt_default(output=>"show",filename=>"codon_table.html"); my @args = opt_get(@_); my $result = shift @args; my $filename = opt_val("filename"); my $output = opt_val("output"); my (%amino, %data, %per, $amino_total, $amino); my (%exception, %color); foreach my $tmp (qw(D E)){ $color{$tmp} = 'gold' } foreach my $tmp (qw(R K H)){ $color{$tmp} = 'red' } foreach my $tmp (qw(N Q S T Y)){ $color{$tmp} = 'blue' } foreach my $tmp (qw(A G V L I P F M W C)){ $color{$tmp} = 'green' } $color{'/'} = 'black'; foreach $amino (keys(%{$result})){ $amino_total = 0; foreach my $codon (keys(%{$$result{$amino}})){ $amino_total += $$result{$amino}{$codon}; } foreach my $codon (keys(%{$$result{$amino}})){ if ($$result{$amino}{$codon} > $data{$codon}){ if ($data{$codon} != ''){ $exception{$codon}{amino} = $amino{$codon}; $exception{$codon}{per} = $per{$codon}; } $data{$codon} = $$result{$amino}{$codon}; $amino{$codon} = $amino; $per{$codon} = sprintf("%.3f", $$result{$amino}{$codon} / $amino_total);
} else{ $exception{$codon}{amino} = $amino; $exception{$codon}{per} = sprintf("%.3f",$$result{$amino}{$codon} / $amino_total);
} } } mkdir ('graph', 0777); open(FILE, '>graph/' . $filename); print FILE qq( <HTML> <HEAD><TITLE>Codon Table v.2</TITLE> <style type="text/css"> <!-- FONT{font-size:8pt} --> </style> </HEAD> <BODY bgcolor="white"> <table border=0 cellpadding=0 cellspacing=0><tr><td align=center valign=top> <table border rules=all cellspacing=0 cellpadding=6> <tr><td align=center rowspan=2><font color="red">first</font></td> <td colspan=4 align=center><font color="green">second</font></td> <td align=center rowspan=2><font color="blue">third</font></td> </tr><tr> <td align=center><font color="green">T</font></td> <td align=center><font color="green">C</font></td> <td align=center><font color="green">A</font></td> <td align=center><font color="green">G</font></td> </tr> ); foreach my $first (qw(T C A G)){ print FILE qq(<tr><td align=center><font color="red">$first</font></td>); foreach my $second (qw(T C A G)){ print FILE qq(<td>); foreach my $third (qw(T C A G)){ my $triplet = $first . $second . $third; print FILE qq(<table border=0 cellpadding=4 width="100\%"><tr>); print FILE '<td align=center><font color="', $color{$amino{lc($triplet)}}; print FILE '">', $triplet, '</td>', "\n"; print FILE '<td align=center><font color="', $color{$amino{lc($triplet)}}; print FILE '">', $amino{lc($triplet)}, '</td>', "\n"; print FILE '<td align=center><font color="', $color{$amino{lc($triplet)}}; print FILE '">', $per{lc($triplet)}, '</td>', "\n"; } print FILE qq(</tr></table></td>); } print FILE qq( <td align=center> <table border=0 cellpadding=4> <tr><td><font color="blue">T</font></td></tr> <tr><td><font color="blue">C</font></td></tr> <tr><td><font color="blue">A</font></td></tr> <tr><td><font color="blue">G</font></td></tr> </table> </td> ); } print FILE qq( </table><BR> <font color="gold" style="font-size:9pt">yellow minus charge</font>\& nbsp;\&nbsp;\&nbsp;\&nbsp;\&nbsp; <font color="red" style="font-size:9pt">red plus charge</font>\& nbsp;\&nbsp;\&nbsp;\&nbsp;\&nbsp; <font color="blue" style="font-size:9pt">blue noncharge</font>\& nbsp;\&nbsp;\&nbsp;\&nbsp;\&nbsp; <font color="green" style="font-size:9pt">green nonpolar</font>\& nbsp;\&nbsp;\&nbsp;\&nbsp;\&nbsp; <a href="http://prowl.rockefeller.edu/aainfo/contents.htm" style="font-size:9pt" target="_blank">(learn more here)</a>
<BR><BR>
<div align=left>exceptions:</div
> <table border=0 cellpadding=10 cellspacing=0><tr> ); my $i = 0; foreach my $triplet (sort keys(%exception)){ $color{$exception{$triplet}{amino}} = 'black' if($color{$exception{$triplet}{amino}} eq ''); print FILE '<td><font color="', $color{$exception{$triplet}{amino}}; print FILE '">', uc $triplet, '&nbsp;&nbsp;&nbsp;', $exception{$triplet}{amino}, '&nbsp;&nbsp;&nbsp;'; print FILE $exception{$triplet}{per}, '</font></td>'; if ($i == 4){ print FILE '</tr><tr>'; $i = -1; } $i ++; } print FILE qq( </tr></table> <BR> <form method="GET" action="http://www.kazusa.or.jp/codon/cgi-bin/spsearch.cgi"> Search <a href="http://www.kazusa.or.jp/codon/">Kazusa Codon Usage Database</a>: <input size="20" name="species"> <input type=hidden name="c" value="i"> <input type=submit value="Submit"> </form> ); print FILE "</td></tr></table></body></html>"; close(FILE); msg_gimv("graph/$filename") if($output eq "show"); } =head2 _codon_usage_table _codon_usage_table ver.20030401 Author: Haruo Suzuki (haruo@g-language.org) Usage: NULL = &_codon_usage_table(pointer HASH); Options: -output output option (default: 'show') -filename output filename (default: 'codon_usage_table.png') Description: This program constructs codon use table History: 20030401 minor change 20010721 first release (Author: Koya Mori) =cut sub _codon_usage_table { &opt_default(output=>'show', filename=>'codon_usage_table.png'); my @args = opt_get(@_); my $usage = shift @args; my $output = opt_val("output"); my $filename = opt_val("filename"); my $im = new GD::Image(500, 550); my $white = $im->colorAllocate(255, 255, 255); my $black = $im->colorAllocate(0, 0, 0); my $blue = $im->colorAllocate(0, 0, 255); my $red = $im->colorAllocate(255, 0, 0); my $yellow = $im->colorAllocate(200, 200, 0); my $green = $im->colorAllocate(0, 150, 0); my $x; my $y; foreach $x ((10,50,450,490)){ $im->line($x,10,$x,450,$black); } foreach $x ((150,250,350)){ $im->line($x,30,$x,450,$black); } $im->line(50,30,450,30,$black); foreach $y ((10,50,150,250,350,450)){ $im->line(10,$y,490,$y,$black); } $im->string(gdSmallFont, 15, 25, "first", $black); $im->string(gdSmallFont, 233, 15, "second", $black); $im->string(gdSmallFont, 455, 25, "third", $black); $y = 95; foreach(qw(U C A G)){ $im->string(gdSmallFont,30,$y,$_,$black);$y += 100; } $x = 90; foreach(qw(U C A G)){ $im->string(gdSmallFont,$x,35,$_,$black);$x += 100; } $y = 65; foreach(1..4){ foreach(qw(U C A G)){ $im->string(gdSmallFont,470,$y,$_,$black);$y += 20; } $y += 20; } my ($codon, $p1st, $p2nd, $p3rd, $aa, $value, $color); foreach $codon (keys %Table){ $p1st = substr($codon, 0, 1); $y = 65 if($p1st eq 't'); $y = 165 if($p1st eq 'c'); $y = 265 if($p1st eq 'a'); $y = 365 if($p1st eq 'g'); $p2nd = substr($codon, 1, 1); $x = 60 if($p2nd eq 't'); $x = 160 if($p2nd eq 'c'); $x = 260 if($p2nd eq 'a'); $x = 360 if($p2nd eq 'g'); $p3rd = substr($codon, 2, 1); $y += 20 if($p3rd eq 'c'); $y += 40 if($p3rd eq 'a'); $y += 60 if($p3rd eq 'g'); $aa = $Table{$codon}; if($codon eq 'taa'){ $value = 'Ochre'; } elsif($codon eq 'tag'){ $value = 'Amber'; } elsif($codon eq 'tga'){ $value = 'Opal'; } else{ $value = $usage->{$aa.$codon}; $value = sprintf( (int($value) - $value) ? "%.3f" : "%d", $value ); } $codon = uc($codon); $codon =~ s/T/U/g; if($aa =~ /[DE]/){ $color = $blue; } elsif($aa =~ /[RKH]/){ $color = $red; } elsif($aa =~ /[NQSTY]/){ $color = $yellow; } elsif($aa =~ /[AGVLIPFMWC]/){ $color = $green; } else { $color = $black; } $im->string(gdSmallFont,$x,$y,"$AA3LTTR{$aa} $codon $value",$color); } $im->string(gdSmallFont,20,465,"blue acidic",$blue); $im->string(gdSmallFont,100,465,"red basic",$red); $im->string(gdSmallFont,170,465,"yellow neutral & polar",$yellow); $im->string(gdSmallFont,315,465,"green neutral & hydrophobic",$green); $im->string(gdSmallFont,20,485,"exception",$black); my $h = 100; my $v = 485; my $regular = 'Agca|Agcc|Agcg|Agct|Ctgc|Ctgt|Dgac|Dgat|Egaa|Egag|Fttc|Fttt|Ggga|Gggc|Gggg|Gggt|Hcac|Hcat|Iata|Iatc|Iatt|Kaaa|Kaag|Lcta|Lctc|Lctg|Lctt|Ltta|Lttg|Matg|Naac|Naat|Pcca|Pccc|Pccg|Pcct|Qcaa|Qcag|Raga|Ragg|Rcga|Rcgc|Rcgg|Rcgt|Sagc|Sagt|Stca|Stcc|Stcg|Stct|Taca|Tacc|Tacg|Tact|Vgta|Vgtc|Vgtg|Vgtt|Wtgg|Ytac|Ytat'; foreach(sort keys %{$usage}){ next if($_ =~ /$regular/); $codon = uc(substr($_, 1, 3)); $codon =~ s/T/U/g; $value = sprintf((int($usage->{$_}) - $usage->{$_}) ? "%.3f" : "%d", $usage->{$_}); $aa = substr($_, 0, 1); $im->string(gdSmallFont,$h,$v,"$AA3LTTR{$aa} $codon $value",$black); $v += 20; if($v == 545){ $h += 100; $v = 485; } } mkdir('graph', 0777); open(OUT, ">graph/$filename"); binmode OUT; print OUT $im->png; close(OUT); msg_gimv("graph/$filename") if($output eq 'show'); } ##codon_amino_printer ver.20010722-01
##scripting by Koya Mori(mory@g-language.org)
##This program prints result of codon or amino counter.
##&codon_amino_printer(pointer codon_counter, boolean debug);
sub _codon_amino_printer{ &opt_default(output=>"stdout"); my @args=opt_get(@_); my $result=shift @args; my $printer=opt_val("output"); my $filename=opt_val("filename"); my $total; if($printer eq "f"){ open(FILE,">>$filename"); foreach(sort keys(%$result)){ print FILE $_,",",$result->{$_},"\n"; $total+=$result->{$_}; } print FILE "total,",$total,"\n"; print FILE "\n\n"; close(FILE); } else{ foreach(sort keys(%$result)){ &msg_send($_," -> ",$result->{$_},"\n"); $total+=$result->{$_}; } &msg_send("total -> ",$total,"\n"); } } ##codon_usage_printer ver.20010722-01
##scripting by Koya Mori(mory@g-language.org)
##This program prints result of codon usage.
##&codon_usage_printer(pointer codon_usage, boolean debug);
sub _codon_usage_printer{ &opt_default(output=>"stdout"); my @args=opt_get(@_); my $result=shift @args; my $printer=opt_val("output"); my $filename=opt_val("filename"); my $amino_total; my $key; my $key2; my $total; if($printer eq "f"){ open(FILE,">>$filename"); foreach $key (sort keys(%$result)){ $amino_total=0; foreach $key2 (sort keys(%{$$result{$key}})){ $amino_total+=$$result{$key}{$key2}; } foreach $key2 (sort keys(%{$$result{$key}})){ print FILE $key,",",$key2,",",$$result{$key}{$key2},",",sprintf("%.3f",$$result{$key}{$key2}/$amino_total),"\n";
$total+=$$result{$key}{$key2}; } } print FILE "total,$total\n"; print FILE "\n\n"; close(FILE); } else{ foreach $key (sort keys(%$result)){ $amino_total=0; foreach $key2 (sort keys(%{$$result{$key}})){ $amino_total+=$$result{$key}{$key2}; } foreach $key2 (sort keys(%{$$result{$key}})){ &msg_send($key," -> ",$key2," -> ",$$result{$key}{$key2}," ",sprintf("%.3f",$$result{$key}{$key2}/$amino_total),"\n");
$total+=$$result{$key}{$key2}; } } &msg_send("total -> $total\n"); } } sub DESTROY { my $self = shift; } 1; __END__ ## Below is the stub of documentation for your module. You better edit it!
=head1 NAME G::Seq::Codon - Perl extension for blah blah blah =head1 SYNOPSIS use G::Seq::Codon; blah blah blah =head1 DESCRIPTION Stub documentation for G::Seq::Codon was created by h2xs. It looks like the author of the extension was negligent enough to leave the stub unedited. Blah blah blah. =head1 AUTHOR A. U. Thor, a.u.thor@a.galaxy.far.far.away =head1 SEE ALSO perl(1). =cut
}
codon_compilerdescriptiontopprevnext
sub codon_compiler {
    &opt_default(output=>'show',filename=>'usage.csv',id=>'All',translate=>0,
		 del_key=>'[^ACDEFGHIKLMNPQRSTVWYatgcU]',type=>'C0',rank=>0);
    my @args = opt_get(@_);
    my $gb = opt_as_gb(shift @args);
    my $output = opt_val("output");
    my $filename = opt_val("filename");
    my $id = opt_val("id");
    my $type = opt_val("type");
    my $rank = opt_val("rank");
    my $translate = opt_val("translate");
    my $del_key = opt_val("del_key");

    if($type !~ /A[01]|C[01]|^C[2-5]/){ &msg_error("$type: undefined type of amino acid and codon usage.\n"); return; }

    my $lena = ($type =~ /(A+)/) ? length($&) : 1;
    my $lenc = ($type =~ /(C+)/) ? length($&) : 1;

    my $seq;
    my $i;
    my $aaseq;
    my $aa;
    my $codon;
    my %count;
    my $omit;
    
    if($id =~ /FEATURE/){ ## for a single gene
$seq = substr(lc($gb->get_geneseq($id)), $gb->{$id}->{codon_start} - 1 + 3, -3); ## excluding start and stop codons
if($translate){ while($i <= length($seq) - 3){ my $codon = substr($seq, $i, 3); $aaseq .= ($codon =~ /[^atgc]/) ? '?' : $Table{$codon}; $i += 3; } $i = 0; } else { $aaseq = substr($gb->{$id}->{translation}, 1); } while($aaseq){ if($type =~ /A/ && length($aaseq) >= $lena){ $count{substr($aaseq, 0, $lena)} ++; } $codon = substr($seq, $i, 3 * $lenc); if($type =~ /C/ && length($codon) == 3 * $lenc){ $count{substr($aaseq, 0, $lenc).$codon} ++; } substr($aaseq, 0, 1) = ''; $i += 3; } } else { ## for a group of genes
my $errormsg; foreach my $cds ($gb->cds()){ next if( $id ne 'All' && !($gb->{$cds}->{$id}) ); $seq = substr(lc($gb->get_geneseq($cds)), $gb->{$cds}->{codon_start} - 1 + 3, -3); ## omitting CDS with illegal characteristics
if($type =~ /C/ && $id eq 'All'){ if(length($seq) % 3){ unless($gb->{errormsg_codon_compiler}){ $errormsg ++; &msg_error("$cds:\tillegal length\n"); } $omit .= "$cds|"; next; } if($gb->{$cds}->{note} =~ /Protein sequence is in conflict with the conceptual translation$/){ unless($gb->{errormsg_codon_compiler}){ $errormsg ++; &msg_error("$cds:\t", $gb->{$cds}->{note}, "\n"); } $omit .= "$cds|"; next; } } ## result of 'A0' is not necessarily equal to that of 'A0C0'
if($translate){ while($i <= length($seq) - 3){ my $codon = substr($seq, $i, 3); $aaseq .= ($codon =~ /[^atgc]/) ? '?' : $Table{$codon}; $i += 3; } $i = 0; } else { $aaseq = substr($gb->{$cds}->{translation}, 1); if($type =~ /C/ && $id eq 'All' && length($seq) / 3 != length($aaseq) && !($gb->{errormsg_codon_compiler})){
$errormsg ++; &msg_error("$cds:\t", length($aaseq), " amino acids, but ", length($seq)/3, " codons\n"); } } while($aaseq){ if($type =~ /A/ && length($aaseq) >= $lena){ $count{substr($aaseq, 0, $lena)} ++; } $codon = substr($seq, $i, 3 * $lenc); if($type =~ /C/ && length($codon) == $lenc * 3){ $count{substr($aaseq, 0, $lenc).$codon} ++; } substr($aaseq, 0, 1) = ''; $i += 3; } $i = 0; } $gb->{errormsg_codon_compiler} = 1 if($errormsg);
}
newdescriptiontopprevnext
sub new {
    my $pkg = shift;
    my $filename = shift;
    my $option = shift;
    my $this;

    return $this;
}

General documentation

No general documentation available.