tutorialgcskewenglish

Differences

This shows you the differences between two versions of the page.

tutorialgcskewenglish [2007/10/22 23:50]
gaou
tutorialgcskewenglish [2014/01/18 07:44] (current)
Line 1: Line 1:
====== Introduction ======
  
GC skew is a parameter that represents the bias between G and C within a single strand of a DNA molecule; its formula is (C-G)/(C+G). Chargaff's second parity rule states that the amounts of G and C within a single strand are almost equal. This holds over the whole genome, but bias can be seen in smaller regions. In fact, in some bacteria the direction of this bias shifts cleanly at specific positions, and these positions are known to coincide with the replication origin and terminus. There are two theories about how GC skew arises: differing mutation probabilities between the leading and lagging strands, and mutation bias caused by codon usage.
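The formula can be tried on a short string without the G-language System. The following is a minimal sketch in plain Perl; the function name gc_skew is hypothetical and is not part of the G-language System.

```perl
use strict;
use warnings;

# Hypothetical helper: GC skew of one strand, using the (C-G)/(C+G)
# convention given in the text.
sub gc_skew {
    my ($seq) = @_;
    my $c = ($seq =~ tr/Cc//);    # number of C bases
    my $g = ($seq =~ tr/Gg//);    # number of G bases
    return 0 if $c + $g == 0;     # assumption: no G or C means a skew of 0
    return ($c - $g) / ($c + $g);
}

# 3 C and 1 G: (3-1)/(3+1)
printf "%.2f\n", gc_skew("CCCGATAT");
```

A positive value means C outnumbers G on the strand, and a negative value means the opposite.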
  
  
Line 28: Line 28:
  
> filename: test.pl
<code perl>
   use G;                  # load the G-language System
   $gb = new G("ecoli");   # load the genome data named "ecoli" into $gb
</code>
  
To test, run the script. Output like the following will appear.
Line 49: Line 50:
To use a local genome flat file in a format other than GenBank (Fasta, EMBL, swiss, SCF, PIR, GCG, raw, ace, etc.),

<code perl>
   use G;
   $gb = new G("ecoli.fasta", "Fasta");   # file name, then data format
</code>

specify the data format as the second argument.
Line 63: Line 66:
  
> filename: test.pl
<code perl>
   use G;
   $gb = new G("ecoli");
   gcskew($gb);            # draw the GC skew graph for the loaded genome
</code>
  
A graph like the one below will appear.
Line 76: Line 80:
All data loaded when the G-language System starts up is stored in $gb. For example, the entire base sequence is stored in

<code perl>
   $gb->{SEQ}
</code>

as shown above. Most standard functions work when given this $gb.
Line 102: Line 108:
  
Options are specified as follows:
<code perl>
   gcskew($gb, -window=>50000, -at=>1);
</code>
As shown above, a "-" is placed at the head of the option name, and the value is attached with "=>".
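To illustrate how options in the "-name => value" style reach a Perl function, here is a minimal sketch in plain Perl. The function windowed_skew, its default window size, and the AT skew formula (A-T)/(A+T) are assumptions made for this example; they are not taken from the G-language System.

```perl
use strict;
use warnings;

# Hypothetical helper: skew value for each consecutive window.
# Options arrive as a flat list and are collected into the hash %opt.
sub windowed_skew {
    my ($seq, %opt) = @_;
    my $window = $opt{'-window'} // 10000;     # assumed default
    die "window must be positive\n" if $window < 1;
    # -at => 1 switches from C/G to A/T (assumed formula: (A-T)/(A+T))
    my ($x, $y) = $opt{'-at'} ? ('A', 'T') : ('C', 'G');
    my @skew;
    for (my $pos = 0; $pos + $window <= length($seq); $pos += $window) {
        my $chunk = uc substr($seq, $pos, $window);
        my $nx = () = $chunk =~ /$x/g;         # occurrences of the first base
        my $ny = () = $chunk =~ /$y/g;         # occurrences of the second base
        push @skew, ($nx + $ny) ? ($nx - $ny) / ($nx + $ny) : 0;
    }
    return @skew;
}

my @gc = windowed_skew("CCCCGGGGAAAT", -window => 4);
my @at = windowed_skew("CCCCGGGGAAAT", -window => 4, -at => 1);
print "GC: @gc\n";
print "AT: @at\n";
```

In Perl, `-window => 4` evaluates to the string "-window" followed by the value 4, which is why the hash keys in the sketch include the leading "-".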
  
Line 112: Line 118:
  
> filename: test.pl
<code perl>
   use G;
   $gb = new G("ecoli");
   # GC skew with a window of 50000, written to gcskew50k.gif
   gcskew($gb, -window=>50000, -filename=>"gcskew50k.gif");
   # AT skew (-at=>1) with the same window, written to atskew50k.gif
   gcskew($gb, -window=>50000, -at=>1, -filename=>"atskew50k.gif");
</code>
The resulting graphs are as follows.
  
Line 134: Line 140:
=== Exercise 2: ===
Compare the GC skew and AT skew tendencies of different species. Also check whether anything changes when the window size is changed.
  
====== Step 3-GC skew seen from different angles ======
Line 152: Line 148:
  
Cumulative GC skew is useful for identifying shift points. It accumulates the GC skew value of each window, so the shift points stand out clearly. In the G-language System, cumulative GC skew can be calculated with the option -cumulative=>1.
<code perl>
    gcskew($gb, -window=>50000, -cumulative=>1);
</code>
It can be executed as shown above. The result should look like the graph below.
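The accumulation itself is just a running sum over the per-window values. The sketch below shows the idea in plain Perl with made-up skew values; the function name cumulative is hypothetical, and the sketch does not claim to reproduce how gcskew() implements -cumulative.

```perl
use strict;
use warnings;

# Hypothetical helper: running sum of per-window skew values.
sub cumulative {
    my @values = @_;
    my $sum = 0;
    my @out;
    for my $v (@values) {
        $sum += $v;          # add this window's skew to the total so far
        push @out, $sum;
    }
    return @out;
}

# Made-up per-window skew values, for illustration only.
my @per_window = (0.5, 0.25, -0.25, -0.5, -0.25, 0.25);
my @cum = cumulative(@per_window);
print "@cum\n";   # prints: 0.5 0.75 0.5 0 -0.25 0
```

Where the per-window values change sign, the running sum turns around, which is what makes the shift point visible as a peak or a valley.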
  
Line 162: Line 158:
  
Now, let's predict the replication origin and terminus using cumulative GC skew. The G-language System comes with the standard function find_ori_ter(), which uses cumulative GC skew internally to predict the replication origin and terminus.
<code perl>
    find_ori_ter($gb, -window=>500);
</code>
It is used as shown above.
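As a rough illustration of the idea, the following plain-Perl sketch locates the windows where a cumulative skew curve reaches its minimum and maximum. The function cumulative_extremes and the input values are hypothetical; this is not the implementation of find_ori_ter(), and which extreme corresponds to the origin and which to the terminus depends on the sign convention of the skew formula.

```perl
use strict;
use warnings;

# Hypothetical helper: indices of the windows at which the cumulative
# skew is lowest and highest.
sub cumulative_extremes {
    my @skew = @_;
    my ($sum, $min, $max, $min_i, $max_i) = (0, undef, undef, 0, 0);
    for my $i (0 .. $#skew) {
        $sum += $skew[$i];
        if (!defined $min || $sum < $min) { ($min, $min_i) = ($sum, $i) }
        if (!defined $max || $sum > $max) { ($max, $max_i) = ($sum, $i) }
    }
    return ($min_i, $max_i);
}

# Made-up per-window skew values, for illustration only.
my ($low, $high) = cumulative_extremes(-0.5, -0.5, 0.5, 0.5, 0.5, -0.5);
print "minimum at window $low, maximum at window $high\n";
```

Multiplying a window index by the window size gives an approximate genome position for each extreme.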
  
Line 183: Line 179:
|-application |Application used to display the image (set to gimp by default)|
  
<code perl>
    genomicskew($gb, -divide=>250);
</code>

Let's execute it as shown above. Note that the first argument is $gb.
Line 210: Line 208:
  
For example, when investigating the relationship between each gene's expression level and its second codon, one might use GC skew to detect tendencies near that codon. The following is one example of how to express this with the G-language System.
<code perl>
    use G;
    $gb = new G("ecoli");
Line 225: Line 223:
       }
     }
</code>
First, start up the G-language System and load the genome database. The cai() function then stores the CAI value of each gene (Codon Adaptation Index: a measure of translation efficiency that can also be used as an indicator of gene expression level). The W value, which measures codon usage bias, is obtained with w_value() and stored in $w_val.
  
<code perl>
    foreach $cds ($gb->cds()){
       # process one CDS here
    }
</code>

is the most basic way to process every CDS with the G-language System. $gb->cds() returns the names of all CDSs in the genome database held in $gb. In other words, looping with foreach makes it possible to analyze every CDS.
Line 240: Line 240:
structure such as the above, in which the FEATURE information is stored. That is, the information for each CDS is held in a structure named CDS plus a number, and each item is accessed hierarchically, as in $gb->{CDS534}->{start}.
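The hierarchical access described above can be tried on a toy data structure. In this sketch, $gb is a plain hash reference with made-up contents rather than a real G object, and cds_names() is a hypothetical stand-in for $gb->cds().

```perl
use strict;
use warnings;

# Toy stand-in for $gb: a plain hash reference, not a real G object.
# Feature names follow the CDS+number pattern; the values are made up.
my $gb = {
    SEQ  => "atgaaacgcattagc",
    CDS1 => { start => 1,  end => 9 },
    CDS2 => { start => 10, end => 15 },
};

# Hypothetical stand-in for $gb->cds(): every key named CDS plus a number,
# sorted by that number.
sub cds_names {
    my ($genome) = @_;
    return sort { substr($a, 3) <=> substr($b, 3) }
           grep { /^CDS\d+$/ } keys %$genome;
}

foreach my $cds (cds_names($gb)) {
    printf "%s starts at %d\n", $cds, $gb->{$cds}->{start};
}
```

Because each CDS entry is itself a hash, further values could be stored next to start and end under additional keys.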
  
<code perl>
       $secondcodon = $gb->after_startcodon($cds, 3);   # three bases after the start codon
       $w_second = $$w_val{$secondcodon};               # W value of that codon
       $cai = $gb->{$cds}->{cai};                       # CAI value of the gene
</code>

This part first takes the three letters after the start codon of $cds with the standard function after_startcodon() and stores them in $secondcodon. It also obtains the W value of that codon and the CAI value of the gene.
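The two lookups can be illustrated in plain Perl. In this sketch, bases_after_start() is a hypothetical, simplified stand-in for after_startcodon() that works on a CDS sequence given as a string, and the W values in %w_val are made up; real values would come from w_value().

```perl
use strict;
use warnings;

# Hypothetical stand-in for after_startcodon(): the $length bases that
# follow the 3-base start codon of a CDS sequence given as a string.
sub bases_after_start {
    my ($cds_seq, $length) = @_;
    return substr($cds_seq, 3, $length);
}

# Made-up W values, for illustration only.
my %w_val = (aaa => 1.0, agg => 0.02);

my $second = bases_after_start("atgaggcgcattagc", 3);   # second codon
my $w      = $w_val{$second};                           # its W value
print "$second $w\n";   # prints: agg 0.02
```

The source code writes the lookup as $$w_val{$secondcodon} because there $w_val holds a reference to the hash, not the hash itself.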
  
<code perl>
       if ($w_second < 0.5 && $cai > 0.8){
          # take the 99 bases after the start codon
          $afterstart = $gb->after_startcodon($cds, 99);
          # GC skew in 9-base windows; the sequence is passed by reference
          gcskew(\$afterstart, -window=>9, -filename=>"$cds-gcskew.gif");
       }
</code>

This section displays the GC skew of the 99 bp downstream of the start codon, in windows of 9 bp (three codons), for genes whose CAI value is greater than 0.8 (that is, highly expressed genes) and whose codon immediately after the start codon has a W value of less than 0.5 (a rare codon).
tutorialgcskewenglish.1193097056.txt.gz · Last modified: 2014/01/18 07:44 (external edit)