G::Seq::Operon
Summary
G::Seq::Operon - Retrieves operon information for bacterial genomes
Package variables
No package variables defined.
Included modules
G::Messenger
LWP::Simple
SelfLoader
SubOpt
Inherit
Exporter
Synopsis
No synopsis!
Description
    This class is part of the G-language Genome Analysis Environment and
    collects sequence analysis methods related to operons.
Methods
set_operon
Methods description
set_operon
  Name: set_operon   -   set operon information from RegulonDB/DOOR

  Description:
    This program retrieves operon information from
      1. RegulonDB, for the Escherichia coli K-12 chromosome, and
      2. DOOR, for other bacterial chromosomes and plasmids,
    and adds it as annotation to the given genome data.
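    The choice of database depends only on the accession in the LOCUS line.
    The following is a minimal sketch of that dispatch, mirroring the
    accession test in the set_operon code: the helper name operon_source is
    hypothetical, and the URLs are the ones hard-coded in set_operon, which
    may no longer be reachable.

```perl
use strict;
use warnings;

# Hypothetical helper: returns [database, URL] for a GenBank accession,
# following the accession test used by set_operon.
sub operon_source {
    my ($accession) = @_;

    # E. coli K-12 MG1655 chromosome (GenBank U00096 / RefSeq NC_000913)
    if ($accession eq 'U00096' || $accession eq 'NC_000913') {
        return ['RegulonDB', 'http://regulondb.ccg.unam.mx:80/data/OperonSet.txt'];
    }

    # Every other replicon is looked up in DOOR by its accession
    return ['DOOR',
            'http://csbl1.bmb.uga.edu/OperonDB/downloadNCoperon.php?NC_id=' . $accession];
}

my $src = operon_source('NC_000913');
print "$src->[0]\n";    # prints "RegulonDB"
```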

    Three attributes are added to each CDS hash:
        $genome->{$cds}->{operon}
    contains the name of the operon to which the gene belongs;
        $genome->{$cds}->{operonN}
    contains the rank order of the gene within the operon (1 for the first
    gene transcribed, 0 for genes not assigned to any operon);
        $genome->{$cds}->{operonEvidence}
    contains the evidence supporting the operon assignment (available for
    E. coli only).
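    To make the attributes above concrete, here is a self-contained sketch
    of what the RegulonDB branch does with a single data row. It assumes the
    tab-separated layout that set_operon splits on (operon name, gene count,
    strand, comma-separated "name|locus_tag" pairs, evidence). The %genome
    hash stands in for the real genome object, the start coordinates and the
    EVIDENCE string are placeholders, and annotate_row is a hypothetical
    name. Unlike set_operon, which tries the locus tag first via gene2id(),
    the sketch looks genes up by name only.

```perl
use strict;
use warnings;

# Stand-in for the genome object: CDS id => attributes.
# Start positions are placeholders chosen only to fix the gene order.
my %genome = (
    CDS1 => { gene => 'thrL', start => 100 },
    CDS2 => { gene => 'thrA', start => 200 },
    CDS3 => { gene => 'thrB', start => 300 },
    CDS4 => { gene => 'thrC', start => 400 },
);
my %name2id = map { $genome{$_}{gene} => $_ } keys %genome;

# Annotate the genome from one OperonSet-style row (assumed layout).
sub annotate_row {
    my ($row) = @_;
    my ($operon, $num, $strand, $genes, $evidence) = split /\t/, $row, 5;
    return if $num < 2;    # single-gene operons are skipped, as in set_operon

    my %start;             # CDS id => start position
    for my $pair (split /,/, $genes) {
        my ($gene, $locus_tag) = split /\|/, $pair, 2;
        my $cds = $name2id{$gene} or next;    # skip genes not in the genome
        $genome{$cds}{operon}         = $operon;
        $genome{$cds}{operonEvidence} = $evidence;
        $start{$cds}                  = $genome{$cds}{start};
    }

    # Rank 1 is the first gene transcribed: lowest start on the forward
    # strand, highest start otherwise.
    my @order = sort { $start{$a} <=> $start{$b} } keys %start;
    @order = reverse @order unless $strand eq 'forward';
    my $rank = 1;
    $genome{$_}{operonN} = $rank++ for @order;
}

annotate_row(join "\t", 'thrLABC', 4, 'forward',
             'thrL|b0001,thrA|b0002,thrB|b0003,thrC|b0004', 'EVIDENCE');

# prints "CDS1 1", "CDS2 2", "CDS3 3", "CDS4 4", one per line
print "$_ $genome{$_}{operonN}\n" for sort keys %genome;
```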

  Usage:
    set_operon($gb);

  Options:
    None.

  References:
   1. Gama-Castro S et al. (2008) "RegulonDB (version 6.0): gene regulation model 
      of Escherichia coli K-12 beyond transcription, active (experimental) annotated 
      promoters and Textpresso navigation.", Nucleic Acids Res. 1;36(Database issue):D120-4

   2. Mao F et al. (2009) "DOOR: a database for prokaryotic operons", 
      Nucleic Acids Res. 1;37(Database issue):D459-D463

  Author: 
    Kazuharu Arakawa (gaou@sfc.keio.ac.jp)
  History:
    20090322-01 added support for DOOR (stopped using Operon Database)
    20090313-01 added support for other organisms using Operon Database
    20090313-02 modified to match the latest version of the RegulonDB format (6.0)
    20070829-01 patched to match the latest version of the RegulonDB format
                (patch by Hiroyuki Nakamura <t04632hn@sfc.keio.ac.jp>)
    20061003-01 updated to use data from RegulonDB
    20020207-01 initial posting
Methods code
set_operon
sub set_operon {
    my @args = opt_get(@_);
    my $gb = opt_as_gb(shift @args);

    if ($gb->{LOCUS}->{id} eq 'U00096' || $gb->{LOCUS}->{id} eq 'NC_000913'){

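	# E. coli K-12: mirror() (LWP::Simple) refreshes the local copy of the
	# RegulonDB operon table only when the remote file is newer.
	my $url = "http://regulondb.ccg.unam.mx:80/data/OperonSet.txt";
	my $dir = $ENV{HOME} . '/.glang/data/OperonSet.txt';
	mirror($url, $dir);
	die("set_operon: cannot retrieve data from RegulonDB.") unless(-e $dir);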

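	# The file header has a "Columns:" line followed by one "(n)" line per
	# column; data rows are parsed once $flag reaches 6, and header lines
	# seen before that are echoed through msg_error().
	my $flag = 0;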
	open(FILE, '<', $dir) || die("set_operon: cannot open $dir: $!");
	while (<FILE>) {
	    chomp;
	    my $line = $_;

	    if (/^Columns\:/) {
	      $flag++;
	      next;
	    }
	    elsif(/^\s+\(\d\)\s/) {
	      $flag++;
	      next;
	    }

	    if($flag == 6){
		my %geneOrder;

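		# Fields: operon name, number of genes, strand, "gene|locus_tag" list, evidence
		my ($operon, $num, $direction, $genes, $evidence) = split(/\t/, $_, 5);
		next unless($num >= 2);    # skip single-gene operons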

		foreach my $genepair (split(/,/, $genes)){
		    my ($gene, $locustag) = split(/\|/, $genepair, 2);
		    my $cds = $gb->gene2id($locustag);

		    $cds = $gb->gene2id($gene) unless(length $cds);

		    if($cds){
			$gb->{$cds}->{operon} = $operon;
			$gb->{$cds}->{operonEvidence} = $evidence;
			$geneOrder{$cds} = $gb->{$cds}->{start};
		    }
		}

		my $i = 1;
		if($direction eq 'forward'){
		    foreach my $cds (sort {$geneOrder{$a} <=> $geneOrder{$b}} keys %geneOrder){
			$gb->{$cds}->{operonN} = $i;
			$i ++;
		    }
		}else{
		    foreach my $cds (sort {$geneOrder{$b} <=> $geneOrder{$a}} keys %geneOrder){
			$gb->{$cds}->{operonN} = $i;
			$i ++;
		    }
		}
	    }else{
		$line =~ s/[^a-zA-Z0-9\-,\.\(\):\"\' ]//g;
		msg_error($line, "\n");
	    }
	}
	close(FILE);

	foreach my $cds ($gb->cds()){
	    $gb->{$cds}->{operonN} = 0 unless(length $gb->{$cds}->{operon});
	}

    }else{
	my $url = 'http://csbl1.bmb.uga.edu/OperonDB/downloadNCoperon.php?NC_id=' . $gb->{LOCUS}->{id};
	my $dir = $ENV{HOME} . '/.glang/data/Operon' . $gb->{LOCUS}->{id} . '.txt';
	mirror($url, $dir);
	die("No Operon data for this species.\n\n") unless(-e $dir);

	my $data = {};
	open(FILE, '<', $dir) || die("set_operon: cannot open $dir: $!");
	while (<FILE>) {
	    chomp;
	    my ($operonName, $gi, $gene, undef) = split(/\s+/, $_, 4);
	    push(@{$data->{$operonName}}, $gene);
	}
	close(FILE);

	foreach my $operonName (keys %{$data}){
	    my @list = @{$data->{$operonName}};
	    # List genes in transcription order: reverse operons on the complementary strand
	    @list = reverse(@list) if($gb->{$list[0]}->{direction} eq 'complement');
	    
	    my $i = 1;
	    foreach my $cds (@list){
		$gb->{$cds}->{operon} = $operonName;
		$gb->{$cds}->{operonN} = $i;
		$i ++;
	    }
	}
	
	foreach my $cds ($gb->cds()){
	    $gb->{$cds}->{operonN} = 0 unless(length $gb->{$cds}->{operon});
	}
    }

    return $gb;
}
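The DOOR branch above groups the downloaded rows by operon and, for operons
on the complementary strand, reverses the gene order so that rank 1 is the
first gene transcribed. A self-contained sketch of that grouping follows,
assuming whitespace-separated rows whose first column is the operon ID and
whose third column is the gene identifier (the layout set_operon splits on).
The rows and the %direction table are made up for illustration; set_operon
itself reads the strand from the genome object.

```perl
use strict;
use warnings;

# Illustrative rows in the assumed layout: operon ID, GI, gene, rest ignored.
my @rows = (
    "101\t1001\tgeneA\tx",
    "101\t1002\tgeneB\tx",
    "102\t1003\tgeneC\tx",
    "102\t1004\tgeneD\tx",
);

# Hypothetical strand annotation for each gene.
my %direction = (geneA => 'direct',     geneB => 'direct',
                 geneC => 'complement', geneD => 'complement');

# Group genes by operon, preserving file order within each operon.
my %members;
for (@rows) {
    my ($operon, undef, $gene) = split /\s+/, $_, 4;
    push @{ $members{$operon} }, $gene;
}

# Assign ranks; reverse the list for operons on the complementary strand.
my %rank;
for my $operon (sort keys %members) {
    my @list = @{ $members{$operon} };
    @list = reverse @list if $direction{ $list[0] } eq 'complement';
    my $i = 1;
    $rank{$_} = $i++ for @list;
}

# prints "geneA 1", "geneB 2", "geneC 2", "geneD 1", one per line
print "$_ $rank{$_}\n" for sort keys %rank;
```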
General documentation
No general documentation available.