Using custom database for searching

How the annotations are stored

In a local installation of Genome Projector, annotations are stored in the data folder inside each View folder. (For the directory structure of a Genome Projector installation, see here). Each annotation file is named after the accession number of its genome, as are the zoomable map files, and is essentially a tab-delimited annotation table in which each line represents one feature, such as a CDS, tRNA, or rRNA. An example from the Escherichia coli K12 Genome Map is given below.

FEATURE2        CDS     197     91      213     98      thrL    b0001   16127995        LPT_ECOL6               thr operon leader peptide       go_process: threonine biosynthesis [goid 0009088]
FEATURE4        CDS     234     91      849     98      thrA    b0002   16127996        AK1H_ECOLI              bifunctional aspartokinase I/homeserine dehydrogenase I        multifunctional homotetrameric enzyme that catalyzes the phosphorylation of aspartate to form aspartyl-4-phosphate as well as conversion of aspartate semialdehyde to homoserine; functions in a number of amino acid biosynthetic pathways; go_component: cytoplasm [goid 0005737]; go_process: threonine biosynthesis [goid 0009088]; go_process: methionine biosynthesis [goid 0009086]; go_process: homoserine biosynthesis [goid 0009090]
FEATURE6        CDS     850     91      1083    98      thrB    b0003   16127997        KHSE_ECOLI              homoserine kinase       catalyzes the formation of O-phospho-L-homoserine from L-homoserine in threonine biosynthesis from asparate; go_component: cytoplasm [goid 0005737]; go_process: threonine biosynthesis [goid 0009088]
FEATURE8        CDS     1083    91      1405    98      thrC    b0004   16127998        THRC_ECOLI              threonine synthase      catalyzes the formation of L-threonine from O-phospho-L-homoserine; go_component: cytoplasm [goid 0005737]; go_process: threonine biosynthesis [goid 0009088]
FEATURE10       CDS     1458    91      1532    98      yaaX    b0005   16127999        YAAX_ECOLI              hypothetical protein    orf, hypothetical protein
FEATURE12       CDS     1570    102     1764    109     yaaA    b0006   16128000        YAAA_ECOLI              hypothetical protein    
FEATURE14       CDS     1782    102     2139    109     yaaJ    b0007   16128001        YAAJ_ECOLI              predicted transporter   inner membrane transport protein
FEATURE16       CDS     2209    91      2447    98      talB    b0008   16128002        TAL1_SHISS              transaldolase B go_component: cytoplasm [goid 0005737]; go_process: pentose-phosphate shunt, non-oxidative branch [goid 0009052]
FEATURE18       CDS     2476    91      2623    98      mogA    b0009   16128003        MOG_SHIFL               molybdenum cofactor biosynthesis protein        forms a trimer; related to eukaryotic protein gephyrin; functions during molybdenum cofactor biosynthesis; go_component: cytoplasm [goid 0005737]; go_process: Mo-molybdopterin cofactor biosynthesis [goid 0006777]
FEATURE20       CDS     2632    102     2773    109     yaaH    b0010   16128004        YAAH_ECO57              conserved inner membrane protein associated with acetate transport      orf, hypothetical protein
FEATURE22       CDS     2810    102     2989    109     yaaW    b0011   16128005        YAAW_ECOLI              hypothetical protein    
FEATURE24       CDS     2857    91      2978    98      htgA    b0012   90111078        HTGA_ECOLI              hypothetical protein    positive regulator for sigma 32 heat shock promoters
FEATURE26       CDS     2995    102     3096    109     yaaI    b0013   16128007        YAAI_ECOLI              hypothetical protein    orf, hypothetical protein
FEATURE28       CDS     3190    91      3669    98      dnaK    b0014   16128008        DNAK_ECOLI              molecular chaperone DnaK        heat shock protein 70; assists in folding of nascent polypeptide chains; refolding of misfolded proteins; utilizes ATPase activity to help fold; co-chaperones are DnaJ and GrpE; multiple copies in some bacteria; go_component: cytoplasm [goid 0005737]; go_process: protein folding [goid 0006457]; go_process: response to osmotic stress [goid 0006970]
FEATURE30       CDS     3692    91      3974    98      dnaJ    b0015   16128009        DNAJ_ECOLI              chaperone Hsp40, co-chaperone with DnaK chaperone with DnaK; heat shock protein
FEATURE33       CDS     4011    91      4289    98      insL-1  b0016   16128010        INSL_ECOLI              IS186/IS421 transposase 
FEATURE35       CDS     4337    102     4390    109     mokC    b0018   16128012        MOKC_ECOLI              regulatory protein for HokC, overlaps CDS of hokC      regulatory peptide whose translation enables hokC (gef) expression
FEATURE37       CDS     4337    102     4375    109     hokC    b4412   49175991        HOKC_SHIFL              toxic membrane protein, small   small toxic membrane polypeptide
FEATURE41       CDS     4522    91      4813    98      nhaA    b0019   16128013        NHAA_ECOLI              pH-dependent sodium/proton antiporter   exports sodium by using the electrochemical proton gradient to allow protons into the cell; functions in adaptation to high salinity and alkaline pH; activity increases at higher pH; downregulated at acidic pH; go_component: inner membrane [goid 0019866]; go_process: response to pH [goid 0009268]
FEATURE43       CDS     4828    91      5055    98      nhaR    b0020   16128014        NHAR_ECOLI              DNA-binding transcriptional activator   transcriptional activator of nhaA
FEATURE46       CDS     5102    102     5228    109     insB-1  b0021   16128015        INSB1_ECOLI             IS1 transposase InsAB'  
FEATURE48       CDS     5208    102     5277    109     insA-1  b0022   16128016        INSA_ECOLI              IS1 repressor protein InsA      
FEATURE50       CDS     5353    102     5419    109     rpsT    b0023   16128017        RS20_SHIDS              30S ribosomal protein S20       binds directly to the 16S rRNA and is involved in post-translational inhibition of arginine and ornithine decarboxylase; go_component: cytosolic ribosome (sensu Bacteria) [goid 0009281]; go_component: cytoplasm [goid 0005737]; go_function: structural constituent of ribosome [goid 0003735]; go_process: protein biosynthesis [goid 0006412]
FEATURE52       CDS     5445    91      5499    98      yaaY    b0024   16128018        YAAY_ECOLI              hypothetical protein    orf, hypothetical protein
FEATURE54       CDS     5501    91      5737    98      ribF    b0025   16128019        RIBF_ECOLI              hypothetical protein    go_component: cytoplasm [goid 0005737]

Here the columns are:

  1. G-language Feature ID
  2. feature type
  3. x1 coordinate (top left coordinate of box representing this gene in Genome Projector Genome Map View)
  4. y1 coordinate (same as above)
  5. x2 coordinate (bottom right coordinate of box)
  6. y2 coordinate (same as above)
  7. gene name
  8. locus tag
  9. NCBI gi
  10. UniProt Entry ID
  11. PDB ID (not present in the above example)
  12. product
  13. note
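
As a quick illustration of the column layout above, the following sketch pulls the gene name, locus tag, and product out of a single annotation line with awk. The sample line is abridged from the thrL entry shown earlier; it assumes that real files use exactly one tab between columns (with an empty PDB column), which follows from the description above but has not been checked against an actual installation. The variable names are illustrative only.

```shell
# One annotation line, tab-delimited, with an empty 11th (PDB) column.
# Sample data abridged from the thrL example; variable names are illustrative.
line=$(printf 'FEATURE2\tCDS\t197\t91\t213\t98\tthrL\tb0001\t16127995\tLPT_ECOL6\t\tthr operon leader peptide\tgo_process: threonine biosynthesis [goid 0009088]')

# Columns 7, 8 and 12 are gene name, locus tag and product.
# -F'\t' makes awk split on single tabs, so the empty PDB column is kept.
summary=$(printf '%s\n' "$line" | awk -F'\t' '{ print $7 " (" $8 "): " $12 }')
echo "$summary"
```

Splitting on a literal tab rather than on whitespace matters here, because an empty PDB column would otherwise shift the product and note columns to the left.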

Similarly, when custom Genome Projector Views are generated, .coord files are generated in the data folder within the working directory, alongside the folder containing the generated zoomable map. Each file contains the following columns:

  1. G-language Feature ID
  2. feature type
  3. gene name
  4. coordinates

Genome Projector searches this file for the given keywords and identifies the lines (i.e. genes) that match.
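
The kind of lookup described here can be approximated from the command line. The sketch below is not Genome Projector's own search code; it only shows a case-insensitive keyword match against the gene name column of a .coord-style file. The file content and the coordinate format (x1,y1,x2,y2 in one field) are assumptions made for illustration.

```shell
# Hypothetical .coord content: Feature ID, feature type, gene name, coordinates.
# The coordinate format used here (x1,y1,x2,y2) is an assumption for illustration.
coord=$(mktemp)
printf 'FEATURE2\tCDS\tthrL\t197,91,213,98\n'     > "$coord"
printf 'FEATURE4\tCDS\tthrA\t234,91,849,98\n'    >> "$coord"
printf 'FEATURE28\tCDS\tdnaK\t3190,91,3669,98\n' >> "$coord"

# Print every line whose gene name (column 3) contains the keyword, ignoring case.
keyword="THR"
matches=$(awk -F'\t' -v kw="$keyword" 'index(tolower($3), tolower(kw)) > 0' "$coord")
echo "$matches"
rm -f "$coord"
```

With the sample data, the thrL and thrA lines are printed and the dnaK line is not.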

Adding custom annotation to your maps with GFF

GFF (General Feature Format) is a common format for storing annotations in a portable manner. As with Genome Projector annotation files, each line of a GFF file carries the annotation for one feature. Annotations in GFF can therefore be added to the Genome Projector annotation by concatenating the lines that represent the same feature.

This can be done with the UNIX shell command join. For example, a Genome Projector annotation (gene name in the third column) and a GFF annotation (gene name in the first column) can be joined with the following command:

join -t "	" -1 3 -2 1 data.coord annotation.gff > result.coord

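This command matches the 3rd column of the first file against the 1st column of the second file, using Tab as the field separator (-t). A literal Tab can be entered in the shell by typing “Control-v Tab”. Note that join expects both files to be sorted on the join field, and by default it writes the join field as the first column of each output line.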
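
A slightly more defensive variant of the join is sketched below. It sorts both inputs on the join field first and uses the -o option to keep the original .coord column order, appending the new annotation at the end. The input files are hypothetical: the .coord lines use an assumed coordinate format, and annotation.gff is reduced to a two-column stand-in (gene name, annotation text) rather than a full GFF file.

```shell
# Use the same byte-wise collation for sort and join.
LC_ALL=C
export LC_ALL

# Hypothetical inputs (file contents are illustrative only).
work=$(mktemp -d)
printf 'FEATURE2\tCDS\tthrL\t197,91,213,98\nFEATURE4\tCDS\tthrA\t234,91,849,98\n' > "$work/data.coord"
printf 'thrL\tleader peptide note\nthrA\tkinase note\n' > "$work/annotation.gff"

# A literal tab, so it does not have to be typed with Control-v Tab.
tab=$(printf '\t')

# join requires both inputs to be sorted on the join field.
sort -t "$tab" -k3,3 "$work/data.coord"     > "$work/data.sorted"
sort -t "$tab" -k1,1 "$work/annotation.gff" > "$work/annotation.sorted"

# -o lists the output fields explicitly: the four .coord columns in their
# original order, followed by the second column of the annotation file.
joined=$(join -t "$tab" -1 3 -2 1 -o 1.1,1.2,1.3,1.4,2.2 \
    "$work/data.sorted" "$work/annotation.sorted")
echo "$joined"
rm -rf "$work"
```

Without -o, the gene name would be moved to the first column of the result, which would no longer match the column order that the .coord file started with.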

Modifying displayed annotations

When annotations are added to Genome Projector while preserving the first 13 columns described above, Genome Projector simply treats the newly added annotations as an extension of the note field. Changing what is displayed upon searching requires a little programming.
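
Before relying on this behavior, it may be worth checking that every line of the merged file still has at least 13 tab-separated columns. The sketch below does this with awk on a small hypothetical file containing one complete line (13 columns plus one appended annotation) and one deliberately truncated line; the file content is made up for the example.

```shell
# Hypothetical merged file: one valid 14-column line and one truncated line.
merged=$(mktemp)
printf 'FEATURE2\tCDS\t197\t91\t213\t98\tthrL\tb0001\t16127995\tLPT_ECOL6\t\tthr operon leader peptide\tnote text\textra annotation\n' > "$merged"
printf 'FEATURE4\tCDS\t234\t91\n' >> "$merged"

# Report the line numbers that have fewer than 13 tab-separated fields.
short_lines=$(awk -F'\t' 'NF < 13 { print NR }' "$merged")
echo "lines with fewer than 13 columns: $short_lines"
rm -f "$merged"
```

An empty result means that all lines keep the original 13 columns in place.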

Two CGI files are responsible for showing the annotations: index.cgi and info.cgi. index.cgi is responsible for displaying the search results (pins). info.cgi shows the actual annotations in balloons.

Inside each CGI file, there is a line that parses the annotation file as follows:

my ($feat, $typed, $x, $y, $x2, $y2, $gene, $locustag, $gi, $sw, $pdb, $annotation, $note)
        = split(/\t/, $_, 12);

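Changing this code to match the database content, and modifying the subsequent code for data presentation, changes how the search results and annotations are displayed. The third argument to split caps the number of fields returned, with the last field receiving the remainder of the line, so it must be raised along with the variable list if appended columns are to be captured separately.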

using_custom_database_for_searching.txt · Last modified: 2009/04/19 15:41 (external edit)