In a local installation of Genome Projector, annotations are stored in the data folder inside each View folder (the directory structure of a Genome Projector installation is described separately). Like the zoomable map files, each annotation file is named with the accession number of its genome. Each file is essentially a tab-delimited annotation table in which every line represents one feature, such as a CDS, tRNA, or rRNA. An example from the Escherichia coli K12 Genome Map is given below.
FEATURE2 CDS 197 91 213 98 thrL b0001 16127995 LPT_ECOL6 thr operon leader peptide go_process: threonine biosynthesis [goid 0009088]
FEATURE4 CDS 234 91 849 98 thrA b0002 16127996 AK1H_ECOLI bifunctional aspartokinase I/homeserine dehydrogenase I multifunctional homotetrameric enzyme that catalyzes the phosphorylation of aspartate to form aspartyl-4-phosphate as well as conversion of aspartate semialdehyde to homoserine; functions in a number of amino acid biosynthetic pathways; go_component: cytoplasm [goid 0005737]; go_process: threonine biosynthesis [goid 0009088]; go_process: methionine biosynthesis [goid 0009086]; go_process: homoserine biosynthesis [goid 0009090]
FEATURE6 CDS 850 91 1083 98 thrB b0003 16127997 KHSE_ECOLI homoserine kinase catalyzes the formation of O-phospho-L-homoserine from L-homoserine in threonine biosynthesis from asparate; go_component: cytoplasm [goid 0005737]; go_process: threonine biosynthesis [goid 0009088]
FEATURE8 CDS 1083 91 1405 98 thrC b0004 16127998 THRC_ECOLI threonine synthase catalyzes the formation of L-threonine from O-phospho-L-homoserine; go_component: cytoplasm [goid 0005737]; go_process: threonine biosynthesis [goid 0009088]
FEATURE10 CDS 1458 91 1532 98 yaaX b0005 16127999 YAAX_ECOLI hypothetical protein orf, hypothetical protein
FEATURE12 CDS 1570 102 1764 109 yaaA b0006 16128000 YAAA_ECOLI hypothetical protein
FEATURE14 CDS 1782 102 2139 109 yaaJ b0007 16128001 YAAJ_ECOLI predicted transporter inner membrane transport protein
FEATURE16 CDS 2209 91 2447 98 talB b0008 16128002 TAL1_SHISS transaldolase B go_component: cytoplasm [goid 0005737]; go_process: pentose-phosphate shunt, non-oxidative branch [goid 0009052]
FEATURE18 CDS 2476 91 2623 98 mogA b0009 16128003 MOG_SHIFL molybdenum cofactor biosynthesis protein forms a trimer; related to eukaryotic protein gephyrin; functions during molybdenum cofactor biosynthesis; go_component: cytoplasm [goid 0005737]; go_process: Mo-molybdopterin cofactor biosynthesis [goid 0006777]
FEATURE20 CDS 2632 102 2773 109 yaaH b0010 16128004 YAAH_ECO57 conserved inner membrane protein associated with acetate transport orf, hypothetical protein
FEATURE22 CDS 2810 102 2989 109 yaaW b0011 16128005 YAAW_ECOLI hypothetical protein
FEATURE24 CDS 2857 91 2978 98 htgA b0012 90111078 HTGA_ECOLI hypothetical protein positive regulator for sigma 32 heat shock promoters
FEATURE26 CDS 2995 102 3096 109 yaaI b0013 16128007 YAAI_ECOLI hypothetical protein orf, hypothetical protein
FEATURE28 CDS 3190 91 3669 98 dnaK b0014 16128008 DNAK_ECOLI molecular chaperone DnaK heat shock protein 70; assists in folding of nascent polypeptide chains; refolding of misfolded proteins; utilizes ATPase activity to help fold; co-chaperones are DnaJ and GrpE; multiple copies in some bacteria; go_component: cytoplasm [goid 0005737]; go_process: protein folding [goid 0006457]; go_process: response to osmotic stress [goid 0006970]
FEATURE30 CDS 3692 91 3974 98 dnaJ b0015 16128009 DNAJ_ECOLI chaperone Hsp40, co-chaperone with DnaK chaperone with DnaK; heat shock protein
FEATURE33 CDS 4011 91 4289 98 insL-1 b0016 16128010 INSL_ECOLI IS186/IS421 transposase
FEATURE35 CDS 4337 102 4390 109 mokC b0018 16128012 MOKC_ECOLI regulatory protein for HokC, overlaps CDS of hokC regulatory peptide whose translation enables hokC (gef) expression
FEATURE37 CDS 4337 102 4375 109 hokC b4412 49175991 HOKC_SHIFL toxic membrane protein, small small toxic membrane polypeptide
FEATURE41 CDS 4522 91 4813 98 nhaA b0019 16128013 NHAA_ECOLI pH-dependent sodium/proton antiporter exports sodium by using the electrochemical proton gradient to allow protons into the cell; functions in adaptation to high salinity and alkaline pH; activity increases at higher pH; downregulated at acidic pH; go_component: inner membrane [goid 0019866]; go_process: response to pH [goid 0009268]
FEATURE43 CDS 4828 91 5055 98 nhaR b0020 16128014 NHAR_ECOLI DNA-binding transcriptional activator transcriptional activator of nhaA
FEATURE46 CDS 5102 102 5228 109 insB-1 b0021 16128015 INSB1_ECOLI IS1 transposase InsAB'
FEATURE48 CDS 5208 102 5277 109 insA-1 b0022 16128016 INSA_ECOLI IS1 repressor protein InsA
FEATURE50 CDS 5353 102 5419 109 rpsT b0023 16128017 RS20_SHIDS 30S ribosomal protein S20 binds directly to the 16S rRNA and is involved in post-translational inhibition of arginine and ornithine decarboxylase; go_component: cytosolic ribosome (sensu Bacteria) [goid 0009281]; go_component: cytoplasm [goid 0005737]; go_function: structural constituent of ribosome [goid 0003735]; go_process: protein biosynthesis [goid 0006412]
FEATURE52 CDS 5445 91 5499 98 yaaY b0024 16128018 YAAY_ECOLI hypothetical protein orf, hypothetical protein
FEATURE54 CDS 5501 91 5737 98 ribF b0025 16128019 RIBF_ECOLI hypothetical protein go_component: cytoplasm [goid 0005737]
The columns are:
- G-language Feature ID
- feature type
- x1 coordinate (x of the top-left corner of the box representing this gene in the Genome Projector Genome Map View)
- y1 coordinate (y of the same top-left corner)
- x2 coordinate (x of the bottom-right corner of the box)
- y2 coordinate (y of the same bottom-right corner)
- gene name
- locus tag
- NCBI gi
- UniProt Entry ID
- PDB ID (not present in the above example)
- annotation (product description, e.g. "thr operon leader peptide")
- note (additional notes, including GO terms)
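To make the column layout concrete, here is a minimal shell sketch that uses awk to pull the gene name, locus tag and box coordinates (columns 7, 8 and 3 to 6 in the list above) out of an annotation file. The function name gp_gene_box is hypothetical and not part of Genome Projector; it assumes the fields are separated by single Tab characters and that empty fields, such as a missing PDB ID, are left in place.

```shell
# Hypothetical helper (not part of Genome Projector).
# gp_gene_box ANNOTATION_FILE GENE
# Prints gene name, locus tag, x1, y1, x2, y2 (columns 7, 8, 3, 4, 5, 6)
# for every feature whose gene name (column 7) equals GENE.
gp_gene_box() {
    # -F '\t' splits on each single Tab, so empty fields (e.g. a missing
    # PDB ID) keep their position instead of being collapsed.
    awk -F '\t' -v OFS='\t' -v gene="$2" \
        '$7 == gene { print $7, $8, $3, $4, $5, $6 }' "$1"
}

# Example usage (ACCESSION_FILE is a placeholder for an annotation file
# named after a genome accession number):
# gp_gene_box ACCESSION_FILE thrA
```

Using awk with an explicit Tab separator, rather than the shell's own word splitting, matters here because the shell treats consecutive Tabs as one separator and would shift every column after an empty PDB field.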
Similarly, when a custom Genome Projector View is generated, a .coord file is created in the data folder within the working directory, alongside the folder containing the generated zoomable map. This file contains the following columns:
- G-language Feature ID
- feature type
- gene name
Genome Projector searches this file for the given keywords and identifies the matching lines (i.e. genes).
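The exact matching rules Genome Projector applies are not described here. Purely as an illustration, the following sketch assumes a simple case-insensitive substring match against each whole line of a .coord file; the function name gp_search is hypothetical.

```shell
# Hypothetical helper (not part of Genome Projector); the matching rule
# below is an assumption, not Genome Projector's actual search logic.
# gp_search COORD_FILE KEYWORD
# Prints every line (feature) that contains KEYWORD, ignoring case.
gp_search() {
    # index() does a plain substring search, so characters such as "."
    # or "*" in the keyword are not treated as regular-expression syntax.
    awk -v kw="$2" 'index(tolower($0), tolower(kw)) > 0' "$1"
}

# Example usage:
# gp_search data.coord dnak
```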
GFF (General Feature Format) is a common format for storing annotations in a portable manner. As with Genome Projector annotations, each line of a GFF file carries the annotation for one feature, so GFF annotations can easily be added to Genome Projector annotations by concatenating the lines that describe the same feature.
This can be done with the UNIX shell command join. For example, a Genome Projector annotation (gene name in the third column) can be joined with a GFF annotation arranged so that the gene name is in the first column (in standard GFF, the first column holds the sequence ID) as follows:
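sort -t $'\t' -k 3,3 data.coord > data.sorted
sort -t $'\t' -k 1,1 annotation.gff > annotation.sorted
join -t $'\t' -1 3 -2 1 data.sorted annotation.sorted > result.coord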
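This joins the 3rd column of the first file with the 1st column of the second file. The -t option sets Tab as the field separator for both input and output; $'\t' is bash (and zsh/ksh) syntax for a Tab, and in other shells a literal Tab can be typed inside quotes with “Control-v Tab”. join requires both inputs to be sorted on the join field, and it writes the join field (the gene name) first, followed by the remaining fields of the first file and then those of the second file.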
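Because join places the gene name first, the joined lines no longer start with the Feature ID. The following sketch wraps the sort and join steps and then uses awk to move the Feature ID and feature type back in front of the gene name, so the original .coord columns come first and the GFF columns follow. The function name gp_join_coord is hypothetical, and it assumes both inputs are Tab-separated with no header line.

```shell
# Hypothetical helper (not part of Genome Projector).
# gp_join_coord COORD_FILE ANNOTATION_FILE
# Joins on gene name (column 3 of the .coord file, column 1 of the
# annotation file) and prints: Feature ID, feature type, gene name,
# then the remaining annotation columns.
gp_join_coord() {
    tab=$(printf '\t')                 # portable way to get a Tab character
    sorted_coord=$(mktemp) || return 1
    sorted_annot=$(mktemp) || return 1

    # join needs both inputs sorted on their join fields.
    sort -t "$tab" -k 3,3 "$1" > "$sorted_coord"
    sort -t "$tab" -k 1,1 "$2" > "$sorted_annot"

    # join output: gene name, other .coord fields, other annotation fields.
    join -t "$tab" -1 3 -2 1 "$sorted_coord" "$sorted_annot" |
        awk -F '\t' -v OFS='\t' '{
            # Restore the .coord column order: Feature ID, type, gene name.
            line = $2 OFS $3 OFS $1
            for (i = 4; i <= NF; i++) line = line OFS $i
            print line
        }'

    rm -f "$sorted_coord" "$sorted_annot"
}

# Example usage:
# gp_join_coord data.coord annotation.gff > result.coord
```

Genes that appear in only one of the two files are dropped, which is join's default behavior; join's -a option can keep such unpaired lines if they are needed.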
As long as the first 13 columns described above are kept intact, Genome Projector simply treats any newly added columns as an extension of the note field. Changing what is displayed upon searching requires a little programming.
Two CGI files are responsible for showing the annotations: index.cgi displays the search results (pins), and info.cgi shows the annotations themselves in balloons.
Each of these CGI files contains a line that parses the annotation file:
# Split into at most 13 fields; $note (the 13th) keeps any additional tab-separated columns.
my ($feat, $typed, $x, $y, $x2, $y2, $gene, $locustag, $gi, $sw, $pdb, $annotation, $note) = split(/\t/, $_, 13);
Changing this line to match the columns of your annotation file, and modifying the subsequent code that presents the data, changes how the search results and annotations are displayed.
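Perl's split with a limit of N returns at most N fields, the last of which keeps the rest of the line, Tabs included. That is why extra columns end up in the note field. The following awk sketch mimics a limit of 13, matching the 13 variables in the parsing line above, and prints the gene name together with the extended note. The function name gp_note is hypothetical.

```shell
# Hypothetical helper (not part of Genome Projector).
# gp_note ANNOTATION_FILE
# Mimics a 13-field split on Tabs: fields 1-12 are taken as-is, and the
# 13th field keeps the rest of the line, including any extra columns.
# Prints the gene name (column 7) and that extended note, Tab-separated.
gp_note() {
    awk -F '\t' -v OFS='\t' '{
        note = $13
        for (i = 14; i <= NF; i++) note = note OFS $i   # re-attach extra columns
        print $7, note
    }' "$1"
}

# Example usage:
# gp_note result.coord
```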