Introduction

Unlike eukaryotes, many bacteria have circular chromosomes, which exhibit various symmetries and polarities, including the strand preference of genes between the leading and lagging strands, oligonucleotide orientation, and base compositional bias. These symmetries are all defined by a finite, symmetrically located pair of replication origin and terminus. However, experimental evidence for the replication origin and terminus is still limited, and the majority of discussions currently rely on in silico predictions based on the compositional bias of guanine and cytosine, which arises from the difference in replication mechanisms between the leading and lagging strands. Such predictions sometimes yield highly asymmetric pairs of replication origin and terminus, and a comprehensive study of the cause of compositional symmetry in bacteria is therefore still lacking. On the other hand, the dif sequence, a conserved 28 bp sequence element targeted by a tyrosine recombinase during the resolution of malformed chromosome dimers at cell division, has recently been suggested as a new marker of the replication terminus. The dif sequences, however, have so far been identified in only a limited number of organisms.
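As an illustration of the kind of in silico prediction referred to above, the following minimal sketch computes a cumulative guanine/cytosine skew along a chromosome and takes its minimum and maximum as approximate origin and terminus positions. This is a generic textbook-style approach, not necessarily the procedure used in the studies discussed here; the function names, the window size, and the convention that the leading strand is G-rich are assumptions made for the example.

```python
def cumulative_gc_skew(seq, window=1000):
    """Return the cumulative (G - C) / (G + C) skew, one value per window."""
    seq = seq.upper()
    cumulative = []
    total = 0.0
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        if g + c:  # skip windows without G or C to avoid division by zero
            total += (g - c) / (g + c)
        cumulative.append(total)
    return cumulative


def predict_origin_terminus(seq, window=1000):
    """Approximate origin (skew minimum) and terminus (skew maximum).

    Assumes the leading strand is enriched in G over C, so the cumulative
    skew falls until the origin and rises until the terminus. Positions are
    reported as the end coordinate of the extreme window.
    """
    skew = cumulative_gc_skew(seq, window)
    ori_idx = min(range(len(skew)), key=skew.__getitem__)
    ter_idx = max(range(len(skew)), key=skew.__getitem__)
    return (ori_idx + 1) * window, (ter_idx + 1) * window
```

On real genomes the skew signal can be weak or noisy, which is one reason such predictions may place the origin and terminus asymmetrically.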

Recursive Hidden Markov Modeling (RHMM)

We applied a Hidden Markov Model (HMM), as implemented in HMMER2, recursively to predict dif sequences. First, to build a profile HMM, we predicted the dif sequences of 28 organisms of the genus Escherichia by fuzzy matching (Perl module String::Approx), using the dif of E. coli K12 as the query and allowing up to 8 bp of substitutions but no insertions or deletions. Second, we calculated the similarity between the XerCD amino acid sequences of Escherichia and those of each target organism to determine the order of prediction. Finally, following that order of XerCD similarity, we predicted the dif sequences recursively, including in organisms of other phyla. This novel algorithm, named Recursive Hidden Markov Modeling (RHMM), is discussed here for this purpose. Using this method, dif sequences were identified in 714 chromosomes harbored by 641 organisms.
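The seeding step can be pictured with the following minimal sketch of a substitution-only fuzzy search. It is not the String::Approx implementation used in this work: it simply slides a window of the query length along one strand and keeps windows within a mismatch budget, which corresponds to allowing mutations but no insertions or deletions. The function names and the short toy query are hypothetical; a real search would use the 28 bp dif of E. coli K12 as the query and would also need to scan the reverse complement.

```python
def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def fuzzy_hits(genome, query, max_mismatches=8):
    """Return (position, mismatches, matched_sequence) for every window of
    the genome within max_mismatches substitutions of the query.

    No insertions or deletions are allowed: every candidate has exactly the
    length of the query. Hits are sorted by mismatch count, then position.
    """
    genome = genome.upper()
    query = query.upper()
    k = len(query)
    hits = []
    for i in range(len(genome) - k + 1):
        window = genome[i:i + k]
        d = hamming(window, query)
        if d <= max_mismatches:
            hits.append((i, d, window))
    return sorted(hits, key=lambda h: (h[1], h[0]))


# Toy usage with a hypothetical 8 bp query and a budget of one mismatch.
best = fuzzy_hits("TTTTCGTACCTATTTT", "CGTACGTA", max_mismatches=1)
```

In a recursive scheme such as the one described above, the best hits from closely related organisms would be aligned into a profile, and that profile would then be used to search the next most similar organisms.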