-----------------------------------------------------------------------
BIOINFORMATICS COLLOQUIUM
College of Science
George Mason University
-----------------------------------------------------------------------

Composite Dependency-reflecting Model for Core Promoter Recognition in Vertebrate Genomic DNA Sequences

Ki-Bong Kim
Sangmyung University
(Visiting Professor, George Mason University)

Abstract:

This talk describes the development of a predictive probabilistic model, the composite dependency-reflecting model (CDRM), designed to detect core promoter regions and transcription start sites (TSSs) in vertebrate genomic DNA sequences, an issue of some importance for genome annotation efforts. The model combines first-, second-, third-, and much higher-order or long-range dependencies obtained using the expanded maximal dependency decomposition (EMDD) procedure, which iteratively decomposes data sets into subsets on the basis of the degree and pattern of dependency inherent in the target promoter region. In addition, each decomposed subset is modeled with a first-order Markov model, allowing the predictive model to reflect dependencies between adjacent positions explicitly. In this way, the CDRM allows for potentially complex dependencies between positions in the core promoter region. Such complex dependencies may be closely related to the biological and structural context, since promoter elements are present in various combinations separated by various distances in the sequence. Thus, the CDRM may be appropriate for recognizing core promoter regions and TSSs in vertebrate genomic contigs. To demonstrate the effectiveness of the predictive model, we tested it using standardized data and real core promoters, and compared it with some current representative promoter-finding algorithms.
The developed algorithm showed better accuracy, in terms of specificity and sensitivity, than the promoter-finding algorithms used in the performance comparison.
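
As a rough illustration of the first-order Markov component mentioned in the abstract, the sketch below fits a position-specific first-order Markov model to a set of aligned, equal-length sequences and scores a candidate sequence by its log-likelihood. It is not the speaker's implementation, and it does not cover the EMDD decomposition step. The function names, the pseudocount, the position-specific form of the model, and the toy training sequences are all assumptions made for illustration.

```python
import math

ALPHABET = "ACGT"


def train_first_order(seqs, pseudocount=1.0):
    """Fit a position-specific first-order Markov model (illustrative only).

    seqs: aligned DNA strings of equal length over A/C/G/T.
    Returns (init_p, trans), where init_p[b] is P(first base = b) and
    trans[i - 1][a][b] is P(base at position i = b | base at position i - 1 = a).
    """
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")

    # Distribution over the first position, smoothed with pseudocounts.
    init_counts = {b: pseudocount for b in ALPHABET}
    for s in seqs:
        init_counts[s[0]] += 1
    init_total = sum(init_counts.values())
    init_p = {b: c / init_total for b, c in init_counts.items()}

    # One smoothed transition table per pair of adjacent positions.
    trans = []
    for i in range(1, length):
        counts = {a: {b: pseudocount for b in ALPHABET} for a in ALPHABET}
        for s in seqs:
            counts[s[i - 1]][s[i]] += 1
        probs = {}
        for a in ALPHABET:
            row_total = sum(counts[a].values())
            probs[a] = {b: counts[a][b] / row_total for b in ALPHABET}
        trans.append(probs)
    return init_p, trans


def log_likelihood(seq, model):
    """Log-probability of seq under the fitted model."""
    init_p, trans = model
    ll = math.log(init_p[seq[0]])
    for i in range(1, len(seq)):
        ll += math.log(trans[i - 1][seq[i - 1]][seq[i]])
    return ll


# Toy training set (made up for this sketch, not real promoter data).
model = train_first_order(["TATAAA", "TATAAA", "TATATA"])
print(log_likelihood("TATAAA", model) > log_likelihood("GCGCGC", model))
```

In a decomposition-based scheme of the kind the abstract describes, one such model would presumably be fitted per subset; how the subsets are chosen and how their scores are combined is left open here.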