Exciting Final Project

Group 1: Ananda Mondal and Guo Yu Lu

Predicting Viral Genome Type using Amino Acid Preference (AAP) Distribution: Studying the Effect of Gene Homology

An earlier study [1] showed that the genome type of a mammalian virus can be predicted from its amino acid preference distribution. This project studies whether gene homology biases the prediction of viral genome type.
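As a rough illustration of the kind of feature involved, the sketch below computes a distribution over the 20 standard amino acids for a set of protein sequences. It assumes that the AAP distribution can be approximated by relative amino acid frequency; the exact definition used in [1] may differ, and the function name `aap_distribution` is hypothetical.

```python
from collections import Counter

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def aap_distribution(proteins):
    """Return the relative frequency of each standard amino acid.

    Assumption: AAP is approximated by plain relative frequency,
    pooled over all sequences. Non-standard symbols are ignored.
    """
    counts = Counter()
    for seq in proteins:
        counts.update(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return {aa: 0.0 for aa in AMINO_ACIDS}
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}

# Toy input, not real viral proteins.
dist = aap_distribution(["MKTAYIAK", "MLLAVLYC"])
```

The resulting 20-dimensional vector could serve as the input to a classifier. One possible way to test for homology bias would be to compare prediction accuracy with and without homologous genes shared between the training and test sets.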

Group 2: Katie Langley & Andrew Cron

Promoter prediction using multiple algorithms

1. Identify the promoter of the Discoidin Domain Receptor 2 (DDR2) gene in mouse by comparing the results of various available programs with one another and with candidate promoter regions found experimentally in the laboratory.
2. Compare the promoter sequences found in mouse with those of other species (rat, Drosophila, human, frog, chick, Xenopus) to determine how well the promoter is conserved.
3. Create a web interface that integrates the outputs of the various available programs into a single, simple promoter discovery output file.

Group 3: Michael Bryson and Dylan Kane

Structural motif discovery

While sequence motif discovery is an interesting problem with huge potential, a closely related problem is structural motif discovery: given a collection of protein structures with a known relationship, is it possible to identify a profile that represents their common areas of structure? Although the two problems are defined similarly, structural motif discovery has a more involved formulation. Sequences have an intuitive representation as strings over a given alphabet, while protein structure has several possible representations.
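To make the contrast concrete, the sketch below shows one commonly used structure representation, a pairwise distance matrix over residue coordinates. This is only an illustration of one possible choice, not necessarily the representation this project will adopt; the coordinates are made-up values standing in for C-alpha positions.

```python
import math

def distance_matrix(coords):
    """Pairwise Euclidean distances between 3D residue coordinates.

    Unlike raw coordinates, this matrix does not change when the
    whole structure is rotated or translated.
    """
    return [[math.dist(a, b) for b in coords] for a in coords]

# Hypothetical C-alpha coordinates (in angstroms) for a 3-residue fragment.
fragment = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 5.0)]
dm = distance_matrix(fragment)
```

Because the matrix is invariant to rigid motion, two structures can be compared without first superimposing them, which is one reason such representations are attractive for motif discovery.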

Group 4: Haiwei Luo & Shiva

Eukaryotic protein localization prediction

We want to build a hidden Markov model (HMM) for each set of proteins that share a specific localization in the cell. The signal and non-signal parts of the sequences will be used as hidden states. To improve the final results, we also want to incorporate protein-protein interaction information and the Gene Ontology database: because proteins must be physically close to each other in order to interact, we can rely on the localization of other proteins that interact with our query protein. The localization of those well-studied interacting proteins can be obtained by searching the Gene Ontology database. Alternatively, if the query protein has already been well studied, we can search the Gene Ontology database directly for its localization.
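The two-state idea can be sketched with a small Viterbi decoder that labels each residue as signal or non-signal. This is a minimal sketch under stated assumptions: the state names, the four-letter toy alphabet, and every probability below are invented for illustration, whereas a real model would be trained on proteins with known localization.

```python
import math

STATES = ("signal", "non_signal")

# Hypothetical toy parameters, not estimated from real data.
START_P = {"signal": 0.9, "non_signal": 0.1}
TRANS_P = {
    "signal": {"signal": 0.8, "non_signal": 0.2},
    "non_signal": {"signal": 0.01, "non_signal": 0.99},
}
EMIT_P = {
    "signal": {"L": 0.5, "A": 0.3, "K": 0.1, "D": 0.1},
    "non_signal": {"L": 0.1, "A": 0.1, "K": 0.4, "D": 0.4},
}

def viterbi(seq, start_p, trans_p, emit_p, floor=1e-6):
    """Return the most likely hidden-state path for seq (log space)."""
    if not seq:
        return []

    def log_emit(state, symbol):
        # Symbols missing from the emission table get a small floor probability.
        return math.log(emit_p[state].get(symbol, floor))

    # scores[s]: best log-probability of any path ending in state s.
    scores = {s: math.log(start_p[s]) + log_emit(s, seq[0]) for s in STATES}
    backpointers = []
    for symbol in seq[1:]:
        new_scores, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: scores[p] + math.log(trans_p[p][s]))
            new_scores[s] = (scores[prev] + math.log(trans_p[prev][s])
                             + log_emit(s, symbol))
            pointers[s] = prev
        scores = new_scores
        backpointers.append(pointers)

    # Trace back from the best final state.
    state = max(STATES, key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path

labels = viterbi("LLAKDK", START_P, TRANS_P, EMIT_P)
```

With one such model per localization class, a query protein could be scored against each model and assigned to the best-scoring class. Interaction partners and Gene Ontology annotations would then act as additional evidence on top of the sequence score.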