The report should be submitted by email to both instructors as a Word or PDF file named "b630_24_hw1_Your_Name.doc" or "b630_24_hw1_Your_Name.pdf". The string "b630_24_hw1_Your_Name" should also be included in the message subject line.
1. Identify the protein encoded by the DNA sequence corresponding to your G-number using two different approaches. Find the entries for this protein in two different databases.
>Last digit of G-number: 0, 1, 2
ATGCAGGCTCAACAGTACCAGCAGCAGCGTCGAAAATTTGCAGCTGCCTTCTTGGCATTCATTTTCATAC
TGGCAGCTGTGGATACTGCTGAAGCAGGGAAGAAAGAGAAACCAGAAAAAAAAGTGAAGAAGTCTGACTG
TGGAGAATGGCAGTGGAGTGTGTGTGTGCCCACCAGTGGAGACTGTGGGCTGGGCACACGGGAGGGCACT
CGGACTGGAGCTGAGTGCAAGCAAACCATGAAGACCCAGAGATGTAAGATCCCCTGCAACTGGAAGAAGC
AATTTGGCGCGGAGTGCAAATACCAGTTCCAGGCCTGGGGAGAATGTGACCTGAACACAGCCCTGAAGAC
CAGAACTGGAAGTCTGAAGCGAGCCCTGCACAATGCCGAATGCCAGAAGACTGTCACCATCTCCAAGCCC
TGTGGCAAACTGACCAAGCCCAAACCTCAAGCAGAATCTAAGAAGAAGAAAAAGGAAGGCAAGAAACAGG
AGAAGATGCTGGATTAA
>Last digit of G-number: 3, 4, 5, 6
ATGAAAGTCCTGCTTTGTGACCTGCTGCTGCTCAGTCTCTTCTCCAGTGTGTTCAGCAGTTGTCAGAGGG
ACTGTCTCACATGCCAGGAGAAGCTCCACCCAGCCCTGGACAGCTTCGACCTGGAGGTGTGCATCCTCGA
GTGTGAAGAGAAGGTCTTCCCCAGCCCCCTCTGGACTCCATGCACCAAGGTCATGGCCAGGAGCTCTTGG
CAGCTCAGCCCTGCCGCCCCAGAGCATGTGGCGGCTGCTCTCTACCAGCCGAGAGCTTCGGAGATGCAGC
ATCTGCGGCGAATGCCCCGAGTCCGGAGCTTGTTCCAGGAGCAGGAAGAGCCCGAGCCTGGCATGGAGGA
GGCTGGTGAGATGGAGCAGAAGCAGCTGCAGAAGAGATTTGGGGGCTTCACCGGGGCCCGGAAGTCGGCC
AGGAAGTTGGCCAATCAGAAGCGGTTCAGTGAGTTTATGAGGCAATACTTGGTCCTGAGCATGCAGTCCA
GCCAGCGCCGGCGCACCCTGCACCAGAATGGTAATGTGTAG
>Last digit of G-number: 7, 8, 9
ATGGCCTCCGGTGTGGCTGTCTCTGATGGTGTCATCAAGGTGTTCAACGACATGAAGGTGCGTAAGTCTT
CAACGCCAGAGGAGGTGAAGAAGCGCAAGAAGGCGGTGCTCTTCTGCCTGAGTGAGGACAAGAAGAACAT
CATCCTGGAGGAGGGCAAGGAGATCCTGGTGGGCGATGTGGGCCAGACTGTCGACGACCCCTACGCCACC
TTTGTCAAGATGCTGCCAGATAAGGACTGCCGCTATGCCCTCTATGATGCAACCTATGAGACCAAGGAGA
GCAAGAAGGAGGATCTGGTGTTTATCTTCTGGGCCCCCGAGTCTGCGCCCCTTAAGAGCAAAATGATTTA
TGCCAGCTCCAAGGACGCCATCAAGAAGAAGCTGACAGGGATCAAGCATGAATTGCAAGCAAACTGCTAC
GAGGAGGTCAAGGACCGCTGCACCCTGGCAGAGAAGCTGGGGGGCAGTGCCGTCATCTCCCTGGAGGGCA
AGCCTTTGTGA
2. Briefly describe the function of your protein and identify biologically important sites/regions in this protein.
3. Find nine proteins homologous to your protein. The proteins should come from different species belonging to the following groups: monkeys, cattle, cats, rodents, bats, birds, fish, worms, and yeast. At least five of these groups should be represented in your list of proteins (mark them in your report). Create a table listing the most important characteristics of these proteins.
4. Build multiple sequence alignments of the ten sequences identified in Q1 and Q3 using two different MSA algorithms. Generate a tree for each alignment. Compare (qualitatively) the alignments and the trees and interpret the results of these comparisons.
5. Find known protein sequence motifs or patterns in your protein. Explain the statistical results of using these motifs to identify relevant proteins in a protein sequence database.
6. FOR BINF630, BINF530 AND BIOL580 STUDENTS ONLY: Locate one highly conserved 10-residue region and one of the least conserved 10-residue regions in one of the alignments from Q4. Write a regular expression for each region. Search a protein sequence database using these two regular expressions. Report and interpret the results of these two searches.
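As an illustration of the regular-expression step in Q6, the following is a minimal sketch, not a required method. It assumes the aligned region has already been cut out of the alignment as equal-length strings with "-" marking gaps. The function names `column_regex` and `search`, the toy 6-residue block, and the toy sequences are all hypothetical and are not taken from any of the assigned proteins. Each alignment column becomes either a single residue or a character class, and a column containing a gap is made optional. Database search tools may expect a different pattern syntax (for example, PROSITE-style patterns), so a pattern written this way may need to be converted before use.

```python
import re


def column_regex(aligned_block):
    """Build a regex from equal-length aligned segments, one per sequence."""
    if len({len(segment) for segment in aligned_block}) != 1:
        raise ValueError("all aligned segments must have the same length")
    parts = []
    for column in zip(*aligned_block):
        residues = sorted(set(column) - {"-"})
        if not residues:
            continue  # skip columns that contain only gaps
        if len(residues) == 1:
            token = residues[0]
        else:
            token = "[" + "".join(residues) + "]"
        if "-" in column:
            token += "?"  # a gap in any sequence makes this position optional
        parts.append(token)
    return "".join(parts)


def search(pattern, sequences):
    """Return (name, 1-based start, matched text) for every match."""
    regex = re.compile(pattern)
    hits = []
    for name, seq in sequences.items():
        for match in regex.finditer(seq):
            hits.append((name, match.start() + 1, match.group()))
    return hits


# Hypothetical toy data, for illustration only.
toy_block = ["CGEWQW", "CGEWTW", "CAEW-W"]
toy_database = {
    "seq1": "MKCGEWQWSV",
    "seq2": "MKCAEWWSV",
    "seq3": "MKCDEWQW",
}

pattern = column_regex(toy_block)
print(pattern)
for hit in search(pattern, toy_database):
    print(hit)
```

A pattern built from a poorly conserved region will contain wide character classes and many optional positions, so it is expected to match far more unrelated sequences than a pattern built from a highly conserved region. That difference is the kind of result the two searches in Q6 are meant to expose.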