The report should be submitted by email to both instructors as a Word or PDF file named "b630_26_hw1_Your_Name.doc" or "b630_26_hw1_Your_Name.pdf". The string "b630_26_hw1_Your_Name" should also be included in the message subject line.
Include all the appropriate references to the tools, data sources, and literature used.
Please review the new GMU Academic Standards Code and the associated FAQ, including the parts related to the use of LLMs and other AI tools.
1. Identify the protein encoded by the DNA sequence corresponding to your G-number using two different approaches. Find the entries for this protein in two different databases.
>Last digit of G-number: 0, 1, 2
ATGGCAGGACAAGCGTTTAGAAAGTTTCTTCCACTCTTTGACCGAGTATTGGTTGAAAGGAGTGCTGCTG
AAACTGTAACCAAAGGAGGCATTATGCTTCCAGAAAAATCTCAAGGAAAAGTATTGCAAGCAACAGTAGT
CGCTGTTGGATCGGGTTCTAAAGGAAAGGGTGGAGAGATTCAACCAGTTAGCGTGAAAGTTGGAGATAAA
GTTCTTCTCCCAGAATATGGAGGCACCAAAGTAGTTCTAGATGACAAGGATTATTTCCTATTTAGAGATG
GTGACATTCTTGGAAAGTACGTAGACTGA
>Last digit of G-number: 3, 4, 5, 6
ATGGTGAAGCAGATCGAGAGCAAGACTGCTTTTCAGGAAGCCTTGGACGCTGCAGGTGATAAACTTGTAG
TAGTTGACTTCTCAGCCACGTGGTGTGGGCCTTGCAAAATGATCAAGCCTTTCTTTCATTCCCTCTCTGA
AAAGTATTCCAACGTGATATTCCTTGAAGTAGATGTGGATGACTGTCAGGATGTTGCTTCAGAGTGTGAA
GTCAAATGCATGCCAACATTCCAGTTTTTTAAGAAGGGACAAAAGGTGGGTGAATTTTCTGGAGCCAATA
AGGAAAAGCTTGAAGCCACCATTAATGAATTAGTCTAA
>Last digit of G-number: 7, 8, 9
ATGGCTCAAGAGTTTGTGAACTGCAAAATCCAGCCTGGGAAGGTGGTTGTGTTCATCAAGCCCACCTGCC
CGTACTGCAGGAGGGCCCAAGAGATCCTCAGTCAATTGCCCATCAAACAAGGGCTTCTGGAATTTGTCGA
TATCACAGCCACCAACCACACTAACGAGATTCAAGATTATTTGCAACAGCTCACGGGAGCAAGAACGGTG
CCTCGAGTCTTTATTGGTAAAGATTGTATAGGCGGATGCAGTGATCTAGTCTCTTTGCAACAGAGTGGGG
AACTGCTGACGCGGCTAAAGCAGATTGGAGCTCTGCAGTAA
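One possible step toward Task 1 is to translate the DNA sequence into a protein sequence before searching a protein database. The sketch below is only an illustration, not a prescribed method: it assumes the coding sequence starts at the first base (reading frame 1) and uses the standard genetic code. The function name `translate` and the short test sequence are hypothetical and are not one of the assigned sequences.

```python
# Minimal translation sketch (assumptions: standard genetic code, frame 1,
# input contains only A/C/G/T plus optional whitespace).
from itertools import product

BASES = "TCAG"
# Amino acids in the conventional TCAG codon-table order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate DNA in frame 1, stopping at the first stop codon."""
    # Remove line breaks and spaces so multi-line FASTA bodies can be pasted in.
    seq = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Hypothetical 12-base example: ATG GCA GGA TGA
print(translate("ATGGCAGGATGA"))
```

The resulting protein string could then be pasted into a protein similarity search. A nucleotide-level search on the original DNA would count as a different approach.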
2. Briefly describe your protein's function, its subcellular localization, and any biologically important sites or regions within the protein. Cite both primary and secondary sources to support this information.
3. Find eight proteins homologous to your protein from eight different species, four eukaryotic and four prokaryotic. Analyze quantitative similarity measures between your query protein and these homologs. Explain how this analysis contributes to understanding the protein's biological role and evolutionary history.
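As an illustration of one quantitative similarity measure relevant to Task 3, the sketch below computes percent identity from two already-aligned sequences. It assumes the alignment comes from a separate tool and that gaps are written as "-". Identity is defined here as identical residues divided by the number of aligned columns, which is one of several conventions in use. The function name and the toy alignment are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns (columns with two gaps are ignored)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep every column in which at least one sequence has a residue.
    columns = [
        (a, b) for a, b in zip(aligned_a, aligned_b)
        if not (a == gap and b == gap)
    ]
    if not columns:
        return 0.0
    # A column counts as a match only if both residues are identical and not gaps.
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


# Toy alignment (not real data): 4 identical columns out of 6.
print(round(percent_identity("MKV-LA", "MRVQLA"), 2))
```

Search tools typically report related measures such as E-value, bit score, and query coverage. Using the same definition for all eight homologs keeps the comparison consistent.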
4. Create a flat-file protein database containing the nine proteins identified in Task 3. The database should include eight important characteristics for each protein. Justify your choice of characteristics.
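A flat-file database for Task 4 can be as simple as a tab-delimited text file with one header row and one row per protein. The sketch below shows one way to write and read such a file with Python's standard `csv` module. The four field names and the placeholder values are hypothetical. The actual eight characteristics, and the reasons for choosing them, are left to the report.

```python
import csv
import io

# Hypothetical field names; the assignment asks for eight justified characteristics.
FIELDS = ["accession", "protein_name", "organism", "length_aa"]


def write_flat_file(records, handle):
    """Write records as tab-delimited text with a header row."""
    writer = csv.DictWriter(
        handle, fieldnames=FIELDS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)


def read_flat_file(handle):
    """Read the flat file back into a list of dictionaries (all values are strings)."""
    return list(csv.DictReader(handle, delimiter="\t"))


# Placeholder records only; no real database entries are implied.
records = [
    {"accession": "PLACEHOLDER_1", "protein_name": "query protein",
     "organism": "species A", "length_aa": "0"},
    {"accession": "PLACEHOLDER_2", "protein_name": "homolog 1",
     "organism": "species B", "length_aa": "0"},
]

buffer = io.StringIO()          # stands in for a file opened with open(path, "w")
write_flat_file(records, buffer)
buffer.seek(0)
loaded = read_flat_file(buffer)
print(loaded[0]["accession"])
```

A plain tab-delimited layout opens directly in a spreadsheet and can be searched with ordinary text tools, which is the main appeal of a flat file over a relational database at this scale.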
5. FOR BINF630, BINF530 AND BIOL580 STUDENTS ONLY: Search for homologs of the human transcription factor SOX-9 using BLASTp and PSI-BLAST (3 iterations). Compare and discuss the differences between the search results produced by these two algorithms.