BINF630/BINF530/BIOL580/BINF401 Spring 2026.
Homework 1. Due April 5, 2026.

The report should be submitted by email to both instructors as a Word or PDF file named "b630_26_hw1_Your_Name.doc" or "b630_26_hw1_Your_Name.pdf". The string "b630_26_hw1_Your_Name" should also be included in the message subject line.

Include all the appropriate references to the tools, data sources, and literature used.

Please review the new GMU Academic Standards Code and the associated FAQ, including the parts related to the use of LLMs and other AI tools.

1. Identify the protein encoded by the DNA sequence that corresponds to the last digit of your G-number, using two different approaches. Find the entries for this protein in two different databases.

>Last digit of G-number: 0, 1, 2
ATGGCAGGACAAGCGTTTAGAAAGTTTCTTCCACTCTTTGACCGAGTATTGGTTGAAAGGAGTGCTGCTG
AAACTGTAACCAAAGGAGGCATTATGCTTCCAGAAAAATCTCAAGGAAAAGTATTGCAAGCAACAGTAGT
CGCTGTTGGATCGGGTTCTAAAGGAAAGGGTGGAGAGATTCAACCAGTTAGCGTGAAAGTTGGAGATAAA
GTTCTTCTCCCAGAATATGGAGGCACCAAAGTAGTTCTAGATGACAAGGATTATTTCCTATTTAGAGATG
GTGACATTCTTGGAAAGTACGTAGACTGA

>Last digit of G-number: 3, 4, 5, 6
ATGGTGAAGCAGATCGAGAGCAAGACTGCTTTTCAGGAAGCCTTGGACGCTGCAGGTGATAAACTTGTAG
TAGTTGACTTCTCAGCCACGTGGTGTGGGCCTTGCAAAATGATCAAGCCTTTCTTTCATTCCCTCTCTGA
AAAGTATTCCAACGTGATATTCCTTGAAGTAGATGTGGATGACTGTCAGGATGTTGCTTCAGAGTGTGAA
GTCAAATGCATGCCAACATTCCAGTTTTTTAAGAAGGGACAAAAGGTGGGTGAATTTTCTGGAGCCAATA
AGGAAAAGCTTGAAGCCACCATTAATGAATTAGTCTAA

>Last digit of G-number: 7, 8, 9
ATGGCTCAAGAGTTTGTGAACTGCAAAATCCAGCCTGGGAAGGTGGTTGTGTTCATCAAGCCCACCTGCC
CGTACTGCAGGAGGGCCCAAGAGATCCTCAGTCAATTGCCCATCAAACAAGGGCTTCTGGAATTTGTCGA
TATCACAGCCACCAACCACACTAACGAGATTCAAGATTATTTGCAACAGCTCACGGGAGCAAGAACGGTG
CCTCGAGTCTTTATTGGTAAAGATTGTATAGGCGGATGCAGTGATCTAGTCTCTTTGCAACAGAGTGGGG
AACTGCTGACGCGGCTAAAGCAGATTGGAGCTCTGCAGTAA
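
One possible first step for Task 1 is to translate the coding sequence into protein before searching with it. The following is a minimal sketch, assuming the sequence is in reading frame 1 and uses the standard genetic code. The function name translate and the demo sequence are illustrative and are not part of the assignment.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1, stopping at the first stop codon.

    Whitespace and line breaks are removed first, so a multi-line sequence
    can be pasted in directly. Codons with non-ACGT characters become 'X'.
    """
    dna = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE.get(dna[i:i + 3], "X")
        if amino_acid == "*":  # stop codon ends the protein
            break
        protein.append(amino_acid)
    return "".join(protein)


# Tiny made-up example (not one of the homework sequences).
demo_protein = translate("ATGGCC\nTGA")
print(demo_protein)
```

The translated sequence can then be used as a protein query, while the original DNA can be used directly in a nucleotide-based search, which gives two independent approaches.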

2. Briefly describe your protein's function, its subcellular localization, and any biologically important sites or regions within the protein. Cite both primary and secondary sources to support this information.

3. Find eight proteins homologous to your protein from eight different species, four eukaryotic and four prokaryotic. Analyze quantitative similarity measures between your query protein and these homologs. Explain how this analysis contributes to understanding the protein's biological role and evolutionary history.
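
Percent identity is one of the similarity measures that can be reported in Task 3, alongside values such as E-value, bit score, and query coverage from a search tool. The sketch below shows one way to compute it, assuming the two sequences have already been aligned (equal length, gaps written as "-") and taking the number of aligned columns as the denominator. Other tools may use a different denominator, so the values are not guaranteed to match theirs. The function name and the demo strings are illustrative.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two pre-aligned sequences.

    Columns where both sequences have a gap are ignored. The denominator
    is the number of remaining columns (an assumption; conventions vary).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if not (x == gap and y == gap)
    ]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y and x != gap)
    return 100.0 * matches / len(columns)


# Made-up aligned fragments: 4 identical columns out of 6.
demo_identity = percent_identity("MKT-AY", "MRTLAY")
print(f"{demo_identity:.1f}% identity")
```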

4. Create a flat-file protein database containing the nine proteins from Task 3 (your query protein and its eight homologs). The database should include eight important characteristics for each protein. Justify your choice of characteristics.
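
A flat file for Task 4 can be as simple as a tab-delimited text file with one header line and one line per protein. The sketch below is one possible layout, not a required format. The three field names and all record values are hypothetical placeholders; the eight characteristics and the real data are yours to choose and justify.

```python
import csv
import io

# Hypothetical field names, shown only to illustrate the layout.
FIELDS = ["accession", "organism", "length_aa"]


def to_flat_file(records, fields=FIELDS) -> str:
    """Serialize a list of dicts as tab-delimited text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fields, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def from_flat_file(text: str):
    """Read the tab-delimited text back into a list of dicts (all values are strings)."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


# Placeholder records; replace with the proteins found in Task 3.
records = [
    {"accession": "EXAMPLE_0001", "organism": "Example species A", "length_aa": 100},
    {"accession": "EXAMPLE_0002", "organism": "Example species B", "length_aa": 95},
]

flat_text = to_flat_file(records)
print(flat_text)
```

Writing flat_text to a .txt or .tsv file gives a database that opens in any text editor or spreadsheet program and can be read back with from_flat_file.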

5. FOR BINF630, BINF530 AND BIOL580 STUDENTS ONLY: Search for homologs of the human transcription factor SOX-9 using BLASTp and PSI-BLAST (3 iterations). Compare and discuss the differences between the search results produced by these two algorithms.
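
One way to begin the comparison in Task 5 is to list which hits are shared and which are unique to each search. The sketch below assumes each result has been saved as tab-delimited text with the subject identifier in the second column, as in the BLAST+ tabular output format, and that the PSI-BLAST text contains only the final iteration. The function names and the hit identifiers in the demo are made up.

```python
def subject_ids(tabular_text: str) -> set:
    """Collect subject identifiers (column 2) from tab-delimited hit lines."""
    ids = set()
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        columns = line.split("\t")
        if len(columns) >= 2:
            ids.add(columns[1])
    return ids


def compare_hits(blastp_text: str, psiblast_text: str) -> dict:
    """Split hits into shared, BLASTp-only, and PSI-BLAST-only sets."""
    blastp_hits = subject_ids(blastp_text)
    psiblast_hits = subject_ids(psiblast_text)
    return {
        "shared": blastp_hits & psiblast_hits,
        "blastp_only": blastp_hits - psiblast_hits,
        "psiblast_only": psiblast_hits - blastp_hits,
    }


# Made-up hit tables with placeholder identifiers.
blastp_demo = "query\thit_A\nquery\thit_B\n"
psiblast_demo = "query\thit_A\nquery\thit_B\nquery\thit_C\n"

comparison = compare_hits(blastp_demo, psiblast_demo)
for group, hits in comparison.items():
    print(group, sorted(hits))
```

Set membership alone does not explain why the results differ, so the discussion should also consider the scores and E-values of the hits in each group.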