BINF630/BINF530/BIOL580/BINF401 Spring 2025.
Homework 1. Due March 27, 2025.

The report should be submitted by email to both instructors as a Word or PDF file named "b630_25_hw1_Your_Name.doc" or "b630_25_hw1_Your_Name.pdf". The string "b630_25_hw1_Your_Name" should also be included in the message subject line.

Include all the appropriate references to the tools, data sources, and literature used.

Please review the new GMU Academic Standards Code and the associated FAQ, including the parts related to the use of LLMs and other AI tools.

1. Identify the protein encoded by the DNA sequence corresponding to your G-number using two different approaches. Find the entries for this protein in two different databases.

>Last digit of G-number: 0, 1, 2 
ATGCAGGCTCAACAGTACCAGCAGCAGCGTCGAAAATTTGCAGCTGCCTTCTTGGCATTCATTTTCATAC
TGGCAGCTGTGGATACTGCTGAAGCAGGGAAGAAAGAGAAACCAGAAAAAAAAGTGAAGAAGTCTGACTG
TGGAGAATGGCAGTGGAGTGTGTGTGTGCCCACCAGTGGAGACTGTGGGCTGGGCACACGGGAGGGCACT
CGGACTGGAGCTGAGTGCAAGCAAACCATGAAGACCCAGAGATGTAAGATCCCCTGCAACTGGAAGAAGC
AATTTGGCGCGGAGTGCAAATACCAGTTCCAGGCCTGGGGAGAATGTGACCTGAACACAGCCCTGAAGAC
CAGAACTGGAAGTCTGAAGCGAGCCCTGCACAATGCCGAATGCCAGAAGACTGTCACCATCTCCAAGCCC
TGTGGCAAACTGACCAAGCCCAAACCTCAAGCAGAATCTAAGAAGAAGAAAAAGGAAGGCAAGAAACAGG
AGAAGATGCTGGATTAA

>Last digit of G-number: 3, 4, 5, 6
ATGAAAGTCCTGCTTTGTGACCTGCTGCTGCTCAGTCTCTTCTCCAGTGTGTTCAGCAGTTGTCAGAGGG
ACTGTCTCACATGCCAGGAGAAGCTCCACCCAGCCCTGGACAGCTTCGACCTGGAGGTGTGCATCCTCGA
GTGTGAAGAGAAGGTCTTCCCCAGCCCCCTCTGGACTCCATGCACCAAGGTCATGGCCAGGAGCTCTTGG
CAGCTCAGCCCTGCCGCCCCAGAGCATGTGGCGGCTGCTCTCTACCAGCCGAGAGCTTCGGAGATGCAGC
ATCTGCGGCGAATGCCCCGAGTCCGGAGCTTGTTCCAGGAGCAGGAAGAGCCCGAGCCTGGCATGGAGGA
GGCTGGTGAGATGGAGCAGAAGCAGCTGCAGAAGAGATTTGGGGGCTTCACCGGGGCCCGGAAGTCGGCC
AGGAAGTTGGCCAATCAGAAGCGGTTCAGTGAGTTTATGAGGCAATACTTGGTCCTGAGCATGCAGTCCA
GCCAGCGCCGGCGCACCCTGCACCAGAATGGTAATGTGTAG

>Last digit of G-number: 7, 8, 9
ATGGCCTCCGGTGTGGCTGTCTCTGATGGTGTCATCAAGGTGTTCAACGACATGAAGGTGCGTAAGTCTT
CAACGCCAGAGGAGGTGAAGAAGCGCAAGAAGGCGGTGCTCTTCTGCCTGAGTGAGGACAAGAAGAACAT
CATCCTGGAGGAGGGCAAGGAGATCCTGGTGGGCGATGTGGGCCAGACTGTCGACGACCCCTACGCCACC
TTTGTCAAGATGCTGCCAGATAAGGACTGCCGCTATGCCCTCTATGATGCAACCTATGAGACCAAGGAGA
GCAAGAAGGAGGATCTGGTGTTTATCTTCTGGGCCCCCGAGTCTGCGCCCCTTAAGAGCAAAATGATTTA
TGCCAGCTCCAAGGACGCCATCAAGAAGAAGCTGACAGGGATCAAGCATGAATTGCAAGCAAACTGCTAC
GAGGAGGTCAAGGACCGCTGCACCCTGGCAGAGAAGCTGGGGGGCAGTGCCGTCATCTCCCTGGAGGGCA
AGCCTTTGTGA
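
For question 1, one possible first step is to translate the coding sequence into a protein sequence that can then be used as a database query. The sketch below is only an illustration and not a required method. It assumes the sequence is read in frame 1 starting at the first base and uses the standard genetic code (NCBI translation table 1). The function name `translate` is a hypothetical helper introduced here.

```python
# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1, stopping at the first stop codon.

    Whitespace and line breaks are removed first, so a multi-line
    sequence can be pasted in as-is. Codons containing characters other
    than A, C, G, T are reported as 'X'.
    """
    seq = "".join(dna.split()).upper()
    protein = []
    for pos in range(0, len(seq) - 2, 3):
        amino_acid = CODON_TABLE.get(seq[pos:pos + 3], "X")
        if amino_acid == "*":  # stop codon: end of the open reading frame
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    # Short toy input, not one of the homework sequences.
    print(translate("ATGGCCTAA"))
```

The resulting protein string can be used for a protein-level search, while the original DNA can be used for a nucleotide-level search, which gives two independent routes to the same identification.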

2. Briefly describe the function of your protein and identify biologically important sites/regions in this protein.

3. Find nine proteins homologous to your protein. The proteins should come from different species belonging to the following groups: monkeys, cattle, cats, rodents, bats, birds, fish, worms, and yeast. At least five of these groups should be represented in your list of proteins (mark them in your report).

4. Create a table listing several of the most important characteristics of the proteins you found.

5. Build multiple sequence alignments of the ten sequences identified in Q1 and Q3 using two different MSA algorithms. Generate a tree for each alignment. Compare the alignments and the trees qualitatively and interpret the results of these comparisons.
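
Although question 5 asks for a qualitative comparison, a simple number such as pairwise percent identity can help support the discussion. The following is a minimal sketch under stated assumptions: both inputs are rows taken from the same alignment (equal length, gaps written as "-"), and identity is computed over columns where neither sequence has a gap. Other definitions of the denominator exist, so the convention used should be stated in the report. The function name `percent_identity` is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two rows of the same alignment.

    Columns where either row has a gap ('-') are skipped; this
    denominator choice is an assumption, not a universal standard.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # ignore gapped columns
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


if __name__ == "__main__":
    # Toy rows: 3 matches over 4 ungapped columns.
    print(percent_identity("MKV-L", "MKIAL"))  # 75.0
```

Computing this value for the same pair of sequences in both alignments gives one concrete point of comparison between the two MSA algorithms.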

6. FOR BINF630, BINF530 AND BIOL580 STUDENTS ONLY: Find known protein sequence motifs or patterns in your protein. Explain the statistical results of using these motifs to identify relevant proteins in a protein sequence database.
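
To make the idea of a sequence pattern in question 6 concrete, the sketch below converts a PROSITE-style pattern into a regular expression and lists where it matches in a sequence. It is an illustration only and covers the common syntax elements: "x" for any residue, "[...]" for allowed residues, "{...}" for forbidden residues, "(n)" or "(n,m)" for repetition, and "<" or ">" as terminal anchors outside brackets. Anchors written inside square brackets are not handled. The function names are hypothetical, and the toy sequence is not a real protein.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Convert a PROSITE-style pattern into a Python regular expression."""
    body = pattern.strip().rstrip(".")
    parts = []
    for element in body.split("-"):
        # {ABC} means "any residue except A, B, C".
        element = element.replace("{", "[^").replace("}", "]")
        # (n) or (n,m) is a repetition count.
        element = element.replace("(", "{").replace(")", "}")
        # x is a wildcard position.
        element = element.replace("x", ".")
        # < and > anchor the pattern to the N- or C-terminus.
        element = element.replace("<", "^").replace(">", "$")
        parts.append(element)
    return "".join(parts)


def find_motif(pattern: str, protein: str) -> list[int]:
    """Return 1-based start positions of all (possibly overlapping) matches."""
    regex = prosite_to_regex(pattern)
    # The lookahead lets overlapping matches be reported as well.
    return [m.start() + 1 for m in re.finditer(f"(?=({regex}))", protein.upper())]


if __name__ == "__main__":
    # N-glycosylation site pattern scanned against a toy sequence.
    print(prosite_to_regex("N-{P}-[ST]-{P}."))
    print(find_motif("N-{P}-[ST]-{P}.", "AANGSAANPSA"))
```

Short patterns like this one match many unrelated sequences by chance, which is why the number of true and false hits reported for a motif matters when interpreting a database scan.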