Assignment 2 called for selecting, from a public database, a human enzyme sequence that has no known 3-dimensional structure but has homologs in PDB sharing at least 30% and no more than 70% sequence identity. The 3-dimensional structure of the query sequence is then predicted using two algorithms of choice and analyzed. A residue substitution of interest is then introduced, and the 3-dimensional structure is predicted again from the mutated sequence. Finally, the impact of the residue substitution on the predicted conformations is analyzed. The purpose of this assignment is to gain insight into structure prediction tools and algorithms and to assess, in a real example, how mutations can alter protein conformation.
1) Selecting the query sequence
UniProtKB was searched using NOT database:(type:pdb) AND organism:"Homo sapiens [9606]" AND keyword:enzyme to find good query sequence candidates for analysis. In accordance with the assignment, these sequences were all human enzymes without known PDB structures. These sequences were then manually input into PDB and a BLAST analysis was performed to determine the sequence identities between the query sequences and target sequences in PDB. Any target sequences from structures in PDB with sequence identities less than 30% or greater than 70% were discarded.
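The identity filter described above can be sketched in a few lines of Python. This is only an illustration of the 30% to 70% window: the `BlastHit` record, the `filter_by_identity` helper, and the example hits are hypothetical and do not correspond to the actual BLAST output used in this assignment.

```python
from dataclasses import dataclass


@dataclass
class BlastHit:
    """One hit from a sequence search (hypothetical record layout)."""
    pdb_id: str
    percent_identity: float


def filter_by_identity(hits, low=30.0, high=70.0):
    """Keep hits whose identity lies in the inclusive window [low, high]."""
    return [h for h in hits if low <= h.percent_identity <= high]


# Placeholder hits, invented purely for illustration.
example_hits = [
    BlastHit("HIT_A", 25.0),  # below 30%: discarded
    BlastHit("HIT_B", 36.5),  # inside the window: kept
    BlastHit("HIT_C", 82.0),  # above 70%: discarded
]

kept = filter_by_identity(example_hits)
print([h.pdb_id for h in kept])
```

The bounds are treated as inclusive here, matching "at least 30% but no more than 70%" in the assignment instructions.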
In the end, the sequence for methylenetetrahydrofolate reductase was selected. This sequence is 656 amino acids long, although when BLAST was run on it with default parameters, only approximately 279-282 residues showed sequence similarity with existing PDB structures. Twelve PDB entries matched the query sequence, although these 12 structures contained only 3 unique sequences. These PDB entries had between 34% and 39% sequence identity with the query sequence. For more information regarding the alignments, please see the full summary from PDB. The complete query protein sequence is provided below.
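To put the aligned region in perspective, the fraction of the query covered by the PDB hits can be worked out directly from the numbers above. The short sketch below does that arithmetic; the `query_coverage` helper is a hypothetical name, and the result (roughly 43% of the query) means that more than half of the sequence has no template in these alignments.

```python
def query_coverage(aligned_residues, query_length):
    """Percentage of the query sequence covered by an alignment."""
    if query_length <= 0:
        raise ValueError("query_length must be positive")
    return 100.0 * aligned_residues / query_length


QUERY_LENGTH = 656  # length of the selected sequence, from the text

for aligned in (279, 282):  # aligned range reported in the text
    coverage = query_coverage(aligned, QUERY_LENGTH)
    print(f"{aligned} residues: {coverage:.1f}% of the query")
```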
Query Sequence |
MVNEARGNSSLNPCLEGSASSGSESSKDSSRCSTPGLDPERHERLREKMRRRLESGDKWF SLEFFPPRTAEGAVNLISRFDRMAAGGPLYIDVTWHPAGDPGSDKETSSMMIASTAVNYC GLETILHMTCCRQRLEEITGHLHKAKQLGLKNIMALRGDPIGDQWEEEEGGFNYAVDLVK HIRSEFGDYFDICVAGYPKGHPEAGSFEADLKHLKEKVSAGADFIITQLFFEADTFFRFV KACTDMGITCPIVPGIFPIQGYHSLRQLVKLSKLEVPQEIKDVIEPIKDNDAAIRNYGIE LAVSLCQELLASGLVPGLHFYTLNREMATTEVLKRLGMWTEDPRRPLPWALSAHPKRREE DVRPIFWASRPKSYIYRTQEWDEFPNGRWGNSSSPAFGELKDYYLFYLKSKSPKEELLKM WGEELTSEESVFEVFVLYLSGEPNRNGHKVTCLPWNDEPLAAETSLLKEELLRVNRQGIL TINSQPNINGKPSSDPIVGWGPSGGYVFQKAYLEFFTSRETAEALLQVLKKYELRVNYHL VNVKGENITNAPELQPNAVTWGIFPGREIIQPTVVDPVSFMFWKDEAFALWIERWGKLYE EESPSRTIIQYIHDNYFLVNLVDNDFPLDNCLWQVVEDTLELLNRPTQNARETEAP |
2) Modeling the query sequence
Given that homologous proteins exist for the query sequence in structure databases, the structure of the sequence can be predicted using structure prediction software. Specifically, SwissModel and Phyre were used, both accessed through their web interfaces. Both programs accepted the query sequence as a string of 1-letter amino acid codes and used that sequence to perform modeling. Once modeling was completed, the NIH MBI Laboratory for Structural Genomics and Proteomics' Structural Analysis and Verification Server (SAVES) was used to analyze the structures produced for the query sequence. SAVES runs a variety of checks, including ProCheck, to verify the protein structure in an uploaded PDB file.
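Because the sequence is displayed in space-separated blocks, it is worth normalizing it into a single string of 1-letter codes before pasting it into a modeling server. The following is a minimal sketch of such a check; `clean_sequence` is a hypothetical helper, and the assumption that only the 20 standard residue codes are acceptable is made for this example rather than taken from either server's documentation.

```python
STANDARD_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def clean_sequence(raw):
    """Strip whitespace and upper-case a one-letter amino acid sequence.

    Raises ValueError if any character is not one of the 20 standard
    residue codes (an assumption made for this sketch).
    """
    seq = "".join(raw.split()).upper()
    bad = sorted(set(seq) - STANDARD_RESIDUES)
    if bad:
        raise ValueError(f"non-standard residue codes: {bad}")
    return seq


# First 65 residues of the query sequence as displayed, including the space
# that separates the first two blocks.
displayed = "MVNEARGNSSLNPCLEGSASSGSESSKDSSRCSTPGLDPERHERLREKMRRRLESGDKWF SLEFF"
cleaned = clean_sequence(displayed)
print(len(cleaned), cleaned)
```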
3) Modeling a query sequence mutant
At residue 222 of the query sequence there is a common natural polymorphism in which alanine is replaced by valine. This mutation is documented to make the protein more thermolabile and to decrease its activity by approximately 50%. Interestingly, this mutation decreases the risk of adult acute leukemia. The mutated sequence is provided below. The A222V mutation is highlighted in red.
Mutated Sequence |
MVNEARGNSSLNPCLEGSASSGSESSKDSSRCSTPGLDPERHERLREKMRRRLESGDKWF SLEFFPPRTAEGAVNLISRFDRMAAGGPLYIDVTWHPAGDPGSDKETSSMMIASTAVNYC GLETILHMTCCRQRLEEITGHLHKAKQLGLKNIMALRGDPIGDQWEEEEGGFNYAVDLVK HIRSEFGDYFDICVAGYPKGHPEAGSFEADLKHLKEKVSAGVDFIITQLFFEADTFFRFV KACTDMGITCPIVPGIFPIQGYHSLRQLVKLSKLEVPQEIKDVIEPIKDNDAAIRNYGIE LAVSLCQELLASGLVPGLHFYTLNREMATTEVLKRLGMWTEDPRRPLPWALSAHPKRREE DVRPIFWASRPKSYIYRTQEWDEFPNGRWGNSSSPAFGELKDYYLFYLKSKSPKEELLKM WGEELTSEESVFEVFVLYLSGEPNRNGHKVTCLPWNDEPLAAETSLLKEELLRVNRQGIL TINSQPNINGKPSSDPIVGWGPSGGYVFQKAYLEFFTSRETAEALLQVLKKYELRVNYHL VNVKGENITNAPELQPNAVTWGIFPGREIIQPTVVDPVSFMFWKDEAFALWIERWGKLYE EESPSRTIIQYIHDNYFLVNLVDNDFPLDNCLWQVVEDTLELLNRPTQNARETEAP |
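The substitution can also be applied programmatically, which guards against off-by-one errors when editing a long sequence by hand. The sketch below is one way to do it; `apply_point_mutation` and its `start` parameter are hypothetical names introduced for this example. It is demonstrated on residues 181-240 of the query sequence (the fourth 60-residue block) rather than on the full sequence.

```python
import re


def apply_point_mutation(seq, mutation, start=1):
    """Apply a substitution such as 'A222V' to seq.

    start is the residue number of seq[0], so a fragment of a longer
    protein can be mutated using full-length numbering.
    """
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"cannot parse mutation {mutation!r}")
    wild, position, mutant = match.group(1), int(match.group(2)), match.group(3)
    index = position - start
    if not 0 <= index < len(seq):
        raise ValueError(f"position {position} is outside the sequence")
    if seq[index] != wild:
        raise ValueError(f"expected {wild} at {position}, found {seq[index]}")
    return seq[:index] + mutant + seq[index + 1:]


# Residues 181-240 of the wild-type query sequence.
fragment = "HIRSEFGDYFDICVAGYPKGHPEAGSFEADLKHLKEKVSAGADFIITQLFFEADTFFRFV"
mutated = apply_point_mutation(fragment, "A222V", start=181)
print(mutated)
```

The check on the wild-type residue is the useful part: the function refuses to proceed unless position 222 really is an alanine, so a numbering mistake surfaces as an error instead of a silently wrong sequence.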
A priori knowledge of amino acid structures suggests that the substitution may not greatly affect the protein structurally, as valine has only two more methyl groups than alanine. However, regardless of the expected result of the A222V mutation, SwissModel and Phyre can be used to further analyze any structural impact the mutation has. SAVES can be used again to validate these structures.
In general, this comparison continues to illustrate that neither SwissModel nor Phyre is perfect at predicting the sequence's 3-dimensional structure. CE Alignment calculated a Z-score of 7.5 and an RMSD value of 2.0 angstroms between these two models. This indicates that, although the models may have structural errors, they are generally similar. As above, a Jmol view of the two structures is provided below. SwissModel is again shown in white, and Phyre is in red. The A222V residue can be highlighted by selecting the "Highlight A222V" button.
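For readers unfamiliar with the RMSD figure, the quantity itself is simple to compute once two structures have been superimposed and their equivalent atoms paired. The sketch below shows only that final step; it is not the CE algorithm, which also finds the alignment and the optimal superposition. The `rmsd` helper and the toy coordinates are invented for illustration.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) tuples.

    Assumes the two structures are already superimposed and that
    coords_a[i] and coords_b[i] are equivalent atoms.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


# Toy coordinates in angstroms, invented for illustration only.
model_1 = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model_2 = [(0.0, 2.0, 0.0), (3.8, 2.0, 0.0), (7.6, 2.0, 0.0)]
print(f"RMSD = {rmsd(model_1, model_2):.1f} angstroms")
```

In this toy case every atom is displaced by exactly 2 angstroms, so the RMSD is 2.0 angstroms; in a real comparison the same value usually reflects a mix of well-matched and poorly matched regions.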
4) Comparison of wild-type and mutant models
Below is a comparison of the wild-type and mutant sequence models from SwissModel and from Phyre. The comparison structures were generated using CE Alignment. Importantly, CE Alignment reported an RMSD of 0.0 angstroms and a Z-score of 7.9 for the WT/MT superimposed structures from both SwissModel and Phyre. This is consistent with the initial prediction that the mutation does not drastically affect the models. When the mutated residue is highlighted using the "Highlight A222V" button, it is easy to see that some empty space surrounds residue 222. This empty space may allow the two additional methyl groups of the A222V substitution to be accommodated with relatively little structural impact. In support of this, SAVES validation of the mutant sequence model versus the wild-type sequence model did not reveal any significant differences in the errors reported. The wild-type models of the query sequence are presented in white, whereas the mutant models are presented in red. From Jmol, one can easily see that the structures superimpose almost exactly (it is virtually impossible to see each protein separately). The Jmol representation seems to suggest, slightly, that SwissModel produces wild-type and mutant models that are less congruent in structure than those produced by Phyre, although this observation is qualitative and is not supported by any quantitative evidence.
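The observation about empty space around residue 222 is visual, but it could be made semi-quantitative by counting how many atoms lie within a short distance of the substituted side chain. The following is a rough sketch of that idea only; `count_neighbors` is a hypothetical helper, the 4.0 angstrom cutoff is an arbitrary assumption, and the coordinates are toy values rather than atoms from either model.

```python
import math


def count_neighbors(center, atoms, cutoff=4.0):
    """Count atoms within `cutoff` angstroms of `center`.

    `atoms` should not include the center atom itself.
    """
    return sum(1 for atom in atoms if math.dist(center, atom) <= cutoff)


# Toy coordinates in angstroms, invented for illustration only.
side_chain_atom = (0.0, 0.0, 0.0)
surrounding_atoms = [
    (1.5, 0.0, 0.0),   # 1.5 angstroms away: counted
    (0.0, 3.0, 0.0),   # 3.0 angstroms away: counted
    (5.0, 0.0, 0.0),   # 5.0 angstroms away: not counted
    (0.0, 0.0, 10.0),  # 10.0 angstroms away: not counted
]
print(count_neighbors(side_chain_atom, surrounding_atoms))
```

A low count around the residue 222 side chain, compared with the counts for buried residues elsewhere in the same model, would support the idea that the extra methyl groups fit without clashes.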
5) Conclusion
Many algorithms and web-accessible programs are currently available to the public to perform protein structure prediction. Although specific algorithms may produce similar results, differences will emerge in the final structures produced. These final structures are also greatly impacted by protein structure data that has been resolved from X-ray crystallography and NMR spectroscopy. Analyzing regions of functional interest or local structures can be used to help validate the structures produced. Highly conserved sequences of proteins should have conserved structure, which helps make modeling possible.
Modeling is also a useful tool for identifying the impact sequence mutations have on protein structure. In this example, the A222V mutation did not greatly affect the predicted structure of the protein, although the mutation itself has been documented to have a very specific biological effect. More modeling will need to be performed to better gauge the impact this mutation has on the protein structure. The fact that the A222V mutation is widely referenced and has a noted functional effect further suggests that the structures produced during modeling may not be correct. Molecular dynamics and other energy minimization tools could also be used to further refine the structures produced from modeling and to better probe their native-state conformations. Molecular dynamics simulations should also help pull the models out of conformations with bad bond, angle, or torsional parameters, as these conformations should be energetically unfavorable.