Christopher Lockhart

Protein Structure Analysis

Homework 2


The instructions for assignment 2 were to select, from a public database, a human enzyme sequence without a known 3-dimensional structure but with homologs in PDB sharing at least 30% but no more than 70% sequence identity. The 3-dimensional structure of the query sequence will then be predicted using two algorithms of choice and analyzed. A residue substitution of interest will then be performed, and the 3-dimensional structure will be predicted again from the newly mutated sequence. The impact of the residue substitution on the predicted conformations will be analyzed. The purpose of this assignment is to gain insight into structural prediction tools and algorithms and to assess, in a real example, how mutations potentially alter protein conformation.

1) Selecting the query sequence

UniProtKB was searched using NOT database:(type:pdb) AND organism:"Homo sapiens [9606]" AND keyword:enzyme to find good query sequence candidates for analysis. In accordance with the assignment, these sequences were all human enzymes without known PDB structures. These sequences were then manually input into PDB and a BLAST analysis was performed to determine the sequence identities between the query sequences and target sequences in PDB. Any target sequences from structures in PDB with sequence identities less than 30% or greater than 70% were discarded.
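The identity cutoff described above can be sketched in a few lines of Python. This is only an illustration of the filtering rule, not the procedure actually used (the hits were screened by hand); the `BlastHit` class, the `filter_by_identity` function, and the example hits are all hypothetical, and the identifiers are placeholders rather than real PDB entries.

```python
from dataclasses import dataclass


@dataclass
class BlastHit:
    """One BLAST hit against a PDB chain (all fields are illustrative)."""
    pdb_id: str              # placeholder identifier, not a real PDB entry
    percent_identity: float  # percent identity reported by BLAST


def filter_by_identity(hits, low=30.0, high=70.0):
    """Keep hits whose identity lies within [low, high] percent."""
    return [h for h in hits if low <= h.percent_identity <= high]


# Hypothetical hits, chosen only to exercise both cutoffs.
example_hits = [
    BlastHit("hit_low", 24.0),
    BlastHit("hit_ok", 36.5),
    BlastHit("hit_high", 82.0),
]

kept = filter_by_identity(example_hits)
print([h.pdb_id for h in kept])
```

The bounds are treated as inclusive here, matching the assignment's wording of "at least 30% but no more than 70%".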

In the end, the sequence for methylenetetrahydrofolate reductase was selected. The length of this sequence is 656 amino acids, although when BLAST was performed on the sequence using default BLAST parameters, only approximately 279-282 residues were found to have sequence similarity with preexisting PDB structures. 12 PDB entries were found to match the query sequence, although there were only 3 unique sequences among these 12 structures. These PDB entries had between 34% and 39% sequence identity with the query sequence. For more information regarding the alignments, please see the full summary from PDB. The complete query protein sequence is provided below.
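To put the aligned region in perspective, the short calculation below converts the approximate aligned lengths into a fraction of the full query. It uses only the numbers stated above; the function name `coverage_percent` is my own.

```python
QUERY_LENGTH = 656           # residues in the query sequence
ALIGNED_RANGE = (279, 282)   # approximate aligned residues reported by BLAST


def coverage_percent(aligned, total):
    """Return the percentage of the query covered by the alignment."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * aligned / total


for aligned in ALIGNED_RANGE:
    pct = coverage_percent(aligned, QUERY_LENGTH)
    print(f"{aligned} / {QUERY_LENGTH} = {pct:.1f}%")
```

In other words, the PDB homologs align to only about 43% of the query, so less than half of the sequence has a template with detectable similarity.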

Query Sequence
    MVNEARGNSSLNPCLEGSASSGSESSKDSSRCSTPGLDPERHERLREKMRRRLESGDKWF
    SLEFFPPRTAEGAVNLISRFDRMAAGGPLYIDVTWHPAGDPGSDKETSSMMIASTAVNYC
    GLETILHMTCCRQRLEEITGHLHKAKQLGLKNIMALRGDPIGDQWEEEEGGFNYAVDLVK
    HIRSEFGDYFDICVAGYPKGHPEAGSFEADLKHLKEKVSAGADFIITQLFFEADTFFRFV
    KACTDMGITCPIVPGIFPIQGYHSLRQLVKLSKLEVPQEIKDVIEPIKDNDAAIRNYGIE
    LAVSLCQELLASGLVPGLHFYTLNREMATTEVLKRLGMWTEDPRRPLPWALSAHPKRREE
    DVRPIFWASRPKSYIYRTQEWDEFPNGRWGNSSSPAFGELKDYYLFYLKSKSPKEELLKM
    WGEELTSEESVFEVFVLYLSGEPNRNGHKVTCLPWNDEPLAAETSLLKEELLRVNRQGIL
    TINSQPNINGKPSSDPIVGWGPSGGYVFQKAYLEFFTSRETAEALLQVLKKYELRVNYHL
    VNVKGENITNAPELQPNAVTWGIFPGREIIQPTVVDPVSFMFWKDEAFALWIERWGKLYE
    EESPSRTIIQYIHDNYFLVNLVDNDFPLDNCLWQVVEDTLELLNRPTQNARETEAP
   

2) Modeling the query sequence

Given that homologous proteins exist for the query sequence in structure databases, the structure of the sequence can be predicted using various structure prediction software. In particular, SwissModel and Phyre were used. These programs were accessed from the web. Both programs accepted the query sequence as a string of 1-letter amino acid residues and used that sequence to perform modeling. Once modeling was completed, the Structural Analysis and Verification Server (SAVES) of the NIH MBI Laboratory for Structural Genomics and Proteomics was used to analyze the structures produced for the query sequence. SAVES runs a variety of checks, including ProCheck, to verify the protein structure of an uploaded PDB file.
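Because both servers take the sequence as plain 1-letter codes, it can help to check the string before pasting it in. The sketch below is not part of either server's interface; `clean_sequence` and `to_fasta` are hypothetical helper names, and the 60-character line width simply mirrors how the sequence is displayed in this report.

```python
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def clean_sequence(raw):
    """Strip whitespace, uppercase, and reject non-standard residue codes."""
    seq = "".join(raw.split()).upper()
    invalid = sorted(set(seq) - STANDARD_AMINO_ACIDS)
    if invalid:
        raise ValueError(f"non-standard residue codes: {invalid}")
    return seq


def to_fasta(header, seq, width=60):
    """Format a sequence as FASTA text with fixed-width lines."""
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + header + "\n" + "\n".join(lines) + "\n"


# First 30 residues of the query sequence, used only as a short example.
fragment = clean_sequence("MVNEARGNSS LNPCLEGSAS SGSESSKDSS")
print(to_fasta("query_fragment", fragment, width=10))
```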

From the above analyses, both models appear to have some issues. ProCheck, one of the more widely used structure validation tools, reports more errors for the SwissModel model than for the Phyre model. However, Verify_3D, Errat, and Prove all produced better scores for the SwissModel model than for the Phyre model. Given this discrepancy, it is unclear which of the two structures better represents the query sequence, and neither is likely to be a perfect representation. This may be due in part to the relatively limited sequence identity between the query sequence and the sequences of known PDB structures. Further analysis with other structure prediction software will be needed to reach a more concrete answer. Nevertheless, a full 3-dimensional comparison of the two structures can be found in Jmol below. The SwissModel structure is in white, whereas the Phyre structure is in red. From this view, it is easy to see that although most of the backbone structure agrees between the two models, significant portions of the models are not in agreement. Because UniProtKB's sequence annotation lists only a single function, methylenetetrahydrofolate reductase, for residues 1-656 (the whole length) of the query sequence, particular binding sites of biological interest could not be identified for analysis. Instead, I chose to feature a few alpha-helix structures that line up well between the two models. These structures can be highlighted in Jmol using the button entitled "Highlight Structures of Interest". CE Alignment gave the two structures a Z-score of 7.5 and an RMSD value of 2.0 angstroms, which indicates that the structures match up well regardless of any artificial bond or angle parameters created during modeling.
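For readers unfamiliar with RMSD, the sketch below shows how the value is computed from C-alpha positions once two models have already been superimposed. It is not the CE algorithm: CE also finds the residue pairing and the optimal superposition, whereas this code assumes both are given, pairs residues by number, and assumes a single chain. The function names and the two tiny example "models" are made up for illustration; the column positions follow the standard fixed-width PDB ATOM record.

```python
import math


def ca_coordinates(pdb_text):
    """Map residue number -> (x, y, z) for C-alpha atoms in ATOM records."""
    coords = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            res_num = int(line[22:26])  # residue number, columns 23-26
            coords[res_num] = (
                float(line[30:38]),     # x, columns 31-38
                float(line[38:46]),     # y, columns 39-46
                float(line[46:54]),     # z, columns 47-54
            )
    return coords


def rmsd(coords_a, coords_b):
    """RMSD over residues present in both models (no superposition done)."""
    shared = sorted(set(coords_a) & set(coords_b))
    if not shared:
        raise ValueError("no residues in common")
    total = sum(math.dist(coords_a[r], coords_b[r]) ** 2 for r in shared)
    return math.sqrt(total / len(shared))


def _ca_line(res_num, x, y, z):
    """Build a minimal C-alpha ATOM record (illustrative helper only)."""
    return "ATOM  %5d  CA  ALA A%4d    %8.3f%8.3f%8.3f" % (
        res_num, res_num, x, y, z)


# Two made-up two-residue "models", offset by 1 angstrom along z.
model_a = "\n".join([_ca_line(1, 0.0, 0.0, 0.0), _ca_line(2, 3.8, 0.0, 0.0)])
model_b = "\n".join([_ca_line(1, 0.0, 0.0, 1.0), _ca_line(2, 3.8, 0.0, 1.0)])
print(rmsd(ca_coordinates(model_a), ca_coordinates(model_b)))
```

Since every atom in the toy example is displaced by exactly 1 angstrom, the RMSD is 1 angstrom; an RMSD of 2.0 angstroms over the aligned residues corresponds to a larger but still moderate average displacement.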

3) Modeling a query sequence mutant

At residue 222 of the query sequence is a common natural polymorphism in which alanine is replaced by valine. This mutation is documented to make the protein more thermolabile and to decrease its activity by approximately 50%. Interestingly, this mutation decreases the risk of adult acute leukemia. The mutated sequence is provided below. The A222V mutation is highlighted in red.

Mutated Sequence
    MVNEARGNSSLNPCLEGSASSGSESSKDSSRCSTPGLDPERHERLREKMRRRLESGDKWF
    SLEFFPPRTAEGAVNLISRFDRMAAGGPLYIDVTWHPAGDPGSDKETSSMMIASTAVNYC
    GLETILHMTCCRQRLEEITGHLHKAKQLGLKNIMALRGDPIGDQWEEEEGGFNYAVDLVK
    HIRSEFGDYFDICVAGYPKGHPEAGSFEADLKHLKEKVSAGVDFIITQLFFEADTFFRFV
    KACTDMGITCPIVPGIFPIQGYHSLRQLVKLSKLEVPQEIKDVIEPIKDNDAAIRNYGIE
    LAVSLCQELLASGLVPGLHFYTLNREMATTEVLKRLGMWTEDPRRPLPWALSAHPKRREE
    DVRPIFWASRPKSYIYRTQEWDEFPNGRWGNSSSPAFGELKDYYLFYLKSKSPKEELLKM
    WGEELTSEESVFEVFVLYLSGEPNRNGHKVTCLPWNDEPLAAETSLLKEELLRVNRQGIL
    TINSQPNINGKPSSDPIVGWGPSGGYVFQKAYLEFFTSRETAEALLQVLKKYELRVNYHL
    VNVKGENITNAPELQPNAVTWGIFPGREIIQPTVVDPVSFMFWKDEAFALWIERWGKLYE
    EESPSRTIIQYIHDNYFLVNLVDNDFPLDNCLWQVVEDTLELLNRPTQNARETEAP
   

A priori knowledge of the two amino acid structures suggests that valine may not greatly impact the protein structurally, as it has only two more methyl groups than alanine. However, regardless of the expected result of the A222V mutation, SwissModel and Phyre can be used to further analyze any structural impact the mutation has. SAVES can be used again to validate these structures.
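The substitution itself is easy to reproduce and check in code before submitting the mutant sequence to either server. The following is a minimal sketch; `apply_substitution` is a hypothetical helper, and for brevity it operates on only the fourth 60-residue line of the query sequence (residues 181-240), which contains position 222.

```python
def apply_substitution(seq, position, expected, replacement):
    """Return seq with the residue at a 1-based position replaced.

    Raises ValueError if the residue found there is not the expected one.
    """
    index = position - 1
    if not 0 <= index < len(seq):
        raise ValueError(f"position {position} is outside the sequence")
    if seq[index] != expected:
        raise ValueError(
            f"expected {expected} at {position}, found {seq[index]}"
        )
    return seq[:index] + replacement + seq[index + 1:]


# Residues 181-240 of the wild-type query sequence.
LINE_START = 181
wild_type_line = "HIRSEFGDYFDICVAGYPKGHPEAGSFEADLKHLKEKVSAGADFIITQLFFEADTFFRFV"

# Convert the full-sequence position (222) to a position within this line.
local_position = 222 - LINE_START + 1
mutant_line = apply_substitution(wild_type_line, local_position, "A", "V")
print(mutant_line)
```

Checking the expected wild-type residue first guards against off-by-one numbering errors, which are easy to make when a sequence is split across lines.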

In general, this comparison continues to illustrate that neither SwissModel nor Phyre is perfect at predicting the sequence's 3-dimensional structure. CE Alignment calculated a Z-score of 7.5 and an RMSD value of 2.0 angstroms between these two models. This indicates that, although the models may have structural errors, they are generally similar. As above, a Jmol view of the two structures is provided below. SwissModel is again shown in white, and Phyre is in red. The A222V residue can be highlighted by selecting the "Highlight A222V" button.

4) Comparison of wild-type and mutant models

Below is a comparison of the wild-type and mutant sequence models from SwissModel and from Phyre. The comparison structures were generated using CE Alignment. Importantly, CE Alignment reported both SwissModel's and Phyre's WT/MT superimposed structures as having an RMSD of 0.0 angstroms and a Z-score of 7.9. This confirms, as was initially predicted, that the mutation does not drastically affect the models. When the mutated residue is highlighted using the "Highlight A222V" button, it is easily discernible that some degree of empty space surrounds residue 222. This empty space may allow the two additional methyl groups of the A222V substitution to be accommodated with relatively little structural impact. In support of this, SAVES validation of the mutant sequence models versus the wild-type sequence models did not reveal any significant differences in the errors produced. The wild-type models of the query sequence are presented in white, whereas the mutant models are presented in red. From Jmol, one can easily see that the structures are very superimposable (it is virtually impossible to see each protein separately). The Jmol representation seems to suggest slightly that SwissModel produces models for the wild-type and mutant sequences that are perhaps less congruent in structure than the models produced by Phyre, although this observation is qualitative and has no quantitative evidence to support it.
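The "empty space" around residue 222 was judged by eye in Jmol. One simple way to put a number on it, offered here only as a sketch and not as something done for this assignment, is to count how many atoms fall within a fixed distance of the residue's side chain. The function name, the 6.0 angstrom radius, and the example coordinates are all assumptions chosen for illustration.

```python
import math


def count_neighbors(center, atoms, radius=6.0):
    """Count atoms within `radius` angstroms of `center`.

    `center` and each entry of `atoms` are (x, y, z) tuples. Pass only
    atoms from other residues, so the residue does not count itself.
    """
    return sum(1 for atom in atoms if math.dist(center, atom) <= radius)


# Made-up coordinates: two atoms inside the radius, one outside.
side_chain_atom = (0.0, 0.0, 0.0)
other_atoms = [(1.0, 0.0, 0.0), (0.0, 5.0, 0.0), (0.0, 0.0, 7.0)]
print(count_neighbors(side_chain_atom, other_atoms))
```

A lower count for residue 222 than for typical buried residues in the same model would be consistent with the loosely packed environment seen in Jmol, although the choice of radius affects the result.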

5) Conclusion

Many algorithms and web-accessible programs are currently available to the public to perform protein structure prediction. Although specific algorithms may produce similar results, differences will emerge in the final structures produced. These final structures are also greatly impacted by protein structure data that has been resolved from X-ray crystallography and NMR spectroscopy. Analyzing regions of functional interest or local structures can be used to help validate the structures produced. Highly conserved sequences of proteins should have conserved structure, which helps make modeling possible.

Modeling is also a useful tool for identifying the impact sequence mutations have on protein structure. In this example, the A222V mutation did not greatly impact the predicted structure of the protein, although the A222V mutation itself has been documented to have a very specific biological effect. More modeling will need to be performed to better gauge the impact this mutation has on the protein structure. The fact that the A222V mutation has been highly referenced and has a noted functional effect further suggests that the structures produced during modeling may not be correct. Molecular dynamics and other energy minimization tools could also be used to further refine the structures produced from modeling and to better probe their native-state conformations. Molecular dynamics simulations should also greatly help pull the models out of conformations with bad angle, bond, or torsional parameters, as these conformations should be energetically unfavorable.

Written by Chris Lockhart for Protein Structure Analysis, BINF 731. Last updated 12/01/2011.