Shengyuan Wang, PhD.
School of Systems Biology
George Mason University
Manassas, VA 20110
1. Improve codon-level knowledge-based potential calculations and analyze silent mutation
- Contrary to the traditional view that silent mutations cause no structural or functional changes, recent evidence shows that more than 50 human diseases are associated with silent mutations
- To investigate the impact of silent mutations on protein structure, I built a dataset of 10,220 individual proteins that captures sequence and structural information from BLAST and PDB and serves as the primary database for my dissertation research
- I optimize the existing knowledge-based potential method, which uses Delaunay tessellation at the amino acid level, and extend the potential estimation to the codon level, which remains insufficiently explored
- I compare topological variations between disease-causing and non-disease-causing silent mutations from the SynMICdb dataset
- I apply a mutagenesis method to study the association between silent mutations and cancer
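As an illustration of the kind of calculation behind a tessellation-based potential, the sketch below scores each quadruplet of labels found at the vertices of Delaunay tetrahedra. It assumes the commonly used log-likelihood form, score = log10(observed frequency / expected frequency), where the expected frequency comes from background label frequencies and a multinomial coefficient. The function name, its inputs, and the log base are assumptions made for illustration, not the exact implementation used in this research. The labels can be amino acids or codons, so the same counting applies at either level.

```python
import math
from collections import Counter


def four_body_potential(simplices, residues):
    """Log-likelihood score for each unordered quadruplet of labels.

    simplices: iterable of 4-tuples of labels (amino acids or codons),
               one per Delaunay tetrahedron (assumed precomputed).
    residues:  iterable of labels for all residues in the dataset,
               used to estimate background frequencies.
    """
    simplices = list(simplices)
    background = Counter(residues)
    total_residues = sum(background.values())

    # Vertex order within a tetrahedron is irrelevant, so sort each quadruplet.
    observed = Counter(tuple(sorted(s)) for s in simplices)
    total_simplices = len(simplices)

    potential = {}
    for quad, count in observed.items():
        f_obs = count / total_simplices
        # Multinomial coefficient 4! / prod(t_n!), where t_n is the number
        # of times each distinct label occurs in the quadruplet.
        multiplicity = math.factorial(4)
        for t in Counter(quad).values():
            multiplicity //= math.factorial(t)
        p_exp = multiplicity * math.prod(
            background[label] / total_residues for label in quad
        )
        potential[quad] = math.log10(f_obs / p_exp)
    return potential
```

A positive score means the quadruplet occurs more often than expected by chance; quadruplets never observed in the dataset receive no entry in this sketch.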
2. Model fitness of beta-lactamase protein by implementing computational mutagenesis and machine learning techniques
- The TEM-1 beta-lactamase protein confers resistance to multiple beta-lactam antibiotics. Understanding how its fitness is affected by its structure may aid the study of antibiotic resistance
- The goal of this study is to characterize the relationship between structure and fitness for single amino acid substitutions in beta-lactamase. The hypotheses are that (1) protein fitness changes caused by mutations can be predicted using topology-based features, and (2) using codon-level potentials will improve protein fitness prediction
- Methods: I created a structure profile containing multiple variables that characterize the structural properties of each single amino acid substitution, for example the amino acid residues surrounding the mutation site. This structure profile is linked to published protein fitness data using the computational mutagenesis technique. Different machine learning algorithms are then used to predict beta-lactamase fitness changes from the structural properties around the mutated residue
- Results: Using the available experimentally determined fitness data for beta-lactamase mutants, I linked the fitness of each mutant to its structure. A structure-function relationship is established, and the structural data for a variant can distinguish its functional impact. Random forest and artificial neural network models achieved the highest accuracy using either the residual or the local profile. Model predictions were good across different types of amino acid substitutions, locations, and secondary structures
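To make the idea of a structure profile for a substitution concrete, here is a minimal sketch of one possible feature calculation. It assumes a tessellation given as 4-tuples of residue indices and a precomputed quadruplet potential stored in a dictionary. The function name `mutation_profile`, the treatment of unseen quadruplets as 0.0, and the per-neighbor bookkeeping are all hypothetical choices made for illustration and are not the exact profile definitions used in this project.

```python
def mutation_profile(simplices, sequence, potential, position, new_residue):
    """Return (total_change, per_neighbor_change) for one substitution.

    simplices:   4-tuples of residue indices from a tessellation.
    sequence:    wild-type labels indexed by residue position.
    potential:   dict mapping sorted 4-tuples of labels to scores;
                 missing quadruplets score 0.0 (assumption of this sketch).
    position:    index of the substituted residue.
    new_residue: label placed at that position in the mutant.
    """
    def score(labels):
        return potential.get(tuple(sorted(labels)), 0.0)

    mutant = list(sequence)
    mutant[position] = new_residue

    total = 0.0
    per_neighbor = {}
    for simplex in simplices:
        # Only tetrahedra that contain the mutated residue change score.
        if position not in simplex:
            continue
        delta = (score([mutant[i] for i in simplex])
                 - score([sequence[i] for i in simplex]))
        total += delta
        # Attribute the change to every neighbor sharing the tetrahedron.
        for i in simplex:
            if i != position:
                per_neighbor[i] = per_neighbor.get(i, 0.0) + delta
    return total, per_neighbor
```

The scalar and per-neighbor values could then serve as input features for a classifier or regressor trained against the published fitness measurements.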
3. Design a novel approach to identify and characterize kinked alpha-helices
- A kinked alpha-helix is a helix with a sharp change in the direction of its axis. The kink site readily allows conformational changes and structural variation in proteins. However, available methods show low agreement in identifying kinked helices, so new methods are needed to identify and characterize them
- I created a dataset containing 8,826 individual protein chains with 46,604 alpha-helices. Building on existing methods, we created a geometric method called ValgusHel to identify kink positions and used structural alignment to validate the results. In addition, we developed a topological method that identifies kinked helices from the topological properties of amino acid residues, using Delaunay tessellation and an N-gram algorithm
- Preliminary results show that the topological method may yield better results and higher consistency than the geometric method alone. These methods may be useful in future studies that examine structure-function relationships in kinked alpha-helices
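For readers unfamiliar with geometric kink detection, the sketch below shows one simple way to estimate a change in helix-axis direction from C-alpha coordinates. It is not the ValgusHel method. It assumes the local axis can be approximated by centroids of sliding four-residue windows, and that the kink angle at a point is the angle between the axis directions before and after it. The function names and the `window` and `span` parameters are assumptions chosen for illustration.

```python
import math


def centroid(points):
    """Mean position of a list of (x, y, z) points."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))


def axis_points(ca_coords, window=4):
    """Approximate helix-axis points as centroids of sliding C-alpha windows."""
    return [centroid(ca_coords[i:i + window])
            for i in range(len(ca_coords) - window + 1)]


def angle_deg(u, v):
    """Angle in degrees between two 3D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cos_theta = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cos_theta))


def kink_angles(ca_coords, window=4, span=4):
    """Map each interior axis-point index to the bend angle at that point.

    The angle compares the axis direction over the `span` axis points
    before the index with the direction over the `span` points after it.
    """
    axis = axis_points(ca_coords, window)
    angles = {}
    for i in range(span, len(axis) - span):
        before = tuple(axis[i][k] - axis[i - span][k] for k in range(3))
        after = tuple(axis[i + span][k] - axis[i][k] for k in range(3))
        angles[i] = angle_deg(before, after)
    return angles
```

A straight helix gives small angles at every index, while a kink appears as a local maximum. Any cutoff angle for calling a kink would have to be chosen and validated separately.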