2006 Summer Internship Reading and Software Materials

These papers cover the general topics of Delaunay tessellation, computational mutagenesis, and machine learning. Pay special attention to the definitions of balanced error rate (BER) and Matthews correlation coefficient (MCC) given in #4, because these are new concepts that we will be applying. The paper in #3 was recently published and covers one aspect of our work, which uses the residual scores of mutants with known, experimentally determined activity.

  1. Singh R.K., Tropsha A., Vaisman I.I. Delaunay Tessellation of Proteins: Four Body Nearest Neighbor Propensities of Amino Acid Residues, J. Comput. Biol. 3 (1996) 213-221.
  2. Masso M., Vaisman I. Comprehensive Mutagenesis of HIV-1 Protease: A Computational Geometry Approach, Biochem. Biophys. Res. Comm. 305 (2003) 322-326.
  3. Masso M., Lu Z., Vaisman I. Computational Mutagenesis Studies of Protein Structure-Function Correlations, Proteins 64 (2006) 234-245.
  4. Bao L., Cui Y. Prediction of the Phenotypic Effects of Non-Synonymous Single Nucleotide Polymorphisms Using Structural and Evolutionary Information, Bioinformatics 21 (2005) 2185-2190.
  5. Krishnan V.G., Westhead D.R. A Comparative Study of Machine-Learning Methods to Predict the Effects of Single Nucleotide Polymorphisms on Protein Function, Bioinformatics 19 (2003) 2199-2209.
  6. Karchin R., Kelly L., Sali A. Improving Functional Annotation of Non-Synonomous SNPs with Information Theory, PSB 2005.
  7. Fawcett T. ROC graphs: notes and practical considerations for researchers (2004).
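
As a quick reference for the two measures highlighted above, the sketch below computes BER and MCC from the four counts of a binary confusion matrix. It uses the standard textbook definitions, which we assume match those given in paper #4 (check the notation there). The function names and the example counts are made up for illustration only.

```python
import math


def balanced_error_rate(tp, tn, fp, fn):
    """Average of the error rates on the positive and negative classes."""
    false_negative_rate = fn / (tp + fn)  # errors among actual positives
    false_positive_rate = fp / (tn + fp)  # errors among actual negatives
    return 0.5 * (false_negative_rate + false_positive_rate)


def matthews_correlation(tp, tn, fp, fn):
    """Matthews correlation coefficient, ranging from -1 to +1."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Common convention when a row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Made-up counts, not results from any of the papers.
example_ber = balanced_error_rate(tp=40, tn=45, fp=5, fn=10)
example_mcc = matthews_correlation(tp=40, tn=45, fp=5, fn=10)
print(example_ber, example_mcc)
```

Unlike plain accuracy, both measures stay informative when one activity class is much larger than the other: a classifier that always predicts the majority class has a BER of 0.5 and, under the convention used here, an MCC of 0.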

Barnase Papers

  1. Paper 1 (Axe et al. 1) contains all the single point mutants of barnase that we are examining (Fig. 2). The underlined positions in Fig. 2 are those that directly interact with RNA.
  2. Paper 2 (Buckle et al. 1) is the reference used by Paper 1 to identify the underlined positions. In particular, the positions were taken from Fig. 5.
  3. Paper 3 (Buckle et al. 2) discusses the interaction of barnase and barstar. Barstar binds to and inhibits barnase, and the residue positions in barnase that are at the barnase-barstar interface are listed in Table 2, column 2 (notice that many of them are also positions that interact with RNA).
  4. Paper 4 (Fersht) is a review article about barnase and folding. It provides additional good biology background about barnase (for example, for the introduction of a paper that you would write about our results).
  5. Paper 5 (Axe et al. 2) explores multiple mutants in barnase. Although we will not be looking at these mutants for the project, the paper contains a good deal of information for learning more about barnase.

IL-3 Papers

  1. Paper 1 (Olins et al.) contains (almost) all the single point mutants of IL-3 that we are examining (Table II) … see Paper 3 below for the remaining mutants that we are using. Paper 1 defines 3 activity classes (full is > 20% of wt activity, moderate is 5-19% of wt activity, and low is < 5% of wt activity). However, Table III of the same paper lists 16 mutants (all in the full class in Table II) whose activity is actually more than 5-fold greater than wt (> 500% of the wt activity). Also, in Paper 3 below (Tables I, II, and III), 12 of the additional mutants that we included have biological activity greater than 100%. Some of the mutants in the tables of Paper 3 are also in Paper 1; wherever there was a discrepancy, we used the activity class given in Paper 1. In fact, one of these 12 mutants (E75K) is listed only as “moderate” in Paper 1, so we called it moderate. This leaves a total of 16+11=27 mutants from both papers with activity greater than 100% that we called “full” in our dataset.
    Before you run the automute.pl program, take these 27 mutants out of the full class in our dataset and place them in a separate activity class (call it "increased"), which ranks even higher than full. Your Delaunay directory has 2 files, 1jli_activity_mutants.xls and 1jli_activity_mutants.txt, which contain all the mutants and their activity (last summer you typed everything into Excel and then also saved it as a .txt file so that the automute.pl program can read it). Download the files to your PC, change these 27 mutants from full to increased in those files, and upload the files back to the Delaunay directory (all using the SSH Secure File Transfer). Alternatively, if you remember how to use the vi editor, you can make the changes directly from the command prompt in the SSH Secure Shell.
  2. Paper 2 (Feng et al.) is the primary citation for the 1JLI structure in PDB. There is a lot of good biology background about the protein in this paper, and you can find information for annotating the amino acids into groups. For example, on page 528 they identify all the amino acids in IL-3 that form a buried hydrophobic core.
  3. Paper 3 (Bagley et al.) uses molecular modeling to identify the amino acids in IL-3 that are at the interface between it and the IL-3 receptor. This paper also includes good biological background about IL-3, as well as a summary of previous mutagenesis studies on IL-3 and the importance of certain positions for activity. As mentioned above, look at Tables I, II, and III in this paper for the extra mutants that we included in our dataset.
  4. Paper 4 (Klein et al. 1) also concerns the identification of the receptor binding site in IL-3. Compare the results in this paper to those in Paper 3.
  5. Paper 5 (Klein et al. 2), Paper 6 (Lopez et al.), and Paper 7 (Klein et al. 3) may also contain useful information, but we will not be working with the multiple mutants of IL-3.
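
The relabeling step described under Paper 1 above (moving mutants from the full class to a new "increased" class) can also be scripted instead of edited by hand. The sketch below is only an illustration. It assumes each line of the .txt file is tab-separated, with the mutant name in the first column and the activity class in the second, which may not match the actual layout of 1jli_activity_mutants.txt. The function name and the mutant names in the example are hypothetical placeholders.

```python
def relabel_increased(lines, increased_mutants,
                      old_class="full", new_class="increased"):
    """Return (new_lines, n_changed) with listed mutants moved to new_class.

    Assumed layout (not confirmed): mutant<TAB>class, one mutant per line.
    Only mutants currently labeled old_class are changed, so a mutant that
    is on the list but labeled "moderate" keeps its label.
    """
    wanted = set(increased_mutants)
    new_lines = []
    n_changed = 0
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2 and fields[0] in wanted and fields[1] == old_class:
            fields[1] = new_class
            n_changed += 1
        new_lines.append("\t".join(fields))
    return new_lines, n_changed


# Placeholder records, NOT real IL-3 mutants or activities.
example_lines = ["X1A\tfull", "X2B\tfull", "X3C\tmoderate", "X4D\tlow"]
example_increased = ["X1A", "X3C"]

new_lines, n_changed = relabel_increased(example_lines, example_increased)
print(n_changed)
for record in new_lines:
    print(record)
```

When run against the real file with the full list of mutants, the returned count should equal 27 (16 + 11). A smaller number would suggest that a mutant name was mistyped or that the file layout differs from the one assumed here.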