2005 Summer Internship Reading Materials

Download, print, and read through items 1-5 below. Together, these papers cover the topics that make up the project. You do not need to understand everything in them for the project, and I will go over the important parts with you in detail after you arrive. If you have not already done so, familiarize yourselves with the Protein Data Bank (PDB) website and the PDB file format.
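The PDB file format stores atomic coordinates in fixed-width columns, so a record is read by column position rather than by splitting on whitespace. The Python below is a minimal sketch, not project code: it reads ATOM records and collects one C-alpha coordinate per residue, which is one simple way to reduce a protein to a set of points before a residue-level Delaunay tessellation. The names `Atom`, `parse_atoms`, and `ca_coordinates` are hypothetical, and the sketch deliberately ignores HETATM records, alternate locations, insertion codes, and multi-model (NMR) files.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Atom:
    """One ATOM record from a PDB file (hypothetical helper type)."""
    serial: int
    name: str
    res_name: str
    chain_id: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atoms(lines: Iterable[str]) -> List[Atom]:
    """Parse ATOM records using the fixed PDB column layout.

    Column ranges are 1-based in the PDB format description; the slices
    below are the equivalent 0-based Python slices. HETATM records,
    alternate locations, insertion codes, and MODEL blocks are not handled.
    """
    atoms = []
    for line in lines:
        if not line.startswith("ATOM  "):
            continue  # skip HEADER, HETATM, TER, END, and other records
        atoms.append(Atom(
            serial=int(line[6:11]),        # columns 7-11
            name=line[12:16].strip(),      # columns 13-16
            res_name=line[17:20].strip(),  # columns 18-20
            chain_id=line[21].strip(),     # column 22
            res_seq=int(line[22:26]),      # columns 23-26
            x=float(line[30:38]),          # columns 31-38
            y=float(line[38:46]),          # columns 39-46
            z=float(line[46:54]),          # columns 47-54
        ))
    return atoms


def ca_coordinates(lines: Iterable[str]) -> List[Tuple[float, float, float]]:
    """Return (x, y, z) for every C-alpha atom, i.e. one point per residue."""
    return [(a.x, a.y, a.z) for a in parse_atoms(lines) if a.name == "CA"]
```

Because an open file iterates line by line, usage would look like `ca_coordinates(open("protein.pdb"))`, where `protein.pdb` stands for any coordinate file downloaded from the PDB.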

  1. Singh R.K., Tropsha A., Vaisman I.I. Delaunay Tessellation of Proteins: Four Body Nearest Neighbor Propensities of Amino Acid Residues, J. Comput. Biol. 3 (1996) 213-221.
  2. Masso M., Vaisman I. Comprehensive Mutagenesis of HIV-1 Protease: A Computational Geometry Approach, Biochem. Biophys. Res. Comm. 305 (2003) 322-326.
  3. Krishnan V.G., Westhead D.R. A Comparative Study of Machine-Learning Methods to Predict the Effects of Single Nucleotide Polymorphisms on Protein Function, Bioinformatics 19 (2003) 2199-2209.
  4. Karchin R., Kelly L., Sali A. Improving Functional Annotation of Non-Synonomous SNPs with Information Theory, PSB 2005.
  5. Fawcett T. ROC graphs: notes and practical considerations for researchers (2004).
  6. Free Software: WEKA suite of machine learning algorithms
     We will be using this software at GMU for the project. If you have the opportunity beforehand, briefly explore the website, then download a version of WEKA that has a GUI (the Windows or Linux versions) onto your home or school computer and familiarize yourselves with what it offers, specifically the Explorer and Experimenter windows. Do not download the older, command-line-only version; it is not at all user-friendly. To help you navigate WEKA, read this general tutorial as well as this Experimenter window guide. We will concentrate on two supervised learning algorithms: support vector machines (called SMO, in the WEKA functions directory) and decision trees (called J48, in the WEKA trees directory). We will apply the MultiClassClassifier and CostSensitiveClassifier meta-classifiers (found in the WEKA meta directory) to both of these algorithms.
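Cost-sensitive classification, which the CostSensitiveClassifier meta-classifier provides, can work by predicting the class with the lowest expected misclassification cost rather than simply the most probable class; in WEKA it can alternatively reweight the training instances according to the costs. The Python below is a minimal sketch of the expected-cost decision rule only, not WEKA's implementation. The function name, the cost-matrix orientation (`cost[actual][predicted]`), and the neutral/deleterious class labels are assumptions made for illustration.

```python
from typing import Sequence


def min_expected_cost_class(probs: Sequence[float],
                            cost: Sequence[Sequence[float]]) -> int:
    """Return the index of the class with the lowest expected cost.

    probs[i]   : estimated probability that the instance is of class i
    cost[i][j] : assumed cost of predicting class j when the true class is i
    """
    n = len(probs)
    if len(cost) != n or any(len(row) != n for row in cost):
        raise ValueError("cost matrix must be n x n for n classes")
    # Expected cost of predicting j = sum over true classes i of P(i) * cost[i][j]
    expected = [sum(probs[i] * cost[i][j] for i in range(n)) for j in range(n)]
    return min(range(n), key=lambda j: expected[j])


# Hypothetical two-class setup: 0 = neutral mutation, 1 = deleterious mutation.
# Missing a deleterious mutation (true 1, predicted 0) costs 5; a false alarm costs 1.
COSTS = [[0.0, 1.0],
         [5.0, 0.0]]

print(min_expected_cost_class([0.7, 0.3], COSTS))                     # 1
print(min_expected_cost_class([0.7, 0.3], [[0.0, 1.0], [1.0, 0.0]]))  # 0
```

With equal costs the rule reduces to picking the most probable class (neutral here), but with the asymmetric matrix a 30% chance of a deleterious mutation already makes "deleterious" the cheaper prediction (expected cost 0.7 versus 1.5).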