2005 Summer Internship Reading Materials
Download, print, and read through items 1-5. These papers cover the
topics that make up the project. You do not need to understand
everything in them for the project; I will go over the important parts
with you in detail after you arrive. If you have not done so already,
familiarize yourselves with the Protein Data Bank (PDB) website and the
PDB file format.
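As a first look at the PDB file format, here is a minimal Python sketch that pulls one point per residue out of the fixed-column ATOM records. The column positions follow the ATOM record layout of the PDB format; the names `CAlpha` and `read_calpha_atoms` are made up for this illustration, and picking the alpha carbon (CA) as the representative point is only an assumption here, not necessarily the choice made in the papers below.

```python
from typing import Iterable, List, NamedTuple


class CAlpha(NamedTuple):
    """One alpha-carbon position taken from an ATOM record (hypothetical type)."""
    chain: str
    res_seq: int
    res_name: str
    x: float
    y: float
    z: float


def read_calpha_atoms(lines: Iterable[str]) -> List[CAlpha]:
    """Collect alpha-carbon (CA) coordinates from PDB-format text lines."""
    atoms = []
    for line in lines:
        # Columns 1-6 hold the record name; only ATOM records are kept.
        if not line.startswith("ATOM  "):
            continue
        # Skip truncated records that stop before the z coordinate (column 54).
        if len(line) < 54:
            continue
        # Columns 13-16 hold the atom name.
        if line[12:16].strip() != "CA":
            continue
        # Column 17 is the alternate-location flag; keep blank or "A" only.
        if line[16] not in (" ", "A"):
            continue
        atoms.append(CAlpha(
            chain=line[21],            # column 22
            res_seq=int(line[22:26]),  # columns 23-26
            res_name=line[17:20],      # columns 18-20
            x=float(line[30:38]),      # columns 31-38
            y=float(line[38:46]),      # columns 39-46
            z=float(line[46:54]),      # columns 47-54
        ))
    return atoms
```

The function accepts any iterable of lines, so an open file handle for a structure you downloaded from the PDB website can be passed in directly.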
1. Singh R.K., Tropsha A., Vaisman I.I. Delaunay Tessellation of
   Proteins: Four Body Nearest Neighbor Propensities of Amino Acid
   Residues, J. Comput. Biol. 3 (1996) 213-221.
2. Masso M., Vaisman I. Comprehensive Mutagenesis of HIV-1 Protease: A
   Computational Geometry Approach, Biochem. Biophys. Res. Comm. 305
   (2003) 322-326.
3. Krishnan V.G., Westhead D.R. A Comparative Study of Machine-Learning
   Methods to Predict the Effects of Single Nucleotide Polymorphisms on
   Protein Function, Bioinformatics 19 (2003) 2199-2209.
4. Karchin R., Kelly L., Sali A. Improving Functional Annotation of
   Non-Synonomous SNPs with Information Theory, PSB 2005.
5. Fawcett T. ROC graphs: notes and practical considerations for
   researchers (2004).
6. Free Software: WEKA suite of machine learning algorithms
We will be using this software at GMU for the project. If you have the
opportunity beforehand, explore the website briefly, then download a
version of WEKA that has a GUI (the Windows or Linux versions) onto
your home or school computers and familiarize yourselves with what it
has to offer (specifically the Explorer and Experimenter windows). Do
not download the older, command-line-only version; it is not
user-friendly. To help you navigate WEKA, read this general tutorial as
well as this Experimenter window guide.

The supervised learning algorithms that we will concentrate on are
support vector machines (called SMO in the WEKA functions directory)
and decision trees (called J48 in the WEKA trees directory). We will
apply the MultiClassClassifier and CostSensitiveClassifier
meta-classifiers (found in the WEKA meta directory) to these
algorithms.
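
To give a feel for what a cost-sensitive wrapper is for, here is a small plain-Python sketch of one idea behind it: instead of predicting the most probable class, predict the class with the lowest expected misclassification cost. This is not WEKA code. The function name, the two-class labels, the probabilities, and the cost values are all invented for illustration, and the row/column convention of the cost matrix is an assumption stated in the docstring.

```python
from typing import Sequence


def min_expected_cost_class(
    probabilities: Sequence[float],
    cost_matrix: Sequence[Sequence[float]],
) -> int:
    """Return the index of the predicted class with the lowest expected cost.

    Assumed convention: cost_matrix[i][j] is the cost of predicting
    class j when the true class is i.
    """
    n_classes = len(probabilities)
    # Expected cost of predicting j, averaged over the possible true classes i.
    expected = [
        sum(probabilities[i] * cost_matrix[i][j] for i in range(n_classes))
        for j in range(n_classes)
    ]
    return min(range(n_classes), key=expected.__getitem__)


# Hypothetical two-class example: class 0 = neutral, class 1 = deleterious.
probs = [0.7, 0.3]               # made-up class probabilities from a base classifier
equal_costs = [[0, 1], [1, 0]]   # every mistake costs the same
skewed_costs = [[0, 1], [5, 0]]  # missing a deleterious case costs five times more

print(min_expected_cost_class(probs, equal_costs))
print(min_expected_cost_class(probs, skewed_costs))
```

With equal costs the more probable class wins; once one kind of error is made more expensive, the prediction can switch to the less probable class. That trade-off is the reason to wrap SMO or J48 in a cost-sensitive meta-classifier when the classes are unbalanced or the errors are not equally serious.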