BINF739 Text Data Mining in Bioinformatics

Spring 2009 - Jeffrey L. Solka

This course will provide an overview of the application of text data mining to bioinformatics. Topics to be discussed include statistical methods for document encoding, natural language processing methods for document encoding, visualization of document collections, clustering document collections, methods for semi-automated query refinement, and methods for literature-based discovery.

Instructor

Jeff Solka, 540-653-1982 (D), 540-371-3961 (N), 540-809-9799 (C), jlsolka@gmail.com

Place and Time

4:30pm -- 7:10pm Wednesdays, Occoquan Building, Room 327


Textbook

Sophia Ananiadou and John McNaught (Eds) - Text Mining for Biology and Biomedicine, Artech House, 2006 (required)

Soumya Raychaudhuri - Computational Text Analysis for Functional Genomics and Bioinformatics, Oxford, 2006 (optional)

Roger Bilisoly - Practical Text Mining with Perl, Wiley, 2008 (optional)

Grading

Grades will be based on a student presentation of a current paper from the literature and on a student project. Students will be expected to present their project results and to write a short paper detailing those results. The grade will be broken down as follows:

Final grade = 1/3 * student paper presentation grade + 1/3 * student project presentation grade + 1/3 * student project writeup grade

NOTIONAL SCHEDULE

 

1/21/09

            Course and Topical Overview

            Read Ananiadou and McNaught Chapters 1 and 2, Raychaudhuri Chapter 1, and Bilisoly Chapter 1 and Appendices A and B

 

1/28/09

            Corpora and Their Annotation

            Read Ananiadou and McNaught Chapter 8, Raychaudhuri Chapter 2, and Bilisoly Chapter 2

 

2/4/09

            Resources for Biological Text Data Mining

            Read Ananiadou and McNaught Chapter 3, Raychaudhuri Chapter 3, and Bilisoly Chapter 3

           

 

2/11/09

            Terminology Management and Abbreviations

            Read Ananiadou and McNaught Chapters 4 and 5, Raychaudhuri Chapter 4, and Bilisoly Chapter 4

 

           

2/18/09

            Bag of Words Based Approaches and Natural Language Processing

            Read Ananiadou and McNaught Chapter 2, Raychaudhuri Chapter 5, and Bilisoly Chapter 5

 

 

2/25/09

            Named Entity Recognition and Information Extraction 

            and a Special Topic Lecture on Streaming Text Data Mining - Dr. Elizabeth Hohman

            Read Ananiadou and McNaught Chapters 6 and 7, Raychaudhuri Chapter 6, and Bilisoly Chapter 6

 

 

3/4/09

            Evaluation of Text Mining in Biology

            Read Ananiadou and McNaught Chapter 9, Raychaudhuri Chapter 7, and Bilisoly Chapter 7

 

3/11/09

 

            No Classes (Spring Break)

 

3/18/09

            Clustering

            and a Special Topic Lecture on Text Data Mining the Wikipedia - Dr. David Marchette

            Read Raychaudhuri Chapter 8 and Bilisoly Chapter 8

 

 

3/25/09

            Dimensionality Reduction and Visualization

            and a Special Topic Lecture on Iterative Denoising: An Iterative Scheme for the Revelation of Cluster Structure on Document Collections - Dr. Kendall Giles

            Read Raychaudhuri Chapter 9 and Bilisoly Chapter 9

 

4/1/09

            Literature-based Discovery - I

            and a Special Topic Lecture on Literature-based Discovery and SARS: A New Connection

            Read Raychaudhuri Chapter 9

 

4/8/09

            Literature-based Discovery - II

            Read Raychaudhuri Chapter 10           

 

4/15/09

            Integrating Text Mining With Data Mining

            Read Ananiadou and McNaught Chapter 10

           

 

4/22/09

            Student Paper Presentations

 

 

4/29/09

            Student Project Presentations

 

 

5/6/09

            Final Exam (All Assignments Due)

 

 



Papers of Interest

Kristin Sainani, PhD, Mining Biomedical Literature: Using Computers to Extract Knowledge Nuggets

Lars Juhl Jensen, Jasmin Saric, and Peer Bork, Literature mining for the biologist: from information retrieval to biological discovery

Links of Interest

Literature Mining for the Biologist: Additional Link