Design and Implementation of Bioinformatics Databases
 BINF 8211/BINF 6211
 UNCC
 Spring 2008

Course Description:

This course has two primary goals:
1) Students will learn where and how to obtain information from public biological databases and how to assess the quality of that information.
2) Students will learn to design, implement, populate, and query databases to support their own research, which often requires the integration of public data with new data.

By the end of the course, the student should be able to understand data models, understand the basics of using a DBMS, integrate external and novel scientific data, and understand and employ SQL as a research tool.

Course Objectives:

This course is essentially a Research Methods course. Students learn to model and instantiate, in a relational database management system, a practical database to support their own scientific research. They also learn methods for accessing, retrieving, and integrating data from relevant public databases that do not depend simply on prepared forms.

The large public repositories use relational schemas, often supplemented with XML schemas. Using these systems at a low level (i.e., not by screen scraping) often requires an understanding of the data representations and of how to perform joins and use SQL for data retrieval. Biological data is often very complex, and many valid representations are possible. This variation is reflected in the diversity of database system designs and representations, which can lead to difficulties in integration. By the end of this course, students will have been introduced to strategies for coping with this challenge. As students are introduced to SQL and other programming tools for creating biological databases to support their own research, they will also be expected to learn how to understand and make effective use of existing public repositories.

Particular topics will include formats and schemas in important bioinformatics databases (GenBank, EMBL, PDB, TAIR, SGD, ArrayExpress, GEO), XML schema and XML exchange methods, the use of generic database tools to browse and manage databases (pgAdmin), the types of models used in designing the preeminent bioinformatics databases, and how ontologies (such as GO and MAGE) affect database design and queries.

As a part of the course, the student will create, populate, and query a database supporting a research area of biological interest (the focus changes each year and in the past has included non-microarray gene expression data and SNP data).
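
As a rough illustration of the create, populate, and query cycle described above, the following is a minimal sketch and not course material. It assumes Python's built-in sqlite3 module purely so the example is self-contained. The table names, column names, gene symbols, accessions, and expression values are all hypothetical placeholders.

```python
import sqlite3


def build_demo_db():
    """Create and populate a tiny in-memory database (all data is made up)."""
    conn = sqlite3.connect(":memory:")
    # Two related tables: reference records of the kind one might import
    # from a public repository, and new measurements from one's own work.
    conn.executescript("""
        CREATE TABLE gene (
            gene_id   INTEGER PRIMARY KEY,
            symbol    TEXT NOT NULL UNIQUE,
            accession TEXT
        );
        CREATE TABLE expression (
            expr_id INTEGER PRIMARY KEY,
            gene_id INTEGER NOT NULL REFERENCES gene(gene_id),
            sample  TEXT NOT NULL,
            level   REAL NOT NULL
        );
    """)
    # Hypothetical "public" records.
    conn.executemany(
        "INSERT INTO gene (gene_id, symbol, accession) VALUES (?, ?, ?)",
        [(1, "geneA", "ACC0001"), (2, "geneB", "ACC0002")],
    )
    # Hypothetical "new" measurements.
    conn.executemany(
        "INSERT INTO expression (gene_id, sample, level) VALUES (?, ?, ?)",
        [(1, "control", 2.0), (1, "treated", 6.0), (2, "control", 5.0)],
    )
    conn.commit()
    return conn


def mean_expression(conn):
    """Join the two tables and return (symbol, mean level) rows by symbol."""
    return conn.execute("""
        SELECT g.symbol, AVG(e.level)
        FROM gene AS g
        JOIN expression AS e ON e.gene_id = g.gene_id
        GROUP BY g.symbol
        ORDER BY g.symbol
    """).fetchall()


if __name__ == "__main__":
    for symbol, mean_level in mean_expression(build_demo_db()):
        print(symbol, mean_level)
```

The join on gene_id is what links the imported reference records to the new measurements; the same pattern would apply to any relational DBMS, with adjustments to the connection code.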

When and Where:

Cameron Hall 274
Monday 11-12:15
Wednesday 11-12:15

Instructors:

Dr. Jennifer Weller
Office: Cameron Hall 220
Phone: (704) 687-7678
Email: jweller2@uncc.edu
Office Hours: Wednesday 1-3 or by appointment

Dr. D. Andrew Carr
Office: Cameron Hall 295
Email: dcarr10@uncc.edu
Office Hours: By Appointment


Book:
"Database Systems: Design, Implementation and Management, Seventh Edition"
 by Peter Rob and Carlos Coronel
2007, from Course Technology:
Thomson Learning, Boston MA. ISBN 13: 978-1-4188-3593-4.

Course Schedule
Updated 4.16.08
Course Readings

Software Links

This site was designed and maintained by Dr. Andrew Carr. Jan 09, 2008
© 2007 UNC Charlotte Copyright | Privacy Statement