Course Description:
This course has two primary goals:
1) Students will learn where and how to obtain information from public biological databases and how to assess the quality of that information.
2) Students will learn to design, implement, populate, and query databases to support their own research, which often requires integrating public data with new data.
By the end of the course, students should be able to understand data models, understand the basics of using a DBMS, integrate external and novel scientific data, and understand and employ SQL as a research tool.
Course Objectives:
This course is essentially a research methods course in which students learn to model and instantiate, in a relational database management system, a practical database to support their own scientific research. Students also learn methods, beyond simply using prepared forms, for accessing, retrieving, and integrating data from relevant public databases.
The large public repositories use relational schemas, often supplemented with XML schemas. Using these systems at a low level (i.e., not by screen scraping) often requires an understanding of their data representations, of how to perform joins, and of how to use SQL for data retrieval. Biological data are often very complex, and many valid representations are possible. This variation is reflected in the diversity of database system designs and representations, which can make integration difficult. By the end of this course, students will have been introduced to strategies for coping with this challenge. As students learn SQL and other programming tools for creating biological databases to support their own research, they will also be expected to understand and make effective use of existing public repositories.
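As an illustration of the kind of join described above, the following is a minimal sketch of integrating public annotation with new local measurements. It uses SQLite through Python's standard sqlite3 module purely so the example is self-contained; the tables public_gene and my_expression, their columns, and all values are hypothetical and do not come from any real repository or from the course materials.

```python
import sqlite3

# In-memory database; every table, column, and value below is hypothetical.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Stand-in for gene records imported from a public repository.
cur.execute("""
    CREATE TABLE public_gene (
        gene_id    TEXT PRIMARY KEY,  -- identifier assigned by the public source
        symbol     TEXT NOT NULL,
        chromosome TEXT
    )
""")

# Stand-in for the researcher's own, new measurements.
# REFERENCES documents the link; SQLite enforces it only if
# "PRAGMA foreign_keys = ON" is set, which this sketch does not do.
cur.execute("""
    CREATE TABLE my_expression (
        sample_id  TEXT NOT NULL,
        gene_id    TEXT NOT NULL REFERENCES public_gene(gene_id),
        expression REAL NOT NULL
    )
""")

cur.executemany("INSERT INTO public_gene VALUES (?, ?, ?)", [
    ("G001", "ABC1", "I"),
    ("G002", "DEF2", "II"),
    ("G003", "GHI3", "IV"),
])
cur.executemany("INSERT INTO my_expression VALUES (?, ?, ?)", [
    ("S1", "G001", 5.2),
    ("S1", "G002", 0.7),
    ("S2", "G001", 4.8),
])

# Inner join: attach public annotation to each local measurement.
# Genes with no measurements (here G003) are dropped.
rows = cur.execute("""
    SELECT e.sample_id, g.symbol, g.chromosome, e.expression
    FROM my_expression AS e
    JOIN public_gene AS g ON g.gene_id = e.gene_id
    ORDER BY e.sample_id, g.symbol
""").fetchall()

# Left join: keep every public gene and count how many local
# measurements exist for it (0 where nothing was measured).
counts = cur.execute("""
    SELECT g.symbol, COUNT(e.sample_id)
    FROM public_gene AS g
    LEFT JOIN my_expression AS e ON e.gene_id = g.gene_id
    GROUP BY g.symbol
    ORDER BY g.symbol
""").fetchall()

for row in rows:
    print(row)
print(counts)

conn.close()
```

The choice between the two joins matters for integration: the inner join answers "what do my data say, annotated?", while the left join also reveals which publicly catalogued genes have no local data at all.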
Particular topics will include formats and schemas in important bioinformatics databases (GenBank, EMBL, PDB, TAIR, SGD, ArrayExpress, GEO); XML schemas and XML exchange methods; using generic database tools (pgAdmin) to browse and manage databases; the types of models used in designing the preeminent bioinformatics databases; and how ontologies (such as GO and MAGE) affect database design and queries.
As part of the course, each student will create, populate, and query a database supporting a research area of biological interest (the focus changes each year and in the past has included non-microarray gene expression data and SNP data).
When and Where:
Cameron Hall 274
Monday 11:00-12:15
Wednesday 11:00-12:15
Instructors:
Dr. Jennifer Weller
Office: Cameron Hall 220
Phone: (704) 687-7678
Email: jweller2@uncc.edu
Office Hours: Wednesday 1-3 or by appointment
Dr. D. Andrew Carr
Office: Cameron Hall 295
Email: dcarr10@uncc.edu
Office Hours: By Appointment
Book:
"Database Systems: Design, Implementation and Management, Seventh Edition"
by Peter Rob and Carlos Coronel
2007, Course Technology: Thomson Learning, Boston, MA.
ISBN-13: 978-1-4188-3593-4