BINF 704 Fall 2009 Colloquium
Instructor - Dr. Jeff Solka (jlsolka@gmail.com)
Meeting Place - Prince William Bull Run Hall Rm 130
Meeting Time -
Course Webpage
http://binf.gmu.edu/~jsolka/fall09/binf704/Fall_2009BINF_704_colloquium_Syllabus_rev1.html
Course Description:
This course will provide an opportunity to learn about ongoing bioinformatics research outside of
Prerequisites:
Good standing in the Bioinformatics and Computational Biology Department
Required Text:
None
Grading:
Grades will be based on 8 short reports on the speakers, each approximately 2 paragraphs (about ½ page) in length. These summaries of the talks CANNOT simply reproduce the speaker’s abstract and are due 1 week after the speaker’s presentation. Students are expected to produce these reports for 8 of the planned presentations.
In addition, each student must prepare a summary report for one of the talks ahead of time, based on the student’s analysis of a minimum of two papers by the speaker. This 3-5 page summary report should end with five questions the student would like the speaker to answer about their work. All citations must be given in full, including any Web sites used. Students are warned to be very careful with regard to plagiarism. This summary report must be turned in a week prior to the speaker’s planned presentation. A seminar can be covered by both a report and a summary report, since the speaker sometimes does not cover all the material in their papers in the seminar.
Projected Class Schedule
Student Orientation
Title: Comprehensive Full Genomic Sequencing of 2009 Novel H1N1 Viruses by High Throughput “Next-Gen” Sequencing
Background of 2009 Novel H1N1 Pandemic Outbreaks:
The genomes of the last three pandemic
influenza viruses (1918 H1N1, 1957 H2N2 and 1968 H3N2) all originated in whole
or in part from non-human reservoirs, and the HA genes of all of the pandemic
viruses ultimately originated from avian influenza viruses. Novel 2009
influenza A (H1N1) is a new flu virus of swine origin that was first detected
in
The novel H1N1 flu spreads mainly in the same way that regular "seasonal influenza" spreads: through the air from coughs and sneezes, or by touching contaminated surfaces. It seems that
new cases in the
Abstract:
Since its introduction in 2005, the Roche 454 FLX sequencing platform has been used for ultra-deep sequencing projects on various microorganisms. Massively parallel picoliter-scale amplification and pyrosequencing of individual DNA molecules (Margulies et al. 2005) allow scientists to investigate the heterogeneous populations of the microbial world that play an important role in determining disease outcome and drug resistance. Here, we systematically investigate the potential of ultra-deep pyrosequencing to determine and assemble full genome sequences of 2009 novel H1N1 viruses of worldwide geographic origin. A robust RT-PCR protocol was established to efficiently amplify all 8 RNA segments of the 2009 novel H1N1 virus into sufficient cDNA quantities, i.e., greater than 5 µg, to be processed and sequenced on the Roche 454 FLX system using the MID bar-coding system. Large numbers of sequences, i.e., >1,000,000 reads with a mean length of >200 base pairs, derived from de novo sequencing of the individual cDNA fragments, were readily obtained from each Roche 454 FLX sequencing run containing up to 24 bar-coded full-genome influenza A cDNAs of different origins. In addition to the consensus sequences routinely detected by traditional Sanger sequencing, rare genetic variants, i.e., those making up 1-2% of the total viral population, could also be detected and confirmed by pyrosequencing; such variants might play important roles in determining or predicting viral virulence or antiviral drug resistance. Our readiness to handle the next wave of 2009 H1N1 outbreaks could be greatly enhanced by using the Roche 454 as a feasible platform to sequence and analyze large numbers of 2009 novel H1N1 genomes for the imminent large-scale 2009 winter influenza season in the Northern Hemisphere.
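To make the rare-variant idea in the abstract above concrete, here is a minimal Python sketch that reports minor alleles at one aligned position of a deep read pileup. The function name `minor_variants`, the 1% default threshold, and the toy column are assumptions chosen for illustration; this is not the speaker's pipeline, and real data would also need a model of sequencing error.

```python
from collections import Counter


def minor_variants(column, min_freq=0.01):
    """Return {base: frequency} for non-consensus bases at one position.

    column   -- string of bases observed at a single aligned position,
                one character per read (hypothetical input format)
    min_freq -- lowest frequency reported (assumed 1%, matching the
                1-2% range mentioned in the abstract)
    """
    counts = Counter(column)
    depth = len(column)
    # The most frequent base is treated as the consensus call.
    consensus, _ = counts.most_common(1)[0]
    return {
        base: n / depth
        for base, n in counts.items()
        if base != consensus and n / depth >= min_freq
    }


if __name__ == "__main__":
    # 1,000 reads: 985 carry A, 15 carry G (a 1.5% variant).
    print(minor_variants("A" * 985 + "G" * 15))
```

Consensus-only methods would report just `A` at this position; counting every read is what lets a low-frequency base surface.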
Methods of high-level data exploration that remain robust in the face of the noise found in microarray data are few and far between. Solutions that use all original features to derive cluster structure can be misleading, while those that rely on trivial feature selection can miss important characteristics. We present a method adopted from previous work in the field of geography (Guo et al., 2003) that relies on conditional entropy between pairs of dimensions to uncover the underlying, native cluster structure within a dataset. Applied to an artificially clustered data set, this method performed well, though some sensitivity to multiplicative noise was evident. When applied to gene expression data, the method produced a clear representation of the underlying data structure.
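The quantity at the center of the abstract above, conditional entropy between a pair of dimensions, can be sketched in a few lines of Python. This is only the textbook definition H(Y | X) computed on binned values; the equal-width binning, the function names, and the toy data are assumptions, and the sketch does not reproduce the full method of Guo et al.

```python
import math
from collections import Counter


def discretize(values, bins):
    """Equal-width binning of numeric values into integer labels (assumed scheme)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant input
    return [min(int((v - lo) / width), bins - 1) for v in values]


def conditional_entropy(xs, ys):
    """H(Y | X) in bits for two equal-length sequences of discrete labels."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    marg_x = Counter(xs)
    h = 0.0
    for (x, _y), c in joint.items():
        # p(x, y) * log2 p(y | x), with p(y | x) = count(x, y) / count(x)
        h -= (c / n) * math.log2(c / marg_x[x])
    return h


if __name__ == "__main__":
    dim_a = discretize([0.1, 0.2, 0.9, 1.0], bins=2)
    dim_b = discretize([5.0, 5.1, 9.8, 9.9], bins=2)
    # Low values mean dim_b is well predicted by dim_a, i.e. the pair is structured.
    print(conditional_entropy(dim_a, dim_b))
```

A pair of dimensions with low conditional entropy carries cluster structure; a pair near the maximum looks like noise.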
Making Rare Diseases More Complex with Genomics: Eye on Childhood Cancer
The initial assembly of the human genomic sequence led to anticipation of rapid advances in understanding the genetic basis of human diseases. Efforts are now focused on whole-genome genotyping and DNA sequencing to characterize relationships between genotype and phenotype in complex or multigenic human diseases like diabetes and cancer. A major challenge for these studies is distinguishing between neutral and functional genetic variants and understanding how functional mutations contribute to disease. Our laboratory’s mission is to map and characterize medically important polymorphisms and mutations in rare human diseases using bioinformatic and laboratory-based methodologies. In one example, we have examined childhood cancers, a collection of exceptionally rare diseases whose rarity severely limits the number of tumor samples available for genomic studies. Rhabdomyosarcoma (RMS) is a childhood cancer originating from skeletal muscle that affects
approximately 250 children each year in the
In the post-genomic era, multi-faceted research on complex disorders such as autism has generated diverse types of molecular information related to the disorder. Operationally, the number of articles reporting putative candidate loci, as well as high-throughput array-based studies reporting many loci in a single publication, is growing at a fast pace. To address the genetic complexity of autism spectrum disorder (ASD), we have developed AutDB (http://www.mindspec.org/autdb.html), a publicly available web portal for the ongoing collection, manual annotation and visualization of genes linked to the disorder. We present a disease-driven database model in AutDB in which all genes connected to ASD are collected and classified according to their genetic variation: candidates identified from genetic association studies, rare single-gene mutations and genes linked to syndromic autism. Gene entries are richly annotated for their relevance to autism, along with an in-depth view of their molecular functions. The content of AutDB originates entirely from the published scientific literature and is organized to optimize its use by the research community. The main focus of this resource is to provide an up-to-date, annotated list of ASD candidate genes in the form of a reference dataset for interrogating molecular mechanisms underlying the disorder. Our model for consolidated knowledge representation in genetically complex disorders could be replicated to study other such disorders.
No Classes - Columbus Day Break
Oct. 20, 2009
Dr. Françoise Seillier-Moiseiwitsch (Georgetown University Medical Center)
Title: Analysis of 2D gels: global and region-based approaches
Key Words and Phrases: wavelets, alignment, modeling, 2D gels, images
Two-dimensional polyacrylamide gel electrophoresis is currently one of the techniques of choice to separate and display all the proteins expressed in a tissue. In the resulting protein maps for groups of patients, we seek to identify proteins that are differentially expressed. I will describe a comprehensive analytical approach that deals with preprocessing, alignment and differential analysis. Preprocessing removes the bulk of the background noise. It involves smoothing, selecting regions containing spots and gradient thresholding.
Images are aligned using cubic-spline transformations. The alignment is formulated as a quadratic programming problem that is optimized using an interior-point method. In the global approach, wavelets are used to summarize the aligned images, and statistical tests are performed on the wavelet coefficients. In the region-based approach, the images are segmented using the watershed algorithm and summary statistics are computed on each region. Statistical tests are applied to these summary statistics.
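The global approach described above, summarizing aligned images with wavelets and then testing the coefficients, can be sketched roughly as follows. The sketch assumes a one-level, unnormalized Haar transform and a Welch t statistic; the abstract names neither the wavelet family nor the test, so both choices, along with all function names and the toy 2x2 "gels", are illustrative assumptions.

```python
import math
import statistics


def haar_1d(row):
    """One-level Haar step: pairwise averages followed by pairwise half-differences."""
    avg = [(row[i] + row[i + 1]) / 2 for i in range(0, len(row), 2)]
    diff = [(row[i] - row[i + 1]) / 2 for i in range(0, len(row), 2)]
    return avg + diff


def haar_2d(image):
    """Apply haar_1d to every row, then to every column (even dimensions assumed)."""
    rows = [haar_1d(r) for r in image]
    cols = [haar_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def welch_t(a, b):
    """Welch two-sample t statistic (assumes both samples have non-zero variance)."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va / len(a) + vb / len(b))


if __name__ == "__main__":
    # Hypothetical aligned gel images for two patient groups.
    group_a = [[[4, 2], [2, 0]], [[5, 2], [2, 1]], [[4, 3], [1, 0]]]
    group_b = [[[8, 2], [2, 0]], [[9, 3], [2, 1]], [[7, 2], [3, 0]]]
    coef_a = [haar_2d(img)[0][0] for img in group_a]  # top-left = overall mean
    coef_b = [haar_2d(img)[0][0] for img in group_b]
    print(welch_t(coef_a, coef_b))
```

In practice the same test would be repeated for every coefficient position, which raises a multiple-testing question this sketch does not address.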
Dr. Ki-Bong Kim (
Abstract:
This talk deals with the development of a predictive probabilistic model, the composite dependency-reflecting model (CDRM), which was designed to detect the core promoter regions and the transcription start sites (TSSs) in vertebrate genomic DNA sequences, an issue of some importance for genome annotation efforts. The model represents a combination of first-, second-, third- and even much higher order or long-range dependencies obtained using the expanded maximal dependency decomposition (EMDD) procedure, which iteratively decomposes data sets into subsets on the basis of the dependency degree and pattern inherent in the target promoter region to be modeled. In addition, the decomposed subsets are modeled using a first-order Markov model, allowing the predictive model to reflect dependency between adjacent positions explicitly. In this way, the CDRM allows for potentially complex dependencies between positions in the core promoter region. Such complex dependencies may be closely related to the biological and structural contexts, since promoter elements are present in various combinations separated by various distances in the sequence. Thus, the CDRM may be appropriate for recognizing core promoter regions and TSSs in vertebrate genomic contigs. To demonstrate the effectiveness of the predictive model, we tested it using standardized data and real core promoters, and compared it with some current representative promoter-finding algorithms. The developed algorithm showed better accuracy in terms of specificity and sensitivity than the promoter-finding algorithms used in the performance comparison.
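The first-order Markov component mentioned in the abstract above can be illustrated with a short Python sketch. It assumes a position-specific model with add-one pseudocounts, trained on equal-length aligned sequences; the EMDD decomposition itself is not shown, and the function names, pseudocount, and toy training set are assumptions rather than details of the CDRM.

```python
import math

ALPHABET = "ACGT"


def train_first_order(seqs, pseudo=1.0):
    """Estimate P(base at 0) and position-specific P(base_i | base_{i-1})."""
    length = len(seqs[0])
    start = {b: pseudo for b in ALPHABET}
    # trans[i][prev][cur] counts transitions from position i to position i + 1.
    trans = [
        {p: {c: pseudo for c in ALPHABET} for p in ALPHABET}
        for _ in range(length - 1)
    ]
    for s in seqs:
        start[s[0]] += 1
        for i in range(1, length):
            trans[i - 1][s[i - 1]][s[i]] += 1
    total = sum(start.values())
    start_p = {b: v / total for b, v in start.items()}
    trans_p = [
        {p: {c: v / sum(row.values()) for c, v in row.items()} for p, row in pos.items()}
        for pos in trans
    ]
    return start_p, trans_p


def log_score(seq, model):
    """Log-likelihood of seq under the model; higher means more promoter-like."""
    start_p, trans_p = model
    score = math.log(start_p[seq[0]])
    for i in range(1, len(seq)):
        score += math.log(trans_p[i - 1][seq[i - 1]][seq[i]])
    return score


if __name__ == "__main__":
    model = train_first_order(["TATA", "TATA", "TATT"])  # toy training set
    print(log_score("TATA", model), log_score("GCGC", model))
```

Because each probability is conditioned on the preceding base, the model captures adjacent-position dependency that a plain position weight matrix would ignore.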
Dr. James Diggans (Mitre Corporation)
Title - DOTS: DNA Order Tracking System
Abstract
The ability to construct gene- and genome-length DNA fragments from scratch is a rapidly developing technology in the field of synthetic biology. Existing efforts to screen DNA synthesis orders rely solely upon direct alignment of ordered sequence to threat agents. Our approach assumes intent to obfuscate on the part of the ordering entity. We hypothesize that any truly valuable screening method will have to collate and analyze orders over time to determine relationships between orders and overall correspondence to threat agent sequence. A system capable of monitoring incoming DNA synthesis requests and identifying hazardous DNA sequences, ordered whole or in parts and spanning multiple vendor companies, time frames and points of origin, can be built around existing techniques utilized for sequence alignment and shotgun genome assembly.
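The core idea in the abstract above, that orders must be collated over time because a sequence may be ordered in parts, can be sketched with simple k-mer bookkeeping. Exact k-mer matching stands in here for the alignment and assembly techniques the abstract mentions; the function names, the customer identifiers, `k=8`, and the arbitrary toy reference string are all assumptions for illustration.

```python
from collections import defaultdict


def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def coverage_by_customer(orders, reference, k=8):
    """Fraction of the reference's k-mers seen across ALL orders of each customer.

    orders -- iterable of (customer_id, ordered_sequence) pairs, in any order
    """
    ref = kmers(reference, k)
    seen = defaultdict(set)
    for customer, seq in orders:
        # Accumulate matches over time instead of judging each order alone.
        seen[customer] |= kmers(seq, k) & ref
    return {c: len(s) / len(ref) for c, s in seen.items()}


if __name__ == "__main__":
    reference = "ACGTTGCAAGGCTTAACCGGTTAGCATCGA"  # arbitrary toy string
    orders = [
        ("lab1", reference[:18]),   # first half, ordered alone
        ("lab2", reference[:10]),   # a short unrelated-looking fragment
        ("lab1", reference[11:]),   # second half, ordered later
    ]
    print(coverage_by_customer(orders, reference))
```

Neither of the `lab1` orders covers the reference on its own, but their union does, which is the pattern a per-order screen would miss.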
Weifan Zheng, Ph.D., NCCS
Title - "Chemometrics approach to modeling siRNA gene silencing and receptor-based SAR"
Abstract
Chemometric approaches have been widely employed in cheminformatics to study the QSAR (quantitative structure-activity relationship) of drug molecules. In this talk, I will describe how PLS (partial least squares) and SVM (support vector machine) modeling can be used to study the gene silencing potency of siRNAs. A couple of new ways to characterize the structures of siRNA molecules will be introduced that have made possible the quantitative modeling of chemically modified siRNAs. I will also describe the application of PLS and computational geometry in receptor-based QSAR modeling of drug activities.
Please attend student research day