Visualizing Gene Regulation and Peptide Docking Statistics

By Daniel B. Carr
George Mason University
 
This talk presents a 3-D rendering approach to visualizing letter-indexed statistics. The letters stand for m-mers of either nucleotides or amino acids. The goal is to develop overviews that can potentially reveal scientific patterns. By contrast, the typical published view of letter-indexed statistics is a table listing only a few of the top-ranked instances.

The domain of all m-mers of a given length poses a challenge because it grows exponentially with the length. The approach presented begins by using a self-similar geometric structure to construct unique plotting coordinates for short sequences. Point color, size, and more general glyphs then encode statistics for those short sequences. For longer sequences, features of the paths that connect points represent statistics associated with the corresponding concatenated letters. Interactive filtering options then help focus attention on the dominant structures residing within the huge combinatorial space.
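As an illustration of how a self-similar structure can yield unique plotting coordinates, the following minimal sketch maps nucleotide m-mers to 3-D points by repeatedly halving the step size toward one of four corners. The corner layout, the function names mer_to_point and all_points, and the contraction factor are assumptions made for this example; they are not taken from GLISTEN, whose actual construction may differ.

```python
from itertools import product

# Assumed layout: four alternating corners of the unit cube, one per nucleotide.
CORNERS = {
    "A": (0.0, 0.0, 0.0),
    "C": (1.0, 1.0, 0.0),
    "G": (1.0, 0.0, 1.0),
    "T": (0.0, 1.0, 1.0),
}


def mer_to_point(seq):
    """Map a nucleotide sequence to a 3-D point.

    The first letter chooses the coarsest position; each later letter
    refines it at half the previous scale, which produces a self-similar
    (Sierpinski-tetrahedron-like) arrangement of points.
    """
    x = y = z = 0.0
    scale = 0.5
    for letter in seq:
        cx, cy, cz = CORNERS[letter]
        x += scale * cx
        y += scale * cy
        z += scale * cz
        scale *= 0.5
    return (x, y, z)


def all_points(m):
    """Return a dict from every nucleotide m-mer to its plotting coordinates."""
    return {"".join(p): mer_to_point(p) for p in product("ACGT", repeat=m)}
```

Under this layout, distinct sequences of the same length map to distinct points, because each coordinate is a binary fraction whose digits identify the letter at each position. Sequences of different lengths can coincide (for example, "C" and "CA"), so a plot built this way would show one length at a time.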

The talk presents examples using new software called GLISTEN. The first example is from early efforts in studying gene regulation. The delightful fractal that results shows all 4096 statistics indexed by nucleotide 6-mers. Showing all 10-mers on a page is not a problem. More challenging examples derive from databases of peptides found to dock on specific HLA molecules. The graphics focus on 9-mers and position-based margin tables up through 4 dimensions. Within the 126 4-D tables of potentially 160,000 cells each, there is at least one surprising pattern.
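The counts quoted above can be checked with a few lines of arithmetic. The sketch below assumes a 4-letter nucleotide alphabet, the 20 standard amino acids, and that each 4-D margin table corresponds to one choice of 4 positions out of the 9 in a peptide; these assumptions are consistent with the figures in the abstract but are not stated there. The variable names are illustrative.

```python
from math import comb

NUCLEOTIDES = 4    # A, C, G, T
AMINO_ACIDS = 20   # assumed: the 20 standard amino acids
PEPTIDE_LENGTH = 9

n_6mers = NUCLEOTIDES ** 6                # statistics in the 6-mer display
n_10mers = NUCLEOTIDES ** 10              # points needed to show all 10-mers
n_4d_tables = comb(PEPTIDE_LENGTH, 4)     # ways to pick 4 of the 9 positions
cells_per_table = AMINO_ACIDS ** 4        # letter combinations at 4 positions

print(n_6mers, n_10mers, n_4d_tables, cells_per_table)
# prints: 4096 1048576 126 160000
```

Taken together, the 126 tables span up to 126 × 160,000 = 20,160,000 cells, which gives a sense of why interactive filtering is needed to find the dominant structures.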