-----------------------------------------------------------------------
BIOINFORMATICS COLLOQUIUM
College of Science
George Mason University
-----------------------------------------------------------------------

A mitochondrial data set for the domestic dog

Marc Allard

Abstract:

We are building a dog mtDNA database of control region sequences for 500 dogs, and a second database of complete mtDNA genome sequences for 100 dogs. The second database will support the first by providing additional sites that break up the most common haplotypes observed. Tissues and blood have been collected from a wide diversity of dogs, including both registered animals and unregistered mixed-breed dogs. DNA has been extracted from over four hundred specimens. Sample preparation, PCR, and sequencing have been optimized for dog DNA, with 10 primers covering the control region and 86 primers for the complete mtDNA genome. Sequencing has been conducted largely on an ABI capillary sequencer. Sequences are read in both directions to ensure fidelity to the sample, and the resulting contigs are assembled using the Sequencher software. The hypervariable repeat region is excluded from the alignment and from further analyses. Consensus sequences are compiled into Nexus format, where they are analyzed phylogenetically using the WinClada and NONA software. Additional data editing is conducted for analysis in the population-based software Arlequin.

Preliminary evidence shows considerable variation in both regions, with over 100 SNPs observed in the complete mtDNA genome and approximately 40 SNPs in the control region. Together these discriminate most dogs, though some animals still share common haplotypes. We project that the control region sequencing will be completed in October and the complete mtDNA genome sequencing will be completed by the end of the year.
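
As a rough illustration of how additional sites can break up a shared haplotype, the following Python sketch groups aligned consensus sequences by identity and lists the variable columns. It is not the workflow used in this project (which relies on Sequencher, WinClada, NONA, and Arlequin). The function names, sample names, and sequences are all invented for the example, and gaps or ambiguity codes are treated as ordinary characters.

```python
from collections import defaultdict


def variable_sites(alignment):
    """Return 0-based column indices at which the aligned sequences differ."""
    seqs = list(alignment.values())
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    return [i for i in range(length) if len({s[i] for s in seqs}) > 1]


def group_haplotypes(alignment):
    """Group sample names by identical sequence, largest group first."""
    groups = defaultdict(list)
    for name, seq in alignment.items():
        groups[seq].append(name)
    return sorted(groups.values(), key=len, reverse=True)


# Toy data, invented purely for illustration (not real dog sequences).
control_region = {
    "dog_A": "ACGTAC",
    "dog_B": "ACGTAC",
    "dog_C": "ACGTTC",
    "dog_D": "ATGTAC",
}
# Hypothetical extra sites from outside the control region.
extra_sites = {
    "dog_A": "GGA",
    "dog_B": "GTA",
    "dog_C": "GGA",
    "dog_D": "GGA",
}
combined = {name: control_region[name] + extra_sites[name]
            for name in control_region}

print("variable sites:", variable_sites(control_region))
print("control region haplotypes:", group_haplotypes(control_region))
print("combined haplotypes:", group_haplotypes(combined))
```

In this toy case dog_A and dog_B share a control region haplotype, and the extra sites separate them, which is the role the complete-genome database is meant to play for the most common haplotypes.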