A UTILITY TO IDENTIFY CORE SETS OF GENES ACROSS BACTERIAL OR VIRAL GENOMES

WARNING!! - Genomes Over 2Mbp May Take A While To Process - Please Be Patient

 
NCBI Accession number for Reference genome:*
   
NCBI Accession number for query genome1:*
NCBI Accession number for query genome2:
NCBI Accession number for query genome3:
NCBI Accession number for query genome4:
 
Please enter blastp threshold score in the box below
Score:
 
Enter your email address in the box below
Email Address:*
 
By clicking "Submit Query", you acknowledge that you have read the Important Notes below. Your results will be e-mailed to the address provided above.

IMPORTANT NOTES:

CoreGenes 4.0 has been updated to process larger genomes. An upper limit on genome size has not been established, but pairwise comparisons of 6 Mbp genomes have completed successfully in about 40 minutes. Depending on the size of the query genome(s), your browser session may time out. For that reason, CoreGenes 4.0 requires an e-mail address, so that you receive the results even if your browser times out.

In 2016 NCBI phased out the use of GI numbers. CoreGenes 4.0 has been updated to work with the current GenBank format, which provides versioned protein ids, rather than GI numbers. The designator "PI" is used to reference the unique identifier that corresponds to "protein_id" in the GenBank record.
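
As an illustration of the "PI" identifier, the following minimal Python sketch pulls the versioned "protein_id" values out of GenBank flat-file text. It is not the CoreGenes parser; the function name and the record fragment (including its identifiers) are made up for this example.

```python
import re

# In a GenBank flat file, each CDS feature carries a qualifier of the form
#   /protein_id="ACCESSION.VERSION"
PROTEIN_ID_RE = re.compile(r'/protein_id="([^"]+)"')


def extract_protein_ids(genbank_text):
    """Return the protein_id values in the order they appear in the text."""
    return PROTEIN_ID_RE.findall(genbank_text)


# Made-up record fragment, for illustration only.
fragment = '''
     CDS             190..255
                     /product="hypothetical protein"
                     /protein_id="XYZ00001.1"
     CDS             337..2799
                     /product="hypothetical protein"
                     /protein_id="XYZ00002.2"
'''

print(extract_protein_ids(fragment))
```

The number after the dot is the version of the protein record, which is what makes the identifier "versioned".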

Previous versions of CoreGenes performed strictly hierarchical analyses, which listed only the genes common to ALL requested genomes, relative to the reference genome and to the order in which the query genomes were entered. The relative order of the query genomes therefore changes the CoreGenes output table. To address this issue, CoreGenes 4.0 includes a pairwise table of orthologs for each genome queried against the reference accession number. This table is included below the hierarchical output.
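
The idea of a pairwise table can be sketched in Python as follows. This is a simplified illustration, not the CoreGenes implementation: the hit tuples, the identifiers, the threshold value, and the rule that a hit counts when its score is at or above the threshold are all assumptions made for the example.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical blastp hit: (reference protein_id, query protein_id, score).
Hit = Tuple[str, str, float]


def pairwise_orthologs(hits: List[Hit], threshold: float) -> Dict[str, str]:
    """Map each reference protein to its best-scoring query protein.

    Assumption: a hit counts only if its score is at or above the threshold.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for ref_id, query_id, score in hits:
        if score < threshold:
            continue
        if ref_id not in best or score > best[ref_id][1]:
            best[ref_id] = (query_id, score)
    return {ref_id: query_id for ref_id, (query_id, _) in best.items()}


def common_to_all(tables: List[Dict[str, str]]) -> Set[str]:
    """Reference proteins that have a match in every pairwise table."""
    if not tables:
        return set()
    shared = set(tables[0])
    for table in tables[1:]:
        shared &= set(table)
    return shared


# Made-up hits for two query genomes against one reference.
hits_query1 = [("REF_1.1", "Q1_7.1", 210.0),
               ("REF_2.1", "Q1_3.1", 95.0),
               ("REF_3.1", "Q1_9.1", 40.0)]
hits_query2 = [("REF_1.1", "Q2_2.1", 180.0),
               ("REF_3.1", "Q2_5.1", 130.0)]

table1 = pairwise_orthologs(hits_query1, threshold=75.0)  # arbitrary threshold
table2 = pairwise_orthologs(hits_query2, threshold=75.0)
core = common_to_all([table1, table2])
```

Each pairwise table depends only on the reference and one query genome, so it is unaffected by the order in which the queries are listed. The plain set intersection in common_to_all is also order-independent, so it does not reproduce the hierarchical procedure described above.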

NOTA BENE: Any inaccurate or non-GenBank accession number will result in the following ERROR message: "The accession number is either not present in NCBI database or the file for this accession number does not have any genes". Please make sure that you enter the exact NCBI GenBank accession numbers of full genomes into the Reference and Query boxes.
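
Before submitting, a rough syntactic check can catch obvious typing mistakes. The Python sketch below is an assumption-laden illustration, not part of CoreGenes: the patterns cover only common GenBank-style (for example U00096.3) and RefSeq-style (for example NC_000913.3) nucleotide accession shapes and are not exhaustive, and passing the check does not mean the record exists at NCBI or contains genes.

```python
import re

# Common shapes only (not an exhaustive list of NCBI accession formats):
#   GenBank-style: 1-2 letters + 5-8 digits, optional ".version"
#   RefSeq-style:  2 letters + "_" + optional letters + 6 or more digits,
#                  optional ".version"
GENBANK_STYLE = re.compile(r"^[A-Z]{1,2}\d{5,8}(\.\d+)?$")
REFSEQ_STYLE = re.compile(r"^[A-Z]{2}_[A-Z]{0,4}\d{6,}(\.\d+)?$")


def looks_like_accession(text):
    """Return True if text has a common nucleotide accession shape."""
    candidate = text.strip().upper()
    return bool(GENBANK_STYLE.match(candidate) or REFSEQ_STYLE.match(candidate))


for value in ["NC_000913.3", "U00096.3", "CP012345", "gi|12345", "12345"]:
    print(value, looks_like_accession(value))
```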

DISCLAIMER: We provide you with this information and software "AS IS" and we cannot take any responsibility if the program does not run as desired. In no event shall the authors and/or George Mason University or contributors to the software be liable for any direct, indirect, incidental, special, exemplary, or consequential damages, however caused. This software is copyrighted by GMU. Do not modify without permission.

PLEASE e-mail us with any issues, comments, or suggestions for improvement.

MANY thanks for using CoreGenes! Good Hunting!!!



References:
1. Mahadevan, P., King, J.F. and Seto, D. (2009). CGUG: in silico proteome and genome parsing tool for the determination of "core" and unique genes in the analysis of genomes up to ca. 1.9 Mb. BMC Research Notes 2:168.
2. Mahadevan, P., King, J.F. and Seto, D. (2009). Data mining pathogen genomes using GeneOrder, CoreGenes and CGUG: gene order, synteny, and in silico proteomes. International Journal of Computational Biology and Drug Design 2:100-114.
3. Zafar, N., Mazumder, R. and Seto, D. (2002). CoreGenes: A computational tool for identifying and cataloging "core" genes in a set of small genomes. BMC Bioinformatics 3:12.


Copyright (c) 2003, 2017, George Mason University. All rights reserved.

CoreGenes 4.0 runs on the Tomcat Servlet/JSP engine developed by the Apache Software Foundation (http://www.apache.org/) and is compiled under Java 8 update 131. CoreGenes utilizes WU-BLAST (Gish, W. (1996-2003)).