-----------------------------------------------------------------------
BIOINFORMATICS COLLOQUIUM
College of Science
George Mason University
-----------------------------------------------------------------------

Performing Scientometric Analysis through Document Clustering and Dynamic Graph Visualization

Avory Bryant
NSWCDD

Abstract: Scientometrics is the study of science through analysis of the open source scientific literature. Bibliographic databases provide access to this literature in the form of millions of publications from journals and conference proceedings, among other resources. Document clustering refers to clustering based on free-text, content-based features such as a publication's title or abstract. These features can be used to represent publications in the vector space model as a term-document matrix, to which clustering methods can be applied. This presentation focuses on the 2-D graph visualization of the resulting clustering solutions using two techniques. The first is a specified graph layout obtained by multidimensional scaling; the second is a force-directed graph layout obtained using distances in the ambient space. Nodes represent clusters, while edges represent some relationship between the documents in two clusters, such as overlapping citations or overlapping institutional affiliations. Node color or size can also be used to highlight cluster-specific features, such as the number of documents in a cluster or the average growth rate, by publication year, of the documents belonging to a cluster. Note that the focus of this work is not on document clustering or on 2-D visualization of high-dimensional data, but on performing scientometric analysis at the document cluster level using graph visualization. Using this dynamic graph visualization scheme (node positions being static), we hope to create a system that can take advantage of the feature-rich environment provided by the open source scientific literature.
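
As a rough illustration of the cluster-level graph described in the abstract, the following minimal Python sketch builds a term-document matrix from titles and then derives nodes (clusters, sized by document count) and edges (pairs of clusters weighted by the number of cited references they share). It is not the speaker's implementation. The records in DOCS, their cluster labels, the reference ids, and the function names are all hypothetical, and both the clustering step and the 2-D layout (multidimensional scaling or force-directed) are omitted; the labels are simply assumed to be given.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy records: (title, cluster label, cited reference ids).
# In practice the labels would come from a document clustering step.
DOCS = [
    ("graph layout by multidimensional scaling", 0, {"r1", "r2"}),
    ("force directed graph layout", 0, {"r2", "r3"}),
    ("document clustering of abstracts", 1, {"r3", "r4"}),
    ("term document matrix clustering", 1, {"r5"}),
    ("gene expression analysis", 2, {"r6"}),
]


def term_document_matrix(docs):
    """Return (vocab, rows), where rows[i][j] counts term j in document i."""
    counts = [Counter(title.split()) for title, _, _ in docs]
    vocab = sorted(set().union(*counts))
    return vocab, [[c[term] for term in vocab] for c in counts]


def cluster_graph(docs):
    """Return (nodes, edges) for the cluster-level graph.

    nodes maps cluster label -> number of documents (usable as node size).
    edges maps (label_a, label_b) -> number of shared cited references.
    """
    sizes = Counter(label for _, label, _ in docs)
    cited = {}
    for _, label, refs in docs:
        cited.setdefault(label, set()).update(refs)
    edges = {}
    for a, b in combinations(sorted(cited), 2):
        shared = len(cited[a] & cited[b])
        if shared:  # draw an edge only when the clusters overlap
            edges[(a, b)] = shared
    return dict(sizes), edges


vocab, matrix = term_document_matrix(DOCS)
nodes, edges = cluster_graph(DOCS)
```

Here the only edge joins clusters 0 and 1, which share a single cited reference, while cluster 2 remains an isolated node. Other relationships mentioned in the abstract, such as overlapping institutional affiliations, could be substituted for citations by swapping the set stored per document.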