A nucleic acid secondary structure dot plot is a triangular plot that
depicts base pairs as dots or other symbols. We shall refer to these
symbols as dots. A dot in column *i* and row *j* of a triangular
array represents the base pair *i*.*j*. The advantage of a dot plot is
that it can display the base pairs in more than one folding
simultaneously. It can be used to compare a few foldings, or to display
the base pair distribution in many millions of foldings.
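
The idea of showing several foldings in one plot can be sketched in Python. This is only an illustration and not part of *mfold*: the function name `superimpose` and the three foldings are hypothetical, and each folding is assumed to be given simply as a set of base pairs.

```python
from collections import Counter


def superimpose(foldings):
    """Count, for each base pair (i, j) with i < j, how many foldings contain it."""
    counts = Counter()
    for pairs in foldings:
        # Normalise each pair so that i < j; a pair counts once per folding.
        counts.update({(min(i, j), max(i, j)) for i, j in pairs})
    return counts


# Three hypothetical foldings of the same sequence (made-up base pairs).
foldings = [
    {(1, 20), (2, 19), (3, 18)},
    {(1, 20), (2, 19), (5, 12)},
    {(2, 19), (5, 12), (6, 11)},
]
dots = superimpose(foldings)
```

Each key of `dots` corresponds to one dot in the plot, and the count could be used to choose a symbol or colour when many foldings are compared.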

*Mfold* computes a number, Δ*G*_{i.j}, for every possible base
pair, *i*.*j*. This is the minimum free energy of any folding that
contains the *i*.*j* base pair. As above, we let Δ*G*_{mfe} be the
overall minimum folding free energy, and ΔΔ*G* a user-selected
free energy increment. Clearly Δ*G*_{i.j} ≥ Δ*G*_{mfe} for every
base pair *i*.*j*.

The energy increment is derived from and

This dot plot contains the

The *energy dot plot* gives an overall visual impression of how ``well-defined''
the folding is. A cluttered plot, or cluttered regions, indicate either
structural plasticity (the lack of well-defined structure) or else the
inability of the algorithm to predict a structure with confidence. A
couple of crude measures of ``well-definedness'' have been introduced
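in *mfold*. The first is ``P-num''. P-num(*i*) is a measure of the
level of promiscuity of *r*_{i} in its pairing with other bases in
foldings within ΔΔ*G*, the free energy increment, of Δ*G*_{mfe}, the
overall minimum folding free energy.
It is the number of different base
pairs, *i*.*j* or *k*.*i*, that can form in this set of foldings, and is
simply the number of dots in the *i*^{th} row and *i*^{th} column of
the *energy dot plot*. If δ(expression) is defined to be 1 when
``expression'' is true, and 0 otherwise, then P-num may be defined as:

P-num(*i*) = Σ_{k<i} δ(Δ*G*_{k.i} ≤ Δ*G*_{mfe} + ΔΔ*G*) + Σ_{j>i} δ(Δ*G*_{i.j} ≤ Δ*G*_{mfe} + ΔΔ*G*)

where Δ*G*_{i.j} is the minimum free energy of any folding that
contains the *i*.*j* base pair.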
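
The counting behind P-num can be sketched in a few lines of Python. This is an illustration under stated assumptions, not *mfold*'s own code: the function `p_num`, the dictionary `energies` and its values are hypothetical, and energies are given in arbitrary integer units.

```python
def p_num(energies, dg_mfe, ddg, n):
    """Return {i: P-num(i)} for bases 1..n.

    energies maps a base pair (i, j), i < j, to the minimum free energy
    of any folding that contains that pair (hypothetical input format).
    """
    threshold = dg_mfe + ddg
    counts = {i: 0 for i in range(1, n + 1)}
    for (i, j), dg in energies.items():
        if dg <= threshold:   # the pair appears as a dot in the plot
            counts[i] += 1    # one more partner for base i
            counts[j] += 1    # and one more partner for base j
    return counts


# Made-up energies for a 6-base sequence (arbitrary integer units).
energies = {(1, 6): -30, (2, 5): -30, (1, 5): -25, (2, 6): -10}
# Assumes the optimal folding contains at least one base pair.
dg_mfe = min(energies.values())
pnum = p_num(energies, dg_mfe, ddg=6, n=6)
```

With an increment of 6 units, the pair (2, 6) lies outside the threshold, so it contributes to neither base 2 nor base 6.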

P-num pertains to individual bases. H-num is a ``well-definedness'' measure for a base pair

A helix, already defined as a collection of two or more consecutive base pairs, may be described as a triple

There are five files associated with the *energy dot plot*.

`FILE_NAME.PLOT' : This is a text file that contains all the
base pairs on the *energy dot plot*, organized into helices for which
the minimum free energy Δ*G*_{i.j} is
constant. The first record is a header, and each subsequent record
describes a single helix. The records are usually sorted by
Δ*G*_{i.j} (the ``energy'' column),
and are often filtered so that short helices or isolated
base pairs (helices of length 1) in suboptimal foldings are
removed. Figure 9 shows a sample plot file.

    level  length  istart  jstart  energy
        1       8     206     242    -972
        1       7     319     434    -972
        1       7     108     141    -972
        1       7      53     185    -972
        1       6     334     412    -972
        1       6     308     444    -972
        1       6     288     472    -972
        1       6     247     279    -972
      ...
        2       4       8      23    -971
        2       2      69      78    -971
        2       4       1      24    -970
        2       2      10      17    -970
        2       3     345     400    -967
        2       2     297     462    -967
      ...
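
Reading such a file back into helices and base pairs might look as follows. This is a sketch under assumptions that the text does not confirm: records are taken to be whitespace-separated, a helix is expanded by pairing *istart* + *k* with *jstart* - *k*, and the names `Helix`, `parse_plot` and `base_pairs` are hypothetical. The sample rows are copied from the plot file shown above.

```python
from typing import NamedTuple


class Helix(NamedTuple):
    level: int
    length: int
    istart: int
    jstart: int
    energy: int


def parse_plot(text):
    """Parse plot-file text: skip the header record, one helix per record."""
    records = text.strip().splitlines()[1:]
    return [Helix(*map(int, rec.split())) for rec in records if rec.strip()]


def base_pairs(helix):
    """Expand a helix into base pairs (assumed: i+k pairs with j-k)."""
    return [(helix.istart + k, helix.jstart - k) for k in range(helix.length)]


sample = """\
level length istart jstart energy
1 8 206 242 -972
1 7 53 185 -972
2 4 8 23 -971
2 2 69 78 -971
2 3 345 400 -967
"""
helices = parse_plot(sample)
# Drop helices shorter than 3 base pairs, then sort by energy, lowest first.
kept = sorted((h for h in helices if h.length >= 3), key=lambda h: h.energy)
pairs = base_pairs(kept[0])
```

The length cutoff of 3 is an arbitrary choice for the example; the filtering applied to an actual plot file may differ.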

`FILE_NAME.ANN' : This file contains P-num information for a
particular free energy increment, ΔΔ*G*.
The *i*^{th} record contains *i* and P-num(*i*).
This file is used for annotating plotted structures.

`FILE_NAME.H-NUM' : This file is the same as
`FILE_NAME.PLOT', except that the ``energy'' column is replaced by an
``h-num'' column. These files are usually sorted by h-num, lowest to
highest, that is, from best determined to worst determined. Often, only
helices in optimal foldings are retained. Figure 10 shows part of a
sorted and filtered h-num file corresponding to the plot file in
Figure 9.

    level  length  istart  jstart  h-num
        1       4      38     194    6.8
        1       4     215     232    7.3
        1       5      31     201    8.4
        1       7      53     185    8.4
        1       2      47     189   11.0
        1       8     206     242   11.9
        1       6      61     176   13.7
        1       4      89     163   13.8
        1       3     255     271   14.0
        1       3     104     145   15.0
        1       1      68      79   16.0
        1       4     121     131   17.0
        1       6     288     472   17.3
      ...
        1       2     353     389   35.0
        1       3     364     377   38.7
        1       3     297     459   39.0
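
Ranking helices by h-num can be sketched in the same way. Again this is only an illustration: the function `parse_hnum` is hypothetical, records are assumed to be whitespace-separated, and the cutoff of 10.0 is an arbitrary value chosen for the example. The rows are taken from the h-num file shown above, deliberately out of order.

```python
def parse_hnum(text):
    """Return (h_num, length, istart, jstart) tuples from h-num file text."""
    rows = []
    for record in text.strip().splitlines()[1:]:   # skip the header record
        level, length, istart, jstart, h_num = record.split()
        rows.append((float(h_num), int(length), int(istart), int(jstart)))
    return rows


sample = """\
level length istart jstart h-num
1 6 61 176 13.7
1 4 38 194 6.8
1 3 297 459 39.0
1 7 53 185 8.4
"""
# Sorting the tuples puts the lowest h-num (best determined helix) first.
ranked = sorted(parse_hnum(sample))
best = ranked[0]
# Keep helices at or below an arbitrary, illustrative h-num cutoff.
well_defined = [row for row in ranked if row[0] <= 10.0]
```

Putting h-num first in each tuple lets the default tuple ordering do the sorting, with ties broken by helix length.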

`FILE_NAME.PS' : This is a PostScript file of the *energy dot plot*.

`FILE_NAME.GIF' : This is an image of the *energy dot plot* in ``gif''
format, suitable for display on web pages.

Michael Zuker, Institute for Biomedical Computing, Washington University in St. Louis, 1998-12-05