Statistical analysis of the composition of the Delaunay simplices.
Delaunay tessellation of the 103 protein chains in the dataset of Jones et al. (1992) generates a total of 114,617 simplices. The composition of these simplices was first analyzed in terms of the unbiased preference of four amino acid residues to be clustered together, that is, the statistical likelihood that four amino acid residues occur as nearest neighbors in one simplex. This likelihood was evaluated for all observed quadruplet combinations of the 20 natural amino acids. The log-likelihood factor, q, for each quadruplet was calculated from the following equation:

$$q_{ijkl} = \log \frac{f_{ijkl}}{p_{ijkl}}$$

where i, j, k, l are any of the 20 natural amino acid residues, $f_{ijkl}$ is the observed normalized frequency of occurrence of a given quadruplet, and $p_{ijkl}$ is the randomly expected frequency of occurrence of that quadruplet. The factor $q_{ijkl}$ thus shows the likelihood of finding four particular residues in one simplex relative to random expectation. The frequency $f_{ijkl}$ is calculated by dividing the total number of occurrences of each quadruplet type by the total number of observed quadruplets of all types. The expected frequency $p_{ijkl}$ was calculated from the following equation:

$$p_{ijkl} = C \, a_i a_j a_k a_l$$

where $a_i$, $a_j$, $a_k$, and $a_l$ are the observed frequencies of occurrence of the individual amino acid residues (i.e. the total number of occurrences of each residue type divided by the total number of amino acid residues in the dataset), and C is the permutation factor, defined as

$$C = \frac{4!}{\prod_{i=1}^{n} t_i!}$$

where n is the number of distinct residue types in a quadruplet and $t_i$ is the number of residues of type i in that quadruplet. The factor C accounts for the permutability of replicated residue types.
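The calculation defined above can be sketched in a few lines of Python. This is a minimal illustration rather than the code used in this study: the function names `permutation_factor` and `log_likelihoods`, the input layout (simplices as 4-tuples of one-letter residue codes, residue frequencies as a dictionary), and the choice of a base-10 logarithm are all assumptions made for the example.

```python
from collections import Counter
from math import factorial, log10


def permutation_factor(quad):
    """C = 4! / prod(t_i!), where t_i counts residues of type i in quad."""
    denom = 1
    for t in Counter(quad).values():
        denom *= factorial(t)
    return factorial(len(quad)) // denom


def log_likelihoods(simplices, residue_freq):
    """Return {quadruplet: q} for every quadruplet type seen in simplices.

    simplices    : iterable of 4-tuples of residue codes (hypothetical layout)
    residue_freq : dict mapping residue code -> frequency a_i in the dataset
    """
    # Vertex order is irrelevant, so sort each simplex into a canonical key.
    counts = Counter(tuple(sorted(s)) for s in simplices)
    total = sum(counts.values())
    q = {}
    for quad, n in counts.items():
        f = n / total                    # observed normalized frequency
        p = permutation_factor(quad)     # expected frequency: C * a_i a_j a_k a_l
        for residue in quad:
            p *= residue_freq[residue]
        q[quad] = log10(f / p)           # base 10 is an assumption
    return q
```

With this sketch, `permutation_factor("ACDE")` gives 24, `permutation_factor("AACD")` gives 12, `permutation_factor("AACC")` gives 6, and `permutation_factor("AAAA")` gives 1. Quadruplets that never occur are simply absent from the result, since the logarithm is undefined for a zero observed frequency.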
Theoretically, the 20 natural amino acid residues can form 8,855 distinct quadruplets (unordered selections of four residues, with repetition allowed), whereas only 8,351 actually occur in the dataset. The log-likelihood factor q is plotted in Figure 6 for all observed quadruplets of natural amino acids. Each quadruplet is thus characterized by a value of the q factor that describes the nonrandom bias for its four amino acid residues to be found in the same Delaunay simplex. This value can be interpreted as a four-body potential energy function, which can be applied both to inverted structure prediction and to simulations of protein folding. This work is currently in progress, and the results will be described elsewhere.
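The count of 8,855 possible quadruplets can be checked by direct enumeration. The short sketch below is illustrative only: it assumes the standard one-letter residue codes and treats a quadruplet as an unordered selection of four residue types with repetition allowed. The variable names are hypothetical, and the observed count is copied from the text rather than computed.

```python
from itertools import combinations_with_replacement
from math import comb

# One-letter codes of the 20 natural amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Every unordered quadruplet, with repeated residue types allowed.
quadruplets = list(combinations_with_replacement(AMINO_ACIDS, 4))
n_possible = len(quadruplets)

# Closed form: multisets of size 4 drawn from 20 types, C(20 + 4 - 1, 4).
assert n_possible == comb(20 + 4 - 1, 4) == 8855

n_observed = 8351                      # value reported in the text
n_missing = n_possible - n_observed    # quadruplet types absent from the dataset
print(n_possible, n_observed, n_missing)
```

The difference of 504 is the number of quadruplet types that never appear in any simplex of the dataset.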

Figure 6. Log-likelihood ratio for the Delaunay simplices.