DistanceP - 1.0
Predicts protein distance constraints
The distanceP server predicts distance constraints between amino acids in a protein from its amino acid sequence. It is an update of the sowhat server.
Introduction to distanceP
This server updates and improves on the sowhat server. For an overview, consult the abstracts of the following papers (Abstract tab). Detailed background and an introduction can be found in the papers themselves:
- Using Sequence Motifs for Enhanced Neural Network Prediction of Protein Distance Constraints. J. Gorodkin, O. Lund, C. A. Andersen, and S. Brunak. In Proceedings of the Seventh International Conference on Intelligent Systems for Molecular Biology (ISMB99), eds. T. Lengauer, R. Schneider, P. Bork, D. Brutlag, J. Glasgow, H-W. Mewes, and R. Zimmer, pp. 95-105, 1999. (http://www.cbs.dtu.dk/services/distanceP/)
- Protein distance constraints predicted by neural networks and probability density functions. O. Lund, K. Frimand, J. Gorodkin, H. Bohr, J. Bohr, J. Hansen, and S. Brunak. Protein Engineering, Volume 10, Issue 11, November 1997, pp. 1241-1248. (http://www.cbs.dtu.dk/services/CPHmodels/)
Download the poster presented at BIOINFORMATICS'99.
A description of distanceP is given on the Manual tab.
For each sequence separation (in residues), a threshold has been computed as the average physical distance (in Angstrom) between residue pairs at that separation, using sequence windows chosen from a non-redundant data set of proteins with known three-dimensional structure. Each threshold serves as the constraint for the neural network predictions at that separation: the networks predict whether the physical distance between two residues at a given sequence separation is below or above the threshold. The computed thresholds for each sequence separation can be downloaded here. The list of PDB entries used to train the neural networks in distanceP can be downloaded here.
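The thresholding idea can be illustrated with a minimal sketch. This is not the distanceP implementation: the function names `separation_thresholds` and `label_pairs`, the representation of a chain as a list of C-alpha (x, y, z) coordinates, and the toy coordinates are all assumptions made for illustration. The sketch averages the distance over all residue pairs at each sequence separation and then labels each pair as below or above the threshold for its separation, which is the kind of target the networks are described as predicting.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def separation_thresholds(chains, max_sep):
    """Mean C-alpha distance for each sequence separation up to max_sep.

    `chains` is a list of chains; each chain is a list of (x, y, z) tuples.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for coords in chains:
        n = len(coords)
        for i in range(n):
            last = min(i + max_sep, n - 1)
            for j in range(i + 1, last + 1):
                sep = j - i
                totals[sep] += dist(coords[i], coords[j])
                counts[sep] += 1
    return {sep: totals[sep] / counts[sep] for sep in counts}


def label_pairs(coords, thresholds):
    """Map each pair (i, j) to True if its distance is below the threshold."""
    labels = {}
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            sep = j - i
            if sep in thresholds:
                labels[(i, j)] = dist(coords[i], coords[j]) < thresholds[sep]
    return labels


# Toy chain on a straight line (hypothetical coordinates, in Angstrom).
toy_chain = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (7.0, 0.0, 0.0), (12.0, 0.0, 0.0)]
thresholds = separation_thresholds([toy_chain], max_sep=3)
labels = label_pairs(toy_chain, thresholds)
```

In a real setting the thresholds would be computed once over the whole non-redundant training set and then reused, rather than derived from the chain being labelled as in this toy example.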
Paper to reference when reporting results
- Using Sequence Motifs for Enhanced Neural Network Prediction of Protein Distance Constraints. J. Gorodkin, O. Lund, C. A. Andersen, and S. Brunak. In Proceedings of the Seventh International Conference on Intelligent Systems for Molecular Biology (ISMB99), eds. T. Lengauer, R. Schneider, P. Bork, D. Brutlag, J. Glasgow, H-W. Mewes, and R. Zimmer, pp. 95-105, 1999. (http://www.cbs.dtu.dk/services/distanceP/)

distanceP Manual
The description is given for the command-line version; however, all of the options are also available on the WWW interface.
distanceP
cat datafile | distanceP -dmin 2 -dmax 25 > dist.mp
(You can then use MatrixPlot to generate a PostScript file.)
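For scripted use, the same invocation can be wrapped in a small helper. The following is a sketch under stated assumptions: a `distanceP` executable is on the PATH, it reads its input on standard input and writes its result to standard output as in the example above, and `-dmin` and `-dmax` are a lower and an upper bound (so `dmin` should not exceed `dmax`). The helper names `build_command` and `run_distancep` are hypothetical and not part of distanceP.

```python
import subprocess


def build_command(dmin, dmax, program="distanceP"):
    """Build the argument list for the documented -dmin/-dmax invocation."""
    # Assumption: dmin and dmax are lower and upper bounds of a range.
    if dmin > dmax:
        raise ValueError("dmin must not exceed dmax")
    return [program, "-dmin", str(dmin), "-dmax", str(dmax)]


def run_distancep(input_path, output_path, dmin=2, dmax=25):
    """Feed input_path to the program on stdin and save stdout to output_path."""
    with open(input_path, "rb") as src, open(output_path, "wb") as dst:
        subprocess.run(build_command(dmin, dmax), stdin=src, stdout=dst, check=True)
```

Calling `run_distancep("datafile", "dist.mp")` would then correspond to the shell pipeline shown above, with `check=True` raising an error if the program exits with a non-zero status.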
References
Abstract for the paper:
Using Sequence Motifs for Enhanced Neural Network Prediction of Protein Distance Constraints. J. Gorodkin, O. Lund, C. A. Andersen, and S. Brunak. ISMB99, pp. 95-105, 1999.
Correlations between sequence separation (in residues) and distance (in Angstrom) of any pair of amino acids in polypeptide chains are investigated. For each sequence separation we define a distance threshold. For pairs of amino acids where the distance between C-alpha atoms is smaller than the threshold, a characteristic sequence (logo) motif is found. The motifs change as the sequence separation increases: for small separations they consist of one peak located in between the two residues, then additional peaks at these residues appear, and finally the center peak smears out for very large separations. We also find correlations between the residues in the center of the motif. This and other statistical analyses are used to design neural networks with enhanced performance compared to earlier work. Importantly, the statistical analysis explains why neural networks perform better than simple statistical data-driven approaches such as pair probability density functions. The statistical results also explain characteristics of the network performance for increasing sequence separation. The improvement of the new network design is significant in the sequence separation range 10-30 residues. Finally, we find that the performance curve for increasing sequence separation is directly correlated to the corresponding information content. A WWW server, distanceP, is available at http://services.healthtech.dtu.dk/service.php?distanceP-1.0.
Abstract for the paper:
Protein distance constraints predicted by neural networks and probability density functions. O. Lund, K. Frimand, J. Gorodkin, H. Bohr, J. Bohr, J. Hansen, and S. Brunak. Protein Engineering, Volume 10, Issue 11, November 1997, pp. 1241-1248.
We predict interatomic C-alpha distances by two independent data-driven methods. The first method uses statistically derived probability distributions of the pairwise distance between two amino acids, whilst the latter method consists of a neural network prediction approach equipped with windows taking the context of the two residues into account. These two methods are used to predict whether distances in independent test sets were above or below given thresholds. We investigate which distance thresholds produce the most information-rich constraints and, in turn, the optimal performance of the two methods. The predictions are based on a data set derived using a new threshold which defines when sequence similarity implies structural similarity. We show that distances in proteins are predicted more accurately by neural networks than by probability density functions. We show that the accuracy of the predictions can be further increased by using sequence profiles. A threading method based on the predicted distances is presented. A homepage with software, predictions and data related to this paper is available at http://services.healthtech.dtu.dk/service.php?CPHmodels-3.2.