Submission
Confidentiality:
The sequences are kept confidential and will be deleted after processing.
CITATIONS
For publication of results, please cite:
Using Sequence Motifs for Enhanced Neural Network Prediction
of Protein Distance Constraints.
J. Gorodkin, O. Lund, C. A. Andersen, and S. Brunak
In Proceedings of the Seventh International Conference on Intelligent Systems
for Molecular Biology,
eds. T. Lengauer, R. Schneider, P. Bork, D. Brutlag, J. Glasgow, H-W. Mewes,
and R. Zimmer: 95-105, 1999.
View the article.
Introduction to distanceP
This server is an update and improvement of the sowhat server.
For an overview, consult the abstracts of the following papers (Abstract tab).
Detailed background and introduction can be found in the papers:
-
Using Sequence Motifs for Enhanced Neural Network Prediction
of Protein Distance Constraints.
J. Gorodkin, O. Lund, C. A. Andersen, and S. Brunak.
In Proceedings of the Seventh International Conference on Intelligent Systems
for Molecular Biology, eds. T. Lengauer, R. Schneider, P. Bork,
D. Brutlag, J. Glasgow, H-W. Mewes, and R. Zimmer, pp 95-105, 1999.
(http://www.cbs.dtu.dk/services/distanceP/)
-
Protein distance constraints predicted by neural networks
and probability density functions.
O. Lund, K. Frimand, J. Gorodkin, H. Bohr, J. Bohr, J. Hansen, and S. Brunak.
Protein Engineering, Volume 10, Issue 11: November 1997. 1241-1248.
(http://www.cbs.dtu.dk/services/CPHmodels/)
Download the transparencies from the talk given at ISMB'99.
Download the poster presented at BIOINFORMATICS'99.
A description of distanceP is given on the
Manual tab.
For each sequence separation (in residues), a threshold has been computed as
the average physical distance (in Angstrom) between residue pairs at that
separation, using sequence windows chosen from a non-redundant data set
of proteins with known three-dimensional structure. Each threshold serves as a
constraint for the neural network predictions: the networks predict whether the
physical distance between two residues at a given sequence separation is below
or above the corresponding threshold. The computed thresholds for each sequence
separation can be downloaded
here.
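The threshold idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used to build distanceP: the function names, the input layout (one list of C-alpha coordinates per chain), and the plain mean over all pairs are hypothetical, and the window selection and non-redundancy filtering mentioned above are not reproduced.

```python
import math
from collections import defaultdict


def mean_distance_thresholds(chains, dmin=2, dmax=50):
    """Average C-alpha distance for each sequence separation in [dmin, dmax].

    `chains` is a list of chains, each a list of (x, y, z) C-alpha
    coordinates in Angstrom (a hypothetical input layout).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for coords in chains:
        for i in range(len(coords)):
            for sep in range(dmin, dmax + 1):
                j = i + sep
                if j >= len(coords):
                    break  # larger separations run past the chain end
                sums[sep] += math.dist(coords[i], coords[j])
                counts[sep] += 1
    return {sep: sums[sep] / counts[sep] for sep in counts}


def is_below_threshold(coords, i, j, thresholds):
    """True if residues i and j are closer than the threshold for |i - j|."""
    return math.dist(coords[i], coords[j]) < thresholds[abs(i - j)]


# Toy data (not real structures): a straight chain with 3.8 Angstrom
# spacing and a short bent chain.
straight = [(3.8 * k, 0.0, 0.0) for k in range(6)]
bent = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)]
thresholds = mean_distance_thresholds([straight], dmin=2, dmax=4)
print(thresholds)
print(is_below_threshold(bent, 0, 2, thresholds))
```

The below/above label returned by `is_below_threshold` corresponds to the binary target the networks are described as predicting; the networks themselves output a probability for it rather than a measured label.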
The list of PDB entries used for training the neural networks
in distanceP can be downloaded
here.
distanceP Manual
The description below is for the command-line version; all options are also
available through the WWW interface.
distanceP
- NAME
- distanceP - distanceP predicts distance constraint probabilities between
residues in a protein chain. distanceP is an update of the
sowhat
server.
- SYNOPSIS
- distanceP [options] datafile
- DESCRIPTION
- distanceP accepts an input file in any format. All characters
that do not match the standard 20-letter protein alphabet are
discarded, except the letter "X", which is kept as a wildcard.
The output produced by distanceP is in the MatrixPlot data format (mp format).
distanceP consists of multiple C programs and a C shell script.
- OPTIONS
-
- -dmin <number>
- Number. Specify the minimum sequence separation (residues) for which the
constraint probability should be calculated. Minimum and default value is 2.
- -dmax <number>
- Number. Specify the maximum sequence separation (residues) for which the
constraint probability should be calculated. Maximum and default value is 50.
- -seqletters y|n
- Show sequence letters along the edges of the distance plot.
- -pdbentry <pdbname>
- Generates a prediction for the given PDB entry and compares it to the known
3D coordinates in the database. The default chain is the first one in
the PDB entry. With this option, any sequence submitted to the program
is ignored.
- -pdbchain <pdbchain>
- Works only with the option -pdbentry, and is used to specify the
chain identifier in that PDB entry.
- -nprof <number>
- Number of profile sequences generated from the query sequence.
Maximum number of profile sequences is 40.
- EXAMPLE
-
cat datafile | distanceP -dmin 2 -dmax 25 > dist.mp
(You can then use MatrixPlot to generate a PostScript file.)
- AUTHOR
- Jan Gorodkin,
gorodkin@cbs.dtu.dk,
April 1999.
- REFERENCES
-
Using Sequence Motifs for Enhanced Neural Network Prediction
of Protein Distance Constraints.
J. Gorodkin, O. Lund, C. A. Andersen, and S. Brunak.
In Proceedings of the Seventh International Conference on Intelligent Systems
for Molecular Biology, eds. T. Lengauer, R. Schneider, P. Bork,
D. Brutlag, J. Glasgow, H-W. Mewes, and R. Zimmer, pp 95-105, 1999.
(http://www.cbs.dtu.dk/services/distanceP/)
-
Protein distance constraints predicted by neural networks
and probability density functions.
O. Lund, K. Frimand, J. Gorodkin, H. Bohr, J. Bohr, J. Hansen, and S. Brunak.
Protein Engineering, Volume 10, Issue 11: November 1997. 1241-1248.
(http://www.cbs.dtu.dk/services/CPHmodels/)
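As a rough illustration of the input filtering in DESCRIPTION and the limits in OPTIONS, the following Python sketch mimics the documented behaviour. It is not part of distanceP: `clean_sequence` and `build_command` are hypothetical helper names, matching only uppercase letters is an assumption (the manual does not say how lowercase input is treated), and so is the requirement that dmin not exceed dmax.

```python
STANDARD_AA = "ACDEFGHIKLMNPQRSTVWY"  # standard 20-letter protein alphabet
WILDCARD = "X"


def clean_sequence(raw):
    """Drop every character that is not a standard residue letter or X.

    Assumption: matching is case-sensitive (uppercase only).
    """
    allowed = set(STANDARD_AA + WILDCARD)
    return "".join(ch for ch in raw if ch in allowed)


def build_command(datafile, dmin=2, dmax=50, nprof=None):
    """Return an argument list for distanceP after checking documented limits."""
    # Documented: minimum dmin is 2, maximum dmax is 50.
    # Assumed: dmin must not exceed dmax.
    if not 2 <= dmin <= dmax <= 50:
        raise ValueError("need 2 <= dmin <= dmax <= 50")
    # Documented: at most 40 profile sequences.
    if nprof is not None and nprof > 40:
        raise ValueError("nprof may not exceed 40")
    args = ["distanceP", "-dmin", str(dmin), "-dmax", str(dmax)]
    if nprof is not None:
        args += ["-nprof", str(nprof)]
    return args + [datafile]


print(clean_sequence("MK-T 12\nAXz*"))  # prints MKTAX
print(build_command("datafile", dmin=2, dmax=25))
```

Because filtering is character-based, letters in non-sequence text such as a header line would also pass through under this literal reading, so it may be safest to submit the bare sequence.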
References
Abstract for the paper:
Using Sequence Motifs for Enhanced Neural Network Prediction
of Protein Distance Constraints.
J. Gorodkin, O. Lund, C. A. Andersen, and S. Brunak.
ISMB99. In press.
Correlations between sequence separation (in residues) and distance (in
Angstrom) of any pair of amino acids in polypeptide chains are investigated.
For each sequence separation we define a distance threshold. For pairs of amino
acids where the distance between C-alpha atoms is smaller than the threshold, a
characteristic sequence (logo) motif is found. The motifs change as the
sequence separation increases: for small separations they consist of one peak
located in between the two residues, then additional peaks at these residues
appear, and finally the center peak smears out for very large separations. We
also find correlations between the residues in the center of the motif. This
and other statistical analyses are used to design neural networks with enhanced
performance compared to earlier work. Importantly, the statistical analysis
explains why neural networks perform better than simple statistical data-driven
approaches such as pair probability density functions. The statistical results
also explain characteristics of the network performance for increasing sequence
separation. The improvement of the new network design is significant in the
sequence separation range 10-30 residues. Finally, we find that the
performance curve for increasing sequence separation is directly correlated to
the corresponding information content. A WWW server, distanceP, is available at
http://services.healthtech.dtu.dk/service.php?distanceP-1.0.
Abstract for the paper:
Protein distance constraints predicted by neural networks
and probability density functions.
O. Lund, K. Frimand, J. Gorodkin, H. Bohr, J. Bohr, J. Hansen, and S. Brunak.
Protein Engineering, Volume 10, Issue 11: November 1997. 1241-1248.
We predict interatomic C-alpha distances by two independent data-driven methods.
The first method uses statistically derived probability distributions of the
pairwise distance between two amino acids, whilst the latter method consists of
a neural network prediction approach equipped with windows taking the context
of the two residues into account. These two methods are used to predict whether
distances in independent test sets were above or below given thresholds. We
investigate which distance thresholds produce the most information-rich
constraints and, in turn, the optimal performance of the two methods. The
predictions are based on a data set derived using a new threshold which defines
when sequence similarity implies structural similarity. We show that distances
in proteins are predicted more accurately by neural networks than by
probability density functions. We show that the accuracy of the predictions can
be further increased by using sequence profiles. A threading method based on
the predicted distances is presented. A homepage with software, predictions and
data related to this paper is available at
http://services.healthtech.dtu.dk/service.php?CPHmodels-3.2.