Contrary to widespread belief, acceptor sites for N-linked
glycosylation on protein sequences are not well
characterised. The consensus sequence Asn-Xaa-Ser/Thr
(where Xaa is not Pro) is known to be a prerequisite for
the modification. However, not all of these sequons are
modified, so the consensus sequence alone does not discriminate
between glycosylated and non-glycosylated asparagines. We train
artificial neural networks on the surrounding sequence
context in an attempt to discriminate between acceptor and
non-acceptor sequons. In cross-validation, the
networks correctly identified 86% of the glycosylated and 61% of
the non-glycosylated sequons, with an overall accuracy of
76%. The method can be optimised for high specificity
or high sensitivity. Apart from characterising individual
proteins, the prediction method can rapidly
scan complete proteomes.
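The consensus rule above is simple enough to apply directly. The Python sketch below lists candidate sequons and cuts out a fixed window of residues around each asparagine, the kind of local sequence context a network could be trained on. The flank size, the `X` padding character and the sparse (one-hot) encoding are illustrative assumptions, not the parameters of the published networks, and all function names are hypothetical.

```python
import re

# Asn-Xaa-Ser/Thr with Xaa != Pro. The zero-width lookahead lets
# overlapping sequons (e.g. "NNST") each be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")

# The 20 standard amino acids in one-letter code.
AMINO = "ACDEFGHIKLMNPQRSTVWY"


def find_sequons(seq: str) -> list[int]:
    """Return 0-based positions of the Asn in every Asn-Xaa-Ser/Thr sequon."""
    return [m.start() for m in SEQUON.finditer(seq.upper())]


def context_window(seq: str, pos: int, flank: int = 8, pad: str = "X") -> str:
    """Hypothetical network input: `flank` residues on each side of the Asn,
    padded with `pad` where the window runs past either end of the sequence."""
    seq = seq.upper()
    left = seq[max(0, pos - flank):pos].rjust(flank, pad)
    right = seq[pos + 1:pos + 1 + flank].ljust(flank, pad)
    return left + seq[pos] + right


def one_hot(window: str) -> list[float]:
    """Sparse encoding: 20 inputs per residue; padding or unknown
    residues (e.g. 'X') are left as all zeros."""
    vec: list[float] = []
    for aa in window:
        vec.extend(1.0 if aa == a else 0.0 for a in AMINO)
    return vec


if __name__ == "__main__":
    protein = "MANGTPNPSQNVT"  # toy sequence, not real data
    for pos in find_sequons(protein):
        win = context_window(protein, pos, flank=3)
        print(pos, win, len(one_hot(win)))
```

In the toy sequence, `NGT` and `NVT` are reported while `NPS` is skipped because of the proline. Every reported position is only a candidate: deciding which candidates are actually glycosylated is the job of the trained networks.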
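The 86% and 61% figures correspond to sensitivity (glycosylated sequons recovered) and specificity (non-glycosylated sequons correctly rejected). A common way to trade one against the other is to move a decision threshold on the network output. Whether the published method is tuned exactly this way is an assumption, and the scores below are invented for illustration. The sketch also shows one inference from the reported numbers. If the overall accuracy is the count-weighted mean of the two rates, acc = f·sens + (1 − f)·spec, then f = (acc − spec)/(sens − spec) = (0.76 − 0.61)/(0.86 − 0.61) = 0.6. This would mean roughly 60% of the cross-validation sequons were glycosylated. Because the published percentages are rounded, treat this value as an estimate only.

```python
def rates(scores, labels, threshold=0.5):
    """Sensitivity, specificity and accuracy of the rule `score >= threshold`.
    labels: 1 = glycosylated sequon, 0 = non-glycosylated sequon."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y == 1 and s >= threshold)
    fn = sum(1 for s, y in pairs if y == 1 and s < threshold)
    tn = sum(1 for s, y in pairs if y == 0 and s < threshold)
    fp = sum(1 for s, y in pairs if y == 0 and s >= threshold)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    acc = (tp + tn) / len(pairs)
    return sens, spec, acc


def implied_positive_fraction(sens, spec, acc):
    """Solve acc = f*sens + (1 - f)*spec for f, the fraction of positives.
    Only valid if accuracy is the count-weighted mean of the two rates."""
    return (acc - spec) / (sens - spec)


if __name__ == "__main__":
    # Hypothetical network scores for six sequons (three positive, three negative).
    scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.1]
    labels = [1, 1, 1, 0, 0, 0]
    for t in (0.35, 0.5, 0.65):
        sens, spec, acc = rates(scores, labels, t)
        print(f"threshold {t}: sensitivity {sens:.2f}, "
              f"specificity {spec:.2f}, accuracy {acc:.2f}")
    print(f"implied glycosylated fraction: "
          f"{implied_positive_fraction(0.86, 0.61, 0.76):.2f}")
```

Lowering the threshold recovers more glycosylated sequons at the cost of more false positives. Raising it does the reverse.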
Glycosylation is an important post-translational
modification and is known to influence protein folding,
localisation and trafficking, protein solubility,
antigenicity, biological activity and half-life, as well as
cell-cell interactions. We investigate the distribution of known
and predicted N-glycosylation sites across functional
categories of the human proteome.
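A proteome-wide survey like this one starts by counting potential sites. The sketch below does this for each functional category. It counts how many proteins carry at least one Asn-Xaa-Ser/Thr sequon and how many sequons there are in total. The input format (tuples of identifier, category and sequence), the function name and the example entries are all hypothetical. The code counts sequons only, so a predictor would still be needed to estimate which of them are actually glycosylated.

```python
import re
from collections import defaultdict

# Asn-Xaa-Ser/Thr with Xaa != Pro; lookahead so overlapping sequons all count.
SEQUON = re.compile(r"(?=N[^P][ST])")


def tally_by_category(proteome):
    """proteome: iterable of (protein_id, category, sequence) tuples.
    Returns {category: (n_proteins, n_with_sequon, n_sequons)}."""
    stats = defaultdict(lambda: [0, 0, 0])
    for _pid, category, seq in proteome:
        n = len(SEQUON.findall(seq.upper()))  # one empty-string match per sequon
        entry = stats[category]
        entry[0] += 1
        entry[1] += 1 if n else 0
        entry[2] += n
    return {cat: tuple(v) for cat, v in stats.items()}


if __name__ == "__main__":
    toy = [  # invented entries, not real proteins
        ("P1", "receptor", "MANGTNVS"),
        ("P2", "enzyme", "MKNPSAA"),
        ("P3", "receptor", "MKLLA"),
    ]
    for cat, (n_prot, n_hit, n_seq) in tally_by_category(toy).items():
        print(f"{cat}: {n_hit}/{n_prot} proteins with sequons, {n_seq} sequons")
```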
The network will be updated, so predictions may change between versions. The network is balanced to give optimal predictions whether or not the submitted sequences are homologous to the known N-glycosylated proteins. However, if a submitted sequence is very similar or identical to sequences in our training dataset, the accuracy can be expected to be higher than reported above.
We would appreciate any experimental confirmation or refutation of our predictions. Since an expanded dataset with additional N-glycosylated sequences would improve the performance of the network, we are very interested in receiving such material. User feedback is the only way we can learn how to improve the method. Any other comments regarding the predictions or the data may be sent to: