GlycateBase ver. 1.0
Release notes
Generation of data set
This webpage describes the generation of the data set used to develop the artificial neural network-based glycation predictor NetGlycate-1.0.
NetGlycate-1.0 predicts glycation of ε amino groups of lysines in mammalian proteins, and the data set therefore contains only such data. Glycation data were obtained from the literature. The resulting data set consists of 20 proteins with 89 glycated lysines and 126 non-glycated lysines and can be downloaded below. Only experimentally verified glycation sites were used, and all sequences were extracted from the UniProt database (Bairoch et al., 2005). Lysines in pro- and signal peptides were masked out, since these parts of the proteins are cleaved off during maturation and are thus not available for glycation. The references from which the glycation data were taken are shown in Table 1.
To avoid confusing the prediction algorithm, unvalidated glycation sites were masked out. However, some of the studies listed in Table 1 claim to have validated sites that are masked out as unvalidated in the data set. The reasons for masking out these sites are described below.
UniProt ID | UniProt AC | Citation
GPX1_BOVIN | P00435 | Baldwin et al., 1995
RNP_BOVIN | P00656 | Watkins et al., 1985; Cotham et al., 2003
CFAB_HUMAN | P00751 | Niemann et al., 1991
B2MG_HUMAN | P01884 | Miyata et al., 1994
HBA_HUMAN | P01922 | Shapiro et al., 1980; Zhang et al., 2001
HBB_HUMAN | P02023 | Shapiro et al., 1980; Zhang et al., 2001
CRAA_BOVIN | P02470 | Abraham et al., 1994
CRAB_BOVIN | P02510 | Abraham et al., 1994
CRB2_BOVIN | P02522 | Zhao et al., 1996
CRGB_BOVIN | P02526 | Smith et al., 1996
APA1_HUMAN | P02647 | Calvo et al., 1993
APE_HUMAN | P02649 | Shuvaev et al., 1999
ALBU_HUMAN | P02768 | Garlick & Mazer, 1983; Shaklai et al., 1984; Iberg & Flückiger, 1986; Lapolla et al., 2004
MIP_BOVIN | P06624 | Swamy-Mruthinti & Schey, 1997
PMGE_HUMAN | P07738 | Fujita et al., 1998
SODE_HUMAN | P08294 | Adachi et al., 1992
TAU_HUMAN | P10636 | Nacharaju et al., 1997
ALAT_PIG | P13191 | Beranek et al., 2001
CD59_HUMAN | P13987 | Acosta et al., 2000
AKA1_RAT | P51635 | Takahashi et al., 1995
Table 1. References from which the glycation data were taken.
For RNP_BOVIN, Cotham et al., 2003, state that all ten lysines are glycated but find the same four major sites as Watkins et al., 1985. It was therefore decided to use the four glycation sites from Watkins et al., 1985 and mask out the remaining six lysines. Of these six lysines, the predictor predicts only K-92 to be glycated, supporting the notion that the remaining six lysines are mainly minor sites. In fact, Ames, 2005 finds K-92 to be glycated, confirming the prediction made by our predictor.
There has been some controversy about the glycation sites of CRGB_BOVIN, in particular K-163. According to the most recent article, Smith et al., 1996, only the N-terminus and K-2 are glycated, not K-163. It was therefore decided to mask out K-163. The predictor predicts K-163 to be non-glycated, in agreement with Smith et al., 1996.
The protein APE_HUMAN contains suspiciously few glycation sites compared to other proteins of similar length. Furthermore, since K-93 accounts for only 20% of the detected Amadori products (Shuvaev et al., 1999), the other lysines in the protein are masked out as unvalidated. The same problem arises for the protein CFAB_HUMAN, and the non-glycated lysines in this protein were therefore also masked out. For both APE_HUMAN and CFAB_HUMAN, the predictor predicts several glycation sites among the masked-out lysines, suggesting that there is more than one glycation site in each of these two proteins.
The neural networks were trained using three-fold cross-validation. The division into cross-validation groups was made at the site level, meaning that each sequence in the cross-validation groups contained only one site. The other sites were masked out, and the sequence was repeated once for each site.
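The site-level expansion described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes each protein comes with an annotation string using the letters from the Data set legend (G, K, S, P, U, -), and it assumes that "masked out" can be represented by rewriting the other sites as U. The function name `expand_per_site` is hypothetical.

```python
def expand_per_site(annotation: str) -> list[str]:
    """Return one copy of the annotation per trainable site (G or K).

    In each copy exactly one trainable site is kept; all other G/K
    positions are rewritten as 'U' (an assumed way to mask them).
    """
    site_positions = [i for i, mark in enumerate(annotation) if mark in "GK"]
    copies = []
    for keep in site_positions:
        masked = [
            "U" if (mark in "GK" and i != keep) else mark
            for i, mark in enumerate(annotation)
        ]
        copies.append("".join(masked))
    return copies


# A toy annotation with one positive, one negative and one unvalidated site.
for copy in expand_per_site("-G--K-U"):
    print(copy)
```

Each returned string would be paired with the same, unchanged amino acid sequence, so a protein with n trainable sites contributes n single-site entries.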
The positive and negative sites were extracted as windows of 21 amino acid residues, and a phylogenetic tree was constructed. The tree was then inspected visually, and related sites were placed in the same cross-validation group. This was done to prevent the network from learning the sites in the test set beforehand from the learning set. That situation would occur if related sites were placed in both the test and the learning set, and it could lead to an overestimation of the performance of the network if the related sites belonged to the same category (glycated or non-glycated lysine). If the related sites belonged to different categories, it could make it harder for the network to learn to classify the sites correctly. The remaining positive and negative sites were then added randomly to the three cross-validation groups in such a way that the groups contained almost the same number of positive and negative sites (see Table 2) and that all sites in the groups were placed in random order.
Group | Positive | Negative
1 | 29 | 43
2 | 30 | 45
3 | 30 | 38
Table 2. Number of positive and negative sites in each cross-validation group.
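The window extraction and the random, roughly balanced filling of the cross-validation groups can be sketched as below. This is only an illustration under stated assumptions: the text does not say how windows near the protein termini were handled, so padding with `X` is an assumption, and the helper names `extract_window` and `assign_balanced` are hypothetical. Grouping of related sites from the phylogenetic tree was done by visual inspection and is represented here only by groups that may already contain sites.

```python
import random


def extract_window(sequence: str, position: int, flank: int = 10, pad: str = "X") -> str:
    """Return the 21-residue window centred on 0-based `position`.

    Windows that run past either terminus are padded with `pad`
    (an assumption; the source does not describe terminus handling).
    """
    padded = pad * flank + sequence + pad * flank
    return padded[position : position + 2 * flank + 1]


def assign_balanced(sites, groups, seed=0):
    """Add `sites` in random order, each to the currently smallest group.

    `groups` may already hold related sites that were placed together
    by hand. Call once for positive and once for negative sites.
    """
    rng = random.Random(seed)
    shuffled = list(sites)
    rng.shuffle(shuffled)
    for site in shuffled:
        min(groups, key=len).append(site)
    return groups


window = extract_window("ACDKEF", 3)
print(window, len(window))

positive_groups = assign_balanced(range(89), [[], [], []])
print([len(group) for group in positive_groups])
```

The counts in Table 2 sum to 89 positive and 126 negative sites, matching the totals given for the data set.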
Both in vitro and in vivo data were used to make the data set as large as possible. It was, however, decided to include only in vitro data that were obtained under conditions that resemble physiological conditions. Note that the glycated proteins used in this study are of mammalian origin.
Data set
G: positive site
K: negative site
S: signal peptide (not used for training)
P: propeptide (not used for training)
U: unvalidated site (not used for training)
-: non-lysine residue (not used for training)
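A reader of the data set might decode this legend roughly as follows. The sketch assumes that each entry pairs an amino acid sequence with an annotation string of the same length, one legend letter per residue; the actual file layout is not shown here, so treat that pairing and the function name `labelled_sites` as assumptions.

```python
# Legend letters that are used for training, mapped to their class.
TRAINABLE = {"G": "positive", "K": "negative"}


def labelled_sites(sequence: str, annotation: str) -> list[tuple[int, str]]:
    """Return (1-based position, class) for every trainable lysine.

    Positions marked S, P, U or '-' are skipped, since the legend
    states they are not used for training.
    """
    if len(sequence) != len(annotation):
        raise ValueError("sequence and annotation must have equal length")
    sites = []
    for index, (residue, mark) in enumerate(zip(sequence, annotation), start=1):
        if mark in TRAINABLE:
            if residue != "K":
                raise ValueError(f"position {index} is annotated as a site but is not a lysine")
            sites.append((index, TRAINABLE[mark]))
    return sites


print(labelled_sites("AKCKDKE", "-G-K-U-"))
```

The lysine check is a cheap guard against a sequence and annotation that have drifted out of register.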
For the complete dataset click
here.
References
- Abraham et al., 1994
  Abraham,E.C., Cherian,M. and Smith,J.B. (1994). Site selectivity in the glycation of alpha A- and alpha B-crystallins by glucose. Biochem Biophys Res Commun, 201, 1451-1456.
- Acosta et al., 2000
  Acosta,J., Hettinga,J., Flückiger,R., Krumrei,N., Goldfine,A., Angarita,L. and Halperin,J. (2000). Molecular basis for a link between complement and the vascular complications of diabetes. Proc Natl Acad Sci U S A, 97, 5450-5455.
- Adachi et al., 1992
  Adachi,T., Ohta,H., Hayashi,K., Hirano,K. and Marklund,S.L. (1992). The site of nonenzymic glycation of human extracellular-superoxide dismutase in vitro. Free Radic Biol Med, 13, 205-210.
- Ames, 2005
  Ames,J.M. (2005). Application of semiquantitative proteomics techniques to the Maillard reaction. Ann N Y Acad Sci, 1043, 225-235.
- Bairoch et al., 2005
  Bairoch,A., Apweiler,R., Wu,C.H., Barker,W.C., Boeckmann,B., Ferro,S., Gasteiger,E., Huang,H., Lopez,R., Magrane,M., Martin,M.J., Natale,D.A., O'Donovan,C., Redaschi,N. and Yeh,L.S.L. (2005). The Universal Protein Resource (UniProt). Nucleic Acids Res, 33, 154-159.
- Baldwin et al., 1995
  Baldwin,J.S., Lee,L., Leung,T.K., Muruganandam,A. and Mutus,B. (1995). Identification of the site of non-enzymatic glycation of glutathione peroxidase: rationalization of the glycation-related catalytic alterations on the basis of three-dimensional protein structure. Biochim Biophys Acta, 1247, 60-64.
- Beranek et al., 2001
  Beranek,M., Drsata,J. and Palicka,V. (2001). Inhibitory effect of glycation on catalytic activity of alanine aminotransferase. Mol Cell Biochem, 218, 35-39.
- Calvo et al., 1993
  Calvo,C., Ulloa,N., Campos,M., Verdugo,C. and Ayrault-Jarrier,M. (1993). The preferential site of non-enzymatic glycation of human apolipoprotein A-I in vivo. Clin Chim Acta, 217, 193-198.
- Cotham et al., 2003
  Cotham,W.E., Hinton,D.J.S., Metz,T.O., Brock,J.W.C., Thorpe,S.R., Baynes,J.W. and Ames,J.M. (2003). Mass spectrometric analysis of glucose-modified ribonuclease. Biochem Soc Trans, 31, 1426-1427.
- Fujita et al., 1998
  Fujita,T., Suzuki,K., Tada,T., Yoshihara,Y., Hamaoka,R., Uchida,K., Matuo,Y., Sasaki,T., Hanafusa,T. and Taniguchi,N. (1998). Human erythrocyte bisphosphoglycerate mutase: inactivation by glycation in vivo and in vitro. J Biochem (Tokyo), 124, 1237-1244.
- Garlick & Mazer, 1983
  Garlick,R.L. and Mazer,J.S. (1983). The principal site of nonenzymatic glycosylation of human serum albumin in vivo. J Biol Chem, 258, 6142-6146.
- Iberg & Flückiger, 1986
  Iberg,N. and Flückiger,R. (1986). Nonenzymatic glycosylation of albumin in vivo. Identification of multiple glycosylated sites. J Biol Chem, 261, 13542-13545.
- Lapolla et al., 2004
  Lapolla,A., Fedele,D., Reitano,R., Arico,N.C., Seraglia,R., Traldi,P., Marotta,E. and Tonani,R. (2004). Enzymatic digestion and mass spectrometry in the study of advanced glycation end products/peptides. J Am Soc Mass Spectrom, 15, 496-509.
- Miyata et al., 1994
  Miyata,T., Inagi,R., Wada,Y., Ueda,Y., Iida,Y., Takahashi,M., Taniguchi,N. and Maeda,K. (1994). Glycation of human beta 2-microglobulin in patients with hemodialysis-associated amyloidosis: identification of the glycated sites. Biochemistry, 33, 12215-12221.
- Nacharaju et al., 1997
  Nacharaju,P., Ko,L. and Yen,S.H. (1997). Characterization of in vitro glycation sites of tau. J Neurochem, 69, 1709-1719.
- Niemann et al., 1991
  Niemann,M.A., Bhown,A.S. and Miller,E.J. (1991). The principal site of glycation of human complement factor B. Biochem J, 274 (Pt 2), 473-480.
- Shaklai et al., 1984
  Shaklai,N., Garlick,R.L. and Bunn,H.F. (1984). Nonenzymatic glycosylation of human serum albumin alters its conformation and function. J Biol Chem, 259, 3812-3817.
- Shapiro et al., 1980
  Shapiro,R., McManus,M.J., Zalut,C. and Bunn,H.F. (1980). Sites of nonenzymatic glycosylation of human hemoglobin A. J Biol Chem, 255, 3120-3127.
- Shuvaev et al., 1999
  Shuvaev,V.V., Fujii,J., Kawasaki,Y., Itoh,H., Hamaoka,R., Barbier,A., Ziegler,O., Siest,G. and Taniguchi,N. (1999). Glycation of apolipoprotein E impairs its binding to heparin: identification of the major glycation site. Biochim Biophys Acta, 1454, 296-308.
- Smith et al., 1996
  Smith,J.B., Hanson,S.R., Cerny,R.L., Zhao,H.R. and Abraham,E.C. (1996). Identification of the glycation site of lens gamma B-crystallin by fast atom bombardment tandem mass spectrometry. Anal Biochem, 243, 186-189.
- Swamy-Mruthinti & Schey, 1997
  Swamy-Mruthinti,S. and Schey,K.L. (1997). Mass spectroscopic identification of in vitro glycated sites of MIP. Curr Eye Res, 16, 936-941.
- Takahashi et al., 1995
  Takahashi,M., Lu,Y.B., Myint,T., Fujii,J., Wada,Y. and Taniguchi,N. (1995). In vivo glycation of aldehyde reductase, a major 3-deoxyglucosone reducing enzyme: identification of glycation sites. Biochemistry, 34, 1433-1438.
- Watkins et al., 1985
  Watkins,N.G., Thorpe,S.R. and Baynes,J.W. (1985). Glycation of amino groups in protein. Studies on the specificity of modification of RNase by glucose. J Biol Chem, 260, 10629-10636.
- Zhang et al., 2001
  Zhang,X., Medzihradszky,K.F., Cunningham,J., Lee,P.D., Rognerud,C.L., Ou,C.N., Harmatz,P. and Witkowska,H.E. (2001). Characterization of glycated hemoglobin in diabetic patients: usefulness of electrospray mass spectrometry in monitoring the extent and distribution of glycation. J Chromatogr B Biomed Sci Appl, 759, 1-15.
- Zhao et al., 1996
  Zhao,H.R., Smith,J.B., Jiang,X.Y. and Abraham,E.C. (1996). Sites of glycation of beta B2-crystallin by glucose and fructose. Biochem Biophys Res Commun, 229, 128-133.
NEW DATA, COMMENTS AND SUGGESTIONS
New data, comments and suggestions may be sent to
Morten Bo Johansen
E-mail address: mbj@cbs.dtu.dk
CITATIONS
If the use of this database contributes significantly to your results, please cite:
Analysis and prediction of mammalian protein glycation.
Morten Bo Johansen, Lars Kiemer and Søren Brunak
Glycobiology, 16:844-853, 2006
PMID: 16762979
doi: 10.1093/glycob/cwl009