DTU Health Tech
Department of Health Technology
Note! The database itself has recently been updated to v6.00. Some of the information on this
page may not yet have been changed accordingly.
O-GLYCBASE is a revised database of O- and C-glycosylated proteins.
Version 6.00 has 242 glycoprotein entries. The criterion for
inclusion is at least one experimentally verified O- or C-glycosylation
site. The terminal sugar linked to serine or threonine is cited when known.
The database is non-redundant in the sense that it contains no identical
sequences, unless there is conflicting glycosylation data.
Mucins have tandem repeat sequences, which are O-glycosylated.
This results in some redundancy among the O-glycosylation sites.
For prediction purposes we have also included a version of the database,
called O-Unique.seq, which contains no identical O-glycosylation sites
(window=9). This data set has been used as the training set of the
NetOGlyc prediction server (Hansen et al. 1995).
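One possible reading of the "no identical O-glycosylation sites (window=9)" criterion is sketched below in Python. It assumes that the window is centred on the glycosylated residue (four residues on each side) and that positions beyond the sequence ends are padded with a hypothetical '-' character. The function names are illustrative only, and this is not necessarily the procedure used to build O-Unique.seq.

```python
def site_windows(sequence, assignment, window=9, pad="-"):
    """Yield (position, window_string) for each verified O-glycosylation site.

    Assumption: the window is centred on the site, and `pad` fills
    positions that fall outside the sequence. Positions are 1-based.
    """
    if len(sequence) != len(assignment):
        raise ValueError("sequence and assignment must have equal length")
    half = window // 2
    padded = pad * half + sequence + pad * half
    for i, mark in enumerate(assignment):
        if mark in ("S", "T"):  # uppercase S/T = experimentally verified O-site
            yield i + 1, padded[i:i + window]


def unique_sites(entries, window=9):
    """Keep the first occurrence of each distinct site window.

    `entries` is an iterable of (name, sequence, assignment) tuples.
    Returns a list of (name, position, window_string).
    """
    seen = set()
    kept = []
    for name, sequence, assignment in entries:
        for pos, win in site_windows(sequence, assignment, window):
            if win not in seen:
                seen.add(win)
                kept.append((name, pos, win))
    return kept
```

Under these assumptions, two tandem repeats that carry the same nine-residue context around a glycosylated serine or threonine contribute a single site to the reduced set.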
Fields and their descriptions:
>: Entry accession number and entry date
GLYCPROT: Glycoprotein name and alternative names
SPECIES: Species
DB_REF: Cross-references to PIR, SWISS-PROT, PDB and PROSITE
OGLYCAN: Type of carbohydrate linked to serine or threonine
SER: Residue numbers of the O-linked serines
THR: Residue numbers of the O-linked threonines
ASN: Residue numbers of the N-linked asparagines
TRP: Residue numbers of the C-linked tryptophans
REFERENCES: References for the O-glycan assignment
SEQ: Sequence length, including signal peptide, followed by the SEQUENCE
in one-letter code.
ex:         STPSTPNASKLPGHSTNGT
Assignment  ...ST.N.......stn..
where uppercase T, S, N denote experimentally verified glycosylation sites
of threonine, serine and asparagine, respectively, and lowercase t, s, n
denote predicted sites. Dots (.) denote no glycosylation.
COMMENTS: Any comments
END: End of entry
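The assignment notation can be decoded position by position. The following Python sketch is a minimal illustration, assuming the assignment line has the same length as the sequence and uses only the characters described above; the function name `parse_assignment` is hypothetical.

```python
def parse_assignment(sequence, assignment):
    """Split an assignment line into verified and predicted site positions.

    Returns two dicts keyed by residue letter ("S", "T", "N"), each mapping
    to a list of 1-based positions. Uppercase marks are experimentally
    verified sites, lowercase marks are predicted sites, dots are skipped.
    """
    if len(sequence) != len(assignment):
        raise ValueError("sequence and assignment must have equal length")
    verified = {"S": [], "T": [], "N": []}
    predicted = {"S": [], "T": [], "N": []}
    for pos, (res, mark) in enumerate(zip(sequence, assignment), start=1):
        if mark == ".":
            continue
        letter = mark.upper()
        if letter not in verified or letter != res.upper():
            raise ValueError(f"unexpected mark {mark!r} at position {pos}")
        target = verified if mark.isupper() else predicted
        target[letter].append(pos)
    return verified, predicted


verified, predicted = parse_assignment(
    "STPSTPNASKLPGHSTNGT",
    "...ST.N.......stn..",
)
```

For the example sequence this yields verified sites at S4, T5 and N7 and predicted sites at s15, t16 and n17, counted within the example fragment.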
The non-redundant O-Unique data set contains 53 entries, all of them
mammalian mucin-type glycoproteins, with 265 O-glycosylation sites.
The first line contains the sequence length without the signal peptide, the
database accession and name, the number of experimentally verified and
predicted glycosylation sites, e.g. ( 17, 0), and the glycoprotein name.
The second line gives the sequence in one-letter uppercase code.
The third line gives the assignment, in the same notation as in O-GLYCBASE.
Ex:
50 A29789 (pir) ( 17, 0) mucin - sheep (fragment)
SSVPGESATPQQPGALSESTTQLPGVTGTSAVTGSEPGLPSTGVSGLPGT
SS....S.T.......S.STT.....T.TS..T.S.....ST..S....T
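A record in this three-line layout could be read as in the Python sketch below. The header pattern is inferred from the single example shown here, so it is an assumption and may not match every entry (for instance, headers carrying peptide markers). The names `HEADER_RE`, `UniqueRecord`, `parse_unique_record` and `count_sites` are hypothetical.

```python
import re
from typing import NamedTuple

# Assumed header layout: length, accession, (database), ( verified, predicted), name
HEADER_RE = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+\((\w+)\)\s+\(\s*(\d+)\s*,\s*(\d+)\s*\)\s+(.*)$"
)


class UniqueRecord(NamedTuple):
    length: int
    accession: str
    database: str
    n_verified: int
    n_predicted: int
    name: str
    sequence: str
    assignment: str


def parse_unique_record(header, sequence, assignment):
    """Parse one header/sequence/assignment triple into a UniqueRecord."""
    match = HEADER_RE.match(header)
    if match is None:
        raise ValueError(f"unrecognised header line: {header!r}")
    length, accession, database, n_ver, n_pred, name = match.groups()
    record = UniqueRecord(
        int(length), accession, database, int(n_ver), int(n_pred),
        name.strip(), sequence.strip(), assignment.strip(),
    )
    # The stated length should match both the sequence and the assignment.
    if not (record.length == len(record.sequence) == len(record.assignment)):
        raise ValueError("length field does not match sequence/assignment")
    return record


def count_sites(assignment):
    """Count verified (uppercase S/T) and predicted (lowercase s/t) O-sites."""
    verified = sum(assignment.count(c) for c in "ST")
    predicted = sum(assignment.count(c) for c in "st")
    return verified, predicted


record = parse_unique_record(
    "50 A29789 (pir) ( 17, 0) mucin - sheep (fragment)",
    "SSVPGESATPQQPGALSESTTQLPGVTGTSAVTGSEPGLPSTGVSGLPGT",
    "SS....S.T.......S.STT.....T.TS..T.S.....ST..S....T",
)
```

Comparing `count_sites(record.assignment)` with the counts in the header is a simple consistency check; for the sheep mucin example both give 17 verified and 0 predicted sites.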
The leukosialins are cut into peptides marked (p1-4), as these are the only
regions where the assignment can be performed. Including the rest of the
sequences would introduce false negative sites. (See comments in O-GLYCBASE).
This data set can be used for benchmark studies. It is identical to the data set
used to train the neural networks used in the
NetOGlyc prediction server (Hansen et al. 1995).
O-GLYCBASE version 4.0: a revised database of O-glycosylated proteins.
Ramneek Gupta, Hanne Birch, Kristoffer Rapacki, Søren Brunak and Jan E. Hansen
Nucleic Acids Research, 27: 370-372, 1999.