DTU Health Tech

Department of Health Technology

O-GlycBase v6.00


Release notes

Note! The database itself has recently been updated to v6.00. Some of the information on this page may not yet have been changed accordingly.

O-GLYCBASE is a revised database of O- and C-glycosylated proteins.
Version 6.00 has 242 glycoprotein entries. The criterion for inclusion is at least one experimentally verified O- or C-glycosylation site. The terminal sugar linked to serine or threonine is cited when known. The database is non-redundant in the sense that it contains no identical sequences, unless the glycosylation data conflict. Mucins contain O-glycosylated tandem repeat sequences, which results in some redundancy among the O-glycosylation sites. For prediction purposes we have therefore also included a version of the database, called O-Unique.seq, that contains no identical O-glycosylation sites (window=9). This data set has been used as the training set of the NetOGlyc prediction server (Hansen et al. 1995).
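The idea of removing identical O-glycosylation sites within a window can be illustrated with a minimal sketch. It assumes that "window=9" means the 9 residues centred on the glycosylated serine or threonine (4 on each side), and that windows running past a sequence end are padded with "X". Both are assumptions made for illustration, as are the function names and the made-up tandem repeat; this is not necessarily how O-Unique.seq was built.

```python
def site_windows(sequence, assignment, width=9, pad="X"):
    """Return one window per verified O-site (uppercase S or T in assignment).

    Assumption: the window is centred on the site and padded with `pad`
    where it would run past either end of the sequence.
    """
    half = width // 2
    padded = pad * half + sequence + pad * half
    # Site i of `sequence` sits at index i + half of `padded`,
    # so padded[i:i + width] is the window centred on it.
    return [padded[i:i + width] for i, mark in enumerate(assignment) if mark in "ST"]


def unique_windows(records, width=9):
    """Collect windows over (sequence, assignment) pairs, dropping duplicates."""
    seen, unique = set(), []
    for sequence, assignment in records:
        for window in site_windows(sequence, assignment, width):
            if window not in seen:
                seen.add(window)
                unique.append(window)
    return unique


# Made-up 9-residue tandem repeat with one glycosylated threonine per repeat.
repeat_seq = "GAPATPGAV" * 3
repeat_asg = "....T...." * 3
all_windows = site_windows(repeat_seq, repeat_asg)
distinct = unique_windows([(repeat_seq, repeat_asg)])
print(len(all_windows), len(distinct))
```

In this toy case the three sites of the repeat yield three identical windows, which collapse to a single distinct window.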


Databases


Format of O-GLYCBASE

Fields:		Description
>		Entry accession number and entry date
GLYCPROT:	Glycoprotein name, and alternative names
SPECIES:	Species
DB_REF:         Cross-references to PIR, SWISS-PROT, PDB and PROSITE.
OGLYCAN:	Type of carbohydrate linked to serine or threonine
SER:		Residue numbers of the O-linked serines 
THR:		Residue numbers of the O-linked threonines
ASN: 		Residue numbers of the N-linked asparagines   
TRP: 		Residue numbers of the C-linked tryptophans 
REFERENCES:     References for the O-glycan assignment.
SEQ:		Sequence length, including signal peptide.
SEQUENCE        in one-letter code. Ex:  STPSTPNASKLPGHSTNGT
Assignment                               ...ST.N.......stn..

Here, uppercase T, S and N denote experimentally verified glycosylation sites on threonine, serine and asparagine, respectively, and lowercase t, s and n denote predicted sites. Dots (.) denote no glycosylation.

COMMENTS: 	Free-text comments on the entry
END		End of entry
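The assignment notation can be decoded mechanically. The following is a minimal sketch that reads a sequence and its assignment line, as in the example above, and lists verified and predicted sites. The function name is hypothetical, and positions are counted from 1 within the string passed in, which is an assumption about how a caller would want them numbered.

```python
def decode_assignment(sequence, assignment):
    """Split an assignment line into verified and predicted sites.

    Returns two lists of (position, residue) tuples. Positions are 1-based
    within `sequence` (an assumption made for this sketch).
    """
    if len(sequence) != len(assignment):
        raise ValueError("sequence and assignment must have the same length")
    verified, predicted = [], []
    for pos, (residue, mark) in enumerate(zip(sequence, assignment), start=1):
        if mark == ".":
            continue  # dot: no glycosylation at this residue
        if mark.upper() != residue.upper():
            raise ValueError(f"mark {mark!r} at position {pos} does not match residue {residue!r}")
        # Uppercase marks are experimentally verified, lowercase are predicted.
        (verified if mark.isupper() else predicted).append((pos, residue))
    return verified, predicted


verified, predicted = decode_assignment("STPSTPNASKLPGHSTNGT", "...ST.N.......stn..")
print(verified)   # [(4, 'S'), (5, 'T'), (7, 'N')]
print(predicted)  # [(15, 'S'), (16, 'T'), (17, 'N')]
```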

Format of O-Unique

This non-redundant database contains 53 entries, restricted to mammalian mucin-type glycoproteins, with a total of 265 O-glycosylation sites.
The first line contains the sequence length (excluding the signal peptide), the entry identifier and source database name, the numbers of experimentally verified and predicted glycosylation sites, e.g. ( 17,  0), and the glycoprotein name.
The second line gives the sequence in one-letter uppercase code. The line below it gives the assignment, in the same notation as in O-GLYCBASE.

Ex: 
   50 A29789  (pir) ( 17,  0)    mucin - sheep (fragment)
SSVPGESATPQQPGALSESTTQLPGVTGTSAVTGSEPGLPSTGVSGLPGT
SS....S.T.......S.STT.....T.TS..T.S.....ST..S....T
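A record of this shape can be read and cross-checked with a short script. The sketch below assumes the header layout seen in the single example above (length, identifier, database in parentheses, the two site counts, name) and assumes the sequence and assignment each fit on one line. The regular expression, field names and function name are illustrative, not part of the database distribution.

```python
import re

# Assumed header layout, inferred from the one example record:
#   <length> <identifier> (<database>) (<verified>, <predicted>) <name>
HEADER_RE = re.compile(
    r"^\s*(?P<length>\d+)\s+(?P<identifier>\S+)\s+\((?P<database>[^)]+)\)"
    r"\s+\(\s*(?P<verified>\d+)\s*,\s*(?P<predicted>\d+)\s*\)\s+(?P<name>.*\S)\s*$"
)


def parse_o_unique_record(header, sequence, assignment):
    """Parse one three-line record into a dict and check it for consistency."""
    match = HEADER_RE.match(header)
    if match is None:
        raise ValueError(f"unrecognised header line: {header!r}")
    sequence, assignment = sequence.strip(), assignment.strip()
    record = {
        "length": int(match["length"]),
        "identifier": match["identifier"],
        "database": match["database"],
        "verified": int(match["verified"]),
        "predicted": int(match["predicted"]),
        "name": match["name"],
        "sequence": sequence,
        "assignment": assignment,
    }
    # The stated length should match both the sequence and the assignment line.
    if not (len(sequence) == len(assignment) == record["length"]):
        raise ValueError("length in header does not match sequence/assignment")
    # Uppercase marks are verified sites, lowercase marks are predicted sites.
    counts = (sum(c.isupper() for c in assignment), sum(c.islower() for c in assignment))
    if counts != (record["verified"], record["predicted"]):
        raise ValueError("site counts in header do not match the assignment line")
    return record


record = parse_o_unique_record(
    "   50 A29789  (pir) ( 17,  0)    mucin - sheep (fragment)",
    "SSVPGESATPQQPGALSESTTQLPGVTGTSAVTGSEPGLPSTGVSGLPGT",
    "SS....S.T.......S.STT.....T.TS..T.S.....ST..S....T",
)
print(record["identifier"], record["verified"], record["predicted"])
```

For the sheep mucin fragment the checks pass: the sequence and assignment are both 50 characters long, and the assignment carries 17 uppercase marks and no lowercase ones.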

The leukosialins are cut into peptides marked (p1-4), as these are the only regions where the assignment can be performed; including the rest of the sequences would introduce false negative sites (see the comments in O-GLYCBASE). This data set can be used for benchmark studies. It is identical to the data set used to train the neural networks of the NetOGlyc prediction server (Hansen et al. 1995).


PAPER TO REFERENCE WHEN REPORTING RESULTS:

O-GLYCBASE version 4.0: a revised database of O-glycosylated proteins.
Ramneek Gupta, Hanne Birch, Kristoffer Rapacki, Søren Brunak and Jan E. Hansen
Nucleic Acids Research, 27: 370-372, 1999.


Last change: Sep 18, 2001, Ramneek Gupta