NetOGlyc - 4.0

O-GalNAc (mucin type) glycosylation sites in mammalian proteins

The NetOGlyc server produces neural network predictions of mucin-type GalNAc O-glycosylation sites in mammalian proteins.


Sequence submission: paste the sequence(s) and/or upload a local file

Paste a single sequence or several sequences in FASTA format into the field below:

Submit a file in FASTA format directly from your local disk:

Note: Please allow 2-3 minutes of processing time per input sequence.

Restrictions: At most 50 sequences and 200,000 amino acids per submission; each sequence not more than 4,000 amino acids.

Confidentiality: The sequences are kept confidential and will be deleted after processing.


For publication of results, please cite:

Precision mapping of the human O-GalNAc glycoproteome through SimpleCell technology.
Steentoft C, Vakhrushev SY, Joshi HJ, Kong Y, Vester-Christensen MB, Schjoldager KT, Lavrsen K, Dabelsteen S, Pedersen NB, Marcos-Silva L, Gupta R, Bennett EP, Mandel U, Brunak S, Wandall HH, Levery SB, Clausen H.
EMBO J, 32(10):1478-88, May 15, 2013.
(doi: 10.1038/emboj.2013.79. Epub 2013 Apr 12)

PMID: 23584533          

Usage instructions

1. Specify the input sequences

All the input sequences must be in one-letter amino acid code. The allowed alphabet (not case sensitive) is as follows:

A C D E F G H I K L M N P Q R S T V W Y and X (unknown)

Non-alphabetic symbols, e.g. digits and white space, will be ignored. All other symbols will be converted to X before processing. The sequences can be input in the following two ways:

  • Paste a single sequence (just the amino acids) or a number of sequences in FASTA format into the upper window of the main server page.

  • Select a FASTA file on your local disk, either by typing the file name into the lower window or by browsing the disk.

Both ways can be employed at the same time: all the specified sequences will be processed. However, there may be no more than 50 sequences in total in one submission. Sequences longer than 4,000 amino acids will be ignored.
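The input rules above can be checked locally before submitting. The following Python sketch is not the server's own code. It assumes that "all the other symbols" means letters outside the allowed alphabet, and the function names (normalize_sequence, parse_fasta, check_submission) and the placeholder name "unnamed" are hypothetical.

```python
ALLOWED = set("ACDEFGHIKLMNPQRSTVWYX")
MAX_SEQUENCES = 50   # limit per submission, as stated in the text
MAX_LENGTH = 4000    # longer sequences are ignored by the server


def normalize_sequence(raw):
    """Drop non-alphabetic symbols; map letters outside the alphabet to X."""
    cleaned = []
    for ch in raw:
        if not ch.isalpha():
            continue  # digits, white space and punctuation are ignored
        up = ch.upper()  # the alphabet is not case sensitive
        cleaned.append(up if up in ALLOWED else "X")
    return "".join(cleaned)


def parse_fasta(text):
    """Read FASTA text (or one bare sequence) into {name: cleaned sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            name = header[0] if header else f"seq{len(records) + 1}"
            records[name] = []
        else:
            if name is None:  # bare sequence pasted without a header line
                name = "unnamed"
                records[name] = []
            records[name].append(line)
    return {n: normalize_sequence("".join(p)) for n, p in records.items()}


def check_submission(records):
    """Reject oversized submissions; drop sequences the server would ignore."""
    if len(records) > MAX_SEQUENCES:
        raise ValueError(f"{len(records)} sequences; at most {MAX_SEQUENCES} allowed")
    return {n: s for n, s in records.items() if len(s) <= MAX_LENGTH}
```

For example, check_submission(parse_fasta(open("proteins.fasta").read())) would return only the sequences expected to be processed, where proteins.fasta is a placeholder file name.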

NetOGlyc works best on complete protein sequences with the signal peptide included. It is possible, however, to submit single sites; you should then include 15 residues on both sides of the Ser/Thr you want evaluated.
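When only a single site is of interest, the fragment to submit can be cut out as in the following Python sketch. The function name site_window is hypothetical, positions are assumed to be 1-based, and the window is simply truncated near the ends of the protein, which is an assumption rather than documented server behaviour.

```python
FLANK = 15  # residues to include on each side of the Ser/Thr, per the text


def site_window(sequence, position, flank=FLANK):
    """Return the residue at a 1-based position with up to `flank` residues per side."""
    if not 1 <= position <= len(sequence):
        raise ValueError("position is outside the sequence")
    if sequence[position - 1].upper() not in "ST":
        raise ValueError("residue at this position is not Ser or Thr")
    start = max(0, position - 1 - flank)       # truncate at the N-terminus
    end = min(len(sequence), position + flank)  # truncate at the C-terminus
    return sequence[start:end]
```

A full window is 31 residues long (15 + 1 + 15); sites closer than 15 residues to either terminus yield shorter fragments.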

2. Customize your run

The current version (4.0) does not need customization. Optional graphical illustration of the predictions and a signal peptide check will be added in the next version.

3. Submit the job

Click on the "Submit" button. The status of your job (either 'queued' or 'running') will be displayed and constantly updated until it terminates and the server output appears in the browser window.

At any time during the wait you may enter your e-mail address and simply leave the window. Your job will continue; you will be notified by e-mail when it has terminated. The e-mail message will contain the URL under which the results are stored; they will remain on the server for 24 hours for you to collect them.

Output format


The output conforms to the GFF version 2 format. For each input sequence the server prints a list of potential glycosylation sites, showing their positions in the sequence and the prediction confidence scores. Only the sites with scores higher than 0.5 are predicted as glycosylated and marked with the string "#POSITIVE" in the comment field.

The example below shows the output for human granulocyte-macrophage colony-stimulating factor, taken from the UniProt entry CSF2_HUMAN. Currently, four sites have been experimentally annotated for this protein, and NetOGlyc predicts that two of these are glycosylated. It also predicts one additional glycosylated site, at position 108.

Occupancy of O-glycosylation sites can vary in vivo depending on the cells that are expressing the protein. The interactions between sites of initial O-glycosylation and subsequent sites of glycosylation are yet to be fully elucidated, while our capability to precisely predict the substrate specificity of individual GalNAc-Ts remains limited. The combination of these factors means that although NetOGlyc will attempt to predict individual sites of glycosylation, a safe interpretation of a positive prediction is that the protein in that local region is more likely to carry O-GalNAc modifications.


##gff-version 2
##source-version NetOGlyc
##date 13-7-15
##Type Protein
#seqname	source	feature	start	end	score	strand	frame	comment
CSF2_HUMAN	netOGlyc-	CARBOHYD	5	5	0.04656	.	.	
CSF2_HUMAN	netOGlyc-	CARBOHYD	11	11	0.0297036	.	.	
CSF2_HUMAN	netOGlyc-	CARBOHYD	15	15	0.296424	.	.	
CSF2_HUMAN	netOGlyc-	CARBOHYD	17	17	0.182807	.	.	
CSF2_HUMAN	netOGlyc-	CARBOHYD	22	22	0.567964	.	.	#POSITIVE
CSF2_HUMAN	netOGlyc-	CARBOHYD	24	24	0.35171	.	.	
CSF2_HUMAN	netOGlyc-	CARBOHYD	26	26	0.558198	.	.	#POSITIVE
CSF2_HUMAN	netOGlyc-	CARBOHYD	27	27	0.182353	.	.	
CSF2_HUMAN	netOGlyc-	CARBOHYD	46	46	0.199041	.	.	
CSF2_HUMAN	netOGlyc-	CARBOHYD	49	49	0.28106	.	.	
CSF2_HUMAN	netOGlyc-	CARBOHYD	56	56	0.0436836	.	.	
CSF2_HUMAN	netOGlyc-	CARBOHYD	61	61	0.0710515	.	.	
CSF2_HUMAN	netOGlyc-	CARBOHYD	70	70	0.107585	.	.	
CSF2_HUMAN	netOGlyc-	CARBOHYD	74	74	0.0440414	.	.	
CSF2_HUMAN	netOGlyc-	CARBOHYD	86	86	0.163292	.	.	
CSF2_HUMAN	netOGlyc-	CARBOHYD	88	88	0.0735873	.	.	
CSF2_HUMAN	netOGlyc-	CARBOHYD	95	95	0.251049	.	.	
CSF2_HUMAN	netOGlyc-	CARBOHYD	99	99	0.335158	.	.	
CSF2_HUMAN	netOGlyc-	CARBOHYD	108	108	0.505881	.	.	#POSITIVE
CSF2_HUMAN	netOGlyc-	CARBOHYD	111	111	0.100125	.	.	
CSF2_HUMAN	netOGlyc-	CARBOHYD	112	112	0.488462	.	.	
CSF2_HUMAN	netOGlyc-	CARBOHYD	115	115	0.314924	.	.	
CSF2_HUMAN	netOGlyc-	CARBOHYD	119	119	0.0818746	.	.	
CSF2_HUMAN	netOGlyc-	CARBOHYD	122	122	0.0480526	.	.	
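Output like the example above can be post-processed with a few lines of Python. This is a minimal sketch, assuming the fields are tab-separated in the column order given by the "#seqname" header line; the names Site and parse_netoglyc_gff are hypothetical and not part of NetOGlyc.

```python
from dataclasses import dataclass


@dataclass
class Site:
    seqname: str
    position: int   # "start" column; start and end are equal for single sites
    score: float
    positive: bool  # True when the comment field carries "#POSITIVE"


def parse_netoglyc_gff(text):
    """Parse NetOGlyc GFF-style output into a list of Site records."""
    sites = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and "##"/"#" header lines
        fields = line.split("\t")
        if len(fields) < 6:
            continue  # not a data line in the assumed layout
        comment = fields[8] if len(fields) > 8 else ""
        sites.append(
            Site(fields[0], int(fields[3]), float(fields[5]), "#POSITIVE" in comment)
        )
    return sites
```

Applied to the CSF2_HUMAN example, [s.position for s in parse_netoglyc_gff(output) if s.positive] would give the three positions marked "#POSITIVE": 22, 26 and 108.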

Article abstract


Precision mapping of the human O-GalNAc glycoproteome through SimpleCell technology.
Steentoft C1, Vakhrushev SY1, Joshi HJ1,2, Kong Y1, Vester-Christensen MB1, Schjoldager KT1, Lavrsen K1, Dabelsteen S1, Pedersen NB1, Marcos-Silva L1,3, Gupta R2, Bennett EP1, Mandel U1, Brunak S2,4,5, Wandall HH1, Levery SB1, Clausen H1.
EMBO J, 32(10):1478-88, May 15, 2013.
(doi: 10.1038/emboj.2013.79. Epub 2013 Apr 12)

1Copenhagen Center for Glycomics, Departments of Cellular and Molecular Medicine and School of Dentistry, University of Copenhagen,
  Copenhagen N, Denmark
2Center for Biological Sequence Analysis, Department of Systems Biology Technical University of Denmark, Lyngby, Denmark
3IPATIMUP, Institute of Molecular Pathology and Immunology of the University of Porto, Porto, Portugal
4Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Hørsholm, Denmark
5Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen, Denmark

PMID: 23584533


Glycosylation is the most abundant and diverse posttranslational modification of proteins. While several types of glycosylation can be predicted by the protein sequence context, and substantial knowledge of these glycoproteomes is available, our knowledge of the GalNAc-type O-glycosylation is highly limited. This type of glycosylation is unique in being regulated by 20 polypeptide GalNAc-transferases attaching the initiating GalNAc monosaccharides to Ser and Thr (and likely some Tyr) residues. We have developed a genetic engineering approach using human cell lines to simplify O-glycosylation (SimpleCells) that enables proteome-wide discovery of O-glycan sites using 'bottom-up' ETD-based mass spectrometric analysis. We implemented this on 12 human cell lines from different organs, and present a first map of the human O-glycoproteome with almost 3000 glycosites in over 600 O-glycoproteins as well as an improved NetOGlyc4.0 model for prediction of O-glycosylation. The finding of unique subsets of O-glycoproteins in each cell line provides evidence that the O-glycoproteome is differentially regulated and dynamic. The greatly expanded view of the O-glycoproteome should facilitate the exploration of how site-specific O-glycosylation regulates protein function.
