DTU Health Tech

Department of Health Technology

NetCGlyc - 1.0

C-mannosylation sites in mammalian proteins

The NetCGlyc 1.0 server produces neural network predictions of C-mannosylation sites in mammalian proteins.

Submission


Sequence submission: paste the sequence(s) and/or upload a local file

Paste a single sequence or several sequences in FASTA format into the field below:

Submit a file in FASTA format directly from your local disk:

Output format: GFF (default) or short


Restrictions
At most 100 sequences per submission; each sequence must be at least 15 and at most 4000 amino acids long, and a submission may contain no more than 200,000 amino acids in total.
Confidentiality
The sequences are kept confidential and will be deleted after processing.


CITATIONS

For publication of results, please cite:

NetCGlyc 1.0: Prediction of mammalian C-mannosylation sites.
Karin Julenius
Glycobiology, 17:868-876, 2007.


Instructions


1. Specify the input sequences

All the input sequences must be in one-letter amino acid code. The allowed alphabet (not case sensitive) is as follows:

A C D E F G H I K L M N P Q R S T V W Y and X (unknown)

All the other symbols will be converted to X before processing. The sequences can be input in the following two ways:

  • Paste a single sequence (just the amino acids) or a number of sequences in FASTA format into the upper window of the main server page.

  • Select a FASTA file on your local disk, either by typing the file name into the lower window or by browsing the disk.

Both ways can be employed at the same time: all the specified sequences will be processed. However, there may be no more than 100 sequences in one submission. Sequences shorter than 15 or longer than 4000 amino acids will be ignored.
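
The input rules above can be checked locally before submitting. The following is a minimal Python sketch, not the server's own code: the function names (parse_fasta, normalize, check_submission), the placeholder name "unnamed" for a bare sequence, and the choice to raise an error when a limit is exceeded are all assumptions made for illustration. How the server treats whitespace inside a sequence is not documented here; this sketch only strips it at line ends.

```python
# Limits and alphabet as stated on this page.
ALLOWED = set("ACDEFGHIKLMNPQRSTVWYX")
MAX_SEQUENCES = 100
MIN_LEN, MAX_LEN = 15, 4000
MAX_TOTAL = 200_000


def parse_fasta(text):
    """Return (name, sequence) pairs from FASTA text or a bare sequence."""
    records = []
    name, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            header = line[1:].split()
            # "unnamed" is a hypothetical placeholder, not a server convention.
            name = header[0] if header else "unnamed"
            chunks = []
        else:
            if name is None:
                name = "unnamed"  # bare sequence pasted without a header
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def normalize(seq):
    """Upper-case the sequence and convert every other symbol to X."""
    return "".join(c if c in ALLOWED else "X" for c in seq.upper())


def check_submission(records):
    """Apply the stated limits; return the sequences that would be processed."""
    if len(records) > MAX_SEQUENCES:
        raise ValueError("more than 100 sequences in one submission")
    # Sequences outside 15..4000 residues are ignored, as described above.
    kept = [(n, normalize(s)) for n, s in records if MIN_LEN <= len(s) <= MAX_LEN]
    if sum(len(s) for _, s in kept) > MAX_TOTAL:
        raise ValueError("more than 200,000 amino acids in total")
    return kept
```

Calling check_submission(parse_fasta(text)) on the pasted text returns the normalized sequences that fall within the length limits, so anything the server would ignore can be spotted beforehand.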

2. Customize your run

By default the output is in GFF; click on 'short' for a simpler output (see the Output format section for descriptions and examples).

3. Submit the job

Click on the "Submit" button. The status of your job (either 'queued' or 'running') will be displayed and constantly updated until it terminates and the server output appears in the browser window.

At any time during the wait you may enter your e-mail address and simply leave the window. Your job will continue; you will be notified by e-mail when it has terminated. The e-mail message will contain the URL under which the results are stored; they will remain on the server for 24 hours for you to collect them.

Output format


DESCRIPTION

The output conforms to the GFF format. For each tryptophan residue in the input sequences the server prints a line in the form:
  • sequence name
  • prediction source ("netCglyc" and version)
  • predicted feature (C-mannosylation, printed as "C-manno")
  • residue number (printed twice, as the GFF start and end positions)
  • prediction score (a number between 0.0 and 1.0)
  • not applicable (GFF-specific, printed as ".")
  • answer: "W" if the score is greater than 0.5, otherwise "." (dot).
Only the residues with scores higher than 0.5, marked with "W", are predicted as C-mannosylated.

The example below shows the NetCGlyc 1.0 output for ADAMTS-like protein 5 precursor (ADAMTSL-5), taken from the UniProt entry ATL5_HUMAN. One of the 12 tryptophan residues in the sequence (#41) is predicted as C-mannosylated; two more (#38 and #44) have scores close to the threshold.


EXAMPLE OUTPUT


##gff-version 2
##source-version netCglyc-1.0b
##date 2007-03-14
##Type Protein
# seqname                source         feature  start    end   score +/-  ?
# ----------------------------------------------------------------------------
Q6ZMM2_ATL5_HUMAN        netCglyc-1.0b  C-manno     21     21   0.269  .   .
Q6ZMM2_ATL5_HUMAN        netCglyc-1.0b  C-manno     38     38   0.459  .   .
Q6ZMM2_ATL5_HUMAN        netCglyc-1.0b  C-manno     41     41   0.639  .   W
Q6ZMM2_ATL5_HUMAN        netCglyc-1.0b  C-manno     44     44   0.484  .   .
Q6ZMM2_ATL5_HUMAN        netCglyc-1.0b  C-manno     72     72   0.221  .   .
Q6ZMM2_ATL5_HUMAN        netCglyc-1.0b  C-manno    115    115   0.285  .   .
Q6ZMM2_ATL5_HUMAN        netCglyc-1.0b  C-manno    207    207   0.228  .   .
Q6ZMM2_ATL5_HUMAN        netCglyc-1.0b  C-manno    244    244   0.246  .   .
Q6ZMM2_ATL5_HUMAN        netCglyc-1.0b  C-manno    299    299   0.160  .   .
Q6ZMM2_ATL5_HUMAN        netCglyc-1.0b  C-manno    317    317   0.203  .   .
Q6ZMM2_ATL5_HUMAN        netCglyc-1.0b  C-manno    410    410   0.243  .   .
Q6ZMM2_ATL5_HUMAN        netCglyc-1.0b  C-manno    454    454   0.227  .   .
# ----------------------------------------------------------------------------
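
Output in this layout is straightforward to post-process. The sketch below is one possible Python parser, written under the assumption that the eight columns are separated by whitespace as in the example above and that no field contains a space; the Site type and the function name parse_netcglyc_gff are hypothetical names chosen for illustration.

```python
from typing import NamedTuple


class Site(NamedTuple):
    seqname: str
    source: str
    feature: str
    position: int    # residue number (start and end are identical)
    score: float
    predicted: bool  # True when the answer column is "W"


def parse_netcglyc_gff(text):
    """Parse NetCGlyc GFF output into a list of Site records."""
    sites = []
    for line in text.splitlines():
        # Skip blank lines, "##" directives and "#" comment/ruler lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) != 8:
            continue  # not a data line in the layout assumed here
        seqname, source, feature, start, _end, score, _na, answer = fields
        sites.append(
            Site(seqname, source, feature, int(start), float(score), answer == "W")
        )
    return sites


EXAMPLE = """\
##gff-version 2
# seqname                source         feature  start    end   score +/-  ?
Q6ZMM2_ATL5_HUMAN        netCglyc-1.0b  C-manno     38     38   0.459  .   .
Q6ZMM2_ATL5_HUMAN        netCglyc-1.0b  C-manno     41     41   0.639  .   W
Q6ZMM2_ATL5_HUMAN        netCglyc-1.0b  C-manno     44     44   0.484  .   .
"""

sites = parse_netcglyc_gff(EXAMPLE)
predicted = [s.position for s in sites if s.predicted]
print(predicted)
```

Keeping the score alongside the answer flag makes it easy to list near-threshold residues such as #38 and #44 in addition to the predicted site.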

References



NetCGlyc 1.0: Prediction of mammalian C-mannosylation sites.
Karin Julenius
Glycobiology, 17:868-876, 2007.

Department of Medical Biochemistry and Biophysics, Karolinska Institutet, SE-171 77 Stockholm, Sweden

Stockholm Bioinformatics Center, Albanova, Stockholm University, SE-106 91 Stockholm, Sweden


ABSTRACT

C-mannosylation is the attachment of an alpha-mannopyranose to a tryptophan via a C-C link. The sequence WXXW, in which the first Trp becomes mannosylated has been suggested as a consensus rule for the modification, but only 2/3 of known sites follow this rule. We have gathered a data set of 69 experimentally verified C-mannosylation sites from literature. We analyzed these for sequence context and found that apart from Trp in position +3, Cys is accepted in the same position. We also find a clear preference in position +1, where a small and/or polar residue (Ser, Ala, Gly, Thr) is preferred and a Phe or Leu discriminated against. The Protein Data Bank was searched for structural information and five structures of C-mannosylated proteins were obtained. We showed that modified tryptophan residues are at least partly solvent-exposed. A method predicting the location of C-mannosylation sites in proteins was developed using a neural network approach. The best overall network used as input sequence information in a 21-residue window plus information on presence/absence of WXXW motif. NetCGlyc 1.0 correctly predicts 93% of both positive and negative C-mannosylation sites. This is a significant improvement over the WXXW consensus motif itself, which only identifies 67% of positive sites. NetCGlyc 1.0 is available at http://www.cbs.dtu.dk/services/NetCGlyc/. Using NetCGlyc 1.0, we scanned the human genome and found 2573 exported or transmembrane transcripts with at least one predicted C-mannosylation site.
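
For comparison with the server's predictions, the WXXW consensus rule mentioned in the abstract can be applied with a short scan. The Python sketch below illustrates only that consensus baseline, optionally extended with Cys at position +3 as the abstract describes; it is not the NetCGlyc neural network, and the function name motif_sites and its allow_cys parameter are hypothetical.

```python
import re


def motif_sites(seq, allow_cys=False):
    """Return 1-based positions of Trp residues followed by Trp at +3.

    With allow_cys=True, a Cys at position +3 is accepted as well.
    """
    # The lookahead does not consume the following residues, so
    # overlapping motifs (e.g. WXXWXXW) are all reported.
    pattern = r"W(?=..[WC])" if allow_cys else r"W(?=..W)"
    return [m.start() + 1 for m in re.finditer(pattern, seq.upper())]


# Toy sequence for illustration only, not taken from a real protein.
toy = "AWSPWSEWAAC"
print(motif_sites(toy))
print(motif_sites(toy, allow_cys=True))
```

Positions reported by this scan can be set against the "W" answers in the server output to see where the network and the plain consensus rule disagree.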

GETTING HELP

If you need help regarding technical issues (e.g. errors or missing results) contact Technical Support. Please include the name of the service and version (e.g. NetPhos-4.0) and the options you have selected. If the error occurs after the job has started running, please include the JOB ID (the long code that you see while the job is running).

If you have scientific questions (e.g. how the method works or how to interpret results), contact Correspondence.
