NetAcet - 1.0

N-terminal acetylation in eukaryotic proteins

The NetAcet 1.0 server predicts substrates of N-acetyltransferase A (NatA). The method was trained on yeast data but, as mentioned in the article describing the method, it obtains similar performance values on mammalian substrates acetylated by NatA orthologs.

Submission

Restrictions
At most 2000 sequences and 200,000 amino acids per submission; each sequence not less than 40 and not more than 4,000 amino acids.
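
The restrictions above can be checked locally before submitting. The following is a minimal sketch, not part of the server: the function name `check_submission` and the dictionary input are assumptions made for illustration, and the constants simply mirror the figures in the paragraph above (adjust them if the server enforces different values).

```python
from typing import Dict, List

# Limits copied from the "Restrictions" paragraph; not read from the server.
MAX_SEQUENCES = 2000
MAX_TOTAL_RESIDUES = 200_000
MIN_LENGTH = 40
MAX_LENGTH = 4000


def check_submission(sequences: Dict[str, str]) -> List[str]:
    """Return a list of problems found; an empty list means the set fits the limits."""
    problems: List[str] = []

    if len(sequences) > MAX_SEQUENCES:
        problems.append(
            f"too many sequences: {len(sequences)} > {MAX_SEQUENCES}"
        )

    total = sum(len(seq) for seq in sequences.values())
    if total > MAX_TOTAL_RESIDUES:
        problems.append(
            f"too many amino acids in total: {total} > {MAX_TOTAL_RESIDUES}"
        )

    for name, seq in sequences.items():
        if not MIN_LENGTH <= len(seq) <= MAX_LENGTH:
            problems.append(
                f"{name}: length {len(seq)} outside {MIN_LENGTH}-{MAX_LENGTH}"
            )

    return problems
```

Calling `check_submission({"P1": "M" * 39})` reports one problem, because the sequence is one residue shorter than the stated minimum.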

Confidentiality:
The sequences are kept confidential and will be deleted after processing.

CITATIONS

For publication of results, please cite:

NetAcet: Prediction of N-terminal acetylation sites.
Lars Kiemer, Jannick Dyrløv Bendtsen and Nikolaj Blom.
Accepted in Bioinformatics, 2004.

Instructions

1a. Specify the input sequences

All the input sequences must be in one-letter amino acid code. The allowed alphabet (not case sensitive) is as follows:

A C D E F G H I K L M N P Q R S T V W Y

Please note that sequences containing other symbols, e.g. X (unknown), will be discarded before processing. The sequences can be input in the following two ways:

1b. Specify the input sequences

All the input sequences must be in one-letter amino acid code. The allowed alphabet (not case sensitive) is as follows:

A C D E F G H I K L M N P Q R S T V W Y and X (unknown)

All other symbols will be converted to X before processing. The sequences can be input in the following two ways:

Paste a single sequence (just the amino acids) or a number of sequences in FASTA format into the upper window of the main server page.
Select a FASTA file on your local disk, either by typing the file name into the lower window or by browsing the disk.

Both ways can be employed at the same time: all the specified sequences will be processed. However, a submission may contain no more than 10 sequences in total. Sequences shorter than 15 or longer than 4000 amino acids will be ignored.
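
The input conventions described above can be reproduced locally to preview what the server will see. This is a hedged sketch, not the server's own code: `parse_fasta` and `normalize` are hypothetical helper names, the default name `Sequence` for a bare pasted sequence is an assumption, and `normalize` follows the variant of the rules in which unrecognised symbols are converted to X rather than the sequence being discarded.

```python
from typing import Dict, Optional

# The 20 standard one-letter amino acid codes listed in the instructions.
STANDARD = set("ACDEFGHIKLMNPQRSTVWY")


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {name: sequence}.

    A bare sequence without a '>' header is stored under the
    placeholder name 'Sequence' (an assumption of this sketch).
    """
    records: Dict[str, str] = {}
    name: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            # Use the first word of the header as the name.
            name = fields[0] if fields else f"Sequence_{len(records) + 1}"
            records[name] = ""
        else:
            if name is None:
                name = "Sequence"
                records[name] = ""
            # Drop any whitespace inside the sequence line.
            records[name] += "".join(line.split())
    return records


def normalize(seq: str) -> str:
    """Upper-case the sequence and map every non-standard symbol to X."""
    return "".join(c if c in STANDARD else "X" for c in seq.upper())
```

For example, `normalize("acdB")` yields `ACDX`: the input is case-insensitive and `B` is not in the allowed alphabet.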

2. Submit the job

Click on the "Submit" button. The status of your job (either 'queued' or 'running') will be displayed and constantly updated until it terminates and the server output appears in the browser window.

At any time during the wait you may enter your e-mail address and simply leave the window. Your job will continue; you will be notified by e-mail when it has terminated. The e-mail message will contain the URL under which the results are stored; they will remain on the server for 24 hours for you to collect them.

Output format

DESCRIPTION

In the following example, the serine at position 1 is correctly predicted to be acetylated.

EXAMPLE OUTPUT

>RS0A_YEAST - netAcet 1.0 prediction
SLPATFDLTPEDAQLLLAANTHLGARNVQVHQEPYVFNARPDGVHVINVGKTWEKLVLAARIIAAIPNPEDVVAISSRTF
GQRAVLKFAAHTGATPIAGRFTPGSFTNYITRSFKEPRLVIVTDPRSDAQAIKEASYVNIPVIALTDLDSPSEFVDVAIP
CNNRGKHSIGLIWYLLAREVLRLRGALVDRTQPWSIMPDLYFYRDPEEVEQQVAEEATTEEAGEEEAKEEVTEEQAEATE
WAEENADNVEW
#Seq-Position-Residue Score Acetylation predicted
#------------------------------------------------------
RS0A_YEAST-1-S 0.513 yes
//
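
Results in this format can be read programmatically. The sketch below assumes the output is delivered as line-oriented text in which each prediction row has the form `name-position-residue score answer`, as in the example; the `Prediction` type and `parse_netacet` function are hypothetical names, and any answer other than `yes` is treated as a negative prediction, which is an assumption since the example shows only a positive row.

```python
import re
from typing import List, NamedTuple


class Prediction(NamedTuple):
    name: str
    position: int
    residue: str
    score: float
    acetylated: bool


# name-position-residue, then a score, then the yes/no answer.
ROW = re.compile(r"^(\S+)-(\d+)-([A-Za-z])\s+(\d*\.?\d+)\s+(\S+)\s*$")


def parse_netacet(text: str) -> List[Prediction]:
    """Extract prediction rows; header, comment and sequence lines are skipped."""
    rows: List[Prediction] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#>" or line == "//":
            continue
        match = ROW.match(line)
        if match is None:
            continue  # e.g. a line of the echoed sequence
        name, pos, residue, score, answer = match.groups()
        rows.append(
            Prediction(name, int(pos), residue, float(score),
                       answer.lower() == "yes")
        )
    return rows
```

Applied to the example, this returns a single `Prediction` for `RS0A_YEAST`, position 1, residue `S`, with a score of 0.513 and a positive call.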

References

NetAcet: Prediction of N-terminal acetylation sites.
Lars Kiemer, Jannick Dyrløv Bendtsen and Nikolaj Blom.
Bioinformatics, 21(7):1269-70, 2005.

Center for Biological Sequence Analysis, BioCentrum-DTU, The Technical University of Denmark, DK-2800 Lyngby, Denmark

ABSTRACT

We present here a neural network based method for prediction of amino-terminal acetylation - by far the most abundant post-translational modification in eukaryotes. The method was developed on a yeast data set for N-acetyltransferase A (NatA) acetylation, which is the type of N-acetylation for which most examples are known and for which orthologs have been found in several eukaryotes. We obtain correlation coefficients close to 0.7 on yeast data and a sensitivity up to 74% on mammalian data, suggesting that the method is valid for eukaryotic NatA orthologs.
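
For readers unfamiliar with the two performance measures quoted in the abstract, the sketch below shows how they are commonly computed from a confusion matrix. The abstract does not say which correlation coefficient was used; the Matthews correlation coefficient is assumed here because it is the usual choice for binary predictors. The function names are illustrative and no counts from the paper are reproduced.

```python
import math


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true positives that are recovered: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def matthews_cc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient, ranging from -1 to 1.

    Returns 0.0 when any marginal total is zero and the coefficient
    is undefined.
    """
    denominator = math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator
```

A perfect predictor gives a coefficient of 1, a random one gives a value near 0, and a sensitivity of 74% means that 74 of every 100 known acetylated substrates are predicted as such.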

Scientific Background

For a brief description of the NetAcet method please consult the article abstract.

Data sets and statistics

A very important task in machine learning methods is to obtain a clean and accurate dataset for training and testing. Bias and noise in the data set often lead to wrong predictions.

Description of data sets
Dataset extraction
Homology reduction
Sequence logos
Download the training sets

Data set

This section describes the extraction and homology reduction of the data sets used for training NetAcet 1.0.

Extraction

The data used for NetAcet were extracted from Table 2 in Polevoda et al. and from the Yeast Protein Map. All inconsistencies between the two data sets were removed, resulting in a positive set of 61 sequences and a negative set of 76 sequences.

Homology reduction

Sequences were truncated to their N-terminal 40 residues and subsequently homology reduced by visual inspection of a neighbour-joining tree generated from a ClustalW multiple alignment. Four sequences were removed from the positive data set due to close homology to other sequences. After this reduction the two closest homologs were 52% identical, although the average homology is much lower.
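
The truncation step and the notion of pairwise identity can be illustrated with a short sketch. This is not the procedure described above: the original reduction relied on a ClustalW alignment, a neighbour-joining tree and visual inspection, whereas the code below uses a simple ungapped position-by-position comparison as a stand-in. The function names and the `threshold` parameter are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple

N_TERMINAL = 40  # truncation length stated in the text


def truncate(seq: str, n: int = N_TERMINAL) -> str:
    """Keep only the N-terminal n residues."""
    return seq[:n]


def percent_identity(a: str, b: str) -> float:
    """Ungapped identity over the shorter of the two sequences.

    A simplification: no alignment is performed.
    """
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / n


def close_pairs(
    seqs: Dict[str, str], threshold: float
) -> List[Tuple[str, str, float]]:
    """List pairs whose truncated sequences reach the identity threshold."""
    truncated = {name: truncate(seq) for name, seq in seqs.items()}
    hits = []
    for (n1, s1), (n2, s2) in combinations(truncated.items(), 2):
        identity = percent_identity(s1, s2)
        if identity >= threshold:
            hits.append((n1, n2, identity))
    # Most similar pairs first.
    return sorted(hits, key=lambda hit: -hit[2])
```

Pairs reported by `close_pairs` would be candidates for manual review; which member of a pair to remove remains a judgment call, as it was in the original procedure.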

[Figure: unrooted phylogenetic tree of the positive data set before homology reduction.]

[Figure: unrooted phylogenetic tree of the negative data set before homology reduction.]

Sequence logos

To visualise the sequence information content for N-terminal acetylation, we have generated sequence logos for the yeast training set. The total height of the stack of letters at each position shows the amount of sequence conservation at the position, while the relative height of each letter shows the relative abundance of the corresponding amino acid.

Blue:	Positively charged residues
Red:	Negatively charged residues
Green:	Neutral polar residues
Black:	Hydrophobic residues
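
The letter heights in a sequence logo follow from the Shannon information content of each alignment column. The sketch below shows the standard calculation for a 20-letter amino acid alphabet; it assumes equal-length, ungapped N-terminal sequences and omits the small-sample correction that logo programs often apply, so it is an illustration rather than the exact procedure used for the NetAcet logos. The function name `logo_heights` is hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def logo_heights(aligned: List[str]) -> List[Dict[str, float]]:
    """Return, for each column, a map from letter to its height in bits.

    Stack height = log2(20) - entropy of the column;
    letter height = letter frequency * stack height.
    """
    if not aligned:
        return []
    length = min(len(seq) for seq in aligned)
    max_bits = math.log2(20)  # maximum information for 20 amino acids
    columns: List[Dict[str, float]] = []
    for i in range(length):
        counts = Counter(seq[i] for seq in aligned)
        total = sum(counts.values())
        freqs = {aa: count / total for aa, count in counts.items()}
        entropy = -sum(p * math.log2(p) for p in freqs.values())
        information = max_bits - entropy
        columns.append({aa: p * information for aa, p in freqs.items()})
    return columns
```

A fully conserved position reaches the maximum stack height of log2(20), about 4.32 bits, while a position split evenly between two residues has a stack of about 3.32 bits shared equally by the two letters.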

Download the dataset

The datasets used for the training of NetAcet can be downloaded here:

Positive training set
Negative training set

GETTING HELP

If you need help regarding technical issues (e.g. errors or missing results) contact Technical Support. Please include the name of the service and version (e.g. NetPhos-4.0) and the options you have selected. If the error occurs after the job has started running, please include the JOB ID (the long code that you see while the job is running).

If you have scientific questions (e.g. how the method works or how to interpret results), contact Correspondence.
