DTU Health Tech

Department of Health Technology


DeepLoc - 1.0

Prediction of eukaryotic protein subcellular localization using deep learning

DeepLoc-1.0 predicts the subcellular localization of eukaryotic proteins. It distinguishes between 10 localizations: Nucleus, Cytoplasm, Extracellular, Mitochondrion, Cell membrane, Endoplasmic reticulum, Chloroplast, Golgi apparatus, Lysosome/Vacuole and Peroxisome.


NOTE: This is not the newest version of DeepLoc. To use the current version, please go to DeepLoc 2.0!

Submission



Paste or upload protein sequence(s) in FASTA format to predict their subcellular localization. Prediction can take a few minutes per sequence.

Each protein sequence must be between 10 and 6000 amino acids long.
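The length limit can be checked locally before uploading. The following is a minimal sketch, assuming a plain multi-line FASTA file; the function names `read_fasta` and `check_lengths` are hypothetical helpers, not part of DeepLoc.

```python
from typing import Iterator

MIN_LEN = 10    # shortest sequence the server accepts
MAX_LEN = 6000  # longest sequence the server accepts


def read_fasta(text: str) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text.

    Blank lines and any lines before the first '>' header are ignored.
    """
    header = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def check_lengths(text: str) -> list[str]:
    """Return one message per sequence whose length is outside [MIN_LEN, MAX_LEN]."""
    problems = []
    for header, seq in read_fasta(text):
        if not MIN_LEN <= len(seq) <= MAX_LEN:
            problems.append(f"{header}: {len(seq)} residues")
    return problems


example = ">short\nMSNPCQ\n>ok\nMSNPCQKEACAIQDCLLSHQ\n"
print(check_lengths(example))
```

Here only the 6-residue entry is reported; the 20-residue entry falls within the accepted range.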




Two prediction modes are available:

Profiles (more accurate, at most 50 sequences per submission)
BLOSUM62 (faster, at most 500 sequences per submission)
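If a file contains more sequences than the chosen mode accepts, it can be split into several submissions. A minimal sketch: the limits are taken from the two modes listed above, while `MODE_LIMITS` and `batches` are hypothetical names introduced for illustration.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")

# Maximum number of sequences per submission for each prediction mode.
MODE_LIMITS = {"Profiles": 50, "BLOSUM62": 500}


def batches(records: Sequence[T], mode: str) -> Iterator[Sequence[T]]:
    """Split records into consecutive chunks no larger than the mode's limit."""
    limit = MODE_LIMITS[mode]
    for start in range(0, len(records), limit):
        yield records[start:start + limit]


records = [f"seq{i}" for i in range(120)]
sizes = [len(chunk) for chunk in batches(records, "Profiles")]
print(sizes)  # [50, 50, 20]
```

With BLOSUM62 the same 120 records would fit in a single submission.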

Instructions/Help


The DeepLoc-1.0 server predicts the subcellular localization of eukaryotic proteins using a neural network trained on UniProt proteins with experimental evidence of subcellular localization. Predictions are based on sequence information alone.

The DeepLoc-1.0 server requires protein sequence(s) in FASTA format and cannot handle nucleic acid sequences.
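Because nucleic acid sequences are not supported, it can help to flag inputs that look like DNA or RNA before uploading. This is a rough heuristic sketch, not the server's own check: a sequence is treated as nucleotide if almost all of its characters are A, C, G, T, U or N. The 95% threshold and the function name `looks_like_nucleotide` are arbitrary assumptions.

```python
NUCLEOTIDE_CHARS = set("ACGTUN")


def looks_like_nucleotide(seq: str, threshold: float = 0.95) -> bool:
    """Heuristic: True if at least `threshold` of the characters are A/C/G/T/U/N."""
    seq = seq.upper()
    if not seq:
        return False
    hits = sum(ch in NUCLEOTIDE_CHARS for ch in seq)
    return hits / len(seq) >= threshold


print(looks_like_nucleotide("ATGGCGTACGTTAGC"))       # True
print(looks_like_nucleotide("MSNPCQKEACAIQDCLLSHQ"))  # False
```

Since A, C, G, T and N are also amino acid codes, a short protein made almost entirely of those residues could be misflagged, so treat a positive result as a prompt to inspect the input rather than a verdict.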

Paste protein sequence(s) in FASTA format or upload a FASTA file.

When the job finishes successfully, a summary page is displayed. If an error occurs during prediction, a log describing the error is shown instead. Use the navigation bar to move between the output pages.

Training and testing data sets


The dataset used to train and test the DeepLoc-1.0 server is available for download: deeploc_dataset

It is a FASTA file in which each entry consists of a header and a sequence. The header contains the UniProt accession number, the annotated subcellular localization and, optionally, a description field ("test") marking proteins that belong to the test set. The localization carries an additional suffix after a hyphen: S for soluble, M for membrane and U for unknown.

>Q3E7A9 Mitochondrion-S
MSNPCQKEACAIQDCLLSHQYDDAKCAKVIDQLYICCSKFYNDNGKDSRSPCCPLPSLLELKMKQRKLTPGDS
>Q9SMX3 Mitochondrion-M test
MVKGPGLYTEIGKKARDLLYRDYQGDQKFSVTTYSSTGVAITTTGTNKGSLFLGDVATQVKNNNFTADVKVST
DSSLLTTLTFDEPAPGLKVIVQAKLPDHKSGKAEVQYFHDYAGISTSVGFTATPIVNFSGVVGTNGLSLGTDV
AYNTESGNFKHFNAGFNFTKDDLTASLILNDKGEKLNASYYQIVSPSTVVGAEISHNFTTKENAITVGTQHAL
DPLTTVKARVNNAGVANALIQHEWRPKSFFTVSGEVDSKAIDKSAKVGIALALKP
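The header layout described above can be parsed programmatically. A minimal sketch, assuming fields are whitespace-separated, the localization name itself contains no spaces, and the optional test field is the literal word "test" as in the example; `parse_header` and `Entry` are hypothetical names.

```python
from dataclasses import dataclass

# Meaning of the suffix after the final hyphen in the localization field.
MEMBRANE_LABELS = {"S": "soluble", "M": "membrane", "U": "unknown"}


@dataclass
class Entry:
    accession: str
    location: str
    membrane: str
    is_test: bool


def parse_header(header: str) -> Entry:
    """Parse a header such as '>Q9SMX3 Mitochondrion-M test'."""
    fields = header.lstrip(">").split()
    accession, loc_field = fields[0], fields[1]
    # Split on the last hyphen so a location name containing hyphens stays intact.
    location, label = loc_field.rsplit("-", 1)
    is_test = len(fields) > 2 and fields[2] == "test"
    return Entry(accession, location, MEMBRANE_LABELS[label], is_test)


print(parse_header(">Q3E7A9 Mitochondrion-S"))
print(parse_header(">Q9SMX3 Mitochondrion-M test"))
```

An unexpected suffix raises a `KeyError`, which surfaces headers that do not follow the documented layout.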

References


Please cite:

Jose Juan Almagro Armenteros, Casper Kaae Sønderby, Søren Kaae Sønderby, Henrik Nielsen, Ole Winther; DeepLoc: prediction of protein subcellular localization using deep learning, Bioinformatics, btx431

Abstract

Motivation: The prediction of eukaryotic protein subcellular localization is a well-studied topic in bioinformatics due to its relevance in proteomics research. Many machine learning methods have been successfully applied to this task, but in most of them, predictions rely on annotation of homologues from knowledge databases. For novel proteins where no annotated homologues exist, and for predicting the effects of sequence variants, it is desirable to have methods for predicting protein properties from sequence information only.

Results: Here, we present a prediction algorithm using deep neural networks to predict protein subcellular localization relying only on sequence information. At its core, the prediction model uses a recurrent neural network that processes the entire protein sequence and an attention mechanism identifying protein regions important for the subcellular localization. The model was trained and tested on a protein dataset extracted from one of the latest UniProt releases, in which experimentally annotated proteins follow more stringent criteria than previously. We demonstrate that our model achieves good accuracy (78% for 10 categories; 92% for membrane-bound or soluble), outperforming current state-of-the-art algorithms, including those relying on homology information.
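The abstract describes a recurrent network whose per-position outputs are combined by an attention mechanism. As a rough illustration of attention pooling in general, and not of DeepLoc's actual architecture, weights or dimensions, the sketch below scores each position against a hypothetical query vector, normalizes the scores with a softmax, and returns the weighted average of the hidden states. The weights show which positions dominate the summary, which is how attention can point to regions relevant for localization.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden: list[list[float]],
                   query: list[float]) -> tuple[list[float], list[float]]:
    """Collapse per-position hidden states into one vector.

    hidden: one vector per sequence position (e.g. RNN outputs).
    query:  a vector a real model would learn (hypothetical here).
    Returns (pooled vector, attention weight per position).
    """
    # Dot-product score for each position.
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden]
    weights = softmax(scores)
    dim = len(hidden[0])
    # Weighted average of the hidden states, one output value per dimension.
    pooled = [sum(w * h[d] for w, h in zip(weights, hidden)) for d in range(dim)]
    return pooled, weights


# Three positions with 2-dimensional states; the third matches the query best.
hidden = [[0.0, 0.0], [0.0, 0.0], [4.0, 1.0]]
pooled, weights = attention_pool(hidden, query=[1.0, 0.0])
print([round(w, 3) for w in weights])
```

In a trained model the query and the hidden states are learned jointly, so the weights reflect what the network found useful rather than a fixed rule.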

Software Downloads


  • Version 2.0
  • Version 1.0


GETTING HELP

If you need help regarding technical issues (e.g. errors or missing results) contact Technical Support. Please include the name of the service and version (e.g. NetPhos-4.0). If the error occurs after the job has started running, please include the JOB ID (the long code that you see while the job is running).

If you have scientific questions (e.g. how the method works or how to interpret results), contact Correspondence.
