The DeepLoc-1.0 server predicts the subcellular localization of eukaryotic proteins using a neural network algorithm trained on UniProt proteins with experimental evidence of subcellular localization. It uses only the sequence information to make the prediction.
The DeepLoc-1.0 server requires protein sequence(s) in FASTA format and cannot handle nucleic acid sequences.
Paste protein sequence(s) in FASTA format or upload a FASTA file.
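Because the server accepts only protein sequences, it can help to check a file locally before submitting it. The following is a minimal sketch, not the server's own validation: the function names `read_fasta` and `looks_like_nucleic_acid`, the nucleotide alphabet `ACGTUN`, and the 0.9 threshold are all assumptions chosen for illustration.

```python
from typing import List, Tuple

# Letters typical of DNA/RNA input (assumed alphabet for this heuristic).
NUCLEOTIDE_LETTERS = set("ACGTUN")


def read_fasta(text: str) -> List[Tuple[str, str]]:
    """Split FASTA text into (header, sequence) pairs."""
    records: List[Tuple[str, str]] = []
    header, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def looks_like_nucleic_acid(seq: str, threshold: float = 0.9) -> bool:
    """Heuristic: flag sequences made up almost entirely of nucleotide letters."""
    seq = seq.upper()
    if not seq:
        return False
    hits = sum(ch in NUCLEOTIDE_LETTERS for ch in seq)
    return hits / len(seq) >= threshold


example = ">p1\nMKTAYIAK\nQRQISFVK\n>n1\nACGTACGTAC\n"
records = read_fasta(example)
suspicious = [name for name, seq in records if looks_like_nucleic_acid(seq)]
```

Here `suspicious` collects the entries that are probably nucleic acid sequences. The heuristic can misclassify very short or unusual protein sequences, so treat a flagged entry as a prompt to inspect the input rather than as a definitive result.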
After the server successfully finishes the job, a summary page shows up.
If an error occurs during the prediction, a log will appear specifying the error.
Use the navigation bar to flip through the various output pages.
Training and testing data sets
The dataset used to train and test the DeepLoc-1.0 server is available here: deeploc_dataset
It is a FASTA file in which each entry consists of a header and a sequence. The header contains the UniProt accession number, the annotated subcellular localization and, optionally, a description field indicating that the protein was part of the test set. The subcellular localization carries an additional label, where S indicates soluble, M membrane and U unknown.
>Q9SMX3 Mitochondrion-M test
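A header of this form can be split into its parts with a few lines of Python. This is a minimal sketch under stated assumptions: the names `DeepLocHeader` and `parse_header` are hypothetical, the fields are assumed to be separated by whitespace, the localization is assumed to contain no spaces, and the membrane label is assumed to follow the last hyphen.

```python
from typing import NamedTuple

# Meaning of the one-letter label appended to the localization.
MEMBRANE_LABELS = {"S": "soluble", "M": "membrane", "U": "unknown"}


class DeepLocHeader(NamedTuple):
    accession: str       # UniProt accession number
    localization: str    # annotated subcellular localization
    membrane_label: str  # "S", "M" or "U"
    is_test: bool        # True if the entry is marked as part of the test set


def parse_header(line: str) -> DeepLocHeader:
    """Parse a dataset header such as '>Q9SMX3 Mitochondrion-M test'."""
    fields = line.strip().lstrip(">").split()
    if len(fields) < 2:
        raise ValueError(f"expected accession and localization in: {line!r}")
    accession, annotation = fields[0], fields[1]
    # Split on the last hyphen so a hyphen inside the localization is preserved.
    localization, sep, label = annotation.rpartition("-")
    if not sep or label not in MEMBRANE_LABELS:
        raise ValueError(f"missing or unknown membrane label in: {line!r}")
    is_test = "test" in fields[2:]
    return DeepLocHeader(accession, localization, label, is_test)


header = parse_header(">Q9SMX3 Mitochondrion-M test")
```

For the example header above, `header.accession` is `"Q9SMX3"`, `header.localization` is `"Mitochondrion"`, `MEMBRANE_LABELS[header.membrane_label]` is `"membrane"`, and `header.is_test` is `True`.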
Jose Juan Almagro Armenteros, Casper Kaae Sønderby, Søren Kaae Sønderby, Henrik Nielsen, Ole Winther; DeepLoc: prediction of protein subcellular localization using deep learning, Bioinformatics, btx431
Motivation: The prediction of eukaryotic protein subcellular localization is a well-studied topic in bioinformatics due to its relevance in proteomics research. Many machine learning methods have been successfully applied in this task, but in most of them, predictions rely on annotation of homologues from knowledge databases. For novel proteins where no annotated homologues exist, and for predicting the effects of sequence variants, it is desirable to have methods for predicting protein properties from sequence information only.
Results: Here, we present a prediction algorithm using deep neural networks to predict protein subcellular localization relying only on sequence information. At its core, the prediction model uses a recurrent neural network that processes the entire protein sequence and an attention mechanism identifying protein regions important for the subcellular localization. The model was trained and tested on a protein dataset extracted from one of the latest UniProt releases, in which experimentally annotated proteins follow more stringent criteria than previously. We demonstrate that our model achieves a good accuracy (78% for 10 categories; 92% for membrane-bound or soluble), outperforming current state-of-the-art algorithms, including those relying on homology information.