Submission
CITATIONS
For publication of results, please cite:
Prediction of Disease Causing Non-Synonymous SNPs by the Artificial Neural
Network Predictor NetDiseaseSNP.
Johansen MB, Izarzugaza JM, Brunak S, Petersen TN, Gupta R.
PLoS One. 2013 Jul 25;8(7):e68370. doi: 10.1371/journal.pone.0068370.
PMID: 23935863
Instructions
1. Specify the input sequences
All the input sequences must be in one-letter amino acid
code. The allowed alphabet (not case sensitive) is as follows:
A C D E F G H I K L M N P Q R S T V W Y B Z X U
The sequences can be input in the following two ways:
- Paste a single sequence (just the amino acids) or a number of sequences in FASTA format into the upper sequence window of the main server page.
- Select a FASTA file on your local disk, either by typing the file name into the lower sequence window or by browsing the disk.
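Before submitting, it can be useful to check locally that every sequence uses only the allowed alphabet. The following is a minimal Python sketch, not part of the server itself. The function names read_fasta and invalid_residues are hypothetical, and the sketch assumes that the sequence identifier is the first whitespace-delimited token of each FASTA header line.

```python
# Allowed one-letter codes for input sequences, as listed above.
SEQUENCE_ALPHABET = set("ACDEFGHIKLMNPQRSTVWYBZXU")


def read_fasta(text):
    """Parse FASTA text into {identifier: sequence}, uppercasing residues."""
    records = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Assumption: the identifier is the first token of the header.
            header = line[1:].split()
            current = header[0] if header else ""
            records[current] = []
        elif current is not None:
            records[current].append(line.upper())
    return {name: "".join(parts) for name, parts in records.items()}


def invalid_residues(sequence):
    """Return the sorted characters that fall outside the allowed alphabet."""
    return sorted(set(sequence.upper()) - SEQUENCE_ALPHABET)
```

An empty list from invalid_residues means the sequence contains only allowed letters.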
2. Specify the input variant data
All the input variant data must be in one-letter amino acid
code. The allowed alphabet for the native and variant amino acids (not case sensitive) is as follows:
A C D E F G H I K L M N P Q R S T V W Y U
The format of the variant data has to be in the form:
'Accession' 'Native amino acid' 'Position' 'Variant amino acid'
where the accession has to correspond to the sequence identifier in the FASTA sequence.
Example:
Accession S 2 T
Accession H 6 K
.
.
The variant data can be input in the following two ways:
- Paste variant data into the upper variant data window of the main server page.
- Select a variant file on your local disk, either by typing the file name into the lower variant data window or by browsing the disk.
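The variant format described above can be checked locally before submission. The following Python sketch makes several assumptions that the instructions do not confirm: fields are separated by whitespace, positions are 1-based, and the native amino acid is expected to match the residue found at that position in the sequence. The function names parse_variant_line and check_variant are hypothetical.

```python
# Allowed one-letter codes for native and variant amino acids, as listed above.
VARIANT_ALPHABET = set("ACDEFGHIKLMNPQRSTVWYU")


def parse_variant_line(line):
    """Split one variant line into (accession, native, position, variant)."""
    fields = line.split()  # assumption: whitespace-separated fields
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
    accession, native, position, variant = fields
    native, variant = native.upper(), variant.upper()
    for residue in (native, variant):
        if residue not in VARIANT_ALPHABET:
            raise ValueError(f"residue {residue!r} is not in the allowed alphabet")
    return accession, native, int(position), variant


def check_variant(sequences, accession, native, position, variant):
    """Return an error message, or None if the variant fits its sequence."""
    if accession not in sequences:
        return f"no sequence with identifier {accession!r}"
    sequence = sequences[accession].upper()
    # Assumption: positions are 1-based.
    if not 1 <= position <= len(sequence):
        return f"position {position} is outside the sequence"
    found = sequence[position - 1]
    # Assumption: a sanity check added here, not a documented server rule.
    if found != native:
        return f"expected {native} at position {position}, found {found}"
    return None
```

Here sequences is a dictionary mapping sequence identifiers to sequences, for example one built from the FASTA input.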
3. Lowercase letters
All lowercase letters for amino acids in sequence and variant data will be changed to uppercase letters.
4. Submit the job
Click on the "Submit" button. The status of your job (either 'queued'
or 'running') will be displayed and constantly updated until it terminates and
the server output appears in the browser window.
At any time during the wait you may enter your e-mail address and simply leave
the window. Your job will continue; you will be notified by e-mail when it has
terminated. The e-mail message will contain the URL under which the results are
stored; they will remain on the server for 24 hours for you to collect them.
Article Abstract
REFERENCE
Prediction of Disease Causing Non-Synonymous SNPs by the Artificial Neural
Network Predictor NetDiseaseSNP.
Johansen MB, Izarzugaza JM, Brunak S, Petersen TN, Gupta R. (1)
PLoS One. 2013 Jul 25;8(7):e68370. doi: 10.1371/journal.pone.0068370.
(1) To whom correspondence should be addressed: ramneek@cbs.dtu.dk
Center for Biological Sequence Analysis, CBS, Department of Systems Biology,
Technical University of Denmark, DK-2800 Lyngby, Denmark.
PMID: 23935863
ABSTRACT
We have developed a sequence conservation-based artificial neural network
predictor called NetDiseaseSNP which classifies nsSNPs as disease-causing or
neutral. Our method uses the excellent alignment generation algorithm of SIFT
to identify related sequences and a combination of 31 features assessing
sequence conservation and the predicted surface accessibility to produce a
single score which can be used to rank nsSNPs based on their potential to cause
disease. NetDiseaseSNP classifies successfully disease-causing and neutral
mutations. In addition, we show that NetDiseaseSNP discriminates cancer driver
and passenger mutations satisfactorily. Our method outperforms other
state-of-the-art methods on several disease/neutral datasets as well as on
cancer driver/passenger mutation datasets and can thus be used to pinpoint and
prioritize plausible disease candidates among nsSNPs for further investigation.
NetDiseaseSNP is publicly available as an online tool as well as a web service:
http://services.healthtech.dtu.dk/service.php?NetDiseaseSNP-1.0.
Training set
The training set for NetDiseaseSNP is made publicly available below, with the exception of information from HGMD Professional, which we are not allowed to share for licensing reasons. We appreciate that this limits others' ability to obtain the full data set; however, the additional curated information in HGMD Professional warrants its use in this work, and academic pricing for this product is known to be reasonable should other computational groups want to obtain the data. The data that we do make available originates from UniProt.
The variant data file contains lines of tab-separated fields in the format: 'Accession of sequence', 'native amino acid', 'position', 'variant amino acid', 'target value: Neutral=0/Disease=1', 'Origin of data':
Variant data
The FASTA sequences corresponding to the variant data can be found via the link below:
Sequence data
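The six-field variant data format described above could be read along the following lines. This is a Python sketch under stated assumptions: the file has no header row, the target value is written literally as 0 or 1, and the names TrainingVariant and read_training_variants are hypothetical. The two sample lines are made up for illustration and are not taken from the training set.

```python
import csv
import io
from typing import NamedTuple


class TrainingVariant(NamedTuple):
    accession: str
    native: str
    position: int
    variant: str
    is_disease: bool  # target value: Neutral=0, Disease=1
    origin: str


def read_training_variants(handle):
    """Yield TrainingVariant records from a tab-separated file object."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        if len(row) != 6:
            raise ValueError(f"expected 6 fields, got {len(row)}: {row!r}")
        accession, native, position, variant, target, origin = row
        if target not in ("0", "1"):
            raise ValueError(f"unexpected target value {target!r}")
        yield TrainingVariant(
            accession, native, int(position), variant, target == "1", origin
        )


# Hypothetical sample lines, for illustration only.
sample = "P00000\tA\t10\tV\t1\tUniProt\nP00000\tG\t25\tS\t0\tUniProt\n"
records = list(read_training_variants(io.StringIO(sample)))
disease_count = sum(record.is_disease for record in records)
```

For a downloaded file, pass an open file object instead of the io.StringIO wrapper, for example open(path, newline="").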