
NetDiseaseSNP - 1.0

The NetDiseaseSNP server predicts whether a single non-synonymous SNP (nsSNP) is disease-causing or neutral.

Submission

Paste one or several sequences in FASTA format into the field below or load example

Submit a file in FASTA format directly from your local disk:

Paste variant data into the field below or load example

Submit a file (example) directly from your local disk:

sort variants by disease-causing potential

Advanced options

include predictions on variants where NetDiseaseSNP and SIFT disagree. Overall performance of NetDiseaseSNP: MCC=0.67, sensitivity=0.82, specificity=0.85
generate predictions on all variants, including variants encoded with BLOSUM62 matrix data. Overall performance of NetDiseaseSNP: MCC=0.64, sensitivity=0.80, specificity=0.83
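The three performance figures quoted above follow the usual definitions for binary classifiers. The sketch below shows how they are computed from a confusion matrix; the function name and the counts in the example are illustrative assumptions and are not taken from the NetDiseaseSNP evaluation.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Return (mcc, sensitivity, specificity) from confusion-matrix counts.

    tp/fn count disease variants predicted DISEASE/NEUTRAL,
    tn/fp count neutral variants predicted NEUTRAL/DISEASE.
    """
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return mcc, sensitivity, specificity


# Hypothetical balanced test set: 100 disease and 100 neutral variants.
mcc, sensitivity, specificity = classification_metrics(tp=82, tn=85, fp=15, fn=18)
print(round(mcc, 2), sensitivity, specificity)
```

For such a balanced hypothetical set, a sensitivity of 0.82 and a specificity of 0.85 correspond to an MCC of about 0.67. The class balance of the real benchmark is not stated on this page.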

Restrictions:
At most 2,000 sequences and 200,000 amino acids per submission; each sequence not more than 4,000 amino acids.
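
These limits can be checked locally before a submission. The following is a minimal sketch, not part of the server: the function names are hypothetical, and it assumes that the sequence identifier is the first whitespace-delimited token after '>'.

```python
MAX_SEQUENCES = 2000
MAX_TOTAL_RESIDUES = 200_000
MAX_SEQUENCE_LENGTH = 4000


def parse_fasta(text):
    """Return a dict mapping each FASTA identifier to its sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Assumption: identifier = first token of the header line.
            tokens = line[1:].split()
            name = tokens[0] if tokens else ""
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


def check_limits(records):
    """Return a list of human-readable problems; empty if all limits hold."""
    problems = []
    if len(records) > MAX_SEQUENCES:
        problems.append(f"{len(records)} sequences (limit {MAX_SEQUENCES})")
    total = sum(len(sequence) for sequence in records.values())
    if total > MAX_TOTAL_RESIDUES:
        problems.append(f"{total} amino acids in total (limit {MAX_TOTAL_RESIDUES})")
    for name, sequence in records.items():
        if len(sequence) > MAX_SEQUENCE_LENGTH:
            problems.append(
                f"{name}: {len(sequence)} amino acids (limit {MAX_SEQUENCE_LENGTH})"
            )
    return problems


example = ">seq1 hypothetical protein\nMSTAHHK\n"
print(check_limits(parse_fasta(example)))  # an empty list means no limit was exceeded
```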

Confidentiality:
The sequences are kept confidential and will be deleted after processing.

CITATIONS

For publication of results, please cite:

Prediction of Disease Causing Non-Synonymous SNPs by the Artificial Neural Network Predictor NetDiseaseSNP.
Johansen MB, Izarzugaza JM, Brunak S, Petersen TN, Gupta R.
PLoS One. 2013 Jul 25;8(7):e68370. doi: 10.1371/journal.pone.0068370.

PMID: 23935863

Instructions

1. Specify the input sequences

All the input sequences must be in one-letter amino acid code. The allowed alphabet (not case sensitive) is as follows:

A C D E F G H I K L M N P Q R S T V W Y B Z X U

The sequences can be input in the following two ways:

Paste a single sequence (just the amino acids) or a number of sequences in FASTA format into the upper sequence window of the main server page.
Select a FASTA file on your local disk, either by typing the file name into the lower sequence window or by browsing the disk.
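
As a rough pre-submission check, a sequence can be tested against the alphabet listed above. This is only a sketch: the helper name is hypothetical, and ignoring whitespace inside the sequence is an assumption about how pasted input is treated.

```python
# Allowed one-letter codes for input sequences, as listed above.
SEQUENCE_ALPHABET = set("ACDEFGHIKLMNPQRSTVWYBZXU")


def invalid_residues(sequence):
    """Return (position, character) pairs for characters outside the alphabet.

    Positions are 1-based. Whitespace is dropped first (assumption) and
    the check is not case sensitive.
    """
    cleaned = "".join(sequence.split()).upper()
    return [
        (position, character)
        for position, character in enumerate(cleaned, start=1)
        if character not in SEQUENCE_ALPHABET
    ]


print(invalid_residues("msTAhJK*"))  # J and * are not in the allowed alphabet
```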

2. Specify the input variant data

All the input variant data must be in one-letter amino acid code. The allowed alphabet for the native and variant amino acids (not case sensitive) is as follows:

A C D E F G H I K L M N P Q R S T V W Y U

The format of the variant data has to be in the form:

'Accession' 'Native amino acid' 'Position' 'Variant amino acid'
where the accession has to correspond to the sequence identifier in the FASTA sequence.

Example:
Accession S 2 T
Accession H 6 K
.
.

The variant data can be input in the following two ways:

Paste variant data into the upper variant data window of the main server page.
Select a variant file on your local disk, either by typing the file name into the lower variant data window or by browsing the disk.
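
The variant format described above can be parsed and sanity-checked against the sequences before submission. The sketch below is illustrative only: the function names are hypothetical, positions are assumed to be 1-based, and the comparison of the native amino acid with the sequence is a local check, not a documented server behaviour.

```python
# Allowed one-letter codes for native and variant amino acids, as listed above.
VARIANT_ALPHABET = set("ACDEFGHIKLMNPQRSTVWYU")


def parse_variant_line(line):
    """Parse 'Accession Native Position Variant' into a tuple."""
    fields = line.split()
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
    accession, native, position, variant = fields
    native, variant = native.upper(), variant.upper()
    if native not in VARIANT_ALPHABET or variant not in VARIANT_ALPHABET:
        raise ValueError(f"amino acid outside the allowed alphabet: {line!r}")
    return accession, native, int(position), variant


def check_variant(variant, sequences):
    """Return None if the variant fits its sequence, else a short message."""
    accession, native, position, _ = variant
    sequence = sequences.get(accession)
    if sequence is None:
        return "unknown accession"
    if not 1 <= position <= len(sequence):
        return "position out of range"
    if sequence[position - 1].upper() != native:
        return "native amino acid does not match the sequence"
    return None


# Hypothetical sequence that is consistent with the two example lines above.
sequences = {"Accession": "MSTAHHK"}
for line in ["Accession S 2 T", "Accession H 6 K"]:
    print(line, "->", check_variant(parse_variant_line(line), sequences))
```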

3. Lowercase letters

All lowercase letters for amino acids in sequence and variant data will be changed to uppercase letters.

4. Submit the job

Click on the "Submit" button. The status of your job (either 'queued' or 'running') will be displayed and constantly updated until it terminates and the server output appears in the browser window.

At any time during the wait you may enter your e-mail address and simply leave the window. Your job will continue; you will be notified by e-mail when it has terminated. The e-mail message will contain the URL under which the results are stored; they will remain on the server for 24 hours for you to collect them.

Output format

Column 1: Accession specified in the FASTA sequence file and variant file
Column 2: Amino acid number for the variant position in the sequence
Column 3: Variant: 'native amino acid'->'variant amino acid'
Column 4: NetDiseaseSNP score: 0 to 1 (score>=0.5:DISEASE; score<0.5:NEUTRAL)
Column 5: NetDiseaseSNP predicted category for the variant: DISEASE/NEUTRAL
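
The five columns can be read back into a small record for downstream filtering or ranking. This is a sketch under assumptions: the columns are taken to be whitespace-separated, the names are hypothetical, and the sample lines are made up for illustration rather than real server output.

```python
from typing import NamedTuple


class Prediction(NamedTuple):
    accession: str
    position: int
    native: str
    variant: str
    score: float
    category: str


def parse_output_line(line):
    """Split one output line into its five columns (whitespace assumed)."""
    accession, position, change, score, category = line.split()
    native, variant = change.split("->")
    return Prediction(accession, int(position), native, variant, float(score), category)


def category_from_score(score):
    """Apply the documented threshold: score >= 0.5 is DISEASE, else NEUTRAL."""
    return "DISEASE" if score >= 0.5 else "NEUTRAL"


# Made-up lines in the documented column order.
lines = ["Accession 2 S->T 0.31 NEUTRAL", "Accession 6 H->K 0.74 DISEASE"]
predictions = [parse_output_line(line) for line in lines]
# Rank variants by disease-causing potential, highest score first.
for prediction in sorted(predictions, key=lambda p: p.score, reverse=True):
    print(prediction.accession, prediction.position, prediction.score, prediction.category)
```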

Article Abstract

REFERENCE

Prediction of Disease Causing Non-Synonymous SNPs by the Artificial Neural Network Predictor NetDiseaseSNP.
Johansen MB, Izarzugaza JM, Brunak S, Petersen TN, Gupta R.¹
PLoS One. 2013 Jul 25;8(7):e68370. doi: 10.1371/journal.pone.0068370.

¹to whom correspondence should be addressed: ramneek@cbs.dtu.dk

Center for Biological Sequence Analysis, CBS, Department of Systems Biology, Technical University of Denmark, DK-2800 Lyngby, Denmark.

PMID: 23935863

ABSTRACT

We have developed a sequence conservation-based artificial neural network predictor called NetDiseaseSNP which classifies nsSNPs as disease-causing or neutral. Our method uses the excellent alignment generation algorithm of SIFT to identify related sequences and a combination of 31 features assessing sequence conservation and the predicted surface accessibility to produce a single score which can be used to rank nsSNPs based on their potential to cause disease. NetDiseaseSNP classifies successfully disease-causing and neutral mutations. In addition, we show that NetDiseaseSNP discriminates cancer driver and passenger mutations satisfactorily. Our method outperforms other state-of-the-art methods on several disease/neutral datasets as well as on cancer driver/passenger mutation datasets and can thus be used to pinpoint and prioritize plausible disease candidates among nsSNPs for further investigation. NetDiseaseSNP is publicly available as an online tool as well as a web service: http://services.healthtech.dtu.dk/service.php?NetDiseaseSNP-1.0.

Training set

The training set for NetDiseaseSNP is made publicly available below, with the exception of information from HGMD Professional, which we are not allowed to share for licensing reasons. We appreciate that this limits others' ability to obtain the full data set; however, the additional curated information in HGMD Professional warrants its use in this work, and academic pricing for this product is known to be reasonable if other computational groups want to obtain the data. The data that we do make available originate from UniProt.

The variant data file contains lines of tab separated fields of the format: 'Accession of sequence', 'native amino acid', 'position', 'variant amino acid', 'target value: Neutral=0/Disease=1', 'Origin of data':

Variant data

The FASTA sequences corresponding to the variant data can be found via the link below:

Sequence data
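
The tab-separated variant data file described above can be loaded with a few lines of code. The sketch below assumes the six fields appear in the documented order with no header line; the field names, the function name and the sample rows are placeholders, not real training data.

```python
import csv
import io

# Field names chosen for this sketch; the file itself has no documented header.
FIELDS = ["accession", "native", "position", "variant", "target", "origin"]


def read_training_variants(handle):
    """Read tab-separated variant records from an open text file."""
    records = []
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for fields in reader:
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(FIELDS):
            raise ValueError(f"expected {len(FIELDS)} fields, got {len(fields)}")
        record = dict(zip(FIELDS, fields))
        record["position"] = int(record["position"])
        record["target"] = int(record["target"])  # Neutral=0, Disease=1
        records.append(record)
    return records


# Placeholder rows in the documented field order.
sample = io.StringIO("ACC1\tS\t2\tT\t0\tUniProt\nACC2\tH\t6\tK\t1\tUniProt\n")
records = read_training_variants(sample)
disease = [record for record in records if record["target"] == 1]
print(len(records), "variants,", len(disease), "labelled as disease")
```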

GETTING HELP

If you need help regarding technical issues (e.g. errors or missing results) contact Technical Support. Please include the name of the service and version (e.g. NetPhos-4.0). If the error occurs after the job has started running, please include the JOB ID (the long code that you see while the job is running).

If you have scientific questions (e.g. how the method works or how to interpret results), contact Correspondence.

Correspondence: Technical Support: