DTU Health Tech

Department of Health Technology

NetPhos - 3.1

Generic phosphorylation sites in eukaryotic proteins


The NetPhos 3.1 server predicts serine, threonine or tyrosine phosphorylation sites in eukaryotic proteins using ensembles of neural networks. It performs both generic and kinase-specific predictions. The generic predictions are identical to those of NetPhos 2.0, and the kinase-specific predictions are identical to those of NetPhosK 1.0. Kinase-specific predictions are made for the following 17 kinases:

ATM, CKI, CKII, CaM-II, DNAPK, EGFR, GSK3, INSR, PKA, PKB, PKC, PKG, RSK, SRC, cdc2, cdk5 and p38MAPK.



Submission


Sequence submission: paste the sequence(s) and/or upload a local file



Restrictions:
At most 2,000 sequences and 200,000 amino acids per submission; each sequence must be at least 15 and at most 4,000 amino acids long.
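
The limits above can be checked locally before submitting. The following is a minimal sketch, not part of the server: the function name `check_submission` and the dictionary-of-sequences input are assumptions made for illustration, and only the numeric limits come from the restrictions stated above.

```python
# Limits taken from the "Restrictions" paragraph; everything else is illustrative.
MAX_SEQUENCES = 2000
MAX_TOTAL_RESIDUES = 200_000
MIN_LENGTH = 15
MAX_LENGTH = 4000


def check_submission(sequences):
    """Return a list of problems for a {name: sequence} dict; empty means OK."""
    problems = []
    if len(sequences) > MAX_SEQUENCES:
        problems.append(f"too many sequences: {len(sequences)} > {MAX_SEQUENCES}")
    total = sum(len(seq) for seq in sequences.values())
    if total > MAX_TOTAL_RESIDUES:
        problems.append(f"too many residues: {total} > {MAX_TOTAL_RESIDUES}")
    for name, seq in sequences.items():
        if not MIN_LENGTH <= len(seq) <= MAX_LENGTH:
            problems.append(
                f"{name}: length {len(seq)} outside {MIN_LENGTH}-{MAX_LENGTH}"
            )
    return problems


if __name__ == "__main__":
    print(check_submission({"too_short": "MEEPQSDPSV"}))
```

An empty list means the submission is within the stated limits; each string in a non-empty list names one violated limit.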

Confidentiality:
The sequences are kept confidential and will be deleted after processing.


CITATIONS

For publication of results, please cite:

Generic predictions:

Sequence- and structure-based prediction of eukaryotic protein phosphorylation sites.
Blom, N., Gammeltoft, S., and Brunak, S.
Journal of Molecular Biology: 294(5): 1351-1362, 1999.

PMID: 10600390

Kinase-specific predictions:

Prediction of post-translational glycosylation and phosphorylation of proteins from the amino acid sequence.
Blom N, Sicheritz-Ponten T, Gupta R, Gammeltoft S, Brunak S.
Proteomics: 4(6): 1633-1649, 2004 (review).

PMID: 15174133

Instructions



1. Specify the input sequences

All the input sequences must be in one-letter amino acid code. The allowed alphabet (not case sensitive) is as follows:

A C D E F G H I K L M N P Q R S T V W Y and X (unknown)

Any character outside this alphabet (letters such as 'B' or 'Z', digits and other non-alphabetic symbols) is converted to X before processing; such characters are accepted but not used when making the predictions. White space within the sequences, if any, is ignored. The sequences can be input in the following two ways:

  • Paste a single sequence or several sequences in FASTA format into the input field.
    Note: if you paste more than one sequence, FASTA format must be used.

  • Submit a file in FASTA format directly from your local disk. The file may contain multiple sequences.

Both methods may be used in the same submission. Peptides shorter than 9 residues can produce unreliable results. Always include at least 4 residues on both sides of the Ser/Thr/Tyr you want evaluated. Ideally, use the complete "native" protein sequence.
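
The input rules in this step (case-insensitive alphabet, white space ignored, everything else converted to X) can be mirrored locally to preview what the server will see. This is a sketch of those stated rules only; the function name `normalize_sequence` is an assumption, and the server's actual implementation is not shown on this page.

```python
# The 20 standard residues plus X (unknown), as listed in step 1.
ALLOWED = set("ACDEFGHIKLMNPQRSTVWY" + "X")


def normalize_sequence(raw):
    """Apply the documented input rules to one raw sequence string."""
    residues = []
    for ch in raw:
        if ch.isspace():
            continue  # white space within a sequence is ignored
        ch = ch.upper()  # the alphabet is not case sensitive
        residues.append(ch if ch in ALLOWED else "X")  # B, Z, digits, symbols -> X
    return "".join(residues)


if __name__ == "__main__":
    print(normalize_sequence("meep qsB1*"))  # MEEPQSXXX
```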


2. Customize your run

By default the server predicts phosphorylation sites for all the serine, threonine and tyrosine residues in the input sequences, displays all the predictions made for each such residue and generates graphical output illustrating the results. These settings can be changed, as follows:

  • "Residues to predict" - the scope of prediction can be limited to just one of serine/threonine/tyrosine. The default is to predict for all three amino acids.

  • "For each residue display only the best prediction" - for each residue the server can be made to display only the prediction with the highest score. The default is to display all the predictions for all the residues.

  • "Display only the scores higher than ..." - only the predictions with scores higher than a given threshold will be displayed. By default the threshold is "0" and all the predictions are displayed. NOTE: a threshold of "0.5" means that only the positive predictions will be shown (see the Output format).

  • "Output format" - by default the output format is the native NetPhos format (see the Output format). It can be changed to GFF.

  • "Generate graphics" - by default the server produces graphical output illustrating the predictions. This can be disabled.

Before submitting larger runs, users are advised to test the settings above on a few sequences to find the best combination for a given run.
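
The effect of the "best prediction" and threshold options can be illustrated on a small list of predictions. The sketch below is a local imitation, not the server's code: the `Prediction` record, the `select` function and the example scores are all invented for illustration. The strict comparison follows the wording "higher than"; how the server treats a score exactly equal to the threshold is not stated here.

```python
from typing import NamedTuple


class Prediction(NamedTuple):
    position: int  # 1-based residue position
    kinase: str    # kinase name, or "unsp" for the generic prediction
    score: float   # value between 0.000 and 1.000


def select(predictions, threshold=0.0, best_only=False):
    """Keep scores strictly above the threshold; optionally one per residue."""
    kept = [p for p in predictions if p.score > threshold]
    if best_only:
        best = {}
        for p in kept:
            if p.position not in best or p.score > best[p.position].score:
                best[p.position] = p
        kept = sorted(best.values(), key=lambda p: p.position)
    return kept


if __name__ == "__main__":
    # Hypothetical scores, not real NetPhos output.
    predictions = [
        Prediction(10, "PKA", 0.81),
        Prediction(10, "PKC", 0.44),
        Prediction(25, "unsp", 0.30),
    ]
    print(select(predictions, threshold=0.5))  # positive predictions only
    print(select(predictions, best_only=True))  # one line per residue
```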


3. Submit the job

When ready press the button labelled 'Submit'. The status of your job (either 'queued' or 'running') will be displayed and constantly updated until it terminates and the server output appears in the browser window.

At any time during the wait you may enter your e-mail address and simply leave the window. Your job will continue; you will be notified by e-mail when it has terminated. The e-mail message will contain the URL under which the results are stored; they will remain on the server for 24 hours for you to collect them.

Output format



Classical format

For each input sequence the following is shown (see the example below):
  • FASTA-like header line: a line showing the sequence name and length.

  • Prediction lines: one line per residue and kinase, with seven columns in the form:

    1. Sequence - the sequence name;
    2. # - the position of the residue in the sequence;
    3. x - the residue in one-letter code;
    4. Context - the sequence context of the residue, shown as a 9-residue subsequence centered on the residue;
    5. Score - the prediction score (a value in the range [0.000-1.000]; the scores above 0.500 indicate positive predictions);
    6. Kinase - the active kinase or the string "unsp" for non-specific prediction (as in NetPhos 2.0);
    7. Answer - the string "YES" for positive predictions, else a dot.

  • Sequence - the input sequence as processed by NetPhos, with an overview of the positions of the predicted sites.

  • Graphics - a plot of scores illustrating the predictions. NOTE: for each residue only the highest score is shown.
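
For downstream processing, the prediction lines of the classical format can be read with a few lines of code. The sketch below assumes that every prediction line starts with "#", that the seven columns are separated by white space, and that sequence names contain no spaces; these assumptions are drawn from the example output on this page rather than from a formal specification, and the function name `parse_classical` is hypothetical.

```python
SAMPLE = """\
>P53_HUMAN	393 amino acids
#
# netphos-3.1b prediction results
#
# Sequence                 # x   Context     Score   Kinase    Answer
# -------------------------------------------------------------------
# P53_HUMAN               18 T   LSQETFSDL   0.582   CKI        YES
# P53_HUMAN              102 T   PSQKTYQGS   0.472   cdc2        .
#
"""


def parse_classical(text):
    """Extract prediction lines from classical-format output as dicts."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # A prediction line is "#" followed by exactly seven columns.
        if len(fields) != 8 or fields[0] != "#":
            continue
        _, name, pos, residue, context, score, kinase, answer = fields
        if not pos.isdigit():
            continue  # the column-header line also has eight fields
        rows.append({
            "sequence": name,
            "position": int(pos),
            "residue": residue,
            "context": context,
            "score": float(score),
            "kinase": kinase,
            "positive": answer == "YES",
        })
    return rows


if __name__ == "__main__":
    for row in parse_classical(SAMPLE):
        print(row)
```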


GFF

The output in GFF (GFF version 2) provides essentially the same information as the classical format described above. The only differences, apart from the syntax, are as follows:
  • the sequence context of the residues is not provided
  • the positive predictions are indicated by "Y" (not "YES")
This option has been provided for the benefit of the users who have access to software parsing GFF.
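
For users without a GFF library at hand, the feature lines can also be read directly. The sketch below is an assumption-laden illustration: it splits on any white space (as in the example on this page), skips every line beginning with "#", strips the "phos-" prefix from the feature name, and accepts both "Y" and "YES" as the positive marker. The function name `parse_gff` is hypothetical.

```python
SAMPLE_GFF = """\
##gff-version 2
##source-version netphos-3.1b
# seqname            source        feature      start   end   score  N/A   ?
P53_HUMAN            netphos-3.1b  phos-CKI        18    18   0.582  . .  YES
P53_HUMAN            netphos-3.1b  phos-CaM-II    253   253   0.467  . .   .
"""


def parse_gff(text):
    """Extract NetPhos feature lines from GFF output as dicts."""
    rows = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # blank lines, "##" directives and "#" comments
        fields = line.split()
        if len(fields) != 9:
            continue  # not a feature line in the layout shown on this page
        name, source, feature, start, end, score, _, _, answer = fields
        rows.append({
            "sequence": name,
            "source": source,
            # "phos-CaM-II" -> "CaM-II": split only on the first hyphen
            "kinase": feature.split("-", 1)[1] if "-" in feature else feature,
            "position": int(start),
            "score": float(score),
            "positive": answer in ("Y", "YES"),
        })
    return rows


if __name__ == "__main__":
    for row in parse_gff(SAMPLE_GFF):
        print(row)
```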


Example

The NetPhos 3.1 output for the UniProtKB entry P53_HUMAN (P04637). The predictions are for threonine only, showing only the highest scoring prediction for each residue and skipping all the predictions with scores lower than 0.250.


Classical format

>P53_HUMAN	393 amino acids
#
# netphos-3.1b prediction results
#
# Sequence		   # x   Context     Score   Kinase    Answer
# -------------------------------------------------------------------
# P53_HUMAN               18 T   LSQETFSDL   0.582   CKI        YES
# P53_HUMAN               55 T   EQWFTEDPG   0.598   CKII       YES
# P53_HUMAN               81 T   PAAPTPAAP   0.704   unsp       YES
# P53_HUMAN              102 T   PSQKTYQGS   0.472   cdc2        . 
# P53_HUMAN              118 T   LHSGTAKSV   0.654   PKC        YES
# P53_HUMAN              123 T   AKSVTCTYS   0.455   GSK3        . 
# P53_HUMAN              125 T   SVTCTYSPA   0.684   PKC        YES
# P53_HUMAN              140 T   QLAKTCPVQ   0.479   cdc2        . 
# P53_HUMAN              150 T   WVDSTPPPG   0.599   unsp       YES
# P53_HUMAN              155 T   PPPGTRVRA   0.829   unsp       YES
# P53_HUMAN              170 T   SQHMTEVVR   0.482   unsp        . 
# P53_HUMAN              211 T   DDRNTFRHS   0.951   unsp       YES
# P53_HUMAN              230 T   GSDCTTIHY   0.438   GSK3        . 
# P53_HUMAN              231 T   SDCTTIHYN   0.454   GSK3        . 
# P53_HUMAN              253 T   RPILTIITL   0.467   CaM-II      . 
# P53_HUMAN              256 T   LTIITLEDS   0.492   cdc2        . 
# P53_HUMAN              284 T   RDRRTEEEN   0.865   unsp       YES
# P53_HUMAN              304 T   PPGSTKRAL   0.939   unsp       YES
# P53_HUMAN              312 T   LPNNTSSSP   0.459   CaM-II      . 
# P53_HUMAN              329 T   GEYFTLQIR   0.443   GSK3        . 
# P53_HUMAN              377 T   KGQSTSRHK   0.932   unsp       YES
# P53_HUMAN              387 T   LMFKTEGPD   0.440   GSK3        . 
#
    MEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSPLPSQAMDDLMLSPDDI   #     50
    EQWFTEDPGPDEAPRMPEAAPPVAPAPAAPTPAAPAPAPSWPLSSSVPSQ   #    100
    KTYQGSYGFRLGFLHSGTAKSVTCTYSPALNKMFCQLAKTCPVQLWVDST   #    150
    PPPGTRVRAMAIYKQSQHMTEVVRRCPHHERCSDSDGLAPPQHLIRVEGN   #    200
    LRVEYLDDRNTFRHSVVVPYEPPEVGSDCTTIHYNYMCNSSCMGGMNRRP   #    250
    ILTIITLEDSSGNLLGRNSFEVRVCACPGRDRRTEEENLRKKGEPHHELP   #    300
    PGSTKRALPNNTSSSPQPKKKPLDGEYFTLQIRGRERFEMFRELNEALEL   #    350
    KDAQAGKEPGGSRAHSSHLKSKKGQSTSRHKKLMFKTEGPDSD          #    400
%1  .................T................................   #     50
%1  ....T.........................T...................   #    100
%1  .................T......T........................T   #    150
%1  ....T.............................................   #    200
%1  ..........T.......................................   #    250
%1  .................................T................   #    300
%1  ...T..............................................   #    350
%1  ..........................T................
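
The Context column in the table above can be reproduced from the sequence itself: it is the 9-residue window centered on the predicted position. The sketch below checks this for two of the sites listed; the function name `context_window` is hypothetical, and truncating the window at the sequence ends is an assumption, since the server's handling of residues near the termini is not described on this page.

```python
# First 60 residues of P53_HUMAN, copied from the example output above.
P53_PREFIX = "MEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSPLPSQAMDDLMLSPDDIEQWFTEDPGP"


def context_window(sequence, position, flank=4):
    """Return the residues within `flank` of a 1-based position."""
    index = position - 1  # convert to 0-based indexing
    start = max(0, index - flank)  # assumption: truncate at the N-terminus
    return sequence[start:index + flank + 1]  # slicing truncates at the C-terminus


if __name__ == "__main__":
    print(context_window(P53_PREFIX, 18))  # LSQETFSDL
    print(context_window(P53_PREFIX, 55))  # EQWFTEDPG
```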


GFF

The corresponding output in GFF (the graph is not shown again):

##gff-version 2
##source-version netphos-3.1b
##date 2016-07-12
##Type Protein P53_HUMAN
##Protein P53_HUMAN
##MEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSPLPSQAMDDLMLSPDDIEQWFTEDPGP
##DEAPRMPEAAPPVAPAPAAPTPAAPAPAPSWPLSSSVPSQKTYQGSYGFRLGFLHSGTAK
##SVTCTYSPALNKMFCQLAKTCPVQLWVDSTPPPGTRVRAMAIYKQSQHMTEVVRRCPHHE
##RCSDSDGLAPPQHLIRVEGNLRVEYLDDRNTFRHSVVVPYEPPEVGSDCTTIHYNYMCNS
##SCMGGMNRRPILTIITLEDSSGNLLGRNSFEVRVCACPGRDRRTEEENLRKKGEPHHELP
##PGSTKRALPNNTSSSPQPKKKPLDGEYFTLQIRGRERFEMFRELNEALELKDAQAGKEPG
##GSRAHSSHLKSKKGQSTSRHKKLMFKTEGPDSD                           
##end-Protein
# seqname            source        feature      start   end   score  N/A   ?
# ---------------------------------------------------------------------------
P53_HUMAN            netphos-3.1b  phos-CKI        18    18   0.582  . .  YES
P53_HUMAN            netphos-3.1b  phos-CKII       55    55   0.598  . .  YES
P53_HUMAN            netphos-3.1b  phos-unsp       81    81   0.704  . .  YES
P53_HUMAN            netphos-3.1b  phos-cdc2      102   102   0.472  . .   . 
P53_HUMAN            netphos-3.1b  phos-PKC       118   118   0.654  . .  YES
P53_HUMAN            netphos-3.1b  phos-GSK3      123   123   0.455  . .   . 
P53_HUMAN            netphos-3.1b  phos-PKC       125   125   0.684  . .  YES
P53_HUMAN            netphos-3.1b  phos-cdc2      140   140   0.479  . .   . 
P53_HUMAN            netphos-3.1b  phos-unsp      150   150   0.599  . .  YES
P53_HUMAN            netphos-3.1b  phos-unsp      155   155   0.829  . .  YES
P53_HUMAN            netphos-3.1b  phos-unsp      170   170   0.482  . .   . 
P53_HUMAN            netphos-3.1b  phos-unsp      211   211   0.951  . .  YES
P53_HUMAN            netphos-3.1b  phos-GSK3      230   230   0.438  . .   . 
P53_HUMAN            netphos-3.1b  phos-GSK3      231   231   0.454  . .   . 
P53_HUMAN            netphos-3.1b  phos-CaM-II    253   253   0.467  . .   . 
P53_HUMAN            netphos-3.1b  phos-cdc2      256   256   0.492  . .   . 
P53_HUMAN            netphos-3.1b  phos-unsp      284   284   0.865  . .  YES
P53_HUMAN            netphos-3.1b  phos-unsp      304   304   0.939  . .  YES
P53_HUMAN            netphos-3.1b  phos-CaM-II    312   312   0.459  . .   . 
P53_HUMAN            netphos-3.1b  phos-GSK3      329   329   0.443  . .   . 
P53_HUMAN            netphos-3.1b  phos-unsp      377   377   0.932  . .  YES
P53_HUMAN            netphos-3.1b  phos-GSK3      387   387   0.440  . .   . 


PhosphoBase



PhosphoBase is a database of phosphorylation sites originally developed at the Center for Biological Sequence Analysis at the Technical University of Denmark. It has now moved to Phospho.ELM at http://phospho.elm.eu.org/. It is hosted by EMBL.

Phospho.ELM is becoming the primary resource for phosphorylation data. CBS continues to contribute to the database alongside many other research groups.

GETTING HELP

If you need help regarding technical issues (e.g. errors or missing results) contact Technical Support. Please include the name of the service and version (e.g. NetPhos-3.1) and the options you have selected. If the error occurs after the job has started running, please include the JOB ID (the long code that you see while the job is running).

If you have scientific questions (e.g. how the method works or how to interpret results), contact Correspondence.
