DTU Health Tech

Department of Health Technology

EasyGibbs - 1.0

Motif recognition in protein sequences by Gibbs sampler.


EasyGibbs Prediction method training server.

Submission


Paste in training examples,

or upload training examples


Paste in evaluation examples,

or upload evaluation examples


Valid format for training examples: column format. Example: Training set (data from SYFPEITHI and MHCPEP; for the reference, see the Abstract).
Valid formats for evaluation examples: column format, example: Evaluation set (Geluk et al., Diabetes 47:1594-1600 (1998)), or fasta format, example: gp120.

Instructions: Paste in or upload training examples to train a prediction method. To evaluate the performance of the method, paste in or upload evaluation examples as well.

Please read the CBS access policies for information about limitations on the daily number of submissions.

Advanced options

Motif parameters

Motif length (min 1, max 51).

Matrix parameters

Clustering method.
Henikoff & Henikoff 1/nr method
Cluster at 62% identity
No clustering
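
As an illustration of the first clustering option, the following is a minimal sketch of position-based sequence weighting, assuming that "1/nr" refers to the scheme of Henikoff & Henikoff (1994): at each alignment column a sequence receives 1/(r*n), where r is the number of different amino acids in the column and n is the number of sequences sharing its amino acid. The function name `henikoff_weights` is hypothetical, and whether the server computes the weights exactly this way is an assumption.

```python
from collections import Counter


def henikoff_weights(aligned):
    """Position-based sequence weights for equal-length aligned sequences.

    Assumption: each column contributes 1 / (r * n) to a sequence, where
    r = number of distinct residues in the column and
    n = number of sequences carrying the same residue as this sequence.
    """
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all sequences must have the same length")

    weights = [0.0] * len(aligned)
    for pos in range(length):
        column = [seq[pos] for seq in aligned]
        counts = Counter(column)
        distinct = len(counts)
        for i, residue in enumerate(column):
            weights[i] += 1.0 / (distinct * counts[residue])
    return weights


# Two identical sequences share their weight; the unique one counts more.
example_weights = henikoff_weights(["AAA", "AAA", "CCC"])
```

With this scheme every column distributes a total weight of 1, so the weights sum to the motif length and redundant sequences are down-weighted rather than removed.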

Weight on prior.

General parameters


Sorting of output
Sort output on predicted values
Don't sort output

Sampling parameters

Start temperature


End temperature


Number of temperature steps (default 10, max 1000)


Random seed (-1: default, 0: seed on time, >0: choose seed)


Number of iterations per training example (default 20, max 1000)



Background model
Flat (all amino acids have probability 0.05)
Swissprot
From training data

Cutoff for counting an example as a positive example.



Position specific weighting

Weights on the different positions (format example: "1,3,1,1,1,1,1,1,3". Leave blank to choose default: equal weight on all positions)
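
The weight string format can be checked before submission. The sketch below is only an illustration: the function name `parse_position_weights` is hypothetical, and it assumes that the string must contain exactly one weight per motif position and that the blank default corresponds to a weight of 1 on every position.

```python
def parse_position_weights(text, motif_length):
    """Parse a string such as "1,3,1,1,1,1,1,1,3" into a list of floats.

    Assumptions: one weight per motif position; a blank string means
    equal weights, represented here as 1.0 for every position.
    """
    text = text.strip().strip('"')
    if not text:
        return [1.0] * motif_length
    weights = [float(field) for field in text.split(",")]
    if len(weights) != motif_length:
        raise ValueError(
            f"expected {motif_length} weights, got {len(weights)}"
        )
    return weights


# The format example from the form, for a motif of length 9.
weights = parse_position_weights("1,3,1,1,1,1,1,1,3", 9)
```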

Load saved prediction method

Paste in parameters,

or upload parameter file


CITATIONS

For publication of results, please cite:

Improved prediction of MHC class I and class II epitopes using a novel Gibbs sampling approach. Nielsen M, Lundegaard C, Worning P, Hvid CS, Lamberth K, Buus S, Brunak S, Lund O. Bioinformatics. 2004 20:1388-97

EasyGibbs. To be published

Usage instructions



1. Specify the training sequences

All the input sequences must be in one-letter amino acid code. The allowed alphabet (not case sensitive) is as follows:

A C D E F G H I K L M N P Q R S T V W Y

The training sequences can be input in the following two ways:

  • Paste a set of sequences, one sequence per line (just the amino acids), into the upper left window.
  • You can also select a file (in the same format) on your local disk, either by typing the file name into the lower left window or by browsing the disk.
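
Before pasting or uploading, the training sequences can be checked locally against the allowed alphabet. The following is a minimal sketch; the function name `clean_training_examples` is hypothetical, and it assumes the plain one-sequence-per-line input described above, with blank lines ignored.

```python
ALLOWED = set("ACDEFGHIKLMNPQRSTVWY")


def clean_training_examples(text):
    """Return upper-cased sequences, one per non-empty line.

    Raises ValueError if a line contains a character outside the
    20-letter amino acid alphabet (the check is not case sensitive).
    """
    sequences = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        seq = line.strip().upper()
        if not seq:
            continue  # skip blank lines
        bad = sorted(set(seq) - ALLOWED)
        if bad:
            raise ValueError(f"line {line_no}: unsupported characters {bad}")
        sequences.append(seq)
    return sequences


cleaned = clean_training_examples("fwsfgsedg\n\nYAIKTGHPR\n")
```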

2. Select evaluation examples (Optional)

The evaluation examples can be given either as one example per line (optionally followed by an assigned value) or in fasta format.
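
The two evaluation formats can be told apart mechanically. The sketch below is an illustration under stated assumptions: the function name `parse_evaluation_examples` is hypothetical, fasta input is detected by a leading ">", and column input is assumed to be whitespace-separated with the assigned value in the second field.

```python
def parse_evaluation_examples(text):
    """Return a list of (name, sequence, value) tuples.

    Fasta input: name is the header text, value is None.
    Column input: name is None, value is the assigned value or None.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]

    if lines and lines[0].startswith(">"):
        records, name, chunks = [], None, []
        for ln in lines:
            if ln.startswith(">"):
                if name is not None:
                    records.append((name, "".join(chunks).upper(), None))
                name, chunks = ln[1:].strip(), []
            else:
                chunks.append(ln)  # sequence may span several lines
        records.append((name, "".join(chunks).upper(), None))
        return records

    records = []
    for ln in lines:
        fields = ln.split()
        value = float(fields[1]) if len(fields) > 1 else None
        records.append((None, fields[0].upper(), value))
    return records


column_records = parse_evaluation_examples(
    "MASPGSGFWSFGSEDGSGDS 5.4\nFGSEDGSGDSENPGRARAWC\n"
)
```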

3. Customize your run by changing some of the advanced options (Optional)

4. Submit the job

Click on the "Submit" button. The status of your job (either 'queued' or 'running') will be displayed and constantly updated until it terminates and the server output appears in the browser window.

At any time during the wait you may enter your e-mail address and simply leave the window. Your job will continue; you will be notified by e-mail when it has terminated. The e-mail message will contain the URL under which the results are stored; they will remain on the server for 24 hours for you to collect them.

Output format



DESCRIPTION

An example of the output is found below. The output is divided into the following sections:
  • Description of training data
  • Prediction method
  • Parameters for training of Gibbs method
  • Prediction data
  • Evaluation of predictions (if assignments are supplied by user)
  • Predictions. This section contains a header line "Peptide Start res Motif Prediction (Assign) Sequence", followed by one row per sequence with the following columns:
    Peptide: Peptide number
    Start res: 1st residue in motif
    Motif: Motif found in sequence
    Prediction: Prediction score for motif
    Assign: Assignment of sequence (if supplied by user)
    Sequence: Sequence containing motif
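
A row of the Predictions section can be read into a small record for further processing. This is a sketch, not the server's own parser: the names `Prediction` and `parse_prediction_line` are hypothetical, the columns are assumed to be whitespace-separated, and the Assign column is assumed to be absent when no assignments were supplied. The 1-based interpretation of "Start res" is consistent with the rows in the example output.

```python
from typing import NamedTuple, Optional


class Prediction(NamedTuple):
    peptide: int             # peptide number
    start: int               # 1-based position of the first motif residue
    motif: str               # motif found in the sequence
    score: float             # prediction score for the motif
    assign: Optional[float]  # user-supplied assignment, if any
    sequence: str            # sequence containing the motif


def parse_prediction_line(line):
    """Parse one whitespace-separated row of the Predictions table."""
    fields = line.split()
    if len(fields) == 6:
        number, start, motif, score, assign, sequence = fields
        assign_value = float(assign)
    elif len(fields) == 5:
        number, start, motif, score, sequence = fields
        assign_value = None
    else:
        raise ValueError(f"unexpected number of columns: {len(fields)}")
    return Prediction(
        int(number), int(start), motif, float(score), assign_value, sequence
    )


row = parse_prediction_line(
    "1         8         FWSFGSEDG    6.549     5.400  MASPGSGFWSFGSEDGSGDS"
)
# The motif can be recovered from the sequence using the 1-based start.
recovered = row.sequence[row.start - 1 : row.start - 1 + len(row.motif)]
```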



EXAMPLE OUTPUT

Description of training data

Length of motif: 9
Number of training data: 456
Number of positive training examples: 456

Parameters for training of Gibbs method

Clustering using the Henikoff & Henikoff 1/nr method
Weight on prior: 50.000000
Start temperature: 0.150000
End temperature: 0.000100
Number of temperature steps: 10.000000
Using default seed
Number of iterations per train example: 20
Using amino acid background distribution from SWISSPROT
Equal weights on all positions

Figure: Visualization of the binding motif using the logo program.


Alignment generated by Gibbs sampler
Matrix generated by Gibbs sampler

Prediction data

Number of evaluation data: 57
Predicting using a matrix method

Evaluation of predictions

Pearson coefficient for N= 57 data: -0.30553
Aroc value: 0.31622
Threshold for counting example as positive: 50.000000

Predictions

Peptide   Start res Motif       Prediction Assign Sequence
1         8         FWSFGSEDG    6.549     5.400  MASPGSGFWSFGSEDGSGDS
2         1         FGSEDGSGD    3.304    62.000  FGSEDGSGDSENPGRARAWC
3         6         ARAWCQVAQ    3.363   100.000  ENPGRARAWCQVAQKFTGGI
4         1         QVAQKFTGG    1.638   100.000  QVAQKFTGGIGNKLCALLYG
5         8         LYGDAEKPA    4.603   100.000  GNKLCALLYGDAEKPAESGG
6         7         ESGGSQPPR    1.463   100.000  DAEKPAESGGSQPPRAAARK
7         8         ARKAACACD    3.170   100.000  SQPPRAAARKAACACDQKPC
8         12        CSKVDVNYA   -1.160     2.400  AACACDQKPCSCSKVDVNYA
9         12        LHATDLLPA    6.138     0.500  SCSKVDVNYAFLHATDLLPA
10        2         LHATDLLPA    6.138     0.200  FLHATDLLPACDGERPTLAF
11        12        QDVMNILLQ    1.590   100.000  CDGERPTLAFLQDVMNILLQ
12        11        YVVKSFDRS    4.430     0.700  LQDVMNILLQYVVKSFDRST
13        1         YVVKSFDRS    4.430    19.000  YVVKSFDRSTKVIDFHYPNE
14        5         FHYPNELLQ    6.535     5.000  KVIDFHYPNELLQEYNWELA
15        7         WELADQPQN    7.249     5.000  LLQEYNWELADQPQNLEEIL
16        11        MHCQTTLKY    5.006   100.000  DQPQNLEEILMHCQTTLKYA
17        9         YAIKTGHPR    9.089   100.000  MHCQTTLKYAIKTGHPRYFN
18        8         YFNQLSTGL    5.320     0.500  IKTGHPRYFNQLSTGLDMVG
19        12        AADWLTSTA    1.734     1.400  QLSTGLDMVGLAADWLTSTA
20        5         WLTSTANTN    4.697    41.000  LAADWLTSTANTNMFTYEIA
21        5         FTYEIAPVF    3.031    85.000  NTNMFTYEIAPVFVLLEYVT
22        4         MREIIGWPG    7.542    48.000  LKKMREIIGWPGGSGDGIFS
23        9         FSPGGAISN    0.479    80.000  PGGSGDGIFSPGGAISNMYA
24        9         YAMMIARFK    3.854   100.000  PGGAISNMYAMMIARFKMFP
25        11        EVKEKGMAA    3.955    25.000  MMIARFKMFPEVKEKGMAAL
26        4         EKGMAALPR    4.336    40.000  EVKEKGMAALPRLIAFTSEH
27        6         FTSEHSHFS    8.330     0.200  PRLIAFTSEHSHFSLKKGAA
28        3         FSLKKGAAA    5.693   100.000  SHFSLKKGAAALGIGTDSVI
29        10        ILIKCDERG    1.002    24.000  ALGIGTDSVILIKCDERGKM
30        10        MIPSDLERR   -0.681   100.000  LIKCDERGKMIPSDLERRIL
31        8         RILEAKQKG    2.013    38.000  IPSDLERRILEAKQKGFVPF
32        10        FLVSATAGT    5.278     4.000  EAKQKGFVPFLVSATAGTTV
33        11        YGAFDPLLA    6.422     7.000  LVSATAGTTVYGAFDPLLAV
34        1         YGAFDPLLA    6.422   100.000  YGAFDPLLAVADICKKYKIW
35        10        WMHVDAAWG    9.248     2.700  ADICKKYKIWMHVDAAWGGG
36        1         MHVDAAWGG    1.533    43.000  MHVDAAWGGGLLMSRKHKWK
37        9         WKLSGVERA    7.137     0.800  LLMSRKHKWKLSGVERANSV
38        9         SVTWNPHKM    4.403    13.000  LSGVERANSVTWNPHKMMGV
39        5         HKMMGVPLQ    3.161    34.000  TWNPHKMMGVPLQCSALLVR
40        8         LVREEGLMQ    5.674    17.000  PLQCSALLVREEGLMQNCNQ
41        3         GLMQNCNQM    2.757    41.000  EEGLMQNCNQMHASYLFQQD
42        5         YLFQQDKHY    5.432    22.000  MHASYLFQQDKHYDLSYDTG
43        5         LSYDTGDKA    4.032    31.000  KHYDLSYDTGDKALQCGRHV
44        12        VFKLWLMWR    0.431   100.000  DKALQCGRHVDVFKLWLMWR
45        5         LWLMWRAKG    8.337    33.000  DVFKLWLMWRAKGTTGFEAH
46        7         FEAHVDKCL    2.003   100.000  AKGTTGFEAHVDKCLELAEY
47        12        YNIIKNREG    3.638    34.000  VDKCLELAEYLYNIIKNREG
48        2         YNIIKNREG    3.638     4.000  LYNIIKNREGYEMVFDGKPQ
49        2         EMVFDGKPQ    2.819    67.000  YEMVFDGKPQHTNVCFWYIP
50        7         WYIPPSLRT    5.691     0.600  HTNVCFWYIPPSLRTLEDNE
51        3         LRTLEDNEE    0.154     5.000  PSLRTLEDNEERMSRLSKVA
52        4         SRLSKVAPV    3.410   100.000  ERMSRLSKVAPVIKARMMEY
53        10        YGTTMVSYQ    2.533    10.000  PVIKARMMEYGTTMVSYQPL
54        3         TMVSYQPLG    1.012    10.000  GTTMVSYQPLGDKVNFFRMV
55        7         FRMVISNPA   11.762     0.700  GDKVNFFRMVISNPAATHQD
56        2         SNPAATHQD    3.258    65.000  ISNPAATHQDIDFLIEEIER
57        8         FLIEEIERL    3.306    20.000  ATHQDIDFLIEEIERLGQDL
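
The two figures reported under "Evaluation of predictions" can be recomputed from the Prediction and Assign columns. The following is a minimal sketch under stated assumptions: "Aroc" is taken to be the area under the ROC curve, an example is counted as positive when its assigned value is above the threshold, and ties between scores count as half a win. The function names `pearson` and `roc_auc` are hypothetical, and the server may differ in these details.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def roc_auc(scores, labels):
    """Area under the ROC curve: the fraction of (positive, negative)
    pairs in which the positive example has the higher score."""
    pos = [s for s, is_pos in zip(scores, labels) if is_pos]
    neg = [s for s, is_pos in zip(scores, labels) if not is_pos]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# First five rows of the example table: Prediction and Assign columns.
scores = [6.549, 3.304, 3.363, 1.638, 4.603]
assigned = [5.4, 62.0, 100.0, 100.0, 100.0]
threshold = 50.0
labels = [value > threshold for value in assigned]  # assumed direction

r = pearson(scores, assigned)
auc = roc_auc(scores, labels)
```

Computing the area from pairwise comparisons avoids building the ROC curve explicitly; it is quadratic in the number of examples, which is unproblematic for sets of the size shown here.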

Article Abstract


REFERENCE

Improved prediction of MHC class I and class II epitopes using a novel Gibbs sampling approach. Nielsen M, Lundegaard C, Worning P, Hvid CS, Lamberth K, Buus S, Brunak S, Lund O. Bioinformatics. 2004 20:1388-97

ABSTRACT

MOTIVATION: Prediction of which peptides will bind a specific major histocompatibility complex (MHC) constitutes an important step in identifying potential T-cell epitopes suitable as vaccine candidates. MHC class II binding peptides have a broad length distribution complicating such predictions. Thus, identifying the correct alignment is a crucial part of identifying the core of an MHC class II binding motif. In this context, we wish to describe a novel Gibbs motif sampler method ideally suited for recognizing such weak sequence motifs. The method is based on the Gibbs sampling method, and it incorporates novel features optimized for the task of recognizing the binding motif of MHC classes I and II. The method locates the binding motif in a set of sequences and characterizes the motif in terms of a weight-matrix. Subsequently, the weight-matrix can be applied to identifying effectively potential MHC binding peptides and to guiding the process of rational vaccine design.
RESULTS: We apply the motif sampler method to the complex problem of MHC class II binding. The input to the method is amino acid peptide sequences extracted from the public databases of SYFPEITHI and MHCPEP and known to bind to the MHC class II complex HLA-DR4(B1*0401). Prior identification of information-rich (anchor) positions in the binding motif is shown to improve the predictive performance of the Gibbs sampler. Similarly, a consensus solution obtained from an ensemble average over suboptimal solutions is shown to outperform the use of a single optimal solution. In a large-scale benchmark calculation, the performance is quantified using relative operating characteristics curve (ROC) plots and we make a detailed comparison of the performance with that of both the TEPITOPE method and a weight-matrix derived using the conventional alignment algorithm of ClustalW. The calculation demonstrates that the predictive performance of the Gibbs sampler is higher than that of ClustalW and in most cases also higher than that of the TEPITOPE method.


GETTING HELP

If you need help regarding technical issues (e.g. errors or missing results) contact Technical Support. Please include the name of the service and version (e.g. NetPhos-4.0) and the options you have selected. If the error occurs after the job has started running, please include the JOB ID (the long code that you see while the job is running).

If you have scientific questions (e.g. how the method works or how to interpret results), contact Correspondence.

Correspondence: Technical Support: