Services
EasyGibbs - 1.0
Motif recognition in protein sequences by Gibbs sampler.
EasyGibbs Prediction method training server.
Submission
Usage instructions
1. Specify the training sequences
All the input sequences must be in one-letter amino acid code. The allowed alphabet (not case sensitive) is as follows:
The training sequences can be input in the following two ways:
- Paste a set of sequences, one sequence per line (just the amino acids) into the upper left window. Look here to see an example of the format.
- You can also select a file (in the same format) on your local disk, either by typing the file name into the lower left window or by browsing the disk.
3. Select evaluation examples (Optional)
The evaluation examples can either be one example per line (optionally followed by an assigned value: example. ) or in fasta format fasta example. .3. Customize your run by changing some of the advanced options (Optional)
4. Submit the job
Click on the "Submit" button. The status of your job (either 'queued' or 'running') will be displayed and constantly updated until it terminates and the server output appears in the browser window.At any time during the wait you may enter your e-mail address and simply leave the window. Your job will continue; you will be notified by e-mail when it has terminated. The e-mail message will contain the URL under which the results are stored; they will remain on the server for 24 hours for you to collect them.
Output format
DESCRIPTION
Example of output is found below. The output is divided into the folowinng sections:
- Description of training data
- Prediction method
- Parameters for training of Gibbs method
- Prediction data
- Evaluation of predictions (if assignments are supplied by user)
- Predictions
This section contain a line "Peptide Start res Motif Prediction (Assign) Sequence"
Peptide: Peptide number
Start res: 1st residue in motif
Motif: Motif found in sequence
Prediction: Prediction score for motif
Assign: Assignment of sequence (if supplied by user)
Sequence: Sequence containing motif
EXAMPLE OUTPUT
Description of training data
Length of motif: 9Number of training data: 456
Number of positive training examples: 456
Parameters for training of Gibbs method
Clustering using the Henikoff & Henikoff 1/nr methodWeight on prior: 50.000000
Start temperature: 0.150000
End temperature: 0.000100
Number of temperature steps: 10.000000
Using default seed
Number of iterations per train example: 20
Using amino acid background distribution from SWISSPROT
Equal weights on all positions

Figure: Visualization of the binding motif using the logo program.
A short explanation of HLA supertypes can be found here.
Alignment generated by Gibbs sampler
Matrix generated by Gibbs sampler
Prediction data
Number of evaluation data: 57Predicting using a matrix method
Evaluation of predictions
Pearson coefficient for N= 57 data: -0.30553Aroc value: 0.31622
Threshold for counting example as positive: 50.000000
Predictions
Peptide Start res Motif Prediction Assign Sequence 1 8 FWSFGSEDG 6.549 5.400 MASPGSGFWSFGSEDGSGDS 2 1 FGSEDGSGD 3.304 62.000 FGSEDGSGDSENPGRARAWC 3 6 ARAWCQVAQ 3.363 100.000 ENPGRARAWCQVAQKFTGGI 4 1 QVAQKFTGG 1.638 100.000 QVAQKFTGGIGNKLCALLYG 5 8 LYGDAEKPA 4.603 100.000 GNKLCALLYGDAEKPAESGG 6 7 ESGGSQPPR 1.463 100.000 DAEKPAESGGSQPPRAAARK 7 8 ARKAACACD 3.170 100.000 SQPPRAAARKAACACDQKPC 8 12 CSKVDVNYA -1.160 2.400 AACACDQKPCSCSKVDVNYA 9 12 LHATDLLPA 6.138 0.500 SCSKVDVNYAFLHATDLLPA 10 2 LHATDLLPA 6.138 0.200 FLHATDLLPACDGERPTLAF 11 12 QDVMNILLQ 1.590 100.000 CDGERPTLAFLQDVMNILLQ 12 11 YVVKSFDRS 4.430 0.700 LQDVMNILLQYVVKSFDRST 13 1 YVVKSFDRS 4.430 19.000 YVVKSFDRSTKVIDFHYPNE 14 5 FHYPNELLQ 6.535 5.000 KVIDFHYPNELLQEYNWELA 15 7 WELADQPQN 7.249 5.000 LLQEYNWELADQPQNLEEIL 16 11 MHCQTTLKY 5.006 100.000 DQPQNLEEILMHCQTTLKYA 17 9 YAIKTGHPR 9.089 100.000 MHCQTTLKYAIKTGHPRYFN 18 8 YFNQLSTGL 5.320 0.500 IKTGHPRYFNQLSTGLDMVG 19 12 AADWLTSTA 1.734 1.400 QLSTGLDMVGLAADWLTSTA 20 5 WLTSTANTN 4.697 41.000 LAADWLTSTANTNMFTYEIA 21 5 FTYEIAPVF 3.031 85.000 NTNMFTYEIAPVFVLLEYVT 22 4 MREIIGWPG 7.542 48.000 LKKMREIIGWPGGSGDGIFS 23 9 FSPGGAISN 0.479 80.000 PGGSGDGIFSPGGAISNMYA 24 9 YAMMIARFK 3.854 100.000 PGGAISNMYAMMIARFKMFP 25 11 EVKEKGMAA 3.955 25.000 MMIARFKMFPEVKEKGMAAL 26 4 EKGMAALPR 4.336 40.000 EVKEKGMAALPRLIAFTSEH 27 6 FTSEHSHFS 8.330 0.200 PRLIAFTSEHSHFSLKKGAA 28 3 FSLKKGAAA 5.693 100.000 SHFSLKKGAAALGIGTDSVI 29 10 ILIKCDERG 1.002 24.000 ALGIGTDSVILIKCDERGKM 30 10 MIPSDLERR -0.681 100.000 LIKCDERGKMIPSDLERRIL 31 8 RILEAKQKG 2.013 38.000 IPSDLERRILEAKQKGFVPF 32 10 FLVSATAGT 5.278 4.000 EAKQKGFVPFLVSATAGTTV 33 11 YGAFDPLLA 6.422 7.000 LVSATAGTTVYGAFDPLLAV 34 1 YGAFDPLLA 6.422 100.000 YGAFDPLLAVADICKKYKIW 35 10 WMHVDAAWG 9.248 2.700 ADICKKYKIWMHVDAAWGGG 36 1 MHVDAAWGG 1.533 43.000 MHVDAAWGGGLLMSRKHKWK 37 9 WKLSGVERA 7.137 0.800 LLMSRKHKWKLSGVERANSV 38 9 SVTWNPHKM 4.403 13.000 LSGVERANSVTWNPHKMMGV 39 5 HKMMGVPLQ 3.161 34.000 TWNPHKMMGVPLQCSALLVR 40 8 LVREEGLMQ 5.674 17.000 PLQCSALLVREEGLMQNCNQ 41 3 GLMQNCNQM 2.757 41.000 EEGLMQNCNQMHASYLFQQD 42 5 YLFQQDKHY 5.432 22.000 MHASYLFQQDKHYDLSYDTG 43 5 LSYDTGDKA 4.032 31.000 KHYDLSYDTGDKALQCGRHV 44 12 VFKLWLMWR 0.431 100.000 DKALQCGRHVDVFKLWLMWR 45 5 LWLMWRAKG 8.337 33.000 DVFKLWLMWRAKGTTGFEAH 46 7 FEAHVDKCL 2.003 100.000 AKGTTGFEAHVDKCLELAEY 47 12 YNIIKNREG 3.638 34.000 VDKCLELAEYLYNIIKNREG 48 2 YNIIKNREG 3.638 4.000 LYNIIKNREGYEMVFDGKPQ 49 2 EMVFDGKPQ 2.819 67.000 YEMVFDGKPQHTNVCFWYIP 50 7 WYIPPSLRT 5.691 0.600 HTNVCFWYIPPSLRTLEDNE 51 3 LRTLEDNEE 0.154 5.000 PSLRTLEDNEERMSRLSKVA 52 4 SRLSKVAPV 3.410 100.000 ERMSRLSKVAPVIKARMMEY 53 10 YGTTMVSYQ 2.533 10.000 PVIKARMMEYGTTMVSYQPL 54 3 TMVSYQPLG 1.012 10.000 GTTMVSYQPLGDKVNFFRMV 55 7 FRMVISNPA 11.762 0.700 GDKVNFFRMVISNPAATHQD 56 2 SNPAATHQD 3.258 65.000 ISNPAATHQDIDFLIEEIER 57 8 FLIEEIERL 3.306 20.000 ATHQDIDFLIEEIERLGQDL
Article Abstract
REFERENCE
Improved prediction of MHC class I and class II epitopes using a novel Gibbs sampling approach. Nielsen M, Lundegaard C, Worning P, Hvid CS, Lamberth K, Buus S, Brunak S, Lund O. Bioinformatics. 2004 20:1388-97ABSTRACT
MOTIVATION: Prediction of which peptides will bind a specific major histocompatibility complex (MHC) constitutes an
important step in identifying potential T-cell epitopes suitable as vaccine candidates. MHC class II binding peptides have a
broad length distribution complicating such predictions. Thus, identifying the correct alignment is a crucial part of identifying
he core of an MHC class II binding motif. In this context, we wish to describe a novel Gibbs motif sampler method ideally
suited for recognizing such weak sequence motifs. The method is based on the Gibbs sampling method, and it incorporates
novel features optimized for the task of recognizing the binding motif of MHC classes I and II. The method locates the binding
motif in a set of sequences and characterizes the motif in terms of a weight-matrix. Subsequently, the weight-matrix can be
applied to identifying effectively potential MHC binding peptides and to guiding the process of rational vaccine design.
RESULTS: We apply the motif sampler method to the complex problem of MHC class II binding. The input to the method is amino acid peptide sequences extracted from the public databases of SYFPEITHI and MHCPEP and known to bind to the MHC class II complex HLA-DR4(B1*0401). Prior identification of information-rich (anchor) positions in the binding motif is shown to improve the predictive performance of the Gibbs sampler. Similarly, a consensus solution obtained from an ensemble average over suboptimal solutions is shown to outperform the use of a single optimal solution. In a large-scale benchmark calculation, the performance is quantified using relative operating characteristics curve (ROC) plots and we make a detailed comparison of the performance with that of both the TEPITOPE method and a weight-matrix derived using the conventional alignment algorithm of ClustalW. The calculation demonstrates that the predictive performance of the Gibbs sampler is higher than that of ClustalW and in most cases also higher than that of the TEPITOPE method.
RESULTS: We apply the motif sampler method to the complex problem of MHC class II binding. The input to the method is amino acid peptide sequences extracted from the public databases of SYFPEITHI and MHCPEP and known to bind to the MHC class II complex HLA-DR4(B1*0401). Prior identification of information-rich (anchor) positions in the binding motif is shown to improve the predictive performance of the Gibbs sampler. Similarly, a consensus solution obtained from an ensemble average over suboptimal solutions is shown to outperform the use of a single optimal solution. In a large-scale benchmark calculation, the performance is quantified using relative operating characteristics curve (ROC) plots and we make a detailed comparison of the performance with that of both the TEPITOPE method and a weight-matrix derived using the conventional alignment algorithm of ClustalW. The calculation demonstrates that the predictive performance of the Gibbs sampler is higher than that of ClustalW and in most cases also higher than that of the TEPITOPE method.
GETTING HELP
Correspondence:
Technical Support: