DTU Health Tech

Department of Health Technology


BepiPred - 3.0

Prediction of potential B-cell epitopes from protein sequence

Paste or upload protein sequence(s) in FASTA format to predict potential B-cell epitopes. Prediction can take a few minutes per sequence.

Submit data


Max 50 sequences and 300,000 amino acids per submission. Each sequence must be between 10 and 6000 amino acids. For sequences with more than 1023 amino acids, predictions are truncated after the 1023rd position.
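The limits above can be checked locally before submitting. The following is a minimal sketch, not part of the server: the function names `parse_fasta` and `check_submission` are hypothetical, and treating the 1023-residue truncation as a warning rather than an error is an assumption.

```python
MAX_SEQUENCES = 50
MAX_TOTAL_RESIDUES = 300_000
MIN_LENGTH, MAX_LENGTH = 10, 6000
PREDICTION_LIMIT = 1023  # predictions are truncated after this position


def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA text."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_submission(text):
    """Return (errors, warnings) for a FASTA submission (hypothetical helper)."""
    records = parse_fasta(text)
    errors, warnings = [], []
    if not records:
        errors.append("no FASTA records found")
    if len(records) > MAX_SEQUENCES:
        errors.append(f"{len(records)} sequences, limit is {MAX_SEQUENCES}")
    total = sum(len(seq) for _, seq in records)
    if total > MAX_TOTAL_RESIDUES:
        errors.append(f"{total} amino acids, limit is {MAX_TOTAL_RESIDUES}")
    for header, seq in records:
        if not MIN_LENGTH <= len(seq) <= MAX_LENGTH:
            errors.append(f"{header}: length {len(seq)} is outside "
                          f"{MIN_LENGTH}-{MAX_LENGTH}")
        elif len(seq) > PREDICTION_LIMIT:
            warnings.append(f"{header}: only the first {PREDICTION_LIMIT} "
                            "positions will receive predictions")
    return errors, warnings
```

A submission passes when `errors` is empty; entries in `warnings` flag sequences whose predictions will be truncated.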

... or load some sample data:

FASTA format example

Or upload your own FASTA file:
Top epitope percentage cutoff



Threshold for predicting B-cell epitope residues (0.0-1.0):

Use sequential smoothing (linear epitope prediction mode) on B-cell epitope probability score graphs (see instructions):  Yes   No


The BepiPred-3.0 server predicts both linear and discontinuous B-cell epitopes from protein sequence, using neural networks trained on state-of-the-art protein language embeddings of epitope and non-epitope amino acids determined from crystal structures. A sequential smoothing (rolling mean) can optionally be used.

Inputs

The BepiPred-3.0 server requires protein sequence(s) in FASTA format and cannot handle nucleic acid sequences.

  • Protein sequence(s)
    Paste protein sequence(s) in fasta format into the field. You can also load an example sequence, by clicking 'Load Data'.
    You may also upload a fasta formatted file with protein sequence(s).
  • Top epitope percentage cutoff
    Specify a percentage cutoff for B-cell epitope predictions (20%, 50%, 70% or all). The output contains the top percentage of most likely B-cell epitope predictions.
  • Threshold
    A second output file is a FASTA-formatted file containing B-cell epitope predictions at the set classification threshold.
    You can specify the classification threshold here (default is 0.1512).
  • Sequential smoothing on graphical output (yes/no)
    A third output file is a .html file with a graphical display of B-cell epitope predictions on the protein sequence(s).
    You may choose to use sequential smoothing (rolling mean), which makes it easier to identify patches of linear B-cell epitopes.
    However, the resulting predictions have a higher risk of including false positive epitope residues (default is 'no').
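To illustrate why a rolling mean highlights linear patches but can also introduce false positives, here is a minimal sketch of a centred rolling mean over per-residue scores. The window size of 9 and the handling of sequence ends (averaging over the positions that exist) are assumptions for illustration; the server's actual smoothing parameters are not stated here.

```python
def rolling_mean(scores, window=9):
    """Centred rolling mean; the window shrinks at the sequence ends.

    `window` is an assumed, odd window size, not the server's setting.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    smoothed = []
    for i in range(len(scores)):
        lo = max(0, i - half)
        hi = min(len(scores), i + half + 1)
        smoothed.append(sum(scores[lo:hi]) / (hi - lo))
    return smoothed
```

With a window of 3, a single high-scoring residue in `[0, 0, 3, 0, 0]` raises the smoothed score of both of its neighbours, which is how residues next to a true epitope residue can end up above the threshold.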

Outputs

A total of 4 output file types are generated, where epitope and non-epitope residues are indicated with uppercase and lowercase letters respectively.
One of these is a .html file containing interactive plots, which can also be used directly on the result page.
  • Interactive plot(s) (HTML)
    An HTML file containing the graphical outputs for B-cell epitope predictions.
    The optimal threshold is often protein-specific.
    These figures allow the user to set the threshold manually for each protein and get the corresponding B-cell epitope predictions.
  • Raw output (CSV)
    A csv file containing the B-cell epitope probability scores for each residue of the protein sequence(s).
    The sequentially smoothed scores (linear epitope scores) are also provided.
  • B Cell epitope predictions (FASTA)
    A fasta file of the B-cell epitope predictions for the protein sequence(s) at the specified threshold.
    B-cell epitope residues are indicated with uppercase.
  • Top % epitope candidates (FASTA)
    FASTA files with the top percentage of most likely B-cell epitope predictions for the protein sequence(s). This output includes predictions based on BepiPred-3.0 scoring as well as BepiPred-3.0 scoring with sequential smoothing (linear epitope scoring).
    B-cell epitope residues are indicated with uppercase.
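The uppercase/lowercase convention used in the FASTA outputs can be reproduced from the per-residue scores in the raw output. The sketch below assumes the scores are already loaded as a list of floats aligned with the sequence; the function name `mark_epitopes` is hypothetical, and treating a score equal to the threshold as an epitope residue is an assumption.

```python
DEFAULT_THRESHOLD = 0.1512  # default classification threshold of the server


def mark_epitopes(sequence, scores, threshold=DEFAULT_THRESHOLD):
    """Uppercase residues scoring at or above the threshold, lowercase the rest."""
    if len(sequence) != len(scores):
        raise ValueError("sequence and scores must have the same length")
    return "".join(
        aa.upper() if score >= threshold else aa.lower()
        for aa, score in zip(sequence, scores)
    )
```

Calling `mark_epitopes` again with a different `threshold` mirrors what the slider in the interactive plot does for a single protein.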

Graphical output

In the graphical output, B-cell epitope predictions are illustrated with bar plots.
The threshold for predicting B-cell epitopes is often protein-specific,
and a single threshold is unlikely to be optimal for all proteins.
We believe this intuitive interface allows researchers to maximize the precision of their B-cell epitope predictions.

Graph output without sequential smoothing (discontinuous B-cell epitope prediction)


The x-axis shows protein sequence positions and the y-axis shows BepiPred-3.0 epitope scores.
Residues with a higher score are more likely to be part of a B-cell epitope.
The threshold can be set by using the slider, which moves a dashed line along the y-axis.
Epitope predictions are updated according to the slider.
The B-cell epitope predictions at the set threshold can be downloaded by clicking the button 'Download epitope prediction'.

Graph output with sequential smoothing (linear B-cell epitope prediction)


If you choose the sequential smoothing (rolling mean) option, the graphical output will look different.
This option is more useful for detecting linear epitopes.
Note, however, that some residues in a predicted linear epitope are false positives,
meaning that they do not interact directly with an antibody.
This is because BepiPred-3.0 is trained on PDB crystal structures of antibody-antigen complexes
to predict antigen residues that are in contact with an antibody (within 4 angstrom).
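The contact criterion can be made concrete with a small sketch. It assumes that a residue counts as "in contact" when any of its atoms lies within 4 angstrom of any antibody atom, and that coordinates have already been extracted from a structure file; both the atom-level reading of the cutoff and the function name `epitope_residues` are assumptions, not the published annotation procedure.

```python
import math

CONTACT_CUTOFF = 4.0  # angstrom


def epitope_residues(antigen_atoms, antibody_atoms, cutoff=CONTACT_CUTOFF):
    """Return the set of antigen residue ids with an atom near the antibody.

    antigen_atoms:  iterable of (residue_id, (x, y, z)) tuples
    antibody_atoms: iterable of (x, y, z) tuples
    """
    antibody_atoms = list(antibody_atoms)
    contacts = set()
    for residue_id, coord in antigen_atoms:
        if residue_id in contacts:
            continue  # residue already annotated via another atom
        if any(math.dist(coord, other) <= cutoff for other in antibody_atoms):
            contacts.add(residue_id)
    return contacts
```

Under this definition, a residue flanked by two contacting residues but not itself within the cutoff is a non-epitope residue, even though smoothing may score it above the threshold.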

Here, one can download the data used for training, testing and evaluating this method.

IEDB Linear Epitopes

File Format: Fasta
Header: <Positive/Negative>ID_<IEDB_Epitope_ID>
Two datasets were constructed from linear epitopes extracted from IEDB. The first contains 4072 sequences.
In the reduced dataset, sequences with more than 20% sequence identity to the BP3C50ID training set were removed, leaving 3560 sequences.


Epitope: Uppercased
Non-Epitope: Lowercased

Downloads:
IEDB Linear Epitope Data
IEDB Linear Epitope Data Reduced

PDB Structural Datasets

File Format: Fasta
Header: <PDBID>_<Chain_ID>
These datasets were constructed from crystal structures deposited in the PDB database before 29/09/2021. Note that because of the epitope annotation strategy used to construct the datasets (BP3_training set, BP3C50ID training set, BP3C50ID external test set), sequences may differ slightly from those in the PDB database.

Epitope: Uppercased
Non-Epitope: Lowercased
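A record in this format can be turned into per-residue labels as sketched below. The function name `parse_record` is hypothetical, and splitting the header at the first underscore assumes that the PDB ID itself contains no underscore.

```python
def parse_record(header, sequence):
    """Split a '<PDBID>_<Chain_ID>' header and derive 0/1 epitope labels.

    Uppercase residues are epitope (1), lowercase are non-epitope (0).
    """
    pdb_id, chain_id = header.lstrip(">").split("_", 1)
    labels = [1 if aa.isupper() else 0 for aa in sequence]
    return pdb_id, chain_id, sequence.upper(), labels
```

For example, `parse_record(">1ABC_A", "acDEf")` yields the PDB ID `1ABC`, chain `A`, the sequence `ACDEF`, and the labels `[0, 0, 1, 1, 0]`; the ID `1ABC` is a made-up placeholder.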

Downloads:
BP3_without epitope_collapse
BP3_training set
BP3C50ID training set
BP3C50ID external test set

Please cite:

Joakim Clifford, Magnus Haraldson Høie, Sebastian Deleuran, Bjoern Peters, Morten Nielsen and Paolo Marcatili. BepiPred-3.0: Improved B-cell epitope prediction using protein language models. doi: https://doi.org/10.1002/pro.4497

Abstract

B-cell epitope prediction tools are of great medical and commercial interest due to their practical applications in vaccine development. The introduction of protein language models (LM), trained on unprecedented large datasets of protein sequences and structures, tap into a powerful numeric representation that can be exploited to accurately predict local and global protein structural features from amino acid sequences only. In this paper, we present BepiPred-3.0, a sequence-based epitope prediction tool that, by exploiting LM embeddings, greatly improves the prediction accuracy for both linear and conformational epitope prediction on several independent test sets. Furthermore, by carefully selecting additional input variables and epitope residue annotation strategy, performance can be further improved, thus achieving unprecedented predictive power. Our tool can predict epitopes across hundreds of sequences in minutes. It is freely available as a web server with a user-friendly interface to navigate the results, as well as a standalone downloadable package.

Graphical Abstract

Software Downloads




GETTING HELP

If you need help regarding technical issues (e.g. errors or missing results) contact Technical Support. Please include the name of the service and version (e.g. NetPhos-4.0). If the error occurs after the job has started running, please include the JOB ID (the long code that you see while the job is running).

If you have scientific questions (e.g. how the method works or how to interpret results), contact Correspondence.

Correspondence: Technical Support: