DTU Health Tech

Department of Health Technology

NetAllergen - 1.0

Protein allergenicity prediction

The NetAllergen-1.0 server predicts protein allergenicity by integrating MHC class II presentation propensity.

Submission


Paste in sequence data (maximum 100 sequences)

or upload sequence data


All sequences must be submitted as amino acid sequences in FASTA format.
Valid format examples: example.fa

CITATIONS

For publication of results, please cite:
NetAllergen, a random forest model integrating MHC-II presentation propensity for improved allergenicity prediction
Yuchen Li, Peter Wad Sackett, Morten Nielsen, Carolina Barra
Published: 16 October 2023, Bioinformatics Advances, vbad151, https://doi.org/10.1093/bioadv/vbad151

Instructions

NetAllergen-1.0 is a predictive model based on the random forest algorithm. It incorporates novel MHC class II presentation propensity features to improve allergenicity prediction.

NetAllergen-1.0 predicts allergenicity from protein sequences. The program accepts only amino acid sequences in FASTA format. Users can paste sequences into the input window or upload a FASTA file by clicking "Browse...".
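Checking a submission locally before uploading can catch formatting problems early. The following is a minimal sketch, not the server's own validation: the function names are hypothetical, the 100-sequence limit is taken from the submission form, and the restriction to the 20 standard amino acid letters is an assumption (the server may accept other codes).

```python
# Hypothetical pre-submission check; not part of NetAllergen itself.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")  # assumption: 20 standard amino acids only
MAX_SEQUENCES = 100  # limit stated on the submission form


def parse_fasta(text):
    """Return a list of (id, sequence) tuples parsed from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            parts = line[1:].split()
            header = parts[0] if parts else ""  # ID is the first token after '>'
            chunks = []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_submission(text):
    """Parse FASTA text and raise ValueError if it looks unsuitable for submission."""
    records = parse_fasta(text)
    if not records:
        raise ValueError("no FASTA records found")
    if len(records) > MAX_SEQUENCES:
        raise ValueError(f"{len(records)} sequences exceed the limit of {MAX_SEQUENCES}")
    for seq_id, seq in records:
        unexpected = set(seq) - VALID_RESIDUES
        if not seq or unexpected:
            raise ValueError(
                f"record {seq_id!r} is empty or has unexpected letters: {sorted(unexpected)}"
            )
    return records
```

For example, `check_submission(open("example.fa").read())` would return the parsed records or raise a `ValueError` describing the first problem found.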


Options

Classification threshold
Users can set the threshold used to classify sequences as allergens or non-allergens. The default threshold is 0.207.

Use BLAST to improve prediction
The program searches each query sequence against the allergen dataset used for model training. If the BLAST E-value is below the 1E-16 threshold (indicating a highly similar sequence), the query sequence is classified as an allergen.
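The two options amount to two simple decision rules, sketched below. The function names are hypothetical and this is not the server's code. Two points are assumptions rather than statements from this page: that higher scores indicate allergens, with a score exactly at the threshold counted as an allergen, and that a query with no BLAST hit is treated as a non-allergen.

```python
# Illustrative decision rules; names and boundary handling are assumptions.
DEFAULT_SCORE_THRESHOLD = 0.207  # default classification threshold from the Options text
BLAST_EVALUE_CUTOFF = 1e-16      # E-value cutoff from the Options text


def class_from_score(score, threshold=DEFAULT_SCORE_THRESHOLD):
    """Return 1 (allergen) or 0 (non-allergen) from a random forest score.

    Assumption: scores at or above the threshold are classified as allergens.
    """
    return 1 if score >= threshold else 0


def class_from_blast(evalue, cutoff=BLAST_EVALUE_CUTOFF):
    """Return 1 (allergen) if the BLAST E-value is below the cutoff, else 0.

    Assumption: evalue is None when BLAST reports no hit, treated here as 0.
    """
    if evalue is None:
        return 0
    return 1 if evalue < cutoff else 0
```

Lowering the score threshold classifies more sequences as allergens, while raising it classifies fewer.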

Output

ID
Protein ID from FASTA file

Evalue_BLAST
E-value from BLAST search against the positive allergen dataset (only available when "Use BLAST to improve prediction" is selected)

Class_BLAST
Binary classification from BLAST based on the E-value threshold (1E-16): 1 for allergen, 0 for non-allergen (only available when "Use BLAST to improve prediction" is selected)

Score_60F
Predictive score from the 60F random forest model

Class_60F
Binary classification based on the 60F predictive score and the classification threshold: 1 for allergen, 0 for non-allergen

Class_combined
Binary classification combining Class_BLAST and Class_60F (only available when "Use BLAST to improve prediction" is selected). If either of the two classifications indicates an allergen, the combined class is allergen.
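The relationship between the output fields can be sketched as a small record type. This is an illustration only: the `ResultRow` class and its lowercase field names are hypothetical, and the actual layout of the server's output file is not described here. The combined class is a logical OR of the two individual classes, and it is left undefined (`None`) when BLAST was not selected.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResultRow:
    """One hypothetical output row; fields mirror the output columns listed above."""
    id: str
    score_60f: float
    class_60f: int
    evalue_blast: Optional[float] = None  # None when the BLAST option is not selected
    class_blast: Optional[int] = None     # None when the BLAST option is not selected

    @property
    def class_combined(self) -> Optional[int]:
        """Return 1 if either classifier predicts allergen; None without BLAST."""
        if self.class_blast is None:
            return None
        return 1 if (self.class_blast == 1 or self.class_60f == 1) else 0
```

For example, a row with `class_60f=0` and `class_blast=1` has a combined class of 1, because a single positive call is enough.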

NetAllergen, a random forest model integrating MHC-II presentation propensity for improved allergenicity prediction


Abstract

Motivation
Allergy is a pathological immune reaction towards innocuous protein antigens. Although only a narrow fraction of plant or animal proteins induce allergy, atopic disorders affect millions of children and adults and cost billions in healthcare systems worldwide. In-silico predictors can aid in the development of more innocuous food sources. Previous allergenicity predictors used sequence similarity, common structural domains, and amino acid physicochemical features. However, these predictors strongly rely on sequence similarity to known allergens and fail to predict protein allergenicity accurately when similarity diminishes.

Results
To overcome these limitations, we collected allergens from AllergenOnline, a curated database of IgE-inducing allergens, carefully removed allergen redundancy with a novel protein partitioning pipeline, and developed a new allergen prediction method, introducing MHC presentation propensity as a novel feature. NetAllergen outperformed a sequence similarity-based BLAST baseline and the previous allergenicity predictor AlgPred 2 when similarity to known allergens is limited.

Supplementary material

Here, you will find the datasets used for training and evaluation.


Training dataset

NA_3014.fa

Evaluation dataset

APV_2780.fa
IND_5856.fa



Version history


1.0 Oct 2023:

  • NetAllergen, a random forest model integrating MHC-II presentation propensity for improved allergenicity prediction

GETTING HELP

If you need help regarding technical issues (e.g. errors or missing results) contact Technical Support. Please include the name of the service and version (e.g. NetPhos-4.0) and the options you have selected. If the error occurs after the job has started running, please include the JOB ID (the long code that you see while the job is running).

If you have scientific questions (e.g. how the method works or how to interpret results), contact Correspondence.
