NetAllergen - 1.0
Protein allergenicity prediction
The NetAllergen-1.0 server predicts protein allergenicity by integrating MHC class II presentation propensity.
Submission
CITATIONS
For publication of results, please cite:
NetAllergen, a random forest model integrating MHC-II presentation propensity for improved allergenicity prediction
Yuchen Li, Peter Wad Sackett, Morten Nielsen, Carolina Barra
Published: 16 October 2023, Bioinformatics Advances, vbad151, https://doi.org/10.1093/bioadv/vbad151
Instructions
NetAllergen-1.0 is a predictive model based on the random forest algorithm. It incorporates novel MHC class II presentation propensity features to improve allergenicity prediction.
NetAllergen-1.0 predicts allergenicity from protein sequences. The program only accepts amino acid sequences in FASTA format. The user can paste sequences into the blank window or upload a FASTA file by clicking "Browse...".
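Because the server only accepts amino acid sequences in FASTA format, it can help to check a file locally before submitting it. The following is a minimal sketch, not the server's own validation: the function name `parse_fasta` and the accepted alphabet (the 20 standard residues plus X) are assumptions made for illustration.

```python
from typing import Dict, Optional

# Assumption: the 20 standard residues plus X (unknown residue) are accepted.
# The alphabet actually accepted by the server is not stated on this page.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWYX")


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {protein_id: sequence}.

    Raises ValueError on malformed input or non-amino-acid characters.
    """
    records: Dict[str, str] = {}
    current_id: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # The ID is taken as the first whitespace-delimited token after '>'.
            fields = line[1:].split()
            if not fields:
                raise ValueError("FASTA header without an ID")
            current_id = fields[0]
            if current_id in records:
                raise ValueError(f"duplicate ID: {current_id}")
            records[current_id] = ""
        else:
            if current_id is None:
                raise ValueError("sequence data before the first '>' header")
            seq = line.upper()
            bad = set(seq) - AMINO_ACIDS
            if bad:
                raise ValueError(
                    f"{current_id}: non-amino-acid characters {sorted(bad)}"
                )
            records[current_id] += seq
    return records
```

Note that a letter-based check like this cannot tell a nucleotide sequence from a protein, since A, C, G and T are also valid amino acid codes.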
Options
- Classification threshold
- The user can customize the threshold for classifying allergens and non-allergens. The default threshold is 0.207.
- Use BLAST to improve prediction
- The program searches the query sequence with BLAST against the allergen dataset used for model training. If the BLAST E-value is below the 1E-16 threshold (a highly similar sequence), the query sequence is classified as an allergen.
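The two decision rules described in the options above can be sketched as follows. This is an illustration only, not the server's code: the function names are hypothetical, and treating a score equal to the threshold as allergen is an assumption, since the page does not say how ties are handled.

```python
DEFAULT_SCORE_THRESHOLD = 0.207  # default classification threshold stated above
BLAST_EVALUE_CUTOFF = 1e-16      # BLAST E-value threshold stated above


def classify_score(score: float, threshold: float = DEFAULT_SCORE_THRESHOLD) -> int:
    """Return 1 (allergen) or 0 (non-allergen) from a predictive score.

    Assumption: a score at or above the threshold counts as allergen.
    """
    return 1 if score >= threshold else 0


def classify_blast(evalue: float, cutoff: float = BLAST_EVALUE_CUTOFF) -> int:
    """Return 1 (allergen) if the E-value is strictly below the cutoff, else 0."""
    return 1 if evalue < cutoff else 0
```

Raising the score threshold makes the classifier stricter (fewer sequences called allergens); lowering it has the opposite effect.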
Output
- ID
- Protein ID from FASTA file
- Evalue_BLAST
- E-value from BLAST search against the positive allergen dataset (only available when "Use BLAST to improve prediction" is selected)
- Class_BLAST
- Binary BLAST classification based on the E-value threshold (1E-16): 1 for allergen, 0 for non-allergen (only available when "Use BLAST to improve prediction" is selected)
- Score_60F
- Predictive score from the 60F random forest model
- Class_60F
- Binary classification for 60F based on the predictive score threshold: 1 for allergen, 0 for non-allergen
- Class_combined
- Binary classification combining Class_BLAST and Class_60F (only available when "Use BLAST to improve prediction" is selected). If either of the two classifications indicates allergen, the combined class is allergen.
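The Class_combined rule amounts to a logical OR of the two binary classes. A minimal sketch, assuming a hypothetical helper `combine_classes` and using `None` to represent the case where the BLAST option is not selected and the BLAST columns are therefore unavailable:

```python
from typing import Optional


def combine_classes(class_60f: int, class_blast: Optional[int] = None) -> Optional[int]:
    """Return Class_combined (1 = allergen, 0 = non-allergen).

    Returns None when class_blast is None, i.e. when BLAST was not run
    and no combined class is reported.
    """
    if class_blast is None:
        return None
    # Allergen if either the BLAST rule or the 60F model says allergen.
    return 1 if (class_blast == 1 or class_60f == 1) else 0
```

Because the rule is an OR, the combined class can only add allergen calls relative to Class_60F alone; it never turns an allergen call into a non-allergen call.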
NetAllergen, a random forest model integrating MHC-II presentation propensity for improved allergenicity prediction
Abstract
Motivation
Allergy is a pathological immune reaction towards innocuous protein antigens. Although only a narrow fraction of plant or animal proteins induce allergy, atopic disorders affect millions of children and adults and cost billions in healthcare systems worldwide. In-silico predictors can aid in the development of more innocuous food sources. Previous allergenicity predictors used sequence similarity, common structural domains, and amino acid physicochemical features. However, these predictors strongly rely on sequence similarity to known allergens and fail to predict protein allergenicity accurately when similarity diminishes.
Results
To overcome these limitations, we collected allergens from AllergenOnline, a curated database of IgE-inducing allergens, carefully removed allergen redundancy with a novel protein partitioning pipeline, and developed a new allergen prediction method, introducing MHC presentation propensity as a novel feature. NetAllergen outperformed a sequence similarity-based BLAST baseline approach, and previous allergenicity predictor AlgPred 2 when similarity to known allergens is limited.
Supplementary material
Here, you will find the datasets used for training and evaluation.
Training dataset
Evaluation dataset
Version history
- 1.0
- Oct 2023