An example of output is found below. The results page is composed of the following sections:
- Training data & Neural network architecture
Information about the data used to train the ANNs, including the number of datapoints and the parameters used to train the ANN ensemble. The page also reports whether repeated flanks were found in the data, and whether sequences were removed from the dataset because they were shorter than the specified motif.
You can inspect the distribution of the training data before and after rescaling. If the linear rescale produces a distribution that is too skewed towards zero, you might consider running the analysis again using a logarithmic transformation.
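The effect of the two transformations can be checked offline with a short sketch like the one below. The exact formulas used by the server are not given here, so `linear_rescale` and `log_rescale` are assumptions (min-max scaling, and a natural log applied before min-max scaling), and the `raw` values are invented for illustration.

```python
import math

def linear_rescale(values):
    """Map values linearly onto [0, 1] using the observed min and max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; cannot rescale")
    return [(v - lo) / (hi - lo) for v in values]

def log_rescale(values):
    """Assumed log transform: log(1 + v - min), followed by the linear rescale."""
    lo = min(values)
    return linear_rescale([math.log(1.0 + v - lo) for v in values])

def fraction_below(values, threshold=0.1):
    """Share of rescaled values below the threshold, a crude skew indicator."""
    return sum(v < threshold for v in values) / len(values)

# Invented measurements spanning several orders of magnitude.
raw = [1, 2, 3, 5, 8, 13, 50, 200, 1000, 5000]
lin = linear_rescale(raw)
log = log_rescale(raw)

print("fraction below 0.1 after linear rescale:", fraction_below(lin))
print("fraction below 0.1 after log rescale:   ", fraction_below(log))
```

If most of the linearly rescaled values pile up near zero while the log-rescaled values are spread more evenly, that suggests the logarithmic option may suit the data better.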
- Performance measures
- Predictive performance is estimated by cross-validation on the training set and reported as root mean square error (RMSE), Pearson correlation, and Spearman correlation.
- For a visual depiction of the correlation between observed vs. predicted values, inspect
the "scatterplot" figure.
- The "complete alignment core" file reports the prediction for each sequence and the
core of the alignment. This file consists of several columns:
- Core: the predicted binding core for the sequence
- P1: position of the first residue of the core within the sequence
- Measure: the target value
- Prediction: the score predicted by the ensemble
- Peptide: complete sequence of the training example
- Gap_pos: starting position of the deletion, if any
- Gap_lgt: length of the deletion, if any
- Insert_pos: starting position of the insertion, if any
- Insert_lgt: length of the insertion, if any
- Core+Gap: the binding core including inserted or deleted amino acids, if any
- P1_rel: reliability of the starting position of the core, i.e. a confidence measure on the location of the core (reliability scores are described in this paper).
- The trained "model", i.e. the set of network weights optimized on the training data, can be downloaded to local disk using the corresponding link. The model file can then be uploaded to the server at any time to obtain predictions on new data.
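The performance measures listed above can be recomputed from the Measure and Prediction columns of the "complete alignment core" file. The sketch below assumes the file is whitespace-delimited with a header row that uses the column names given above, and that absent gaps or insertions are written as 0; neither detail is confirmed here. The rows in `SAMPLE` are invented, including the way P1 is indexed.

```python
import math

# Invented rows in the assumed layout (header + whitespace-delimited fields).
SAMPLE = """\
Core P1 Measure Prediction Peptide Gap_pos Gap_lgt Insert_pos Insert_lgt Core+Gap P1_rel
AAAKLMNPQ 0 0.10 0.20 AAAKLMNPQRS 0 0 0 0 AAAKLMNPQ 0.90
KLMNPQRST 1 0.40 0.30 AKLMNPQRSTV 0 0 0 0 KLMNPQRST 0.75
GHIKLMNPQ 2 0.60 0.70 AAGHIKLMNPQ 0 0 0 0 GHIKLMNPQ 0.60
STVWYACDE 0 0.90 0.80 STVWYACDEFG 0 0 0 0 STVWYACDE 0.85
"""

def read_table(text):
    """Parse header + rows into a list of dicts keyed by column name."""
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    return [dict(zip(header, row)) for row in rows]

def rmse(obs, pred):
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation computed as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

rows = read_table(SAMPLE)
observed = [float(r["Measure"]) for r in rows]
predicted = [float(r["Prediction"]) for r in rows]

print("RMSE:    ", rmse(observed, predicted))
print("Pearson: ", pearson(observed, predicted))
print("Spearman:", spearman(observed, predicted))
```

Note that values recomputed this way over the whole file need not match the server's figures exactly if the server averages its measures per cross-validation fold.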
A sequence logo representation of the motif is shown. The height of each column, and the relative height of the amino acid letters, represent the information content in bits at each position of the alignment. Logos are generated using Seq2Logo.
The amino acid preferences at each position in the alignment may also be viewed in a log-odds matrix (or frequency matrix) format, with positive values indicating favored residues and negative values indicating disfavored amino acids.
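To make the relationship between frequencies, information content, and log-odds scores concrete, here is a minimal sketch for a single alignment column. It assumes plain Shannon information over a 20-letter alphabet and a uniform background of 0.05 per amino acid; the server and Seq2Logo may use different definitions (for example pseudocounts, sequence weighting, or another background), so the function names and numbers are illustrative only.

```python
import math

def information_content(freqs, alphabet_size=20):
    """Shannon information in bits: log2(alphabet size) minus the column entropy."""
    entropy = -sum(p * math.log2(p) for p in freqs.values() if p > 0)
    return math.log2(alphabet_size) - entropy

def log_odds(freqs, background):
    """log2 of observed over background frequency; positive means enriched."""
    return {aa: math.log2(p / background[aa]) for aa, p in freqs.items() if p > 0}

# Invented frequencies for one column; unlisted amino acids have frequency 0.
column = {"L": 0.5, "V": 0.25, "I": 0.225, "P": 0.025}

# Assumed uniform background over the 20 standard amino acids.
background = {aa: 0.05 for aa in "ACDEFGHIKLMNPQRSTVWY"}

scores = log_odds(column, background)
print("information content (bits):", round(information_content(column), 3))
for aa, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(aa, round(score, 3))
```

In this toy column, L, V, and I score above zero because they occur more often than the background expects, while P scores below zero because it is depleted.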
If you provided evaluation data upon submission, you will find the predictions here.
For evaluation files in peptide format with associated values (i.e. a format similar to that of the training data), performance measures will also be available.
If the submission was in FASTA format, the source protein sequence ID is also shown here, and in the case of peptides shared by multiple entries, the sequence IDs are listed separated by / (slash).