Output format (version 2.1)



DESCRIPTION

An example of output is found below. The results page is composed of the following sections:
  1. Training data & Neural network architecture
    Information about the data used to train the ANNs, including the number of data points and the parameters used to train the ANN ensemble. The page also reports whether repeated flanks were found in the data, and whether sequences were removed from the dataset (because they were shorter than the specified motif length).
    You can inspect the distribution of the training data before and after rescaling. If the linear rescale produces a distribution that is too skewed towards zero, consider running the analysis again using a logarithmic transformation.

  2. Performance measures
    The RMSE, Pearson correlation coefficient and Spearman rank coefficient, with links to a scatterplot of predicted vs. observed values and to the complete alignment core on the training data. The trained model can also be saved here for use in a new submission.

  3. Sequence motif
    A sequence logo representation of the motif. The height of each column, and the relative height of the amino acid letters, represent the information content in bits at each position of the alignment. Logos are generated using the Seq2Logo program.
    The amino acid preferences at each position in the alignment may also be viewed in a Log-odds matrix (or Frequency matrix) format, with positive values indicating favored residues and negative values indicating disfavored amino acids.

  4. Evaluation data
    If you provided evaluation data upon submission, you will find the predictions here.
    For evaluation files in peptide format with associated values (i.e. the same format as the training data), performance measures will also be available. If the submission was in FASTA format, the source protein sequence ID is also shown here; for peptides shared by multiple entries, the sequence IDs are listed separated by / (slash).
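
The advice above about switching from a linear rescale to a logarithmic transformation can be illustrated with a minimal sketch. The exact transformations used by the server are not given on this page, so the min-max rescale, the log(1 + x) transform, the 0.1 threshold and the sample values below are all assumptions chosen for illustration only.

```python
import math

def linear_rescale(values):
    """Min-max rescale to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; nothing to rescale")
    return [(v - lo) / (hi - lo) for v in values]

def log_rescale(values):
    """Apply log(1 + v), then min-max rescale to [0, 1] (assumed transform)."""
    if min(values) < 0:
        raise ValueError("log transform expects non-negative values")
    return linear_rescale([math.log1p(v) for v in values])

def fraction_below(values, threshold=0.1):
    """Share of rescaled values under `threshold`; a crude indicator of skew."""
    return sum(v < threshold for v in values) / len(values)

# Hypothetical raw measurements spanning several orders of magnitude.
raw = [1.0, 3.0, 10.0, 30.0, 100.0, 1000.0, 10000.0]

linear = linear_rescale(raw)
logged = log_rescale(raw)

print(f"linear: {fraction_below(linear):.2f} of values below 0.1")
print(f"log:    {fraction_below(logged):.2f} of values below 0.1")
```

With data like this, most linearly rescaled values pile up near zero, while the log-transformed values spread more evenly over the interval. That is the situation in which a second run with the logarithmic option may be worth trying.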



EXAMPLE OUTPUT


Version: 2.1
Run ID: 12857
Run Name: DRB1_0301.example

Training data

Read 1715 unique sequences
View data distribution
(See Instructions for optimal data distribution)
Pre-processing: Linear rescale

Neural network architecture

Motif length: 9
Flanking region (PFR) size: 3
Number of hidden neurons: 5,15
Peptide length encoding: 13
Flank length encoding: 0
Maximum length of deletions in alignment: 0
Maximum length of insertions in alignment: 0
Amino acid numerical encoding: Blosum
Number of training cycles: 500
Number of NN seeds: 4
Number of networks in final ensemble: 40
Stop training on best test-set performance: Yes
Cross-validation setup: Simple
Folds for cross-validation: 5
Method to create subsets: Random
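
The ensemble size in the listing above is consistent with training one network for every combination of hidden-layer size, seed and cross-validation fold. This reading is an assumption (the page does not state the rule), but the arithmetic can be checked with a short sketch:

```python
# Settings copied from the example output above.
hidden_neurons = [5, 15]   # "Number of hidden neurons: 5,15"
n_seeds = 4                # "Number of NN seeds: 4"
n_folds = 5                # "Folds for cross-validation: 5"

# Assumption: one network per (hidden-layer size, seed, fold) combination.
ensemble_size = len(hidden_neurons) * n_seeds * n_folds

print(ensemble_size)  # 2 * 4 * 5 = 40, matching "Number of networks in final ensemble: 40"
```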


RESULTS

Performance measures - motif length 9

RMSE = 0.149188
Pearson correlation coefficient = 0.735081
Spearman rank coefficient = 0.731830
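
The three measures above follow standard definitions. As a rough sketch of how they relate predicted and observed values, the code below uses plain Python and made-up numbers; the function names and data are hypothetical and not taken from the server.

```python
import math

def rmse(pred, obs):
    """Root mean square error between predictions and observations."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank coefficient: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical observed and predicted values on a 0-1 scale.
observed = [0.10, 0.35, 0.40, 0.80, 0.95]
predicted = [0.20, 0.30, 0.55, 0.70, 0.90]

print(f"RMSE     = {rmse(predicted, observed):.6f}")
print(f"Pearson  = {pearson(predicted, observed):.6f}")
print(f"Spearman = {spearman(predicted, observed):.6f}")
```

RMSE is reported on the scale of the rescaled target values, so lower is better, whereas both correlation coefficients approach 1 as predictions and observations agree.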

View scatterplot of Predicted vs. Observed values
Download complete alignment core on the training data

Save the trained MODEL. You may use this model for a new submission

Sequence motif

Cores realigned with offset correction

Click here if you have problems visualizing this image

Figure: Visualization of the sequence motif using the Seq2Logo program

View a Log-odds matrix or Frequency matrix representation of the motif
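
To show how a frequency matrix and a log-odds matrix relate, here is a minimal sketch for a single alignment position. It assumes base-2 logarithms, a uniform background and a toy four-letter alphabet; the background frequencies and units actually used by the server may differ, and all names and numbers here are hypothetical.

```python
import math

def log_odds(freqs, background, pseudo=1e-3):
    """Return log2(f / q) per residue; `pseudo` avoids taking log of zero."""
    return {aa: math.log2(max(f, pseudo) / background[aa])
            for aa, f in freqs.items()}

# Hypothetical observed frequencies at one position (toy alphabet).
freqs = {"L": 0.50, "A": 0.25, "K": 0.20, "D": 0.05}
# Assumed uniform background over the same four residues.
background = {aa: 0.25 for aa in freqs}

scores = log_odds(freqs, background)
for aa, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{aa}: {score:+.2f}")
```

Residues observed more often than the background receive positive scores, residues observed less often receive negative scores, and a residue at exactly the background frequency scores zero.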


Evaluation data

Uploaded 1068 peptides from 6 FASTA entries

See the predictions on the evaluation set




DOWNLOAD a compressed archive with all results files