## Performance of SignalP 4.1

### Correlation

In the SignalP 4.0 article, we show that SignalP 4.0 is superior in performance to SignalP 3.0 and ten competing methods (five dedicated signal peptide predictors and five transmembrane topology predictors with built-in signal peptide models), when the performance is measured by Matthews Correlation Coefficient (MCC).

The Matthews Correlation Coefficient is a widely used performance measure in bioinformatics. It is defined as:

$\mathrm{MCC}=\frac{\mathrm{tp}\times\mathrm{tn}-\mathrm{fp}\times\mathrm{fn}}{\sqrt{\left(\mathrm{tp}+\mathrm{fp}\right)\left(\mathrm{tp}+\mathrm{fn}\right)\left(\mathrm{tn}+\mathrm{fp}\right)\left(\mathrm{tn}+\mathrm{fn}\right)}}$
where
• tp is the number of true positives (signal peptides predicted as such)
• tn is the number of true negatives (non-signal peptides predicted as such)
• fp is the number of false positives (erroneous signal peptide predictions)
• fn is the number of false negatives (missed signal peptides)
MCC takes the value 1 for a perfect prediction, 0 for a random (non-informative) prediction, and -1 for a consistently wrong prediction.
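The definition translates directly into code. The Python sketch below is illustrative only: the function name `mcc` and the choice to return 0 when one of the four sums in the denominator is zero are assumptions, not taken from SignalP. The three calls use made-up counts to reproduce the reference values 1, 0 and -1.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: if any of the four sums is zero, MCC is undefined;
    # report 0.0 (no information) instead of dividing by zero.
    if denominator == 0:
        return 0.0
    return numerator / denominator


# Hypothetical counts chosen to hit the three reference values.
print(mcc(tp=50, tn=50, fp=0, fn=0))    # perfect prediction: 1.0
print(mcc(tp=25, tn=25, fp=25, fn=25))  # random prediction: 0.0
print(mcc(tp=0, tn=0, fp=50, fn=50))    # consistently wrong: -1.0
```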

In Table E (pp. 10-11) of the supplementary materials you can see the MCC values for SignalP and the competing methods.

### Sensitivity, false positive rate and cutoff choice

However, SignalP 4.0 is not superior to SignalP 3.0 according to all performance measures. Notably, its sensitivity is lower than that of SignalP 3.0 when the default cutoff is used. Sensitivity is the proportion of the true signal peptides that are correctly predicted:

$\mathrm{Sens}=\frac{\mathrm{tp}}{\mathrm{tp}+\mathrm{fn}}$

All prediction methods that make a classification from a numerical output have a choice to make: where to place the cutoff (also known as threshold) for the output? If you use a high cutoff, you will get few false positives, but also a low sensitivity; if you lower the cutoff, you will get a better sensitivity at the price of more false positives. The false positive rate is defined as:

$\mathrm{FPR}=\frac{\mathrm{fp}}{\mathrm{fp}+\mathrm{tn}}$
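To make the trade-off concrete, here is a minimal Python sketch that computes sensitivity and false positive rate at a given cutoff. The scores and labels are invented for illustration, and counting a score equal to the cutoff as a positive prediction is an assumption.

```python
from typing import Sequence


def sens_fpr(scores: Sequence[float], labels: Sequence[bool],
             cutoff: float) -> tuple[float, float]:
    """Sensitivity and false positive rate when score >= cutoff counts as positive."""
    tp = fn = fp = tn = 0
    for score, is_sp in zip(scores, labels):
        predicted_sp = score >= cutoff
        if is_sp:
            if predicted_sp:
                tp += 1
            else:
                fn += 1
        elif predicted_sp:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), fp / (fp + tn)


# Hypothetical scores: five true signal peptides, then five non-signal peptides.
scores = [0.92, 0.81, 0.64, 0.48, 0.39, 0.55, 0.35, 0.30, 0.18, 0.07]
labels = [True] * 5 + [False] * 5

for cutoff in (0.6, 0.4, 0.3):
    print(cutoff, sens_fpr(scores, labels, cutoff))
```

Lowering the cutoff from 0.6 to 0.3 raises the sensitivity from 0.6 to 1.0, but it also raises the false positive rate from 0.0 to 0.6.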

There is no single correct answer to the problem of choosing the cutoff; it depends on the context in which the prediction method is used. For SignalP, we have used a cutoff on the D-score (see the Output format for a definition) that maximizes the MCC.
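A cutoff that maximizes the MCC can be found by trying every observed score as a candidate. The sketch below assumes a list of D-scores with known true labels; the function names and the toy data are hypothetical, and it is not claimed to be the exact procedure used for SignalP, only an illustration of the criterion.

```python
import math
from typing import Sequence


def mcc_at_cutoff(scores: Sequence[float], labels: Sequence[bool],
                  cutoff: float) -> float:
    """MCC when a score >= cutoff is predicted as a signal peptide."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y and s >= cutoff)
    fn = sum(1 for s, y in pairs if y and s < cutoff)
    fp = sum(1 for s, y in pairs if not y and s >= cutoff)
    tn = sum(1 for s, y in pairs if not y and s < cutoff)
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denominator if denominator else 0.0


def mcc_optimal_cutoff(scores: Sequence[float],
                       labels: Sequence[bool]) -> tuple[float, float]:
    """Return (cutoff, MCC) for the observed score that maximizes MCC."""
    # Only observed scores need to be tried: between two neighbouring scores
    # the confusion counts, and hence the MCC, do not change.
    candidates = sorted(set(scores))
    best = max(candidates, key=lambda c: mcc_at_cutoff(scores, labels, c))
    return best, mcc_at_cutoff(scores, labels, best)


# Hypothetical D-scores: five true signal peptides, then five non-signal peptides.
scores = [0.92, 0.81, 0.64, 0.48, 0.39, 0.55, 0.35, 0.30, 0.18, 0.07]
labels = [True] * 5 + [False] * 5
print(mcc_optimal_cutoff(scores, labels))
```

For these toy scores the best cutoff is 0.39. When several cutoffs tie, `max` keeps the first, i.e. the lowest, candidate.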

### ROC curves

The trade-off between sensitivity and false positive rate is often illustrated graphically as a so-called ROC (receiver operating characteristic) curve, which plots the false positive rate on the x-axis against the sensitivity on the y-axis for varying values of the cutoff. The better a prediction method is, the closer to the upper left corner its ROC curve will be, while a random (non-informative) prediction will follow the diagonal. This is an excellent way to compare different predictors, since it does not depend on the choice of cutoff.
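An ROC curve can be traced by lowering the cutoff through every observed score and recording (false positive rate, sensitivity) at each step. The Python sketch below uses invented scores; the area under the curve, computed with the trapezoidal rule, is a common single-number summary added here for illustration and is not a measure reported on this page.

```python
from typing import Sequence


def roc_points(scores: Sequence[float],
               labels: Sequence[bool]) -> list[tuple[float, float]]:
    """(FPR, sensitivity) pairs as the cutoff is lowered through every observed score."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]  # cutoff above every score: nothing predicted positive
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if y and s >= cutoff)
        fp = sum(1 for s, y in zip(scores, labels) if not y and s >= cutoff)
        points.append((fp / n_neg, tp / n_pos))
    return points


def area_under_curve(points: list[tuple[float, float]]) -> float:
    """Trapezoidal area under a piecewise-linear ROC curve."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical scores: five true signal peptides, then five non-signal peptides.
scores = [0.92, 0.81, 0.64, 0.48, 0.39, 0.55, 0.35, 0.30, 0.18, 0.07]
labels = [True] * 5 + [False] * 5
curve = roc_points(scores, labels)
print(curve)
print(area_under_curve(curve))  # about 0.92 for these scores
```

A random predictor gives points near the diagonal and an area close to 0.5; a perfect one passes through the upper left corner (0, 1) and has an area of 1.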

Below, you can see ROC curves for SignalP 3.0 and 4.0 for the three different organism groups. Note: in contrast to the values in Table E, these are not evaluation performances; they were obtained by applying the finished methods to the Total data set before homology reduction. These ROC curves show that:

• When there are TM segments in the data ("all data"), SignalP 4.0 is clearly better than SignalP 3.0 (compare the pink and green curves)
• When TM segments are excluded from the data ("no TM"), SignalP 4.0 performance is practically equal to that of SignalP 3.0 — except in the Gram-positives, where it is better (compare the blue and red curves)
• SignalP 4.0 and 3.0 default cutoffs are placed at very different points on the ROC curves, leading to lower sensitivity (and much lower FP rates) in SignalP 4.0.

### The cutoff choice in SignalP 4.1

SignalP 4.1 offers users the option of using cutoff values that reproduce the sensitivity of SignalP 3.0. The price is, of course, a slightly higher false positive rate.
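The idea behind a "sensitive" cutoff can be sketched as a search for the highest cutoff that still reaches a reference sensitivity, such as that of an older method. This Python sketch is a hypothetical illustration under that assumption, with made-up scores and a made-up target of 0.8; it is not necessarily how the SignalP 4.1 cutoffs were derived.

```python
from typing import Sequence


def cutoff_for_sensitivity(scores: Sequence[float], labels: Sequence[bool],
                           target: float) -> tuple[float, float, float]:
    """Highest observed cutoff whose sensitivity reaches `target`.

    Returns (cutoff, sensitivity, false positive rate).
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    # Walk candidate cutoffs from high to low. Lowering the cutoff can only add
    # positive predictions, so the first cutoff that reaches the target admits
    # the fewest false positives among all observed cutoffs that do.
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if y and s >= cutoff)
        if tp / n_pos >= target:
            fp = sum(1 for s, y in zip(scores, labels) if not y and s >= cutoff)
            return cutoff, tp / n_pos, fp / n_neg
    raise ValueError("target sensitivity cannot be reached")


# Hypothetical scores: five true signal peptides, then five non-signal peptides.
scores = [0.92, 0.81, 0.64, 0.48, 0.39, 0.55, 0.35, 0.30, 0.18, 0.07]
labels = [True] * 5 + [False] * 5
print(cutoff_for_sensitivity(scores, labels, target=0.8))
```

Here the matching cutoff is 0.48 (sensitivity 0.8, false positive rate 0.2), whereas a stricter cutoff of 0.64 would give sensitivity 0.6 with no false positives, which mirrors the trade-off described above.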

In the table below, the performance values are shown for SignalP 3.0, SignalP 4.1 with the default cutoff, and SignalP 4.1 with the "sensitive" (SignalP-3.0 compliant) cutoff. Note, again, that these are not evaluation performances and should not be used to compare SignalP to competing methods; they are only meant for comparing SignalP versions.

| Method | Cutoff (SignalP-noTM / SignalP-TM) | Sensitivity | FP rate, no TM | FP rate, all data | MCC, no TM | MCC, all data |
|---|---|---|---|---|---|---|
| **Eukaryotic data** | | | | | | |
| SignalP 3.0 | 0.43 | 0.988 | 0.008 | 0.117 | 0.978 | 0.781 |
| SignalP 4.1 default | 0.45 / 0.50 | 0.967 | 0.003 | 0.011 | 0.972 | 0.955 |
| SignalP 4.1 sensitive | 0.34 / 0.34 | 0.988 | 0.009 | 0.043 | 0.976 | 0.903 |
| **Gram-positive data** | | | | | | |
| SignalP 3.0 | 0.45 | 0.961 | 0.008 | 0.033 | 0.937 | 0.814 |
| SignalP 4.1 default | 0.57 / 0.45 | 0.950 | 0.000 | 0.001 | 0.973 | 0.967 |
| SignalP 4.1 sensitive | 0.42 / 0.42 | 0.961 | 0.000 | 0.003 | 0.978 | 0.958 |
| **Gram-negative data** | | | | | | |
| SignalP 3.0 | 0.44 | 0.955 | 0.004 | 0.061 | 0.949 | 0.691 |
| SignalP 4.1 default | 0.57 / 0.51 | 0.924 | 0.000 | 0.001 | 0.957 | 0.949 |
| SignalP 4.1 sensitive | 0.42 / 0.42 | 0.955 | 0.002 | 0.006 | 0.963 | 0.937 |
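To show how the two sets of cutoffs in the table are applied, here is a hypothetical Python helper that compares a D-score with the tabulated cutoff for a given organism group, network and mode. It is not SignalP's interface: the labels `"euk"`, `"gram+"`, `"gram-"`, `"noTM"` and `"TM"` are made up, the network that produced the score is passed in rather than determined, and treating a score equal to the cutoff as positive is an assumption.

```python
# D-score cutoffs copied from the table above:
# (organism group, cutoff mode) -> {network: cutoff}. Labels are hypothetical.
D_CUTOFFS = {
    ("euk", "default"): {"noTM": 0.45, "TM": 0.50},
    ("euk", "sensitive"): {"noTM": 0.34, "TM": 0.34},
    ("gram+", "default"): {"noTM": 0.57, "TM": 0.45},
    ("gram+", "sensitive"): {"noTM": 0.42, "TM": 0.42},
    ("gram-", "default"): {"noTM": 0.57, "TM": 0.51},
    ("gram-", "sensitive"): {"noTM": 0.42, "TM": 0.42},
}


def is_signal_peptide(d_score: float, organism: str, network: str,
                      mode: str = "default") -> bool:
    """Hypothetical helper: compare a D-score with the tabulated cutoff.

    `network` says which network ("noTM" or "TM") produced the D-score; how
    SignalP chooses between the two networks is not modelled here.
    """
    cutoff = D_CUTOFFS[(organism, mode)][network]
    return d_score >= cutoff  # ">=" rather than ">" is an assumption


print(is_signal_peptide(0.40, "euk", "noTM"))               # False: below 0.45
print(is_signal_peptide(0.40, "euk", "noTM", "sensitive"))  # True: above 0.34
```

A eukaryotic D-score of 0.40 from the SignalP-noTM network is thus rejected with the default cutoff (0.45) but accepted with the sensitive one (0.34).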