
Table 1 Total classification error (TER), false negative (FNR) and false positive (FPR) rates, and area under the ROC curve (AUC) for increasing proportions of mislabeled observations with the five tested classification models

From: “Noisy beets”: impact of phenotyping errors on genomic predictions for binary traits in Beta vulgaris

| misLabels (%) | minFreq | errType | KNN | LR | RF | SVM-Lin | SVM-Rbf |
|---|---|---|---|---|---|---|---|
| 0 | 0.1950 | TER | 0.0000 | 0.0000 | 0.0085 | 0.0000 | 0.0001 |
|  |  | FNR | 0.0000 | 0.0000 | 0.0067 | 0.0000 | 0.0001 |
|  |  | FPR | 0.0020 | 0.0020 | 0.0054 | 0.0020 | 0.0036 |
|  |  | AUC | 1.0000 | 0.9980 | 0.9946 | 0.9980 | 0.9961 |
| 1 | 0.1870 | TER | 0.0039 | 0.0153 | 0.0092 | 0.0077 | 0.0008 |
|  |  | FNR | 0.0042 | 0.0153 | 0.0076 | 0.0078 | 0.0007 |
|  |  | FPR | 0.0038 | 0.0046 | 0.0036 | 0.0095 | 0.0044 |
|  |  | AUC | 0.9961 | 0.9954 | 0.9964 | 0.9905 | 0.9955 |
| 2.5 | 0.1870 | TER | 0.0045 | 0.0291 | 0.0102 | 0.0145 | 0.0004 |
|  |  | FNR | 0.0049 | 0.0283 | 0.0094 | 0.0139 | 0.0004 |
|  |  | FPR | 0.0041 | 0.0094 | 0.0032 | 0.0174 | 0.0023 |
|  |  | AUC | 0.9959 | 0.9905 | 0.9968 | 0.9825 | 0.9977 |
| 5 | 0.2114 | TER | 0.0088 | 0.0897 | 0.0236 | 0.0471 | 0.0047 |
|  |  | FNR | 0.0096 | 0.0864 | 0.0213 | 0.0466 | 0.0043 |
|  |  | FPR | 0.0052 | 0.0484 | 0.0052 | 0.0496 | 0.0073 |
|  |  | AUC | 0.9948 | 0.9516 | 0.9951 | 0.9503 | 0.9918 |
| 7.5 | 0.2520 | TER | 0.0160 | 0.1431 | 0.0342 | 0.0708 | 0.0087 |
|  |  | FNR | 0.0159 | 0.1386 | 0.0307 | 0.0688 | 0.0077 |
|  |  | FPR | 0.0071 | 0.0920 | 0.0077 | 0.0748 | 0.0116 |
|  |  | AUC | 0.9928 | 0.9080 | 0.9921 | 0.9251 | 0.9882 |
| 10 | 0.2439 | TER | 0.0292 | 0.2011 | 0.0553 | 0.1111 | 0.0205 |
|  |  | FNR | 0.0294 | 0.1963 | 0.0521 | 0.1105 | 0.0188 |
|  |  | FPR | 0.0100 | 0.1462 | 0.0173 | 0.1134 | 0.0242 |
|  |  | AUC | 0.9898 | 0.8538 | 0.9827 | 0.8866 | 0.9754 |
| 12.5 | 0.2846 | TER | 0.0396 | 0.2286 | 0.0679 | 0.1275 | 0.0328 |
|  |  | FNR | 0.0393 | 0.2247 | 0.0625 | 0.1297 | 0.0285 |
|  |  | FPR | 0.0139 | 0.1680 | 0.0214 | 0.1277 | 0.0381 |
|  |  | AUC | 0.9861 | 0.8320 | 0.9786 | 0.8723 | 0.9614 |
| 15 | 0.2927 | TER | 0.0536 | 0.2714 | 0.0924 | 0.1687 | 0.0484 |
|  |  | FNR | 0.0533 | 0.2637 | 0.0867 | 0.1687 | 0.0439 |
|  |  | FPR | 0.0254 | 0.2237 | 0.0358 | 0.1705 | 0.0535 |
|  |  | AUC | 0.9746 | 0.7763 | 0.9642 | 0.8292 | 0.9460 |
| 17.5 | 0.2764 | TER | 0.0691 | 0.2903 | 0.1098 | 0.1887 | 0.0635 |
|  |  | FNR | 0.0692 | 0.2867 | 0.1017 | 0.1889 | 0.0595 |
|  |  | FPR | 0.0323 | 0.2425 | 0.0549 | 0.1903 | 0.0686 |
|  |  | AUC | 0.9677 | 0.7575 | 0.9451 | 0.8097 | 0.9091 |
| 20 | 0.2846 | TER | 0.0924 | 0.3095 | 0.1258 | 0.2166 | 0.0835 |
|  |  | FNR | 0.0948 | 0.3068 | 0.1207 | 0.2212 | 0.0767 |
|  |  | FPR | 0.0402 | 0.2608 | 0.0594 | 0.2149 | 0.0906 |
|  |  | AUC | 0.9598 | 0.7391 | 0.9406 | 0.7851 | 0.9081 |
| 25 | 0.3984 | TER | 0.1334 | 0.3415 | 0.1947 | 0.2550 | 0.1377 |
|  |  | FNR | 0.1325 | 0.3344 | 0.1829 | 0.2582 | 0.1141 |
|  |  | FPR | 0.0800 | 0.2976 | 0.1320 | 0.2559 | 0.1454 |
|  |  | AUC | 0.9198 | 0.7024 | 0.8680 | 0.7441 | 0.8532 |
| 30 | 0.3659 | TER | 0.2073 | 0.3693 | 0.2522 | 0.3079 | 0.1989 |
|  |  | FNR | 0.2079 | 0.3700 | 0.2477 | 0.3156 | 0.1745 |
|  |  | FPR | 0.1518 | 0.3439 | 0.1930 | 0.3067 | 0.2099 |
|  |  | AUC | 0.8481 | 0.6561 | 0.8069 | 0.6933 | 0.7901 |
| 40 | 0.4309 | TER | 0.3681 | 0.4382 | 0.3884 | 0.4044 | 0.3551 |
|  |  | FNR | 0.3723 | 0.4376 | 0.3916 | 0.4087 | 0.3088 |
|  |  | FPR | 0.3254 | 0.4223 | 0.3546 | 0.4051 | 0.3639 |
|  |  | AUC | 0.6745 | 0.5777 | 0.6453 | 0.5949 | 0.6351 |
| 50 | 0.5203 | TER | 0.5111 | 0.5134 | 0.5194 | 0.5130 | 0.5116 |
|  |  | FNR | 0.5214 | 0.5120 | 0.5199 | 0.5161 | 0.5238 |
|  |  | FPR | 0.5208 | 0.5165 | 0.5170 | 0.5147 | 0.5137 |
|  |  | AUC | 0.4792 | 0.4834 | 0.4830 | 0.4853 | 0.4862 |
  1. Reported values of classification performance are averages of the validation results from a 5-fold cross-validation scheme repeated 100 times (per model, per mislabel proportion). minFreq is the frequency of the minority class (low root vigor). The best-performing method in terms of AUC for each percentage of noisy labels is shown in italics.
  2. KNN: K-nearest neighbours; LR: ridge logistic regression; RF: random forest; SVM-Lin: SVM with linear kernel; SVM-Rbf: SVM with radial basis function kernel.
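
The four metrics and the evaluation scheme named in the footnotes can be illustrated with a minimal, standard-library Python sketch. It is not the authors' code. The function names (`flip_labels`, `error_rates`, `auc`, `repeated_kfold`), the coding of the positive class as 1, and the use of textbook definitions (TER as the share of misclassified observations, FNR and FPR from confusion counts, AUC as a pairwise ranking probability) are all assumptions made for illustration; the table does not state the exact definitions or the label-flipping procedure used in the paper.

```python
import random


def flip_labels(labels, proportion, seed=0):
    """Return a copy of 0/1 labels with a given proportion flipped at random.

    Hypothetical helper: one simple way to simulate mislabeled observations.
    """
    rng = random.Random(seed)
    n_flip = round(proportion * len(labels))
    noisy = list(labels)
    for i in rng.sample(range(len(labels)), n_flip):
        noisy[i] = 1 - noisy[i]
    return noisy


def error_rates(y_true, y_pred):
    """Return (TER, FNR, FPR) from hard 0/1 predictions, with 1 as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    ter = (fp + fn) / len(pairs)               # total error rate
    fnr = fn / (fn + tp) if fn + tp else 0.0   # missed positives
    fpr = fp / (fp + tn) if fp + tn else 0.0   # false alarms
    return ter, fnr, fpr


def auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def repeated_kfold(n, k=5, repeats=100, seed=0):
    """Yield (train_idx, test_idx) for k-fold CV repeated `repeats` times.

    Plain (unstratified) folds; whether the paper stratified is not stated.
    """
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        for f in range(k):
            test = idx[f::k]
            test_set = set(test)
            train = [i for i in idx if i not in test_set]
            yield train, test
```

Under these assumptions, one cell of the table would correspond to averaging a metric over the 500 train/test splits that `repeated_kfold(n, k=5, repeats=100)` yields, for one model and one value of `proportion`.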