Table 1 Total classification error (TER), false negative (FNR) and false positive (FPR) rates, and area under the ROC curve (AUC) for increasing proportions of mislabeled observations with the five tested classification models

From: “Noisy beets”: impact of phenotyping errors on genomic predictions for binary traits in Beta vulgaris

| Mislabeled (%) | minFreq | Metric | KNN | LR | RF | SVM-Lin | SVM-Rbf |
|---|---|---|---|---|---|---|---|
| 0 | 0.1950 | TER | 0.0000 | 0.0000 | 0.0085 | 0.0000 | 0.0001 |
| | | FNR | 0.0000 | 0.0000 | 0.0067 | 0.0000 | 0.0001 |
| | | FPR | 0.0020 | 0.0020 | 0.0054 | 0.0020 | 0.0036 |
| | | AUC | *1.0000* | 0.9980 | 0.9946 | 0.9980 | 0.9961 |
| 1 | 0.1870 | TER | 0.0039 | 0.0153 | 0.0092 | 0.0077 | 0.0008 |
| | | FNR | 0.0042 | 0.0153 | 0.0076 | 0.0078 | 0.0007 |
| | | FPR | 0.0038 | 0.0046 | 0.0036 | 0.0095 | 0.0044 |
| | | AUC | 0.9961 | 0.9954 | *0.9964* | 0.9905 | 0.9955 |
| 2.5 | 0.1870 | TER | 0.0045 | 0.0291 | 0.0102 | 0.0145 | 0.0004 |
| | | FNR | 0.0049 | 0.0283 | 0.0094 | 0.0139 | 0.0004 |
| | | FPR | 0.0041 | 0.0094 | 0.0032 | 0.0174 | 0.0023 |
| | | AUC | 0.9959 | 0.9905 | 0.9968 | 0.9825 | *0.9977* |
| 5 | 0.2114 | TER | 0.0088 | 0.0897 | 0.0236 | 0.0471 | 0.0047 |
| | | FNR | 0.0096 | 0.0864 | 0.0213 | 0.0466 | 0.0043 |
| | | FPR | 0.0052 | 0.0484 | 0.0052 | 0.0496 | 0.0073 |
| | | AUC | 0.9948 | 0.9516 | *0.9951* | 0.9503 | 0.9918 |
| 7.5 | 0.2520 | TER | 0.0160 | 0.1431 | 0.0342 | 0.0708 | 0.0087 |
| | | FNR | 0.0159 | 0.1386 | 0.0307 | 0.0688 | 0.0077 |
| | | FPR | 0.0071 | 0.0920 | 0.0077 | 0.0748 | 0.0116 |
| | | AUC | *0.9928* | 0.9080 | 0.9921 | 0.9251 | 0.9882 |
| 10 | 0.2439 | TER | 0.0292 | 0.2011 | 0.0553 | 0.1111 | 0.0205 |
| | | FNR | 0.0294 | 0.1963 | 0.0521 | 0.1105 | 0.0188 |
| | | FPR | 0.0100 | 0.1462 | 0.0173 | 0.1134 | 0.0242 |
| | | AUC | *0.9898* | 0.8538 | 0.9827 | 0.8866 | 0.9754 |
| 12.5 | 0.2846 | TER | 0.0396 | 0.2286 | 0.0679 | 0.1275 | 0.0328 |
| | | FNR | 0.0393 | 0.2247 | 0.0625 | 0.1297 | 0.0285 |
| | | FPR | 0.0139 | 0.1680 | 0.0214 | 0.1277 | 0.0381 |
| | | AUC | *0.9861* | 0.8320 | 0.9786 | 0.8723 | 0.9614 |
| 15 | 0.2927 | TER | 0.0536 | 0.2714 | 0.0924 | 0.1687 | 0.0484 |
| | | FNR | 0.0533 | 0.2637 | 0.0867 | 0.1687 | 0.0439 |
| | | FPR | 0.0254 | 0.2237 | 0.0358 | 0.1705 | 0.0535 |
| | | AUC | *0.9746* | 0.7763 | 0.9642 | 0.8292 | 0.9460 |
| 17.5 | 0.2764 | TER | 0.0691 | 0.2903 | 0.1098 | 0.1887 | 0.0635 |
| | | FNR | 0.0692 | 0.2867 | 0.1017 | 0.1889 | 0.0595 |
| | | FPR | 0.0323 | 0.2425 | 0.0549 | 0.1903 | 0.0686 |
| | | AUC | *0.9677* | 0.7575 | 0.9451 | 0.8097 | 0.9091 |
| 20 | 0.2846 | TER | 0.0924 | 0.3095 | 0.1258 | 0.2166 | 0.0835 |
| | | FNR | 0.0948 | 0.3068 | 0.1207 | 0.2212 | 0.0767 |
| | | FPR | 0.0402 | 0.2608 | 0.0594 | 0.2149 | 0.0906 |
| | | AUC | *0.9598* | 0.7391 | 0.9406 | 0.7851 | 0.9081 |
| 25 | 0.3984 | TER | 0.1334 | 0.3415 | 0.1947 | 0.2550 | 0.1377 |
| | | FNR | 0.1325 | 0.3344 | 0.1829 | 0.2582 | 0.1141 |
| | | FPR | 0.0800 | 0.2976 | 0.1320 | 0.2559 | 0.1454 |
| | | AUC | *0.9198* | 0.7024 | 0.8680 | 0.7441 | 0.8532 |
| 30 | 0.3659 | TER | 0.2073 | 0.3693 | 0.2522 | 0.3079 | 0.1989 |
| | | FNR | 0.2079 | 0.3700 | 0.2477 | 0.3156 | 0.1745 |
| | | FPR | 0.1518 | 0.3439 | 0.1930 | 0.3067 | 0.2099 |
| | | AUC | *0.8481* | 0.6561 | 0.8069 | 0.6933 | 0.7901 |
| 40 | 0.4309 | TER | 0.3681 | 0.4382 | 0.3884 | 0.4044 | 0.3551 |
| | | FNR | 0.3723 | 0.4376 | 0.3916 | 0.4087 | 0.3088 |
| | | FPR | 0.3254 | 0.4223 | 0.3546 | 0.4051 | 0.3639 |
| | | AUC | *0.6745* | 0.5777 | 0.6453 | 0.5949 | 0.6351 |
| 50 | 0.5203 | TER | 0.5111 | 0.5134 | 0.5194 | 0.5130 | 0.5116 |
| | | FNR | 0.5214 | 0.5120 | 0.5199 | 0.5161 | 0.5238 |
| | | FPR | 0.5208 | 0.5165 | 0.5170 | 0.5147 | 0.5137 |
| | | AUC | 0.4792 | 0.4834 | 0.4830 | 0.4853 | *0.4862* |

  1. Reported values of classification performance are average validation results from a 5-fold cross-validation scheme repeated 100 times (per model and per mislabel proportion). minFreq is the frequency of the minority class (low root vigor). The best-performing method (in terms of AUC) for each percentage of mislabeled observations is shown in italics
  2. KNN K-nearest neighbours, LR ridge logistic regression, RF random forest, SVM-Lin SVM with linear kernel, SVM-Rbf SVM with radial basis function kernel
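The four metrics named in the caption (TER, FNR, FPR and AUC) can be computed from true labels, hard predictions and continuous classifier scores. The sketch below uses the textbook definitions and treats the minority class (low root vigor) as the positive class; both choices are assumptions, since the table does not spell out its conventions, so the reported values need not follow these formulas exactly. AUC is computed as the Mann-Whitney probability that a randomly chosen positive receives a higher score than a randomly chosen negative. The helper names `error_rates` and `auc` are hypothetical.

```python
from typing import Dict, Sequence


def error_rates(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> Dict[str, float]:
    """Return TER, FNR and FPR, treating `positive` as the positive class.

    Assumption: the positive class is the minority class (low root vigor).
    """
    tp = fn = fp = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    n = tp + fn + fp + tn
    return {
        "TER": (fp + fn) / n,                        # overall misclassification rate
        "FNR": fn / (fn + tp) if fn + tp else 0.0,   # missed positives / all positives
        "FPR": fp / (fp + tn) if fp + tn else 0.0,   # false alarms / all negatives
    }


def auc(y_true: Sequence[int], scores: Sequence[float], positive: int = 1) -> float:
    """ROC AUC via the Mann-Whitney statistic; tied scores count as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For example, with `y_true = [1, 1, 0, 0, 0]` and predictions `[1, 0, 0, 1, 0]`, one of two positives is missed (FNR = 0.5), one of three negatives is a false alarm (FPR = 1/3), and two of five observations are wrong (TER = 0.4). The pairwise loop is quadratic in the sample size, which is fine for illustration; a rank-based formula scales better for large validation folds.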
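The evaluation protocol described in note 1 (5-fold cross-validation repeated 100 times for each model and each mislabel proportion) can be sketched in plain Python. Several details are assumptions made for illustration and are not stated in the table: labels are flipped uniformly at random, the noise is redrawn at each repetition, predictions are scored against the noisy observed labels, and a toy 1-nearest-neighbour classifier on scalar features stands in for KNN, LR, RF and the SVMs. `flip_labels`, `kfold_indices`, `repeated_cv_error` and `nearest_neighbour` are hypothetical helpers, not the authors' code.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence

# Hypothetical classifier interface: fit on (X_train, y_train), return labels for X_test.
Classifier = Callable[[Sequence, Sequence[int], Sequence], List[int]]


def flip_labels(y: Sequence[int], prop: float, rng: random.Random) -> List[int]:
    """Flip a fraction `prop` of binary 0/1 labels chosen uniformly at random (assumed noise model)."""
    y_noisy = list(y)
    for i in rng.sample(range(len(y)), round(prop * len(y))):
        y_noisy[i] = 1 - y_noisy[i]
    return y_noisy


def kfold_indices(n: int, k: int, rng: random.Random) -> List[List[int]]:
    """Shuffle indices 0..n-1 and split them into k folds of near-equal size."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[f::k] for f in range(k)]


def repeated_cv_error(X, y, clf: Classifier, prop: float,
                      k: int = 5, reps: int = 100, seed: int = 0) -> float:
    """Average validation error over `reps` repetitions of k-fold CV with a fraction `prop` of labels flipped."""
    rng = random.Random(seed)
    errors = []
    for _ in range(reps):
        y_noisy = flip_labels(y, prop, rng)  # fresh noise for every repetition (assumption)
        for test in kfold_indices(len(y), k, rng):
            test_set = set(test)
            train = [i for i in range(len(y)) if i not in test_set]
            pred = clf([X[i] for i in train], [y_noisy[i] for i in train], [X[i] for i in test])
            # Scored against the noisy observed labels (assumption).
            errors.append(mean(int(p != y_noisy[i]) for p, i in zip(pred, test)))
    return mean(errors)


def nearest_neighbour(X_train, y_train, X_test) -> List[int]:
    """Toy 1-NN on scalar features, a stand-in for the models in the table."""
    return [y_train[min(range(len(X_train)), key=lambda j: abs(X_train[j] - x))] for x in X_test]
```

On a toy dataset such as `X = list(range(100))` with `y = [int(x >= 80) for x in X]`, the clean-label error stays near zero, while flipping half of the labels pushes it toward the chance level of about 0.5, mirroring the trend across the rows of the table. To score against the original labels instead, replace `y_noisy[i]` with `y[i]` in the error computation; the table does not state which of the two was used.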