Thanks a lot, Eibe. It's very clear now.

However, how does the standard deviation help in understanding the result?

Edward

On 2 Jan 2018 12:02 p.m., "Eibe Frank" wrote:

The Explorer, the KnowledgeFlow, and the CLI do not have the option to perform repeated k-fold cross-validation. Only the Experimenter does, and it uses averaging: if you run k-fold cross-validation, repeated r times, it will collect r * k separate results and compute the average and standard deviation from these r * k results.
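
To make the averaging concrete, here is a minimal sketch in plain Python (not Weka code). The function name summarize_repeated_cv and the accuracy values are hypothetical, and the use of the sample standard deviation (n - 1 in the denominator) is an assumption rather than something confirmed for the Experimenter.

```python
from statistics import mean, stdev

def summarize_repeated_cv(results):
    """Flatten r repetitions of k fold scores and summarise them.

    results[i][j] is the score from fold j of repetition i.
    Returns (number of results, mean, sample standard deviation).
    """
    flat = [score for repetition in results for score in repetition]
    return len(flat), mean(flat), stdev(flat)

# Hypothetical accuracies: r = 2 repetitions of k = 3 folds.
results = [
    [0.80, 0.90, 0.70],
    [0.85, 0.75, 0.80],
]

n, avg, sd = summarize_repeated_cv(results)
print(f"{n} results: mean = {avg:.3f}, std dev = {sd:.3f}")
```

The standard deviation in this sketch simply measures how much the score varies from one train/test split to another across all r * k splits.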

Cheers,

Eibe

Thanks, Eibe.

What about performing k-fold cross-validation N times (say, running cross-validation 10 times)? Is the resulting confusion matrix the sum of all 100 confusion matrices (the same concept you described previously)?

Edward

The Explorer uses pooling to compute performance statistics. The command-line interface and the KnowledgeFlow use the same approach. This makes it easier to draw ROC curves, etc.

Due to this approach, the confusion matrix is computed from the pooled results for the k test sets in a k-fold cross-validation.

One disadvantage of this approach is that it does not produce estimates of variance. The Experimenter uses averaging rather than pooling and it will give you estimates of variance.
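
As an illustration of pooling, here is a minimal sketch in plain Python (not Weka code). The helper names pool_confusion_matrices and accuracy and the per-fold matrices are hypothetical. It sums the confusion matrices of the k test folds element by element and compares the accuracy of the pooled matrix with the average of the per-fold accuracies.

```python
def pool_confusion_matrices(fold_matrices):
    """Element-wise sum of per-fold confusion matrices.

    Each matrix is a square list of lists: rows = actual class,
    columns = predicted class.
    """
    n = len(fold_matrices[0])
    pooled = [[0] * n for _ in range(n)]
    for matrix in fold_matrices:
        for i in range(n):
            for j in range(n):
                pooled[i][j] += matrix[i][j]
    return pooled

def accuracy(matrix):
    """Fraction of instances on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

# Hypothetical 2-class results from k = 3 test folds.
folds = [
    [[4, 1], [0, 5]],  # 10 instances, 9 correct
    [[3, 2], [1, 4]],  # 10 instances, 7 correct
    [[5, 0], [1, 5]],  # 11 instances, 10 correct
]

pooled = pool_confusion_matrices(folds)
pooled_acc = accuracy(pooled)
averaged_acc = sum(accuracy(m) for m in folds) / len(folds)

print("pooled matrix:", pooled)
print(f"pooled accuracy:   {pooled_acc:.4f}")
print(f"averaged accuracy: {averaged_acc:.4f}")
```

Pooling yields one matrix and one accuracy figure, so there is nothing to compute a variance from. The two accuracy figures can differ slightly when the folds are not all the same size, as in this made-up data.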

Cheers,

Eibe

Thanks Peter.

Why the sum and not the average?

Edward

> In the Explorer, is the confusion matrix computed as the sum or average of
> the 10 confusion matrices (i.e., sum or average of the confusion matrices of
> each of the 10 folds)?

Sum, from the 10 test folds.

Cheers, Peter

