4 Jan 2018, 7:51 a.m.

It will give you some idea of the spread in the k * r estimates from a
k-fold cross-validation repeated r times. If the spread is small, it will
be more likely that performance on new data (sampled from the same
distribution) is close to the mean of the k * r estimates.
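As a rough illustration of this point, the sketch below compares two sets of k * r accuracy estimates that share the same mean but differ in spread. The numbers are made up for the example and do not come from any WEKA run; `summarize` is a hypothetical helper name.

```python
from statistics import mean, stdev

# Hypothetical accuracy estimates from 5-fold cross-validation repeated
# 2 times (k * r = 10 values each). These are illustrative numbers only.
tight = [0.81, 0.80, 0.82, 0.79, 0.80, 0.81, 0.80, 0.82, 0.79, 0.81]
wide = [0.95, 0.66, 0.88, 0.70, 0.85, 0.72, 0.93, 0.68, 0.90, 0.78]


def summarize(estimates):
    """Return (mean, sample standard deviation) of the estimates."""
    return mean(estimates), stdev(estimates)


tight_mean, tight_sd = summarize(tight)
wide_mean, wide_sd = summarize(wide)

print(f"tight: mean={tight_mean:.3f}, sd={tight_sd:.3f}")
print(f"wide:  mean={wide_mean:.3f}, sd={wide_sd:.3f}")
```

Both sets report the same average, but the smaller standard deviation of the first set gives more reason to expect performance on new data from the same distribution to be close to that average.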
Cheers,
Eibe
On Wed, Jan 3, 2018 at 3:52 PM, Edward Wiskers wrote:

Hi Eibe,
However, how does the standard deviation help in understanding the result?
Edward
On 3 Jan 2018 8:44 a.m., "Eibe Frank"

No, they don’t calculate estimates of variance.
What kind of estimate of variance are you interested in? If you would
like to obtain prediction intervals when performing regression, consider
WEKA classes that implement the IntervalEstimator interface, such as
GaussianProcesses or RegressionByDiscretization.
Cheers,
Eibe
Peter Schotland
*Sent: *Tuesday, 2 January 2018 5:38 PM
*To: *Weka machine learning workbench list. <wekalist(a)list.waikato.ac.nz>
*Subject: *Re: [Wekalist] Confusion matrix in the Explorer for
cross-validation mode
Hi Eibe,
As follow-up, does the RandomCommittee MetaClassifier (or any of the
ensemble methods) compute a variance?
Thank you,
Peter
On Jan 1, 2018 11:02 PM, "Eibe Frank"
The Explorer, the KnowledgeFlow, and the CLI do not have the option to
perform repeated k-fold cross-validation. Only the Experimenter does, and
it uses averaging: if you run k-fold cross-validation, repeated r times, it
will collect r * k separate results and compute the average and standard
deviation from these r * k results.
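The averaging scheme described above can be sketched in plain Python. This is a toy illustration, not the Experimenter's code: it assumes a trivial majority-class predictor, synthetic labels, and simple shuffled folds, and it does not reproduce how WEKA builds its folds. The function name `repeated_kfold_accuracies` is hypothetical.

```python
import random
from statistics import mean, stdev


def repeated_kfold_accuracies(labels, k, r, seed=1):
    """Return r * k accuracy estimates for a toy majority-class predictor."""
    rng = random.Random(seed)
    n = len(labels)
    results = []
    for _ in range(r):
        indices = list(range(n))
        rng.shuffle(indices)  # a fresh random partition for each repetition
        folds = [indices[i::k] for i in range(k)]
        for test_idx in folds:
            test_set = set(test_idx)
            train_labels = [labels[i] for i in indices if i not in test_set]
            # "Train" the toy model: pick the most frequent training label.
            majority = max(set(train_labels), key=train_labels.count)
            correct = sum(1 for i in test_idx if labels[i] == majority)
            results.append(correct / len(test_idx))
    return results


# Synthetic data: 60 instances of class "a", 40 of class "b".
labels = ["a"] * 60 + ["b"] * 40
results = repeated_kfold_accuracies(labels, k=10, r=10)

# One result per fold per repetition, then a single mean and standard deviation.
avg, sd = mean(results), stdev(results)
print(f"{len(results)} results, mean={avg:.3f}, sd={sd:.3f}")
```

With k = 10 and r = 10 this collects 100 separate per-fold results, and the mean and standard deviation are computed over all of them.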
Cheers,
Eibe
Edward Wiskers
*Sent: *Tuesday, 2 January 2018 4:52 PM
*To: *Weka machine learning workbench list. <wekalist(a)list.waikato.ac.nz>
*Subject: *Re: [Wekalist] Confusion matrix in the Explorer for
cross-validation mode
Thanks Eibe.
What about performing k-fold cross-validation N times (say, 10-fold
cross-validation repeated 10 times)? Is the resulting confusion matrix the
sum of all 100 confusion matrices? (Same concept you described previously.)
Edward
On 2 Jan 2018 11:47 a.m., "Eibe Frank"
The Explorer uses pooling to compute performance statistics. The
command-line interface and the KnowledgeFlow use the same approach. This
makes it easier to draw ROC curves, etc.
Due to this approach, the confusion matrix is computed from the pooled
results for the k test sets in a k-fold cross-validation.
One disadvantage of this approach is that it does not produce estimates
of variance. The Experimenter uses averaging rather than pooling and it
will give you estimates of variance.
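To make the difference between pooling and averaging concrete, here is a minimal Python sketch. The three per-fold confusion matrices are invented for illustration, and the helpers `pool` and `accuracy` are hypothetical names, not WEKA methods.

```python
from statistics import mean, stdev

# Hypothetical 2x2 confusion matrices (rows = actual class, columns =
# predicted class) for the test sets of a 3-fold cross-validation.
fold_matrices = [
    [[8, 2], [1, 9]],
    [[7, 3], [2, 8]],
    [[9, 1], [3, 7]],
]


def pool(matrices):
    """Element-wise sum of the per-fold confusion matrices."""
    size = len(matrices[0])
    return [[sum(m[i][j] for m in matrices) for j in range(size)]
            for i in range(size)]


def accuracy(matrix):
    """Fraction of instances on the diagonal of a confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


# Pooling: one matrix and one accuracy figure, with no spread information.
pooled = pool(fold_matrices)
pooled_accuracy = accuracy(pooled)

# Averaging: one accuracy per fold, which yields a mean and a standard deviation.
per_fold = [accuracy(m) for m in fold_matrices]
averaged_accuracy = mean(per_fold)
spread = stdev(per_fold)

print(pooled, pooled_accuracy)
print(per_fold, averaged_accuracy, spread)
```

In this example the folds have equal size, so the pooled and averaged accuracies coincide; only the averaging route also produces a measure of spread across folds.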
Cheers,
Eibe
Edward Wiskers
*Sent: *Monday, 1 January 2018 1:40 PM
*To: *Weka machine learning workbench list. <wekalist(a)list.waikato.ac.nz>
*Subject: *Re: [Wekalist] Confusion matrix in the Explorer for
cross-validation mode
Thanks Peter.
Why sum not average?
Edward
On 1 Jan 2018 7:03 a.m., "Peter Reutemann"
wrote:

In the Explorer, is the confusion matrix computed as the sum or average of
the 10 confusion matrices (i.e., sum or average of the confusion matrices of
each of the 10 folds)?

Sum - from the 10 test folds.
Cheers, Peter
--
Peter Reutemann
Dept. of Computer Science
University of Waikato, NZ
+64 (7) 858-5174
http://www.cms.waikato.ac.nz/~fracpete/
http://www.data-mining.co.nz/
