No, you aren't missing anything. Most
classifiers are sensitive to the
sequence the data arrives in. If your data is ordered by class label and
you use a percentage split with preserved order, then the last class
label will be underrepresented, maybe not represented at all, in the
training set. That is why one normally performs 10 runs of
10-fold cross-validation in order to get reasonable numbers. Before each
run of cross-validation, the data is randomized (and, for nominal classes,
stratified again to get a similar class distribution in the different
folds).
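To make the effect of ordered data concrete, here is a minimal sketch in
plain Python (not WEKA code). The toy dataset and the helper name
percentage_split are made up for illustration, and only the
randomization step is shown, not the stratification.

```python
import random
from collections import Counter

# Hypothetical toy dataset: 30 instances sorted by class label (a, b, c).
labels = ["a"] * 10 + ["b"] * 10 + ["c"] * 10

def percentage_split(items, train_percent):
    """Split a list into (train, test) without changing its order."""
    cut = len(items) * train_percent // 100
    return items[:cut], items[cut:]

# Order preserved: the training set is simply the first 66% of the data.
train_ordered, test_ordered = percentage_split(labels, 66)

# Randomized first (fixed seed so the run is repeatable), then split.
shuffled = labels[:]
random.Random(1).shuffle(shuffled)
train_shuffled, test_shuffled = percentage_split(shuffled, 66)

print("ordered  train:", dict(Counter(train_ordered)))
print("ordered  test: ", dict(Counter(test_ordered)))
print("shuffled train:", dict(Counter(train_shuffled)))
print("shuffled test: ", dict(Counter(test_shuffled)))
```

With the order preserved, class "c" never appears in the training set
but makes up almost all of the test set, so no scheme can learn it.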
That does make sense, but one last question, please: if the inferior
results also show up on a **regression** problem, does that mean
that the model is unstable, possibly suggesting over-fitting?
Once again, if your data is sorted according to the class attribute
(this time a numeric value, e.g., in ascending order), then the
training data from a split with preserved order is not representative of
the entire dataset (you chopped off the highest values). Every
classification/regression scheme tries to fit its model onto the
training data; if the training data is chosen poorly, it will (most
likely) perform poorly on the test data as well.
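The same effect for a numeric class can be sketched in plain Python
(again not WEKA code). The data is invented, and the "always predict the
training mean" baseline is only a stand-in for a real regression
scheme, chosen because its error is easy to check by hand.

```python
import random

# Hypothetical numeric class values, sorted in ascending order (1.0 .. 100.0).
targets = [float(v) for v in range(1, 101)]

def percentage_split(values, train_percent):
    """Split a list into (train, test) without changing its order."""
    cut = len(values) * train_percent // 100
    return values[:cut], values[cut:]

def mean_baseline_mae(train, test):
    """Mean absolute error on test when always predicting the training mean."""
    prediction = sum(train) / len(train)
    return sum(abs(v - prediction) for v in test) / len(test)

# Order preserved: training sees only the 66 lowest values.
train_ordered, test_ordered = percentage_split(targets, 66)

# Randomized first (fixed seed so the run is repeatable), then split.
shuffled = targets[:]
random.Random(1).shuffle(shuffled)
train_shuffled, test_shuffled = percentage_split(shuffled, 66)

mae_ordered = mean_baseline_mae(train_ordered, test_ordered)
mae_shuffled = mean_baseline_mae(train_shuffled, test_shuffled)

print("ordered:  train max", max(train_ordered), "test min", min(test_ordered))
print("ordered:  baseline MAE", mae_ordered)
print("shuffled: baseline MAE", mae_shuffled)
```

In the ordered split every test value lies above the largest training
value, so the baseline's error is much larger than after shuffling.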
Regarding instability/over-fitting: it really depends on the scheme. Some
schemes may need to see the full range of the numeric class in order to
build a useful model; others might be more robust.
I guess I should have given more information about
the dataset that I originally used: the dataset
consists of US + EU stock market indices (% daily differences),
with the goal of predicting a specific stock market index
(again as a % daily difference); in other words, a regression problem.
That essentially means that the class attribute is not
ordered in any way, because the target variable (the % daily difference
of a stock market index) fluctuates up and down every day.
Forgive me for insisting on this, but maybe there is
an underlying WEKA problem (?)
The dataset is available should you want to use it
for testing purposes.