Mark Hall wrote:
Use the FilteredClassifier - it "learns" the filter on the training
data, filters the training data before passing it to the classifier, and
then uses the learned filter to process any test instances before
passing them to the classifier for classification.
I was excited to read your suggestion after a full day of searching the
internet for a solution to my problem.
I have a largish dataset that I am using Weka to explore. It goes like this:
today I will analyze as much data as I can, and create a trained classifier.
I'll save this model as a file. Then tomorrow I will acquire a new batch of
data, and want to use the saved model to predict the class for the new data.
This repeats every day. Eventually I will update the saved model, but for
now assume that it is static.
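As I understand the Weka command line, the daily save-then-reuse cycle can be sketched like this (file names here are hypothetical; the -d flag serializes the trained model to disk and -l reloads it later):

```shell
:: Day 1: train on today's data and save the model
java -Xmx1280M weka.classifiers.meta.FilteredClassifier ^
 -t .\train_data.arff -c 14 -d .\saved.model ^
 -F "weka.filters.supervised.attribute.Discretize -R first-last" ^
 -W weka.classifiers.trees.J48 -- -C 0.25 -M 2

:: Day 2: load the saved model and print predictions for the new batch
java -Xmx1280M weka.classifiers.meta.FilteredClassifier ^
 -l .\saved.model -T .\new_batch.arff -c 14 -p 0
```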
Due to the size and frequency of this task, I want to run this
automatically, which means the command line or something similar. However,
the problem occurs in the Explorer as well.
My question concerns the fact that, as my dataset grows, the list of
possible labels for attributes also grows. In fact, the number of
attributes can grow too. (A concrete example: an attribute is a SKU, and an
instance is how often a certain customer bought that SKU in a week.) How do
I get Weka to handle a training dataset and a test dataset that are
dissimilar in this way? I believe I understand what "not compatible sets"
are. It is suggested that I run "batch filtering"
(Batch Filtering Hint), but I am unable to get this to work. Following your
hint toward FilteredClassifier generates a "not compatible" error for me,
even when I manually edit the two .arff files to have the same attributes
(but different labels).
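For reference, my understanding of the batch-filtering pattern is the following sketch (file names are hypothetical; in batch mode the filter is initialized on the -i file, and the resulting header, including the full label lists, is applied unchanged to the -r file):

```shell
:: Batch mode (-b): build the filter on the training file (-i/-o), then
:: process the test file (-r/-s) with the same header and label mapping.
:: -c 14 names the class attribute, which a supervised filter needs.
java weka.filters.supervised.attribute.Discretize -b -c 14 ^
 -i .\train_data.arff -o .\train_filtered.arff ^
 -r .\test_data.arff -s .\test_filtered.arff
```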
java -Xmx1280M weka.classifiers.meta.FilteredClassifier -t .\train_data.arff ^
 -T .\test_data.arff -c 14 ^
 -F "weka.filters.supervised.attribute.Discretize -R first-last" ^
 -W weka.classifiers.trees.J48 -- -C 0.25 -M 2
I would love another hint.
Sent from the WEKA mailing list archive at Nabble.com