Hi Aldebaro, and other users who I know will be interested,
There are several things you can try:
1. If you run an UpdateableClassifier (e.g. IBk) from the command line
and specify both a training and testing data set, Weka will read and
process the instances one at a time from the file, greatly reducing the
memory requirements. I think this is similar to what you are suggesting.
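Weka specifics aside, the idea behind an updateable classifier can be sketched in plain Java. The class and method names below are illustrative only (not Weka's actual API): the model sees one instance at a time and keeps only a small summary, so the full training file never has to sit in memory.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative incremental (updateable) classifier: a nearest-centroid
// model whose per-class sums are updated one instance at a time, so
// memory use is independent of the number of training instances.
public class IncrementalCentroid {
    private final Map<Integer, double[]> sums = new HashMap<>();
    private final Map<Integer, Integer> counts = new HashMap<>();
    private final int dim;

    public IncrementalCentroid(int dim) { this.dim = dim; }

    // Called once per training instance as it is read from disk.
    public void update(double[] x, int label) {
        double[] s = sums.computeIfAbsent(label, k -> new double[dim]);
        for (int i = 0; i < dim; i++) s[i] += x[i];
        counts.merge(label, 1, Integer::sum);
    }

    // Predict the class whose centroid (mean vector) is nearest to x.
    public int classify(double[] x) {
        int best = -1;
        double bestDist = Double.MAX_VALUE;
        for (Map.Entry<Integer, double[]> e : sums.entrySet()) {
            int n = counts.get(e.getKey());
            double d = 0;
            for (int i = 0; i < dim; i++) {
                double diff = x[i] - e.getValue()[i] / n;
                d += diff * diff;
            }
            if (d < bestDist) { bestDist = d; best = e.getKey(); }
        }
        return best;
    }
}
```

In this streaming style, the loop that reads the ARFF file only ever holds the current instance, which is what keeps the memory footprint flat.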
2. If your data is sparse (contains many 0 values) consider using the
sparse instances data format.
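The saving comes from storing only the non-zero values. A minimal sketch of a sparse instance (again illustrative, not Weka's own class) keeps parallel index/value arrays:

```java
// Illustrative sparse instance: only non-zero attribute values are
// stored, as parallel (index, value) arrays. For a vector that is
// mostly zeros this takes far less memory than a dense double[].
public class SparseInstance {
    private final int[] indices;   // positions of non-zero attributes, sorted
    private final double[] values; // the non-zero values themselves

    public SparseInstance(double[] dense) {
        int nz = 0;
        for (double v : dense) if (v != 0.0) nz++;
        indices = new int[nz];
        values = new double[nz];
        int j = 0;
        for (int i = 0; i < dense.length; i++) {
            if (dense[i] != 0.0) { indices[j] = i; values[j++] = dense[i]; }
        }
    }

    // Look up attribute i; binary search works because indices are sorted.
    public double value(int i) {
        int pos = java.util.Arrays.binarySearch(indices, i);
        return pos >= 0 ? values[pos] : 0.0;
    }

    public int numStoredValues() { return indices.length; }
}
```

If, say, only 10% of your attribute values are non-zero, storage drops by roughly an order of magnitude, at the cost of slightly slower attribute lookup.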
3. It should be possible to convert all of the double values used by
instances into floats. This would halve the memory requirements for
storage of instances, but I'm guessing you would have to change a lot of
the code before it would agree to compile.
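To put numbers on the halving, here is the arithmetic for the data set described in the quoted question (160,000 instances of 40 attributes), ignoring object and array overhead:

```java
// Back-of-the-envelope memory cost of a 160,000 x 40 data set:
// a double takes 8 bytes and a float 4, so switching the storage
// type roughly halves the raw instance memory.
public class MemoryEstimate {
    public static long bytes(long instances, long attributes, long bytesPerValue) {
        return instances * attributes * bytesPerValue;
    }

    public static void main(String[] args) {
        long asDoubles = bytes(160_000, 40, 8); // 51,200,000 bytes (~48.8 MB)
        long asFloats  = bytes(160_000, 40, 4); // 25,600,000 bytes (~24.4 MB)
        System.out.println(asDoubles + " vs " + asFloats);
    }
}
```

So the doubles alone already consume a third of a 160MB machine before Weka, the JVM, and the operating system take their share, which is why swapping sets in.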
4. You could try to implement some sort of dagging approach, where
models are constructed from sequential chunks of the data and vote
towards classifications. We are currently looking into such techniques
ourselves, so you could hold off for a while and wait until our solution is available.
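A minimal sketch of this chunk-and-vote scheme, with a deliberately trivial per-chunk learner (a threshold on the chunk mean) standing in for a real Weka classifier:

```java
import java.util.List;

// Illustrative chunk-and-vote ("dagging"-style) scheme: one model is
// trained per sequential chunk of the data, and their predictions are
// combined by majority vote. The per-chunk learner is a toy threshold
// rule, purely for illustration.
public class ChunkVote {
    interface Model { int classify(double x); }

    // Train one model per chunk: predict 1 if x exceeds the chunk mean.
    static Model trainChunk(double[] chunk) {
        double sum = 0;
        for (double v : chunk) sum += v;
        final double mean = sum / chunk.length;
        return x -> x > mean ? 1 : 0;
    }

    // Majority vote over the chunk models (ties broken toward class 0).
    static int vote(List<Model> models, double x) {
        int ones = 0;
        for (Model m : models) ones += m.classify(x);
        return ones * 2 > models.size() ? 1 : 0;
    }

    public static void main(String[] args) {
        List<Model> models = List.of(
            trainChunk(new double[]{0, 1, 2}),   // chunk mean 1
            trainChunk(new double[]{2, 3, 4}),   // chunk mean 3
            trainChunk(new double[]{1, 2, 3}));  // chunk mean 2
        System.out.println(vote(models, 2.5));   // two of three say 1
    }
}
```

Because each chunk fits comfortably in memory, the peak requirement is one chunk plus the (usually much smaller) trained models.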
Hope this helps!
Aldebaro Klautau wrote:
I have a training sequence with around 160,000
vectors of dimension 40 and I am running my
first simulations on a PC with 160MB of RAM.
I guess it's slow mainly due to memory swapping, and
I am considering modifying the code so that it
sequentially reads the training data. Do you think
it would speed things up? Any suggestions besides
buying more RAM :) ?
Wekalist mailing list