I was wondering if I could get some assistance. I'm new to WEKA and data mining in general. I'm currently using several of WEKA's classification algorithms on some test data.
I'm using IBL, J48, and OneR.
The data I'm working with involves about 10 attributes and about 25,000 instances -- with nominal and numeric data.
Can you tell me how applying the ReplaceMissingValues filter affects the accuracy of these algorithms on my data?
I understand that this filter option replaces missing values for nominal and numeric attributes with the modes and means of the training data.
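A minimal sketch of that mean/mode replacement, written in plain Python rather than with WEKA's Java API. The function names `fit_replacements` and `apply_replacements` and the toy rows are hypothetical, `None` stands in for a missing value, and every column is assumed to have at least one observed value; it only illustrates the idea described above, not WEKA's actual implementation.

```python
from collections import Counter
from statistics import mean

MISSING = None  # assumption: None plays the role of a missing value ("?" in ARFF)


def fit_replacements(train_rows, numeric_cols):
    """Compute one replacement value per column from the training rows only."""
    n_cols = len(train_rows[0])
    replacements = []
    for col in range(n_cols):
        # Collect the values that are actually present in this column.
        present = [row[col] for row in train_rows if row[col] is not MISSING]
        if col in numeric_cols:
            replacements.append(mean(present))  # numeric attribute: mean
        else:
            # nominal attribute: mode (most frequent value)
            replacements.append(Counter(present).most_common(1)[0][0])
    return replacements


def apply_replacements(rows, replacements):
    """Return a copy of rows with each missing value replaced column-wise."""
    return [
        [replacements[i] if value is MISSING else value for i, value in enumerate(row)]
        for row in rows
    ]


# Toy data: column 0 is numeric, column 1 is nominal.
train = [
    [1.0, "red"],
    [3.0, "red"],
    [None, "blue"],
    [5.0, None],
]
test = [[None, None]]

repl = fit_replacements(train, numeric_cols={0})
filled_train = apply_replacements(train, repl)
filled_test = apply_replacements(test, repl)

print(repl)
print(filled_train)
print(filled_test)
```

Note that the replacement values are computed from the training rows alone and then reused for the test rows, so no information from the test data enters the imputation step.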
When I run the IBL, J48, and OneR classifiers with the replace missing values option, my accuracy increases greatly.
Without it, my results are about 25% less accurate.
Is using this filter a good idea, or does it introduce any bias?
Thanks again, Davide!
On Thu, May 1, 2008 at 7:35 AM, <wekalist-request(a)list.scms.waikato.ac.nz> wrote:
> Date: Wed, 30 Apr 2008 11:50:29 +0200
> From: Davide Cittaro <davide.cittaro(a)ifom-ieo-campus.it>
> Subject: Re: [Wekalist] Re: Wekalist Digest, Vol 62, Issue 59
> On Apr 30, 2008, at 11:07 AM, G W wrote:
> > Thanks a lot, Davide, you really helped me :)
> > if I got it right, now I can play with the arff file outside of WEKA?
> Yes, although it would be easier if you save in CSV format, so that
> you can easily parse within any other application...
> > I thought there would be a way inside WEKA to see the clusters...
> Nope... AFAIK you can only plot cluster assignments. To look at
> the clusters themselves you have many options. I currently use PHYLIP (which is for
> biological data, but if you can compute the distance matrix you should
> be ok) or R. Actually, R has great clustering facilities with many more
> options available...
> Davide Cittaro
> Give a man a fish, and he eats for a day. Teach a man to phish, and if
> he gets caught he'll be eating that fish through a straw