I am wondering whether there is a way to use Weka to make predictions
for data points in the test file for which the class value is not
known. The current setup assumes that there is a training file and a
test file, and that the class values are known for both. But what
happens if one has built a model on the training data and has some
test data where X, the class we want to predict, is not known a
priori; all we know is that it is one of the classes our model was
trained on. Is there a way to use Weka this way? I tried to find out
whether there are any command-line options, or a way to encode this in
the test file, but unfortunately failed.
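For what it's worth, the trick in the Weka versions I've used is to encode the unknown class values as `?` (the ARFF missing-value marker) in the test file, then ask the classifier to print its predictions with the `-p` option. A sketch; the exact argument of `-p` varies between versions, and `test.arff` here is a hypothetical file:

```shell
# test.arff carries '?' in the class column, e.g. a data line like:
#   sunny,85,85,false,?
# -t names the training file, -T the test file;
# -p 0 asks for one predicted class value per test instance
java weka.classifiers.j48.J48 -t golf.arff -T test.arff -p 0
```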
The current CVS release of SMO supports multi-class discrimination via pairwise coupling: n(n-1)/2 binary classifiers, one for each pair of classes. Does it also support multi-class discrimination via n binary classifiers, one X vs. not-X classifier for each class?
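For reference, the two decompositions differ only in which binary subproblems are built from the class set; a quick sketch in plain Python (not Weka code):

```python
from itertools import combinations

def pairwise_problems(classes):
    """One binary problem per pair of classes: n*(n-1)/2 in total."""
    return [(a, b) for a, b in combinations(classes, 2)]

def one_vs_rest_problems(classes):
    """One 'X vs. not X' binary problem per class: n in total."""
    return [(c, "not " + c) for c in classes]

classes = ["a", "b", "c", "d"]
len(pairwise_problems(classes))     # 6 binary problems for 4 classes
len(one_vs_rest_problems(classes))  # 4 binary problems for 4 classes
```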
Thanks for your reply; I would like to ask one more question.
I am running J48 as below. How can I obtain a tree with the attribute
Humidity removed from its leaves, without changing the pruning confidence factor?
C:\>java weka.classifiers.j48.J48 -t golf.arff -c 1
J48 pruned tree
Play = DontPlay
| Temperature <= 71: rain (2.0)
| Temperature > 71: sunny (3.0)
Play = Play
| Humidity <= 78
| | Humidity <= 70: sunny (3.0/1.0)
| | Humidity > 70: overcast (2.0)
| Humidity > 78: rain (4.0/1.0)
Number of Leaves : 5
Size of the tree : 9
No problem, just that if you want to use dates you should use version
3-3-5. Dates haven't made it into a stable release yet.
> I am using WEKA 3.2.3/28 May 2002 which I thought was the last stable
> version. Is there a problem with it ?
Can anyone point me to where I can find the decision stump algorithm (in text form)?
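In case a runnable description helps while you look for the original text: a decision stump is a one-level decision tree that picks the single attribute whose majority-class-per-value rule makes the fewest training errors. A minimal sketch for nominal attributes, in plain Python (not the Weka implementation):

```python
from collections import Counter

def decision_stump(instances, attributes):
    """Return (attribute, value -> class rule, training errors) for the
    single attribute whose per-value majority rule errs least often."""
    best = None
    for attr in attributes:
        # collect the class labels seen for each value of this attribute
        by_value = {}
        for inst in instances:
            by_value.setdefault(inst[attr], []).append(inst["class"])
        # majority class per value
        rule = {v: Counter(cs).most_common(1)[0][0] for v, cs in by_value.items()}
        errors = sum(1 for inst in instances if rule[inst[attr]] != inst["class"])
        if best is None or errors < best[2]:
            best = (attr, rule, errors)
    return best

data = [
    {"Outlook": "sunny",    "Windy": "false", "class": "DontPlay"},
    {"Outlook": "sunny",    "Windy": "true",  "class": "DontPlay"},
    {"Outlook": "overcast", "Windy": "false", "class": "Play"},
    {"Outlook": "rain",     "Windy": "false", "class": "Play"},
    {"Outlook": "rain",     "Windy": "true",  "class": "DontPlay"},
]
attr, rule, errors = decision_stump(data, ["Outlook", "Windy"])
```
(Weka's DecisionStump also handles numeric attributes and missing values; this sketch deliberately skips both.)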
I have a small problem with ARFF files containing date data. The dates are in the dd-MM-yy format, so the header looks like this:
@attribute timestamp date "dd-MM-yy"
@attribute yyy real
and so on. Using the command: java weka.core.Instances file.arff, I get the message: no valid attribute type or invalid enumeration, read Token[date], line 3
Any idea of what is going on ?
OneR is a learning scheme in Weka. It takes a single parameter: the minimum
number of instances that must be covered by each rule that is generated. I
am using the Windows-based version 3.2. When I test OneR on the weather data,
this parameter seems to have no effect: setting it to 20, I still get the
same result as with the default value (6).
Can anyone explain?
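A guess as to what is happening: as far as I recall, the minimum-bucket-size parameter only affects how OneR discretises numeric attributes; if the winning rule is built on a nominal attribute (on the weather data it usually is), changing the parameter leaves the output untouched. A simplified sketch in plain Python of the discretisation side only (not the Weka implementation):

```python
def buckets(values, min_bucket_size):
    """Greedy sketch of OneR-style bucketing of a sorted numeric attribute:
    each bucket must hold at least min_bucket_size values; a short
    remainder is folded into the last bucket."""
    out, current = [], []
    for v in sorted(values):
        current.append(v)
        if len(current) >= min_bucket_size:
            out.append(current)
            current = []
    if current:
        if out:
            out[-1].extend(current)  # too small to stand alone
        else:
            out.append(current)
    return out

temps = list(range(14))      # stand-in for the 14 weather instances
len(buckets(temps, 6))       # default bucket size: more than one bucket
len(buckets(temps, 20))      # 20 > 14 instances: everything in one bucket
```
With one bucket the numeric attribute yields a single constant rule, so OneR falls back to whichever attribute it preferred anyway, and the printed result looks identical.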
I'm trying to classify instances that have nominal attributes with a large set of
possible values, using J4.8.
The problem is that, due to the large number of attribute values and the
small number of training instances, the generated tree often has leaves
whose concept description doesn't correspond to any training instance.
In that case, the class (predictable feature) assigned to the leaf is the
one corresponding to the most common class among the training instances in
the whole parent subtree.
For my classification project (text categorization), that policy is the
most frequent cause of mistakes. I would like to change it so that the
algorithm simply returns a new default class, or better, returns
'no-class prediction' in such cases.
Does anybody know which Weka source files I have to change to alter this policy?
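I can't point to the exact source file off-hand, but the policy itself is small; here is a sketch in plain Python (not Weka code) of the current behaviour and of the proposed 'no-class' change (`NO_CLASS` and `abstain` are made-up names):

```python
from collections import Counter

NO_CLASS = None  # hypothetical sentinel for 'no-class prediction'

def leaf_prediction(leaf_classes, parent_classes, abstain=False):
    """Current J48-style policy: a leaf that covers no training
    instances predicts the majority class of its parent's instances.
    Proposed change: abstain instead of guessing."""
    if leaf_classes:
        return Counter(leaf_classes).most_common(1)[0][0]
    if abstain:
        return NO_CLASS
    return Counter(parent_classes).most_common(1)[0][0]

parent = ["sport", "sport", "politics"]
leaf_prediction([], parent)                # -> "sport" (current policy)
leaf_prediction([], parent, abstain=True)  # -> None (proposed policy)
```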
-- Marco 'penny' Pennacchiotti -- Tor Vergata Univ (Rome) -- E-MAIL: penny(a)gmx.it
> Also, I have some questions:
> 1. Are the results generated by J48 and its options exactly the same as those of the original
> C version by Quinlan?
They are as close as we could get using a Java implementation that was written
from scratch. Depending on the dataset there will be some small differences
in the trees produced.
There are some options in C4.5 that are not available in J48 and vice versa.
> 2. What does the 'binary splits' option mean? I still get multiway splits when I opt for it.
That sounds like a bug to me. You should get binary splits if you've chosen
that option. Does that happen in the Explorer or from the command line?
> 3. Does the program do the same things when handling Real or Numeric value