Hi, I'm getting a weird bug in Weka, in the Experimenter or in the Explorer.
When I try to add classifiers to a meta-classifier (i.e. when I am
using weka.gui.GenericObjectEditor), I click on 'Choose' but the tree
menu doesn't pop up; it just flashes and disappears. It's weird
because I remember it worked when I did a previous assignment. I can't
think of anything that I may have changed that would make it not work.
I am just running Weka directly from /home/ml/weka-3-5/weka.jar of a
I'm using KDE 3.2; I tried other window managers but didn't see any difference.
I took a look at the code, and the error doesn't seem to be coming
from the actual popup menu's code, but from somewhere mysterious. In
the end I just hacked together a dialogue box with text input so I could do my
assignment, but other people might be getting the same problem.
I am trying to use Weka's random forest option on a dataset with 200 features (10,000 instances), with 15 features in each try and 2000 trees. However, whenever I try to run this configuration, I get an out-of-memory error. It seems that Weka needs 0.5 GB (!) for such a configuration. Any ideas why? How can I estimate the memory needed for Weka to generate random forests?
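A rough way to reason about the memory question is to bound the number of tree nodes. The sketch below is a back-of-envelope estimate, not a description of Weka's internals: it assumes binary trees grown until each leaf holds a single instance, and `bytes_per_node` is a placeholder that would have to be measured on the JVM in use. The function names are made up for illustration.

```python
def max_nodes_per_tree(num_training_instances):
    """Upper bound on the nodes of a fully grown binary tree.

    Such a tree has at most one leaf per training instance, and a binary
    tree with k leaves has k - 1 internal nodes, so 2k - 1 nodes in total.
    """
    return 2 * num_training_instances - 1


def forest_memory_upper_bound_bytes(num_instances, num_trees, bytes_per_node):
    """Worst-case node storage for the whole forest, in bytes.

    bytes_per_node is an ASSUMPTION supplied by the caller; it is not a
    figure taken from Weka.
    """
    return num_trees * max_nodes_per_tree(num_instances) * bytes_per_node


if __name__ == "__main__":
    # Figures from the question: 10,000 instances and 2000 trees.
    # 64 bytes per node is a placeholder guess, not a measured value.
    bound = forest_memory_upper_bound_bytes(10_000, 2_000, bytes_per_node=64)
    print(f"upper bound: {bound / 2**20:.0f} MiB")
```

Under these assumptions the bound grows with the number of instances and the number of trees, not with the 200 features or the 15 features tried per split. Real trees are usually much smaller than this worst case, so the figure is only a ceiling.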
I am running PCA on 6 attributes; one run of output is shown below. The output is telling me that the sixth, i.e. the last, attribute column in my input file is not being considered. The sixth column does not show up in the correlation matrix and it is not visible in the principal components emitted in the output. I re-ordered the columns in the input file, with the same result: the last, sixth, ratio is always dropped by the PCA analysis.
I have set the PrincipalComponents property 'maximumAttributeNames' to 6 (from the default of 5). I am wondering whether this is a bug that anyone else has seen. Or do I have to modify something? I am running v. 3.4.7.
Thanks to those who responded to the part of my original query dealing with PCA and intuiting the significance of the original attributes.
Thanks in advance.
----- Original Message -----
From: Arjun Khanna
Sent: Thursday, April 27, 2006 10:27 PM
Subject: [Wekalist] Question on PCA
Hi: I am running PCA on 6 variables. My goals are twofold: (1) obviously, I'd like to understand how many PCs are required to explain, say, 95% of the variance, but (2) I am also interested in getting a sense, or better still an empirical understanding, of how important my original attributes are.
I present my Weka output below. The 'Ranking' of the attributes appears as the final part of the output. I ran PCA with 'transformBackToOriginal' set to true.
Two questions that I could use your help on:
(1) By looking at the output I can tell that RATIO2 explains more of the variance than RATIO1. But how do I answer the question "How much more variance does RATIO2 explain compared to RATIO1?" Or how do I arrive at an estimate of that?
(2) I see that RATIO6 disappears. It does not appear in the correlation matrix and it does not appear in the eigenvectors. I verified that the maximumAttributeNames switch in PCA is set to 6. So I am assuming that RATIO6 is not necessary in terms of explaining the data variance. That may be so, but I'd like to understand which ratios are subsuming RATIO6. How do I tell that?
Thanks in advance. Output below.
=== Run information ===
Evaluator: weka.attributeSelection.PrincipalComponents -R 0.95 -O
Search: weka.attributeSelection.Ranker -T -1.7976931348623157E308 -N -1
Evaluation mode: evaluate on all training data
=== Attribute Selection on all input data ===
Attribute Evaluator (unsupervised):
Principal Components Attribute Transformer
1 0.16 0.02 0.07 0.24
0.16 1 0.19 0.07 0.33
0.02 0.19 1 -0.06 0.08
0.07 0.07 -0.06 1 0.14
0.24 0.33 0.08 0.14 1
eigenvalue proportion cumulative
1.5835 0.3167 0.3167 0.597RATIO5+0.572RATIO2+0.435RATIO1+0.26 RATIO3+0.243RATIO4
1.08849 0.2177 0.5344 -0.709RATIO3+0.626RATIO4-0.219RATIO2+0.212RATIO1+0.11 RATIO5
0.90369 0.18074 0.71514 -0.674RATIO1+0.65 RATIO4+0.324RATIO3+0.131RATIO2-0.04RATIO5
0.78667 0.15733 0.87247 0.539RATIO3+0.532RATIO1-0.424RATIO2-0.357RATIO5+0.345RATIO4
0.63765 0.12753 1 0.708RATIO5-0.654RATIO2+0.181RATIO3-0.17RATIO1-0.091RATIO4
V1 V2 V3 V4 V5
0.4352 0.2124 -0.6737 0.5316 -0.1703 RATIO1
0.5719 -0.2193 0.1312 -0.4238 -0.6542 RATIO2
0.2604 -0.7094 0.3244 0.5394 0.181 RATIO3
0.2429 0.6257 0.6496 0.3453 -0.0908 RATIO4
0.5973 0.11 -0.0404 -0.3572 0.7085 RATIO5
PC space transformed back to original space.
(Note: can't evaluate attributes in the original space)
1 2 RATIO2
1 1 RATIO1
1 5 RATIO5
1 3 RATIO3
1 4 RATIO4
Selected attributes: 2,1,5,3,4 : 5
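The "proportion" and "cumulative" columns in the output above follow from the eigenvalues alone: each proportion is an eigenvalue divided by the sum of all eigenvalues. The short sketch below recomputes them and counts how many components are needed to reach a target such as 95%. The eigenvalues are copied from the output; the function names are made up for illustration.

```python
# Eigenvalues copied from the "eigenvalue" column of the output above.
EIGENVALUES = [1.5835, 1.08849, 0.90369, 0.78667, 0.63765]


def variance_proportions(eigenvalues):
    """Share of the total variance carried by each principal component."""
    total = sum(eigenvalues)
    return [ev / total for ev in eigenvalues]


def components_needed(eigenvalues, target):
    """Smallest number of leading components whose cumulative share
    reaches the target (for example 0.95)."""
    cumulative = 0.0
    for count, share in enumerate(variance_proportions(eigenvalues), start=1):
        cumulative += share
        if cumulative >= target:
            return count
    return len(eigenvalues)


if __name__ == "__main__":
    for ev, share in zip(EIGENVALUES, variance_proportions(EIGENVALUES)):
        print(f"{ev:.5f}  {share:.4f}")
    print("components for 95%:", components_needed(EIGENVALUES, 0.95))
```

This addresses only the first goal (how many components are required). It says nothing about how much variance an individual original attribute such as RATIO2 explains.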
I was just reading an old mailing list question on multiclassifiers -
Before I continue, let me illustrate my problem:
I've got a data set which looks like the following:
(In order of increasing rows)..
1) Feature1 | Feature2 | ... | Feature30 | Class1 | Class2 | ... | Class7
2) Feature1 | Feature2 | ... | Feature30 | Class1 | Class2 | ... | Class7
3) Feature1 | Feature2 | ... | Feature30 | Class1 | Class2 | ... | Class7
Would there be a way to have Weka predict classes, based on the feature
set given on the left? I understand that multiclass regression would
work if there are more than 2 classes - so for a "Class" column at the
end, I wouldn't be limited to Class 1 or 2, but could have as many as I
But, those would work only if the data was classified into *one* of the
given classes. In this case however, each row can have more than one
class simultaneously. Could anyone shed some light on this?
Social Robotics Lab
Department of Computer Science, Yale University
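One common way to handle rows that belong to several classes at once is "binary relevance": split the table into one ordinary single-class dataset per class column and train a separate yes/no classifier on each. The sketch below shows only the data transformation, under the assumption that every row is a flat list of feature values followed by 0/1 class indicators. The function name and layout are illustrative and not part of Weka.

```python
def binary_relevance_split(rows, num_features, num_labels):
    """Turn multi-label rows into one single-label dataset per class column.

    Each input row is: num_features feature values, then num_labels 0/1 flags.
    Dataset k in the result holds the same features plus only flag k.
    """
    datasets = [[] for _ in range(num_labels)]
    for row in rows:
        if len(row) != num_features + num_labels:
            raise ValueError("row length does not match the declared layout")
        features = list(row[:num_features])
        labels = row[num_features:]
        for k, label in enumerate(labels):
            datasets[k].append(features + [label])
    return datasets


if __name__ == "__main__":
    # Two features and three class flags per row (toy data).
    rows = [
        [0.1, 0.7, 1, 0, 1],
        [0.4, 0.2, 0, 0, 1],
    ]
    for k, dataset in enumerate(binary_relevance_split(rows, 2, 3), start=1):
        print(f"Class{k}:", dataset)
```

With the layout from the question this would be called with `num_features=30` and `num_labels=7`, giving seven datasets that can each be saved and loaded as a normal two-class problem. The approach ignores any dependence between the classes.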
Hi all. I'm trying to use the Weka libraries to build a Bayes classifier that
should classify text documents into categories. I'm new to Weka and I need
some advice about how it works.
I don't have any XML description of my documents; I just have term vectors
I also have a set of documents already classified, which should help in
building a new classifier.
What I want to get is a classifier that generates a probability
distribution for a new document over all of my categories.
Now, I've seen that to build a classifier, say NaiveBayesMultinomial, I
need an "Instances" object. To build an Instances object I need another
Instances object or an ARFF file. That's a problem: I'd like to avoid this
step and generate the Instances object directly from TFIDF vectors. Is that
possible? If not, I think I would have to add all of the terms to an ARFF file
and store it somewhere, and that's a problem. I could even generate it
at run time, but I'm going to deal with a lot of documents (about 5
million), and this could be a real problem.
The other problem is that the classifier should change its behaviour
dynamically, maybe changing its attributes and maybe growing in size.
Is that difficult to achieve with Weka classifiers?
If anyone has faced the same problem, some help would be
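If the ARFF route is taken, the sparse ARFF layout keeps the file manageable, because each data row lists only the non-zero term weights as `index value` pairs with 0-based attribute indices. The sketch below streams such a file from in-memory TFIDF vectors. It assumes each document is a dict from term index to weight plus a category name, and that terms and categories contain no quotes, commas or spaces. The function name and the attribute name `category` are made up for illustration.

```python
import io


def write_sparse_arff(out, relation, vocabulary, categories, documents):
    """Write documents as sparse ARFF to the text stream `out`.

    vocabulary : list of term strings, one numeric attribute per term
    categories : list of class labels for the final nominal attribute
    documents  : iterable of (weights, category) pairs, where weights
                 maps a term index to its TFIDF weight
    """
    out.write(f"@relation {relation}\n\n")
    for term in vocabulary:
        out.write(f"@attribute '{term}' numeric\n")
    out.write("@attribute category {" + ",".join(categories) + "}\n\n")
    out.write("@data\n")
    class_index = len(vocabulary)  # the class is the last attribute
    for weights, category in documents:
        # Zero weights are simply left out of a sparse row.
        cells = [f"{i} {weights[i]}" for i in sorted(weights) if weights[i] != 0]
        # Always write the class explicitly so it is never read as "omitted".
        cells.append(f"{class_index} {category}")
        out.write("{" + ", ".join(cells) + "}\n")


if __name__ == "__main__":
    buffer = io.StringIO()
    docs = [({0: 0.5, 2: 1.2}, "sports"), ({1: 0.9}, "politics")]
    write_sparse_arff(buffer, "docs", ["goal", "vote", "team"],
                      ["sports", "politics"], docs)
    print(buffer.getvalue())
```

Because `documents` can be any iterable, the rows can be produced by a generator and written one at a time, so the full collection never has to sit in memory in this step.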
I'm working on a network security project where I use Weka's machine
learning algorithms to learn several attacks. In this regard I
plan to use the JRip algorithm (an implementation of RIPPER) on my
dataset. My dataset has 7 attributes, all real-valued:
-no of pkts
-no of bytes
In my case I need to classify based on the first 5 attributes.
1. I need to know how I can do the classification based on these attributes.
2. I need to know how I can apply the JRip algorithm to numeric values.
[I applied discretise, but I'm not sure how the different values are categorised.]
Thanks a lot in advance,
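On the question of how discretisation assigns values to categories, here is a minimal sketch of one simple scheme, equal-width binning: the range between the smallest and largest value is cut into intervals of the same width, and each value gets the number of the interval it falls into. This assumes an unsupervised, equal-width scheme was used. A supervised discretiser chooses its cut points differently, so this may not match what was actually applied. The function name and the bin count are illustrative.

```python
def equal_width_bins(values, num_bins=10):
    """Map each numeric value to a bin number in 0 .. num_bins - 1.

    The interval [min, max] is split into num_bins intervals of equal
    width; the maximum value is placed in the last bin.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is only one meaningful bin.
        return [0 for _ in values]
    width = (hi - lo) / num_bins
    return [min(int((v - lo) / width), num_bins - 1) for v in values]


if __name__ == "__main__":
    packet_counts = [3, 10, 47, 120, 500]  # toy values, not real traffic
    print(equal_width_bins(packet_counts, num_bins=5))
```

With skewed attributes such as packet or byte counts, equal-width bins tend to put most values into the first bin, which is one reason the resulting categories can look surprising.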
This is not a question and may not be new to anyone; if it has
already been discussed, sorry in advance.
I am using the M5P model tree, with 10-fold cross-validation, on a
moderate-sized data set (15,000 cases with 51 variables) as a test
before we move to a larger data set (120,000 cases with 200+ variables).
We are doing this test to see the advantages of switching to a dual-core
processor machine from a single-processor one.
Weka version 3-4-7 took full advantage of the dual core machine. The
M5P tree, using 10 fold cross validation, was built in less than an hour
and a half. The first fold on the single processor has taken more than
two and a half hours. We expected some increase but nothing like this.
3.4 gig processor, hyper threaded
4 gig RAM
3.2 gig processor
1.5 gig ram
James Leibert, Ph.D.
DHS- Disability Services Division
Unfortunately, there isn't much menu-driven support for text mining in WEKA.
To my knowledge all you can do is classification/clustering/attribute
selection (feature selection) tasks using WEKA explorer, if you have
text datasets in ARFF format. Converting the textual data to ARFF format
is something you would have to do on your own.
Have a look at previous posts on text classification in WEKA list
archive (http://list.scms.waikato.ac.nz/mailman/listinfo/wekalist) for
>Date: Fri, 28 Apr 2006 09:56:47 +1000
>From: "Saleh Wasimi"
>Subject: [Wekalist] Fwd: Web and text mining
> Dear Reader
>I'm trying to find out if WEKA has any modules/features/processes for text
>mining or web mining. I'm not literate in Java, therefore I'm seeking
>menu-driven type data mining procedure.
>Thanking you in anticipation.
My name is Paulo, and I am working on a project related to data mining
on a computer grid. I am trying to run cross-validation in
parallel. I would like to know how cross-validation works in Weka.
Thanks for any help...
### Paulo Roberto Baptista Valentim ###
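As background for the cross-validation question, the sketch below shows the general k-fold scheme: shuffle the instances once, cut them into k folds, and for each fold train on the other k - 1 folds and test on the held-out one. It is a generic illustration and not Weka's exact procedure (it does no stratification by class, for example). The function name and the seed value are made up.

```python
import random


def k_fold_indices(num_instances, num_folds=10, seed=1):
    """Return a list of (train_indices, test_indices) pairs, one per fold."""
    indices = list(range(num_instances))
    random.Random(seed).shuffle(indices)  # one fixed shuffle for all folds
    # Deal the shuffled indices round-robin so fold sizes differ by at most 1.
    folds = [indices[i::num_folds] for i in range(num_folds)]
    splits = []
    for k in range(num_folds):
        test = folds[k]
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        splits.append((train, test))
    return splits


if __name__ == "__main__":
    for fold_number, (train, test) in enumerate(k_fold_indices(20, 5), start=1):
        print(fold_number, len(train), len(test))
```

Each (train, test) pair depends only on the shared shuffle, so the k model builds are independent of one another and could be sent to separate grid nodes, with the per-fold results combined at the end.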