Dear Weka Users,
in order to better understand the operating principles of the parameter
optimization performed by GridSearch and CVParameterSelection, I have had a
look at the code and I have a couple of doubts about the results produced,
in particular about the nested crossvalidation setting.
Both the parameter search class and the classifier panel class perform
crossvalidation. There is therefore an "outer" crossvalidation, set by the
classifier panel class, that divides the whole dataset into n folds
and applies the parameter optimization to each of them, collecting the
predictions for all the instances fold by fold. There is also an
"inner" crossvalidation, set by the parameter search, that for each value of
the investigated parameter further splits each sub-dataset into m folds,
applies the classifier to each of them, collects the
predictions and finally returns the best parameter value (or the best
combination of parameter values).
This means that, most likely, each fold of the outer cv will correspond to
a classification model with different parameter value(s), i.e. a
different model. Therefore, if I understood correctly, I am a bit
confused about the meaning of the predictions vector collected in this way.
The only explanation I can deduce is that this procedure does not aim to
validate the model but, on the contrary, the procedure itself.
A similar consideration applies to the AttributeSelectedClassifier; also
in this case we have two nested crossvalidations, and the model with which the
different folds of the outer cv are evaluated is likely different for
each fold. Therefore the summary statistics are calculated on a predictions
vector whose different portions are likely related to different models.
Thanks in advance for your precious help.
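The nested procedure described above can be sketched in a few lines. This is plain Python, not Weka code: the toy 1-D nearest-neighbour learner, the candidate values and all function names (make_folds, select_k, nested_cv) are assumptions made only for illustration. It shows that each outer fold may end up with a different tuned parameter, so the pooled predictions estimate the performance of the whole "tune, then train" procedure rather than of one fixed model.

```python
import random
from collections import Counter

def make_folds(indices, k, seed):
    """Shuffle the indices and deal them into k disjoint folds."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]

def knn_predict(train, x, k):
    """Majority label among the k training points nearest to x (1-D data)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def select_k(data, candidates, m, seed):
    """Inner CV: return the candidate k with the best m-fold accuracy."""
    folds = make_folds(range(len(data)), m, seed)

    def cv_accuracy(k):
        correct = 0
        for fold in folds:
            held_out = set(fold)
            train = [data[i] for i in range(len(data)) if i not in held_out]
            correct += sum(knn_predict(train, data[i][0], k) == data[i][1]
                           for i in fold)
        return correct / len(data)

    return max(candidates, key=cv_accuracy)

def nested_cv(data, candidates, n, m, seed=1):
    """Outer CV: tune k on each training part, then predict its held-out fold."""
    chosen, predictions = [], {}
    for fold in make_folds(range(len(data)), n, seed):
        held_out = set(fold)
        train = [data[i] for i in range(len(data)) if i not in held_out]
        best_k = select_k(train, candidates, m, seed)  # tuned per outer fold
        chosen.append(best_k)
        for i in fold:
            predictions[i] = knn_predict(train, data[i][0], best_k)
    return chosen, predictions

# Synthetic two-class data (hypothetical, for illustration only).
rng = random.Random(0)
data = ([(rng.gauss(0.0, 1.0), "a") for _ in range(30)]
        + [(rng.gauss(1.5, 1.0), "b") for _ in range(30)])

chosen, predictions = nested_cv(data, candidates=[1, 3, 7], n=5, m=3)
accuracy = sum(predictions[i] == data[i][1] for i in range(len(data))) / len(data)
print("k chosen per outer fold:", chosen)
print("accuracy of the whole procedure:", round(accuracy, 3))
```

The list of chosen values makes the point of the question visible: the outer statistic is computed over predictions that may come from differently parameterised models.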
I am a contributor to an ARFF parser in Python.
Within this module, I found a unit-test which checks that the following
arff file can be read:
@relation "software metric"
@attribute number_of_files numeric
@attribute "lines of code" numeric
@attribute 'defect density' numeric
Apparently, this checks that liac-arff behaves consistently with the WEKA
arff parser, which parses this file without an error message.
By contrast, the documentation
(http://weka.wikispaces.com/ARFF+%28developer+version%29) states that
'The *@data* declaration is a single line denoting the start of the data
segment in the file.', which makes me think that WEKA should not read
this arff file.
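To make the two possible behaviours concrete, here is a minimal sketch of a lenient header reader. It is neither the liac-arff nor the WEKA implementation; read_header and its return shape are hypothetical, and it only handles simple attribute types. A lenient reader treats a missing @data line as "zero instances", whereas a strict reader would raise an error when saw_data is False.

```python
import shlex

# The header-only file from the unit test quoted above.
HEADER_ONLY = '''@relation "software metric"
@attribute number_of_files numeric
@attribute "lines of code" numeric
@attribute 'defect density' numeric
'''

def read_header(text):
    """Collect the relation name and (name, type) pairs; stop at @data if present."""
    relation, attributes, saw_data = None, [], False
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        tokens = shlex.split(line)  # handles both "..." and '...' quoting
        keyword = tokens[0].lower()
        if keyword == "@relation":
            relation = tokens[1]
        elif keyword == "@attribute":
            attributes.append((tokens[1], " ".join(tokens[2:])))
        elif keyword == "@data":
            saw_data = True
            break
    return relation, attributes, saw_data

relation, attributes, saw_data = read_header(HEADER_ONLY)
print(relation, attributes, saw_data)
```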
Hi, I have a couple of questions. Note: I'm using the Java API.
The first is related to the .arff format: is it possible to have
relational attributes within other relational attributes?
Next, once I have generated a Bayesian network, is it possible to
extract the network information? The data I need is how each node in the
network is weighted, so I can find out which attributes from the data
affected the prediction the most.
Thanks in advance!
Jan M. Ørstavik
I have a simple dataset that I load into the Explorer, with 2 attributes -
one is a string with words separated by spaces and the other is just an
I add the String to Word Vector filter and when I apply it nothing happens.
I would expect a number of new attributes to be created but it would appear
as if nothing is done.
Is there some special prerequisite that must be in place before the filter
will function properly or must I make some changes to the filter
configuration from the default?
I have Weka v 3.6.13 installed.
I have uploaded the sample arff file too.
I would be grateful for any assistance.
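For reference, the outcome expected from such a filter can be sketched as follows. This is plain Python and only an illustration of the idea (one 0/1 attribute per distinct word); the function name to_word_vector and the sample rows are hypothetical, and it does not reproduce the filter's actual options. One thing worth checking, as an assumption about the cause: the filter converts attributes declared with type string in the ARFF header, so a text column declared as nominal would be left untouched.

```python
# Two hypothetical instances: a text attribute and a numeric attribute.
sample_rows = [("the cat sat", 1), ("the dog ran", 0)]

def to_word_vector(rows, text_index=0):
    """Replace the text column with one 0/1 column per distinct word."""
    vocabulary = sorted({word for row in rows for word in row[text_index].split()})
    converted = []
    for row in rows:
        words = set(row[text_index].split())
        rest = [value for i, value in enumerate(row) if i != text_index]
        converted.append(rest + [1 if w in words else 0 for w in vocabulary])
    return vocabulary, converted

vocabulary, converted = to_word_vector(sample_rows)
print(vocabulary)
for row in converted:
    print(row)
```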
weka-3-6-12 on OS X.
My dataset is composed of 160,000 instances; each instance has 28
attributes. I am trying to divide it into sub-datasets, so I currently work with 200
The format of the dataset is sparse ARFF.
I run this command from the Explorer interface:
*Apriori -N 10 -T 0 -C 0.5 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1*
After a few minutes Weka seems to stop working (the bird stops and its state goes from
1 to 0); afterwards the situation is the following:
- In the associator output Weka prints the run information and lists the
- The last row of log is *11:24:18: Command: weka.associations.Apriori
-N 10 -T 0 -C 0.5 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1 *
- Status is *"Building model on training data..."*
Can someone suggest something to try?
Sourceforge and the package manager are working for me. Perhaps you are
using a proxy. Take a look at:
manager-Using a HTTP proxy
for information on using the package manager with a proxy.
From: <wekalist-bounces(a)list.waikato.ac.nz> on behalf of
Reply-To: "Weka machine learning workbench list."
Date: Wednesday, 30 September 2015 12:14 am
To: "Weka machine learning workbench list." <wekalist(a)list.waikato.ac.nz>
Subject: [Wekalist] WekaPackageManager not working
> Hey everybody. I'm not able to get the weka.core.WekaPackageManager to work,
> neither with the GUI nor with the command line. I'm on Windows 7 using Weka
> 3.7.13.
> GUI results in
> java.net.UnknownHostException: weka.sourceforge.net
> java -cp "C:\Program Files\Weka-3-7\weka.jar" weka.core.WekaPackageManager
> results in
> Unable to download repository zip archve (weka.sourceforge.net) - trying
> legacy routine...
> java.net.UnknownHostException: weka.sourceforge.net
> Even setting `weka.core.wekaPackageRepositoryURL` to
> `http://www.cs.waikato.ac.nz/ml/weka/packageMetaData` in
> `C:\Users\USR\wekafiles\props\PackageRepository.props` as proposed [here]
> doesn't help.
> Can somebody provide a correct repository URL? May the access be blocked
> because of a firewall?
> Thanks in advance, I appreciate your time!
> [here]: http://forums.pentaho.com/archive/index.php/t-90698.html
Hey everybody. I'm not able to get the weka.core.WekaPackageManager to work, neither with the GUI nor with the command line. I'm on Windows 7 using Weka 3.7.13.
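GUI results in
java.net.UnknownHostException: weka.sourceforge.net
java -cp "C:\Program Files\Weka-3-7\weka.jar" weka.core.WekaPackageManager
results in
Unable to download repository zip archve (weka.sourceforge.net) - trying legacy routine...
java.net.UnknownHostException: weka.sourceforge.net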
Even setting `weka.core.wekaPackageRepositoryURL` to `http://www.cs.waikato.ac.nz/ml/weka/packageMetaData` in `C:\Users\USR\wekafiles\props\PackageRepository.props` as proposed [here] doesn't help.
Can somebody provide a correct repository URL? May the access be blocked because of a firewall?
Thanks in advance, I appreciate your time!
[here]: http://forums.pentaho.com/archive/index.php/t-90698.html
I am trying to set up a series of decision tree runs on
various datasets using a batch script via the command line. The
model building and testing runs perfectly, and works
superbly well. My problem is that, with so many run output
files, I need a better way of keeping track of them.
Running the same classifiers from the Weka Explorer I get a "== Run
information ==" header. It includes the classifier scheme,
relation, number of instances, list of attributes, etc. The
entire section is missing from the command-line output
files. I think I tried all options, both general and
specific to the tree classifiers, and nothing managed to
get that run information displayed on the command line.
How do I get that Run Information included with the output?
It would be absolutely essential for documenting the results
from various runs.
This is the command I run:
java -cp weka.jar weka.classifiers.trees.J48 -C 0.25 -M 50
-t "train.arff" > "model-output.txt"
Thanks a lot for the help!
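One workaround that does not depend on any Weka option is to let the batch script write its own header before the classifier output. The sketch below is plain Python; run_header, the field labels and the timestamp are hypothetical choices, and the header only records what the script itself knows (the command and the training file), not the attribute list printed by the Explorer.

```python
import shlex
from datetime import datetime

def run_header(command, train_file, started):
    """Build a plain-text header describing one command-line run."""
    return "\n".join([
        "=== Run information ===",
        "",
        "Command:   " + " ".join(shlex.quote(arg) for arg in command),
        "Train set: " + train_file,
        "Started:   " + started.isoformat(timespec="seconds"),
        "",
    ])

# The command from the message above, split into arguments.
command = ["java", "-cp", "weka.jar", "weka.classifiers.trees.J48",
           "-C", "0.25", "-M", "50", "-t", "train.arff"]
# A fixed, made-up start time so the example output is reproducible.
header = run_header(command, "train.arff", datetime(2015, 10, 1, 9, 30, 0))
print(header)
```

In a batch script the header would be written to the output file first, followed by the captured standard output of the java command.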
So I have several attributes like "Heart Failure Diagnosis Priority" and
"Pneumonia Diagnosis Priority" that are nominal attributes, but are
represented by a number. When I build a logistic regression model, I get a
term like "[HeartFailureDiagnosisPriority] * 0.12" when I would expect
terms like "[HeartFailureDiagnosisPriority=1] * 0.12" and
"[HeartFailureDiagnosisPriority=2] * 0.24". How can I get WEKA to treat
this like a typical nominal attribute?
The main reason I care is that a Heart Failure Diagnosis Priority of 0 or 2
is less significant than a Heart Failure Diagnosis Priority of 1. I don't
want WEKA to fit a single coefficient whose effect is monotonically increasing or
decreasing with the value of Heart Failure Diagnosis...
Thanks in advance!!
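The behaviour asked for corresponds to indicator (one-hot) coding of the attribute, which a learner can only apply once the attribute is typed as nominal rather than numeric. In Weka this usually means declaring the attribute with its value set in the ARFF header or converting it with the NumericToNominal filter; treat that as a pointer to verify rather than a full recipe. The sketch below is plain Python with a hypothetical one_hot helper and made-up values, and only shows why each level then gets its own term.

```python
def one_hot(name, values):
    """Turn a numerically coded nominal column into one 0/1 column per level."""
    levels = sorted(set(values))
    column_names = [f"{name}={level}" for level in levels]
    indicator_rows = [[1 if v == level else 0 for level in levels] for v in values]
    return column_names, indicator_rows

# Hypothetical priority codes for five patients.
priorities = [0, 1, 2, 1, 0]
column_names, indicator_rows = one_hot("HeartFailureDiagnosisPriority", priorities)
print(column_names)
for row in indicator_rows:
    print(row)
```

With separate indicator columns, a coefficient for level 1 can be larger than those for levels 0 and 2, which a single numeric term cannot express.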
I have 10,000 instances and my accuracy result is 68%. Now I want to know
which instances were not classified correctly after I provide the list of
features to the system.
Is it possible for me to know which of the instances are detected incorrectly?
I am using SMO algorithm.
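Once per-instance predictions are available (for example via the Explorer's option to output predictions, or the -p option on the command line), finding the errors is a simple comparison. The sketch below is plain Python with hypothetical labels and a hypothetical misclassified helper; it assumes the actual and predicted labels are listed in the same instance order.

```python
def misclassified(actual, predicted):
    """Return (instance number, actual, predicted) for every wrong prediction."""
    return [(number, a, p)
            for number, (a, p) in enumerate(zip(actual, predicted), start=1)
            if a != p]

# Made-up labels for five instances.
actual = ["yes", "no", "yes", "yes", "no"]
predicted = ["yes", "yes", "yes", "no", "no"]
errors = misclassified(actual, predicted)
print(errors)
```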