We want to run 10x10 cross validation on data containing text. We have
to create the appropriate .arff file, but this places the Class
attribute first in the attribute list (instead of last).
When running a Weka experiment from the command line, one can specify -c
1 (that the class is first). We prefer to use the Experimenter, but can
we specify where the Class attribute will be in the GUI? It appears we
Alternatively, can we specify 10x10 in the command line? We appear to
only be able to specify the number of folds, not the number of runs.
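
For readers unfamiliar with the term, "10x10" here means 10 runs of 10-fold cross-validation, with the data reshuffled under a different seed for each run. The following is a minimal sketch in plain Python, not Weka code, of how the 100 train/test splits come about. The function name `repeated_kfold`, the seed values, and the instance count are illustrative assumptions, and the folds are not stratified.

```python
import random

def repeated_kfold(n_instances, folds=10, runs=10, base_seed=1):
    """Yield (run, fold, train_idx, test_idx) for `runs` x `folds` cross-validation."""
    for run in range(runs):
        idx = list(range(n_instances))
        random.Random(base_seed + run).shuffle(idx)  # a fresh shuffle for every run
        for fold in range(folds):
            test = idx[fold::folds]                  # every folds-th shuffled instance
            test_set = set(test)
            train = [i for i in idx if i not in test_set]
            yield run, fold, train, test

splits = list(repeated_kfold(50))
print(len(splits))  # 100
```

Each run uses every instance exactly once for testing, so averaging over the 100 results gives the 10x10 estimate.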
I am using Weka's SimpleKMeans clusterer for document clustering. Each line of the data file is a document and each column is the weight of a term. Could anyone tell me how to find out which documents each cluster in the result contains? That is, after the documents have been divided into clusters, I would like to know which documents belong to a given cluster, so that I can use this for further evaluation.
Thank you for your help.
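
Conceptually, the question comes down to assigning each document row to its nearest cluster centre and then grouping the row indices by cluster. The sketch below shows that idea in plain Python. It is not the Weka API, and the term weights, the centroids, and the function names are made up for illustration.

```python
from collections import defaultdict

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def documents_per_cluster(docs, centroids):
    """Map each cluster index to the list of document (row) indices assigned to it."""
    members = defaultdict(list)
    for doc_id, vec in enumerate(docs):
        nearest = min(range(len(centroids)),
                      key=lambda c: squared_distance(vec, centroids[c]))
        members[nearest].append(doc_id)
    return dict(members)

# Hypothetical term-weight rows (one per document) and two cluster centroids.
docs = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.7], [0.0, 1.0]]
centroids = [[1.0, 0.0], [0.0, 1.0]]
membership = documents_per_cluster(docs, centroids)
print(membership)  # {0: [0, 1], 1: [2, 3]}
```

Because the row index is kept, the result can be joined back to the original document names for further evaluation.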
Dear Weka users,
I am working on document classification using artificial neural networks.
I would like to know how to proceed with the experiments. For example, I have
experiments with several topologies (numbers of neurons in the hidden layer), but my
doubt is about how to vary the random seed.
Do I need to choose a random seed, 10 for example, run a simulation with
this seed, and then, with the same training file, change the random seed
to 11?
And so on? Or is there another way to do it? Could somebody please help me with this step?
Cassiana Fagundes da Silva
Programa Interdisciplinar de Pós-Graduação - PIPCA
Mestranda em Computacao Aplicada - UNISINOS
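
One common way to handle the seed question above is to repeat the same experiment over a range of seeds and report the mean and standard deviation of the results. The sketch below assumes that approach. `train_and_evaluate` is a hypothetical stand-in that only simulates an accuracy, so the numbers it prints are placeholders and not real results.

```python
import random
import statistics

def train_and_evaluate(seed):
    """Hypothetical stand-in for one training run; returns a simulated accuracy."""
    rng = random.Random(seed)
    return 0.80 + rng.uniform(-0.05, 0.05)  # placeholder value, not a real result

seeds = range(10, 20)                       # seeds 10, 11, ..., 19
scores = [train_and_evaluate(s) for s in seeds]
print(f"mean={statistics.mean(scores):.3f} stdev={statistics.stdev(scores):.3f}")
```

In a real experiment the function body would train the network with the given seed on the same training file and return the measured accuracy, so that topologies can be compared on their averages rather than on one lucky run.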
I am a Computation student working on my final project to obtain my
degree in Computation at the Central University of Venezuela.
I am building an application for the search and personalized recommendation of
information on the Internet.
To achieve the personalization, I know that I need to use
data mining and machine learning techniques.
I downloaded WEKA, the WEKA tutorial, and other documentation, but I
do not know which technique I should use to solve my problems.
The application I am developing for the University is a search engine written
in Java that uses agents which search for documents on the Internet and
then recommend information to the users.
The application must search for and recommend information related
solely to the area of Computation. When making a search, a user must
indicate in which category they wish to find the resource (the user is shown
7 general categories: Books, Publications and Documents, News,
Hardware, Software, Programming, Forums and Groups).
The first problem that I must solve is the following: after the
search agents download documents, I must verify that those documents belong to
the computation area and also to the category that the user has
indicated. The information that I have is: the key phrases of the documents
found (these are extracted using KEA 2.0) and the category in which
the user made the search. If a document is about computation and also
matches the category, it is given to the user. On this basis,
how could I determine whether a document belongs to the computation
The second problem that I must solve is the recommendation of information.
The tool must be able to determine the profile or preferences of
the users (which field within computation the user prefers,
for example: networks, systems, databases, artificial intelligence,
among others) so that it can recommend documents to them. The users
also have a library of links where they can keep the links
that they consider interesting, in order to visit them again later. The
information available about the users is: the keywords that they
have used when making searches, the categories in which they have made
searches, and the key phrases of the documents that they visit most in
their library. On this basis, how could I determine which type of
documents (from which area within computation) are those that a user
The techniques used must allow the agents to learn
continuously whether or not a document is about computation, and to adapt
to changes in the preferences of the users.
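
One family of techniques that fits the requirements above (labelling a document from its key phrases, and learning continuously as new examples arrive) is a Naive Bayes text classifier updated one document at a time. The following is only a sketch under that assumption, written in plain Python rather than with WEKA or KEA. The class name, the example key phrases, and the "not-computing" label are all hypothetical.

```python
import math
from collections import Counter, defaultdict

class KeyphraseNaiveBayes:
    """Tiny multinomial Naive Bayes over key phrases, with add-one smoothing."""

    def __init__(self):
        self.class_docs = Counter()                # documents seen per class
        self.phrase_counts = defaultdict(Counter)  # phrase counts per class
        self.vocabulary = set()

    def update(self, phrases, label):
        """Incrementally learn from one labelled document."""
        self.class_docs[label] += 1
        self.phrase_counts[label].update(phrases)
        self.vocabulary.update(phrases)

    def predict(self, phrases):
        """Return the label with the highest log-probability for these phrases."""
        total_docs = sum(self.class_docs.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_docs.items():
            counts = self.phrase_counts[label]
            denom = sum(counts.values()) + len(self.vocabulary)
            score = math.log(n_docs / total_docs)
            for p in phrases:
                score += math.log((counts[p] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

model = KeyphraseNaiveBayes()
model.update(["java", "compiler", "source code"], "Programming")
model.update(["router", "graphics card"], "Hardware")
model.update(["recipe", "olive oil"], "not-computing")
print(model.predict(["java", "source code"]))  # Programming
```

Calling `update` whenever a user accepts or rejects a document lets the model follow changes over time. The same structure could be trained on the key phrases in a user's link library to estimate which area that user prefers.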
Please excuse the length of my mail, but I need to know which
data mining techniques could help me to solve these problems. Also,
please excuse me; I do not write English very well.
Thank you very much; I look forward to your quick reply.
Please, it is urgent.
Hi WEKA users
I am a newbie with WEKA.
I am interested in applying a Naive Bayes classifier to my dataset, but I found both NaiveBayes and SimpleNaiveBayes.
What are the differences between them?
How can I modify them?
I have a problem with Weka. I need to run clustering on a file with 900 instances and 6000 attributes, but I get an "out of memory" message.
I would like to know what Weka's limits are.
I am planning a tool that interacts with Weka, but I work with very large files.
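
A rough back-of-the-envelope estimate helps put the "out of memory" message in context. The sketch below assumes a dense representation with one 8-byte double per attribute value. That is an assumption about storage, and it ignores per-object overhead and any working copies the clusterer makes.

```python
instances, attributes = 900, 6000
bytes_per_value = 8  # assumption: one 8-byte double per attribute value

raw_bytes = instances * attributes * bytes_per_value
print(raw_bytes)                        # 43200000
print(f"{raw_bytes / 2**20:.1f} MiB")   # 41.2 MiB
```

If the JVM's maximum heap is close to or below this figure, the data alone may not fit. The limit can be raised with the standard `-Xmx` option when starting Java (for example `-Xmx512m`).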
> From: Ken Geis [mailto:email@example.com]
> Liaw, Andy wrote:
> >>From: Ken Geis [mailto:firstname.lastname@example.org]
> >>I've narrowed it down to a more specific question. Does anyone know of
> >>an algorithm for supervised discretization of an attribute where the
> >>class is continuous?
> > As in the construction of a classification or regression tree?
> > Andy
> Definitely not as in a classification tree. That implies a discrete
> class. More like the process at each level of a regression tree, but
> most regression tree algorithms involve binary splits. I want to find
> the optimal (probably error-based) n-way split of an attribute with
> respect to a continuous class.
If you grow a regression tree using one continuous attribute at a time, that
ought to do it. Another alternative I can think of is using Haar wavelets,
but that's not available in Weka, I guess.
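
For what it is worth, an optimal error-based n-way split of a single attribute can also be found directly by dynamic programming over the sorted values. This is a different technique from growing a regression tree, and it is not a Weka feature. The sketch below minimizes the within-bin sum of squared errors of the continuous class. The function name and the data are illustrative, and it assumes the attribute values are distinct.

```python
def optimal_nway_split(values, targets, n_bins):
    """Split sorted `values` into n_bins contiguous bins minimizing the
    within-bin sum of squared errors (SSE) of `targets`."""
    pairs = sorted(zip(values, targets))
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    m = len(ys)
    # Prefix sums give the SSE of any slice ys[i:j] in constant time.
    s1 = [0.0] * (m + 1)
    s2 = [0.0] * (m + 1)
    for i, y in enumerate(ys):
        s1[i + 1] = s1[i] + y
        s2[i + 1] = s2[i] + y * y

    def sse(i, j):
        """SSE of ys[i:j] around its own mean."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[k][j]: best SSE for the first j points using exactly k bins.
    cost = [[inf] * (m + 1) for _ in range(n_bins + 1)]
    back = [[0] * (m + 1) for _ in range(n_bins + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_bins + 1):
        for j in range(k, m + 1):
            for i in range(k - 1, j):
                c = cost[k - 1][i] + sse(i, j)
                if c < cost[k][j]:
                    cost[k][j], back[k][j] = c, i
    # Walk back through the table; each cut is the midpoint between neighbours.
    cuts, j = [], m
    for k in range(n_bins, 1, -1):
        i = back[k][j]
        cuts.append((xs[i - 1] + xs[i]) / 2)
        j = i
    return sorted(cuts), cost[n_bins][m]

x = [1, 2, 3, 10, 11, 12, 20, 21, 22]
y = [5.0, 5.1, 4.9, 9.0, 9.2, 8.8, 1.0, 1.1, 0.9]
cuts, err = optimal_nway_split(x, y, 3)
print(cuts)  # [6.5, 16.0]
```

The search is exhaustive over contiguous bins, so the result is optimal for the chosen error measure, at a cost that grows with the number of bins times the square of the number of points.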
> From: Ken Geis [mailto:email@example.com]
> I asked a general question earlier today. If there have been any
> responses, I haven't gotten them yet, as I'm in digest mode.
> I've narrowed it down to a more specific question. Does anyone know of
> an algorithm for supervised discretization of an attribute where the
> class is continuous?
As in the construction of a classification or regression tree?
> Wekalist mailing list
> https://list.scms.waikato.ac.nz/mailman/listinfo/wekalist
I asked a general question earlier today. If there have been any
responses, I haven't gotten them yet, as I'm in digest mode.
I've narrowed it down to a more specific question. Does anyone know of
an algorithm for supervised discretization of an attribute where the
class is continuous?
I put a company's quarterly sales figures and the industry group's economic index for the past five years into a training file. In the test file, I gave the anticipated industry index for the next four quarters, with ? for the sales figure, and ran a prediction. Can I take the values returned by the prediction as the company's anticipated sales for the next four quarters?
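
To make the setup concrete, the sketch below fits a least-squares line of sales against the industry index and applies it to anticipated index values. It is plain Python with invented numbers, and it assumes a simple linear model, which may or may not be what was actually run. Predictions of this kind are estimates that hold only as far as the past relationship continues and the anticipated index values turn out to be right.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical training data: industry index and company sales per quarter.
index = [100, 102, 105, 107, 110]
sales = [200, 204, 210, 214, 220]   # constructed so that sales = 2 * index

slope, intercept = fit_line(index, sales)
anticipated_index = [112, 115]      # hypothetical future index values
forecast = [slope * x + intercept for x in anticipated_index]
print(forecast)
```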