I am new to Weka and I wonder if it has built-in stop-word and stemming
options. Also, is it possible to generate n-gram features in Weka, or do
they have to be prepared with other tools first?
Thanks a lot:)
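For readers unsure what "n-gram features" means here, the following is a minimal sketch in plain Java of word n-gram extraction. It does not use Weka and is not Weka's implementation; the class name `WordNGrams`, the method `ngrams`, and the whitespace tokenization with lowercasing are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Weka: builds word n-grams from raw text.
final class WordNGrams {

    /** Returns all runs of n consecutive whitespace-separated tokens, lowercased. */
    static List<String> ngrams(String text, int n) {
        List<String> out = new ArrayList<>();
        String trimmed = text.trim().toLowerCase();
        if (n < 1 || trimmed.isEmpty()) {
            return out; // nothing to extract
        }
        String[] tokens = trimmed.split("\\s+");
        // Slide a window of n tokens over the token sequence.
        for (int i = 0; i + n <= tokens.length; i++) {
            out.add(String.join(" ", Arrays.copyOfRange(tokens, i, i + n)));
        }
        return out;
    }

    public static void main(String[] args) {
        // Print the bigrams of a short example sentence.
        System.out.println(ngrams("The quick brown fox", 2));
    }
}
```

Each distinct n-gram would then become one attribute (present/absent or a count) in the dataset. Stop-word removal and stemming, if wanted, would be applied to the tokens before the window is slid over them.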
For data mining, datasets are normally chosen so that the number of
instances exceeds the number of variables. In my personal interviews with
data mining practitioners, many stated that the number of instances should
be at least 10 times the number of variables used. Some argue that it
depends on the desired accuracy of the result.
Is there any literature that states the required ratio of the number of
instances to the number of variables for a given analysis (for example,
simple statistical tools or data mining)?
Thank you for your help.
Great, thank you very much.
I obtained the results for all folds. However, I can't see the accuracy in the results. How can I see the accuracy for each fold together with the results obtained?
> > How can I see the results for all folds when I use 10-fold
> > cross-validation (induction of J48), for example?
> Use the KnowledgeFlow, e.g., with the following setup:
> --dataSet--> CrossValidationFoldMaker
> --training/test--> J48
> --text--> TextViewer
> > Weka provides only
> > one result; what is this result?
> The result of all 10 test set evaluations together.
> Cheers, Peter
> Peter Reutemann, Dept. of Computer Science, University of Waikato, NZ
> http://www.cs.waikato.ac.nz/~fracpete/ Ph. +64 (7) 858-5174
> Wekalist mailing list
How can I see the results for all folds when I use 10-fold cross-validation (induction of J48), for example? Weka provides only one result; what is this result? I need the result for each fold.
MSc. Márcio Porto Basgalupp
USP - University of São Paulo
Home Page: http://www.icmc.usp.br/~marciopb
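To make the distinction in this thread concrete, here is a minimal sketch in plain Java of how per-fold accuracy relates to the single figure obtained from "all 10 test set evaluations together". It does not call Weka. The class `FoldAccuracy`, its methods, and the counts in `main` are hypothetical and only illustrate the arithmetic.

```java
// Hypothetical helper, not part of Weka: per-fold vs. pooled accuracy.
final class FoldAccuracy {

    /** Accuracy of each fold: correct[i] / total[i]. */
    static double[] perFold(int[] correct, int[] total) {
        double[] acc = new double[correct.length];
        for (int i = 0; i < correct.length; i++) {
            acc[i] = total[i] == 0 ? 0.0 : (double) correct[i] / total[i];
        }
        return acc;
    }

    /** Pooled accuracy: all test-fold predictions counted together. */
    static double pooled(int[] correct, int[] total) {
        long sumCorrect = 0;
        long sumTotal = 0;
        for (int i = 0; i < correct.length; i++) {
            sumCorrect += correct[i];
            sumTotal += total[i];
        }
        return sumTotal == 0 ? 0.0 : (double) sumCorrect / sumTotal;
    }

    public static void main(String[] args) {
        // Made-up counts for 10 folds of 15 test instances each.
        int[] correct = {14, 13, 15, 12, 14, 13, 14, 15, 13, 14};
        int[] total = {15, 15, 15, 15, 15, 15, 15, 15, 15, 15};
        double[] acc = perFold(correct, total);
        for (int i = 0; i < acc.length; i++) {
            System.out.printf("fold %d: %.3f%n", i + 1, acc[i]);
        }
        System.out.printf("pooled: %.3f%n", pooled(correct, total));
    }
}
```

The pooled figure equals the plain average of the per-fold accuracies only when all folds have the same number of test instances; otherwise larger folds weigh more.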
I have the following questions about WEKA classifiers:
1. What do the numbers in brackets after the rules generated by RIDOR refer
to?
2. Could you give me some more details about the NNge method? So far I
have found only the basic idea of the non-nested nearest neighbour
algorithm. Could you give me a more detailed description of how it works and
what the output means? I mean especially the relation between exemplars
and hyperrectangles plus singles discussed at the end of the rules output.
Are those just overlapped and non-overlapped instances?
Thank you in advance for your assistance.
> Dear all,
> I wonder if there is a discretization method that somehow employs
> the assumption that the attribute follows a normal distribution and
> instead of splitting in equally sized bins tries to split in a number
> of bins (e.g. 3 or 5) that maintain the normality of the distribution
> (e.g. based on standard deviations we could create appropriate bins
> where for example the size of the medium bin is larger).
> I have used this method in the past in ad-hoc ways (not through Weka)
> and I would like to implement it in Weka but I don't know if someone
> has something similar or if such a method is formally described
> anywhere and if it would make sense to others rather than my specific
> I would appreciate any feedback.
That discretization method is used in SAX
(http://www.cs.ucr.edu/~eamonn/SAX.htm), for time series. One of the
papers that describe the method is http://www.cs.ucr.edu/~eamonn/SAX.pdf
Juan José Rodríguez, http://pisuerga.inf.ubu.es/juanjo
Lenguajes y Sistemas Informáticos, Universidad de Burgos
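The binning idea discussed in this thread can be sketched as follows: standardize the attribute to z-scores, then cut at fixed points expressed in standard deviations. This is a minimal illustration in plain Java, not a Weka filter and not the SAX authors' code. The class `GaussianBins`, the method `discretize`, and the use of the population standard deviation are assumptions. The cut points are supplied by the caller: for example {-1, 1} gives a wide middle bin as in the original question, while {-0.43, 0.43} are the approximate breakpoints that split a standard normal into three equiprobable regions, as tabulated in the SAX paper.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Weka: bins values by z-score cut points.
final class GaussianBins {

    /**
     * Returns a bin index for every value. zCuts must be sorted ascending and
     * is expressed in standard deviations from the mean; k cuts give k+1 bins.
     */
    static int[] discretize(double[] values, double[] zCuts) {
        double mean = Arrays.stream(values).average().orElse(0.0);
        double variance = Arrays.stream(values)
                .map(v -> (v - mean) * (v - mean))
                .sum() / values.length; // population variance (an assumption)
        double sd = Math.sqrt(variance);

        int[] bins = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            // A constant attribute has sd == 0; treat every value as z = 0.
            double z = sd == 0.0 ? 0.0 : (values[i] - mean) / sd;
            int b = 0;
            while (b < zCuts.length && z >= zCuts[b]) {
                b++; // move right past every cut point the z-score reaches
            }
            bins[i] = b;
        }
        return bins;
    }

    public static void main(String[] args) {
        double[] values = {1, 2, 3, 4, 5, 6, 7, 8, 9};
        // Cuts at one standard deviation either side of the mean.
        System.out.println(Arrays.toString(discretize(values, new double[]{-1.0, 1.0})));
    }
}
```

Because the cut points are a parameter, the same routine covers both the "mean plus or minus k standard deviations" variant and equiprobable Gaussian breakpoints; only the array passed in changes.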