I am using Java & Weka to try to find out the
association rules for a set of keywords (35,000+
keywords) in a large text (around a million
I am doing the following:
Algorithm used : Apriori
Number of keywords = N
Number of sentences in text = K
1. Create N Attributes - each attribute corresponds to
one unique keyword - attribute name=keyword, attribute
value=0 or 1.
2. Create K sparse instances - each instance
represents a single sentence - each instance contains
N number of attributes (representing the keywords),
with value 1 if that keyword is present in that sentence.
3. Run buildAssociations for Apriori.
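The sparse representation in steps 1 to 3 can be illustrated without Weka. The sketch below is plain Java, not Weka's Apriori implementation: it stores each sentence as a sorted array of keyword ids (an assumption of this example) and counts only pairs whose two keywords are themselves frequent, which is the pruning idea behind Apriori. The class name FrequentPairs and the minCount parameter are hypothetical.

```java
import java.util.*;

// Hypothetical helper, not part of Weka.
class FrequentPairs {

    // Returns support counts of keyword pairs that occur in at least minCount sentences.
    // Each sentence is a sorted array of distinct keyword ids in [0, numKeywords).
    // A pair (a, b) with a < b is encoded as the long a * numKeywords + b.
    static Map<Long, Integer> frequentPairs(List<int[]> sentences, int numKeywords, int minCount) {
        // Pass 1: count single keywords.
        int[] single = new int[numKeywords];
        for (int[] s : sentences) {
            for (int k : s) single[k]++;
        }
        // Pass 2: count pairs, skipping any pair with an infrequent member.
        Map<Long, Integer> pairs = new HashMap<>();
        for (int[] s : sentences) {
            for (int i = 0; i < s.length; i++) {
                if (single[s[i]] < minCount) continue;
                for (int j = i + 1; j < s.length; j++) {
                    if (single[s[j]] < minCount) continue;
                    long key = (long) s[i] * numKeywords + s[j];
                    pairs.merge(key, 1, Integer::sum);
                }
            }
        }
        // Drop pairs below the support threshold.
        pairs.values().removeIf(c -> c < minCount);
        return pairs;
    }

    public static void main(String[] args) {
        List<int[]> sentences = Arrays.asList(
            new int[]{0, 1, 2},
            new int[]{0, 1},
            new int[]{1, 3},
            new int[]{0, 1, 3});
        Map<Long, Integer> result = frequentPairs(sentences, 4, 2);
        for (Map.Entry<Long, Integer> e : result.entrySet()) {
            System.out.println((e.getKey() / 4) + "," + (e.getKey() % 4) + " -> " + e.getValue());
        }
    }
}
```

Memory here grows with the number of distinct co-occurring frequent pairs rather than with N x K, so raising minCount is the main lever when heap space is tight.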
I have the following concerns:
- Am I doing this correctly? Is there any other
efficient way to do this?
- I am getting an out-of-heap-space error from
buildAssociations when I use all keywords and all
sentences. How can I resolve this? I am already using
the -Xmx2024m option.
Do I need to divide the text into smaller groups of
sentences and then run the code for these separately?
If so, any guidelines to combine the Association rules
If I still get an out-of-heap-space error, is there any
way to divide the keywords (attributes) into sets of
manageable size and then run buildAssociations?
- Is Apriori the right algorithm to use for this
- If I have 2 sets of keywords and I want to find the
associations of keywords in set1 with keywords in set2
(using a large text), is there any way to do this?
Any help is appreciated. Thank you in advance.
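For the last question (associations between keywords in set1 and keywords in set2), one possible approach is to count only cross-set co-occurrences and keep rules a -> b whose support and confidence pass a threshold. This is a plain-Java sketch, not a Weka feature; the class name CrossSetRules and the parameters minCount and minConf are assumptions of the example, and keywords are assumed not to contain the string " -> ".

```java
import java.util.*;

// Hypothetical helper, not part of Weka.
class CrossSetRules {

    // Returns rules "a -> b" (a in set1, b in set2) mapped to their confidence,
    // where confidence = count(sentences with a and b) / count(sentences with a).
    static Map<String, Double> rules(List<Set<String>> sentences, Set<String> set1,
                                     Set<String> set2, int minCount, double minConf) {
        Map<String, Integer> countA = new HashMap<>();
        Map<String, Integer> countAB = new HashMap<>();
        for (Set<String> s : sentences) {
            for (String a : s) {
                if (!set1.contains(a)) continue;
                countA.merge(a, 1, Integer::sum);
                for (String b : s) {
                    if (set2.contains(b) && !b.equals(a)) {
                        countAB.merge(a + " -> " + b, 1, Integer::sum);
                    }
                }
            }
        }
        Map<String, Double> out = new TreeMap<>();
        for (Map.Entry<String, Integer> e : countAB.entrySet()) {
            String a = e.getKey().substring(0, e.getKey().indexOf(" -> "));
            double conf = e.getValue() / (double) countA.get(a);
            if (e.getValue() >= minCount && conf >= minConf) out.put(e.getKey(), conf);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Set<String>> sentences = Arrays.asList(
            new HashSet<>(Arrays.asList("java", "weka")),
            new HashSet<>(Arrays.asList("java", "weka", "heap")),
            new HashSet<>(Arrays.asList("java", "heap")),
            new HashSet<>(Arrays.asList("python", "weka")));
        Set<String> set1 = new HashSet<>(Arrays.asList("java", "python"));
        Set<String> set2 = new HashSet<>(Arrays.asList("weka", "heap"));
        System.out.println(rules(sentences, set1, set2, 2, 0.6));
    }
}
```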
I would like to know if GridSearch (with LIBSVM and 10-fold CV as a test option) implements the following algorithm:
1. Split the dataset into 90% train and 10% test set.
2. Split the train set further into 10 folds (10 pairs of train/test sets).
3. Try a (c,g) pair in each of the 10 pairs of train/test sets created in step (2) and calculate accuracies (1 accuracy for each pair, so 10 accuracies in total).
4. Find the average of the 10 accuracies found in step (3).
5. Repeat steps (3) and (4) for all (c,g) pairs.
6. Pick the (c,g) pair yielding the highest average accuracy.
7. Train on the 90% set created in step (1) using the (c,g) pair from step (6).
8. Find the testing accuracy for the 10% set created in step (1).
9. Repeat steps (1) through (8) 10 times.
10. Display the average accuracy.
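The ten steps above can be written out as a plain-Java sketch. It does not say whether Weka's GridSearch actually works this way; it only makes the described procedure concrete. The Scorer callback (train with a (c,g) pair on one index set, return accuracy on another) is hypothetical, and step 9 is interpreted here as an outer 10-fold loop, which is an assumption.

```java
import java.util.*;

// Hypothetical sketch of the procedure in steps 1-10; not Weka code.
class NestedGridSearch {

    // Hypothetical callback: train with (c, g) on trainIdx, return accuracy on testIdx.
    interface Scorer {
        double accuracy(double c, double g, List<Integer> trainIdx, List<Integer> testIdx);
    }

    // Splits indices into k folds of near-equal size (round robin).
    static List<List<Integer>> folds(List<Integer> idx, int k) {
        List<List<Integer>> out = new ArrayList<>();
        for (int f = 0; f < k; f++) out.add(new ArrayList<>());
        for (int i = 0; i < idx.size(); i++) out.get(i % k).add(idx.get(i));
        return out;
    }

    // Concatenates all folds except the held-out one.
    static List<Integer> allExcept(List<List<Integer>> folds, int held) {
        List<Integer> out = new ArrayList<>();
        for (int f = 0; f < folds.size(); f++) if (f != held) out.addAll(folds.get(f));
        return out;
    }

    // Steps 2-6: inner k-fold CV over all (c, g) pairs; returns the best {c, g}.
    static double[] selectParams(List<Integer> train, double[] cs, double[] gs, int k, Scorer s) {
        List<List<Integer>> inner = folds(train, k);
        double best = -1;
        double[] bestPair = null;
        for (double c : cs) {
            for (double g : gs) {
                double sum = 0;
                for (int f = 0; f < k; f++) {
                    sum += s.accuracy(c, g, allExcept(inner, f), inner.get(f));
                }
                double avg = sum / k;
                if (avg > best) { best = avg; bestPair = new double[]{c, g}; }
            }
        }
        return bestPair;
    }

    // Steps 1 and 7-10: each outer fold serves once as the held-out test set.
    static double nestedAccuracy(int n, double[] cs, double[] gs, int k, long seed, Scorer s) {
        List<Integer> idx = new ArrayList<>();
        for (int i = 0; i < n; i++) idx.add(i);
        Collections.shuffle(idx, new Random(seed));
        List<List<Integer>> outer = folds(idx, k);
        double sum = 0;
        for (int f = 0; f < k; f++) {
            List<Integer> train = allExcept(outer, f);
            double[] p = selectParams(train, cs, gs, k, s);
            sum += s.accuracy(p[0], p[1], train, outer.get(f));
        }
        return sum / k;
    }

    public static void main(String[] args) {
        // Dummy scorer for illustration only: pretends (c=1, g=0.5) is always best.
        Scorer dummy = (c, g, tr, te) -> (c == 1.0 && g == 0.5) ? 0.9 : 0.6;
        double acc = nestedAccuracy(100, new double[]{0.1, 1.0, 10.0},
                                    new double[]{0.5, 2.0}, 10, 42L, dummy);
        System.out.println("average outer accuracy: " + acc);
    }
}
```

The point of the nesting is that the 10% test set in each outer round never influences the choice of (c,g), so the averaged accuracy is not biased by the parameter search.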
I wonder which Weka algorithm could search for local maxima
in a spatial grid. In other words, given the coordinates of a map
(x, y, z, ... up to 6 coordinates) and the altitude/height at each
point (h), 7 numerical variables in total, I would like to locate
the peaks (local maxima) of that topological surface. Which Weka
algorithm can perform this?
Thanks a lot,
Josep Maria Campanera Alsina
Investigador Juan de la Cierva
Departament de Fisicoquímica
Facultat de Farmàcia
Avgda Joan XXIII, s/n
08028 Barcelona · Catalonia · Spain
Tel: +34 93 4035988
Fax: +34 93 4035987
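On the local-maxima question above: independent of which Weka component might fit, the search itself is small enough to sketch in plain Java. The sketch assumes the points lie on a regular grid with integer indices in each dimension (any number of dimensions, so 6 works), and it treats a point as a peak when its height is strictly greater than that of every existing neighbour one step away along a single axis. The class name GridPeaks is hypothetical and this is not a Weka algorithm.

```java
import java.util.*;

// Hypothetical helper, not part of Weka.
class GridPeaks {

    // heights maps a grid index tuple (one Integer per dimension) to the height h.
    // Returns the index tuples that are strict local maxima with respect to
    // their axis neighbours (index +/- 1 along exactly one dimension).
    static List<List<Integer>> localMaxima(Map<List<Integer>, Double> heights) {
        List<List<Integer>> peaks = new ArrayList<>();
        for (Map.Entry<List<Integer>, Double> e : heights.entrySet()) {
            List<Integer> p = e.getKey();
            boolean isPeak = true;
            for (int d = 0; d < p.size() && isPeak; d++) {
                for (int step = -1; step <= 1; step += 2) {
                    List<Integer> q = new ArrayList<>(p);
                    q.set(d, p.get(d) + step);
                    Double h = heights.get(q); // null when q is outside the grid
                    if (h != null && h >= e.getValue()) {
                        isPeak = false;
                        break;
                    }
                }
            }
            if (isPeak) peaks.add(p);
        }
        return peaks;
    }

    public static void main(String[] args) {
        // A 3 x 3 example grid; rows are y = 0..2, columns are x = 0..2.
        double[][] h = {
            {1, 2, 1},
            {2, 5, 2},
            {1, 2, 3}};
        Map<List<Integer>, Double> heights = new HashMap<>();
        for (int y = 0; y < 3; y++) {
            for (int x = 0; x < 3; x++) {
                heights.put(Arrays.asList(x, y), h[y][x]);
            }
        }
        System.out.println(localMaxima(heights));
    }
}
```

Diagonal neighbours are ignored in this sketch; including them would mean checking all 3^d - 1 offsets per point instead of 2d.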
I am using Weka in a process used to create matrices for input values and need to get the rules obtained by JRip or PART in an if-then format that can be added into a SAS program (or, if necessary, another statistical software package might be possible to use). The JRip program produces rules such as:
(Var1 < 1) and (Var2 = hello) => class = bye (5.0/1.0)
What I need is a converter that will change the format to:
If ((Var1 < 1) and (Var2 = hello)) then class = bye;
Else if ((Var1 = 1) and (Var2 = hello)) then class = bye;
Else if ....
or something similar.
Are there any previously written programs that can be used to convert this output? Does anyone have an idea about how to make this easier (other than using "replace all" in Microsoft Word and just soldiering through it)?
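A conversion of this kind can be done with one regular expression per rule line. The following is a minimal plain-Java sketch, assuming every rule line has the shape "conditions => target = value (counts)" shown above and that a rule with no conditions is the default rule. The class name RuleToSas is hypothetical, values containing spaces are not handled, and any quoting of character values that the SAS side may require is left out.

```java
import java.util.*;
import java.util.regex.*;

// Hypothetical converter; not part of Weka or SAS.
class RuleToSas {

    // Groups: 1 = conditions (may be empty), 2 = target name, 3 = predicted value.
    private static final Pattern RULE =
        Pattern.compile("^\\s*(.*?)\\s*=>\\s*(\\S+?)\\s*=\\s*(\\S+)\\s*\\(.*\\)\\s*$");

    // Turns an ordered list of rule lines into an If / Else if / Else chain.
    static String convert(List<String> ruleLines) {
        StringBuilder sb = new StringBuilder();
        boolean first = true;
        for (String line : ruleLines) {
            Matcher m = RULE.matcher(line);
            if (!m.matches()) continue; // skip headers, blank lines and other text
            String cond = m.group(1);
            String assign = m.group(2) + " = " + m.group(3) + ";";
            if (cond.isEmpty()) {
                // Default rule: no conditions.
                sb.append(first ? assign : "Else " + assign);
            } else {
                sb.append(first ? "If " : "Else if ")
                  .append("(").append(cond).append(") then ").append(assign);
            }
            sb.append("\n");
            first = false;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> rules = Arrays.asList(
            "(Var1 < 1) and (Var2 = hello) => class = bye (5.0/1.0)",
            "(Var1 = 1) and (Var2 = hello) => class = bye (3.0/0.0)",
            " => class=hi (10.0/2.0)");
        System.out.print(convert(rules));
    }
}
```

The counts in the second and third example rules are made up for illustration.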
I am really sorry to bother you guys again. :( But I have run into
some trouble again.
When I run the NaiveBayesSimple classifier like this:
java -Xmx1024m weka.classifiers.bayes.NaiveBayesSimple -l trained.model -T test.arff -p 0 -c 1
I get this error:
java.lang.IllegalArgumentException: Can't normalize array. Sum is zero.
I wonder what I should do. Should I use another NB classifier, or should
I set some parameters?
The attributes in the arff file are binary.
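One possible cause of a "Sum is zero" error, offered here as an assumption and not as a diagnosis of this model, is numeric underflow: with many attributes, the product of per-attribute likelihoods can round to 0.0 for every class, leaving nothing to normalise. The plain-Java sketch below (hypothetical class LogNormalize, made-up likelihoods 0.3 and 0.2) shows the effect and the usual remedy of summing logarithms and subtracting the maximum before exponentiating.

```java
// Hypothetical illustration; not Weka code.
class LogNormalize {

    // Converts per-class log scores into probabilities that sum to 1.
    // Subtracting the maximum first guarantees at least one exp() equals 1.
    static double[] normalizeLogs(double[] logScores) {
        double max = Double.NEGATIVE_INFINITY;
        for (double v : logScores) max = Math.max(max, v);
        double[] p = new double[logScores.length];
        double sum = 0;
        for (int i = 0; i < p.length; i++) {
            p[i] = Math.exp(logScores[i] - max);
            sum += p[i];
        }
        for (int i = 0; i < p.length; i++) p[i] /= sum;
        return p;
    }

    public static void main(String[] args) {
        // 2000 binary attributes, each with likelihood 0.3 under class A and 0.2 under class B.
        double prodA = 1, prodB = 1, logA = 0, logB = 0;
        for (int i = 0; i < 2000; i++) {
            prodA *= 0.3;
            prodB *= 0.2;
            logA += Math.log(0.3);
            logB += Math.log(0.2);
        }
        // Both raw products underflow to 0.0, so their sum cannot be normalised.
        System.out.println("sum of raw products: " + (prodA + prodB));
        double[] p = normalizeLogs(new double[]{logA, logB});
        System.out.println("P(A) = " + p[0] + ", P(B) = " + p[1]);
    }
}
```

If underflow is the cause, a Naive Bayes implementation that works in log space, or a model with fewer attributes, would avoid it; another possibility worth checking is an attribute value with zero estimated probability in every class.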
Quoting perdana <arie.perdana(a)gmail.com>:
> Dear All,
> I have a project in which I need to understand NaiveBayes and
> BayesNet in the Weka experiment tool: how Weka analyses the data and
> produces the results for NaiveBayes and BayesNet. I need help. If anyone has
> documentation, a tutorial, or a description of NaiveBayes and BayesNet for Weka
> 3.5.7, your help is really appreciated.
> I downloaded the documentation from the Weka website for the BayesNet package 3.5.7, but
> it is actually for BayesNet 3.5.6. Please help me. Thanks.
Hi. These basic classifiers do not change between versions. A description of
Naive Bayes in all its simplicity can be found on Wikipedia, and there is
a paper by Bouckaert et al. describing the implementation of the Bayes
Best wishes
I have a project in which I need to understand NaiveBayes and
BayesNet in the Weka experiment tool: how Weka analyses the data and
produces the results for NaiveBayes and BayesNet. I need help. If anyone has
documentation, a tutorial, or a description of NaiveBayes and BayesNet for Weka
3.5.7, your help is really appreciated.
I downloaded the documentation from the Weka website for the BayesNet package 3.5.7, but
it is actually for BayesNet 3.5.6. Please help me. Thanks.
View this message in context: http://www.nabble.com/Documentation-%22Description-of-the-naivebayes-and-ba…
Sent from the WEKA mailing list archive at Nabble.com.