What is the common method for predicting the values of a 'class' attribute
using classification? I am currently using two methods:
i) I include the instances with an unknown 'class' value in the dataset, then
simply use the whole dataset as the training set, and Weka provides predicted
values for 'class'.
ii) I split the dataset into a separate training set (with all values known,
including 'class') and a test set (with all values known, except 'class'),
and use the 'supplied test set' option to get the predicted values for
'class'.
I noticed that these two methods give very different predictions. Which
method is recommended?
Sent from: http://weka.8497.n7.nabble.com/
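Method (ii) relies on keeping the instances with an unknown class out of the training file. In ARFF an unknown value, including an unknown class, is written as `?`, so the split itself is simple. A minimal sketch, assuming comma-separated data rows with the class in the last column; the helper class `ClassSplit` is hypothetical, and both resulting files would still need identical attribute declarations for Weka to accept the second one as a supplied test set.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits comma-separated data rows (class value in the last column)
// into rows with a known class and rows whose class is the ARFF missing marker "?".
final class ClassSplit {
    final List<String> labeled = new ArrayList<>();
    final List<String> unlabeled = new ArrayList<>();

    static ClassSplit of(List<String> rows) {
        ClassSplit split = new ClassSplit();
        for (String row : rows) {
            String[] fields = row.split(",");
            String classValue = fields[fields.length - 1].trim();
            // "?" is how ARFF marks a missing value, including a missing class
            if (classValue.equals("?")) {
                split.unlabeled.add(row);
            } else {
                split.labeled.add(row);
            }
        }
        return split;
    }

    public static void main(String[] args) {
        List<String> rows = List.of("5.1,3.5,yes", "4.9,3.0,no", "6.2,2.9,?");
        ClassSplit split = ClassSplit.of(rows);
        System.out.println("training rows: " + split.labeled);
        System.out.println("rows to predict: " + split.unlabeled);
    }
}
```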
I’m using Weka for my undergraduate thesis and have a short question that I hope you can answer.
In the Explorer I’m using the StringToWordVector filter to split my sentences into words, but I would also like to use the Snowball stemmer. My sentences are in Swedish and I can’t find the option for changing the language of the Snowball algorithm. Can anyone explain how to do this?
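The language is a property of the stemmer object rather than of StringToWordVector itself: after choosing SnowballStemmer in the filter's `stemmer` field, clicking on that field should open the stemmer's own settings, where its `stemmer` property takes the language name. The equivalent option string, as it would appear when the filter configuration is shown on one line, is sketched below; it assumes the Snowball stemmer classes are available to your Weka installation and leaves all other StringToWordVector options at their defaults.

```
weka.filters.unsupervised.attribute.StringToWordVector -stemmer "weka.core.stemmers.SnowballStemmer -S swedish"
```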
I have created an SMOreg model using a training set. The number of attributes is large (approx. 2400). I have a second data set, with target values, that I want to use to optimise my model. The metric I am interested in is the correlation coefficient. I am not entirely clear on how to use the second data set in WEKA to improve the performance of the original model, and I would be grateful if someone could outline the steps I should follow.
In addition, should I look at attribute selection before optimisation, given the large number of attributes?
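One way to use the second data set is as a validation set: build the model on the training set with several candidate settings (for example different values of SMOreg's complexity constant C), predict the second set with each, and keep the setting whose predictions correlate best with the known targets. The sketch below covers only the scoring step; the prediction arrays are assumed to come from Weka, and `CorrelationTuning` and its methods are hypothetical names.

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch: score predictions on a held-out (second) data set by Pearson's correlation
// coefficient and keep the candidate setting with the highest score.
final class CorrelationTuning {

    // Pearson correlation coefficient between predicted and actual target values.
    static double pearson(double[] predicted, double[] actual) {
        if (predicted.length != actual.length || predicted.length < 2) {
            throw new IllegalArgumentException("need two equal-length arrays with n >= 2");
        }
        int n = predicted.length;
        double meanP = 0, meanA = 0;
        for (int i = 0; i < n; i++) {
            meanP += predicted[i];
            meanA += actual[i];
        }
        meanP /= n;
        meanA /= n;
        double cov = 0, varP = 0, varA = 0;
        for (int i = 0; i < n; i++) {
            double dp = predicted[i] - meanP;
            double da = actual[i] - meanA;
            cov += dp * da;
            varP += dp * dp;
            varA += da * da;
        }
        return cov / Math.sqrt(varP * varA);
    }

    // Returns the key (e.g. a candidate value of C) whose predictions on the second
    // data set correlate best with its known targets.
    static double bestSetting(Map<Double, double[]> predictionsBySetting, double[] actual) {
        double bestKey = Double.NaN;
        double bestR = Double.NEGATIVE_INFINITY;
        for (Map.Entry<Double, double[]> e : predictionsBySetting.entrySet()) {
            double r = pearson(e.getValue(), actual);
            if (r > bestR) {
                bestR = r;
                bestKey = e.getKey();
            }
        }
        return bestKey;
    }

    public static void main(String[] args) {
        double[] actual = {1.0, 2.0, 3.0, 4.0};
        Map<Double, double[]> preds = new TreeMap<>();
        preds.put(0.1, new double[] {2.0, 1.0, 4.0, 3.0});
        preds.put(1.0, new double[] {1.1, 2.1, 2.9, 4.2});
        System.out.println("best C: " + bestSetting(preds, actual));
    }
}
```

Because the setting is chosen on the second data set, its correlation there is an optimistic estimate; a third, untouched set (or cross-validation) would give a fairer final figure.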
I am building an MLP in my Java prototype, and training takes quite a long time (a few hours).
I have not found any way to retrieve the current epoch number while the classifier is being built.
The only way I have found to check the "progress" of the training is to set the GUI option to true, since the GUI displays that information.
However, I don't want the GUI to appear, since it requires user input and is otherwise unnecessary for me.
Is there really no way to retrieve the current epoch number from a training MLP?
(More generally, is there no support to check the progress of any classifier?)
Thank you for your help,
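A common pattern for reporting progress without a GUI is a callback invoked once per epoch, or a counter that another thread can poll. The sketch below shows that pattern on a toy gradient-descent loop; `EpochListener` and `ToyTrainer` are hypothetical and are not part of Weka, so applying it to Weka's MLP would require a training loop you control.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical callback interface: not part of Weka's API.
interface EpochListener {
    void epochFinished(int epoch, int totalEpochs, double error);
}

// Toy trainer (1-D linear regression by gradient descent) standing in for a long-running
// training loop; it reports progress through listeners instead of through a GUI.
final class ToyTrainer {
    private final List<EpochListener> listeners = new ArrayList<>();
    final AtomicInteger currentEpoch = new AtomicInteger(0); // safe to poll from another thread

    void addListener(EpochListener l) {
        listeners.add(l);
    }

    double train(double[] x, double[] y, int epochs, double learningRate) {
        double w = 0.0;
        for (int epoch = 1; epoch <= epochs; epoch++) {
            double grad = 0.0, error = 0.0;
            for (int i = 0; i < x.length; i++) {
                double diff = w * x[i] - y[i];
                grad += diff * x[i];
                error += diff * diff;
            }
            w -= learningRate * grad / x.length;
            currentEpoch.set(epoch);
            for (EpochListener l : listeners) {
                l.epochFinished(epoch, epochs, error / x.length); // mean squared error
            }
        }
        return w;
    }

    public static void main(String[] args) {
        ToyTrainer trainer = new ToyTrainer();
        trainer.addListener((epoch, total, err) -> {
            if (epoch % 100 == 0) {
                System.out.printf("epoch %d/%d, mse %.4f%n", epoch, total, err);
            }
        });
        double w = trainer.train(new double[] {1, 2, 3}, new double[] {2, 4, 6}, 500, 0.05);
        System.out.println("learned weight: " + w);
    }
}
```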
I am working on a supervised text classification task and I am having a hard
time figuring out which area of mathematics I should study in order to
read/understand the first results I got (see the attached image) after using the
Besides that, I would like to plot a basic graph of the attribute
distributions before applying any classification algorithm, and after
applying it as well.
Sent via UCSMail.
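For the distribution plot, Weka's Explorer already draws a histogram of the selected attribute in the Preprocess panel. Outside the GUI, a quick plain-text check is also possible: this sketch counts the values of one nominal attribute and prints bars scaled to a fixed width. `TextHistogram` is a hypothetical name and the labels in `main` are placeholders; the same function could be run on the attribute values before classification and on the predicted labels afterwards.

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch: plain-text bar chart of how often each value of one nominal attribute
// (for example the class) occurs; the most frequent value gets a bar of length `width`.
final class TextHistogram {

    static Map<String, Integer> counts(String[] values) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted keys give stable output
        for (String v : values) {
            counts.merge(v, 1, Integer::sum);
        }
        return counts;
    }

    static String render(Map<String, Integer> counts, int width) {
        int max = counts.values().stream().mapToInt(Integer::intValue).max().orElse(1);
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            int bar = (int) Math.round((double) e.getValue() * width / max);
            sb.append(String.format("%-12s %s %d%n", e.getKey(), "#".repeat(bar), e.getValue()));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] labels = {"spam", "ham", "ham", "spam", "ham"};
        System.out.print(render(counts(labels), 40));
    }
}
```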
I installed the scatterPlot3D package and also Java 3D 1.5.1. However, when
I try to visualize different attributes from my data set in a 3D scatter
plot, nothing happens, both in Explorer and in KnowledgeFlow. What am I
missing? Should I do some preprocessing before visualizing in 3D? (I am
running 64-bit Windows 7 with 6 GB of RAM.)
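One thing worth checking is whether the Java runtime that actually launches Weka can see Java 3D, and whether that runtime's architecture (32- vs 64-bit) matches the Java 3D build that was installed. This diagnostic sketch reports both from inside a JVM; it assumes the Java 3D 1.5.x class name `javax.media.j3d.Canvas3D`, does not test whether the native libraries load, and is only meaningful when run with the same Java executable and classpath as Weka. `Java3DCheck` is a hypothetical name.

```java
// Sketch: check, from inside the same Java runtime that launches Weka, whether the
// Java 3D classes are visible and whether the JVM is 32- or 64-bit.
final class Java3DCheck {

    static boolean classAvailable(String className) {
        try {
            // initialize = false: only checks that the class can be found and linked
            Class.forName(className, false, Java3DCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // javax.media.j3d.Canvas3D is a core class of Java 3D 1.5.x
        System.out.println("Java 3D on classpath: " + classAvailable("javax.media.j3d.Canvas3D"));
        System.out.println("java.version: " + System.getProperty("java.version"));
        System.out.println("java.home: " + System.getProperty("java.home"));
        System.out.println("os.arch: " + System.getProperty("os.arch"));
        // Non-standard property: set by HotSpot-based JVMs, may be null elsewhere
        System.out.println("sun.arch.data.model: " + System.getProperty("sun.arch.data.model"));
    }
}
```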
I was involved in a project that uses Weka. We had to evaluate multiple
classifiers on big files. While I was waiting for the evaluation to
finish, I dug through the Weka code, searching for a way to
parallelize the process. This has led me to a few ideas concerning a
restructuring of the class weka.classifiers.evaluation.Evaluation. Is there
interest in these proposals? I would be willing to contribute to them
through discussion, and also code if they are accepted.
If there is interest, I would be happy to formulate them further and
Have a good day,
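Evaluating several classifiers on the same data parallelizes well when each job owns its own classifier instance and its own copy of the data. The sketch below shows the general pattern with a fixed thread pool; the `Callable` jobs are placeholders for "train and evaluate classifier X", `ParallelEvaluation` is hypothetical, and it says nothing about how Weka's `Evaluation` class is structured internally.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch: run independent evaluation jobs concurrently on a fixed thread pool and
// collect one score per job, keeping the jobs' original order.
final class ParallelEvaluation {

    static Map<String, Double> runAll(Map<String, Callable<Double>> jobs, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            Map<String, Future<Double>> futures = new LinkedHashMap<>();
            for (Map.Entry<String, Callable<Double>> e : jobs.entrySet()) {
                futures.put(e.getKey(), pool.submit(e.getValue()));
            }
            Map<String, Double> results = new LinkedHashMap<>();
            for (Map.Entry<String, Future<Double>> e : futures.entrySet()) {
                results.put(e.getKey(), e.getValue().get()); // blocks until that job is done
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Map<String, Callable<Double>> jobs = new LinkedHashMap<>();
        jobs.put("classifierA", () -> 0.91); // placeholder for a real train/evaluate job
        jobs.put("classifierB", () -> 0.87);
        int threads = Runtime.getRuntime().availableProcessors();
        System.out.println(runAll(jobs, threads));
    }
}
```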
I am new to data mining. I am a junior in college and I am working with real
data from a grocery store. I would like to run the Apriori algorithm on ~75k
transactions containing ~12k different items. I have a massive text file
containing a VERY sparse matrix, with one row per transaction and each item as a
separate attribute. I need to use all cores of our supercomputer to analyse
the data. However, Weka Server only seems to support multi-core processing
for classifiers. Is there a way to run the association
algorithm in Weka Server, or to use all cores on our supercomputer to perform this
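Independently of Weka Server, the support-counting passes of Apriori parallelize naturally, because each transaction can be counted on its own. The sketch below shows the first two passes (frequent single items, then pairs of frequent items) with Java parallel streams, which spread the work over the common fork/join pool, sized from the number of available cores. It assumes each transaction is stored sparsely as distinct, non-negative item ids rather than as a 12k-column row; `ParallelItemSupport` is hypothetical and not a Weka API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.stream.Collectors;

// Sketch of the first two Apriori passes with support counting spread over all cores.
// Each transaction is the array of (distinct, non-negative) item ids it contains.
final class ParallelItemSupport {

    // Pass 1: items that occur in at least minSupportCount transactions.
    static Map<Integer, Long> frequentItems(List<int[]> transactions, long minSupportCount) {
        Map<Integer, LongAdder> counts = new ConcurrentHashMap<>();
        transactions.parallelStream().forEach(t -> {
            for (int item : t) {
                counts.computeIfAbsent(item, k -> new LongAdder()).increment();
            }
        });
        return keepFrequent(counts, minSupportCount);
    }

    // Pass 2: only pairs whose members are both frequent are counted (Apriori pruning).
    // A pair (a, b) with a < b is packed into one long key: a in the high 32 bits.
    static Map<Long, Long> frequentPairs(List<int[]> transactions, Set<Integer> frequent,
                                         long minSupportCount) {
        Map<Long, LongAdder> counts = new ConcurrentHashMap<>();
        transactions.parallelStream().forEach(t -> {
            int[] kept = Arrays.stream(t).filter(x -> frequent.contains(x)).sorted().toArray();
            for (int i = 0; i < kept.length; i++) {
                for (int j = i + 1; j < kept.length; j++) {
                    long key = ((long) kept[i] << 32) | (kept[j] & 0xffffffffL);
                    counts.computeIfAbsent(key, k -> new LongAdder()).increment();
                }
            }
        });
        return keepFrequent(counts, minSupportCount);
    }

    private static <K> Map<K, Long> keepFrequent(Map<K, LongAdder> counts, long min) {
        return counts.entrySet().stream()
                .filter(e -> e.getValue().sum() >= min)
                .collect(Collectors.toMap(e -> e.getKey(), e -> e.getValue().sum()));
    }

    public static void main(String[] args) {
        List<int[]> transactions = List.of(
                new int[] {1, 2, 3}, new int[] {1, 2}, new int[] {2, 4});
        Map<Integer, Long> items = frequentItems(transactions, 2);
        Map<Long, Long> pairs = frequentPairs(transactions, items.keySet(), 2);
        System.out.println("frequent items: " + items);
        pairs.forEach((key, count) -> System.out.println(
                "pair (" + (key >>> 32) + ", " + (key & 0xffffffffL) + "): " + count));
    }
}
```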
My project requires forecasting the inventory volume of 2000 products over a 12-month period. We have 5 years of historical data for those products to train the model. The entire process and model have been built in the Knowledge Flow. To predict the volume for each product, I have created a separate input file per product and used a parameterized job template to control the flow, together with time series forecasting jobs, to get the predicted volumes. It worked well on a small test sample. However, I ran into a problem inserting the settings for all 2000 files into the Datagrid component: it looks like instances (file location + name) can only be entered manually. Is there a way to load data directly into the Datagrid, or simply to copy and paste all the instances? I hope someone can help me with this.
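Whether the Datagrid component can import or accept pasted rows is the open question here; independently of that, the 2000 file locations do not have to be typed by hand. This sketch assumes one `.csv` input file per product in a single directory (both are assumptions) and writes their absolute paths, one per line, to a hypothetical `product-files.txt`; `ProductFileList` is also a hypothetical name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Sketch: build the list of per-product input files in one go instead of typing
// 2000 entries by hand. Directory layout and file suffix are assumptions.
final class ProductFileList {

    static List<String> listInputFiles(Path dir, String suffix) throws IOException {
        try (Stream<Path> files = Files.list(dir)) {
            return files
                    .filter(p -> p.getFileName().toString().endsWith(suffix))
                    .map(p -> p.toAbsolutePath().toString())
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Path.of(args.length > 0 ? args[0] : ".");
        List<String> paths = listInputFiles(dir, ".csv");
        // One absolute path per line, ready to be pasted or read by another tool
        Files.write(dir.resolve("product-files.txt"), paths);
        System.out.println(paths.size() + " files listed");
    }
}
```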