I am currently performing experiments on text classification using Weka,
and I am interested in using Stacked Generalization applied to a
heuristic classifier I have built and some other classification schemes. I
have read in the papers suggested by the Weka book and documentation
(including the paper by Ting and Witten on "SG: when does it work?") that
the level-1 generalizer would optimally be a linear model. Among the Weka
classifiers able to induce linear models, which ones would you suggest as
level-1 learners for Stacking?
Thank you and best regards
Jose Maria Gomez Hidalgo
Departamento de Inteligencia Artificial
Universidad Europea de Madrid - CEES
28670 - Villaviciosa de Odon - MADRID Tfno: +34 91 664 78 00 Ext. 670
e-mail: jmgomez(a)dinar.esi.uem.es WWW: http://www.esi.uem.es/~jmgomez/
In case anyone is interested, I have modified the genetic algorithm
attribute selector so that you can limit the number of attributes it
selects. This is helpful when using the GA on datasets with very large
numbers of attributes such as micro-array gene expression experiments and
makes it much more feasible to use the GA as a "wrapper" feature selector
for time consuming classifiers.
If anyone would like to try it out, email me and I'll send you the code.
I'd welcome any suggestions for rewriting it in a more efficient and
aesthetically pleasing manner, but it does do the job...
I have a training sequence with around 160,000
vectors of dimension 40 and I am running my
first simulations on a PC with 160MB of RAM.
I guess it's slow mainly due to memory swapping and
I am considering modifying the code so that it
reads the training data sequentially. Do you think
it would speed things up? Any suggestions besides
buying more RAM :) ?
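For what it is worth, sequential reading only pays off if the learner can update itself one vector at a time; a scheme that needs the whole training set at once will still have to hold it in memory. Below is a minimal sketch of the idea in plain Java, assuming one comma-separated vector per line. The class StreamingMean and its methods are hypothetical names made up for illustration and are not part of Weka; a running mean stands in for whatever incremental update the real learner would perform.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical helper (not a Weka class): streams vectors and keeps only running statistics. */
final class StreamingMean {
    private final double[] mean;  // running mean per dimension
    private long count = 0;       // number of vectors seen so far

    StreamingMean(int dimension) {
        this.mean = new double[dimension];
    }

    /** Incremental update: mean += (x - mean) / n, so no vector has to be stored. */
    void update(double[] x) {
        count++;
        for (int i = 0; i < mean.length; i++) {
            mean[i] += (x[i] - mean[i]) / count;
        }
    }

    double[] mean() { return mean.clone(); }

    long count() { return count; }

    /** Reads one comma-separated vector per line; only the current line is held in memory. */
    static StreamingMean fromReader(Reader source, int dimension) {
        StreamingMean stats = new StreamingMean(dimension);
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.trim().isEmpty()) continue;  // skip blank lines
                String[] fields = line.split(",");
                double[] x = new double[dimension];
                for (int i = 0; i < dimension; i++) {
                    x[i] = Double.parseDouble(fields[i].trim());
                }
                stats.update(x);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return stats;
    }

    public static void main(String[] args) {
        // Assumption: a tiny in-memory sample stands in for the real training file.
        StreamingMean s = fromReader(new StringReader("1,2\n3,4\n5,6\n"), 2);
        System.out.println(s.count() + " vectors, mean[0] = " + s.mean()[0]);
    }
}
```

With a real file, a java.io.FileReader would replace the StringReader. Memory use then depends on the dimension (40 here) rather than on the 160,000 rows.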
Is there any way to obtain the domain (attribute) values of a numeric
attribute? It only seems possible for nominal and string attributes
(Attribute object extracted from Instances). Sorry about my ignorance...
Many thanks for your help!
All the best
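A numeric attribute has no enumerated domain the way a nominal one does, so the closest equivalent is the range (and, if needed, the distinct values) observed in the data. The sketch below shows that computation in plain Java on a column already copied into a double array. NumericDomain is a hypothetical helper, not a Weka class, and treating NaN as "missing" is an assumption.

```java
import java.util.TreeSet;

/** Hypothetical helper (not a Weka class): summarizes the observed values of one numeric column. */
final class NumericDomain {
    final double min;                // smallest observed value
    final double max;                // largest observed value
    final TreeSet<Double> distinct;  // sorted distinct observed values

    NumericDomain(double[] values) {
        double lo = Double.POSITIVE_INFINITY;
        double hi = Double.NEGATIVE_INFINITY;
        TreeSet<Double> seen = new TreeSet<>();
        for (double v : values) {
            if (Double.isNaN(v)) continue;  // assumption: NaN marks a missing value
            lo = Math.min(lo, v);
            hi = Math.max(hi, v);
            seen.add(v);
        }
        this.min = lo;
        this.max = hi;
        this.distinct = seen;
    }

    public static void main(String[] args) {
        NumericDomain d = new NumericDomain(new double[] {2.5, 1.0, Double.NaN, 2.5, 4.0});
        System.out.println("range [" + d.min + ", " + d.max + "], "
                + d.distinct.size() + " distinct values");
    }
}
```

For a column with no non-missing values, min and max stay at positive and negative infinity, which a caller can use as an "empty" signal.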
I don't want to reinvent the wheel. I am looking for a Java class
library that allows me to manipulate data sets and to extract
information from them.
- Data set manipulation: feed in a comma-delimited data set and get out
20 data sets (10-fold cross-validation with stratification).
- Data set manipulation: categorize attribute X.
- Data set related information: extract from a data set the number of rows,
the number of attributes, the attribute domain values, etc.
It would be really nice to be able to use already existing classes to
shorten the implementation time. Many thanks for your help.
All the best
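To make the first request concrete: stratified 10-fold cross-validation assigns every row to one of 10 folds so that each class is spread evenly, and each fold then yields a test set (its own rows) and a training set (all other rows), hence 20 data sets. The following is a minimal sketch in plain Java that works on row indices only. StratifiedFolds is a hypothetical helper, not an existing library class, and the seed and label representation are assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Hypothetical helper: stratified k-fold assignment over row indices. */
final class StratifiedFolds {

    /** Returns a fold index (0..k-1) for every row, spreading each class evenly over the folds. */
    static int[] assign(List<String> labels, int k, long seed) {
        // Group row indices by class label.
        Map<String, List<Integer>> rowsByClass = new LinkedHashMap<>();
        for (int row = 0; row < labels.size(); row++) {
            rowsByClass.computeIfAbsent(labels.get(row), c -> new ArrayList<>()).add(row);
        }
        Random rng = new Random(seed);
        int[] fold = new int[labels.size()];
        int next = 0;  // keeps counting across classes so fold sizes stay even
        for (List<Integer> rows : rowsByClass.values()) {
            Collections.shuffle(rows, rng);  // randomize order within the class
            for (int row : rows) {
                fold[row] = next % k;        // deal rows out round-robin
                next++;
            }
        }
        return fold;
    }

    /** Row indices of the test set (inTest = true) or training set (inTest = false) of fold f. */
    static List<Integer> rows(int[] fold, int f, boolean inTest) {
        List<Integer> out = new ArrayList<>();
        for (int row = 0; row < fold.length; row++) {
            if ((fold[row] == f) == inTest) out.add(row);
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumption: made-up labels, 60 rows of class "a" and 40 of class "b".
        List<String> labels = new ArrayList<>();
        for (int i = 0; i < 60; i++) labels.add("a");
        for (int i = 0; i < 40; i++) labels.add("b");
        int[] fold = assign(labels, 10, 42L);
        for (int f = 0; f < 10; f++) {
            System.out.println("fold " + f + ": train=" + rows(fold, f, false).size()
                    + " test=" + rows(fold, f, true).size());
        }
    }
}
```

Writing the 20 files is then a matter of copying the selected rows of the comma-delimited input into one training file and one test file per fold.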
Is anyone looking into using wavelets? Just came across "Statistical
Modeling by Wavelets" by Brani Vidakovic, John Wiley & Sons. They look
pretty promising although the multivariate cases left me spinning :)