I am getting different results for classifiers in Weka and in Python.
I am interested in the ROC-AUC score: Weka gives me about 90% for Random
Forest, while Python gives around 80%.
I understand that Weka's ROC score is weighted, but that is a huge difference.
This is the configuration for Python:
forest = RandomForestClassifier(n_estimators=300, random_state=1)
I have also tried with no parameters set:
forest = RandomForestClassifier()
but still no luck...
Any help here?
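To make the comparison concrete, here is a minimal sketch of what I am doing on the Python side (synthetic data as a stand-in for my real set; I am assuming Weka's weighted ROC area corresponds to sklearn's `average="weighted"` one-vs-rest AUC):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import roc_auc_score

# Synthetic multi-class stand-in for my dataset
X, y = make_classification(n_samples=500, n_classes=3, n_informative=5,
                           random_state=1)
forest = RandomForestClassifier(n_estimators=300, random_state=1)
# Out-of-fold class probabilities from 10-fold CV, as in Weka's default setup
proba = cross_val_predict(forest, X, y, cv=10, method="predict_proba")

macro = roc_auc_score(y, proba, multi_class="ovr", average="macro")
# "weighted" averages the per-class AUCs by class frequency, which I assume
# is what Weka's weighted ROC area reports
weighted = roc_auc_score(y, proba, multi_class="ovr", average="weighted")
print(round(macro, 3), round(weighted, 3))
```

If the two averages differ noticeably on my data, that would explain at least part of the gap.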
Sent from: https://weka.8497.n7.nabble.com/
Hi Eibe and Peter
I am using AttributeSelectedClassifier for feature selection. I just want
to see how consistent the feature selection algorithms are across the
different folds of k-fold CV; I have no interest in evaluating the
prediction accuracy based on the selected features.
My question: is there still a role for the classifier (e.g. RF) in feature
selection? In other words, if I select features with RF as the classifier,
and then select features with SVM, can I expect a different subset of
features to be selected? Do feature selection algorithms use RMSE etc. as
a fitness function for selecting the features?
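To illustrate what I mean, here is a small scikit-learn analogue of wrapper-style selection (synthetic data; RFE here is only a stand-in for Weka's search, not Weka's actual algorithm), where swapping the base classifier can change the selected subset:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=200, n_features=10, n_informative=4,
                           random_state=0)
# Same search procedure (recursive feature elimination),
# two different base classifiers ranking the features
rf_sel = RFE(RandomForestClassifier(random_state=0),
             n_features_to_select=4).fit(X, y)
svm_sel = RFE(LinearSVC(dual=False), n_features_to_select=4).fit(X, y)

print("RF picked: ", rf_sel.support_)
print("SVM picked:", svm_sel.support_)
```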
Thank you in advance.
I'm a beginner in Weka. My research is text classification of Arabic
tweets using CNN and RNN deep learning, and I have already installed the
wekaDeeplearning4j package, so:
1. I collected a dataset of about 10,000 Arabic-language tweets from
Twitter for multi-label text classification into 4 classes (c, r, d, o).
2. I don't know which options to use in the Preprocess tab, especially
which filter (Dl4jStringToWord2Vec, or the Affective Tweets
TweetToEmbeddingsFeatureVector) to apply before going to the classifier,
or which embedding file for Arabic to load in the embedding handler.
3. In the Classify tab I chose Dl4jMlpClassifier:
3.1 In the layer specification I chose 4 layers, in order: convolution
layers (mode "Same"), a global pooling layer, and an output layer (4
classes).
3.2 For the instance iterator I chose CnnTextEmbeddingInstanceIterator,
and in its options I set the location of the word vector file.
THE PROBLEM: when I choose CnnTextEmbeddingInstanceIterator, the
Start/Run button is hidden or not active.
my config copy:weka.classifiers.functions.Dl4jMlpClassifier -S 1 -cache-mode MEMORY -early-stopping "weka.dl4j.earlystopping.EarlyStopping -maxEpochsNoImprovement 0 -valPercentage 0.0" -normalization "Standardize training data" -iterator "weka.dl4j.iterators.instance.sequence.text.cnn.CnnTextEmbeddingInstanceIterator -stopWords \"weka.dl4j.text.stopwords.Dl4jRainbow \" -tokenPreProcessor \"weka.dl4j.text.tokenization.preprocessor.CommonPreProcessor \" -tokenizerFactory \"weka.dl4j.text.tokenization.tokenizer.factory.DefaultTokenizerFactory \" -truncationLength 100 -wordVectorLocation \"D:\\\\RESALAH\\\\DEEP LEARNING\\\\cc.ar.300.bin.gz\" -bs 512" -iteration-listener "weka.dl4j.listener.EpochListener -eval true -n 5" -layer "weka.dl4j.layers.ConvolutionLayer -nFilters 64 -mode Same -cudnnAlgoMode PREFER_FASTEST -rows 5 -columns 5 -paddingColumns 0 -paddingRows 0 -strideColumns 1 -strideRows 1 -nOut 64 -activation \"weka.dl4j.activations.ActivationReLU \" -name \"Convolution layer 1\"" -layer "weka.dl4j.layers.ConvolutionLayer -nFilters 32 -mode Same -cudnnAlgoMode PREFER_FASTEST -rows 3 -columns 3 -paddingColumns 0 -paddingRows 0 -strideColumns 1 -strideRows 1 -nOut 32 -activation \"weka.dl4j.activations.ActivationReLU \" -name \"Convolution layer 2\"" -layer "weka.dl4j.layers.GlobalPoolingLayer -collapseDimensions true -pnorm 2 -poolingType MAX -name \"GlobalPooling layer\"" -layer "weka.dl4j.layers.OutputLayer -lossFn \"weka.dl4j.lossfunctions.LossMCXENT \" -nOut 4 -activation \"weka.dl4j.activations.ActivationSoftmax \" -name \"Output layer 2\"" -logConfig "weka.core.LogConfiguration -append true -dl4jLogLevel WARN -logFile C:\\Users\\Toshiba\\wekafiles\\wekaDeeplearning4j.log -nd4jLogLevel INFO -wekaDl4jLogLevel INFO" -config "weka.dl4j.NeuralNetConfiguration -biasInit 0.0 -biasUpdater \"weka.dl4j.updater.Sgd -lr 0.001 -lrSchedule \\\"weka.dl4j.schedules.ConstantSchedule -scheduleType EPOCH\\\"\" -dist \"weka.dl4j.distribution.Disabled \" -dropout 
\"weka.dl4j.dropout.Disabled \" -gradientNormalization None -gradNormThreshold 1.0 -l1 NaN -l2 NaN -minimize -algorithm STOCHASTIC_GRADIENT_DESCENT -updater \"weka.dl4j.updater.Adam -beta1MeanDecay 0.9 -beta2VarDecay 0.999 -epsilon 1.0E-8 -lr 0.001 -lrSchedule \\\"weka.dl4j.schedules.ConstantSchedule -scheduleType EPOCH\\\"\" -weightInit XAVIER -weightNoise \"weka.dl4j.weightnoise.Disabled \"" -numEpochs 50 -numGPUs 1 -averagingFrequency 10 -prefetchSize 24 -queueSize 0 -zooModel "weka.dl4j.zoo.CustomNet "
Hello Weka team,
I am using a dataset whose output variable is numeric and can hence be
predicted via regression models (e.g. Linear Regression). Now I want to
use Logistic Regression, so I need to make this output variable nominal.
My question is how I can do this via the Weka Explorer. I have used the
'Discretize' filter, but it does not work for me.
My aim is to evaluate whether the nominal or the numeric target provides
better results. I will use ML models such as Linear/Logistic Regression,
SVM, RF, etc.
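To show the kind of comparison I am after, here is a rough Python sketch (synthetic data; the equal-frequency binning is my own stand-in for discretizing the class attribute in Weka):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=1)
# Bin the numeric target into 3 equal-frequency classes,
# so the same data can feed a classifier instead of a regressor
bins = np.quantile(y, [1 / 3, 2 / 3])
y_nom = np.digitize(y, bins)

# Regression on the numeric target (R^2) vs. classification on the
# binned target (accuracy) -- the two scores are not directly comparable,
# but each tells me how well its own formulation works
r2 = cross_val_score(LinearRegression(), X, y, cv=10).mean()
acc = cross_val_score(LogisticRegression(max_iter=1000), X, y_nom, cv=10).mean()
print(round(r2, 2), round(acc, 2))
```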
I want to know why, when we select different seeds (i.e. 1 to 5) and then
select features, we get different subsets of features for different seeds,
yet for some datasets the same subset of features is returned regardless
of the seed used. Do you know why this happens? Why does the seed matter
for some data, while for other data it does not affect the results?
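Here is a small scikit-learn sketch of the effect I am describing (synthetic data; RFE with a Random Forest is only a stand-in for Weka's attribute selection). The seed changes which rows form the training sample, and the question is whether the selected subset changes with it:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, n_features=8, n_informative=3,
                           random_state=0)
subsets = set()
for seed in range(1, 6):
    # The seed only changes which rows land in the training sample
    X_tr, _, y_tr, _ = train_test_split(X, y, train_size=0.8,
                                        random_state=seed)
    sel = RFE(RandomForestClassifier(random_state=0),
              n_features_to_select=3).fit(X_tr, y_tr)
    subsets.add(tuple(sel.support_))

# 1 distinct subset -> selection is stable across seeds; >1 -> it is not
print(len(subsets))
```

My guess is that when the informative features dominate strongly, every resample picks the same subset; when the signal is weak, the subset becomes sample-dependent.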
I'm trying to use the trees.CDT algorithm and receive this error:
Exception in thread "Thread-8" java.lang.IllegalAccessError: tried to access
method weka.classifiers.trees.REPTree$Tree.singleVariance(DDD)D from class
Can anyone help me understand what's wrong?
I have a list of differentially expressed candidates from a proteomics
study of a disease vs. healthy dataset.
Now I would like to perform a machine-learning analysis (similar to
lasso-penalized regression) on the differentially expressed candidates to
find the panel of markers that can classify diseased vs. healthy subjects,
with an AUC value.
Could you please help me with the following?
- how to analyse this
- any reference example from the Weka data that I can use to build my own
matrix to upload to a Weka experiment?
- steps to follow
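In case it helps, this is the kind of analysis I have in mind, sketched in scikit-learn (a synthetic matrix as a stand-in for my proteomics data; rows = subjects, columns = candidate proteins):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict

# Stand-in matrix: rows = subjects, columns = candidate proteins
X, y = make_classification(n_samples=100, n_features=50, n_informative=5,
                           random_state=0)
# L1 (lasso-style) penalty shrinks uninformative markers to exactly zero
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
# Cross-validated probabilities give an honest AUC estimate
proba = cross_val_predict(clf, X, y, cv=5, method="predict_proba")[:, 1]
auc = roc_auc_score(y, proba)

panel = np.flatnonzero(clf.fit(X, y).coef_)  # markers kept by the penalty
print("AUC:", round(auc, 2), "| markers in panel:", panel.size)
```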
Hi Peter and Eibe
I was reading an article which mentioned that 'we performed experiments
with the same training sample of a dataset and then repeated the
experiment with a different training sample of the same dataset' using
Weka.
My question: if we use the whole dataset with 10-fold CV, that is data
from the same training sample; but what does it mean to use a different
training sample from the same dataset?
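For what it's worth, here is how I currently picture "a different training sample of the same dataset" in a small Python sketch (my own interpretation, using scikit-learn's train_test_split):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
# Two runs on the SAME dataset, but with different random seeds:
# each draw is a different "training sample" of that dataset
tr1, _, _, _ = train_test_split(X, y, train_size=0.7, random_state=1)
tr2, _, _, _ = train_test_split(X, y, train_size=0.7, random_state=2)

print(tr1.shape == tr2.shape)  # same size of training sample
print((tr1 == tr2).all())      # but (almost surely) not the same rows
```

Is that what the article means, or is it something else entirely?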