Thank you for your help!
I have downloaded both ARFF files, and they load perfectly. However,
StringToWordVector still doesn't work: various instances cause errors
in this filter.
I tried using Pyscript to replace some symbols like "+" to see whether
they are the cause. The script works in the pyScripting console, but it
never finishes executing inside KnowledgeFlow's PyScriptExecutor.
Are you able to make the StringToWordVector filter work on the 'review'
column of the amazon_baby dataset?
Does Weka have an easy-to-use filter for replacing symbols in strings?
It's probably because of all the "" on
"Lamaze Peekaboo, I Love You","we just got this book for our one-year-old
and she loves it. It\'s so nice that she can\'t bite chunks out of it
she can a board book or rip it like a paper one. She
can chew on it, pull
on it, and carry it all over the house! She loves the little flaps that
open and is learning simple words like ""daddy,""
We love it.",5
I don't know how Weka's CSV parsing works, but this is likely tripping it up.
Usually there's a setting to indicate that you're doing quoting, but if you
don't want to deal with that, you can probably strip most of the "" out of
there, since you probably don't need them. To get a feel for how things
work, try loading that one line on its own and mess around with it until you
get through all the error messages.
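As a rough sketch of the "strip most of the "" out" idea, something like the following Python could be run on the file before handing it to Weka. It assumes the file is ordinary CSV with doubled quotes inside quoted cells; the function name strip_embedded_quotes and the sample row are made up for illustration, and this is not something Weka itself provides.

```python
import csv
import io

def strip_embedded_quotes(src, dst):
    """Copy CSV rows from src to dst, dropping double quotes inside fields.

    Hypothetical helper: src and dst are text file objects (open real
    files with newline="" so the csv module handles line endings).
    """
    # The default csv dialect reads "" inside a quoted cell as one literal ".
    reader = csv.reader(src)
    # QUOTE_MINIMAL only quotes cells that still need it (e.g. contain a comma).
    writer = csv.writer(dst, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    for row in reader:
        writer.writerow([field.replace('"', "") for field in row])

# Made-up sample in the same shape as the amazon_baby rows.
sample = 'name,review,rating\n"Book","words like ""daddy,"" We love it.",5\n'
out = io.StringIO()
strip_embedded_quotes(io.StringIO(sample), out)
print(out.getvalue())
```

Feeding it a file containing only the row that Weka complains about is a cheap way to check whether the quotes are really the problem.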
Weka uses a backslash for escaping double quotes within text. In the
past, it couldn't handle the more common technique of doubling up
quotes within cells (used by LibreOffice, Excel, etc).
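If keeping the quotes matters, a small Python sketch along these lines could rewrite doubled quotes as backslash-escaped ones before loading. It assumes the input is standard CSV as written by Excel or LibreOffice; the function name to_backslash_escaped and the sample row are hypothetical, and whether the result loads cleanly in a given Weka version is untested here.

```python
import csv
import io

def to_backslash_escaped(src, dst):
    """Rewrite CSV so that quotes inside cells appear as \\" instead of "".

    Hypothetical helper: src and dst are text file objects (open real
    files with newline="").
    """
    # Reading with the default dialect turns "" back into a single ".
    reader = csv.reader(src)
    # doublequote=False plus escapechar makes the writer emit \" for each ".
    writer = csv.writer(
        dst,
        doublequote=False,
        escapechar="\\",
        quoting=csv.QUOTE_MINIMAL,
        lineterminator="\n",
    )
    for row in reader:
        writer.writerow(row)

# Made-up sample row using the doubled-quote convention.
sample = 'name,review,rating\n"Book","words like ""daddy,"" We love it.",5\n'
out = io.StringIO()
to_backslash_escaped(io.StringIO(sample), out)
print(out.getvalue())
```

Cells that already contain backslashes (such as the \' in the review text above) are worth checking by hand afterwards, since how the csv module writes an existing backslash has differed between Python versions.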
I loaded the CSV files using the Spreadsheet file viewer in ADAMS and
saved them as ARFF files:
Let me know when you've downloaded them.
Dept. of Computer Science
University of Waikato, NZ
+64 (7) 858-5174
2016-12-04 21:18 GMT+08:00 <wekalist-request(a)list.waikato.ac.nz>:
1. error on Loading large text csv file to weka (Daniel LIAO)
2. Re: error on Loading large text csv file to weka (Archie Russell)
3. Re: How to install Groovy to have access to Groovy console?
4. Re: error on Loading large text csv file to weka (Peter Reutemann)
5. Can we change the spliting criterion in classifiers (MAROU)
Date: Sun, 4 Dec 2016 08:43:14 +0800
From: Daniel LIAO <pystrategyexplorer(a)gmail.com>
Subject: [Wekalist] error on Loading large text csv file to weka
Dear Weka people,
I have encountered two problems when loading and filtering large files (mainly
1. I can't load amazon_baby.csv; Weka
reported an error message saying only "4 problem encountered on line:
34", and I have no idea what went wrong.
2. I am able to load the dataset people_wiki.csv
<https://drive.google.com/open?id=0BwfMX-a2pWKsa3luLUZacHVCcVk> in Explorer
and filter it with StringToWordVector (on the text attribute), but it
takes a few minutes to load and process. Although Weka finished without any
error message, I didn't get any filtered result. Is that because the file is
too large? Is there a way to load and filter quickly, for example by
processing it instance by instance in Weka?
Thanks in advance,