Hi Felipe,

Could you please help me one more time?

I'm extracting word embedding vectors using the pre-trained embeddings from w2v.twitter.edinburgh10M.400d.csv.gz, as recommended in the AffectiveTweets documentation:

...
TweetToEmbeddingsFeatureVector weFilter = new TweetToEmbeddingsFeatureVector();

// Point the filter at the downloaded pre-trained embeddings.
CSVEmbeddingHandler handler = new CSVEmbeddingHandler();
handler.setEmbeddingsFile(new File("/home/joncarv/test-embeddings/w2v.twitter.edinburgh10M.400d.csv.gz"));
weFilter.setEmbeddingHandler(handler);

// Set the input format only after the filter is fully configured,
// then apply it to the data.
weFilter.setInputFormat(instances);
instances = Filter.useFilter(instances, weFilter);
...

Is this correct?

Thanks a lot!

On Wed, 19 Sep 2018 at 22:34, Felipe Bravo <felipebravom@gmail.com> wrote:
Yes, you should get the same attributes on both the training and testing sets.
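A minimal sketch of one way to do this in Java, assuming ARFF files called train.arff and test.arff with the class attribute last; the AffectiveTweets import paths are assumed, and the handler/setter calls follow the snippet you posted. Configure the filter once, call setInputFormat() on the training data, and reuse the same filter instance for both sets:

import java.io.File;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
// Import paths below are assumed from the AffectiveTweets package layout.
import weka.filters.unsupervised.attribute.TweetToEmbeddingsFeatureVector;
import affective.core.CSVEmbeddingHandler;

public class EmbeddingTrainTestSketch {
  public static void main(String[] args) throws Exception {
    Instances train = DataSource.read("train.arff"); // illustrative file names
    Instances test = DataSource.read("test.arff");
    train.setClassIndex(train.numAttributes() - 1);  // assumes the class is the last attribute
    test.setClassIndex(test.numAttributes() - 1);

    // Configure the filter, then set the input format on the TRAINING data only.
    TweetToEmbeddingsFeatureVector weFilter = new TweetToEmbeddingsFeatureVector();
    CSVEmbeddingHandler handler = new CSVEmbeddingHandler();
    handler.setEmbeddingsFile(new File("w2v.twitter.edinburgh10M.400d.csv.gz"));
    weFilter.setEmbeddingHandler(handler);
    weFilter.setInputFormat(train);

    // Reusing the same filter instance keeps the attributes identical on both sets.
    Instances filteredTrain = Filter.useFilter(train, weFilter);
    Instances filteredTest = Filter.useFilter(test, weFilter);
    System.out.println("Same header: " + filteredTrain.equalHeaders(filteredTest));
  }
}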
Cheers

On Thu, Sep 20, 2018 at 6:57 AM Jonnathan Carvalho <joncarv@gmail.com> wrote:
Hi Felipe,

So, I can use the TweetToEmbeddingsFeatureVector to calculate features for the tweets in the test set independently of the training set...
Is that correct?

Thanks a lot!

Cheers,
Jonnathan

On Mon, 17 Sep 2018 at 00:07, Felipe Bravo <felipebravom@gmail.com> wrote:
Hi Jonnathan,
Word embeddings are usually trained from large corpora. The TweetToEmbeddingsFeatureVector filter will calculate features from pre-trained word embeddings in CSV format. Words from the training and testing sets that are not included in the embeddings file will be discarded. The AffectiveTweets package provides embeddings trained from a large corpus of tweets, which can be downloaded from the following link: https://github.com/felipebravom/AffectiveTweets/releases/download/1.0.0/w2v.twitter.edinburgh10M.400d.csv.gz
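Since you mention training an SVM, one way to use these features from Java is to wrap the filter together with Weka's SMO classifier in a FilteredClassifier, so the embedding attributes are computed the same way for the training data and for any data you later classify. This is only a sketch: the train.arff/test.arff file names and class-attribute position are made up for illustration, the AffectiveTweets import path is assumed, and the filter is left at its default settings (set a CSVEmbeddingHandler on it, as elsewhere in this thread, to use the 400-dimensional file instead):

import weka.classifiers.Evaluation;
import weka.classifiers.functions.SMO;
import weka.classifiers.meta.FilteredClassifier;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
// Import path assumed from the AffectiveTweets package layout.
import weka.filters.unsupervised.attribute.TweetToEmbeddingsFeatureVector;

public class EmbeddingSvmSketch {
  public static void main(String[] args) throws Exception {
    Instances train = DataSource.read("train.arff"); // illustrative file names
    Instances test = DataSource.read("test.arff");
    train.setClassIndex(train.numAttributes() - 1);  // assumes the class is the last attribute
    test.setClassIndex(test.numAttributes() - 1);

    // Default settings use the bundled 100-dimensional embeddings.
    TweetToEmbeddingsFeatureVector weFilter = new TweetToEmbeddingsFeatureVector();

    // FilteredClassifier applies the filter to the training data and to every
    // instance it later classifies, so the feature space stays consistent.
    FilteredClassifier fc = new FilteredClassifier();
    fc.setFilter(weFilter);
    fc.setClassifier(new SMO()); // Weka's SVM implementation
    fc.buildClassifier(train);

    Evaluation eval = new Evaluation(train);
    eval.evaluateModel(fc, test);
    System.out.println(eval.toSummaryString());
  }
}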

You can also train your own embeddings using the WekaDeepLearning4j package.

Cheers,
Felipe

On Sun, Sep 16, 2018 at 10:49 AM Jonnathan Carvalho <joncarv@gmail.com> wrote:
Hi Felipe,

Thanks a lot for the explanation!!

Considering that when we extract word n-grams from a corpus, the vocabulary is based on the words that appear in the training instances, how does it work when we are using word embeddings?
More specifically, I have one dataset of tweets divided into training and test instances, and I need to extract word embeddings to train an SVM classifier.
How can I achieve this using the TweetToEmbeddingsFeatureVector? Do you have any example using Java code?

Cheers!
Jonnathan

On Thu, 23 Aug 2018 at 19:50, Felipe Bravo <felipebravom@gmail.com> wrote:
Hi Jonathan,
Word embeddings project discrete words into dense continuous vectors, with the aim of preserving word meaning in the embedding space. More details here: http://mccormickml.com/2016/04/19/word2vec-tutorial-the-skip-gram-model/
The TweetToEmbeddingsFeatureVector creates a sentence-level representation by aggregating the embedding values of the words within a sentence. Aggregation can be done by averaging, adding, or concatenating the word vectors. The default configuration of the filter uses pre-trained word vectors of 100 dimensions and averages the word vectors within a sentence. This is why you are getting 100 attributes (embedding-0, embedding-1, etc.).
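To make the averaging concrete, here is a small self-contained sketch with made-up two-dimensional vectors (the real filter uses 100- or 400-dimensional pre-trained ones); words with no entry in the embedding table are simply skipped, which mirrors how out-of-vocabulary words are discarded:

import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class AveragingSketch {
  public static void main(String[] args) {
    // Toy 2-dimensional "embeddings", made up purely for illustration.
    Map<String, double[]> embeddings = new HashMap<>();
    embeddings.put("good", new double[]{0.8, 0.1});
    embeddings.put("movie", new double[]{0.2, 0.5});

    String[] tweet = {"good", "movie", "zzzunknownzzz"}; // last word is out of vocabulary

    double[] sum = new double[2];
    int found = 0;
    for (String word : tweet) {
      double[] vec = embeddings.get(word);
      if (vec == null) continue; // out-of-vocabulary words are skipped
      for (int i = 0; i < sum.length; i++) sum[i] += vec[i];
      found++;
    }
    for (int i = 0; i < sum.length; i++) sum[i] /= found;

    // These averaged values play the role of the embedding-0, embedding-1, ...
    // attributes produced by the filter.
    System.out.println(Arrays.toString(sum)); // prints approximately [0.5, 0.3]
  }
}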
You can also train your own word embeddings using the filters provided by the WekaDeepLearning4j package.
I hope this helps.
Cheers,
Felipe

On Thu, Aug 23, 2018 at 3:16 PM Jonnathan Carvalho <joncarv@gmail.com> wrote:
Hi Felipe,

As you have suggested, I used the AffectiveTweets package to get the word embeddings for tweets in Weka, using its default parameters, but I couldn’t understand what the generated dimensions mean (from embedding-0 to embedding-99)...

Do you recommend any reading?

Thanks a lot!

Cheers,
Jonnathan.

On 22 Aug 2018, at 01:16, Felipe Bravo <felipebravom@gmail.com> wrote:

Hi,
Yes, you can get a document-level representation from pre-trained embeddings using the AffectiveTweets package (https://github.com/felipebravom/AffectiveTweets), or you can even train your own embeddings using the WekaDeepLearning4j package (https://deeplearning.cms.waikato.ac.nz/).
Cheers,
Felipe

On Wed, Aug 22, 2018 at 11:03 AM Jonnathan Carvalho <joncarv@gmail.com> wrote:
Hi, All!

I'm trying to figure out what word embeddings are...

Is it possible to use the dense feature representation generated by this technique with learning algorithms such as SVM? Or with neural networks only?

Does Weka support word embeddings?

Thanks!
Cheers!


--
Jonnathan Carvalho
Instituto Federal de Educação, Ciência e Tecnologia Fluminense (RJ)