Extract word matrix and context matrix from output of trainWordEmbedding / word2vec

Daniel Ringel on 13 Jul 2018
Answered: Jayanti on 14 Feb 2025 at 14:21
When I train a word embedding on a set of documents with trainWordEmbedding, I get an object "emb" as output that I can pass to word2vec. word2vec then returns a vector for each word, which I can process further.
However, I would also like to obtain the underlying word matrix and context matrix, as well as the value of the training loss. Does anyone know how I can access these data?
Comments: 1
Christopher Creutzig on 26 Nov 2018
What exactly do you mean by “word matrix” and “context matrix”?
I guess the “context matrix” is what (some) other people call the cooccurrence matrix in the skip-gram model? We do not currently have a way to compute that.


Answers (1)

Jayanti on 14 Feb 2025 at 14:21
Hi Daniel,
By word matrix I assume you want the unique words in the documents. When you train a word embedding on a set of documents with “trainWordEmbedding”, it returns a wordEmbedding object, here called “emb”. This object has a property named “Vocabulary”, which holds the unique words in the model as a string vector. You can access these words with the following code:
emb = trainWordEmbedding(filename);
words = emb.Vocabulary;
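If "word matrix" instead means the matrix of learned word vectors, one way to assemble it, as a minimal sketch, is to pass the whole vocabulary to word2vec. The toy corpus and the training options below are assumptions chosen only so the example runs on its own; they are not taken from the question.

```matlab
% Toy corpus (assumption, for illustration only).
textData = [
    "the quick brown fox jumps over the lazy dog"
    "the lazy dog sleeps while the quick fox runs"
    "a quick brown dog jumps over a lazy fox"];
documents = tokenizedDocument(textData);

% MinCount is lowered to 1 so the rare words of this tiny corpus are kept.
emb = trainWordEmbedding(documents, ...
    'Dimension', 10, 'MinCount', 1, 'NumEpochs', 5, 'Verbose', 0);

words = emb.Vocabulary;             % unique words as a string vector
wordMatrix = word2vec(emb, words);  % one row per word, one column per dimension
disp(size(wordMatrix))
```

Row k of wordMatrix is the vector for words(k). This gives only the final word vectors; I am not aware of a documented way to read separate context-side weights or the training loss from the object.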
By context matrix I assume you mean the co-occurrence matrix. However, I couldn't find any documentation on accessing a co-occurrence matrix directly through “trainWordEmbedding” or “word2vec”.
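If a co-occurrence matrix is what you need, it can be counted from the raw text independently of the embedding. The following is only a rough sketch in base MATLAB: the two example sentences, the whitespace tokenization, and the symmetric window size are all assumptions, and the result is not what “trainWordEmbedding” uses internally.

```matlab
% Example text and window size (assumptions, for illustration only).
textData = ["the cat sat on the mat"; "the dog sat on the log"];
window = 2;  % count words at most 2 positions apart

% Split each document on whitespace into a row vector of tokens.
tokens = arrayfun(@(s) split(s).', textData, 'UniformOutput', false);
vocab = unique([tokens{:}]);  % sorted unique words
numWords = numel(vocab);

C = zeros(numWords);  % C(a,b) = number of times word b occurs near word a
for d = 1:numel(tokens)
    [~, idx] = ismember(tokens{d}, vocab);  % vocabulary index of each token
    n = numel(idx);
    for i = 1:n
        lo = max(1, i - window);
        hi = min(n, i + window);
        for j = [lo:i-1, i+1:hi]  % neighbours inside the window, excluding i
            C(idx(i), idx(j)) = C(idx(i), idx(j)) + 1;
        end
    end
end
```

Rows and columns of C follow the order of vocab, so C(vocab == "sat", vocab == "on") is the count for that pair. For a large vocabulary a sparse matrix would be a better fit than zeros(numWords).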
Hope this will be helpful!

Release

R2017b
