Extract word matrix and context matrix from output of trainWordEmbedding / word2vec
When I use trainWordEmbedding to train a word embedding on a set of documents, I get an object "emb" as output that I can pass to word2vec. word2vec then returns, for each word, the vector that I can process further.
However, I would also like to obtain the underlying word matrix and context matrix, as well as the value of the training loss. Does anyone know how I can access this data?
Comments: 1
Christopher Creutzig
26 November 2018
What exactly do you mean by “word matrix” and “context matrix”?
I guess the “context matrix” is what (some) other people call the cooccurrence matrix in the skip-gram model? We do not currently have a way to compute that.
Answers (1)
Jayanti
14 February 2025, 14:21
Hi Daniel,
By "word matrix" I assume you mean the unique words in the documents. When you use “trainWordEmbedding” to train a word embedding on a set of documents, it returns a wordEmbedding object, here called “emb”. This object has a property named “Vocabulary”, which holds the unique words in the model as a string vector. You can access these words with the following code:
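% filename: path to a text file containing the training corpus
emb = trainWordEmbedding(filename);
% Vocabulary lists the unique words in the trained embedding
words = emb.Vocabulary;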
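If "word matrix" instead means the matrix of learned word vectors, a minimal sketch is to pass the whole vocabulary to word2vec, which returns one row per word. The toy corpus and the option values below (Dimension, MinCount, Verbose) are assumptions chosen only so the example runs on a few sentences; they are not recommended settings.

```matlab
% Toy corpus (assumption for illustration); real use needs far more text
textData = [
    "the quick brown fox jumps over the lazy dog"
    "the lazy dog sleeps while the quick fox runs"
    "a brown dog and a brown fox play together"];
documents = tokenizedDocument(textData);

% MinCount=1 keeps every word; the default would discard rare words
emb = trainWordEmbedding(documents, ...
    'Dimension', 10, 'MinCount', 1, 'Verbose', 0);

% Stack the vectors of all vocabulary words into one matrix
words = emb.Vocabulary;      % string vector of unique words
M = word2vec(emb, words);    % numel(words)-by-Dimension matrix

% Row k of M is the embedding vector of words(k)
disp(size(M))
```

This gives the input-side embedding only; it does not expose any separate context (output) weights or the training loss.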
By "context matrix" I assume you mean the co-occurrence matrix. However, I couldn't find any documentation on accessing a co-occurrence matrix directly through “trainWordEmbedding” or “word2vec”.
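If a co-occurrence matrix is what you need, one workaround is to count it yourself from the tokenized documents. The sketch below is an assumption about what "co-occurrence" should mean here (symmetric counts within a fixed window); it is not what trainWordEmbedding computes internally, and the corpus and windowSize are made up for illustration.

```matlab
% Toy corpus and window size (assumptions for illustration)
textData = [
    "the quick brown fox"
    "the lazy brown dog"];
documents = tokenizedDocument(textData);
windowSize = 2;

% Use the bag-of-words vocabulary as the word index
bag = bagOfWords(documents);
vocab = bag.Vocabulary;
V = numel(vocab);

% C(i,j) counts how often vocab(j) appears within windowSize
% tokens of vocab(i), over all documents
C = sparse(V, V);
tokens = doc2cell(documents);
for d = 1:numel(tokens)
    [~, idx] = ismember(tokens{d}, vocab);
    n = numel(idx);
    for i = 1:n
        lo = max(1, i - windowSize);
        hi = min(n, i + windowSize);
        for j = [lo:i-1, i+1:hi]
            C(idx(i), idx(j)) = C(idx(i), idx(j)) + 1;
        end
    end
end
```

Updating a sparse matrix element by element is slow, so for a large corpus it would be better to collect the index pairs first and build C with a single call to sparse.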
Hope this will be helpful!