fastText word embedding support package

Asked by Peter Mayhew on 24 Nov 2018
Commented: Ismat Mohd Sulaiman on 7 Jul 2021
I'm using the Text Analytics Toolbox and the pretrained fastText word embedding support package. Is it possible to add additional words to the pretrained vocabulary?

Accepted Answer

Answered by Peter Mayhew on 28 Nov 2018
To answer my own question, the following example code shows how to add words to the embedding vocabulary. This requires a new embedding object to be created.
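% Load the pretrained fastText embedding and extract its vocabulary and vectors.
emb = fastTextWordEmbedding;
vocab = emb.Vocabulary;
mat = word2vec(emb, vocab);
% Append the new words and give each one a random 300-dimensional vector.
newvocab = [vocab "New Word 1" "New Word 2"];
newmat = [mat; randn(2,300)];
% Build a new embedding object from the extended vocabulary and matrix.
newemb = wordEmbedding(newvocab, newmat);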
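As a quick check, the same approach can be written without hard-coding the vector length, followed by a test that the new words are really in the vocabulary. This is only a sketch: it assumes the Text Analytics Toolbox and the fastText support package are installed, and the variable names newWords, tf and vecs are mine, not part of the toolbox.

```matlab
% Sketch: extend the pretrained embedding and verify the added words.
% Requires Text Analytics Toolbox and the fastText support package.
emb = fastTextWordEmbedding;
newWords = ["New Word 1" "New Word 2"];   % illustrative words from the answer above

% Use emb.Dimension instead of a hard-coded 300 for the random vectors.
newVectors = randn(numel(newWords), emb.Dimension);
newemb = wordEmbedding([emb.Vocabulary newWords], ...
    [word2vec(emb, emb.Vocabulary); newVectors]);

% Both words should now be vocabulary members with a vector each.
tf = isVocabularyMember(newemb, newWords);
vecs = word2vec(newemb, newWords);
disp(tf)
disp(size(vecs))
```

Note that the random vectors carry no meaning, so the new words will not sit near related words in the embedding space until they are given more sensible values.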
In addition, I have confirmed that it is possible to use the pretrained fastText vectors with 2 million words (600 billion tokens) instead of the default 1 million words (16 billion tokens) provided with the MATLAB fastTextWordEmbedding function.
To do this, replace the "wiki-news-300d-1M.vec.zip" file with the alternative pre-trained word vectors file from https://fasttext.cc/docs/en/english-vectors.html
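If you would rather not replace the file inside the support package, it may also be possible to read the larger vectors directly with readWordEmbedding. This is a sketch under assumptions I have not stated above: that the download from that page is named crawl-300d-2M.vec.zip and has been unzipped into the current folder. Adjust the file name to whatever you actually downloaded.

```matlab
% Sketch: load the larger fastText vectors straight from the text file.
% Assumption: the unzipped file is crawl-300d-2M.vec in the current folder;
% this name is not given in the original answer.
filename = "crawl-300d-2M.vec";

% readWordEmbedding reads pretrained embeddings stored in word2vec/GloVe text format.
emb = readWordEmbedding(filename);

% Inspect the embedding dimension and the vocabulary size.
disp(emb.Dimension)
disp(numel(emb.Vocabulary))
```

Reading a file of this size can take a while and needs a fair amount of memory.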
