How was the exampleWordEmbedding example in the text analytics toolbox trained, in detail?

Question

William Smith on 19 Nov 2017

Answered: Christopher Creutzig on 9 Mar 2020

The documentation for readWordEmbedding gives a pre-trained embedding, saying only that it was "derived by analyzing text from Wikipedia".

How was it trained?

Should we consider it a 'high quality' word embedding, better than anything a user could generate without extensive work and CPU time? Or is it a quick and dirty starting point, and we are encouraged to train our own for better performance?

Answer 1

Christopher Creutzig on 9 Mar 2020

The embedding is rather low-dimensional (50 dimensions) and has a small vocabulary (with 9999 words). It is unlikely to be “high quality” unless your analysis just happens to need precisely this dataset.
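
To check those figures against your own use case, here is a minimal sketch. It assumes Text Analytics Toolbox is installed and that the example file is on the path as exampleWordEmbedding.vec, as in the readWordEmbedding documentation. The word list myWords is a made-up placeholder for whatever terms your analysis needs.

```matlab
% Load the example embedding that ships with Text Analytics Toolbox.
emb = readWordEmbedding("exampleWordEmbedding.vec");

% Report the embedding dimension and vocabulary size.
dim = emb.Dimension;
vocabSize = numel(emb.Vocabulary);
fprintf("Dimension: %d, vocabulary size: %d\n", dim, vocabSize);

% Hypothetical word list: replace with terms from your own data.
myWords = ["king" "queen" "embedding" "tokenization"];

% Logical array, true where the word is in the embedding vocabulary.
covered = isVocabularyWord(emb, myWords);
disp(table(myWords', covered', 'VariableNames', {'Word', 'InVocabulary'}));
```

If many of your terms come back as out of vocabulary, that is a fairly direct sign this example embedding is too small for the task.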

For production use, you are much more likely to find fastTextWordEmbedding useful. It downloads its data from https://www.mathworks.com/matlabcentral/fileexchange/66229-text-analytics-toolbox-model-for-fasttext-english-16-billion-token-word-embedding for you.
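
A short sketch of what that looks like in practice, assuming the fastText support package linked above is installed (MATLAB offers a download link if it is not). The query word is only an illustration.

```matlab
% Load the pretrained fastText embedding (needs the support package
% "Text Analytics Toolbox Model for fastText English 16 Billion Token
% Word Embedding").
emb = fastTextWordEmbedding;
fprintf("Dimension: %d, vocabulary size: %d\n", ...
    emb.Dimension, numel(emb.Vocabulary));

% Quick sanity check: list the nearest neighbours of a query word.
queryWord = "king";                 % hypothetical query
vec = word2vec(emb, queryWord);     % 1-by-Dimension word vector
neighbors = vec2word(emb, vec, 5);  % five closest vocabulary words
disp(neighbors);
```

Running the same neighbour query against the example embedding gives a quick, informal feel for the quality difference on words that matter to you.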