readW2Vbin - MATLAB utility to read a binary word2vec embedding model file
Use `readW2Vbin` to read a pre-trained word2vec word embedding model stored in binary format. The function assumes the file has the following layout:
- The bytes before the first `0x20` (space) are ASCII digits giving the vocabulary size (number of words) of the model, and the bytes between the first `0x20` and the first `0x0A` (newline) give the dimension of the word vectors. For example, the byte sequence `[51 48 48 48 48 48 48 32 51 48 48 10]` (decimal byte values, i.e. the text `3000000 300` followed by a newline) means 3 million words embedded in 300 dimensions.
- The main body, a sequence of word-vector pairs, begins right after the newline character. Each word-vector pair consists of the bytes of the word, a space (`0x20`), and the embedding vector for that word as binary single-precision (32-bit) floating-point values. The vector data is therefore 4 bytes times the number of dimensions long (e.g., 1,200 bytes for 300 dimensions).
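To make the layout above concrete, here is a minimal Python sketch that parses it. It is not the MATLAB implementation of `readW2Vbin`; the names `read_w2v_bin` and `read_token` are hypothetical. It assumes the floats are little-endian, that words are UTF-8, and that a newline may precede each word (some writers emit one after every vector), so leading newlines are dropped.

```python
import io
import struct
from typing import BinaryIO, List, Tuple


def read_token(f: BinaryIO, delimiter: bytes) -> bytes:
    """Return the bytes before `delimiter`, consuming the delimiter itself."""
    buf = bytearray()
    while True:
        ch = f.read(1)
        if not ch:
            raise EOFError("unexpected end of file")
        if ch == delimiter:
            return bytes(buf)
        buf += ch


def read_w2v_bin(f: BinaryIO) -> Tuple[List[str], List[Tuple[float, ...]]]:
    """Parse a binary word2vec stream into parallel lists of words and vectors."""
    # Header: ASCII vocabulary size, space (0x20), ASCII dimension, newline (0x0A).
    vocab_size = int(read_token(f, b" ").decode("ascii"))
    dim = int(read_token(f, b"\n").decode("ascii"))
    # Assumption: float32 values are stored little-endian.
    vec_fmt = "<%df" % dim
    words: List[str] = []
    vectors: List[Tuple[float, ...]] = []
    for _ in range(vocab_size):
        # Word bytes up to the space; assumption: drop any leading newline
        # left over from the end of the previous record.
        word = read_token(f, b" ").lstrip(b"\n")
        data = f.read(4 * dim)  # dim single-precision values, 4 bytes each
        if len(data) != 4 * dim:
            raise EOFError("truncated vector for word %r" % word)
        # Assumption: words are UTF-8; undecodable bytes are replaced.
        words.append(word.decode("utf-8", errors="replace"))
        vectors.append(struct.unpack(vec_fmt, data))
    return words, vectors


# Usage: a tiny in-memory model with 2 words in 3 dimensions.
sample = (b"2 3\n"
          + b"cat " + struct.pack("<3f", 1.0, 2.0, 3.0) + b"\n"
          + b"dog " + struct.pack("<3f", 0.5, -1.0, 0.0))
words, vectors = read_w2v_bin(io.BytesIO(sample))
print(words)  # ['cat', 'dog']
```

Reading one byte at a time keeps the sketch short but would be slow on a multi-gigabyte file; a practical reader would buffer its input.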
This function was tested with "GoogleNews-vectors-negative300.bin" from the word2vec website (https://code.google.com/archive/p/word2vec/). Reading the 3.5 GB file took about a minute.
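As a rough check on why the file is this large, the vector data alone for a model with 3 million words in 300 dimensions (the header example given earlier, which matches the GoogleNews model) works out as follows:

$$
\underbrace{3\,000\,000}_{\text{words}} \times \underbrace{300}_{\text{dimensions}} \times \underbrace{4\ \text{bytes}}_{\text{float32}} = 3.6 \times 10^{9}\ \text{bytes} \approx 3.35\ \text{GiB}
$$

The words and separators add comparatively little on top of this, so the vectors account for most of the file size.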
Cite As
Toru Ikegami (2026). matlab_word2vec_binary_reader (https://github.com/mathworks/matlab_word2vec_binary_reader/releases/tag/v1.2), GitHub. Retrieved .
General Information
- Version 1.2 (6.15 KB)
- View license on GitHub
MATLAB Release Compatibility
- Compatible with releases R2019b to R2020a
Platform Compatibility
- Windows
- macOS
- Linux
| Version | Release notes |
|---|---|
| 1.2 | See release notes for this release on GitHub: https://github.com/mathworks/matlab_word2vec_binary_reader/releases/tag/v1.2 |
| 1.1 | See release notes for this release on GitHub: https://github.com/mathworks/matlab_word2vec_binary_reader/releases/tag/v1.1 |
| 1.0 | |
