writeWordEmbedding

Write word embedding file

Description

writeWordEmbedding(emb,filename) writes the word embedding emb to the file filename. The function writes the file in word2vec text format, with the vocabulary encoded in UTF-8.

Examples

Train a word embedding and write it to a text file.

Load the example data. The file sonnetsPreprocessed.txt contains preprocessed versions of Shakespeare's sonnets. The file contains one sonnet per line, with words separated by a space. Extract the text from sonnetsPreprocessed.txt, split the text into documents at newline characters, and then tokenize the documents.

filename = "sonnetsPreprocessed.txt";
str = extractFileText(filename);
textData = split(str,newline);
documents = tokenizedDocument(textData);

Train a word embedding using trainWordEmbedding.

emb = trainWordEmbedding(documents)
Training: 100% Loss: 2.72515  Remaining time: 0 hours 0 minutes.
emb = 
  wordEmbedding with properties:

     Dimension: 100
    Vocabulary: ["thy"    "thou"    "love"    "thee"    "doth"    "mine"    "shall"    "eyes"    "sweet"    "time"    "nor"    "beauty"    "yet"    "art"    "heart"    "o"    "thine"    "hath"    "fair"    "make"    "still"    ...    ] (1x401 string)

Write the word embedding to a text file.

filename = "exampleSonnetsEmbedding.vec";
writeWordEmbedding(emb,filename)
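
To see what was written, the file can be opened as plain text. The following is a minimal sketch, not part of the original example. It assumes the file exists from the previous step and follows the common word2vec text layout (a header line, then one word followed by its vector values per line); the variable names fileLines and firstEntry are illustrative.

```matlab
% Assumes the previous step already wrote this file to the current folder.
filename = "exampleSonnetsEmbedding.vec";

% Read the whole file as text and split it into lines.
txt = string(fileread(filename));
fileLines = splitlines(txt);

% Show the first line. In the common word2vec text layout this is a
% header giving the vocabulary size and the embedding dimension
% (an assumption about the layout, not something stated above).
disp(fileLines(1))

% Split the second line at whitespace and drop any empty pieces.
firstEntry = split(strtrim(fileLines(2)));
firstEntry(firstEntry == "") = [];

% Under the same assumption, the first token is a word and the
% remaining tokens are its vector values.
word = firstEntry(1)
numValues = numel(firstEntry) - 1
```

If the layout assumption holds, numValues should match the Dimension property of the embedding.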

Read the word embedding file using readWordEmbedding.

emb = readWordEmbedding(filename)
emb = 
  wordEmbedding with properties:

     Dimension: 100
    Vocabulary: ["thy"    "thou"    "love"    "thee"    "doth"    "mine"    "shall"    "eyes"    "sweet"    "time"    "nor"    "beauty"    "yet"    "art"    "heart"    "o"    "thine"    "hath"    "fair"    "make"    "still"    ...    ] (1x401 string)
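
Because the example above reuses the variable emb, it does not directly compare the embedding before and after the round trip. One way to check this is sketched below, assuming the same example data and Text Analytics Toolbox; the file name roundTripEmbedding.vec and the variable names are hypothetical. Since the values are stored as text, small numerical differences between the two embeddings are possible.

```matlab
% Train an embedding from the example data, suppressing progress output.
str = extractFileText("sonnetsPreprocessed.txt");
documents = tokenizedDocument(split(str,newline));
embTrained = trainWordEmbedding(documents,'Verbose',0);

% Write the embedding, then read it back into a separate variable.
roundTripFile = "roundTripEmbedding.vec";   % hypothetical file name
writeWordEmbedding(embTrained,roundTripFile)
embRead = readWordEmbedding(roundTripFile);

% Compare the vocabularies, ignoring word order.
words = embTrained.Vocabulary;
sameVocabulary = isequal(sort(words),sort(embRead.Vocabulary))

% Look up every word in both embeddings and measure the largest
% absolute difference between corresponding vector elements.
V1 = word2vec(embTrained,words);
V2 = word2vec(embRead,words);
maxAbsDiff = max(abs(V1(:) - V2(:)))
```

Keeping the trained and reloaded embeddings in separate variables makes it possible to inspect any differences directly.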

Input Arguments

emb: Input word embedding, specified as a wordEmbedding object.

filename: Name of the file, specified as a string scalar, character vector, or a 1-by-1 cell array containing a character vector.

Data Types: string | char | cell

Version History

Introduced in R2017b