
removeLongWords

Remove long words from documents or bag-of-words model

Description


newDocuments = removeLongWords(documents,len) removes words with len or more characters from the tokenizedDocument array documents.


newBag = removeLongWords(bag,len) removes words with len or more characters from the bagOfWords object bag.

Examples


Remove the words with seven or more characters from a document.

document = tokenizedDocument("An example of a short sentence");
newDocument = removeLongWords(document,7)
newDocument = 
  tokenizedDocument:

   4 tokens: An of a short
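
The len threshold is inclusive, which is easy to overlook. As a minimal sketch that reuses the sentence above (and assumes Text Analytics Toolbox is available), lowering len to 5 should also drop "short", because that word has exactly five characters. The variable name remainingWords is only illustrative.

```matlab
% Same sentence as in the example above.
document = tokenizedDocument("An example of a short sentence");

% "short" has exactly 5 characters, so it meets the "len or greater" rule.
newDocument = removeLongWords(document,5);

% Convert the scalar document to a string array of its remaining tokens.
remainingWords = string(newDocument);
```

By the documented rule, remainingWords should contain only "An", "of", and "a".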

Remove the words with seven or more characters from a bag-of-words model.

documents = tokenizedDocument([ ...
    "an example of a short sentence"
    "a second short sentence"]);
bag = bagOfWords(documents);
newBag = removeLongWords(bag,7)
newBag = 
  bagOfWords with properties:

          Counts: [2x5 double]
      Vocabulary: ["an"    "of"    "a"    "short"    "second"]
        NumWords: 5
    NumDocuments: 2

Input Arguments


documents: Input documents, specified as a tokenizedDocument array.

bag: Input bag-of-words model, specified as a bagOfWords object.

len: Minimum length of words to remove, specified as a positive integer. The function removes words with len or more characters.
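
To make the meaning of len concrete, the following sketch rebuilds the same result by hand for a bag-of-words model. It is an illustration of the rule under stated assumptions, not a description of how removeLongWords is implemented: it selects vocabulary entries with strlength and removes them with removeWords. The names isLong, longWords, manualBag, and sameVocabulary are introduced here for illustration only.

```matlab
documents = tokenizedDocument([ ...
    "an example of a short sentence"
    "a second short sentence"]);
bag = bagOfWords(documents);
len = 7;

% Select the vocabulary entries with len or more characters.
isLong = strlength(bag.Vocabulary) >= len;
longWords = bag.Vocabulary(isLong);

% Removing those words by name should leave the same vocabulary
% as removeLongWords(bag,len).
manualBag = removeWords(bag,longWords);
newBag = removeLongWords(bag,len);
sameVocabulary = isequal(sort(manualBag.Vocabulary),sort(newBag.Vocabulary));
```

The manual form can be useful when the length test needs to be combined with other criteria before removing words.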

Output Arguments


newDocuments: Output documents, returned as a tokenizedDocument array.

newBag: Output bag-of-words model, returned as a bagOfWords object.

Version History

Introduced in R2017b