removeInfrequentWords

Remove words with low counts from bag-of-words model

Syntax

newBag = removeInfrequentWords(bag,count)

newBag = removeInfrequentWords(bag,count,'IgnoreCase',true)

Description

newBag = removeInfrequentWords(bag,count) removes from the bag-of-words model bag the words that appear count times or fewer in total. By default, the function is case sensitive.

newBag = removeInfrequentWords(bag,count,'IgnoreCase',true) removes the words that appear count times or fewer in total, ignoring case. If words differ only by case, then the function merges their counts.

Examples

Remove Infrequent Words

Remove the words that appear two times or fewer from a bag-of-words model.

Create a bag-of-words model from an array of tokenized documents.

documents = tokenizedDocument([
    "an example of a short sentence"
    "a second short sentence"
    "another example"
    "a short example"]);
bag = bagOfWords(documents)

bag = 
  bagOfWords with properties:

          Counts: [4x8 double]
      Vocabulary: ["an"    "example"    "of"    "a"    "short"    "sentence"    "second"    "another"]
        NumWords: 8
    NumDocuments: 4

Remove the words that appear two times or fewer from the bag-of-words model.

count = 2;
newBag = removeInfrequentWords(bag,count)

newBag = 
  bagOfWords with properties:

          Counts: [4x3 double]
      Vocabulary: ["example"    "a"    "short"]
        NumWords: 3
    NumDocuments: 4
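
The 'IgnoreCase' syntax can be sketched in a similar way. The documents below are illustrative assumptions rather than part of the example above; they are chosen so that "Short" and "short" differ only by case and each appears once.

```matlab
% Illustrative documents (an assumption, not taken from the example above).
% "Short" and "short" each appear once; "a" appears twice.
documents = tokenizedDocument([
    "Short sentences help a reader"
    "a short example"]);
bag = bagOfWords(documents);

count = 1;

% Default (case sensitive): "Short" and "short" are counted separately,
% so each has a total of 1 and neither exceeds count.
newBagSensitive = removeInfrequentWords(bag,count);

% Ignoring case: the two counts are merged into a total of 2,
% which exceeds count.
newBagIgnoreCase = removeInfrequentWords(bag,count,'IgnoreCase',true);

% Check for the word on a lowercased basis, so the check does not depend
% on which capitalization the merged entry keeps.
keptSensitive = any(lower(newBagSensitive.Vocabulary) == "short")
keptIgnoreCase = any(lower(newBagIgnoreCase.Vocabulary) == "short")
```

Given the behavior described for the two syntaxes, keptSensitive should be false and keptIgnoreCase should be true.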

Input Arguments

`bag` — Input bag-of-words model
`bagOfWords` object

Input bag-of-words model, specified as a bagOfWords object.

`count` — Count threshold to remove words
positive integer

Count threshold to remove words, specified as a positive integer. The function removes the words that appear count times in total or fewer.
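
As a rough check of the threshold, the total count of each word can be computed from the Counts property and compared with count. This is a sketch that reuses the documents from the example on this page; the variable names totals and expectedVocabulary are introduced here for illustration only.

```matlab
documents = tokenizedDocument([
    "an example of a short sentence"
    "a second short sentence"
    "another example"
    "a short example"]);
bag = bagOfWords(documents);
count = 2;

% Total count of each vocabulary word across all documents.
totals = full(sum(bag.Counts,1));

% Words whose total is strictly greater than count are expected to remain.
% "sentence" appears exactly count times, so it is not among them.
expectedVocabulary = bag.Vocabulary(totals > count);

newBag = removeInfrequentWords(bag,count);
tf = isequal(newBag.Vocabulary,expectedVocabulary)
```

Because the threshold is inclusive, a word that appears exactly count times, such as "sentence" here, is removed along with the rarer words.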

Version History

Introduced in R2017b

See Also

bagOfWords | bagOfNgrams | removeInfrequentNgrams | removeWords | removeEmptyDocuments | topkwords | tfidf | tokenizedDocument
