wordCloudCounts

Count words for word cloud creation

Syntax

T = wordCloudCounts(str)

Description

T = wordCloudCounts(str) tokenizes and preprocesses the text in str for word cloud creation and returns a table T of words and frequency counts. The function supports English, Japanese, German, and Korean text.

Examples

Word Cloud Frequency Counts

Extract the text from sonnets.txt using extractFileText.

str = extractFileText("sonnets.txt");

View the first sonnet.

i = strfind(str,"I");
ii = strfind(str,"II");
start = i(1);
fin = ii(1);
extractBetween(str,start,fin-1)

ans = 
    "I
     
       From fairest creatures we desire increase,
       That thereby beauty's rose might never die,
       But as the riper should by time decease,
       His tender heir might bear his memory:
       But thou, contracted to thine own bright eyes,
       Feed'st thy light's flame with self-substantial fuel,
       Making a famine where abundance lies,
       Thy self thy foe, to thy sweet self too cruel:
       Thou that art now the world's fresh ornament,
       And only herald to the gaudy spring,
       Within thine own bud buriest thy content,
       And tender churl mak'st waste in niggarding:
         Pity the world, or else this glutton be,
         To eat the world's due, by the grave and thee.
     
       "

Tokenize and preprocess the sonnets text and create a table of word frequency counts.

T = wordCloudCounts(str);
head(T)

     Word     Count
    ______    _____

    "thy"      281 
    "thou"     235 
    "love"     188 
    "thee"     162 
    "eyes"      90 
    "doth"      88 
    "make"      63 
    "mine"      63

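The table returned above can be passed on to a plotting step. The following is a minimal sketch, assuming the table form of the wordcloud function (word variable name followed by size variable name) and that sonnets.txt is on the path as in the example above; the title text is only illustrative.

```matlab
% Sketch: build the frequency table, then draw a word cloud from it.
str = extractFileText("sonnets.txt");
T = wordCloudCounts(str);

% Use the Word column for the labels and the Count column for the sizes.
figure
wordcloud(T,'Word','Count');
title("Sonnets Word Cloud")   % illustrative title, not part of the source
```

Keeping the counting step separate from the plotting step makes it possible to inspect or filter T before drawing.
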
Input Arguments

`str` — Input text
string array | character vector | cell array of character vectors

Input text, specified as a string array, character vector, or cell array of character vectors.

For string input, the wordcloud and wordCloudCounts functions use English, Japanese, German, and Korean tokenization, stop word removal, and word normalization.

Example: ["an example of a short document";"a second short document"]

Data Types: string | char | cell
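
As a brief illustration of the three accepted input types, the sketch below calls the function with a string array, a character vector, and a cell array of character vectors. The variable names are hypothetical, and the resulting tables are not shown because they depend on the stop word removal and normalization applied.

```matlab
% String array input (the example value shown above).
strArr = ["an example of a short document"; "a second short document"];
Tstring = wordCloudCounts(strArr);

% Character vector input.
Tchar = wordCloudCounts('an example of a short document');

% Cell array of character vectors.
Tcell = wordCloudCounts({'an example of a short document', ...
    'a second short document'});

% Each call returns a table with Word and Count variables.
disp(Tstring)
```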

Output Arguments

`T` — Table of word counts
table

Table of word counts, sorted in order of importance. The table has these columns:

`Word`	String scalar of the word.
`Count`	The number of times the word appears in the documents. The function groups the counts of words that differ only by case or have a common stem according to `normalizeWords`. For example, the function groups the counts for "walk", "Walking", "walking", and "walks".
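
To see the grouping described for `Count`, one option is to run the function on a short string containing only the variants named above. This is a sketch for inspection only: the input string is made up for illustration, and which variant appears in the `Word` column is not stated in this page.

```matlab
% Variants that differ only by case or share a common stem.
str = "walk Walking walking walks";
T = wordCloudCounts(str);

% Inspect how the variants are combined in the output table.
disp(T)
```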

More About

Language Considerations

For string input, the wordcloud and wordCloudCounts functions use English, Japanese, German, and Korean tokenization, stop word removal, and word normalization.

Version History

Introduced in R2017b
