wordcloud

Create word cloud chart from text data

Syntax

wordcloud(tbl,wordVar,sizeVar)

wordcloud(words,sizeData)

wordcloud(C)

wordcloud(___,Name,Value)

wordcloud(parent,___)

wc = wordcloud(___)

Description

wordcloud(tbl,wordVar,sizeVar) creates a word cloud chart from the table tbl. The variables wordVar and sizeVar in the table specify the words and word sizes respectively.

wordcloud(words,sizeData) creates a word cloud chart from the elements of words with word sizes specified by sizeData.

wordcloud(C) creates a word cloud chart from the unique elements of categorical array C with sizes corresponding to their frequency counts. If you have Text Analytics Toolbox™, then C can be a string array, character vector, or a cell array of character vectors.

wordcloud(___,Name,Value) specifies additional WordCloudChart properties using one or more name-value pair arguments.

wordcloud(parent,___) creates the word cloud in the figure, panel, or tab specified by parent.

wc = wordcloud(___) returns the WordCloudChart object. Use wc to modify properties of the word cloud after creating it. For a list of properties, see WordCloudChart Properties.
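
As a minimal sketch of modifying the chart through the returned object, the following code creates a word cloud and then changes a few of the properties documented on this page. The words and sizes are made-up sample data, not part of the shipped examples.

```matlab
% Hypothetical sample data (assumption, not from the reference page)
words = ["matlab" "chart" "cloud" "text" "data"];
sizes = [40 25 20 10 5];

figure
wc = wordcloud(words,sizes);   % keep the WordCloudChart object

% Change documented properties after the chart has been created
wc.Shape = 'rectangle';
wc.MaxDisplayWords = 3;
wc.HighlightColor = [0 0.4470 0.7410];
```

Keeping the object in a variable avoids recreating the chart each time you want to adjust its appearance.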

Note

Text Analytics Toolbox extends the functionality of the wordcloud (MATLAB®) function. It adds support for creating word clouds directly from string arrays and from bag-of-words models, bag-of-n-gram models, and LDA topics. For details, see the wordcloud (Text Analytics Toolbox) reference page.

Examples

Create Word Cloud from Table

Load the example data sonnetsTable. The table tbl contains a list of words in the variable Word, and the corresponding frequency counts in the variable Count.

load sonnetsTable
head(tbl)

       Word        Count
    ___________    _____

    {'''tis'  }      1  
    {''Amen'' }      1  
    {''Fair'  }      2  
    {''Gainst'}      1  
    {''Since' }      1  
    {''This'  }      2  
    {''Thou'  }      1  
    {''Thus'  }      1

Plot the table data using wordcloud. Specify the words and corresponding word sizes to be the Word and Count variables respectively.

figure
wordcloud(tbl,'Word','Count');
title("Sonnets Word Cloud")

Figure contains an object of type wordcloud. The chart of type wordcloud has title Sonnets Word Cloud.

Prepare Text Data for Word Clouds

If you have Text Analytics Toolbox™ installed, then you can create word clouds directly from string arrays. For more information, see wordcloud (Text Analytics Toolbox). If you do not have Text Analytics Toolbox, then you must preprocess the text data manually.

This example shows how to create a word cloud from plain text by reading it into a string array, preprocessing it, and passing it to the wordcloud function.

Read the text from Shakespeare's Sonnets with the fileread function and convert it to string.

sonnets = string(fileread("sonnets.txt"));
extractBefore(sonnets,"II")

ans = 
    "THE SONNETS
     
     by William Shakespeare
     
     
     
     
       I
     
       From fairest creatures we desire increase,
       That thereby beauty's rose might never die,
       But as the riper should by time decease,
       His tender heir might bear his memory:
       But thou, contracted to thine own bright eyes,
       Feed'st thy light's flame with self-substantial fuel,
       Making a famine where abundance lies,
       Thy self thy foe, to thy sweet self too cruel:
       Thou that art now the world's fresh ornament,
       And only herald to the gaudy spring,
       Within thine own bud buriest thy content,
       And tender churl mak'st waste in niggarding:
         Pity the world, or else this glutton be,
         To eat the world's due, by the grave and thee.
     
       "

Split sonnets into a string array whose elements are individual words. To do this, replace the punctuation characters with spaces, join all the string elements into a 1-by-1 string, and then split on whitespace. Then remove words with fewer than five characters and convert the remaining words to lowercase.

punctuationCharacters = ["." "?" "!" "," ";" ":"];
sonnets = replace(sonnets,punctuationCharacters," ");
words = split(join(sonnets));
words(strlength(words)<5) = [];
words = lower(words);
words(1:10)

ans = 10×1 string
    "sonnets"
    "william"
    "shakespeare"
    "fairest"
    "creatures"
    "desire"
    "increase"
    "thereby"
    "beauty's"
    "might"

Convert words to a categorical array C and then plot it using wordcloud. The function plots the unique elements of C with sizes corresponding to their frequency counts.

C = categorical(words);
figure
wordcloud(C);
title("Sonnets Word Cloud")

Figure contains an object of type wordcloud. The chart of type wordcloud has title Sonnets Word Cloud.

Specify Word Sizes

Create a word cloud from plain text by reading it into a string array, preprocessing it, and passing it to the wordcloud function.

Read the text from Shakespeare's Sonnets with the fileread function and convert it to string.

sonnets = string(fileread('sonnets.txt'));
extractBefore(sonnets,"II")

ans = 
    "THE SONNETS
     
     by William Shakespeare
     
     
     
     
       I
     
       From fairest creatures we desire increase,
       That thereby beauty's rose might never die,
       But as the riper should by time decease,
       His tender heir might bear his memory:
       But thou, contracted to thine own bright eyes,
       Feed'st thy light's flame with self-substantial fuel,
       Making a famine where abundance lies,
       Thy self thy foe, to thy sweet self too cruel:
       Thou that art now the world's fresh ornament,
       And only herald to the gaudy spring,
       Within thine own bud buriest thy content,
       And tender churl mak'st waste in niggarding:
         Pity the world, or else this glutton be,
         To eat the world's due, by the grave and thee.
     
       "

Replace the punctuation characters with spaces, split the text into individual words, remove words with fewer than five characters, and convert the remaining words to lowercase.

punctuationCharacters = ["." "?" "!" "," ";" ":"];
sonnets = replace(sonnets,punctuationCharacters," ");
words = split(join(sonnets));
words(strlength(words)<5) = [];
words = lower(words);
words(1:10)

ans = 10×1 string
    "sonnets"
    "william"
    "shakespeare"
    "fairest"
    "creatures"
    "desire"
    "increase"
    "thereby"
    "beauty's"
    "might"

Find the unique words in words and count how often each one occurs. Create a word cloud using the frequency counts as size data.

[numOccurrences,uniqueWords] = histcounts(categorical(words));
figure
wordcloud(uniqueWords,numOccurrences);
title("Sonnets Word Cloud")

Figure contains an object of type wordcloud. The chart of type wordcloud has title Sonnets Word Cloud.
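
If you prefer not to convert to categorical first, the same counts can be obtained with unique and accumarray. This is only an alternative sketch on a small made-up word list, not the approach the example above uses.

```matlab
% Hypothetical sample words (assumption, not from sonnets.txt)
words = ["rose";"thorn";"rose";"bloom";"rose";"thorn"];

% unique returns the sorted distinct words and, in idx, the position
% of each original word within that sorted list
[uniqueWords,~,idx] = unique(words);

% Add 1 for every occurrence of each index to get per-word counts
numOccurrences = accumarray(idx,1);

figure
wordcloud(uniqueWords,numOccurrences);
title("Sample Word Cloud")
```

Here uniqueWords is ["bloom";"rose";"thorn"] and numOccurrences is [1;3;2].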

Specify Word Colors

Load the example data sonnetsTable. The table tbl contains a list of words in the Word variable, and corresponding frequency counts in the Count variable.

load sonnetsTable
head(tbl)

       Word        Count
    ___________    _____

    {'''tis'  }      1  
    {''Amen'' }      1  
    {''Fair'  }      2  
    {''Gainst'}      1  
    {''Since' }      1  
    {''This'  }      2  
    {''Thou'  }      1  
    {''Thus'  }      1

Plot the table data using wordcloud. Specify the words and corresponding word sizes to be the Word and Count variables, respectively. To set the word colors to random values, set 'Color' to a random matrix of RGB triplets with one row for each word.

numWords = size(tbl,1);
colors = rand(numWords,3);
figure
wordcloud(tbl,'Word','Count','Color',colors);
title("Sonnets Word Cloud")

Figure contains an object of type wordcloud. The chart of type wordcloud has title Sonnets Word Cloud.

Create Word Cloud Using Text Analytics Toolbox

If you have Text Analytics Toolbox installed, then you can create word clouds directly from string arrays. If you do not have Text Analytics Toolbox, then you must preprocess the text data manually. For an example showing how to create a word cloud without Text Analytics Toolbox, see Prepare Text Data for Word Clouds.

Extract the text from sonnets.txt using extractFileText.

str = extractFileText("sonnets.txt");
extractBefore(str,"II")

ans = 

    "THE SONNETS
     
     by William Shakespeare
     
     
     
     
       I
     
       From fairest creatures we desire increase,
       That thereby beauty's rose might never die,
       But as the riper should by time decease,
       His tender heir might bear his memory:
       But thou, contracted to thine own bright eyes,
       Feed'st thy light's flame with self-substantial fuel,
       Making a famine where abundance lies,
       Thy self thy foe, to thy sweet self too cruel:
       Thou that art now the world's fresh ornament,
       And only herald to the gaudy spring,
       Within thine own bud buriest thy content,
       And tender churl mak'st waste in niggarding:
         Pity the world, or else this glutton be,
         To eat the world's due, by the grave and thee.
     
       "

Display the words from the sonnets in a word cloud.

figure
wordcloud(str);

Input Arguments

`tbl` — Input table
table

Input table, with columns specifying the words and word sizes. Specify the words and the corresponding word sizes in the variables given by wordVar and sizeVar input arguments respectively.

Data Types: table

`wordVar` — Table variable for word data
string scalar | character vector | numeric index | logical vector

Table variable for word data, specified as a string scalar, character vector, numeric index, or a logical vector.

`sizeVar` — Table variable for size data
string scalar | character vector | numeric index | logical vector

Table variable for size data, specified as a string scalar, character vector, numeric index, or a logical vector.

`C` — Input categorical data
categorical array

Input categorical data, specified as a categorical array. The function plots each unique element of C with size corresponding to histcounts(C).

Data Types: categorical
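
To illustrate how the categorical syntax relates to the two-input syntax, this sketch computes the category names and counts explicitly. The data is made up, and the claim that both calls plot the same words and sizes rests on the description above rather than on a check of the rendered charts.

```matlab
% Hypothetical categorical data (assumption)
C = categorical(["rose";"thorn";"rose";"bloom";"rose";"thorn"]);

cats   = categories(C);   % cell array of category names, sorted
counts = countcats(C);    % occurrences of each category, same order

figure
wordcloud(C);             % sizes derived from the frequency counts

figure
wordcloud(cats,counts);   % same words and sizes passed explicitly
```

The values in counts match histcounts(C), which returns them as a row vector.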

`words` — Input words
string vector | cell array of character vectors

Input words, specified as a string vector or cell array of character vectors.

Data Types: string | cell

`sizeData` — Word size data
numeric vector

Word size data, specified as a numeric vector.

`parent` — Parent container
`Figure` object | `Panel` object | `Tab` object | `TiledChartLayout` object | `GridLayout` object

Parent container, specified as a Figure, Panel, Tab, TiledChartLayout, or GridLayout object.
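
For instance, a sketch that places two word clouds side by side in panels of one figure might look like this. The sample data and the panel positions are assumptions chosen for illustration.

```matlab
% Hypothetical sample data (assumption)
words = ["matlab" "chart" "cloud" "text" "data"];
sizes = [40 25 20 10 5];

fig = figure;
% Panels in a figure created with figure use normalized units by default
leftPanel  = uipanel(fig,'Position',[0   0 0.5 1]);
rightPanel = uipanel(fig,'Position',[0.5 0 0.5 1]);

% Pass the parent container as the first input argument
wcLeft  = wordcloud(leftPanel,words,sizes);
wcRight = wordcloud(rightPanel,words,sizes,'Shape','rectangle');
```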

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: 'HighlightColor','red' sets the highlight color to red.

The WordCloudChart properties listed here are only a subset. For a complete list, see WordCloudChart Properties.
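
The two calling conventions described above can be sketched as follows, using made-up sample data. Both calls set the same properties; the second form requires R2021a or later.

```matlab
% Hypothetical sample data (assumption)
words = ["matlab" "chart" "cloud" "text" "data"];
sizes = [40 25 20 10 5];

% Quoted names separated by commas (works in all releases)
figure
wc1 = wordcloud(words,sizes,'HighlightColor','red','Shape','rectangle');

% Name=Value syntax (R2021a and later)
figure
wc2 = wordcloud(words,sizes,HighlightColor='red',Shape='rectangle');
```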

`MaxDisplayWords` — Maximum number of words to display
100 (default) | nonnegative integer

Maximum number of words to display, specified as a nonnegative integer. The software displays the MaxDisplayWords largest words.

`Color` — Word color
`[0.3804 0.3804 0.3804]` (default) | RGB triplet | character vector containing a color name | matrix

Word color, specified as an RGB triplet, a character vector containing a color name, or an N-by-3 matrix where N is the length of WordData. If Color is a matrix, then each row corresponds to an RGB triplet for the corresponding word in WordData.

RGB triplets and hexadecimal color codes are useful for specifying custom colors.

An RGB triplet is a three-element row vector whose elements specify the intensities of the red, green, and blue components of the color. The intensities must be in the range [0,1]; for example, [0.4 0.6 0.7].
A hexadecimal color code is a character vector or a string scalar that starts with a hash symbol (#) followed by three or six hexadecimal digits, which can range from 0 to F. The values are not case sensitive. Thus, the color codes "#FF8800", "#ff8800", "#F80", and "#f80" are equivalent.

Alternatively, you can specify some common colors by name. This table lists the named color options, the equivalent RGB triplets, and hexadecimal color codes.

Color Name	Short Name	RGB Triplet	Hexadecimal Color Code
`"red"`	`"r"`	`[1 0 0]`	`"#FF0000"`
`"green"`	`"g"`	`[0 1 0]`	`"#00FF00"`
`"blue"`	`"b"`	`[0 0 1]`	`"#0000FF"`
`"cyan"`	`"c"`	`[0 1 1]`	`"#00FFFF"`
`"magenta"`	`"m"`	`[1 0 1]`	`"#FF00FF"`
`"yellow"`	`"y"`	`[1 1 0]`	`"#FFFF00"`
`"black"`	`"k"`	`[0 0 0]`	`"#000000"`
`"white"`	`"w"`	`[1 1 1]`	`"#FFFFFF"`

Here are the RGB triplets and hexadecimal color codes for the default colors MATLAB uses in many types of plots.

RGB Triplet	Hexadecimal Color Code
`[0 0.4470 0.7410]`	`"#0072BD"`
`[0.8500 0.3250 0.0980]`	`"#D95319"`
`[0.9290 0.6940 0.1250]`	`"#EDB120"`
`[0.4940 0.1840 0.5560]`	`"#7E2F8E"`
`[0.4660 0.6740 0.1880]`	`"#77AC30"`
`[0.3010 0.7450 0.9330]`	`"#4DBEEE"`
`[0.6350 0.0780 0.1840]`	`"#A2142F"`

Example: 'blue'

Example: [0 0 1]

`HighlightColor` — Word highlight color
`[0.7529 0.2980 0.0431]` (default) | RGB triplet | character vector containing a color name

Word highlight color, specified as an RGB triplet, or a character vector containing a color name. The software highlights the largest words with this color.

RGB triplets and hexadecimal color codes are useful for specifying custom colors.

An RGB triplet is a three-element row vector whose elements specify the intensities of the red, green, and blue components of the color. The intensities must be in the range [0,1]; for example, [0.4 0.6 0.7].
A hexadecimal color code is a character vector or a string scalar that starts with a hash symbol (#) followed by three or six hexadecimal digits, which can range from 0 to F. The values are not case sensitive. Thus, the color codes "#FF8800", "#ff8800", "#F80", and "#f80" are equivalent.

Alternatively, you can specify some common colors by name. This table lists the named color options, the equivalent RGB triplets, and hexadecimal color codes.

Color Name	Short Name	RGB Triplet	Hexadecimal Color Code
`"red"`	`"r"`	`[1 0 0]`	`"#FF0000"`
`"green"`	`"g"`	`[0 1 0]`	`"#00FF00"`
`"blue"`	`"b"`	`[0 0 1]`	`"#0000FF"`
`"cyan"`	`"c"`	`[0 1 1]`	`"#00FFFF"`
`"magenta"`	`"m"`	`[1 0 1]`	`"#FF00FF"`
`"yellow"`	`"y"`	`[1 1 0]`	`"#FFFF00"`
`"black"`	`"k"`	`[0 0 0]`	`"#000000"`
`"white"`	`"w"`	`[1 1 1]`	`"#FFFFFF"`

Here are the RGB triplets and hexadecimal color codes for the default colors MATLAB uses in many types of plots.

RGB Triplet	Hexadecimal Color Code
`[0 0.4470 0.7410]`	`"#0072BD"`
`[0.8500 0.3250 0.0980]`	`"#D95319"`
`[0.9290 0.6940 0.1250]`	`"#EDB120"`
`[0.4940 0.1840 0.5560]`	`"#7E2F8E"`
`[0.4660 0.6740 0.1880]`	`"#77AC30"`
`[0.3010 0.7450 0.9330]`	`"#4DBEEE"`
`[0.6350 0.0780 0.1840]`	`"#A2142F"`

Example: 'blue'

Example: [0 0 1]

`Shape` — Shape of word cloud
`'oval'` (default) | `'rectangle'`

Shape of word cloud chart, specified as 'oval' or 'rectangle'.

Example: 'rectangle'

`LayoutNum` — Word placement layout
1 (default) | nonnegative integer

Word placement layout, specified as a nonnegative integer. If you call wordcloud repeatedly with the same inputs, then the word placement is the same each time. To get a different word placement, use a different value of LayoutNum.
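
As a sketch of that behavior, the following code draws the same made-up data with two LayoutNum values so that the placements can be compared. The value 7 is an arbitrary choice.

```matlab
% Hypothetical sample data (assumption)
words = ["matlab" "chart" "cloud" "text" "data" "plot" "table"];
sizes = [40 25 20 10 5 5 3];

figure
wcDefault = wordcloud(words,sizes);                  % LayoutNum defaults to 1
title("LayoutNum = 1")

figure
wcOther = wordcloud(words,sizes,'LayoutNum',7);      % a different placement
title("LayoutNum = 7")
```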

Output Arguments

`wc` — `WordCloudChart` object
`WordCloudChart` object

WordCloudChart object. You can modify the properties of a WordCloudChart after it is created. For more information, see WordCloudChart Properties.

Tips

Text Analytics Toolbox extends the functionality of the wordcloud (MATLAB) function. It adds support for creating word clouds directly from string arrays and from bag-of-words models, bag-of-n-gram models, and LDA topics. For details, see the wordcloud (Text Analytics Toolbox) reference page.

Extended Capabilities

Tall Arrays
Calculate with arrays that have more rows than fit in memory.

The wordcloud function supports tall arrays with the following usage notes and limitations:

The syntax wc = wordcloud(str), where str is a string array, character vector, or cell array of character vectors, is not supported. These inputs require Text Analytics Toolbox.
When the words and sizeData inputs are tall arrays, they are gathered into memory and therefore must fit in memory.

Version History

Introduced in R2017b
