
Access Data Using Categorical Arrays

Select Data By Category

Selecting data based on its values is often useful. This type of data selection can involve creating a logical vector based on values in one variable, and then using that logical vector to select a subset of values in other variables. You can create a logical vector for selecting data by finding values in a numeric array that fall within a certain range. Additionally, you can create the logical vector by finding specific discrete values. When using categorical arrays, you can easily:

  • Select elements from particular categories. For categorical arrays, use the relational operators == or ~= to select data that is in, or not in, a particular category. To select data in a particular group of categories, use the ismember function.

    For ordinal categorical arrays, use inequalities >, >=, <, or <= to find data in categories above or below a particular category.

  • Delete data that is in a particular category. Use relational operators to create a logical index that includes or excludes data from particular categories.

  • Find elements that are not in a defined category. Categorical arrays indicate which elements do not belong to a defined category by <undefined>. Use the isundefined function to find observations without a defined value.
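The ordinal comparisons listed above are not demonstrated later in this example. As a minimal sketch, assuming a small hypothetical array of T-shirt sizes (sizeText, sizes, atLeastMedium, and smallOrLarge are illustrative names, not part of patients.mat), you can create an ordinal categorical array by listing its categories from lowest to highest and then compare elements against a category:

```matlab
% Hypothetical example data: T-shirt sizes stored as text
sizeText = ["small"; "large"; "medium"; "small"; "large"];

% Specify the category order explicitly and mark the array as ordinal,
% so that small < medium < large
sizes = categorical(sizeText, ["small" "medium" "large"], "Ordinal", true);

% Relational operators compare by category order, not alphabetically
atLeastMedium = sizes >= "medium"        % [0; 1; 1; 0; 1]

% ismember still selects an arbitrary group of categories
smallOrLarge = ismember(sizes, ["small" "large"])   % [1; 1; 0; 1; 1]
```

Alphabetically, "large" sorts before "medium" and "small", so specifying the order explicitly is what makes >= meaningful here.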

Common Ways to Access Data Using Categorical Arrays

This example shows how to index and search using categorical arrays. You can access data using categorical arrays stored within a table in a similar manner.

Load Sample Data

Load data about 100 patients from the sample patients.mat MAT-file.

load patients.mat
whos
  Name                            Size            Bytes  Class      Attributes

  Age                           100x1               800  double               
  Diastolic                     100x1               800  double               
  Gender                        100x1             13012  cell                 
  Height                        100x1               800  double               
  LastName                      100x1             13216  cell                 
  Location                      100x1             15808  cell                 
  SelfAssessedHealthStatus      100x1             13140  cell                 
  Smoker                        100x1               100  logical              
  Systolic                      100x1               800  double               
  Weight                        100x1               800  double               

Create Categorical Arrays

The arrays Location and SelfAssessedHealthStatus contain data that belong to categories. Each array contains text taken from a small set of unique values (three locations and four health statuses, respectively). To convert Location and SelfAssessedHealthStatus to categorical arrays, use the categorical function. In contrast, the array LastName contains last names that are not categories, so convert LastName to a string array using the string function.

Location = categorical(Location);
SelfAssessedHealthStatus = categorical(SelfAssessedHealthStatus);
LastName = string(LastName);
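To confirm what a conversion produced, you can list the categories of the result. When you do not specify an order, categorical creates one category per unique value, sorted alphabetically. A minimal sketch on a small hypothetical array (loc is illustrative, not a variable from patients.mat):

```matlab
% Hypothetical example data with a repeated value
loc = categorical(["VA Hospital"; "County General Hospital"; "VA Hospital"]);

% categories returns a cell array of character vectors, one per category,
% sorted alphabetically because no explicit order was given
cats = categories(loc)

% iscategorical confirms the class of the converted array
isCat = iscategorical(loc)
```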

Search for Members of a Single Category

For categorical arrays, you can use the relational operators == and ~= to find the data that is in, or not in, a particular category.

Determine whether any patients were observed at Rampart General Hospital.

any(Location == "Rampart General Hospital")
ans = logical
   0

There are no patients observed at Rampart General Hospital.
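The any function only reports whether a match exists. As a sketch, you can also count matches with nnz, and use iscategory to check whether a name is a category at all, which separates "the category exists but has no elements" from "there is no such category". The array loc below is hypothetical:

```matlab
% Hypothetical example data
loc = categorical(["VA Hospital"; "County General Hospital"; "VA Hospital"]);

% nnz counts the true values of the logical mask
nVA = nnz(loc == "VA Hospital")                      % 2

% Comparing with a name that is not a category matches nothing
nRampart = nnz(loc == "Rampart General Hospital")    % 0

% iscategory tests the list of categories, not the elements
hasRampart = iscategory(loc, "Rampart General Hospital")   % false
```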

Search for Members of a Group of Categories

You can use ismember to find data in a particular group of categories. For example, call ismember using Location as input data. Create a logical vector that identifies patients observed at either County General Hospital or VA Hospital.

Location
Location = 100x1 categorical
     County General Hospital 
     VA Hospital 
     St. Mary's Medical Center 
     VA Hospital 
     County General Hospital 
     St. Mary's Medical Center 
     VA Hospital 
     VA Hospital 
     St. Mary's Medical Center 
     County General Hospital 
     County General Hospital 
     St. Mary's Medical Center 
     VA Hospital 
     VA Hospital 
     St. Mary's Medical Center 
     VA Hospital 
     St. Mary's Medical Center 
     VA Hospital 
     County General Hospital 
     County General Hospital 
     VA Hospital 
     VA Hospital 
     VA Hospital 
     County General Hospital 
     County General Hospital 
     VA Hospital 
     VA Hospital 
     County General Hospital 
     County General Hospital 
     County General Hospital 
      ⋮

VA_CountyGenIndex = ...
    ismember(Location,["County General Hospital","VA Hospital"])
VA_CountyGenIndex = 100x1 logical array

   1
   1
   0
   1
   1
   0
   1
   1
   0
   1
      ⋮

VA_CountyGenIndex is a 100-by-1 logical array containing logical true (1) for each element in Location that is a member of the category County General Hospital or VA Hospital. VA_CountyGenIndex contains 76 nonzero elements.
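To check the count stated above, you can count the true values of the logical vector. This sketch reloads patients.mat (sample data shipped with MATLAB) so that it runs on its own; 76 is the sum of the County General Hospital (39) and VA Hospital (37) counts that summary reports later in this example. The names nSelected and sameMask are illustrative:

```matlab
load patients.mat
Location = categorical(Location);

% ismember returns true where the element is in either category
VA_CountyGenIndex = ismember(Location, ["County General Hospital", "VA Hospital"]);

% nnz counts the true elements of the logical vector
nSelected = nnz(VA_CountyGenIndex)

% The same mask built from two == comparisons joined by | (logical OR)
sameMask = isequal(VA_CountyGenIndex, ...
    Location == "County General Hospital" | Location == "VA Hospital")
```

ismember scales better than chaining == comparisons when the group contains many categories.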

Use the logical vector VA_CountyGenIndex to select the last names of the patients observed at either County General Hospital or VA Hospital.

VA_CountyGenPatients = LastName(VA_CountyGenIndex)
VA_CountyGenPatients = 76x1 string
    "Smith"
    "Johnson"
    "Jones"
    "Brown"
    "Miller"
    "Wilson"
    "Taylor"
    "Anderson"
    "Jackson"
    "White"
    "Martin"
    "Garcia"
    "Martinez"
    "Robinson"
    "Clark"
    "Rodriguez"
    "Lewis"
    "Lee"
    "Walker"
    "Hall"
    "Allen"
    "Young"
    "Hernandez"
    "King"
    "Wright"
    "Lopez"
    "Green"
    "Adams"
    "Baker"
    "Mitchell"
      ⋮

Select Elements in a Particular Category to Plot

Use the summary function to print a summary containing the category names and the number of elements in each category.

summary(Location)
Location: 100x1 categorical

     County General Hospital       39 
     St. Mary's Medical Center      24 
     VA Hospital                   37 
     <undefined>                    0 

Location is a 100-by-1 categorical array with three categories. County General Hospital occurs in 39 elements, St. Mary's Medical Center in 24 elements, and VA Hospital in 37 elements.
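summary prints the counts, but if you need them as numbers, you can pair categories with countcats, which returns the counts in the same order as the category names. A sketch that reloads patients.mat so it stands alone (names, counts, and locationCounts are illustrative names):

```matlab
load patients.mat
Location = categorical(Location);

% Category names and element counts, in matching order
names = categories(Location);
counts = countcats(Location);

% Show them side by side in a table
locationCounts = table(names, counts)
```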

Use the summary function to print a summary of SelfAssessedHealthStatus.

summary(SelfAssessedHealthStatus)
SelfAssessedHealthStatus: 100x1 categorical

     Excellent        34 
     Fair             15 
     Good             40 
     Poor             11 
     <undefined>       0 

SelfAssessedHealthStatus is a 100-by-1 categorical array with four categories.
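As converted above, SelfAssessedHealthStatus is not ordinal, so the inequalities >, >=, <, and <= do not apply to it. As a sketch, assuming the order Poor < Fair < Good < Excellent (an ordering chosen here for illustration), you can create an ordinal version and select patients at or above Good. Per the summary above, 40 + 34 = 74 patients qualify. The names healthOrder, status, and goodOrBetter are illustrative:

```matlab
load patients.mat

% List the categories from lowest to highest and mark the array as ordinal
healthOrder = {'Poor', 'Fair', 'Good', 'Excellent'};
status = categorical(SelfAssessedHealthStatus, healthOrder, 'Ordinal', true);

% With an ordinal array, inequalities select categories above or below a level
goodOrBetter = status >= 'Good';
nGoodOrBetter = nnz(goodOrBetter)

% Ages of those patients
agesGoodOrBetter = Age(goodOrBetter);
```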

Use the relational operator == to access the ages of patients who assess their own health status as Good. Then plot a histogram of this data.

figure()
histogram(Age(SelfAssessedHealthStatus == "Good"))
title("Ages of Patients with Good Health Status")

[Figure: histogram titled Ages of Patients with Good Health Status]

histogram(Age(SelfAssessedHealthStatus == "Good")) plots the age data for the 40 patients who reported Good as their health status.
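You can combine a category test with other logical vectors using & (AND) and | (OR). A minimal sketch, assuming you want the ages of patients who report Good health and who also smoke (a combination chosen here only for illustration; isGood and goodSmokerAges are illustrative names):

```matlab
load patients.mat
SelfAssessedHealthStatus = categorical(SelfAssessedHealthStatus);

% Logical mask for the Good category (40 patients, per the summary above)
isGood = SelfAssessedHealthStatus == "Good";

% Smoker is already a logical vector, so it combines directly with &
goodSmokerAges = Age(isGood & Smoker);
```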

Delete Data from a Particular Category

You can use relational operators to include or exclude data from particular categories. Delete all patients observed at VA Hospital from the workspace variables Age and Location.

Age = Age(Location ~= "VA Hospital");
Location = Location(Location ~= "VA Hospital")
Location = 63x1 categorical
     County General Hospital 
     St. Mary's Medical Center 
     County General Hospital 
     St. Mary's Medical Center 
     St. Mary's Medical Center 
     County General Hospital 
     County General Hospital 
     St. Mary's Medical Center 
     St. Mary's Medical Center 
     St. Mary's Medical Center 
     County General Hospital 
     County General Hospital 
     County General Hospital 
     County General Hospital 
     County General Hospital 
     County General Hospital 
     County General Hospital 
     St. Mary's Medical Center 
     St. Mary's Medical Center 
     County General Hospital 
     St. Mary's Medical Center 
     St. Mary's Medical Center 
     St. Mary's Medical Center 
     County General Hospital 
     County General Hospital 
     County General Hospital 
     County General Hospital 
     County General Hospital 
     County General Hospital 
     St. Mary's Medical Center 
      ⋮

Now, Age is a 63-by-1 numeric array, and Location is a 63-by-1 categorical array.
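The order of the two assignments above matters: Age is indexed first, while Location still has 100 elements. If Location were shortened first, the mask Location ~= "VA Hospital" would have only 63 elements and would no longer line up with the patients in Age. A safer pattern, sketched below, computes the logical index once and applies it to every variable (keep is an illustrative name):

```matlab
load patients.mat
Location = categorical(Location);

% Build the mask once, while all variables still have 100 rows
keep = Location ~= "VA Hospital";

% Apply the same mask to every variable that describes the patients
Age = Age(keep);
Location = Location(keep);
LastName = LastName(keep);
```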

List the categories of Location, as well as the number of elements in each category.

summary(Location)
Location: 63x1 categorical

     County General Hospital       39 
     St. Mary's Medical Center      24 
     VA Hospital                    0 
     <undefined>                    0 

The patients observed at VA Hospital are deleted from Location, but VA Hospital is still a category.

Use the removecats function to remove VA Hospital from the categories of Location.

Location = removecats(Location,"VA Hospital");

Verify that VA Hospital is no longer a category of Location.

categories(Location)
ans = 2x1 cell
    {'County General Hospital'  }
    {'St. Mary's Medical Center'}

Location is a 63-by-1 categorical array that has two categories.
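If you do not know in advance which categories have become empty, calling removecats with only the array removes every category that has no elements. A sketch on a small hypothetical array (loc, before, and after are illustrative names):

```matlab
% Hypothetical example data
loc = categorical(["A"; "B"; "A"; "C"]);
loc = loc(loc ~= "B");        % B has no elements now, but is still a category

before = categories(loc)      % {'A'; 'B'; 'C'}

% removecats with no category list removes all unused categories
loc = removecats(loc);
after = categories(loc)       % {'A'; 'C'}
```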

Delete Element

You can delete elements by indexing. For example, you can remove the first element of Location by selecting the rest of the elements with Location(2:end). However, an easier way to delete elements is to use [].

Location(1) = [];
summary(Location)
Location: 62x1 categorical

     County General Hospital       38 
     St. Mary's Medical Center      24 
     <undefined>                    0 

Location is a 62-by-1 categorical array that has two categories. Deleting the first element has no effect on other elements from the same category and does not delete the category itself.
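Assigning [] also accepts a logical index, so you can delete every element that meets a condition in one statement. As with deleting a single element, the category itself remains. A sketch on a hypothetical array:

```matlab
% Hypothetical example data
loc = categorical(["A"; "B"; "A"; "C"]);

% Delete all elements in category A
loc(loc == "A") = [];

n = numel(loc)                 % 2
cats = categories(loc)         % {'A'; 'B'; 'C'}: A is still a category
```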

Test for Undefined Elements

Remove the category County General Hospital from Location.

Location = removecats(Location,"County General Hospital");

Display the first eight elements of the categorical array, Location.

Location(1:8)
ans = 8x1 categorical
     St. Mary's Medical Center 
     <undefined> 
     St. Mary's Medical Center 
     St. Mary's Medical Center 
     <undefined> 
     <undefined> 
     St. Mary's Medical Center 
     St. Mary's Medical Center 

After removing the category, County General Hospital, elements that previously belonged to that category no longer belong to any category defined for Location. The categorical elements that do not belong to any category are undefined, and display <undefined> as their values.

Use the function isundefined to find elements of a categorical array that do not belong to any category.

undefinedIndex = isundefined(Location);

undefinedIndex is a 62-by-1 logical array containing logical true (1) for all undefined elements in Location.
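You can count undefined elements with nnz, or negate the logical vector to keep only the defined elements. A sketch on a hypothetical array, where removing category A leaves its former elements undefined (nUndefined and definedOnly are illustrative names):

```matlab
% Hypothetical example data
loc = categorical(["A"; "B"; "A"; "C"]);
loc = removecats(loc, "A");          % elements 1 and 3 become undefined

undefinedIndex = isundefined(loc);   % logical, true where undefined
nUndefined = nnz(undefinedIndex)     % 2

definedOnly = loc(~undefinedIndex)   % the B and C elements
```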

Set Undefined Elements

Use the summary function to print the number of undefined elements in Location. Then display the first five elements of Location.

summary(Location)
Location: 62x1 categorical

     St. Mary's Medical Center      24 
     <undefined>                   38 

Location(1:5)
ans = 5x1 categorical
     St. Mary's Medical Center 
     <undefined> 
     St. Mary's Medical Center 
     St. Mary's Medical Center 
     <undefined> 

The first and third elements of Location belong to the category St. Mary's Medical Center. Set them to undefined values so that they no longer belong to any category. The recommended way to create undefined values is the missing function. You can also assign '' or "" to elements of a categorical array; the array converts such values to undefined values.

Location(1) = missing;
Location(3) = '';
Location(1:5)
ans = 5x1 categorical
     <undefined> 
     <undefined> 
     <undefined> 
     St. Mary's Medical Center 
     <undefined> 

The summary function shows that these assignments increased the number of undefined elements.

summary(Location)
Location: 62x1 categorical

     St. Mary's Medical Center      22 
     <undefined>                   40 

You can make selected elements undefined without removing a category or changing the categories of other elements. Use undefined elements to indicate values that are unknown.
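The sketch below, on a hypothetical array, checks the claim above: making elements undefined leaves the list of categories unchanged, even when no element remains in a category.

```matlab
% Hypothetical example data
c = categorical(["x"; "y"; "x"]);

c(2) = missing;      % recommended way to create an undefined value
c(3) = "";           % an empty string is also converted to undefined

isUndef = isundefined(c)    % [0; 1; 1]
cats = categories(c)        % {'x'; 'y'}: y is still a category
```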

Preallocate Categorical Arrays with Undefined Elements

You can use undefined elements to preallocate the size of a categorical array for better performance. Create a categorical array that contains only the elements with known locations.

definedIndex = ~isundefined(Location);
newLocation = Location(definedIndex);
summary(newLocation)
newLocation: 22x1 categorical

     St. Mary's Medical Center      22 
     <undefined>                    0 

Expand newLocation to a 200-by-1 categorical array by assigning an undefined value to its last element. MATLAB also assigns undefined values to all of the other new elements. The 22 original elements keep their values.

newLocation(200) = missing;
summary(newLocation)
newLocation: 200x1 categorical

     St. Mary's Medical Center       22 
     <undefined>                   178 

newLocation has room for values you plan to store in the array later.
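A sketch of filling preallocated elements later, on a small hypothetical array. Because a categorical array is not protected by default, assigning a value that is not yet a category adds that category; an array created with 'Protected', true does not accept new values that are not categories.

```matlab
% Hypothetical example data
c = categorical(["A"; "A"]);
c(5) = missing;            % grow to 5-by-1; elements 3 to 5 are undefined

nUndefinedBefore = nnz(isundefined(c))   % 3

% Later, store values in the preallocated elements
c(3) = "A";                % existing category
c(4) = "B";                % new value: B is added as a category

cats = categories(c)                     % {'A'; 'B'}
nUndefinedAfter = nnz(isundefined(c))    % 1
```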
