
grp2idx

Create index vector from grouping variable

Description

[g,gN] = grp2idx(s) creates an index vector g from the grouping variable s. The output g is a vector of integer values from 1 up to the number K of distinct groups. gN is a cell array of character vectors representing the list of group names.

[g,gN,gL] = grp2idx(s) also returns a column vector gL representing the list of the group levels with the same data type as s.


Examples


Create a categorical vector by using discretize and convert it to an index vector by using grp2idx.

Load the hospital data set and convert the ages in hospital.Age to categorical values representing the ages by decade.

load hospital
edges = 0:10:100; % Bin edges
labels = strcat(num2str((0:10:90)','%d'),{'s'}); % Labels for the bins
s = discretize(hospital.Age,edges,'Categorical',labels);

Display the ages and the groups of ages for the first five samples.

ages = hospital.Age(1:5)
ages = 5×1

    38
    43
    38
    40
    49

groups = s(1:5)
groups = 5x1 categorical
     30s 
     40s 
     30s 
     40s 
     40s 

Create an index vector from the categorical vector s.

[g,gN,gL] = grp2idx(s);

Display the index values corresponding to the first five samples.

g(1:5)
ans = 5×1

     4
     5
     4
     5
     5

Reproduce the input argument s using the output gL.

gL(g(1:5))
ans = 5x1 categorical
     30s 
     40s 
     30s 
     40s 
     40s 

Use gN(g) to reproduce the input argument s as a cell array of character vectors.

gN(g(1:5))
ans = 5x1 cell
    {'30s'}
    {'40s'}
    {'30s'}
    {'40s'}
    {'40s'}

Input Arguments


Grouping variable s, specified as a categorical, numeric, logical, datetime, or duration vector, a string array, a cell array of character vectors, or a character array with each row representing a group label.

grp2idx treats NaNs (numeric, duration, or logical), '' (empty character arrays or cell arrays of character vectors), "" (empty strings), <missing> values (string), <undefined> values (categorical), and NaTs (datetime) in s as missing values and returns NaNs in the corresponding rows of g. The outputs gN and gL do not include entries for missing values.
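The missing-value handling described above can be illustrated with a small numeric input. This is a minimal sketch, assuming Statistics and Machine Learning Toolbox is installed (it provides grp2idx); the vector s is made-up illustration data.

```matlab
% Hypothetical numeric grouping variable with one missing value (NaN).
s = [3; NaN; 1; 3; 2];

[g, gN, gL] = grp2idx(s);

% The row of g that corresponds to the NaN in s is also NaN.
disp(g)              % 3, NaN, 1, 3, 2

% The missing value does not appear among the group names or levels.
disp(gN)             % {'1'}, {'2'}, {'3'}
disp(gL)             % 1, 2, 3

% Logical mask of the rows that were treated as missing.
isMissingRow = isnan(g);
```

Because g can contain NaN, expressions such as gN(g) or gL(g) need the missing rows filtered out first, for example gL(g(~isMissingRow)).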

Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | logical | char | string | cell | categorical | datetime | duration

Output Arguments


Group index g, returned as a positive integer vector with values from 1 up to the number K of distinct groups in s.
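Because the index values run from 1 to K, g can be used directly as a subscript. The following sketch counts the observations in each group with accumarray; the input data is made up for illustration and contains no missing values (accumarray does not accept NaN subscripts).

```matlab
% Hypothetical grouping variable: a cell array of character vectors.
s = {'low'; 'high'; 'low'; 'mid'; 'low'};

[g, gN] = grp2idx(s);

K = numel(gN);                 % number of distinct groups
counts = accumarray(g, 1);     % counts(k) is the number of rows in group k

% Pair each group name with its count.
for k = 1:K
    fprintf('%s: %d\n', gN{k}, counts(k));
end
```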

List of group names gN, returned as a cell array of character vectors.

The order of gN depends on the data type of the grouping variable s.

  • For numeric and logical vectors, the order is the sorted order of s.

  • For categorical vectors, the order is the order of categories(s).

  • For other data types, the order is the order of first appearance in s.
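The ordering rules above can be compared side by side. This is an illustrative sketch with made-up inputs; the categorical example relies on the default behavior of categorical, which sorts the category names when it is built from a cell array of character vectors.

```matlab
% Numeric input: groups follow the sorted order of the values.
sNum = [30; 10; 20; 10];
[gNum, nNum] = grp2idx(sNum);     % nNum is {'10'; '20'; '30'}

% Cell array of character vectors: groups follow order of first appearance.
sCell = {'c'; 'a'; 'b'; 'a'};
[gCell, nCell] = grp2idx(sCell);  % nCell is {'c'; 'a'; 'b'}

% Categorical input: groups follow the order of categories(sCat).
sCat = categorical({'c'; 'a'; 'b'; 'a'});
[gCat, nCat] = grp2idx(sCat);     % nCat is {'a'; 'b'; 'c'}
```

Note that the same labels produce different index vectors for sCell and sCat, so the index values are only meaningful together with the matching gN or gL.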

When s contains no missing values, gN(g) reproduces the contents of s in a cell array.

List of group levels gL, returned as the same data type as s: a categorical, numeric, logical, datetime, or duration vector, a cell array of character vectors, or a character array with each row representing a group label. (The software treats string arrays as cell arrays of character vectors.)

The set of groups and their order in gL are the same as those in gN, but gL has the same data type as s.

If s is a character matrix, then gL(g,:) reproduces s; otherwise, gL(g) reproduces s.
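The difference between the two indexing forms can be sketched as follows. The inputs are made up for illustration, and the character matrix uses rows of equal length so that no padding is involved.

```matlab
% Character matrix: each row is one group label.
sChar = ['ab'; 'cd'; 'ab'];
[gChar, ~, lChar] = grp2idx(sChar);

% Row indexing is needed to get whole labels back.
sameChar = isequal(lChar(gChar, :), sChar);

% Numeric vector: plain linear indexing is enough.
sNum = [5; 7; 5; 9];
[gNum, ~, lNum] = grp2idx(sNum);
sameNum = isequal(lNum(gNum), sNum);
```

For the character matrix, lChar(gChar) without the colon would return single characters rather than full rows, which is why the row form is required.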


Version History

Introduced before R2006a
