frequencyEncoderComponent

Pipeline component for frequency encoding categorical variables

Since R2026a

    Description

    frequencyEncoderComponent is a pipeline component that performs frequency encoding on categorical data. During the learn phase, the pipeline component establishes the frequency values for each categorical variable. During the run phase, the component encodes new data using the learned frequency values.
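The two phases described above can be illustrated with base MATLAB functions. This is a minimal sketch of the general frequency-encoding idea, not the component's internal implementation. The variable names (Xlearn, Xnew, names, freqs, encoded) are hypothetical, and encoding previously unseen categories as 0 is an assumption made for this sketch.

```matlab
% Learn phase (sketch): compute the relative frequency of each category.
Xlearn = categorical(["a";"b";"a";"c";"a";"b"]);
names  = categories(Xlearn);                  % category names: a, b, c
freqs  = countcats(Xlearn) / numel(Xlearn);   % [3; 2; 1] / 6

% Run phase (sketch): replace each new value with its learned frequency.
Xnew = categorical(["b";"a";"d"]);
[found, idx] = ismember(string(Xnew), string(names));
encoded = zeros(size(Xnew));          % assumption: unseen categories map to 0
encoded(found) = freqs(idx(found));   % "b" -> 1/3, "a" -> 1/2, "d" -> 0
```

Keeping the learned names and frequencies separate from the data being encoded mirrors the split between the learn phase and the run phase.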

    Creation

    Description

    component = frequencyEncoderComponent creates a pipeline component for frequency encoding.

    component = frequencyEncoderComponent(Name=Value) sets writable properties using one or more name-value arguments. For example, Limit=10 specifies that the component encodes the ten most frequent categories.

    Properties

    Learn Parameters

    The software sets learn parameters when you create the component. You can modify learn parameters using dot notation any time before you use the learn object function. Any unset learn parameters use the corresponding default values.

    Limit: Maximum number of categories, specified as a positive integer scalar. If a variable has more than Limit categories, the component frequency encodes only the Limit most frequent categories. Values that do not belong to these categories are encoded as 0.

    Example: c = frequencyEncoderComponent(Limit=20)

    Example: c.Limit = 10

    Data Types: single | double
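The effect of a category limit can be sketched with base MATLAB functions. This is an illustration under stated assumptions, not the component's implementation. The variable limit is a hypothetical stand-in for the Limit learn parameter. Normalizing the frequencies over the retained categories is an assumption, chosen because the ten frequencies listed in the example on this page sum to approximately 1.

```matlab
X = categorical(["a";"a";"a";"b";"b";"c";"d"]);
limit = 2;   % hypothetical stand-in for the Limit learn parameter

% Count each category and sort from most to least frequent.
names  = string(categories(X));
counts = countcats(X);
[counts, order] = sort(counts, "descend");
names = names(order);

% Keep at most limit categories.
keep  = min(limit, numel(names));
names = names(1:keep);                           % "a", "b"
freqs = counts(1:keep) / sum(counts(1:keep));    % assumption: normalize over kept categories

% Encode the data; values outside the kept categories become 0.
[found, idx] = ismember(string(X), names);
encoded = zeros(size(X));
encoded(found) = freqs(idx(found));   % [0.6; 0.6; 0.6; 0.4; 0.4; 0; 0]
```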

    Component Properties

    The software sets component properties when you create the component. You can modify component properties using dot notation at any time, except for HasLearnables and HasLearned, which are read-only.

    Name: Component identifier, specified as a character vector or string scalar.

    Example: c = frequencyEncoderComponent(Name="Encoder")

    Example: c.Name = "CategoricalEncoder"

    Data Types: char | string

    Inputs: Names of the input ports, specified as a character vector, string array, or cell array of character vectors.

    Example: c = frequencyEncoderComponent(Inputs="Data1")

    Example: c.Inputs = "X"

    Data Types: char | string | cell

    Outputs: Names of the output ports, specified as a character vector, string array, or cell array of character vectors.

    Example: c = frequencyEncoderComponent(Outputs="EncodedX")

    Example: c.Outputs = "X"

    Data Types: char | string | cell

    InputTags: Tags that enable the automatic connection of the component inputs with other components or pipelines, specified as a nonnegative integer vector. If you specify InputTags, then the number of tags must match the number of inputs in Inputs.

    Example: c = frequencyEncoderComponent(InputTags=2)

    Example: c.InputTags = 1

    Data Types: single | double

    OutputTags: Tags that enable the automatic connection of the component outputs with other components or pipelines, specified as a nonnegative integer vector. If you specify OutputTags, then the number of tags must match the number of outputs in Outputs.

    Example: c = frequencyEncoderComponent(OutputTags=0)

    Example: c.OutputTags = 1

    Data Types: single | double

    This property is read-only.

    HasLearnables: Indicator for the learnables, returned as 1 (true). A value of 1 indicates that the component contains learnables.

    Data Types: logical

    This property is read-only.

    HasLearned: Indicator showing the learning status of the component, returned as 0 (false) or 1 (true). A value of 1 indicates that the learn object function has been applied to the component and the learnables are nonempty.

    Data Types: logical

    Learnables

    The software sets learnables when you use the learn object function. You cannot modify learnables directly.

    This property is read-only.

    Frequencies: Categorical frequencies, returned as a cell array. For each categorical variable, Frequencies stores the category names and associated frequencies.

    Data Types: cell

    This property is read-only.

    UsedVariables: Names of the variables used by the component to encode data, returned as a string array. The variables correspond to columns in the first data argument of learn.

    Data Types: string

    Object Functions

    learn       Initialize and evaluate pipeline or component
    run         Execute pipeline or component for inference after learning
    reset       Reset pipeline or component
    series      Connect components in series to create pipeline
    parallel    Connect components or pipelines in parallel to create pipeline
    view        View diagram of pipeline inputs, outputs, components, and connections

    Examples

    Create a frequencyEncoderComponent pipeline component. Specify that the component encodes the ten most frequent categories.

    component = frequencyEncoderComponent(Limit=10)
    component = 
      frequencyEncoderComponent with properties:
    
                 Name: "FrequencyEncoder"
               Inputs: "DataIn"
            InputTags: 1
              Outputs: "DataFreqEncoded"
           OutputTags: 1
    
       
    Learnables (HasLearned = false)
          Frequencies: []
        UsedVariables: []
    
       
    Learn Parameters (unlocked)
                Limit: 10
    
    
    Show all parameters
    

    component is a frequencyEncoderComponent object that contains two learnables: Frequencies and UsedVariables. These properties remain empty until you pass data to the component during the learn phase.

    Load the census1994 data set. Store the education variable in the table X. This variable contains 16 categories representing different levels of education.

    load census1994
    X = adultdata(:,"education");

    Use the learn object function to learn the category frequencies from the education data.

    component = learn(component,X)
    component = 
      frequencyEncoderComponent with properties:
    
                 Name: "FrequencyEncoder"
               Inputs: "DataIn"
            InputTags: 1
              Outputs: "DataFreqEncoded"
           OutputTags: 1
    
       
    Learnables (HasLearned = true)
          Frequencies: {2×1 cell}
        UsedVariables: "education"
    
       
    Learn Parameters (locked)
                Limit: 10
    
    
    Show all parameters
    

    Note that the HasLearned property is set to true and Frequencies and UsedVariables are nonempty.

    Find the frequencies of the encoded categories.

    categories = [component.Frequencies{1}, component.Frequencies{2}]
    categories = 
    
      10×2 string array
    
        "HS-grad"         "0.34262" 
        "Some-college"    "0.23789" 
        "Bachelors"       "0.17472" 
        "Masters"         "0.056217"
        "Assoc-voc"       "0.045091"
        "11th"            "0.038337"
        "Assoc-acdm"      "0.034814"
        "10th"            "0.030441"
        "7th-8th"         "0.021077"
        "Prof-school"     "0.018793"  
    
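The listed frequencies can be used as a lookup table for new values. The following is a minimal sketch that copies the names and frequencies from the output above into a base MATLAB dictionary (available since R2022b). It is not how the component's run phase is implemented. The variables names, freqs, d, newValues, and encoded are hypothetical. "Doctorate" is used as an example of an education level that is not among the ten listed categories, so it is encoded as 0.

```matlab
% Category names and frequencies copied from the example output.
names = ["HS-grad"; "Some-college"; "Bachelors"; "Masters"; "Assoc-voc"; ...
         "11th"; "Assoc-acdm"; "10th"; "7th-8th"; "Prof-school"];
freqs = [0.34262; 0.23789; 0.17472; 0.056217; 0.045091; ...
         0.038337; 0.034814; 0.030441; 0.021077; 0.018793];

% Build a string-to-double lookup table.
d = dictionary(names, freqs);

% Encode new values; categories without an entry are encoded as 0.
newValues = ["Bachelors"; "Doctorate"; "HS-grad"];
known = isKey(d, newValues);
encoded = zeros(size(newValues));
encoded(known) = d(newValues(known));   % [0.17472; 0; 0.34262]
```

The ten listed frequencies sum to approximately 1.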

    Version History

    Introduced in R2026a