kmeansEncoderComponent

Pipeline component for feature extraction using k-means clustering

Since R2026a

Description

kmeansEncoderComponent is a pipeline component that performs feature extraction using k-means clustering. The pipeline component uses the functionality of the kmeans function during the learn phase to find clusters in the data. The component uses the functionality of the pdist2 function during the run phase to map new data to the learned clusters.
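
The two phases described above can be illustrated with kmeans and pdist2 called directly. This is only a rough sketch under stated assumptions: the synthetic data and the variable names (Xtrain, Xnew, centroids, D, nearest) are hypothetical, and the exact values that the component returns on its output ports may differ from what is computed here.

```matlab
% Sketch of the learn and run phases using kmeans and pdist2 directly.
% Assumption: the component's internal calls and outputs may differ.
rng(0)                                    % for reproducible clustering
Xtrain = [randn(50,2); randn(50,2) + 5];  % two well-separated synthetic groups
Xnew   = [0 0; 5 5];                      % new observations to map to clusters

% Learn phase: find k cluster centroids in the training data.
k = 2;
[~,centroids] = kmeans(Xtrain,k,Distance="sqeuclidean");

% Run phase: distance from each new observation to each learned centroid.
D = pdist2(Xnew,centroids,"squaredeuclidean");  % size(Xnew,1)-by-k matrix
[~,nearest] = min(D,[],2);                      % index of the closest centroid
```

Note that kmeans names the squared Euclidean metric "sqeuclidean", whereas pdist2 names it "squaredeuclidean".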

Creation

Syntax

component = kmeansEncoderComponent

component = kmeansEncoderComponent(Name=Value)

Description

component = kmeansEncoderComponent creates a pipeline component for feature extraction using k-means clustering.

component = kmeansEncoderComponent(Name=Value) sets writable Properties using one or more name-value arguments. For example, NumClusters=5 extracts five clusters (transformed features).


Properties


Learn Parameters

The software sets learn parameters when you create the component. You can modify learn parameters using dot notation any time before you use the learn object function. Any unset learn parameters use the corresponding default values.
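
As a minimal sketch of this workflow, the following code creates a component with default learn parameters and then changes several of them with dot notation before any call to learn. The specific values are arbitrary and chosen only for illustration.

```matlab
% Create the component; all learn parameters start at their default values.
c = kmeansEncoderComponent;

% Modify learn parameters with dot notation before calling learn.
c.NumClusters = 4;
c.MaxIter = 200;
c.Distance = "cityblock";
```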

`NumClusters` — Number of clusters to extract
positive integer scalar

Number of clusters (transformed features) to extract, specified as a positive integer scalar.

If you do not specify the NumClusters value, the software extracts all clusters.

Example: c = kmeansEncoderComponent(NumClusters=5)

Example: c.NumClusters = 10

Data Types: single | double

`Distance` — Distance metric
`"sqeuclidean"` (default) | `"cityblock"` | `"cosine"` | `"correlation"` | `"hamming"`

Distance metric, specified as "sqeuclidean", "cityblock", "cosine", "correlation", or "hamming". The software computes cluster centroids differently for each supported distance metric. For more information, see Distance.

Example: c = kmeansEncoderComponent(Distance="cityblock")

Example: c.Distance = "correlation"

Data Types: char | string

`EmptyAction` — Action to take if cluster loses all member observations
`"singleton"` (default) | `"error"` | `"drop"`

Action to take if a cluster loses all its member observations, specified as "singleton", "error", or "drop". For more information, see EmptyAction.

Example: c = kmeansEncoderComponent(EmptyAction="error")

Example: c.EmptyAction = "drop"

Data Types: char | string

`MaxIter` — Maximum number of iterations
`100` (default) | positive integer

Maximum number of iterations, specified as a positive integer.

Example: c = kmeansEncoderComponent(MaxIter=1000)

Example: c.MaxIter = 500

Data Types: single | double

`Replicates` — Number of times to repeat clustering using new initial cluster centroid positions
`1` (default) | positive integer scalar

Number of times to repeat the clustering using new initial cluster centroid positions, specified as a positive integer scalar. The software returns the solution with the lowest within-cluster sums of point-to-centroid distances.

Example: c = kmeansEncoderComponent(Replicates=5)

Example: c.Replicates = 10

Data Types: single | double

`Start` — Method for choosing initial cluster centroid positions
`"plus"` (default) | `"cluster"` | `"sample"` | `"uniform"` | numeric matrix | numeric array

Method for choosing the initial cluster centroid positions, specified as "plus", "cluster", "sample", "uniform", a NumClusters-by-p numeric matrix, or a NumClusters-by-p-by-Replicates numeric array, where p is the number of features in the first input argument of learn used by the component. For more information, see Start.

Example: c = kmeansEncoderComponent(Start="sample")

Example: c.Start = "uniform"

Data Types: single | double | char | string
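
To illustrate the numeric matrix form of Start, the following sketch passes explicit initial centroid positions for three clusters and data with p = 4 features. The matrix values and the name startCentroids are hypothetical; in practice, each row should be a plausible point in the feature space of the data that you pass to learn.

```matlab
% Hypothetical initial centroids: one row per cluster, one column per feature.
startCentroids = [5.0 3.4 1.5 0.2
                  5.9 2.8 4.4 1.4
                  6.8 3.1 5.7 2.1];   % NumClusters-by-p (3-by-4)

% The number of rows in Start must agree with NumClusters.
c = kmeansEncoderComponent(NumClusters=3,Start=startCentroids);
```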

Component Properties

The software sets component properties when you create the component. You can modify the component properties (excluding HasLearnables and HasLearned) using dot notation at any time. You cannot modify the HasLearnables and HasLearned properties directly.

`Name` — Component identifier
`"KmeansEncoder"` (default) | character vector | string scalar

Component identifier, specified as a character vector or string scalar.

Example: c = kmeansEncoderComponent(Name="Extractor")

Example: c.Name = "KmeansExtractor"

Data Types: char | string

`Inputs` — Names of input ports
`"DataIn"` (default) | character vector | string array | cell array of character vectors

Names of the input ports, specified as a character vector, string array, or cell array of character vectors.

Example: c = kmeansEncoderComponent(Inputs="X")

Example: c.Inputs = "X1"

Data Types: char | string | cell

`Outputs` — Names of output ports
`["DataOut","ClusterIndices","SumOfDistances"]` (default) | character vector | string array | cell array of character vectors

Names of the output ports, specified as a character vector, string array, or cell array of character vectors.

Example: c = kmeansEncoderComponent(Outputs=["ExtractedX","Indices","Distances"])

Example: c.Outputs = ["DataOut","Indices","Distances"]

Data Types: char | string | cell

`InputTags` — Tags that enable automatic connection of component inputs
`1` (default) | nonnegative integer vector

Tags that enable the automatic connection of the component inputs with other components or pipelines, specified as a nonnegative integer vector. If you specify InputTags, the number of tags must match the number of inputs in Inputs.

Example: c = kmeansEncoderComponent(InputTags=0)

Example: c.InputTags = 1

Data Types: single | double

`OutputTags` — Tags that enable automatic connection of component outputs
`[1 NaN NaN]` (default) | nonnegative integer vector

Tags that enable the automatic connection of the component outputs with other components or pipelines, specified as a nonnegative integer vector. If you specify OutputTags, the number of tags must match the number of outputs in Outputs.

Example: c = kmeansEncoderComponent(OutputTags=[1 0 0])

Example: c.OutputTags = [1 NaN NaN]

Data Types: single | double

`HasLearnables` — Indicator for learnables
Read-only: `1` (`true`) (default)

This property is read-only.

Indicator for the learnables, returned as 1 (true). A value of 1 indicates that the component contains Learnables.

Data Types: logical

`HasLearned` — Indicator showing learning status of component
Read-only: `0` (`false`) (default) | `1` (`true`)

This property is read-only.

Indicator showing the learning status of the component, returned as 0 (false) or 1 (true). A value of 1 indicates that the learn object function has been applied to the component, and the Learnables are nonempty.

Data Types: logical
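
The following sketch shows how HasLearned changes across a call to learn. The synthetic table X and the variables before and after are hypothetical and used only for illustration.

```matlab
% Synthetic two-feature table with two well-separated groups (assumed data).
rng(0)
X = array2table([randn(20,2); randn(20,2) + 4]);

c = kmeansEncoderComponent(NumClusters=2);
before = c.HasLearned;   % learn has not been applied yet

c = learn(c,X);          % learn returns the updated component
after = c.HasLearned;    % learnables are now populated
```

Because learn returns the updated component rather than modifying c in place, assign its output back to a variable before you check HasLearned.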

Learnables

The software sets learnables when you use the learn object function. You cannot modify learnables directly.

`ClusterCentroids` — Cluster centroid locations
Read-only: table | `[]`

This property is read-only.

Cluster centroid locations, returned as a NumClusters-by-p table, where p is the number of features in the first input argument of learn used by the component. Row j is the centroid of cluster j.

Data Types: table

`UsedVariables` — Names of variables used by component
Read-only: string array | `[]`

This property is read-only.

Names of the variables used by the component to extract features, returned as a string array. The variables correspond to columns in the data argument of learn.

Data Types: string

Object Functions

`learn`	Initialize and evaluate pipeline or component
`run`	Execute pipeline or component for inference after learning
`reset`	Reset pipeline or component
`series`	Connect components in series to create pipeline
`parallel`	Connect components or pipelines in parallel to create pipeline
`view`	View diagram of pipeline inputs, outputs, components, and connections

Examples


Create Component for k-Means Feature Extraction

Create a pipeline component that performs feature extraction using k-means clustering. Specify three clusters (transformed features) to extract.

component = kmeansEncoderComponent(NumClusters=3)

component = 

  kmeansEncoderComponent with properties:

                Name: "KmeansEncoder"
              Inputs: "DataIn"
           InputTags: 1
             Outputs: ["DataOut"    "ClusterIndices"    "SumOfDistances"]
          OutputTags: [1 NaN NaN]

   
Learnables (HasLearned = false)
    ClusterCentroids: []
       UsedVariables: []

   
Learn Parameters (unlocked)
         NumClusters: 3



component is a kmeansEncoderComponent object that contains two learnables: ClusterCentroids and UsedVariables. These properties remain empty until you pass data to the component during the learn phase.

Read the fisheriris data set into a table. Store the predictor and response data in the tables X and Y, respectively.

fisheriris = readtable("fisheriris.csv");
X = fisheriris(:,1:end-1);
Y = fisheriris(:,end);

Use the learn object function to extract the cluster centroid locations from the predictor data X.

component = learn(component,X)

component = 

  kmeansEncoderComponent with properties:

                Name: "KmeansEncoder"
              Inputs: "DataIn"
           InputTags: 1
             Outputs: ["DataOut"    "ClusterIndices"    "SumOfDistances"]
          OutputTags: [1 NaN NaN]

   
Learnables (HasLearned = true)
    ClusterCentroids: [3×4 table]
       UsedVariables: ["SepalLength"    "SepalWidth"    "PetalLength"    "PetalWidth"]

   
Learn Parameters (locked)
         NumClusters: 3



The ClusterCentroids and UsedVariables properties are nonempty, and the HasLearned property is set to true.

Find the cluster centroid locations used for extracting features.

centroids = component.ClusterCentroids

centroids =

  3×4 table

     Var1      Var2      Var3      Var4 
    ______    ______    ______    ______

      6.85    3.0737    5.7421    2.0711
    5.9016    2.7484    4.3935    1.4339
     5.006     3.428     1.462     0.246

Version History

Introduced in R2026a
