transform

Transform data into principal component scores

Since R2024a

    Description

    Xtransformed = transform(IncrementalMdl,X) transforms the predictor data X into principal component scores using the incremental PCA model IncrementalMdl. Xtransformed is a representation of X in the principal component space described by IncrementalMdl. For more information, see pca.
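    The notion of a principal component score can be sketched with base MATLAB alone. The snippet below illustrates projecting centered data onto principal directions obtained from a singular value decomposition. It does not call incrementalPCA and is not claimed to be how transform is implemented. The synthetic data and all variable names (Xdemo, V, scores) are assumptions made for this sketch.

    ```matlab
    rng(0,"twister")                      % For reproducibility
    Xdemo = randn(100,4).*[2 1 0.5 0.1];  % Synthetic data: 100 observations, 4 variables
    mu = mean(Xdemo,1);                   % Column means
    Xc = Xdemo - mu;                      % Center each variable
    [~,~,V] = svd(Xc,"econ");             % Columns of V are the principal directions
    k = 2;                                % Number of components to keep
    scores = Xc*V(:,1:k);                 % 100-by-2 matrix of principal component scores
    ```

    Each row of scores is one observation and each column is one component, which is the same layout as Xtransformed.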

    Examples

    Create a model for incremental principal component analysis (PCA) and a default incremental linear SVM model for binary classification. Fit the incremental models to streaming data and analyze how the principal components, model parameters, and performance metrics evolve during training. Use the final models to predict activity labels.

    Load and Preprocess Data

    Load the human activity data set. Randomly shuffle the data.

    load humanactivity
    n = numel(actid);
    rng(0,"twister") % For reproducibility
    idx = randsample(n,n);
    X = feat(idx,:);
    Y = actid(idx);

    For details on the human activity data set, enter Description at the command line.

    Responses can be one of five classes: Sitting, Standing, Walking, Running, or Dancing. Dichotomize the response by identifying whether the subject is moving (actid > 2).

    Y = Y > 2;

    Specify the first 20,000 observations and labels as streaming data, and the remaining observations and labels as test data.

    n = 20000;
    Xstream = X(1:n,:);
    Ystream = Y(1:n,:);
    Xtest = X(n+1:end,:);
    Ytest = Y(n+1:end,:);

    Create Incremental Models

    Create a model for incremental PCA. Standardize the data, keep three principal components, and set a warm-up period of 2000 observations.

    IncrementalPCA = incrementalPCA(StandardizeData=true, ...
        NumComponents=3,WarmupPeriod=2000);
    details(IncrementalPCA)
      incrementalPCA with properties:
    
                         IsWarm: 0
        NumTrainingObservations: 0
                   WarmupPeriod: 2000
                             Mu: []
                          Sigma: []
              ExplainedVariance: [3x1 double]
               EstimationPeriod: 1000
                         Latent: [3x1 double]
                   Coefficients: [0x3 double]
                VariableWeights: [1x0 double]
                  NumComponents: 3
                  NumPredictors: 0
    

    IncrementalPCA is an incrementalPCA model object. All its properties are read-only. By default, the software sets the hyperparameter estimation period to 1000 observations. The incremental PCA model must be warm (all hyperparameters are estimated) before the fit function returns transformed observations.
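    One way to observe when a model becomes warm is to fit it to a stream and watch IsWarm together with the number of NaN rows that fit returns. The following is a minimal sketch under stated assumptions: Xdemo is synthetic data, and DemoPCA and chunkSize are hypothetical names that are not part of the example above. It requires Statistics and Machine Learning Toolbox.

    ```matlab
    rng(0,"twister")                 % For reproducibility
    Xdemo = randn(4000,6);           % Synthetic stream: 4000 observations, 6 variables
    DemoPCA = incrementalPCA(StandardizeData=true, ...
        NumComponents=3,WarmupPeriod=2000);
    chunkSize = 500;

    for j = 1:size(Xdemo,1)/chunkSize
        rows = (j-1)*chunkSize + (1:chunkSize);
        [DemoPCA,Xtr] = fit(DemoPCA,Xdemo(rows,:));
        % Report progress: observations seen so far, warm state, and NaN rows returned
        fprintf("Observations: %4d  IsWarm: %d  NaN rows: %d\n", ...
            j*chunkSize,DemoPCA.IsWarm,nnz(all(isnan(Xtr),2)));
    end
    ```

    With the default estimation period of 1000 observations and a warm-up period of 2000 observations, the NaN rows are expected to stop once the model is warm.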

    Create a default incremental linear SVM model for binary classification by using the incrementalClassificationLinear function.

    IncrementalLinear = incrementalClassificationLinear;
    details(IncrementalLinear)
      incrementalClassificationLinear with properties:
    
                        Learner: 'svm'
                         Solver: 'scale-invariant'
                      BatchSize: 1
                           Beta: [0x1 double]
                           Bias: 0
                        FitBias: 1
                     FittedLoss: 'hinge'
                         Lambda: NaN
                      LearnRate: 1
              LearnRateSchedule: 'constant'
                             Mu: []
                          Sigma: []
                  SolverOptions: [1x1 struct]
               EstimationPeriod: 0
                     ClassNames: [0x1 double]
                          Prior: [1x0 double]
                 ScoreTransform: 'none'
                  NumPredictors: 0
        NumTrainingObservations: 0
            MetricsWarmupPeriod: 1000
              MetricsWindowSize: 200
                         IsWarm: 0
                        Metrics: [1x2 table]
    

    IncrementalLinear is an incrementalClassificationLinear model object. All its properties are read-only. IncrementalLinear must be fit to data before you can use it to perform any other operations. By default, the software sets the metrics warm-up period to 1000 observations and the metrics window size to 200 observations.
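    If the default metrics settings do not suit the data stream, you can set them when you create the model. The sketch below is illustrative only: the values 500 and 100 are arbitrary, and DemoLinear is a hypothetical name used so that IncrementalLinear is left unchanged.

    ```matlab
    % Create a linear SVM model with a shorter metrics warm-up period and window
    DemoLinear = incrementalClassificationLinear(MetricsWarmupPeriod=500, ...
        MetricsWindowSize=100);
    DemoLinear.MetricsWarmupPeriod
    DemoLinear.MetricsWindowSize
    ```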

    Fit Incremental Models

    Fit the IncrementalPCA and IncrementalLinear models to the streaming data by using the fit and updateMetricsAndFit functions, respectively. To simulate a data stream, fit each model in chunks of 50 observations at a time. At each iteration:

    • Process 50 observations.

    • Overwrite the previous incremental PCA model with a new one fitted to the incoming observations.

    • Return the transformed observations Xtr.

    • Overwrite the previous incremental classification model with a new one fitted to the incoming transformed observations.

    • Store β1, the cumulative metrics, and the window metrics to see how they evolve during incremental learning.

    • Store topEV, the explained variance of the component with the highest variance, to see how it evolves during incremental learning.

    numObsPerChunk = 50;
    nchunk = floor(n/numObsPerChunk);
    ce = array2table(zeros(nchunk,2),"VariableNames",["Cumulative" "Window"]);
    beta1 = zeros(nchunk+1,1);   % Entry 1 is the initial value 0; entry j+1 stores iteration j
    topEV = zeros(nchunk+1,1);   % Same layout as beta1
    
    % Incremental learning
    for j = 1:nchunk
        ibegin = min(n,numObsPerChunk*(j-1) + 1);
        iend = min(n,numObsPerChunk*j);
        [IncrementalPCA,Xtr] = fit(IncrementalPCA,Xstream(ibegin:iend,:));
        IncrementalLinear = updateMetricsAndFit(IncrementalLinear,Xtr, ...
            Ystream(ibegin:iend));
        beta1(j + 1) = IncrementalLinear.Beta(1);
        ce{j,:} = IncrementalLinear.Metrics{"ClassificationError",:};
        topEV(j + 1) = IncrementalPCA.ExplainedVariance(1);
    end

    During the incremental PCA estimation and warm-up periods, the fit function returns the transformed observations as NaNs. After the PCA estimation period and warm-up period, updateMetricsAndFit fits the linear coefficient estimates β using the transformed observations. After the metrics warm-up period, IncrementalLinear is warm, and updateMetricsAndFit checks the performance of the model on the incoming transformed observations, and then fits the model to those observations.

    Analyze Incremental Models During Training

    To see how the highest explained variance, β1, and performance metrics evolve during training, plot them on separate tiles.

    figure
    t = tiledlayout(3,1);
    nexttile
    plot(topEV)
    ylabel("Top EV [%]")
    xline(IncrementalPCA.EstimationPeriod/numObsPerChunk,"r-.")
    xlim([0 nchunk])
    ylim([0 100])
    nexttile
    plot(beta1)
    ylabel("\beta_1")
    xline((IncrementalPCA.WarmupPeriod+ ...
        IncrementalPCA.EstimationPeriod)/numObsPerChunk,"b:")
    xlim([0 nchunk])
    nexttile
    h = plot(ce.Variables);
    xlim([0 nchunk])
    ylabel("Classification Error")
    xline((IncrementalLinear.MetricsWarmupPeriod+ ...
        IncrementalPCA.WarmupPeriod+ ...
        IncrementalPCA.EstimationPeriod)/numObsPerChunk,"g--")
    legend(h,ce.Properties.VariableNames)
    xlabel(t,"Iteration")

    The highest explained variance value is 0 during the estimation period and then rapidly rises to 73%. The value then gradually approaches 77%.

    The plots suggest that updateMetricsAndFit performs these steps:

    • Fit β1 after the estimation and warm-up periods only.

    • Compute the performance metrics after the estimation, warm-up, and metrics warm-up periods only.

    • Compute the cumulative metrics during each iteration.

    • Compute the window metrics after processing 200 observations (four iterations).
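    The iterations at which these transitions occur follow from the period lengths and the chunk size. The arithmetic below restates the values used for the vertical lines in the plots. The variable names are hypothetical, and the period values are copied from the model displays shown earlier.

    ```matlab
    numObsPerChunk = 50;
    estimationPeriod = 1000;      % IncrementalPCA.EstimationPeriod
    pcaWarmupPeriod = 2000;       % IncrementalPCA.WarmupPeriod
    metricsWarmupPeriod = 1000;   % IncrementalLinear.MetricsWarmupPeriod
    metricsWindowSize = 200;      % IncrementalLinear.MetricsWindowSize

    % Iteration at which the PCA estimation period ends
    iterEstimationDone = estimationPeriod/numObsPerChunk                 % 20
    % Iteration at which the PCA model is warm and beta starts to be fit
    iterPCAWarm = (estimationPeriod + pcaWarmupPeriod)/numObsPerChunk    % 60
    % Iteration at which the classification metrics start to be computed
    iterMetricsWarm = (estimationPeriod + pcaWarmupPeriod + ...
        metricsWarmupPeriod)/numObsPerChunk                              % 80
    % Number of iterations that make up one metrics window
    itersPerWindow = metricsWindowSize/numObsPerChunk                    % 4
    ```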

    Predict Activity Labels Using Final Models

    Transform the test data using the final incremental PCA model. Predict activity labels for the transformed test data using the final incremental linear classification model.

    transformedXtest = transform(IncrementalPCA,Xtest);
    predictedLabels = predict(IncrementalLinear,transformedXtest);

    Create a confusion matrix for the test data.

    figure
    ConfusionTest = confusionchart(Ytest,predictedLabels);

    The final model misclassifies only 27 of 4075 observations in the test data.
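    The reported count corresponds to a test error rate of roughly 0.66%. The sketch below works through that arithmetic and then shows one way to compute the same quantities from label vectors. yTrue and yPred are small hypothetical vectors used only for illustration.

    ```matlab
    numMisclassified = 27;    % Misclassified test observations reported above
    numTest = 4075;           % Number of test observations
    testErrorRate = numMisclassified/numTest   % About 0.0066, or 0.66%

    % The same quantities computed from label vectors (hypothetical data)
    yTrue = logical([1 0 1 1 0 0 1 0])';
    yPred = logical([1 0 0 1 0 1 1 0])';
    numWrong = nnz(yPred ~= yTrue)      % 2
    errorRate = mean(yPred ~= yTrue)    % 0.25
    ```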

    Input Arguments

    IncrementalMdl: Incremental PCA model, specified as an incrementalPCA model object. You can create IncrementalMdl by calling incrementalPCA directly.

    X: Chunk of predictor data to transform, specified as a floating-point matrix of n observations and IncrementalMdl.NumPredictors variables. The rows of X correspond to observations, and the columns correspond to variables.

    Note

    transform supports only numeric input data. If your input data includes categorical data, you must prepare an encoded version of the categorical data. Use dummyvar to convert each categorical variable to a numeric matrix of dummy variables. Then, concatenate all dummy variable matrices and any other numeric predictors. For more details, see Dummy Variables.

    Data Types: single | double
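    The encoding described in the note can be sketched as follows. The predictors height and color are hypothetical and serve only to show the shape of the result. The example requires Statistics and Machine Learning Toolbox for dummyvar.

    ```matlab
    % Hypothetical predictors: one numeric variable and one categorical variable
    height = [1.2; 0.7; 1.9; 1.1];
    color = categorical(["red";"green";"blue";"green"]);

    D = dummyvar(color);     % One indicator column per category: blue, green, red
    Xnumeric = [height D];   % 4-by-4 numeric matrix
    ```

    Apply the same encoding, with the same column order, to every chunk passed to fit and transform so that the number of columns matches IncrementalMdl.NumPredictors.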

    Output Arguments

    Xtransformed: Principal component scores, returned as a floating-point matrix. The rows of Xtransformed correspond to observations, and the columns correspond to components.

    Version History

    Introduced in R2024a
