Logistic regression in MATLAB (without Statistics and Machine Learning Toolbox)

조회 수: 10 (최근 30일)
Anyone knows if there is a way to perform logistic regression (similar to the "LogisticRegression" model utilized in Python / scikit-learn) in MATLAB but without the "Statistics and Machine Learning Toolbox"? I have the basic MATLAB software (R2020b) without such toolbox and I didn´t want to migrate to the Python environment as well as to buy this specific toolbox.
Many thanks.
  댓글 수: 3
dpb
dpb 2025년 8월 20일
There are at least a couple of well-rated functions at the File Exchange you could look at...
Alexandre Englert
Alexandre Englert 2025년 8월 21일
Matt: Yep, I just want to use plain MATLAB (no toolboxes).
dpb: I tried using the "Logistic Regression for Classification" package which I found on "File Exchange", but already trying the "demo.m" file gave a message saying I needed the "Statistics and Machine Learning Toolbox"...
Any other suggestion of functions or packages there?
Many thanks to both.

댓글을 달려면 로그인하십시오.

채택된 답변

William Rose
William Rose 2025년 8월 21일
편집: William Rose 2025년 8월 21일
[edit: The equation I entered with LaTeX, in my answer below, is displaying as a fuzzy gray rectangle on my computer, after I submit the anwer. I hope it looks better to you. In case it doesn't look better for you, here is the equation in non-LaTeX format:
probability=1/(1+exp(-(x-mu)/s))
]
Here is code to do logistic regression without using any toolboxes. What it does:
  • Generate simulated data: 100 observed values of 0 or 1, based on a logistic function of x, with known, specified parameters mu and s (location and scale, respectively).
  • Estimate the parameters of the logistic function from the observations.
  • Display the estimated parameter values in the console window.
  • Plot the observations, and the logistic function with the known parameters, and the logistic function with the parameters estimated from the data.
The code defines a function which returns the negative log likelihood of the observations, for given values of mu and s (location and scale), using the logistic probability equation above.
The code uses fminsearch() (which is part of basic Matlab, not in a toolbox) to find the values of mu and s which give the best fit, i.e. which minimize the negative log likelihood of the observed data.
This script will give slightly different results each time you run it, since it uses rand() to generate the observations.
See detailed comments in the code.
% Generate simulated data for regression
mu=-1; % location = x-value where probability=0.5
s=2; % scale = transition width
x=sort(20*rand(1,100)-10); % 100 random numbers between -10 and +10
pr=1./(1+exp(-(x-mu)/s)); % probability of observing 1, at each x value
y=(rand(1,100)<=pr); % observed y-values (0 or 1)
% Estimate the logistic probability parameters from the observed data
params0=[0,1]; % initial guess for [mu,s]
params=fminsearch(@(params)negloglike(params,x,y),params0); % best-fit parameters
muEst=params(1); sEst=params(2);
fprintf('Estimated mu=%.3f, estimated s=%.3f.\n',muEst,sEst) % display results on console
Estimated mu=-1.213, estimated s=1.948.
% Compute estimated probability
prEst=1./(1+exp(-(x-muEst)/sEst)); % estimated probability
% Plot the observations, the probability function used to generate the observations,
% and the probability function estimated from the observations
figure
plot(x,y,'ko','MarkerFaceColor','k') % observations
hold on;
plot(x,pr,'-b','LineWidth',2) % probability function used to generate the observations
xlabel('X'); grid on
plot(x,prEst,'-r','LineWidth',2); % probability function esitmated from the observations
title('Logistic Regression: Observations and Probabilities')
legend('Observation','True Probability','Estimated Probability', 'Location','southeast')
function negLL=negloglike(params,x,y)
% NEGLOGLIKE Negative log likelihood of observations, using logistic regression
% Inputs
% params=[mu,s] where
% mu = x-value where probability=0.5
% s = scale = transition width in the x-direction
% x = x-values of observations (vector)
% y = observations, true(1) or false(0) (vector)
% Output
% negLL=-log(probability of observed observations, given mu and s)
% =-sum(log(prob. of each observation))
% =-sum(y*log(p)+(1-y)*log(1-p))
% where p=probility from logistic probability density
mu=params(1); s=params(2);
p=1./(1+exp(-(x-mu)/s)); % probability of observing 1
negLL=-sum(y.*log(p)+(1-y).*log(1-p));
end
  댓글 수: 5
William Rose
William Rose 2025년 8월 26일
편집: William Rose 2025년 8월 26일
@Alexandre Englert, I made a small adjustment. Now the script for logistic regresison with three predictors works well. I tried four runs with N=50 samples and four runs with N=200 samples. The script converged to reasonable estimates every time. An example of the output is below. The script is attached.
>> logisticRegressionExample3
200 samples: 80 zeros, 120 ones.
Estimated a= 2.293, estimated s= 1.696, 3.479,-2.605.
True a= 2.167, true s= 2.000, 4.000,-3.000.
>>

댓글을 달려면 로그인하십시오.

추가 답변 (1개)

the cyclist
the cyclist 2025년 8월 20일
편집: the cyclist 2025년 8월 20일
Logistic regression is an algorithm that can definitely be programmed in MATLAB, which is a general-purpose programming language. It is a numerical optimization problem.
I was not willing to put in the work to solve that for you, but ChatGPT was. Here is what it came up with, along with a comparison to the output of fitglm (from the Statistics and Machine Learning Toolbox), which is what I would typically use to do a logistic regression.
Without extensive testing, I cannot vouch that this code is doing what I would expect. Frankly, I have not even checked to see if there is a stupendously easier way to do this.
% - Small default ridge for stability (lambda = 1e-6)
% - Base MATLAB implementation + optional fitglm comparison
rng default
% Generate a simple synthetic binary classification dataset (2 informative features)
n = 400;
X1 = [randn(n/2,1) + 1.0; randn(n/2,1) - 1.0];
X2 = [randn(n/2,1) - 0.5; randn(n/2,1) + 0.5];
X = [ones(n,1), X1, X2]; % add intercept only (no extra constant feature)
% True parameters (with intercept)
beta_true = [-0.25; 2.0; -1.5];
p = sigmoid(X*beta_true);
y = double(rand(n,1) < p);
% Fit with our scratch implementation (no toolboxes)
opts = struct('maxIter', 100, 'tol', 1e-8, 'lambda', 1e-6); % tiny ridge for numerical safety
[beta_hat, stats] = logreg_newton(X, y, opts);
% Report results
fprintf('=== Base MATLAB Logistic Regression (Newton/IRLS) ===\n');
=== Base MATLAB Logistic Regression (Newton/IRLS) ===
disp(table((0:size(X,2)-1)', beta_hat, 'VariableNames', {'CoeffIndex','Estimate'}));
CoeffIndex Estimate __________ ________ 0 -0.26037 1 2.2738 2 -1.6743
fprintf('Converged: %d in %d iters, final |step| = %.3e, logLik = %.6f\n', ...
stats.converged, stats.iters, stats.lastStepNorm, stats.logLik);
Converged: 1 in 8 iters, final |step| = 3.509e-11, logLik = -106.924048
% Train/test split and accuracy
idx = randperm(n);
train = idx(1:round(0.7*n));
test = idx(round(0.7*n)+1:end);
phat_train = sigmoid(X(train,:)*beta_hat);
phat_test = sigmoid(X(test,:)*beta_hat);
yhat_train = phat_train >= 0.5;
yhat_test = phat_test >= 0.5;
acc_train = mean(yhat_train == y(train));
acc_test = mean(yhat_test == y(test));
fprintf('Train acc: %.2f%% | Test acc: %.2f%%\n', 100*acc_train, 100*acc_test);
Train acc: 89.29% | Test acc: 88.33%
% Optional: compare to fitglm
hasStatsTBX = ~isempty(ver('stats'));
if hasStatsTBX
Xglm = X(:,2:end); % drop explicit intercept for fitglm
mdl = fitglm(Xglm, y, 'Distribution', 'binomial', 'Link', 'logit', 'Intercept', true);
beta_glm = mdl.Coefficients.Estimate;
fprintf('\n=== fitglm Comparison ===\n');
disp(table((0:size(X,2)-1)', beta_hat, beta_glm, beta_hat - beta_glm, ...
'VariableNames', {'CoeffIndex','BaseMATLAB','fitglm','Diff'}));
phat_test_glm = predict(mdl, Xglm(test,:));
yhat_test_glm = phat_test_glm >= 0.5;
acc_test_glm = mean(yhat_test_glm == y(test));
fprintf('Test acc (base): %.2f%% | Test acc (fitglm): %.2f%%\n', 100*acc_test, 100*acc_test_glm);
else
fprintf('\n[Note] Statistics and Machine Learning Toolbox not detected. Skipping fitglm comparison.\n');
end
=== fitglm Comparison ===
CoeffIndex BaseMATLAB fitglm Diff __________ __________ ________ ___________ 0 -0.26037 -0.26037 2.1952e-08 1 2.2738 2.2738 -2.0554e-07 2 -1.6743 -1.6743 1.7967e-07
Test acc (base): 88.33% | Test acc (fitglm): 88.33%
% ------- Local functions (base MATLAB only) -------
function [beta, stats] = logreg_newton(X, y, opts)
%LOGREG_NEWTON Logistic regression via Newton-Raphson (IRLS).
if nargin < 3, opts = struct; end
if ~isfield(opts, 'maxIter'), opts.maxIter = 100; end
if ~isfield(opts, 'tol'), opts.tol = 1e-8; end
if ~isfield(opts, 'lambda'), opts.lambda = 1e-6; end % tiny ridge
[n,p] = size(X);
beta = zeros(p,1);
lambda = opts.lambda;
R = zeros(p); % L2 penalty (no intercept penalty)
if lambda > 0
R(2:end,2:end) = lambda*eye(p-1);
end
for k = 1:opts.maxIter
eta = X*beta;
p1 = sigmoid(eta);
W = p1 .* (1 - p1);
W = max(W, 1e-12); % avoid zeros
g = X'*(y - p1) - R*beta;
% Build H = X' * diag(W) * X + R without forming diag(W)
H = X'*(bsxfun(@times, X, W)) + R;
% Extra diagonal jitter for numerical stability in tough cases
% (helps if features are nearly collinear)
H = H + 1e-12*eye(p);
% Solve Newton step
step = H \ g;
% Backtracking line search on penalized log-likelihood
t = 1.0;
pll_prev = loglik(y, eta) - 0.5*lambda*sum(beta(2:end).^2);
while t > 1e-8
beta_try = beta + t*step;
eta_try = X*beta_try;
pll_try = loglik(y, eta_try) - 0.5*lambda*sum(beta_try(2:end).^2);
if pll_try >= pll_prev
beta = beta_try;
pll_prev = pll_try;
break;
end
t = t/2;
end
if norm(step) < opts.tol
stats.converged = true;
stats.iters = k;
stats.lastStepNorm = norm(step);
stats.logLik = pll_prev;
return;
end
end
stats.converged = false;
stats.iters = opts.maxIter;
stats.lastStepNorm = norm(step);
stats.logLik = pll_prev;
end
function y = sigmoid(z)
%SIGMOID Numerically stable logistic sigmoid
y = zeros(size(z));
pos = z >= 0;
neg = ~pos;
y(pos) = 1 ./ (1 + exp(-z(pos)));
ez = exp(z(neg));
y(neg) = ez ./ (1 + ez);
end
function ll = loglik(y, eta)
%LOGLIK Bernoulli log-likelihood
% Compute sum(y.*eta - log(1+exp(eta))) stably
ll = 0;
for i = 1:numel(eta)
t = eta(i);
if t > 0
ll = ll + y(i)*t - (t + log1p(exp(-t)));
else
ll = ll + y(i)*t - log1p(exp(t));
end
end
end
  댓글 수: 2
Alexandre Englert
Alexandre Englert 2025년 8월 21일
Thank you for the answer. However, I didn´t want to rely on chatGPT... Are there any good books on MATLAB and supervised machine learning which you could suggest?
Torsten
Torsten 2025년 8월 21일
편집: Torsten 2025년 8월 21일
I fully support your efforts to program your own Logistic Regression code.
But from the economic point of view, the time you will have to spend understanding the theory and reliably coding the software will not pay compared to the price of the toolbox.

댓글을 달려면 로그인하십시오.

카테고리

Help CenterFile Exchange에서 Support Vector Machine Regression에 대해 자세히 알아보기

제품


릴리스

R2020b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by