Subsets of uncorrelated features

Kais on 10 Jul 2021
Commented: Kais on 15 Jul 2021
Given an N-by-N correlation matrix of N features, how can I find ALL subsets of pairwise uncorrelated features, if two features are considered uncorrelated when their correlation score is less than a threshold Alpha? There is no restriction on the number of features in a subset, but all features in a subset need to be pairwise uncorrelated.
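One way to frame the question, offered as a sketch rather than a definitive method: mark every pair of features with |r| < Alpha in a logical matrix. A subset is then pairwise uncorrelated exactly when all of its pairs are marked, i.e. when it is a clique of the graph this matrix describes (the framing one of the answers uses). The 5-by-5 matrix R is made-up data, A and uncorrelatedPairs are illustrative names, and taking abs(R) assumes strong negative correlations should also count as correlated.

```matlab
% Made-up symmetric matrix with unit diagonal, standing in for a
% correlation matrix of 5 features.
R = [1.0 0.1 0.8 0.2 0.3;
     0.1 1.0 0.2 0.9 0.1;
     0.8 0.2 1.0 0.3 0.2;
     0.2 0.9 0.3 1.0 0.1;
     0.3 0.1 0.2 0.1 1.0];
alpha = 0.4;                 % assumed threshold
N = size(R, 1);
A = abs(R) < alpha;          % A(i,j) is true when features i and j are uncorrelated
A(logical(eye(N))) = false;  % a feature is never paired with itself
[first, second] = find(triu(A, 1));  % each uncorrelated pair once, first < second
uncorrelatedPairs = [first, second];
```

With this R, only features 1 and 3 and features 2 and 4 are correlated, so 8 of the 10 pairs are marked.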

Accepted Answer

Jeff Miller on 12 Jul 2021
Edited: Jeff Miller on 12 Jul 2021
N = 5;
R = rand(N); % We will ignore the lower triangular part of this array
rCutoff = 0.4;
% Make a cell array that holds all possible combinations of 2, 3, 4, ... features
combos = cell(0,0);
for i=2:N
    iCombos = nchoosek(1:N,i); % one combination per row, features in increasing order
    for j=1:size(iCombos,1)
        combos{end+1} = iCombos(j,:);
    end
end
ncells = numel(combos);
% Check each cell to make sure that all of the pairwise correlations are
% less than the cutoff
qualifies = true(1,ncells);
for icell=1:ncells
    features = combos{icell};
    nfeatures = numel(features);
    for ifeature=1:nfeatures-1
        for jfeature=ifeature+1:nfeatures
            % features is sorted, so iifeature < jjfeature (upper triangle of R)
            iifeature = features(ifeature);
            jjfeature = features(jfeature);
            % "uncorrelated" means |r| < rCutoff, so |r| >= rCutoff disqualifies the subset
            if abs(R(iifeature,jjfeature)) >= rCutoff
                qualifies(icell) = false;
            end
        end
    end
end
% Keep only the subsets in which every pair is uncorrelated
uncorrelatedSubsets = combos(qualifies);
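A variation on the loop above, sketched under the same assumptions: instead of storing every combination in a cell array and testing pairs one at a time, each subset size k tests all nchoosek(N,k) combinations at once, one pair position per step, and tallies how many subsets of each size qualify. R is a made-up 5-by-5 matrix, countBySize is an illustrative name, and |r| >= rCutoff is treated as correlated to match the question's "less than a threshold" wording. The search is still exhaustive (2^N - N - 1 candidate subsets), so it only suits small N.

```matlab
% Made-up symmetric matrix standing in for a correlation matrix.
R = [1.0 0.1 0.8 0.2 0.3;
     0.1 1.0 0.2 0.9 0.1;
     0.8 0.2 1.0 0.3 0.2;
     0.2 0.9 0.3 1.0 0.1;
     0.3 0.1 0.2 0.1 1.0];
rCutoff = 0.4;
N = size(R, 1);
correlated = abs(R) >= rCutoff;      % true where a pair counts as correlated
uncorrelatedSubsets = {};
countBySize = zeros(1, N);           % countBySize(k) = number of qualifying k-subsets
for k = 2:N
    C = nchoosek(1:N, k);            % every k-feature combination, one per row
    ok = true(size(C, 1), 1);
    pairs = nchoosek(1:k, 2);        % column positions of every pair within a row
    for p = 1:size(pairs, 1)
        a = C(:, pairs(p, 1));
        b = C(:, pairs(p, 2));
        % drop every combination whose p-th pair is correlated
        ok = ok & ~correlated(sub2ind([N N], a, b));
    end
    countBySize(k) = nnz(ok);
    uncorrelatedSubsets = [uncorrelatedSubsets; num2cell(C(ok, :), 2)]; %#ok<AGROW>
end
```

With this R, countBySize comes out as [0 8 4 0 0]: eight uncorrelated pairs, four uncorrelated triples, and nothing larger.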
5 Comments
Jeff Miller on 13 Jul 2021
You may well be right, but that "if sum" line is cognitively impenetrable to me. :)
Thanks for accepting my answer.
Kais on 14 Jul 2021
I need ALL features in the "features" variable to be pairwise uncorrelated. The line:
sum(nonzeros(triu(abs(R(features,features)),1)) > rCutoff)
counts the correlated pairs in the subset: triu(...,1) keeps only the strictly upper triangle of the symmetric submatrix (each pair once, diagonal excluded), nonzeros turns that into a vector, and the comparison with rCutoff yields a logical vector showing which feature pairs are correlated. If any of those values is true (equivalently, if their sum is nonzero), the subset does not qualify as an uncorrelated subset.
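The same check can be written with a logical mask and any(), which reads directly as "is any pair correlated?". This is a sketch with a made-up R and an example subset; pairMask and isUncorrelatedSubset are illustrative names. One small difference from the nonzeros version: nonzeros also drops correlations that are exactly 0, which is harmless because 0 is below any positive cutoff.

```matlab
% Made-up symmetric matrix standing in for a correlation matrix.
R = [1.0 0.1 0.8 0.2 0.3;
     0.1 1.0 0.2 0.9 0.1;
     0.8 0.2 1.0 0.3 0.2;
     0.2 0.9 0.3 1.0 0.1;
     0.3 0.1 0.2 0.1 1.0];
rCutoff = 0.4;
features = [1 4 5];                        % example candidate subset
sub = abs(R(features, features));          % correlations within the subset
pairMask = logical(triu(ones(numel(features)), 1));  % strictly upper triangle: each pair once
isUncorrelatedSubset = ~any(sub(pairMask) >= rCutoff);
```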


More Answers (2)

Ive J on 11 Jul 2021
Edited: Ive J on 12 Jul 2021
Let R be the pairwise correlation matrix:
N = 10;
R = rand(N);
R(logical(eye(N))) = 1;
for i = 1:size(R, 1) - 1
    for j = i+1:size(R, 1)
        R(j, i) = R(i, j); % mirror the upper triangle so R is symmetric
    end
end
disp(R)
cutoff = 0.4; % pairs with |r| below this count as uncorrelated
idx = abs(R) < cutoff; % abs, so strong negative correlations are not treated as uncorrelated
idx = triu(idx, 1); % R(i, j) == R(j, i): keep each pair once and skip the diagonal
features = "feature" + (1:N); % feature names
% there may be a simpler way to do this
indepFeatures = [];
for i = 1:N
    indepFeatures = [indepFeatures, arrayfun(@(x)[x, features(i)], features(idx(i, :)), 'uni', false)];
end
indepFeatures = vertcat(indepFeatures{:});
% find all cliques of this set
nodes = zeros(size(indepFeatures, 1), 2);
[~, nodes(:, 1)] = ismember(indepFeatures(:, 1), features);
[~, nodes(:, 2)] = ismember(indepFeatures(:, 2), features);
G = graph(nodes(:, 1), nodes(:, 2));
M = maximalCliques(adjacency(G));
indepSets = cell(size(M, 2), 1);
for i = 1:numel(indepSets)
    indepSets{i} = features(M(:, i) ~= 0);
end
indepSets(cellfun(@numel, indepSets) < 2) = []; % this can be further unified with indepFeatures
You can find maximalCliques on the File Exchange (FEX).
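maximalCliques is a File Exchange submission, not a built-in MATLAB function. If that dependency is unwanted, the basic Bron-Kerbosch algorithm (without pivoting) lists the maximal cliques of the "uncorrelated" graph in a few lines. This is a swapped-in technique, not the FEX code: an explicit stack replaces recursion so the sketch runs as a plain script, R is made-up data, and Rc, P, X and cliques are illustrative names.

```matlab
% Made-up symmetric matrix standing in for a correlation matrix.
R = [1.0 0.1 0.8 0.2 0.3;
     0.1 1.0 0.2 0.9 0.1;
     0.8 0.2 1.0 0.3 0.2;
     0.2 0.9 0.3 1.0 0.1;
     0.3 0.1 0.2 0.1 1.0];
cutoff = 0.4;                 % assumed threshold
N = size(R, 1);
A = abs(R) < cutoff;          % adjacency: true for uncorrelated pairs
A(logical(eye(N))) = false;

cliques = {};
% Each stack entry is {Rc, P, X}: the current clique, the candidates that
% can still extend it, and vertices already explored (these rule out
% reporting a non-maximal clique).
stack = {{zeros(1, 0), 1:N, zeros(1, 0)}};
while ~isempty(stack)
    state = stack{end};
    stack(end) = [];
    [Rc, P, X] = state{:};
    if isempty(P) && isempty(X)
        % nothing can extend Rc and it was not found before: it is maximal
        cliques{end+1} = Rc; %#ok<AGROW>
        continue
    end
    for v = P                 % the loop range is fixed when the loop starts
        nbr = find(A(v, :));
        stack{end+1} = {[Rc, v], intersect(P, nbr), intersect(X, nbr)}; %#ok<AGROW>
        P = setdiff(P, v);    % cliques containing v are covered by the pushed state
        X = [X, v];           %#ok<AGROW>
    end
end
cliques = cliques(cellfun(@numel, cliques) >= 2);  % drop single features
```

For this R the four maximal cliques are [1 2 5], [1 4 5], [2 3 5] and [3 4 5]. Like the FEX approach, it returns only maximal sets, so smaller uncorrelated subsets inside them are not listed separately.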
12 Comments
Kais on 14 Jul 2021
Edited: Kais on 15 Jul 2021
@Ive J I tried your code with cutoff = .7. I get 32 pairs, 2 triples, 2 quadruples, and 5 quintuples. While the numbers of pairs and quintuples seem to be correct, the numbers of triples and quadruples are not. For example, the triple [6, 7, 9] is missing.
There should be a total of 112 uncorrelated subsets. Your code finds only 44. Any clue why that is?
Kais on 15 Jul 2021
Never mind the last question. I figured out that the algorithm finds maximal cliques, so it won't count subgraphs contained in larger subgraphs ([6, 7, 9] doesn't show up because it is part of the larger subset [4, 6, 7, 9]), which significantly reduces the number of subsets.
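Every subset (of size 2 or more) of a clique is itself a clique, so counting ALL uncorrelated subsets needs every clique rather than only the maximal ones. A minimal sketch of one way to do that, assuming a made-up R and alpha = 0.4: grow subsets depth-first in increasing feature order, and extend a subset only with features that are uncorrelated with all of its current members. A correlated pair then prunes every superset at once, instead of each of the 2^N combinations being tested. subsets, stack and cand are illustrative names.

```matlab
% Made-up symmetric matrix standing in for a correlation matrix.
R = [1.0 0.1 0.8 0.2 0.3;
     0.1 1.0 0.2 0.9 0.1;
     0.8 0.2 1.0 0.3 0.2;
     0.2 0.9 0.3 1.0 0.1;
     0.3 0.1 0.2 0.1 1.0];
alpha = 0.4;                 % assumed threshold
N = size(R, 1);
A = abs(R) < alpha;          % A(i,j) true when features i and j are uncorrelated
A(logical(eye(N))) = false;  % a feature is not paired with itself

subsets = {};
stack = num2cell(1:N);       % start from each single feature
while ~isempty(stack)
    s = stack{end};
    stack(end) = [];
    if numel(s) >= 2
        subsets{end+1} = s; %#ok<AGROW>
    end
    % candidates: features after s(end) that are uncorrelated with every member of s
    cand = find(all(A(s, :), 1));
    cand = cand(cand > s(end));
    for k = cand
        stack{end+1} = [s, k]; %#ok<AGROW>
    end
end
```

With this R it finds 12 subsets (8 pairs and 4 triples), whereas the maximal-clique approach reports only the 4 triples.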



Image Analyst on 11 Jul 2021
Would stepwise regression be of any help?
Otherwise, just make an N-by-N table of correlation coefficients by correlating every feature with every other feature.
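For the N-by-N table itself, corrcoef (base MATLAB) correlates every column of a data matrix with every other column. A short sketch, assuming X holds one observation per row and one feature per column; the numbers are made up so that feature 2 is an exact multiple of feature 1 and feature 3 is uncorrelated with both. The threshold and the name uncorrelated are illustrative.

```matlab
% Made-up data: 4 observations (rows) of 3 features (columns)
X = [1 2 3;
     2 4 1;
     3 6 4;
     4 8 2];
R = corrcoef(X);            % 3-by-3, R(i,j) = correlation of features i and j
alpha = 0.4;                % assumed threshold
uncorrelated = abs(R) < alpha & ~eye(size(R));  % true for uncorrelated feature pairs
```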

Categories

Find more on Dimensionality Reduction and Feature Extraction in Help Center and File Exchange

Release

R2021a
