finding all possible polynomial combinations of n variables

Question

Louis 2019년 10월 22일

0
링크

이 질문에 대한 바로 가기 링크

https://kr.mathworks.com/matlabcentral/answers/486867-finding-all-possible-polynomial-combinations-of-n-variables

편집: Chukwunomso Agunwamba 2022년 5월 15일

I am trying to create additional polynomial features (with polynomial degree up to p where p can be anywhere between 1 to 10) to my current dataset of n variables/features.

For example, I have three variables/features: x1, x2, and x3

If I want to expand the dataset to include polynomial degree up to 2,

then I would have: x1, x1^2, x2, x1x2, x2^2, x3, x1x3, x2x3, x3^2

Once I have all possible combination of exponents the variables can take, I can easily do element-wise exponentiation.

Solution I came up with for the toy example above is the following:

% 3 variables
% 2 deg max polynomial
[var1_exp, var2_exp, var3_exp] = ndgrid(0:2);
% all exponent combinations
exp_comb = [var1_exp(:), var2_exp(:), var3_exp(:)];
% logic to remove combinations that aren't valid
exp_comb = exp_comb(sum(exp_comb,2)<=2 & sum(exp_comb,2)>0, :);

The solution indeed matches what I had above:

>> exp_comb
exp_comb =
   0     0
   0     0
   1     0
   1     0
   2     0
   0     1
   0     1
   1     1
   0     2

However, I am not sure how to generalize this easily to large n cases.

Any help would be greatly appreciated.

UPDATES:

Following solutions have been posted so far with its own pros and cons

Solution 1 - works but doesn't scale well to both n and p in terms of memory (n=30 and p=2 requires 1534009.5GB memory)

n = 3; % Number of variables/features
p = 2; % Maximum polynomial degree
vars = cell(n,1);
[vars{1:n}] = ndgrid(0:p);
exp_comb = reshape(cat(n+1, vars{:}), [], n);
exp_comb = exp_comb(sum(exp_comb,2)<=p & sum(exp_comb,2)>0, :);

Solution 2 - works without demanding tremendous memory, but doesn't generalize to p other than 2

n = 3; % Number of variables/features
p = 2; % Maximum polynomial degree
combs=nchoosek(1:n,p);
m=size(combs,1);
S=sparse([1:m;1:m],combs.',1,m,n);
exp_comb = [S;speye(n);2*speye(n)];

댓글 수: 3
이전 댓글 1개 표시이전 댓글 1개 숨기기

Louis 2019년 10월 23일

I exponentiate the original feature set to generate expanded feature set, which will then be used train a machine learning model (Logistic regression model, for example).

In the example that I provided, if we assume that there were 10 observations, the original feature set would be of dimension 10x3 (m=10 observations of n=3 features each).

Using the exp_comb variable we computed, I will have expanded feature set of 10x9.

Maximum polynomial degree would be no more than 10, and the number of observations are in the order of thousands.

Chukwunomso Agunwamba 2022년 5월 15일

편집: Chukwunomso Agunwamba 2022년 5월 15일

More generally, let D = a vector listing out the maximum power of each variable. Then the output solution matrix will have a size of prod(D+1) by length(D). Concerning the memory issue, [I am refering to "(n=30 and p=2 requires 1534009.5GB memory)"], maybe, a solution algorithm can be restricted to produce a vector at a time. Then, submatrices, that are concatenations of those vectors, can be saved in file(s), or in a database, with the RAM kept free from being clogged up. If you output a column at a time, the column grows quickly with n after fixing p, because of prod(D+1). [In your special case, that column length is (p+1)^n, with D = p*ones(size(1,n))]. So, I think that it is better to output a row at a time, since, I believe you might not be using n>1000, for example. That is, the row length is simply the n specified, which is nicely much smaller than prod(D+1) as n or as p increases. Then, a selection logic can be applied on the data stored in the file(s)/database while saving, and/or while reading, without loading the entire matrix into RAM.

댓글을 달려면 로그인하십시오.

이 질문에 답변하려면 로그인하십시오.

Answer 1

Matt J 2019년 10월 22일

1
링크

이 답변에 대한 바로 가기 링크

https://kr.mathworks.com/matlabcentral/answers/486867-finding-all-possible-polynomial-combinations-of-n-variables#answer_397731

편집: Matt J 2019년 10월 22일

MATLAB Online에서 열기

This should be better:

n=30;
combs=nchoosek(1:n,2);
m=size(combs,1);
S=sparse([1:m;1:m],combs.',1,m,n);
exp_comb = [S;speye(n);2*speye(n)];

댓글 수: 5
이전 댓글 3개 표시이전 댓글 3개 숨기기

Bashar 2020년 2월 9일

편집: Bashar 2020년 2월 9일

Hi Louis and Matt,

I am considering the same problem of expanding my feature vector for classification and this solution seems ideal to my problem. Can you please advise on how to use the resultant exp_comb (from the method that utilises diophantine function) to expand the feature vector with a p order polyinomal expansion, e.g. X which is mxn (n is number of features per oberservation/data-sample; m = number of observations)?

If I use p = 2 for n =3, I get the following 12 entries (hence I am not sure how to use the resultant exp_comb to expand my feature set e.g. X is 10x3, to 10x9):

exp_comb 2

(1,1) 1

(4,1) 2

(5,1) 1

(7,1) 1

(2,2) 1

(5,2) 1

(6,2) 2

(8,2) 1

(3,3) 1

(7,3) 1

(8,3) 1

(9,3) 2

Louis 2020년 6월 30일

MATLAB Online에서 열기

Using your example

m = 10; 
p = 2;
n = 3;
exp_comb = exponent_combinations(n, p); % function to produce all exponent combinations as sparse array (as you wrote on your question)
orig_features = X;
exp_features = []; % we will store expanded features here
% for each observations, we use the original features and exponent combination we computed to create expanded feature set
for i = 1:m
    poly_features(i, :) = prod(orig_features(i, :).^exp_combs, 2);
end

댓글을 달려면 로그인하십시오.

Answer 2

Matt J 2019년 10월 22일

1
링크

이 답변에 대한 바로 가기 링크

https://kr.mathworks.com/matlabcentral/answers/486867-finding-all-possible-polynomial-combinations-of-n-variables#answer_397726

편집: Matt J 2019년 10월 22일

MATLAB Online에서 열기

[vars{1:n}] = ndgrid(0:2);
exp_comb = reshape(cat(n+1,vars{:})  ,[],n);
exp_comb = exp_comb(sum(exp_comb,2)<=2 & sum(exp_comb,2)>0, :);

댓글 수: 1
이전 댓글 -1개 표시이전 댓글 -1개 숨기기

Louis 2019년 10월 22일

Thank you. The solution worked, however, unfortunately our solution does not scale very well in terms of memory... It requests 1534008.5 GB memory for case of n=30 and maximum polynomial degree of p=2.

Perhaps I need to go about different ways of achieving the purpose.. Do you have any suggestions?

댓글을 달려면 로그인하십시오.

finding all possible polynomial combinations of n variables

댓글 수: 3
이전 댓글 1개 표시이전 댓글 1개 숨기기

채택된 답변

댓글 수: 5
이전 댓글 3개 표시이전 댓글 3개 숨기기

추가 답변 (1개)

댓글 수: 1
이전 댓글 -1개 표시이전 댓글 -1개 숨기기

참고 항목

카테고리

태그

제품

릴리스

Community Treasure Hunt

finding all possible polynomial combinations of n variables

댓글 수: 3 이전 댓글 1개 표시이전 댓글 1개 숨기기

채택된 답변

댓글 수: 5 이전 댓글 3개 표시이전 댓글 3개 숨기기

추가 답변 (1개)

댓글 수: 1 이전 댓글 -1개 표시이전 댓글 -1개 숨기기

참고 항목

카테고리

태그

제품

릴리스

Community Treasure Hunt

댓글 수: 3
이전 댓글 1개 표시이전 댓글 1개 숨기기

댓글 수: 5
이전 댓글 3개 표시이전 댓글 3개 숨기기

댓글 수: 1
이전 댓글 -1개 표시이전 댓글 -1개 숨기기