Principal Component Analysis / Singular Value Decomposition: great with the ovariancancer dataset, terrible with my data

Hello,
I am currently getting to grips with PCA and came across a great tutorial from Steve Brunton on its use with MATLAB. The tutorial uses the ovariancancer dataset included with MATLAB and works very well for separating the data; however, when I apply it to my own data the separation is nowhere near as clear. So my questions:
  1. Is there a description of what each of the 4000 features in the ovariancancer dataset is, and of any pre-processing done on them?
  2. For anyone maths-minded: what would cause this to work well for one dataset and not the other? I can see my rank is very high, but I cannot understand why.
What is my data? It is 13-channel PSG recordings, which I split into 10-second windows with 5-second overlaps. For each window and channel I then calculate the mean, median, mode, variance, standard deviation, interquartile range, range, kurtosis and skewness. This gives me 117 features (9*13). I will include the first 1000 rows of features and clinical truth, as the data is open source anyway. The code, which works well with ovariancancer, is included below:
% load ovariancancer   % works great with this feature set but poorly with others; I have renamed
% my uploaded variables to match this example
[U,S,V] = svd(obs,'econ');   % economy-size SVD of the raw feature matrix

% Singular values (log scale) and their cumulative share
figure
subplot(1,2,1)
semilogy(diag(S),'k-o','LineWidth',2.5)
set(gca,'FontSize',15), axis tight, grid on
subplot(1,2,2)
plot(cumsum(diag(S))./sum(diag(S)),'k-o','LineWidth',2.5)
set(gca,'FontSize',15), axis tight, grid on
set(gcf,'Position',[1440 100 3*600 3*250])

% Project each observation onto the first three right singular vectors
figure, hold on
for i = 1:size(obs,1)
    x = V(:,1)'*obs(i,:)';
    y = V(:,2)'*obs(i,:)';
    z = V(:,3)'*obs(i,:)';
    if (grp(i) == 1)
        plot3(x,y,z,'rx','LineWidth',1);   % group 1: red crosses
    else
        plot3(x,y,z,'bo','LineWidth',1);   % any other group: blue circles
    end
end
xlabel('PC1'), ylabel('PC2'), zlabel('PC3')
view(85,25), grid on, set(gca,'FontSize',15)
set(gcf,'Position',[1400 100 1200 900])
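One thing that may matter for the second question: the nine statistics per channel live on very different scales, and the code above passes the raw matrix to svd without centring or scaling, so the leading components can end up dominated by the largest-magnitude columns. Below is a minimal sketch of centring and scaling each feature before the SVD. The matrix obs here is a synthetic stand-in invented for illustration (it is not the PSG data), and explained and scores are hypothetical variable names.

```matlab
% Synthetic stand-in (assumption): 200 observations x 6 features whose
% magnitudes differ by several orders, mimicking mixed-scale statistics.
t   = (1:200)';
obs = [sin(t), 10*cos(2*t), 100*sin(3*t), 1e3*cos(5*t), 1e4*sin(7*t), 1e5*cos(11*t)];

% Centre each column and scale it to unit variance (z-score per feature).
mu    = mean(obs, 1);
sigma = std(obs, 0, 1);
Z     = (obs - mu) ./ sigma;          % implicit expansion, R2016b or later

% Economy-size SVD of the standardised matrix.
[U, S, V] = svd(Z, 'econ');

% Share of total variance per component, taken from squared singular values.
s2        = diag(S).^2;
explained = s2 / sum(s2);

% Project every observation onto the first three components in one step,
% replacing the per-row loop.
scores = Z * V(:, 1:3);
```

The three columns of scores can then be plotted against the labels, for example with scatter3(scores(:,1), scores(:,2), scores(:,3), 15, grp). Whether this improves separation on a given dataset is not guaranteed; it only removes the scale effect.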
  1 Comment
Christopher McCausland on 17 Dec 2021
For anyone who comes across this: I believe my problem was high variance between the features. I addressed this by using the normalize function; however, this leads to U, S and V being returned as arrays of NaN values. I will continue to update this if I make any progress, but I am not sure why normalisation causes this NaN output.
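A possible explanation, offered as a guess rather than a diagnosis: z-score normalisation divides each column by its standard deviation, so a column that is constant becomes 0/0 = NaN, and a single NaN anywhere in the matrix is enough to spoil the SVD. Skewness and kurtosis of a perfectly flat window are also NaN for the same reason. The sketch below uses a tiny made-up matrix (not the real features) to show one way of checking for and dropping such rows and columns before standardising; badRows and constCols are hypothetical names.

```matlab
% Made-up feature matrix (assumption): column 2 is constant, row 2 holds a NaN.
obs = [1 5 0.2;
       2 5 NaN;
       3 5 0.4;
       4 5 0.9];

% Drop observations that contain NaN or Inf.
badRows = any(~isfinite(obs), 2);
X       = obs(~badRows, :);

% Drop features with (numerically) zero spread; dividing by their
% standard deviation would give 0/0 = NaN.
constCols = std(X, 0, 1) <= eps * max(abs(X), [], 1);
X         = X(:, ~constCols);

% Z-score the remaining columns; every standard deviation is now non-zero.
Z = (X - mean(X, 1)) ./ std(X, 0, 1);

[U, S, V] = svd(Z, 'econ');
```

Running nnz(badRows) and find(constCols) on the real feature matrix would show whether either case is actually present there.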


Answers (0)

Release

R2020b
