Remove duplicate variables depending on a second variable
Dear experts, I have a list of variables from which I need to remove duplicates. However, when a variable is duplicated, I want to keep the entry that has the value 1 in the second column. When several duplicates have a 1, only one of them should be kept, chosen at random. See the example below: here I want to keep the BG1028 entry whose third column is 1.3. For BG1030, I want to keep the entry with 3.0 or 1.3 in the third column. I hope this is clear. I'm puzzling over how to do this. This is the code I have come up with so far.
ppn(:,1) = {'BG1026';'BG1027';'BG1028';'BG1028';'BG1028';'BG1029';'BG1030';'BG1030';'BG1030';'BG1030'};
ppn(:,2) = {'0';'0';'1';'0';'0';'1';'1';'0';'1';'0'};
ppn(:,3) = {'1.2';'2.2';'1.3';'0.2';'8.9';'3.4';'3.0';'0.3';'1.3';'0.3'};
% find duplicates
ppn2 = ppn(:,1);
idx = find(strcmp(ppn2(1:end-1),ppn2(2:end)))+1;
%remove duplicates
ppn((idx),:) = [];
Accepted Answer
Kirby Fears
21 Sep 2015
Hi Marty,
Try the code below.
% Defining ppn (all at once)
ppn = [ {'BG1026';'BG1027';'BG1028';'BG1028';'BG1028';'BG1029';...
         'BG1030';'BG1030';'BG1030';'BG1030'},... % start col 2
        {'0';'0';'1';'0';'0';'1';'1';'0';'1';'0'},... % start col 3
        {'1.2';'2.2';'1.3';'0.2';'8.9';'3.4';'3.0';'0.3';'1.3';'0.3'}];
% Storing ppn column 2 as numerical values
bPpn = str2double(ppn(:,2)); % str2double accepts a cell array of strings directly
% Deleting all duplicates with 0 in bPpn
idx = strcmp(ppn(1:end-1,1),ppn(2:end,1));
delidx = ([idx;false] | [false;idx]) & ~bPpn;
ppn(delidx,:)=[];
clear bPpn idx delidx;
% Get names of remaining duplicates
chooseNames = ppn([strcmp(ppn(1:end-1,1),ppn(2:end,1));false],1);
% Loop over chooseNames and keep one at random
for j = 1:numel(chooseNames)
    dupidx = find(strcmp(chooseNames{j}, ppn(:,1)));
    dupidx(randi(numel(dupidx))) = []; % spare one randomly chosen row
    ppn(dupidx,:) = [];                % delete the remaining duplicates
end
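Note that the code above finds duplicates by comparing each row with the row directly below it, so it relies on identical names sitting in adjacent rows, as they do in the example. If the list might be unsorted, one option is to sort by name first. The following is only a minimal sketch; the three-row ppn is hypothetical data made up for illustration, and isDuplicated is an illustrative variable name.

```matlab
% Hypothetical unsorted data (not from the question), same three-column layout
ppn = [ {'BG1028';'BG1026';'BG1028'}, {'0';'0';'1'}, {'0.2';'1.2';'1.3'} ];

% Sort the rows by name so that identical names end up next to each other
[~, order] = sort(ppn(:,1));
ppn = ppn(order, :);

% The neighbour comparison now flags every row whose name is repeated
idx = strcmp(ppn(1:end-1,1), ppn(2:end,1));
isDuplicated = [idx; false] | [false; idx];
disp(ppn(isDuplicated, :))
```

After sorting, the rest of the steps can be applied unchanged. Sorting changes the row order, so keep the order vector if the original order matters.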
Hope this helps.
More Answers (1)
the cyclist
21 Sep 2015
This is not the world's most efficient code, but is a very straightforward implementation of what you want (or at least my understanding of it). It displays the indices you want to keep.
It's not documented at all, but I tried to use some intuitive variable names, so maybe you can figure it out.
ppn(:,1) = {'BG1026';'BG1027';'BG1028';'BG1028';'BG1028';'BG1029';'BG1030';'BG1030';'BG1030';'BG1030'};
ppn(:,2) = {'0';'0';'1';'0';'0';'1';'1';'0';'1';'0'};
ppn(:,3) = {'1.2';'2.2';'1.3';'0.2';'8.9';'3.4';'3.0';'0.3';'1.3';'0.3'};
[unique_ppn,~,indexFromUniqueBackToAll] = unique(ppn(:,1));
number_unique_ppn = numel(unique_ppn);
indices_to_keep = [];
for np = 1:number_unique_ppn
    index_to_this_ppn = find((indexFromUniqueBackToAll==np));
    if numel(index_to_this_ppn) == 1
        indices_to_keep = [indices_to_keep; index_to_this_ppn];
    else
        remove_zero_index = ismember(ppn(index_to_this_ppn,2),'0');
        index_to_this_ppn(remove_zero_index) = [];
        random_one_to_keep = index_to_this_ppn(randi(numel(index_to_this_ppn)));
        indices_to_keep = [indices_to_keep; random_one_to_keep];
    end
end
indices_to_keep
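To get the filtered list rather than just the indices, the kept rows can be pulled out of ppn. The sketch below is a compact variant of the same idea, not a replacement for the code above. It also covers a case the question does not specify: a duplicated name with no '1' at all, where the loop above would leave nothing to choose from and randi would fail. Falling back to any row of that name is purely an assumption made here, and flag, group, keep, candidates and ppn_kept are illustrative names.

```matlab
% Example data from the question
ppn = [ {'BG1026';'BG1027';'BG1028';'BG1028';'BG1028';'BG1029';'BG1030';'BG1030';'BG1030';'BG1030'}, ...
        {'0';'0';'1';'0';'0';'1';'1';'0';'1';'0'}, ...
        {'1.2';'2.2';'1.3';'0.2';'8.9';'3.4';'3.0';'0.3';'1.3';'0.3'} ];

flag = strcmp(ppn(:,2), '1');          % true where column 2 is '1'
[names, ~, group] = unique(ppn(:,1));  % group(k) is the position of row k's name in names
keep = zeros(numel(names), 1);
for g = 1:numel(names)
    rows = find(group == g);           % all rows carrying this name
    candidates = rows(flag(rows));     % prefer rows flagged with '1'
    if isempty(candidates)
        candidates = rows;             % ASSUMPTION: no '1' present, so any row may be kept
    end
    keep(g) = candidates(randi(numel(candidates)));  % pick one at random
end
ppn_kept = ppn(sort(keep), :);         % sort(keep) preserves the original row order
disp(ppn_kept)
```

For the example data this keeps one row per name: BG1028 with 1.3, and for BG1030 either the 3.0 or the 1.3 row, depending on the random draw.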