Removing redundant rows where not every row has the same number of elements

Hello,
I have data that often looks like this:
"HIST1H2BC" "K13"
"HIST1H2BC" "K13;K16"
"HIST1H2BC" "K16"
"HIST1H2BH" "K13"
"HIST1H2BH" "K13;K16"
"HIST1H2BH" "K16"
"HIST1H2BO" "K13;K16"
"HIST1H2BO" "K16"
"HIST2H2BE" "K13;K16"
"HIST2H2BE" "K16"
I have been trying to write a function that splits the second column at the ';' and then removes any row whose elements are all contained in another row, which would hopefully yield something like this:
"HIST1H2BC" "K13" "K16"
"HIST1H2BH" "K13" "K16"
"HIST1H2BO" "K13" "K16"
"HIST2H2BE" "K13" "K16"
All of the solutions I have tried have been overly complicated and difficult to wrap my head around.
Thank you in advance!

Answers (2)

Chunru, 30 Jul 2021
Edited: Chunru, 30 Jul 2021
x = ["HIST1H2BC" "K13"
"HIST1H2BC" "K13;K16"
"HIST1H2BC" "K16"
"HIST1H2BH" "K13"
"HIST1H2BH" "K13;K16"
"HIST1H2BH" "K16"
"HIST1H2BO" "K13;K16"
"HIST1H2BO" "K16"
"HIST2H2BE" "K13;K16"
"HIST2H2BE" "K16"
"1H1D" "K137"
"1H1D" "K137|K138"
"1H1D" "K138"
"1H1D" "K136"
"1H1E" "K136|K137"
"1H1E" "K137"];
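% Split the first row's sites on either delimiter and seed the result.
s = split(x(1, 2), {';', '|'});
y = {x(1, 1), s'};
for i = 2:size(x, 1)
    s = split(x(i, 2), {';', '|'});
    [lix, locy] = ismember(x(i, 1), [y{:, 1}]);
    if ~lix
        % New name: append a row to the result.
        y = [y; {x(i, 1), s'}];
    else
        % Known name: append only the sites that are not listed yet.
        lis = ismember(s, y{locy, 2});
        y{locy, 2} = [y{locy, 2} s(~lis)'];
    end
end
y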
y = 6×2 cell array
    {["HIST1H2BC"]}    {["K13" "K16"]}
    {["HIST1H2BH"]}    {["K13" "K16"]}
    {["HIST1H2BO"]}    {["K13" "K16"]}
    {["HIST2H2BE"]}    {["K13" "K16"]}
    {["1H1D"]}         {["K137" "K138" "K136"]}
    {["1H1E"]}         {["K136" "K137"]}
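The same per-name union can also be written without growing a cell array inside the loop. The following is only a sketch under a few assumptions: the sample rows are a hypothetical subset of the data above, and the variable names `names`, `grp`, and `sites` are illustrative. It groups rows with `unique`, joins each group's site strings, and splits once on either delimiter.

```matlab
% Hypothetical sample data (a subset of the rows above).
x = ["HIST1H2BC" "K13"
     "HIST1H2BC" "K13;K16"
     "HIST1H2BO" "K13;K16"
     "HIST1H2BO" "K16"
     "1H1D"      "K137|K138"
     "1H1D"      "K136"];

% names: distinct names in order of first appearance.
% grp:   for every row of x, the index of its name in names.
[names, ~, grp] = unique(x(:, 1), 'stable');

sites = cell(numel(names), 1);   % preallocate one cell per name
for k = 1:numel(names)
    % Join this group's site strings into one string, then split it
    % once on either delimiter and drop repeated sites.
    joined = join(x(grp == k, 2), ";", 1);
    sites{k} = unique(split(joined, [";" "|"]), 'stable')';
end
```

Here `sites{k}` holds the combined sites for `names(k)` as a row of strings, in order of first appearance.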

KSSV, 30 Jul 2021
str = ["HIST1H2BC" "K13"
"HIST1H2BC" "K13;K16"
"HIST1H2BC" "K16"
"HIST1H2BH" "K13"
"HIST1H2BH" "K13;K16"
"HIST1H2BH" "K16"
"HIST1H2BO" "K13;K16"
"HIST1H2BO" "K16"
"HIST2H2BE" "K13;K16"
"HIST2H2BE" "K16"];
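iwant = strings(0,3) ;   % empty 0-by-3 string array to collect results
count = 0 ;
for i = 1:size(str,1)
    s = strsplit(str(i,2),';') ;
    % Keep only the rows whose second column splits into exactly two parts.
    if length(s) == 2
        count = count+1 ;
        iwant(count,:) = [str(i,1) s] ;
    end
end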
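The loop above keeps only rows that split into exactly two parts, so it would drop a name whose only row has a single site. A sketch of the rule stated in the question (remove a row when every one of its elements is contained in another row with the same name) could look like the following. The sample data, including the extra `"K17"` row, and the names `sets`, `keep`, and `reduced` are assumptions made for illustration.

```matlab
str = ["HIST1H2BC" "K13"
       "HIST1H2BC" "K13;K16"
       "HIST1H2BC" "K16"
       "HIST1H2BO" "K13;K16"
       "HIST1H2BO" "K16"
       "HIST2H2BE" "K17"];   % hypothetical name with a single-site row only

n = size(str, 1);
% One row vector of sites per input row.
sets = arrayfun(@(s) split(s, ";")', str(:, 2), 'UniformOutput', false);

keep = true(n, 1);
for i = 1:n
    for j = 1:n
        % Compare only against other, still-kept rows with the same name.
        if i == j || ~keep(j) || str(i, 1) ~= str(j, 1)
            continue
        end
        % Drop row i when all of its sites also appear in row j.
        if all(ismember(sets{i}, sets{j}))
            keep(i) = false;
            break
        end
    end
end
reduced = str(keep, :);
```

Skipping rows that were already dropped means that, of two rows with identical sites, the later one survives instead of both being removed. The surviving rows can then be split on ';' to obtain one column per site.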
Comments: 8
Dillon Heidenreich, 30 Jul 2021
Edited: Dillon Heidenreich, 30 Jul 2021
I've attached a small version of the Excel file, trimmed down to only the relevant information, but unfortunately it is hard to find the infrequent combinations of data that give me trouble. The base file has around 12 thousand rows and around 60 columns, but only 9 or so of the columns are relevant to my current work. The small version is titled HistExample, whereas the larger version, though still not the complete file, is titled HistTrimmed. My process starts by concatenating columns A, B, and C to each other top to bottom, repeating the same for the other two groups of columns, and then concatenating the results horizontally, effectively yielding
Seq A Common A XL A
Seq B Common B XL B
Seq C Common C XL C
which I name wholePPR.
I then take the first column of wholePPR and use the unique function to find its unique elements. I then run the code below:
for x = 1:size(uniquePep,1)
    tempPPR = wholePPR(uniquePep(x) == wholePPR(:,1),:);
    cellFam(x,1) = {tempPPR(:,2)};
    cellFam(x,2) = {tempPPR(:,3)};
end
where uniquePep is the result of running the unique function of the first column. I then run this code:
for x = 1:size(cellFam,1)
    tempFams = [cellFam{x,1},cellFam{x,2}];
    uFams(x) = {unique(tempFams,'Rows')};
end
to get the data I showed at the beginning of this question (uFams).
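For readers without the attached files, the pipeline described in this comment can be sketched on made-up data. Everything below is an assumption for illustration: the peptide labels, the nine column variables (`seqA` ... `xlC`), and their values are hypothetical stand-ins for the spreadsheet columns. The two loops are merged into one by reusing the third output of `unique`.

```matlab
% Hypothetical stand-ins for the nine relevant spreadsheet columns.
seqA = ["PEP1"; "PEP2"];  commonA = ["HIST1H2BC"; "HIST1H2BH"];  xlA = ["K13"; "K16"];
seqB = ["PEP1"; "PEP3"];  commonB = ["HIST1H2BC"; "HIST1H2BO"];  xlB = ["K13"; "K13;K16"];
seqC = ["PEP2"; "PEP1"];  commonC = ["HIST1H2BH"; "HIST1H2BC"];  xlC = ["K13;K16"; "K16"];

% Stack each group top to bottom, then place the groups side by side.
wholePPR = [[seqA; seqB; seqC], [commonA; commonB; commonC], [xlA; xlB; xlC]];

% idx maps every row of wholePPR to its entry in uniquePep.
[uniquePep, ~, idx] = unique(wholePPR(:, 1));

uFams = cell(numel(uniquePep), 1);   % preallocate
for k = 1:numel(uniquePep)
    % Unique (Common, XL) pairs observed for this peptide.
    uFams{k} = unique(wholePPR(idx == k, 2:3), 'rows');
end
```

Each `uFams{k}` is then a two-column string array of the kind shown at the top of the question, with exact duplicate rows already removed.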



Release

R2020b
