Deleting certain rows with partially duplicate values

If two rows have matching values in column A and in column B, I need to delete the row with the value RED. For example:
A B C D
1 March-1st 2 RED
1 March-1st 3 GREEN
3 March-5th 8 GREEN
6 March-8th 4 GREEN
2 April-3rd 5 RED
2 April-3rd 0 GREEN
Here I would need to delete row 1 and 5 beacuse they have the value RED in column D while they share values in column A and B with another row.

답변 (1개)

Voss
Voss 2024년 3월 12일
I'll assume that's a table and the text entries are strings.
T = table( ...
[1;1;3;6;2;2], ...
["March-1st";"March-1st";"March-5th";"March-8th";"April-3rd";"April-3rd"], ...
[2;3;8;4;5;0], ...
["RED";"GREEN";"GREEN";"GREEN";"RED";"GREEN"], ...
'VariableNames',["A","B","C","D"])
T = 6×4 table
A B C D _ ___________ _ _______ 1 "March-1st" 2 "RED" 1 "March-1st" 3 "GREEN" 3 "March-5th" 8 "GREEN" 6 "March-8th" 4 "GREEN" 2 "April-3rd" 5 "RED" 2 "April-3rd" 0 "GREEN"
[G,GID] = findgroups(T.A,T.B);
is_red = strcmp(T.D,"RED");
delete_row = false(size(T,1),1);
for ii = 1:numel(GID)
idx = G == GID(ii);
if nnz(idx) == 1
continue
end
delete_row(idx & is_red) = true;
end
T(delete_row,:) = []
T = 4×4 table
A B C D _ ___________ _ _______ 1 "March-1st" 3 "GREEN" 3 "March-5th" 8 "GREEN" 6 "March-8th" 4 "GREEN" 2 "April-3rd" 0 "GREEN"

댓글 수: 7

Thank you! Im getting the error message "Operator '==' is not supported for operands of type 'cell'" in idx = G == GID(ii);
You're welcome!
Please upload your table (save it to a mat file and upload the mat file using the paperclip button), and I'll take a look.
One way you'd get that error is if the date column is a cell array (as opposed to a string array like I've used) and is the first argument passed to findgroups.
T = table( ...
[1;1;3;6;2;2], ...
cellstr(["March-1st";"March-1st";"March-5th";"March-8th";"April-3rd";"April-3rd"]), ...
[2;3;8;4;5;0], ...
["RED";"GREEN";"GREEN";"GREEN";"RED";"GREEN"], ...
'VariableNames',["A","B","C","D"])
T = 6×4 table
A B C D _ _____________ _ _______ 1 {'March-1st'} 2 "RED" 1 {'March-1st'} 3 "GREEN" 3 {'March-5th'} 8 "GREEN" 6 {'March-8th'} 4 "GREEN" 2 {'April-3rd'} 5 "RED" 2 {'April-3rd'} 0 "GREEN"
[G,GID] = findgroups(T.B,T.A); % B (cell array) first gives the error
is_red = strcmp(T.D,"RED");
delete_row = false(size(T,1),1);
for ii = 1:numel(GID)
idx = G == GID(ii);
if nnz(idx) == 1
continue
end
delete_row(idx & is_red) = true;
end
Operator '==' is not supported for operands of type 'cell'.
T(delete_row,:) = []
In that case, make the numeric column be the first one passed to findgroups:
[G,GID] = findgroups(T.A,T.B); % numeric first
I ended up fixing it by making it idx = G == GID{ii} instead of idx = G == GID[ii]. But now I get the error:
Error using tabular/parenDelete
Row index exceeds table dimensions.
Error in Parser
T(delete_row,:) = []
Please upload your table (save it to a mat file and upload the mat file using the paperclip button), and I'll take a look.
Charles
Charles 2024년 3월 13일
편집: Charles 2024년 3월 13일
I can not upload my table, but it is read from Excel, and then I used the unique function to delete all exact duplicate rows. TUnique is >800 rows.
T = readtable('Table.xlsx');
TUnique = unique(T);
You should be deleting rows from TUnique instead of T.
Can you upload the xlsx file? Without the data, I can only guess.

댓글을 달려면 로그인하십시오.

카테고리

도움말 센터File Exchange에서 Data Type Identification에 대해 자세히 알아보기

제품

릴리스

R2022b

질문:

2024년 3월 12일

댓글:

2024년 3월 13일

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by