Hello,
I have a .mat file like the following:
Name    adress  company
BOB     london  BIM
Alfred  Paris   BOB
John    BOB     CEF
I would like to keep only the duplicated entries in order to create this new .mat file:
Name    adress  company
BOB
                BOB
        BOB
Does anyone have an idea of how to write suitable code?
Thanks in advance.

Comments: 5

% Data
varnames = ["Name" "adress" "company"];
data = ["BOB" "london" "BIM"
"Alfred" "Paris" "BOB"
"John" "BOB" "CEF"];
T = array2table(data, 'VariableNames', varnames);
% Find the duplicate one
Tc = categorical(T{:,:}); % convert from string to categorical
Tcv = Tc(:); % make it into a long vector
dup = mode(Tcv); % duplicated entries (using mode)
% The location of duplicate one
idx = Tc == dup;
% Generate output
Tout = strings(size(data));
Tout(idx) = dup;
Tout = array2table(Tout, 'VariableNames', varnames)
Tout = 3×3 table
    Name     adress    company
    _____    ______    _______
    "BOB"    ""        ""
    ""       ""        "BOB"
    ""       "BOB"     ""
Ali on 11 Dec 2021
Thanks a lot.
I thought I could get my solution with this kind of code:
uc1 = unique( Data(:,:));
but it apparently erases the duplicates only within a column, not across a row.
Have a nice day.
Ali on 11 Dec 2021
Sorry for the late answer.
Finding the duplicated entries doesn't work:
dup = mode(Tcv); % duplicated entries (using mode)
The same variables are placed at random positions in the table,
and I need to find where they are...
Image Analyst on 12 Dec 2021
You accepted Walter's answer, so we assume everything is working perfectly now.
Ali on 12 Dec 2021
I accepted Walter's code,
but chunru's code is closer to what I was looking for.
Walter's code is more complicated to adapt for the moment,
so I focused on chunru's code and saw that finding the duplicated entries didn't work:
dup = mode(Tcv); % duplicated entries (using mode)
This function doesn't seem to work.
The same names (not variables) are placed at random positions in the table.
I should keep the names that are the same and that appear in more than one variable.
Thanks for your understanding.
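The requirement stated above (keep every name that appears in more than one variable) differs from what mode returns, since mode gives only the single most frequent value. A minimal sketch of one way to do it on the example data follows. It is not chunru's or Walter's code; the helper names label, present and dups are made up for illustration, and the double-quoted string literals assume R2017a or later (the thread lists R2016b, where string('BOB') would be needed instead).

```matlab
% Example data from the question
varnames = ["Name" "adress" "company"];
data = ["BOB"    "london" "BIM"
        "Alfred" "Paris"  "BOB"
        "John"   "BOB"    "CEF"];

% Map every entry to an integer label; vals holds the distinct entries
[vals, ~, label] = unique(data(:));
label = reshape(label, size(data));

% present(k, j) is true when vals(k) occurs somewhere in column j
ncol = size(data, 2);
present = false(numel(vals), ncol);
for j = 1:ncol
    present(label(:, j), j) = true;
end

% Keep the values that occur in more than one column (variable)
dups = vals(sum(present, 2) > 1);

% Copy only those entries into an otherwise blank string array
idx = ismember(data, dups);
Tout = strings(size(data));
Tout(idx) = data(idx);
Tout = array2table(Tout, 'VariableNames', cellstr(varnames))
```

Because the count is taken per column, a name repeated several times inside a single variable is not treated as a duplicate; only names shared between variables are kept.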


Accepted Answer

Walter Roberson on 11 Dec 2021

% Values that occur in all three variables
common_values = intersect(intersect(Name, adress), company);
N = length(Name);
% For each variable, keep the common values and leave the rest as ""
NName = strings(N, 1);
mask = ismember(Name, common_values);
NName(mask) = Name(mask);
Nadress = strings(N, 1);
mask = ismember(adress, common_values);
Nadress(mask) = adress(mask);
Ncompany = strings(N, 1);
mask = ismember(company, common_values);
Ncompany(mask) = company(mask);
% Assemble the result (NName, not Nname: MATLAB is case-sensitive)
output = table(NName, Nadress, Ncompany, 'VariableNames', {'Name', 'adress', 'company'});

Comments: 8

Ali on 11 Dec 2021
Thank you!
Ali on 11 Dec 2021
It's perfect for a few variables, but it's very complicated for 50 variables...
Walter Roberson on 11 Dec 2021
Do you have a table() object? The code can be automated for a table object.
Question: what should be done with a name that appears in most of the variables but not all of them? Should you keep only names that appear in all variables? Should you keep names that appear in more than one variable?
Ali on 12 Dec 2021
Edited: Walter Roberson on 12 Dec 2021
>> I should keep names that are the same and appear in more than one variable.
>> I don't know whether it's a table object, unfortunately. It's a table that is read like this:
A = readtable('DataFilter.xlsx','TextType','string');
and I extract the variable names like this:
VarNames = A.Properties.VariableNames;
When I look at the properties:
A.Properties
ans =
  struct with fields:
             Description: ''
                UserData: []
          DimensionNames: {'Row' 'Variables'}
           VariableNames: {1×46 cell}
    VariableDescriptions: {}
           VariableUnits: {}
                RowNames: {}
I hope it helps.
Thank you for your time!
Untested as I do not have a sample of your data.
A = readtable('DataFilter.xlsx','TextType','string');
names_by_var = varfun(@unique, A, 'OutputFormat', 'cell');
[G, ID] = findgroups({names_by_var{:}});
counts = accumarray(G, 1);
names_with_dups = ID(counts>1);
is_dup = varfun(@(V) ismember(V, names_with_dups), A, 'OutputFormat', 'uniform');
Aarray = table2array(A);
B = strings(size(Aarray));
B(is_dup) = Aarray(is_dup);
Ali on 12 Dec 2021
I get an error when I use it:
Error using findgroups (line 77)
A grouping variable must be a categorical, numeric, logical, datetime,
or duration vector, or a cell array of character vectors.
Error in [G, ID] = findgroups({names_by_var{:}});
The code should work with a table like the one below:
Name    adress  company
BOB     london  BIM
Alfred  Paris   BOB
John    BOB     CEF
and it should return:
Name    adress  company
BOB
                BOB
        BOB
A = readtable('DataFilter.xlsx','TextType','string');
names_by_var = varfun(@unique, A, 'OutputFormat', 'cell');
[G, ID] = findgroups(categorical({names_by_var{:}}));
counts = accumarray(G, 1);
names_with_dups = ID(counts>1);
is_dup = varfun(@(V) ismember(categorical(V), names_with_dups), A, 'OutputFormat', 'uniform');
Aarray = table2array(A);
B = strings(size(Aarray));
B(is_dup) = Aarray(is_dup);
Ali on 12 Dec 2021
Error using categorical
Could not find unique values in DATA using the UNIQUE function.
Error in [G, ID] = findgroups(categorical({names_by_var{:}}));
Caused by:
Error using cell/unique
Cell array input must be a cell array of character vectors.


Release: R2016b

Asked: Ali on 11 Dec 2021

Commented: Ali on 12 Dec 2021
