Renaming categories with accents
I have a categorical array t in which some category names contain diacritics (accents), such as circumflexes. I want to standardize all the names so that none of them contains diacritics.
I tried this code:
str = {'Á', 'É', 'Í', 'Ó', 'Ú','Ã','Ç','Â','Ê','Ô'};
strreplace = {'A', 'E', 'I', 'O', 'U','A','C','A','E','O'};
t = categorical({'VÉRDE','VERDE','AZUL','AMARELO','VERMELHO','VERMÊLHO'})';
cat = categories(t);
newcat = cat;
for i = 1:numel(str)
newcat = regexprep(newcat, str{i}, strreplace{i});
end
B = renamecats(t,cat,newcat)
However, after removing the accents, some categories turn out to be the same, for example VERMELHO and VERMÊLHO.
So I receive the following error:
Error using categorical/renamecats (line 39)
NEWNAMES contains duplicated values.
Is there any way around this?
This is just an example. I need very efficient code, since my categorical array t comes from a very long table with approximately 500 categories.
Thanks,
Comments
Jan
30 Nov 2018
500 does not sound like big data.
You did not mention what you want to happen if the category names become equal, so it is hard to suggest a solution.
By the way, strrep is much faster than regexprep.
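As a rough sketch of the strrep suggestion, the loop from the question could be written as below. The variable names come from the question, except oldcat, which is introduced here to avoid shadowing the built-in cat function. No timing is claimed; this only shows that strrep performs the same literal replacements on a cell array of names.

```matlab
str = {'Á', 'É', 'Í', 'Ó', 'Ú','Ã','Ç','Â','Ê','Ô'};
strreplace = {'A', 'E', 'I', 'O', 'U','A','C','A','E','O'};
t = categorical({'VÉRDE','VERDE','AZUL','AMARELO','VERMELHO','VERMÊLHO'})';
oldcat = categories(t);   % hypothetical name, used instead of cat
newcat = oldcat;
for i = 1:numel(str)
    % strrep does a literal substring replacement in every cell of newcat
    newcat = strrep(newcat, str{i}, strreplace{i});
end
% newcat now holds duplicate names, which is why renamecats rejects it
numel(unique(newcat))   % 4 distinct names remain out of 6
```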
Accepted Answer
Guillaume
30 Nov 2018
str = {'Á', 'É', 'Í', 'Ó', 'Ú','Ã','Ç','Â','Ê','Ô'};
strreplace = {'A', 'E', 'I', 'O', 'U','A','C','A','E','O'};
t = categorical({'VÉRDE','VERDE','AZUL','AMARELO','VERMELHO','VERMÊLHO'})';
cat = categories(t);
% Compute the new category names in a single call; no loop is needed
newcat = replace(cat, str, strreplace);
% Build a new categorical array from newcat, using the category index of each element of t.
% Names that become identical after the replacement end up in the same category.
newt = categorical(newcat(double(t)))
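To see that this sidesteps the renamecats error, the result can be inspected with categories and countcats. The following is a minimal sketch that repeats the steps above on the example data. The name oldcat is introduced here for illustration only, in place of cat, so that the built-in cat function is not shadowed. It assumes that t contains no undefined elements and that replace is available (R2016b or later).

```matlab
str = {'Á', 'É', 'Í', 'Ó', 'Ú','Ã','Ç','Â','Ê','Ô'};
strreplace = {'A', 'E', 'I', 'O', 'U','A','C','A','E','O'};
t = categorical({'VÉRDE','VERDE','AZUL','AMARELO','VERMELHO','VERMÊLHO'})';
oldcat = categories(t);                      % hypothetical name, used instead of cat
newcat = replace(oldcat, str, strreplace);   % pairwise replacement of str by strreplace
% double(t) gives each element's index into oldcat, so this looks up
% the cleaned name of every element (assumes no <undefined> in t)
newt = categorical(newcat(double(t)));
categories(newt)   % AMARELO, AZUL, VERDE, VERMELHO
countcats(newt)    % [1; 1; 2; 2]: the accented variants were merged
```

Because the new array is built from the cleaned names of the elements rather than by renaming categories in place, duplicate names are not a problem: elements that share a name simply fall into one category.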