
unique is giving the same expression twice

Wesso on 29 Jan 2021
Edited: dpb on 29 Jan 2021
Hi,
(data is attached)
[Country,~,ix] = unique(A);
tally = accumarray(ix, 1);
Q2= table(Country, tally);
Q2 contains the same expression twice among the unique values: 'Audit and assurance, and tax services'. What could be the reason, and how can I overcome it? Is it a bug?
4 Comments
Steven Lord on 29 Jan 2021
They may look the same, but can you prove they're stored the same? Store two of the expressions that look identical in separate variables x and y then run the following code and show us the results.
disp(x)
disp(y)
isequal(x, y)
whos x y
x==y % only if x and y are the same size
dpb on 29 Jan 2021
Edited: dpb on 29 Jan 2021
This is undoubtedly the same issue I pointed out before at https://www.mathworks.com/matlabcentral/answers/730643-replacing-999-in-a-table-to-nan-regardless-of-the-type-of-the-column?s_tid=srchtitle#comment_1294958 where the encoding is different. Thus the strings visually appear the same, but one contains a non-ASCII character and the other doesn't.
Here are the specifics to show what was there for that particular set of values I looked at; undoubtedly you'll find the same thing here if you look carefully...
>> sort(categories(Final.org04b))
ans =
46×1 cell array
{'-999' }
{'-9999' }
...
{'I don't know' }
{'I don’t know' }
...
>> tmp=ans(42:43)
tmp =
2×1 cell array
{'I don't know'}
{'I don’t know'}
>> strcmp(tmp(1),tmp(2))
ans =
logical
0
>> [double(tmp{1});double(tmp{2})]
ans =
73 32 100 111 110 39 116 32 107 110 111 119
73 32 100 111 110 8217 116 32 107 110 111 119
>>
NB: the second string contains the extended character 8217 (U+2019, the right single quotation mark) where the first has ASCII 39, the ordinary single quote.
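The same check can be run over a whole list of category names rather than one pair at a time. The following is only a minimal sketch: the cell array cats is made-up sample data, and the threshold of 127 assumes that anything outside plain ASCII is suspect, which may not hold for data that legitimately contains accented or non-Latin text.

```matlab
% Hypothetical sample data: the second entry uses char(8217) instead of ASCII 39
cats = {'I don''t know'; ['I don' char(8217) 't know']; 'Other'};

% One logical flag per entry: true if any character code is above the ASCII range
hasNonAscii = cellfun(@(s) any(double(s) > 127), cats);

% Report the position(s) and code(s) of the offending characters
for k = find(hasNonAscii).'
    pos = find(double(cats{k}) > 127);
    fprintf('Entry %d: position(s) %s, code(s) %s\n', ...
            k, mat2str(pos), mat2str(double(cats{k}(pos))));
end
```

For a categorical variable, categories(A) returns a cell array that could be passed in place of cats.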


Accepted Answer

dpb on 29 Jan 2021
Edited: dpb on 29 Jan 2021
I didn't notice the data attached for this case -- the same exercise as above shows:
>> sort(categories(A))
ans =
29×1 cell array
{'Agriculture and fishing' }
{'Audit and assurance, and tax services' }
{'Audit and assurance, and tax services' }
{'Banking and capital markets' }
{'Civil Societies/NGOs' }
{'Civil society/NGOs' }
{'Construction' }
{'Consulting services' }
{'Education and academia' }
{'Electronics' }
{'Energy, utilities and resources' }
{'Financial services' }
{'Food Services' }
{'Government and public services' }
{'Health and healthcare services' }
{'Hospitality' }
{'IT and telecommunications' }
{'Manufacturing' }
{'Mining and Quarrying' }
{'Oil and gas' }
{'Other' }
{'Other business services' }
{'Other business services, please specify: ____________'}
{'Petrochemicals' }
{'Real Estate' }
{'Tourism' }
{'Transportation and logistics' }
{'Wholesale and retail trade' }
{'org03' }
>> tmp=ans(2:3)
tmp =
2×1 cell array
{'Audit and assurance, and tax services'}
{'Audit and assurance, and tax services'}
>>
There's an extended character (code 160, the non-breaking space) in the second where there's an ordinary space (code 32) in the first:
>> find(tmp{1}~=tmp{2})
ans =
25
>> [double(tmp{1}(25));double(tmp{2}(25))]
ans =
32
160
>>
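One possible way to get rid of this particular duplicate is to replace the non-breaking space with an ordinary space before calling unique. This is a sketch under assumptions, not the procedure used on the attached file: the small array A below is a made-up stand-in for the real data, and it assumes A is categorical with no undefined elements.

```matlab
% Hypothetical stand-in for the attached data: the second entry has char(160) at position 25
s1 = 'Audit and assurance, and tax services';
s2 = s1;
s2(25) = char(160);
A = categorical({s1; s2; s1; 'Construction'});

% Convert to text, normalize the non-breaking space, and rebuild the categorical
names = cellstr(A);                        % category name of each element
names = strrep(names, char(160), ' ');     % char(160) -> ordinary space (char(32))
A = categorical(names);

% Same tally as in the question, now with a single 'Audit and assurance...' row
[Country, ~, ix] = unique(A);
tally = accumarray(ix, 1);
Q2 = table(Country, tally);
disp(Q2)
```

Other look-alike characters, such as the curly quote char(8217) seen earlier, could be handled with further strrep calls of the same form.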
Besides that, there are other anomalous entries as well, just as were pointed out in the other categorical array in the previous question:
...
{'Civil Societies/NGOs' }
{'Civil society/NGOs' }
...
{'Other business services' }
{'Other business services, please specify: ____________'}
...
that need to be cleaned up or one will never be able to match all elements of what are obviously intended to be the same categories but are not.
The data need a thorough cleaning before being ready for prime time.
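Part of that cleaning could be done with mergecats, which folds several categories into one. The sketch below uses a made-up four-element array, and the choice of which spelling to keep is an assumption; only the person who owns the survey can say which variants really mean the same thing.

```matlab
% Hypothetical sample containing the near-duplicate categories listed above
A = categorical({'Civil Societies/NGOs'; 'Civil society/NGOs'; ...
                 'Other business services'; ...
                 'Other business services, please specify: ____________'});

% With two inputs, mergecats merges into the first listed category,
% so the name to keep goes first in each list
A = mergecats(A, {'Civil society/NGOs', 'Civil Societies/NGOs'});
A = mergecats(A, {'Other business services', ...
                  'Other business services, please specify: ____________'});

% Inspect the remaining categories and their counts
disp(categories(A))
disp(countcats(A))
```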
