Sorting alphabetically with sortrows: underscore handled differently than in Windows
Hello everybody,
I am having a problem with sortrows. I am sorting filenames alphabetically, but I get a different sorted list than the one Windows shows in the Explorer.
Here is how I sort with Matlab:
>> sorted = sortrows({'abc1_test.txt'; 'abc11_test.txt'})
sorted =
'abc11_test.txt'
'abc1_test.txt'
The Matlab display of the current folder and the Windows Explorer show a different sorting:

So for some reason, sortrows prefers numbers over underscore while Windows and even the Matlab display have it vice versa.
Is this a bug? What can I do to get the same sorting result in my Matlab scripts?
Regards
Stephan
Comments: 1
Stephen23
30 Sep 2019
Edited: Stephen23
30 Sep 2019
"What can I do to get the same sorting result in my Matlab scripts?"
The short answer is that you can't: Microsoft does not document the sort order that it uses in Windows Explorer, nor is there any guarantee that this order remains the same between different Windows versions or other Microsoft products.
"Is this a bug?"
Are undocumented Windows behaviors bugs or features?
"...So for some reason, sortrows prefers numbers over underscore..:"
It is not for "some reason", it is because sortrows (just like the sorting routines of almost every programming language in existence) simply sorts the character values, which are extremely well documented:
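As a small illustration of sorting by character value, the following sketch uses only base MATLAB functions (the variable names are made up for this example). It shows the codes of the two characters in question and how sort orders a mixed set of characters:

```matlab
% Character codes of the two characters at issue.
codes = double('1_');          % 49 for '1', 95 for '_'

% Sorting a character vector orders its elements by these codes:
% digits first, then upper-case letters, then '_', then lower-case letters.
sortedChars = sort('_aA1');

disp(codes)
disp(sortedChars)
```

Because 49 is less than 95, any name with a digit at the first differing position sorts before a name with an underscore there.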
Accepted Answer
Stephen23
30 Sep 2019
Edited: Stephen23
30 Sep 2019
If you actually want to sort filenames into alphanumeric order, then one option is to download my FEX submission natsortfiles:
and then use it something like this:
>> C = {'abc1_test.txt'; 'abc11_test.txt'};
>> D = natsortfiles(C)
D =
'abc1_test.txt'
'abc11_test.txt'
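If installing a File Exchange submission is not possible, the idea can be roughly sketched in base MATLAB. This is not how natsortfiles is implemented and it is far less general: it assumes every name contains a run of digits and it only considers the first such run, sorting by the text before the number and then by the number's value. The variable names are invented for this example.

```matlab
C = {'abc1_test.txt'; 'abc11_test.txt'; 'abc2_test.txt'};

% Text before the first digit, and the first run of digits as a number.
% Assumption: every name contains at least one digit.
pre = regexp(C, '^\D*', 'match', 'once');
num = str2double(regexp(C, '\d+', 'match', 'once'));

% Sort by prefix first, then by numeric value.
[~, idx] = sortrows(table(pre, num));
D = C(idx);
disp(D)
```

For names with several numbers, mixed case, or file extensions that should be ignored, a dedicated function is the safer choice.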
More Answers (2)
Stephen23
14 Aug 2023
Edited: Stephen23
2 Sep 2023
I had a requirement to sort some non-English text into alphabetic order (or even better, alphanumeric order). I first looked at calling something in Python or Java, but setting the locale was too fiddly from MATLAB. So I wrote a little text sorting function ARBSORT, that sorts into an alphabetic order specified by the user:
Note that ARBSORT does not recognise syllables, word roots, or compound words, nor does ARBSORT automatically split ligatures, or sort diacritics in reverse order, or implement any of countless other language-specific rules (such things are well beyond the scope of my little project). ARBSORT does provide two simple ways for the user to specify the text/alphabetic order:
- define equivalent characters, e.g. in French the ligatures "æ" and "œ" are sorted as "ae" and "oe" respectively.
- define the alphabetic order (or the order of any arbitrary words, like MS Excel's custom list sorting feature).
By default any unmatched diacritics are removed before sorting (this matches common practice in English and several other languages, where diacritics not used in that language are silently ignored).
For example sorting German text, based on the example from Wikipedia:
% Text in character code order:
S = ["Goethe"; "Goldmann"; "Gurke"; "Göbel"; "Göthe"; "Götz"]
% DIN 5007 Variante 1:
arbsort(S, ["ß";"ss"])
% DIN 5007 Variante 2:
arbsort(S, ["ä","ö","ü","ß"; "ae","oe","ue","ss"])
% Österreichische Sortierung:
arbsort(S, ["ß";"ss"], num2cell(['aä','b':'o','ö','p':'u','ü','v':'z']))
To provide an alphanumeric sort the function ARBSORT can be provided (parameterized if required) as an optional input argument to NATSORT, NATSORTFILES, and NATSORTROWS:
Z = ["Zoë_2"; "Zoz"; "Zoë_10"; "Zoa"; "Zoë_1"]
natsort(Z, [], @arbsort) % alphanumeric sort ignoring diacritics
natsort(Z) % alphanumeric sort using character code order
Comments: 0
Walter Roberson
30 Sep 2019
>> '1' < '_'
ans =
logical
1
It is thus correct that abc11_test.txt should sort before abc1_test.txt, because the first point of difference is the '1' versus '_' and '1' has a lower code value.
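The first point of difference can be located explicitly. The snippet below is only a sketch with made-up variable names; it compares the two names character by character over their common length:

```matlab
a = 'abc11_test.txt';
b = 'abc1_test.txt';

% Compare over the common length and find the first mismatch.
n = min(numel(a), numel(b));
k = find(a(1:n) ~= b(1:n), 1);

% Characters and codes at that position.
fprintf('position %d: ''%s'' (%d) vs ''%s'' (%d)\n', ...
    k, a(k), double(a(k)), b(k), double(b(k)));
```

Here the mismatch is at position 5, where '1' (code 49) in the first name meets '_' (code 95) in the second.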
Windows Explorer does not document its sort order, and does not document whether it uses the same sort order for all file system types. Windows does not impose a sort order itself: it leaves it up to the file system driver.
"Proper" sort order turns out to be a messy question when you include Unicode-compatible file systems, with Unicode's "combining characters", and when you include the fact that different human languages have different sorting rules. For example, how should ï sort relative to "i followed by combining diaeresis"? How should ï sort relative to ö? Hah -- trick question: in Swedish, ö is a distinct character, the last of the regular characters in the language -- the sort order goes xyzåäö. So in Swedish, ö sorts before ï, as ï is not a character in Swedish. But if we were talking about some other language, the sort rules might be different.
Then there are issues in Unicode that interplay with some languages, where the lower-case equivalent of a capital letter might be two letters instead of one (for example ss vs ẞ).
The rules for how sequences of Unicode bytes are "normalized" in order to compare them for sorting purposes depend upon the file system in Windows.
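To make the combining-character problem concrete, here is a small sketch in base MATLAB (variable names are invented for illustration). It builds ï in two ways that look the same on screen but are different character sequences, so a plain character-code sort treats them as different names:

```matlab
precomposed = char(239);              % U+00EF, "ï" as a single code point
decomposed  = [char(105) char(776)];  % "i" followed by U+0308 combining diaeresis

% Without normalization the two are not equal.
same = strcmp(precomposed, decomposed);

% Character-code sorting puts the decomposed form first, because 105 < 239.
names = sort({precomposed; decomposed});

fprintf('equal: %d, lengths: %d and %d\n', ...
    same, numel(precomposed), numel(decomposed));
```

Whether two such names compare as equal or in a particular order therefore depends on which normalization, if any, is applied before comparing.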
Thus, if sorting order is important to you, you need to strictly define which sorting order you are going to use, and impose that order on the results returned by interacting with the file system.
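One way to impose an explicit order is sketched below. The rule chosen here (case-insensitive character-code order) is only an example of a strictly defined order, not a recommendation, and the file names are hard-coded so the snippet runs anywhere:

```matlab
% In practice the names would come from the file system, for example:
%   S = dir('*.txt');  names = {S.name};
names = {'b.txt', 'A.txt', 'a_1.txt', 'a1.txt'};

% Explicit rule: compare lower-cased names by character code.
[~, idx] = sort(lower(names));
sorted = names(idx);
disp(sorted)
```

The same index vector idx can be applied to the structure array returned by dir, so the rest of the script sees the files in the chosen order regardless of what the operating system returns.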
You should not define Windows Explorer's order as being "right" and MATLAB's as being "a bug": Windows Explorer is different. And undocumented.
Comments: 3
Walter Roberson
30 Sep 2019
Sorry, to answer this question I would have to dig further into decompiling Java and trying to make sense of it than I am willing to undertake. You can open a Support case to get an answer about which sorting method MathWorks is using.