Sort files based on last part of file name

Erica McCune on 21 Feb 2018
Commented: Sophia Boucher on 15 Dec 2020
Hi,
I'm trying to sort a list of .dcm files to compile into a 3D scan. I need to sort them in their correct numerical order, but the file names are long and contain many numbers; only the very last part of the file name holds the actual scan number. I'm trying to figure out how to get MATLAB to sort the file names in the correct order so I can easily compile the images. I've looked at Stephen Cobeldick's Natural-Order Filename Sort, which looks helpful, but I can't figure out how to use that or another tool to sort based only on the last few digits of the file name. For example, I have filenames like this that I need to sort:
2.16.840.1.114362.1.6.7.6.17718.8160090133.469658533.982.709.dcm
2.16.840.1.114362.1.6.7.6.17718.8160090133.469658533.1089.676.dcm
The .676.dcm file needs to be sorted before the .709.dcm file, but will not because the .1089 part is higher than the .982. I need to somehow sort these based only on those last few digits before the .dcm.
Thank you for any help!

Accepted Answer

Stephen23 on 21 Feb 2018
Edited: Stephen23 on 23 Mar 2018
Both natsortfiles and natsort have the option of specifying the regular expression that matches the numbers, so you can easily define a lookaround assertion to ensure that the following characters are the file extension. If the filenames all follow exactly the same pattern and there are no directories or different file-extensions then you can use natsort (it will be slightly faster), although natsortfiles will also work:
>> C = {'2.16.840.1.114362.1.6.7.6.17718.8160090133.469658533.982.709.dcm','2.16.840.1.114362.1.6.7.6.17718.8160090133.469658533.1089.676.dcm'};
>> D = natsort(C,'\d+(?=\.dcm$)');
>> D{:}
ans = 2.16.840.1.114362.1.6.7.6.17718.8160090133.469658533.1089.676.dcm
ans = 2.16.840.1.114362.1.6.7.6.17718.8160090133.469658533.982.709.dcm
Or as an alternative, you can easily sort your files with a solution based around regexp:
>> [~,idx] = sort(str2double(regexp(C,'\d+(?=\.dcm$)','match','once')));
>> D = C(idx);
>> D{:}
ans = 2.16.840.1.114362.1.6.7.6.17718.8160090133.469658533.1089.676.dcm
ans = 2.16.840.1.114362.1.6.7.6.17718.8160090133.469658533.982.709.dcm
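To see what the lookahead pattern actually picks out, here is a minimal sketch under some assumptions: the first two names come from the question, the third name (ending in .1000.dcm) is hypothetical and only added to show why the matched digits are converted to numbers before sorting.

```matlab
% Two names from the question plus one hypothetical name (assumption).
C = {'2.16.840.1.114362.1.6.7.6.17718.8160090133.469658533.982.709.dcm', ...
     '2.16.840.1.114362.1.6.7.6.17718.8160090133.469658533.1089.676.dcm', ...
     '2.16.840.1.114362.1.6.7.6.17718.8160090133.469658533.982.1000.dcm'};

% \d+ matches a run of digits; (?=\.dcm$) requires that '.dcm' and the end
% of the name follow, without including them in the match.
tok = regexp(C,'\d+(?=\.dcm$)','match','once');   % cell array of digit strings

% Convert to numbers so that 1000 sorts after 709 (as text, '1000' < '676').
num = str2double(tok);
[~,idx] = sort(num);
D = C(idx);   % names ordered by trailing scan number
```

In practice the names would probably come from a folder listing, for example `S = dir(fullfile(folder,'*.dcm')); C = {S.name};`, where `folder` is a placeholder for wherever the scan files are stored.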
4 Comments
Stephen23 on 12 Dec 2020
@Sophia Boucher: my FEX submissions NATSORT/NATSORTFILES are not suitable because the date/time fields are not arranged in order from largest unit to smallest unit: ambiguous and inconsistent date formats like the one used in those filenames make processing your data more difficult. If the dates had been written in any ISO 8601 format then a trivial character sort would return the filenames in chronological order.
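A minimal sketch of that point, assuming hypothetical filenames in which the timestamp is written in the ISO 8601 basic form yyyymmddTHHMMSS (these names are made up for illustration and are not the actual files):

```matlab
% Hypothetical names with ISO 8601 timestamps (assumption, not the real files).
C = {'Serial1234 Front HI_RES 20200912T133422.csv'
     'Serial1234 Front HI_RES 20200912T102612.csv'
     'Serial1234 Front HI_RES 20200912T110620.csv'};

% Fields run from largest unit (year) to smallest (second), all zero-padded,
% so plain character order is also chronological order.
Z = sort(C);
```

No parsing is needed here because every field has a fixed width and the most significant field comes first.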
Using any ISO 8601 date format would be the simplest and most robust approach by far, and is what I strongly recommend. If you cannot change the format when the files are created, then the next best option would be to convert the timestamp into a DATETIME array and sort that, e.g.:
C = {'Serial1234 Front HI_RES 12_9_2020 01_34_22 PM.csv'
'Serial1234 Front HI_RES 12_9_2020 01_44_24 PM.csv'
'Serial1234 Front HI_RES 12_9_2020 10_26_12 AM.csv'
'Serial1234 Front HI_RES 12_9_2020 10_36_14 AM.csv'
'Serial1234 Front HI_RES 12_9_2020 10_46_16 AM.csv'
'Serial1234 Front HI_RES 12_9_2020 10_56_18 AM.csv'
'Serial1234 Front HI_RES 12_9_2020 11_06_20 AM.csv'};
D = regexp(C,'[\d_]+ [\d_]+ [AP]M','match','once');
T = datetime(D,'InputFormat','dd_MM_yyyy hh_mm_ss a')
T = 7×1 datetime array
   12-Sep-2020 13:34:22
   12-Sep-2020 13:44:24
   12-Sep-2020 10:26:12
   12-Sep-2020 10:36:14
   12-Sep-2020 10:46:16
   12-Sep-2020 10:56:18
   12-Sep-2020 11:06:20
[~,X] = sort(T);
Z = C(X)
Z = 7x1 cell array
    {'Serial1234 Front HI_RES 12_9_2020 10_26_12 AM.csv'}
    {'Serial1234 Front HI_RES 12_9_2020 10_36_14 AM.csv'}
    {'Serial1234 Front HI_RES 12_9_2020 10_46_16 AM.csv'}
    {'Serial1234 Front HI_RES 12_9_2020 10_56_18 AM.csv'}
    {'Serial1234 Front HI_RES 12_9_2020 11_06_20 AM.csv'}
    {'Serial1234 Front HI_RES 12_9_2020 01_34_22 PM.csv'}
    {'Serial1234 Front HI_RES 12_9_2020 01_44_24 PM.csv'}
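If renaming the files is an option, the two suggestions can be combined. The following is only a sketch: it reuses the pattern and the day-first InputFormat from above, and the output format 'yyyyMMdd''T''HHmmss' and the variable names `iso` and `N` are my own choices for illustration.

```matlab
% Two of the names from above, kept short for illustration.
C = {'Serial1234 Front HI_RES 12_9_2020 01_34_22 PM.csv'
     'Serial1234 Front HI_RES 12_9_2020 10_26_12 AM.csv'};
pat = '[\d_]+ [\d_]+ [AP]M';

% Parse the timestamp exactly as before (day-first interpretation).
D = regexp(C,pat,'match','once');
T = datetime(D,'InputFormat','dd_MM_yyyy hh_mm_ss a');

% Write each timestamp as ISO 8601 basic format text (assumed target format).
iso = cellstr(T,'yyyyMMdd''T''HHmmss');

% Build new names with the ISO 8601 timestamp in place of the original one.
N = cellfun(@(c,s) regexprep(c,pat,s), C, iso, 'UniformOutput',false);

% A plain character sort of the new names is now chronological.
Z = sort(N);
```

This only builds the new names; actually renaming files on disk (for example with movefile) is left out on purpose.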
Sophia Boucher on 15 Dec 2020
Thank you SO much Stephen, extremely helpful, especially with the reasoning behind it. Appreciate the help :)


More Answers (0)
