Regexp to filter file names

maxroucool, 19 Oct 2017
Commented: maxroucool, 20 Oct 2017
Hello,
I would like to use a regexp to filter the file names contained in a folder. I have almost got it, but I have trouble handling the cell arrays I end up with from the dir() output...
Let's imagine I have these files in the data folder:
DY463269-F 01-01-2017.xlsx
DY463269-F 01-01-2017.xlsx
DY463271-8 01-01-2017.xlsx
DY466290-M 01-01-2017.xlsx
My code is:
filesList = dir('data/');
serialList = regexpi({filesList.name}, '[a-z]{2}[0-9]{6}[a-z\-]{0,2}', 'match')
dateList = regexpi({filesList.name}, '[0-9]{2}-[0-9]{2}-[0-9]{2}', 'match')
data = cell2struct([serialList; dateList],{'Name','Date'},1)
data(3).Name
First, I guess a single regexpi call could get both pieces of information... But mostly, data(3).Name returns a cell, and I would like it to simply be a string.
Any idea how to fix this?
Thanks,
Max

Accepted Answer

Walter Roberson, 19 Oct 2017
filenames = {'DY463269-F 01-01-2017.xlsx'
'DY463269-F 01-01-2017.xlsx'
'DY463271-8 01-01-2017.xlsx'
'DY466290-M 01-01-2017.xlsx'};
fileinfo_cell = regexpi( filenames, '^(?<serial>[a-z]{2}\d{6}[a-z\-]{0,2})\s+(?<date>\d\d-\d\d-\d\d)', 'names','lineanchors');
fileinfo = vertcat(fileinfo_cell{:});
fileinfo will then be a struct array with fields "serial" and "date".
With that particular set of data, the struct array will have 3 elements, because DY463271-8 01-01-2017.xlsx does not match the pattern (the serial ends in -8, but the pattern does not permit digits at that point).
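Since the question is about filtering, the same anchored pattern can also be used on its own to keep only the matching names. A minimal sketch: the dir() call is left as a comment and a literal list stands in for its output so the snippet runs anywhere; the folder name 'data' and the '*.xlsx' wildcard are assumptions, not part of the answer.

```matlab
% Sketch: keep only the file names that match the accepted answer's pattern.
% The literal list stands in for {files.name}; with a real folder you might use
%   files = dir(fullfile('data', '*.xlsx'));   % wildcard also skips '.' and '..'
%   filenames = {files.name};
filenames = {'DY463269-F 01-01-2017.xlsx'
             'DY463271-8 01-01-2017.xlsx'
             'DY466290-M 01-01-2017.xlsx'};
pattern = '^[a-z]{2}\d{6}[a-z\-]{0,2}\s+\d\d-\d\d-\d\d';

% With 'once', regexpi returns the start index of the first match, or [] when
% there is no match, so isempty marks the names to drop.
isMatch = ~cellfun(@isempty, regexpi(filenames, pattern, 'once'));
matchingNames = filenames(isMatch);
```

Because regexpi accepts a cell array directly, no explicit loop over the files is needed; the DY463271-8 name is dropped, consistent with the answer above.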
A generalization of the pattern would be:
fileinfo_cell = regexpi( filenames, '^(?<serial>\w\w\d{6}[\w-]{0,2})\s+(?<date>\d\d-\d\d-\d\d)', 'names','lineanchors');
fileinfo = vertcat(fileinfo_cell{:});
Each \w matches one of [a-z0-9_], the "word-building" characters. Your current pattern is more focused, and you might need that focus, but sometimes it is useful to be more lenient.
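One detail worth checking in both patterns: (?<date>\d\d-\d\d-\d\d) stops after the first two digits of the year, so the date field comes out as '01-01-20'. A sketch of the generalized pattern with the year widened to \d{4} (that change is an adjustment for illustration, not part of the original answer), run on the three distinct names from the question:

```matlab
% Sketch: the generalized pattern with the year widened to four digits.
% \d\d-\d\d-\d\d would stop after "20" of "2017"; \d{4} captures the whole year.
filenames = {'DY463269-F 01-01-2017.xlsx'
             'DY463271-8 01-01-2017.xlsx'
             'DY466290-M 01-01-2017.xlsx'};
pattern = '^(?<serial>\w\w\d{6}[\w-]{0,2})\s+(?<date>\d\d-\d\d-\d{4})';

fileinfo_cell = regexpi(filenames, pattern, 'names');
fileinfo = vertcat(fileinfo_cell{:});   % every name matches, including the -8 serial

serials = {fileinfo.serial};            % {'DY463269-F', 'DY463271-8', 'DY466290-M'}
dates   = {fileinfo.date};              % each element is '01-01-2017'
```

Each field of the resulting struct array is a plain char vector, so fileinfo(3).serial can be used directly as a string.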
maxroucool, 20 Oct 2017
Perfect, thank you Walter!

More Answers (1)

Guillaume, 19 Oct 2017
Note that you get cell arrays in your structure because, the way you've used it, regexpi returns a cell array of cell arrays of char arrays, since there can be more than one match per file name. Adding the 'once' option to these regexpi calls tells them to return only the first match, so you get a cell array of char arrays instead, which gives you the structure you expect.
serialList = regexpi({filesList.name}, '[a-z]{2}[0-9]{6}[a-z\-]{0,2}', 'match', 'once');
dateList = regexpi({filesList.name}, '[0-9]{2}-[0-9]{2}-[0-9]{2}', 'match', 'once');
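To see the effect of 'once' end to end, here is a sketch of the question's cell2struct step using these two calls. A literal 1xN row of names stands in for {filesList.name}, and the date pattern is widened to [0-9]{4} so the full year is captured; both are assumptions for illustration, not part of the answer.

```matlab
% Sketch: the question's cell2struct step with the 'once' option.
% A 1xN row, like {filesList.name}, so [serialList; dateList] is 2xN.
names = {'DY463269-F 01-01-2017.xlsx', 'DY463271-8 01-01-2017.xlsx', 'DY466290-M 01-01-2017.xlsx'};

% With 'once', each cell holds a char vector ('' when nothing matches)
% instead of a nested cell array.
serialList = regexpi(names, '[a-z]{2}[0-9]{6}[a-z\-]{0,2}', 'match', 'once');
dateList   = regexpi(names, '[0-9]{2}-[0-9]{2}-[0-9]{4}', 'match', 'once');

% Stack the two rows into a 2xN cell and turn each column into one struct element.
data = cell2struct([serialList; dateList], {'Name', 'Date'}, 1);
class(data(3).Name)   % 'char'
```

Because this serial pattern is not anchored, it still finds 'DY463271-' inside the DY463271-8 name instead of rejecting it the way the accepted answer's anchored pattern does.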
maxroucool, 20 Oct 2017
Perfect, thank you Guillaume!
