
regexp to filter file names

chlor thanks on 5 Jul 2016
Commented: Image Analyst on 5 Jul 2016
I have files such as the following:
s =
'HI_B2_TTTT9_Default452_07052016.xlsx'
'HI_H2G_TTTT7_Default259_070516.xlsx'
'HI_B2C_TTTT9_Default1482_070516.xlsx'
'HI_A1C_TTTT4_468_070516.xlsx'
'HI_G1C_TTTT8_862_07052016.xlsx'
'HI_KA6_TTTT4_148_07052016.xlsx'
'HI_8C_TTTT7_279_Potato_07052016.xlsx'
I only want to process the first six files and filter out the last one, which has a different format from the first six. Note that even though some of the file names do not contain "Default", they still count as default because they do not contain "Potato" or another keyword.
I would rather not filter by the keyword "Potato", because files added to this cell array in the future may contain other keywords such as "Carrot" or "Bacon" (I don't know yet what they will be). Those files would then not be filtered out as I want.
Actually, I think I figured out the code after looking at your answers.
I used find(cell2mat(regexp(s,'HI_\w+_\TTTT\d_(Default)?\d+_\d+')))
Thank y'all for all the inspiration!!
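A possible variation on the line above, offered as a sketch rather than a correction: as far as I can tell, cell2mat drops the empty results for names that do not match, so the indices returned by find may stop lining up with s once a non-matching name is no longer last in the list. Keeping a logical mask avoids that. The pattern below is the same idea with the backslash before TTTT dropped; the ^ and \.xlsx$ anchors are my own assumption that the whole name should fit.

```matlab
% Sample names copied from the question, with the odd one in the middle.
s = {'HI_B2_TTTT9_Default452_07052016.xlsx'
     'HI_8C_TTTT7_279_Potato_07052016.xlsx'
     'HI_A1C_TTTT4_468_070516.xlsx'};

% Assumption: every wanted file ends in .xlsx and nothing follows the date.
pat = '^HI_\w+_TTTT\d_(Default)?\d+_\d+\.xlsx$';

% With 'once', regexp returns one start index (or []) per cell,
% so the mask stays aligned with s.
isGood    = ~cellfun('isempty', regexp(s, pat, 'once'));
idx       = find(isGood);   % positions of the wanted names in s
goodFiles = s(isGood);      % the wanted names themselves
```

Here idx refers to positions in the original s, wherever the unwanted names sit.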

Accepted Answer

Azzi Abdelmalek on 5 Jul 2016
s={'HI_A1C_TTTT4_468_07052016.xlsx'
'HI_B2_TTTT9_Default452_070516.xlsx'
'HI_GA1C_TTTT8_862_07052016.xlsx'
'HI_HB2C_TTTT7_Default259_070516.xlsx'
'HI_KA6_TTTT4_148_07052016.xlsx'
'HI_B2C_TTTT9_Default1482_070516.xlsx'
'HI_8C_TTTT7_279_Potato.xlsx'}
out=regexp(s,'\w+_\w+_\w+_(Default)?\d+_\d+','match','once')
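With 'match' and 'once', out holds the matched text for each name and an empty string where nothing matched, so one way to get from out to the list of files to process is to test for empties. A minimal sketch, using a shortened copy of the list above (the variable names keep and goodFiles are mine):

```matlab
s = {'HI_A1C_TTTT4_468_07052016.xlsx'
     'HI_B2_TTTT9_Default452_070516.xlsx'
     'HI_8C_TTTT7_279_Potato.xlsx'};

out = regexp(s, '\w+_\w+_\w+_(Default)?\d+_\d+', 'match', 'once');

% out{k} is '' when s{k} does not fit the pattern.
keep      = ~cellfun('isempty', out);
goodFiles = s(keep);   % full names, extension included
```

Note that out itself contains the matched part only, without the .xlsx extension, which is why the sketch indexes back into s.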

More Answers (1)

Image Analyst on 5 Jul 2016
What's unique about the filenames you want to keep? Do they all end in 16, as in your small sample? If so, do
fileStruct = dir('*16.xlsx');
Now, just use fileStruct(k).name in your loop or wherever you need to reference the filename.
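For instance, a loop over the result might look like the sketch below. The fprintf call is only a placeholder for whatever processing is needed, and the wildcard assumes the files are in the current folder.

```matlab
fileStruct = dir('*16.xlsx');   % struct array, one element per matching file

for k = 1:numel(fileStruct)
    thisName = fileStruct(k).name;          % file name as a char vector
    fprintf('Processing %s\n', thisName);   % placeholder for the real work
end
```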
2 Comments
chlor thanks on 5 Jul 2016
Thank you for suggesting another way to do this!
However, it will not work well in my particular case. (I fixed this little bug in my updated question. I made the question up so that I can rewrite the code myself later.)
The filenames are structured as follows, taking 'HI_A1C_TTTT4_468_07052016.xlsx' as an example:
HI may stand for a particular program name
A1C may stand for a particular operation within it
TTTT4 stands for who performed this operation
468 stands for the task number
07052016 stands for the date the file was made (you will notice that it is sometimes 070516 and sometimes 07052016, depending on how the person felt when they saved the file...)
So the purpose of this regexp is to extract these files from hundreds of other files that I have. I will later parse this information using "split", but that's a different story...
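The naming scheme described above could also be written down as a single pattern with named tokens, which would filter and parse in one step. This is only a sketch: the field names (program, operation, person, task, date) are made up for illustration, and it assumes the first two parts contain no underscores and that the date is always 6 or 8 digits.

```matlab
name = 'HI_A1C_TTTT4_468_07052016.xlsx';

% Hypothetical field names; (?:Default)? allows the optional prefix
% without capturing it.
pat = ['^(?<program>[^_]+)_(?<operation>[^_]+)_(?<person>TTTT\d)_' ...
       '(?:Default)?(?<task>\d+)_(?<date>\d{6}|\d{8})\.xlsx$'];

info = regexp(name, pat, 'names');   % empty struct if the name does not fit

disp(info)
```

Splitting on the underscore (for example with strsplit(name, '_')) would give similar pieces, though the optional "Default" prefix and the extension would still need handling.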
Image Analyst on 5 Jul 2016
OK, though I'm still not sure what constitutes a good filename and a bad one. If it's just the presence of some list of keywords defined in advance, you might look at ismember to identify which strings in a cell array of filenames have any of the keywords in them.
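One caveat if you go the keyword route: ismember compares whole strings, so on its own it would not find a keyword sitting inside a longer filename. A substring test could be sketched with regexp instead. The keyword list here is hypothetical and has to be known in advance, which may not fit the question as asked.

```matlab
names = {'HI_KA6_TTTT4_148_07052016.xlsx'
         'HI_8C_TTTT7_279_Potato_07052016.xlsx'};

keywords = {'Potato', 'Carrot', 'Bacon'};   % hypothetical, fixed list
pat      = strjoin(keywords, '|');          % 'Potato|Carrot|Bacon'

% True where a name contains at least one keyword anywhere in it.
hasKeyword = ~cellfun('isempty', regexp(names, pat, 'once'));
cleanNames = names(~hasKeyword);            % names with no keyword
```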
