Sort cell strings according to specific subsets of those cell strings

Let's say I have a cell string with values:
filename = {'2009.272.17.57.23.8445.AZ.SMER..BHE.R.SAC';...
'2009.272.17.57.24.5500.AZ.FRD..BHN.R.SAC';...
'2009.272.17.57.27.5445.AZ.SMER..BHN.R.SAC';...
'2009.272.17.57.27.8000.AZ.SND..BHZ.R.SAC';...
'2009.272.17.57.27.9445.AZ.BZN..BHE.R.SAC';...
'2009.272.17.57.28.7000.AZ.SND..BHN.R.SAC';...
'2009.272.17.57.29.1250.AZ.FRD..BHZ.R.SAC';...
'2009.272.17.57.29.2250.AZ.PFO..BHE.R.SAC';...
'2009.272.17.57.29.3695.AZ.SMER..BHZ.R.SAC';...
'2009.272.17.57.29.9445.AZ.BZN..BHN.R.SAC';...
'2009.272.17.57.30.0000.AZ.RDM..BHN.R.SAC';...
'2009.272.17.57.30.8000.AZ.RDM..BHZ.R.SAC';...
'2009.272.17.57.31.8250.AZ.LVA2..BHZ.R.SAC';...
'2009.272.17.57.31.8500.AZ.LVA2..BHE.R.SAC';...
'2009.272.17.57.31.9195.AZ.BZN..BHZ.R.SAC';...
'2009.272.17.57.32.0000.AZ.WMC..BHZ.R.SAC';...
'2009.272.17.57.32.6750.AZ.WMC..BHN.R.SAC';...
'2009.272.17.57.33.3195.AZ.KNW..BHZ.R.SAC';...
'2009.272.17.57.33.4750.AZ.TRO..BHN.R.SAC';...
'2009.272.17.57.33.7750.AZ.PFO..BHN.R.SAC';...
'2009.272.17.57.33.9000.AZ.PFO..BHZ.R.SAC';...
'2009.272.17.57.34.1750.AZ.LVA2..BHN.R.SAC';...
'2009.272.17.57.34.8000.AZ.TRO..BHZ.R.SAC';...
'2009.272.17.57.35.0000.AZ.WMC..BHE.R.SAC';...
'2009.272.17.57.35.0750.AZ.RDM..BHE.R.SAC';...
'2009.272.17.57.35.8945.AZ.KNW..BHE.R.SAC';...
'2009.272.17.57.36.0250.AZ.FRD..BHE.R.SAC';...
'2009.272.17.57.36.2250.AZ.CRY..BHZ.R.SAC';...
'2009.272.17.57.36.3500.AZ.CRY..BHN.R.SAC';...
'2009.272.17.57.36.4500.AZ.SND..BHE.R.SAC';...
'2009.272.17.57.36.5000.AZ.TRO..BHE.R.SAC';...
'2009.272.17.57.36.5195.AZ.KNW..BHN.R.SAC';...
'2009.272.17.57.36.5750.AZ.CRY..BHE.R.SAC'};
I do not want to assume that I know which character positions the station name (e.g., CRY) or component name (e.g., BHE) start and end on. The number of periods (".") will be consistent, though.
I have something fairly clunky to do this, but I am wondering if anyone can suggest a quick one/two-liner that would assume a string format of the general form:
YYYY.DDD.HH.MM.SS.ssss.$1.$2..$3.R.SAC
where:
$1 = Array name
$2 = Station name
$3 = Component name
And then sort the list with the primary and secondary sort order according to $2 and $3, respectively, so that the first 6 rows in the cell string would be:
2009.272.17.57.27.9445.AZ.BZN..BHE.R.SAC
2009.272.17.57.29.9445.AZ.BZN..BHN.R.SAC
2009.272.17.57.31.9195.AZ.BZN..BHZ.R.SAC
2009.272.17.57.36.5750.AZ.CRY..BHE.R.SAC
2009.272.17.57.36.3500.AZ.CRY..BHN.R.SAC
2009.272.17.57.36.2250.AZ.CRY..BHZ.R.SAC
...

Comments: 4

In the example you show, the number of characters for each component is exactly the same for each line. Is that a rule for your situation? If so then the sort becomes quite simple.
Yes, they are the same. I would definitely like to see the simple approach, but I would also like to know the more general approach for future applications where, say, $2 and $3 in my example had different lengths between the periods.
It looks like the parts do *not* have the same length:
'2009.272.17.57.33.9000.AZ.PFO..BHZ.R.SAC'
'2009.272.17.57.34.1750.AZ.LVA2..BHN.R.SAC'
Oh, his question was about the "component" names, which all have the same number of characters (i.e., 3). The "station" names do not: they range from 3 to 4 characters.


Accepted Answer

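% Split using |'.'| as the delimiter
splt = regexpi(filename,'\.','split');
% Sort according to the 8th (station) and 10th (component) columns;
% the 9th column is the empty field between the two periods in '..'
[sorted,idx] = sortrows(cat(1,splt{:}),[8,10])
Now you can use the sorted split array, or apply idx to the original list with filename(idx).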
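As a minimal sketch of how the split columns line up, here is the same approach on a four-file subset of the list from the question. The variable names fields and sortedNames are only illustrative and are not part of the answer above.

```matlab
% Small sample in the same naming format as the question
filename = {'2009.272.17.57.36.2250.AZ.CRY..BHZ.R.SAC'; ...
            '2009.272.17.57.27.9445.AZ.BZN..BHE.R.SAC'; ...
            '2009.272.17.57.36.5750.AZ.CRY..BHE.R.SAC'; ...
            '2009.272.17.57.31.9195.AZ.BZN..BHZ.R.SAC'};

% Each name splits into 12 fields; field 9 is the empty text between '..'
splt   = regexp(filename, '\.', 'split');
fields = cat(1, splt{:});              % 4-by-12 cell array of char vectors

% Station is field 8, component is field 10
[~, idx] = sortrows(fields, [8, 10]);  % idx is [2; 4; 3; 1]
sortedNames = filename(idx);           % BZN files first, then CRY
```

Because only idx is applied to filename, the original strings are kept intact and nothing has to be rebuilt from the split pieces.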

Comments: 2

Just what I was looking for. Thanks, Oleg!
+1 for the compact REGEXP call.


More Answers (2)

Jan on 22 Jan 2012
filename = {'2009.272.17.57.23.8445.AZ.SMER..BHE.R.SAC';...
'2009.272.17.57.24.5500.AZ.FRD..BHN.R.SAC';...
'2009.272.17.57.27.5445.AZ.SMER..BHN.R.SAC';...
'2009.272.17.57.27.8000.AZ.SND..BHZ.R.SAC';...
'2009.272.17.57.27.9445.AZ.BZN..BHE.R.SAC';...
'2009.272.17.57.28.7000.AZ.SND..BHN.R.SAC';...
'2009.272.17.57.29.1250.AZ.FRD..BHZ.R.SAC';...
'2009.272.17.57.29.2250.AZ.PFO..BHE.R.SAC'};
n = numel(filename);
C2 = cell(1, n);
C3 = cell(1, n);
for iC = 1:n
    D = textscan(filename{iC}(27:end), '%s', 'Delimiter', '.');
    C2{iC} = D{1}{1};
    C3{iC} = D{1}{3};
end
% A kind of SORTROWS:
[dummy, ind3] = sort(C3);
[dummy, ind2] = sort(C2(ind3));
index = ind3(ind2);
filename = filename(index);

Comments: 3

Thanks, Jan! In the end, I was looking for something more general in the event that the station name and component name were not the same number of characters. I also was looking for something that could mix and match the primary and secondary sort ordering... like in Oleg's response.
If I want the primary and secondary sort order to be the station name and component name (as in the example I gave), then I would use his "idx" from "sortrows(cat(1,splt{:}),[8,10])"... but if I wanted to flip the sort order, I could do this easily by using the "idx" created from "sortrows(cat(1,splt{:}),[10,8])", where the 10 and the 8 are now flipped.
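A short sketch of the flipped ordering described in this comment, on a four-file subset of the original list. The variable keyCols is a hypothetical name introduced here so that the key order can be changed in one place.

```matlab
filename = {'2009.272.17.57.36.2250.AZ.CRY..BHZ.R.SAC'; ...
            '2009.272.17.57.27.9445.AZ.BZN..BHE.R.SAC'; ...
            '2009.272.17.57.36.5750.AZ.CRY..BHE.R.SAC'; ...
            '2009.272.17.57.31.9195.AZ.BZN..BHZ.R.SAC'};
splt   = regexp(filename, '\.', 'split');
fields = cat(1, splt{:});

% Hypothetical variable: component (field 10) first, then station (field 8)
keyCols = [10, 8];
[~, idxFlipped] = sortrows(fields, keyCols);   % idxFlipped is [2; 3; 4; 1]
byComponent = filename(idxFlipped);            % BHE files first, then BHZ
```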
Thanks for the updated code... +1!
While Oleg's REGEXP is much nicer than calling TEXTSCAN in a loop, SORTROWS does exactly the same as my sorting method, but with a lot of overhead.
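To illustrate the equivalence mentioned in this comment, the sketch below combines the REGEXP split with two SORT passes and compares the result against SORTROWS on five files from the original list. It assumes that sort keeps equal elements of a cell array of character vectors in their original order, which is what the two-pass method relies on. The names station, component and idxRows are illustrative only.

```matlab
filename = {'2009.272.17.57.23.8445.AZ.SMER..BHE.R.SAC'; ...
            '2009.272.17.57.24.5500.AZ.FRD..BHN.R.SAC'; ...
            '2009.272.17.57.27.5445.AZ.SMER..BHN.R.SAC'; ...
            '2009.272.17.57.27.8000.AZ.SND..BHZ.R.SAC'; ...
            '2009.272.17.57.29.1250.AZ.FRD..BHZ.R.SAC'};

splt      = regexp(filename, '\.', 'split');
fields    = cat(1, splt{:});
station   = fields(:, 8);
component = fields(:, 10);

% Pass 1: order by the secondary key (component)
[~, ind3] = sort(component);
% Pass 2: order by the primary key (station); ties are assumed to keep
% the component order established in pass 1
[~, ind2] = sort(station(ind3));
index = ind3(ind2);

% Reference result from SORTROWS on the same two columns
[~, idxRows] = sortrows(fields, [8, 10]);
sameOrder = isequal(index, idxRows);
```

If sameOrder is true for your data, either variant can be used. Which one is faster depends on the MATLAB release and the list size, so it is worth timing both on a realistic list.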


Dr. Seis on 22 Jan 2012
Here is the clunky version I have been using:
numFiles = numel(filename);
sortcell = {''};
sortind = zeros(numFiles,4);
for i = 1 : numFiles
    sortind(i,2)=strfind(filename{i},'..')-1;
    for j = sortind(i,2):-1:1
        if isequal(filename{i}(j),'.')
            break;
        end
        sortind(i,1)=j;
    end
    sortind(i,3)=sortind(i,2)+3;
    for j = sortind(i,3):length(filename{i})
        if isequal(filename{i}(j),'.')
            break;
        end
        sortind(i,4)=j;
    end
    sortcell(i,1)=cellstr(filename{i}(sortind(i,1):sortind(i,2)));
    sortcell(i,2)=cellstr(filename{i}(sortind(i,3):sortind(i,4)));
end
[tempcell,tempind1]=sort(sortcell(:,2));
[tempcell,tempind2]=sort(sortcell(tempind1,1));
filename = filename(tempind1(tempind2));
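The loops above locate the '..' and then walk outwards to the neighbouring periods. The same idea can be sketched with a single REGEXP call that captures the field on each side of the '..'. This is an alternative sketch, not the code from this answer, and it assumes every name contains exactly one '..'. The names tok, keys and sortedNames are illustrative.

```matlab
filename = {'2009.272.17.57.33.9000.AZ.PFO..BHZ.R.SAC'; ...
            '2009.272.17.57.34.1750.AZ.LVA2..BHN.R.SAC'; ...
            '2009.272.17.57.29.2250.AZ.PFO..BHE.R.SAC'; ...
            '2009.272.17.57.31.8500.AZ.LVA2..BHE.R.SAC'};

% Capture the field just before '..' (station) and just after it (component)
tok  = regexp(filename, '([^.]+)\.\.([^.]+)', 'tokens', 'once');
keys = cat(1, tok{:});                 % 4-by-2 cell: {station, component}

% Primary key: station (column 1); secondary key: component (column 2)
[~, idx] = sortrows(keys, [1, 2]);     % idx is [4; 2; 3; 1]
sortedNames = filename(idx);
```

Because the pattern is keyed on the '..' rather than on a field count or a character position, it also works when the station names differ in length, as with PFO and LVA2 here.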
