Efficient way of reading cell arrays

Question

djr on 4 Aug 2014

https://kr.mathworks.com/matlabcentral/answers/146273-efficient-way-of-reading-cell-arrays

Edited: per isakson on 4 Dec 2016

Hi

What would be the most efficient way to read content from cells: for loops or cellfun? For example, my data have 3 layers of cells (cell in cell in cell), so I used 3 for loops to get to the data in the third cell. The code looks like this:

for j=1:size(evaldata,1)
      for k=1:size(evaldata{j,1},1)
          for l=1:size(evaldata{j,1}{k,2},1)
              A{l,:}={evaldata{j,1}{k,1},(sscanf(evaldata{j,1}{k,2}{l,1}, '%f').')};
              %
              if ismember(A{l,1}{1,2}(2),v)
                  store{j}{l,k}{1,1}=A{l}{1,1};
                  for w=2:14
                      store{j}{l,k}(1,w)={A{l}{1,2}(w-1)};
                  end
              else
                  store{j}{l,k}='not selected';
              end
          end
      end
      fz=cellfun(@isempty,evaldata{1,1});
      fzz=find(fz(:,2));
      for kk=1:size(fzz,1)
          store{j}{1,fzz(kk)}=evaldata{1,1}{fzz(kk),1};
      end
  end

It probably does not mean much without an explanation of what it does. However, it looks like brute force. Is there a better way to deal with nested cell arrays whose contents differ in size or type (strings, numbers, ...)?

Over the last few days I asked for help several times. This time I decided to do it by myself and I ended up with 100 loops :( :( :(. Maybe that's because Fortran is the only language I have programmed in.

Cheers, Djordje

Comments: 3

djr on 5 Aug 2014

I need 20 min just to work out how all these loops run and how they are connected to each other. I'll try tomorrow to solve it in a more efficient way.

And yes... it does work. Somehow. :D

Image Analyst on 5 Aug 2014

Well, you're right that it doesn't mean much without any explanation. There are no comments at all, and it looks so cryptic that I didn't even attempt to figure out what it does. It looks like you're trying to transfer some data from A and evaldata into store, but that's about all I got from it.

Answer 1

per isakson on 5 Aug 2014

https://kr.mathworks.com/matlabcentral/answers/146273-efficient-way-of-reading-cell-arrays#answer_147392

Edited: per isakson on 4 Dec 2016

add_met_data_to_lib.m

I continue where we left off in your last question. I assume:

There are many data-files in one folder.
The names of the data-files match '\d{4}_\d{4}', i.e. four digits, an underscore, then four digits. It's hard-coded.

The function, add_met_data_to_lib, can be used repeatedly to read data-files in many folders. Syntax

Create new lib

lib = add_met_data_to_lib('h:\m\cssm\*_*.txt')

Add to an existing lib

lib = add_met_data_to_lib('h:\m\cssm\*_*.txt', lib )

Example:

    >> lib = add_met_data_to_lib('h:\m\cssm\*_*.txt') 
    lib = 
      Map with properties:
            Count: 2925
          KeyType: char
        ValueType: any
    >> lib('19491020T0600')
    ans =
       1.0e+03 *
      Columns 1 through 9
        0.0010    0.0110    0.0804    0.0526    1.0232   -0.0001    0.0026    0.0058    0.4113
        0.0020    0.0010    0.0726    0.0449    1.0288   -0.0003    0.0050    0.0057    0.9271
        0.0030         0    0.0393    0.0497    1.0288   -0.0004    0.0100    0.0105    0.1410
      Columns 10 through 13
        0.0020    0.0008    0.0790    0.0537
        0.0004    0.0001    0.0747    0.0450
        0.0018    0.0005    0.0360    0.0524
    >>

where

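    function    lib = add_met_data_to_lib( glob, lib )
        % Read all met data-files matching glob into a containers.Map.
        narginchk( 1, 2 )
        if nargin == 1
            lib = containers.Map('KeyType','char','ValueType','any');
        end
        %
        file_list = transpose( dir( glob ) );
        % Test each name on its own; concatenating all names would give
        % a single true/false for the whole list.
        is_met_file = not( cellfun( @isempty, regexp( {file_list.name} ...
                                          , '\d{4}_\d{4}', 'start' ) ) );
        file_list( not( is_met_file ) ) = [];
        %
        for file = file_list
            % met2lib is not shown in this answer
            lib = met2lib( file.name, lib );
        end
    end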
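A small sketch of the two building blocks used above: testing each file name against the pattern, and storing and retrieving a block in a containers.Map. The file names and the numeric block are made up for illustration; only the key format follows the example output.

```matlab
% Hypothetical file names; only the first and third follow the pattern.
names = {'1948_1950.txt', 'readme_notes.txt', '1951_1953.txt'};

% regexp on a cell array returns one result per name, so each name
% gets its own true/false.
is_met_file = not( cellfun( @isempty, regexp( names, '\d{4}_\d{4}', 'start' ) ) );
met_names = names( is_met_file );

% A Map keyed by date-time strings; the values here are invented numbers.
lib = containers.Map('KeyType','char','ValueType','any');
lib('19491020T0600') = [ 1 11 80.4 ; 2 1 72.6 ];

if isKey( lib, '19491020T0600' )
    block = lib('19491020T0600');   % matrix stored under that key
end
```

Checking with isKey before the lookup avoids the error that a containers.Map raises for a missing key.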

Comments: 23

djr on 5 Aug 2014

pow.m

OK, thanks. Here is the whole problem with these files. The file Kos.dat has the dates for which I need data from the met data files (these files contain some meteorological characteristics of the anticyclones that were detected on the Kos.dat dates). So this is the step that we (you) completed. Now I need to inspect this new set of data (anticyclones for the Kos.dat dates).

First things first, if you take a look at any of the lines (data) that we read from met files they look like this (e.g. Jan 1st 1949 at 0000):

 k io     lon     lat       f       c      dp      rd      zs      up      vp    lonv    latv
*0*   77.20   43.17 1038.47  -0.268   7.442   7.1651644.929   0.000   0.000   75.96   41.45
*0*   59.74   45.38 1036.33  -0.237   5.786   7.805 146.842   1.161   0.339   58.27   48.36
*10*   44.15   40.01 1034.52  -0.042   1.419   5.0761082.190   0.000   0.000   43.36   40.20

The second number in the line is always 0, 10, 1 or 11 (in this case the first two lines have 0 and the third one has 10); I put it in bold font (stars). These numbers represent the strength of the anticyclones that were detected. FIRST, I need to create a check-box window that will let me further 'clean' the dataset based on the input. I managed to create a function that asks me for that input (attached), but I don't know how to connect that function with our database. So, when I check 1 (strong closed), I would like to 'clean' our dataset so that all anticyclones (lines of data) that are not 1 (which means 0, 10 and 11) are gone. But if we check, for example, 1 and 10 in the checkbox, then all 1's and 10's should stay, and 0 and 11 should disappear. The cleaned dataset would then only have data with 1 and 10 in the second column.

Would this be possible? This is one of these steps that I need to do in order to process the data.

Please ask me to clarify if I was too confusing.

per isakson on 5 Aug 2014

Edited: per isakson on 5 Aug 2014

"Would this be possible?" The short answer is "it's a piece of cake", but there is also a longer answer.

I use the term database for the containers.Map object, lib, with all the data.

Kos.txt contains 9000+ lines and the database will be "large". The response times of the GUI might become a problem. Matlab is not FORTRAN. Furthermore, GUIs are a lot of work to make and tedious to use.
I'm a bit uneasy about taking the first step without knowing in what direction the second and third are supposed to head.
You describe a kind of SELECT * WHERE ... statement (SQL words).
How many times will this software be used? And by how many people? I guess you will use it to produce a couple of tables for a paper or report. Thus it's a piece of run-once code.
Where should the result of the SELECT be stored? In another database? Or in a text file?
I believe that you would be better served by an m-function with one code section per SELECT. See Run Code Sections. (My code often contains "%%".) With added comments, this m-function may serve as a concise log-file.
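As a rough sketch of what one such section could look like, the following selects rows by the status in column 2, assuming lib maps a date-time key to a numeric matrix as in the example output. The keys and numbers are invented stand-ins, and selected is a hypothetical name.

```matlab
%% Build a tiny stand-in for lib (values are made up)
lib = containers.Map('KeyType','char','ValueType','any');
lib('19490101T0000') = [ 1 0 77.20 ; 2 0 59.74 ; 3 10 44.15 ];
lib('19490101T0600') = [ 1 11 50.00 ];

%% SELECT * WHERE io IN (1, 10)
wanted   = [ 1 10 ];
selected = containers.Map('KeyType','char','ValueType','any');
for key = keys( lib )
    block = lib( key{1} );
    keep  = ismember( block(:,2), wanted );   % status is in column 2
    if any( keep )
        selected( key{1} ) = block( keep, : );
    end
end
```

Writing the result to a second Map leaves lib untouched, so another section can run a different selection on the same data.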

djr on 5 Aug 2014

Edited: per isakson on 5 Aug 2014

vdist.m

Step 2. The next two steps are similar to each other. Columns 3 and 4 and the last two columns of the line contain the latitude and longitude of the anticyclone center (columns 3 and 4, say C1) and of the anticyclonic vorticity centre (the last two, say C2). At the same time I have the coordinates of 5 weather stations: BG, NS, VR, VG, SP. I have to calculate the distances between C1 and a weather station and between C2 and a weather station. That means I need one more checkbox, or some GUI, that asks me which of these 5 stations I want to use. This would be step 2.

Step 3. A certain term can have more than one anticyclone detected. In the above example, we have two with status 0 (strong anticyclones). For my further analysis (which will be a regression analysis), I need to have only one anticyclone per term. So, based on step 2, I want to keep the anticyclone that is closest to the weather station that I am analysing. But there are two options: closest by distance from C1 or by distance from C2. Therefore, I need one more checkbox that asks whether I want to measure the distance from C1 or C2.

Summary: three steps to analyze the data:

Select data based on the strength of the anticyclone (step 1)
Calculate distances from anticyclone centers (step 2)
Further select data based on which anticyclone is the closest to the weather station (step 3).

I am using this for a paper (a part of my thesis). Therefore, I will use it many times. That's why I would like to have an easy-to-use interface. Besides me, 3 more people would (probably) use it. This whole analysis is for anticyclones. I have to do the same for cyclones, but the data format is the same, so this script will work for cyclones as well (one more reason to make it easy to use and flexible).

I attached a script that calculates distances between two coordinates.
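The attached vdist.m is not reproduced here. As a stand-in, this sketch uses the haversine great-circle formula on a spherical Earth (an assumption, and not necessarily what vdist.m does) and then keeps the row closest to a station. The station coordinates are made up; the two rows take lon and lat from the example lines above, with k filled in only as a placeholder.

```matlab
R_km = 6371;                 % mean Earth radius, spherical approximation
station_lat = 44.8;          % hypothetical station, degrees north
station_lon = 20.5;          % hypothetical station, degrees east

% Columns: k, io, lon, lat
block = [ 1 0 77.20 43.17
          2 0 59.74 45.38 ];

to_rad = pi/180;
lat_s = station_lat * to_rad;
lon_s = station_lon * to_rad;
lat_c = block(:,4) * to_rad;
lon_c = block(:,3) * to_rad;

% Haversine formula, vectorised over all centres in the block
a = sin( (lat_c - lat_s)/2 ).^2 ...
  + cos( lat_s ) .* cos( lat_c ) .* sin( (lon_c - lon_s)/2 ).^2;
dist_km = 2 * R_km * asin( sqrt( a ) );

% Step 3: keep the single centre closest to the station
[ ~, idx ] = min( dist_km );
closest = block( idx, : );
```

The same lines would work for C2 by reading lon and lat from the last two columns instead of columns 3 and 4.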

My idea was to create one script that would start with a GUI window that has 3 parts:

a checkbox for step 1 (to select strength): because I should be able to check more than 1 option, a checkbox looks like a good tool;
a drop-down menu with these five weather stations (of course, letting me choose only one);
a drop-down menu or checkbox that asks whether I want the minimal distance from C1 or C2.

This is the whole story...

P.S. I am reading Run Code Sections now. As I said, I would love to learn Matlab, but I need these results asap and I don't have time to start from basic tutorials (which I will go through as soon as I find some time for it).

per isakson on 6 Aug 2014

Edited: per isakson on 6 Aug 2014

"A certain term can have more than one anticyclone": what is the definition of "term"? Is this

    k io   lon   lat       f      c    dp ...
1 81.64 46.37 1028.49 -0.220 3.460 ...
10 69.26 46.84 1030.67 -0.111 3.491 ...
0 54.71 50.24 1028.85 -0.339 8.402 ...

one or three "terms" or am I totally mistaken?

A bit scary: The following excerpt is from the lines 182421 through 18245 of 1948_1950.txt.

        dp      rd      zs      up 
460   6.9911133.348   0.000 
491   6.806 451.073   1.089 
402  10.986 147.814  -0.004

This is the fixed format of old-time FORTRAN (I assume). There is no delimiter between the columns; space is just padding. I assumed that the columns were delimited by one or more space characters. Thus, we have a bug.

ASAP: In Swedish we have a saying that roughly means shortcuts often take longer.

Thesis work: Late in the work, when difficult questions are posed (or doubt comes creeping), it is very valuable to be able to rerun all calculations automatically over a weekend.

Data structures: How do you want to store and present the results of steps 2 and 3?

djr on 6 Aug 2014

Term means the hour when anticyclone was detected. 49010100 is a term (so date+hour is a term). For instance, term 49010100 had 3 observations (you'll see in the data base). Basically, our keys are terms. I chose a poor definition I guess, but English is not my first language, although I moved to Canada about 2 years ago.

You're right about the bug. The 'rd' column always has 3 decimal digits so what is after the third digit is 'zs'. And yes, the cyclone/anticyclone tracking scheme that I have used was written in Fortran (actually Fortran 77).
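One way to avoid this bug is to cut the line by position instead of splitting on spaces. The sketch below assumes that the last 11 fields are each 8 characters wide, which fits the example line but is only inferred from it, not taken from the Fortran format. The names w and n_float are illustrative.

```matlab
% The numeric part of the first example line; rd and zs run together.
tail = '   77.20   43.17 1038.47  -0.268   7.442   7.1651644.929   0.000   0.000   75.96   41.45';

w = 8;                        % assumed field width
n_float = numel( tail ) / w;  % 11 fields if the assumption holds

% One field per row, then convert each field on its own.
fields = cellstr( reshape( tail, w, n_float ).' );
vals   = str2double( fields ).';

% For comparison: splitting on whitespace misreads the fused pair.
naive  = sscanf( tail, '%f' ).';
```

With position-based cutting, vals(6) and vals(7) hold rd and zs as separate numbers even though no space separates them in the file.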

The reason I need it soon is because we almost completed a paper and this is the last piece that we need. Therefore, my supervisor is constantly asking me to finish, so that we can submit it as soon as possible.

I think that an ASCII file would be good. Then I can look at it on Windows (I mostly use Linux, and my Matlab is on Ubuntu 12.04). I can send it to my supervisor and co-authors, and I can easily import it into Matlab to perform the regression analysis. Do you think that some other data format would be a better option?

Thank you so much for helping me.

djr on 6 Aug 2014

The Fortran output statements (the last WRITE writes out the values):

The cyclone position file consists of a concatenated series of single
analysis time records of variable length depending on the number of
cyclones found.  Each record contains a header line, a record of control 
parameters and quantity information in namelist form, a row of column 
headers, and one row of numerical data for each cyclone position.
The header line should conform to the format given by statement 240 
below.  The number nnmlc is needed for reading the namelist record,
itabc* are needed for interpreting the tabulation, and da and hr are
needed for time management. 
The Fortran statements are
      namelist /nmlcycdat/quant,level,lunit,source,unit,cunit,area,
     * dmode,rdiff,hilo,feat,iopmxc,istmxc,latmnc,latmxc,lonmnc,lonmxc,
     * nshell,mscrn,sdrmx,drmx1,drmx2,itmx1,itmx2,diflt1,diflt2,iconcv,
     * icendp,cvarad,cmnh,cmnc0,cmnc1,cmnc2,swvmn,dpmn,fccmn,
     * fmxc,frmxc,frcmxc,cmnhw,cmncw,dpmnw,swvmnw,rdincr,nrddir,sphtrg,
     * zsmax,zscr1,zscr2,qsteer,rdustr,npgdir,alatgv,rhoa,upfact
c ...
      write (iunit)                  ! blank line separating headers
      write (iunit,'(a80)' chead     ! header line
      if (lnblnk(chead).eq.0) go to 400
c ...
      write (chead,240,err=440,end=430) da,hr,nnmlc,nnmlcp4,
     * itabc1,itabc2,itabc3,itabc4,itabc5,nk
  240 format (' CENTRES:  ',i6,x,i4,' (NNML=',i2,',',i2,';ITABC=',
     * 2i1,i2,i1,i2,'), ',i3,x,a)
c ...
      write (iunit,'()',end=460)
      write (iunit,'(a100)',end=470) ! namelist record written to file and 
     * (lnmlc(inmlc),inmlc=1,nnmlc)  ! read/written as 80 character strings
c ...
      write (iunit,'(a/)',end=510)   ! tabulation headers
     * tabhead
      do 500 k = 1,nk
        write (iunit,fmt) 
     *   k,                          !                  sequential number
     *   iopc(k),                    !                  cyclone status
     *   (xc(k),yc(k),               !(if itabc2 >= 1)  cyclone position
     *   fc(k),                      ! if itabc3 >= 1   field variable
     *   cc(k),                      ! if itabc3 >= 2   averaged Laplacian
     *   dpc(k),                     ! if itabc3 >= 3   depth  (NOT IN USE)
     *   rdc(k),                     !      " "         radius (NOT IN USE)
     *   zsc(k),                     ! if itabc3 >= 4   togographic height
     *   (upc(k),vpc(k),             ! if itabc4 >= 1   steering velocity
     *   (xv(k),yv(k),               ! if itabc2 >= 2   vorticity centre
     *   (sc(k,isup),isup=1,itabc5)  ! if itabc5 >= 1   supplementary vars.
  500 continue

An ASCII tab-delimited file. The first column should be the date and hour (our keys) and the other columns the outputs of the tracking scheme, except 'k' (column 1 in our data, the sequential number of the observed anticyclone). Instead of the 2 components of the steering velocity (columns 10 and 11), I would prefer the intensity (sqrt(upc^2 + vpc^2)).
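A minimal sketch of that output step, assuming the selected rows are already in column vectors. The values of up and vp are the ones from the first two example lines; the term strings, the io values and the file name are placeholders, and Vs is an illustrative name for the intensity.

```matlab
term = { '49010100' ; '49010100' };   % placeholder keys
io   = [ 0 ; 0 ];
up   = [ 0.000 ; 1.161 ];
vp   = [ 0.000 ; 0.339 ];

Vs = hypot( up, vp );                 % same as sqrt(up.^2 + vp.^2)

out_file = [ tempname '.txt' ];       % throwaway file for the demo
fid = fopen( out_file, 'w' );
fprintf( fid, 'term\tio\tVs\n' );     % header row
for r = 1:numel( Vs )
    fprintf( fid, '%s\t%d\t%.3f\n', term{r}, io(r), Vs(r) );
end
fclose( fid );
```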

Then, when I run our code for the cyclone case, I will get one more ASCII file, and it is easy to merge them. The final file will have the date, acyc data, cyc data, a pressure gradient column and a wind speed column. Wind speed will be my dependent variable and all other columns will be predictors (except the date/hour column, of course).

Does this help?

djr on 6 Aug 2014

Edited: per isakson on 7 Aug 2014

acyc_output.txt

Hi,

I made a template for the output file (attached). The output is for the anticyclone case, and therefore there is an 'a' at the end of each variable name. I used the same variable names as they appear in the dataset (i.e. the tracking scheme output).

I used the following command to create the output file:

T = table (datetime, ioa, fa, ca, dpa, rda, Vsa, dist_pa, dist_va)
writetable(T, 'acyc_output.txt', 'Delimiter', '\t', 'WriteRowNames',true)

A few notes:

dist_pa would be the distance from the pressure center to the weather station (first pair of coordinates, columns 3 and 4)
dist_va would be the distance from the vorticity center to the weather station (columns 12 and 13)
I don't need the first column (named 'k' in the database)
I don't need the 9th column (named 'zs' in the database)
And I don't need lat and lon; instead I need the distances (dist_pa and dist_va) calculated from the lat/lon data.

Hopefully this helped.

P.S.

My e-mail is dj**************om

per isakson on 7 Aug 2014

Yes, this is becoming too specific for this forum. I sent a mail and blurred your e-mail address.

Answer 2

Ahmet Cecen on 5 Aug 2014

https://kr.mathworks.com/matlabcentral/answers/146273-efficient-way-of-reading-cell-arrays#answer_147387

This really doesn't make much sense without context to me. This is an incredibly inefficient data storage configuration. From what I understand, you can access everything you need in 3 for loops. (for the first eval.mat)

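 % evaldata as in the question; a variable named eval would shadow
 % the built-in eval function
 for i=1:14
     currentI=evaldata{i};
     for j=1:size(currentI,1)
         header1=currentI{j,1};          % date string of the term
         currentJ=currentI{j,2};         % cell with the lines of that term
         for k=1:size(currentJ,1)
             currentdata=currentJ{k,1};  % one line of numbers as text
             % PUT YOUR EVALUATION FUNCTION HERE
         end
     end
 end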

With this, at the evaluation step you have the following information: '0101194900' 1 0 77.20 43.17 1038.47 -0.268 7.442 7.1651644.929 0.000 0.000 75.96 41.45
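As one possible evaluation step, the sketch below parses every line of a term with cellfun and keeps only the rows whose status is wanted. It runs on a made-up miniature of the nested layout (shortened lines, two terms); wanted and rows are hypothetical names.

```matlab
% Miniature of the layout: each block row is {header, {line; line; ...}}
evaldata = { { '0101194900', { '1 0 77.20 43.17' ; '2 10 59.74 45.38' } ; ...
               '0101194906', { '1 11 44.15 40.01' } } };

wanted = [ 0 10 ];      % statuses to keep
rows   = {};
for j = 1:numel( evaldata )
    block = evaldata{j};
    for k = 1:size( block, 1 )
        lines = block{k,2};
        if isempty( lines ), continue; end
        % Parse all lines of this term at once, one numeric row per line
        nums = cellfun( @(s) sscanf( s, '%f' ).', lines, 'UniformOutput', false );
        M    = vertcat( nums{:} );
        keep = ismember( M(:,2), wanted );   % status is the second number
        rows{end+1} = struct( 'term', block{k,1}, 'data', M(keep,:) ); %#ok<AGROW>
    end
end
```

This replaces the innermost loop with one cellfun call per term; the two outer loops stay, which is usually easier to read than nesting cellfun three levels deep. Space-separated parsing with sscanf only works while no two columns run together.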
