이 질문을 팔로우합니다.
- 팔로우하는 게시물 피드에서 업데이트를 확인할 수 있습니다.
- 정보 수신 기본 설정에 따라 이메일을 받을 수 있습니다.
I'm trying to organize data so it can easily be averaged by date
조회 수: 1 (최근 30일)
이전 댓글 표시
I have a cell of data where the first column is the full date. I'm being asked to organize it in a matrix so that if something like H(1,2,:), it would return all the data from the second day of January. The dates are in the format yyyymmddHHMM, if that helps. I'm kind of just looking for guidance on how to achieve something like this. Any help is appreciated
댓글 수: 2
Susan Santiago
2018년 10월 13일
I uploaded my workspace because there are many files and they're all in .dat which can't be uploaded here. And I think that is probably more clear anyway. The cell I'm concerned with is named C. Each row of C represents on data file.
채택된 답변
jonas
2018년 10월 13일
편집: jonas
2018년 10월 13일
"...something like H(1,2,:), it would return all the data from the second day of January."
Not very good in my opinion. How do you deal with the fact that different months have different number of days? By padding with NaNs?
It is much easier to put all your data in a timetable. You can then easily access specific days.
t = datetime(2000,1,1):days(1):datetime(2001,1,1);
TT = timetable(t,zeros(length(t),1))
You want to access data for a specific date? Easy:
TT('2001-1-1',:)
ans =
timetable
Time Var1
___________ ____
01-Jan-2001 0
댓글 수: 32
Susan Santiago
2018년 10월 13일
My professor specifically asked that it be done the way I specified so if possible, I'd like to do it in that way
jonas
2018년 10월 13일
편집: jonas
2018년 10월 13일
Perhaps your professor has never heard about timetables. It is particularily awkward to put your data in the third dimension, making it much harder to access. Also, you have like 100 variables, how do you want to keep track of them?
What does these variables even mean? "TimeStampstart" "TimeStampEnd". Can you clean up your workspace and attacha new .mat file with explanations?
Susan Santiago
2018년 10월 13일
It's the date, as I mentioned in my original question, it's the first column that I'm using for the time. I've never used timetables before but the main purpose for this is pretty much going to be for plotting like monthly averages for the different variables. Is timetable a good idea for that? And if so, any tips for making this cell into that format?
jonas
2018년 10월 13일
Yes, timetable is ideal for that. Much better than what your prof. suggested. I'll help you with a nice timetable and some example plots, then you can show that to your professor. I'm sure he will appreciate you finding a better way to solve the problem, or at least he should.
jonas
2018년 10월 13일
편집: jonas
2018년 10월 13일
% Convert time to datetime
t = datetime(num2str(FluxAmeriFluxFormat3{:,1}),'inputformat','yyyyMMddHHmm');
% Put data in timetable
TT = timetable(t,FluxAmeriFluxFormat3(:,3:end))
TT = splitvars(TT);
% Remove columns with categorical data
TT(:,varfun(@iscategorical,TT,'outputformat','uniform')) = [];
% Calculate monthly values
TTmonthly = retime(TT,'monthly','mean')
This gives you the monthly average for all variables. You only gave me one month worth of values, so I the output is just a single row. You can do lots of things with a timetable, interpolation and resampling is just one thing. It's also extremely easy to access the data, because the columns can be called by their variable names. For example
TT.CO2
will give you all values of CO2 and
TT('28-Aug-2018 00:30:00',:)
gives you all variables at that point in time. You want to plot all variables?
plot(TT.t,TT.Variables)
easy as that... and your data is already in datetime format so no need to format the axes.
Susan Santiago
2018년 10월 13일
This is great but I wanted to do it for all the data, not just from one file. Any way to change this so it works for all the data in cell C? That's where all the data is. In the cells of the first column
Susan Santiago
2018년 10월 13일
Also is there a way to put out the data like in the example I gave in my original question? Like what would be the indexing be if I wanted to just output all the data from one specific day?
jonas
2018년 10월 13일
편집: jonas
2018년 10월 13일
"Also is there a way to put out the data like in the example I gave in my original question?"
I'm not going to code that, because it is (in my opinion) a poor solution to a simple problem.
"Like what would be the indexing be if I wanted to just output all the data from one specific day?"
That would be simple with a timetable, and extremely annoying with your professors methods, the reason being that the time is not given in days, but in 30-minute intervals. In datetime you can calculate the daily average in a second and access a specific date.
"This is great but I wanted to do it for all the data, not just from one file. Any way to change this so it works for all the data in cell C? That's where all the data is. In the cells of the first column"
That is easy. However, there is one problem, apart from the data being organized in a really awkward way. It is impossible to concatenate the cells to a single array because there is missing data. Take a look at this:
C=
41×3 cell array
{ 71×14 double} { 71×29 double} { 71×8 double}
{332×14 double} {332×29 double} {332×8 double}
{ 59×14 double} { 59×29 double} { 58×8 double}
↑↑ ↑↑ ↑↑
The cell at position (3;3) is missing one value. From this cell array, it is impossible to determine which value that is missing (i.e. the time that no value was recorded). How do you want to deal with the missing data?
After concatenating all cells vertically, I obtained the following sizes: 7892x14, 7880x29 and 7879x8. That means that 12 values are missing from the second set of columns and 13 values from the third set of columns, with no way of knowing the exact row where those values are missing.
For the reason mentioned above, it is simply impossible to concatenate a full array/table/whatever, because of missing values. If you were to solve that somehow, then you would do the following to create a timetable:
idc = cellfun(@(x)strcmp(class(x),'cell'),C(1,:))
C(:,idc) = [];
Ca=vertcat(C{:,1});
% Convert time to datetime
t = datetime(num2str(Ca(:,1)),'inputformat','yyyyMMddHHmm');
% Put data in timetable
TT = timetable(t,Ca)
TT = splitvars(TT);
% Calculate monthly values
TTmonthly = retime(TT,'monthly','mean')
I have only used the 14 first variables here, where there is no missing data.
jonas
2018년 10월 13일
편집: jonas
2018년 10월 13일
One hint I can give you is that you will almost get the same number of rows on all variables if you delete each row with NaNs. I think the rows only differ by 1-2 in the end if you assume all rows with NaN should be deleted.
"yes, I do have raw data. I didn't notice this before but there's some data missing in the files, I don't know I think I should just do it the way my professor wanted."
Good luck with that. I guarantee that you will run into the exact same problem with missing data. An array must be be rectangular/cubic, it cannot have holes, just like a table.
Susan Santiago
2018년 10월 13일
There are some pieces of data that are missing and show up as NaN when I import the data but I guess the NaN isn't transferred over when it's converted to a cell. I don't know, I think it might just be better to just try to do it the way my professor wants it
jonas
2018년 10월 13일
편집: jonas
2018년 10월 13일
Yes I noticed that, as I wrote in the above comment. However, even if you remove all rows that are full of nans, the dimensions do not line up. Note that doing it "the way your professor wants it" is not going to solve the problem with missing data. I have given you a functional code. I guarantee that no one else will be able to solve this problem in another way, because there is simply missing information.
By the way, do you have a raw file like a .txt or .xls? I bet I would solve this if I could import the data myself.
Susan Santiago
2018년 10월 14일
This is the first three files. I can add more if you want but I just didn't want it to be overwhelming I guess. Thank you for continuing to try to help me figure this out. I'm kind of freaking out over this haha
jonas
2018년 10월 14일
편집: jonas
2018년 10월 14일
I will stick around until its resolved, if you want to use this approach. However, even if you want to revert to your other approach later, it would probably be wise to structure the data in a timetable first, so that you can calculate daily averages. I suggest importing the files with readtable, because you get everything in a single variable instead of a cell array.
Anyway, those files you gave me all have 55 columns, so they are easy to deal with. Do you have some more complicated files? In the first .mat file, there were multiple cells with different number of columns. Of course, this could also be due my import options... Are all files expected to have the same number of columns?
Will get some sleep now, but will check out this thread tomorrow.
Susan Santiago
2018년 10월 14일
I have 42 files in total. They all have the same number of columns. The cell C have 5 columns but each of the cells in those columns have the same number of columns in them and they should all add up to 55 as well. They're separated because I imported them using collectoutput and some of the columns are full of NaNs. I hope this makes sense. This is the code I used to create the cell if it helps.
a = dir;
i=3;
n = 1;
while i<=43
d = a(i).name;
A = textscan(fopen(d),'%f %f %f %f %f %f %f %f %f %f %f %f %f %f %s %s %f %f %f %f %f %f %f %f %f %f %f %f %f %f %f %f %f %f %f %f %f %f %f %f %f %f %f %f %f %s %s %f %f %f %f %f %f %f %f', 'Delimiter',',','Headerlines',1,'CollectOutput',true);
C(n,:) = A(1,:);
n = n+1;
i=i+1;
fclose('all');
end
jonas
2018년 10월 14일
편집: jonas
2018년 10월 14일
This is much better, no annoying missing data. I hope it works for your MATLAB release. Otherwise we have to adjust the code (readtable is constantly updating).
% Write variable names and number of files
VarNames = sprintfc('Var%g',1:55);
FileID = dir('*.dat')
% Write files and store in table
T = table;
for j=1:length(FileID)
% Read file
opts = detectImportOptions(FileID(j).name);
Ts = readtable(FileID(j).name,opts,'ReadVariableNames',false);
% Set variable names
Ts.Properties.VariableNames=VarNames;
% Remove columns full of 'NAN'
idc = varfun(@(x)strcmp(class(x),'double'),Ts,'outputformat','uniform');
Ts(:,~idc)=[];
% Concatenate, (preallocate for speed)
T = [T;Ts]
end
% Convert time to datetime
t = datetime(num2str(T{:,1}),'inputformat','yyyyMMddHHmm');
% Put data in timetable
TT = timetable(t,T(:,2:end));
TT = splitvars(TT);
TT = sortrows(TT);
% Calculate monthly values
TTmonthly = retime(TT,'monthly','mean');
Susan Santiago
2018년 10월 14일
편집: Susan Santiago
2018년 10월 14일
it's not working, I think it might be because I only sent you the first three files that you logically assumed all the files were named the same thing but that's actually not true, I should have made that clear, my mistake. That was why I used dir. They're named mostly the same thing but the last number changes from 0 up to 7. I should add that it works fine for the first three files but then it stops when it tries to get to _0(4) since that file doesn't exist.
jonas
2018년 10월 14일
편집: jonas
2018년 10월 14일
Oh ok, its not too hard to fix that, you already did it when you made your first import. This should do it for you, replacing the first part of the script:
% Write variable names and number of files
VarNames = sprintfc('Var%g',1:55);
FileID = dir('*.dat')
% Write files and store in table
T = table;
for j=1:length(FileID)
% Read file
opts = detectImportOptions(FileID(j).name);
Ts = readtable(FileID(j).name,opts,'ReadVariableNames',false);
...
This is assuming all .dat files in your current dir are files that you want to import. I've updated the previous post. I'd be interested to know if you manage to compile all of the files. If not, then you could send me some of the more problematic files.
Susan Santiago
2018년 10월 14일
I'm attaching all the files just to help you get a better idea of what's going on. I think there's still an issue. I don't know why but all the dates say April in the TT which definitely shouldn't be the case. Thanks so much
jonas
2018년 10월 14일
편집: jonas
2018년 10월 14일
Ops, my bad. I've updated the code. It seems to be working now. Finally! :)

Just add the appropriate variable names and you should be golden. Now, if you really want to revert to the "other format", then I'd first calculate the daily average.
TTdaily = retime(TT,'daily','mean');
then it should be fairly simple to reshape the table to an array. But as I've said a few times, in my honest opinion this timetable format is much much superior as you retain information about variable names and it's much easier to perform calculations and plot in timetable format.
Susan Santiago
2018년 10월 14일
편집: Susan Santiago
2018년 10월 14일
Thank you so so much! Just a few more things. What is splitvars doing? when I tied running the code, it said it was undefined? I'm using Matlab R2017b idk if it's different for that version? Also could you just help me out with indexing a little bit like what would I write if I wanted to plot the average value for one variable over a month, in general? It seems like the dates are still out of order? I see that sortrows is used so idk why that is but when I run it, the dates are all over the place
jonas
2018년 10월 14일
편집: jonas
2018년 10월 14일
Splitvars was introduced in 2018a. When a table is used as input to timetable it causes all variables to be concatenated in the same column. Splitvars simply splits the variables in separate columns. Sortrows is much older and should sort the table by dates in ascending order.. I will show you some indexing when your table is complete!
See attached m-file for new code. I have used table2timetable, introduced in 2016b, instead of splitvars. The variable TT should contain a sorted table having about 10k rows.
For monthly values you simple calculate the monthly average,
TTmonthly = retime(TT,'monthly','mean');
and then plot your desired variable
plot(TTmonthly.Properties.Rowtimes,TTmonthly.VariableName)
just replace "VariableName" with the actual variable names. You can set your own variable names by
TT.Properties.VariableNames = {var1,var2,var3...var51}
I opted to not import the variable names directly, because only some files had variable names as header.
Susan Santiago
2018년 10월 14일
so I renamed the variables by their actual names but in the line "TT = table2timetable(T,'RowTimes',T.Var1);" I don't know what to change the T.Var1 to because it says it doesn't recognize the variable. And I know you've said this isn't the best way to do it but could you show me how to turn this into the array because while this is great I don't think it's really what my professor was looking for. I really appreciate all the help
jonas
2018년 10월 14일
편집: jonas
2018년 10월 14일
What goes here
TT = table2timetable(T,'RowTimes',T.Var1)
↑
is the variable name of the column where the times are stored. Look at your table and replace the name with whatever it is called (probably t or time or something else you have named it). You can print the first row of your table and look what its called. Here is mine:
>> T(1,1)
ans =
table
Var1
____________________
02-Apr-2018 11:30:00
See that it says Var1 on top of the column? This is the name of the column.
Well, I will give it a try after you have:
- produced the correct table (becauase you are going to need it for the next step) and
- accepted the answer (because quite honestly the question was answered yesterday and now I'm just teaching you basic syntax).
Susan Santiago
2018년 10월 14일
sorry, I completely forgot about accepting the answer. I renamed it T.TIMESTAMP_START since that's what I called the variable and I received this error "Error using table2timetable (line 74) Row times must be specified as variable name or index, or as datetime or duration vector."
jonas
2018년 10월 14일
편집: jonas
2018년 10월 14일
You cannot name it "T.TIMESTAMP_START". Any column name must be a valid variable name and cannot contain the character ".". The name is actually "TIMESTAMP_START" without the T. When you have finally figured out how to make the table, then add these lines of code to get your weirdo matrix.
TTdaily = retime(TT,'daily','mean');
A = nan(12,365,50);
d = day(TTdaily.Time);
m = month(TTdaily.Time)
for i=1:numel(m);
A(m(i),d(i),:) = TTdaily{i,2:end};
end
A reference to
A(9,1,:)
gives you all variables (daily averages) at the 1st of september. This code will only work if all data is obtained within a single year. However, if the data is spanning several years, then this indexing method is borderline lunatic.
I will reply a bit less frequently now. I believe you have all the tools necessary to solve this on your own, so try yourself and ask if it doesn't work out, but don't expect instant reply.
Good luck!
PS: I've attached a .mat file with the organized data, so you have something to deliver if you cannot figure out how to generate it on your own machine ;)
Susan Santiago
2018년 10월 14일
I should have been more clear, I named the variable "TIMESTAMP_START" so I put in TT = table2timetable(T,'RowTimes',T.TIMESTAMP_START) and that was what I had put in when I got the error that I reproduced here. I just want to be clear that I 100% understand why you prefer the table and I think it's a lot more clear but I just don't think it was what was supposed to do. You have really went above and beyond helping me with this problem and I am very grateful
jonas
2018년 10월 14일
편집: jonas
2018년 10월 14일
Weird, did you double check that the variable name is correct? Just type T(1,1) after Matlab crash. Paste the output here and Ill check tomorrow.
Thats fine, if you have to delivery that then thats what you should do. Im just trying to teach good coding in general. I prefer to keep the timrtable in the main answer because others who find this thread in the future are probably better off opting for that approach.
Susan Santiago
2018년 10월 15일
편집: Susan Santiago
2018년 10월 15일
Thanks! One last thing, is there any way to change my weirdo code so it's not just giving a daily average but all the results from the day? One of the main uses with this matrix is gonna be plotting the data. Thanks again. And if you don't mind, how would I get just one variable from matrix?
jonas
2018년 10월 15일
편집: jonas
2018년 10월 15일
1. Yes and no. You have data every half hour if I remember correctly. You could make a fourth dimension of the matrix, and enter the "hour of day". Still, it would not work because you have half hours. You could make a fifth dimension called "minute of day"... you realize how absurd this method is becoming, especially since most 98% of minutes would be NaNs.
2. All variables are stored in the third dimension:
A(1,1,5)
outputs the "fifth" variable form the first of January. What is the fifth variable? You would have to compare with some kind of table every time you want to extract one variable.
I fully understand that you want to comply with your professors instructions. However, if you show him these two methods and explain the advantages of using tables (indexing by variable names, options for interpolation, easier access to specific dates, possibility of storing different classes, easier to plot as well as a variety of table-specific options that we have not even talked about) he/she would be crazy stubborn to opt for the array.
추가 답변 (1개)
Peter Perkins
2018년 10월 17일
"return all the data from the second day of January"
Imagine having this timetable:
>> tt = array2timetable(rand(100,2),'RowTimes',datetime(2018,1,1,0:8:792,0,0));
>> head(tt)
ans =
8×2 timetable
Time Var1 Var2
____________________ _______ ________
01-Jan-2018 00:00:00 0.85071 0.55903
01-Jan-2018 08:00:00 0.56056 0.8541
01-Jan-2018 16:00:00 0.92961 0.34788
02-Jan-2018 00:00:00 0.69667 0.44603
02-Jan-2018 08:00:00 0.58279 0.054239
02-Jan-2018 16:00:00 0.8154 0.17711
03-Jan-2018 00:00:00 0.87901 0.66281
03-Jan-2018 08:00:00 0.98891 0.33083
In recent versions of MATLAB (R2018a and later IIRC), you can do this:
>> tt(timerange('01-Jan-2018','day'),:)
ans =
3×2 timetable
Time Var1 Var2
____________________ _______ _______
01-Jan-2018 00:00:00 0.85071 0.55903
01-Jan-2018 08:00:00 0.56056 0.8541
01-Jan-2018 16:00:00 0.92961 0.34788
>> tt(timerange(datetime(2018,1,1),'day'),:)
ans =
3×2 timetable
Time Var1 Var2
____________________ _______ _______
01-Jan-2018 00:00:00 0.85071 0.55903
01-Jan-2018 08:00:00 0.56056 0.8541
01-Jan-2018 16:00:00 0.92961 0.34788
In earlier versions, you can do the same thing, with a bit more typing:
>> tt(timerange(datetime(2018,1,1),datetime(2018,1,2)),:)
ans =
3×2 timetable
Time Var1 Var2
____________________ _______ _______
01-Jan-2018 00:00:00 0.85071 0.55903
01-Jan-2018 08:00:00 0.56056 0.8541
01-Jan-2018 16:00:00 0.92961 0.34788
참고 항목
카테고리
Help Center 및 File Exchange에서 Tables에 대해 자세히 알아보기
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!오류 발생
페이지가 변경되었기 때문에 동작을 완료할 수 없습니다. 업데이트된 상태를 보려면 페이지를 다시 불러오십시오.
웹사이트 선택
번역된 콘텐츠를 보고 지역별 이벤트와 혜택을 살펴보려면 웹사이트를 선택하십시오. 현재 계신 지역에 따라 다음 웹사이트를 권장합니다:
또한 다음 목록에서 웹사이트를 선택하실 수도 있습니다.
사이트 성능 최적화 방법
최고의 사이트 성능을 위해 중국 사이트(중국어 또는 영어)를 선택하십시오. 현재 계신 지역에서는 다른 국가의 MathWorks 사이트 방문이 최적화되지 않았습니다.
미주
- América Latina (Español)
- Canada (English)
- United States (English)
유럽
- Belgium (English)
- Denmark (English)
- Deutschland (Deutsch)
- España (Español)
- Finland (English)
- France (Français)
- Ireland (English)
- Italia (Italiano)
- Luxembourg (English)
- Netherlands (English)
- Norway (English)
- Österreich (Deutsch)
- Portugal (English)
- Sweden (English)
- Switzerland
- United Kingdom(English)
아시아 태평양
- Australia (English)
- India (English)
- New Zealand (English)
- 中国
- 日本Japanese (日本語)
- 한국Korean (한국어)
