Find missing days in a date vector matlab

Meriem Deli on 4 Nov 2016
Commented: Peter Perkins on 10 Nov 2016
Hello, I have a MAT-file that contains dates, for example for the year 1975, in order but with many repetitions (daily temperature observations at different stations). Some days are missing, so I need to detect those days, insert the missing dates, and put NaN for the temperature at those indices. To do this I generated a date vector with datestr but could not compare the two. The temperature MAT-file is attached. Could someone help me please? Thanks.

Accepted Answer

Guillaume on 4 Nov 2016
If I understood correctly this would work:
origdata = load('Temp1975.mat');
origdata = table(origdata.Jour', origdata.Stations', origdata.T', 'VariableNames', {'Jour', 'Station', 'Temperature'}); %put everything in one variable for easier processing
stations = unique(origdata.Station); %get all the stations identifiers
alldays = datetime(1975,1,1) : datetime(1975,12,31); %generate all dates for 1975
alldays.Format = 'yyyyMMdd'; %to match your format
alldays = cellstr(char(alldays)); %convert to cell array of chars
[daysidx, stations] = ndgrid(1:numel(alldays), stations); %generate all combination of days and stations
fulldata = table(alldays(daysidx(:)), stations(:), NaN(numel(daysidx), 1), 'VariableNames', {'Jour', 'Station', 'Temperature'}); %Create full table prefilled with NaN for temperature
[isinorig, rowidx] = ismember(fulldata(:, [1 2]), origdata(:, [1 2]), 'rows'); %find which dates/station combination are present in original data
fulldata(isinorig, 3) = origdata(rowidx(isinorig), 3) %and copy that overwriting the relevant nans
I've kept your string format for the dates but I recommend you switch to using datetime, i.e.:
fulldata.Jour = datetime(fulldata.Jour, 'InputFormat', 'yyyyMMdd')
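As a minimal sketch of the same idea on made-up data, the grid-and-ismember approach can be tried with datetime values directly. The station numbers 101 and 202 and the temperatures below are hypothetical, not taken from Temp1975.mat:

```matlab
% Hypothetical toy data: two stations, three days, one observation missing
% (station 202 has no record for 2 Jan 1975).
Jour        = datetime(1975, 1, [1 2 3 1 3])';
Station     = [101; 101; 101; 202; 202];
Temperature = [51.8; 57.2; 59.0; 55.4; 60.8];
origdata    = table(Jour, Station, Temperature);

% Every day in the period and every station identifier
alldays  = (datetime(1975,1,1) : datetime(1975,1,3))';
stations = unique(origdata.Station);

% All day/station combinations, prefilled with NaN
[dayidx, stationidx] = ndgrid(1:numel(alldays), 1:numel(stations));
fulldata = table(alldays(dayidx(:)), stations(stationidx(:)), ...
    NaN(numel(dayidx), 1), ...
    'VariableNames', {'Jour', 'Station', 'Temperature'});

% Copy over the temperatures for the combinations that exist in the original
keys = {'Jour', 'Station'};
[isinorig, rowidx] = ismember(fulldata(:, keys), origdata(:, keys));
fulldata.Temperature(isinorig) = origdata.Temperature(rowidx(isinorig));
```

The row for station 202 on 2 Jan 1975 keeps its NaN, and every other row receives the original value.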
5 Comments
Guillaume on 7 Nov 2016
Never use length. It returns the size of the largest dimension, so if your char array has fewer than 8 rows (what you call strings), length returns the number of columns, 8, not the number of strings. To get the number of rows, use size(alldays, 1), which is guaranteed to work regardless of the number of rows and columns.
In my answer alldays is not a char array but a cell array column vector: I converted the 365x8 char array into a 365x1 cell array (of 1x8 char arrays) using cellstr.
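The difference is easy to check on a small char array. This is only an illustrative sketch; the three dates are made up:

```matlab
% A 3x8 char array: three "strings" of eight characters each
dates = ['19750101'; '19750102'; '19750103'];

nLength = length(dates);    % largest dimension, here the 8 columns
nRows   = size(dates, 1);   % number of rows, i.e. number of dates

% cellstr turns each row into its own cell, so numel counts the dates
datesCell = cellstr(dates);
nCells    = numel(datesCell);
```

Here nLength is 8 while nRows and nCells are both 3.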
Meriem Deli on 8 Nov 2016
Thank you Guillaume, I got the table without any missing day.


More Answers (1)

Peter Perkins on 7 Nov 2016
Meriem, if you're using MATLAB R2014b or later, you might find that using tables and datetimes makes this pretty simple:
>> load Temp1975.mat
>> Jour = datetime(Jour','inputformat','yyyyMMdd','format','defaultdate');
>> Stations = categorical(Stations',unique(Stations'),strcat({'S'},num2str(unique(Stations'))));
>> T = T';
>> t = table(Jour,Stations,T)
t =
    Jour           Stations    T
    ___________    ________    ____
    01-Jan-1975    S164700     51.8
    02-Jan-1975    S164700     57.2
    03-Jan-1975    S164700     59
    06-Jan-1975    S164700     55.4
    07-Jan-1975    S164700     55.4
[snip]
You have, in effect, 19 time series, one for each station, all rolled up into one. You want them all to have the same time vector, in effect a time series with 19 measurements on each day. unstack will do that:
>> tw = unstack(t,'T','Stations');
>> tw = sortrows(tw,'Jour');
Jour S164700 S603600 S604750 S607100 S607140 S607150 S607200 S607250 S607350 S607400 S607450 S607500 S607600 S607650 S607690 S607750 S620020 S620070 S621030
___________ _______ _______ _______ _______ _______ _______ _______ _______ _______ _______ _______ _______ _______ _______ _______ _______ _______ _______ _______
01-Jan-1975 51.8 57.2 NaN NaN 57.2 55.4 55.4 51.8 55.4 55.4 53.6 57.2 62.6 60.8 55.4 51.8 NaN NaN NaN
02-Jan-1975 57.2 59 51.8 NaN 62.6 60.8 59 59 60.8 59 59 60.8 60.8 60.8 57.2 53.6 48.2 NaN 53.6
03-Jan-1975 59 62.6 53.6 59 60.8 62.6 59 60.8 60.8 60.8 57.2 62.6 60.8 62.6 59 57.2 51.8 NaN NaN
04-Jan-1975 NaN 59 48.2 59 59 57.2 57.2 57.2 55.4 55.4 57.2 55.4 64.4 59 57.2 57.2 53.6 59 60.8
05-Jan-1975 NaN 60.8 46.4 59 59 57.2 57.2 59 57.2 57.2 53.6 59 59 57.2 55.4 53.6 48.2 59 57.2
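To see what unstack does on a small scale, here is a sketch with made-up data. The two station names S1 and S2 and the temperatures are hypothetical; station S2 has no observation on 1 Jan:

```matlab
% Hypothetical long-format data: one row per (day, station) observation
Jour     = [datetime(1975,1,1); datetime(1975,1,2); datetime(1975,1,2)];
Stations = categorical({'S1'; 'S1'; 'S2'});
T        = [51.8; 57.2; 59];
t        = table(Jour, Stations, T);

% One column per station; day/station pairs with no observation become NaN
tw = unstack(t, 'T', 'Stations');
tw = sortrows(tw, 'Jour');
```

The result has one row per day and the variables Jour, S1 and S2, with NaN in S2 for 1 Jan.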
It turns out you're missing one day at all stations:
>> missingDay = setdiff(min(tw.Jour):days(1):max(tw.Jour),tw.Jour)
missingDay =
datetime
31-May-1975
>> tw{end+1,'Jour'} = missingDay;
>> tw{end,2:end} = NaN;
>> tw = sortrows(tw,'Jour');
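The three lines above rely on exactly one day being missing. If several days were missing, one possible variation (a sketch on made-up data, with hypothetical station columns S1 and S2) is to build a table of NaN filler rows and append it in one step:

```matlab
% Hypothetical wide table with two days missing (31 May and 1 Jun)
Jour = [datetime(1975,5,29); datetime(1975,5,30); datetime(1975,6,2)];
S1   = [70.1; 71.2; 73.4];
S2   = [68.0; NaN;  72.5];
tw   = table(Jour, S1, S2);

% Days between the first and last observation that never appear in tw
allDays     = (min(tw.Jour) : days(1) : max(tw.Jour))';
missingDays = setdiff(allDays, tw.Jour);

% One all-NaN row per missing day, with the same variable names as tw
filler = [table(missingDays, 'VariableNames', {'Jour'}), ...
          array2table(NaN(numel(missingDays), width(tw) - 1), ...
                      'VariableNames', tw.Properties.VariableNames(2:end))];

tw = sortrows([tw; filler], 'Jour');
```

After this, tw has one row per calendar day, and the inserted rows hold NaN for every station.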
There are lots of other ways to do this, but unstack makes it pretty simple. If you have access to R2016b, the latest release, you could use timetables and retime, like this:
>> tt = timetable(Jour,Stations,T);
>> ttw = unstack(tt,'T','Stations')
Jour S164700 S603600 S604750 S607100 S607140 S607150 S607200 S607250 S607350 S607400 S607450 S607500 S607600 S607650 S607690 S607750 S620020 S620070 S621030
___________ _______ _______ _______ _______ _______ _______ _______ _______ _______ _______ _______ _______ _______ _______ _______ _______ _______ _______ _______
01-Jan-1975 51.8 57.2 NaN NaN 57.2 55.4 55.4 51.8 55.4 55.4 53.6 57.2 62.6 60.8 55.4 51.8 NaN NaN NaN
02-Jan-1975 57.2 59 51.8 NaN 62.6 60.8 59 59 60.8 59 59 60.8 60.8 60.8 57.2 53.6 48.2 NaN 53.6
03-Jan-1975 59 62.6 53.6 59 60.8 62.6 59 60.8 60.8 60.8 57.2 62.6 60.8 62.6 59 57.2 51.8 NaN NaN
06-Jan-1975 55.4 59 50 57.2 59 59 59 59 60.8 57.2 55.4 59 59 57.2 55.4 55.4 NaN NaN NaN
07-Jan-1975 55.4 60.8 NaN 57.2 57.2 59 57.2 57.2 60.8 59 53.6 60.8 59 59 57.2 55.4 50 48.2 59
[snip]
>> ttw = retime(ttw,min(ttw.Jour):days(1):max(ttw.Jour),'fillwithmissing')
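The retime step can be tried on a small synthetic timetable. This is a sketch that assumes R2016b or later; the station column S1 and its values are made up, and the row times are read through Properties.RowTimes so the code does not depend on the name of the time dimension:

```matlab
% Hypothetical timetable with 31 May missing
Jour = [datetime(1975,5,29); datetime(1975,5,30); datetime(1975,6,1)];
S1   = [70.1; 71.2; 73.4];
ttw  = timetable(Jour, S1);

% Regular daily time vector spanning the observed period
rowTimes = ttw.Properties.RowTimes;
newTimes = (min(rowTimes) : days(1) : max(rowTimes))';

% Rows for days with no data are created and filled with NaN
ttw = retime(ttw, newTimes, 'fillwithmissing');
```

The result has four rows, and the row for 31 May holds NaN.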
Hope this helps.
3 Comments
Guillaume on 8 Nov 2016
Probably the best thing would be to stack the table back:
stack(tw, 2:width(tw), 'NewDataVariableName', 'Temperature', ...
'IndexVariableName', 'Station')
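A small sketch of stacking a wide table back into long form, on made-up data with hypothetical station columns S1 and S2, followed by selecting one station with a logical condition:

```matlab
% Hypothetical wide table: one row per day, one column per station
Jour = [datetime(1975,1,1); datetime(1975,1,2)];
S1   = [51.8; 57.2];
S2   = [NaN;  59];
tw   = table(Jour, S1, S2);

% Columns 2:end are the station columns; Jour is repeated for each of them
long = stack(tw, 2:width(tw), 'NewDataVariableName', 'Temperature', ...
    'IndexVariableName', 'Station');

% Station is categorical, so a station can be selected by name
tempS1 = long.Temperature(long.Station == 'S1');
```

The index range 2:width(tw) fits a table whose first variable is the date. In a timetable the row times are not counted as a variable, so the range would need to start at 1 there.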
Peter Perkins on 10 Nov 2016
If the table is unstacked as in my example, you've already got all the data for each station separated out. Just access that variable. I used a categorical and named the categories things like S164700, so to get the data for that station out of the unstacked table, it's just
tw.S164700
and that returns a numeric column vector of temperatures.
Unstacked may or may not be the way you want the data organized. As Guillaume showed, you could stack to get back to one series, and select the data for each station using a logical condition on the Station variable. It all depends on what you are doing with the data.
