How I can eliminate days with incomplete hours?
조회 수: 1 (최근 30일)
이전 댓글 표시
Hi everyone. I can't find a solution to my issue in the web. I have hourly averages data from a meteorological station. However there are some missing hours due to the lack of measurements. In this cases, I would like to remove the complete day, in order to keep only the days in which there are measurements in a whole day (24 data per day). Look at this example; The meteorological station is giving the next output, in which columns 1,2,3 and 4 corresponds to year, month, day and hour, respectively, and column 5 correspond to the measurement:
2010 1 1 0 12.3
2010 1 1 1 12.4
2010 1 1 2 3
2010 1 1 3 34.7
2010 1 1 4 54
2010 1 1 5 565
2010 1 1 6 2
2010 1 1 7 334
2010 1 1 8 23
2010 1 1 9 223
2010 1 1 10 32
2010 1 1 11 233
2010 1 1 12 544
2010 1 1 13 23
2010 1 1 16 12
2010 1 1 17 12
2010 1 1 18 3
2010 1 1 19 34
2010 1 1 20 54
2010 1 1 21 565
2010 1 1 22 2
2010 1 1 23 334
2010 1 2 0 23
2010 1 2 1 223
2010 1 2 2 32
2010 1 2 3 233
2010 1 2 4 544
2010 1 2 5 23
2010 1 2 6 12
2010 1 2 7 12
2010 1 2 8 3
2010 1 2 9 34
2010 1 2 10 54
2010 1 2 11 565
2010 1 2 12 2
2010 1 2 13 334
2010 1 2 14 23
2010 1 2 15 223
2010 1 2 16 32
2010 1 2 17 233
2010 1 2 18 544
2010 1 2 19 12
2010 1 2 20 12
2010 1 2 21 3
2010 1 2 22 34
2010 1 2 23 54
2010 1 3 0 565
2010 1 3 1 2
2010 1 3 2 334
2010 1 3 3 12
2010 1 3 5 12
2010 1 3 6 3
2010 1 3 7 34
2010 1 3 8 54
2010 1 3 9 565
2010 1 3 10 2
2010 1 3 11 334
2010 1 3 12 23
2010 1 3 13 223
2010 1 3 14 32
2010 1 3 15 233
2010 1 3 16 544
2010 1 3 17 23
2010 1 3 18 12
2010 1 3 19 12
2010 1 3 23 3
2010 1 4 0 34
2010 1 4 1 54
2010 1 4 2 565
2010 1 4 3 2.2
2010 1 4 4 334
2010 1 4 5 23
2010 1 4 6 223
2010 1 4 7 32
2010 1 4 8 233
2010 1 4 9 544
2010 1 4 10 12
2010 1 4 11 12
2010 1 4 12 3
2010 1 4 13 34
2010 1 4 14 54.7
2010 1 4 15 565
2010 1 4 16 2
2010 1 4 17 334
2010 1 4 18 23
2010 1 4 19 223
2010 1 4 20 32
2010 1 4 21 233
2010 1 4 22 544.6
2010 1 4 23 344.4
In the table of above, there were some missing hours; in day 1: hour 14 and hour 15; in day 3: hour 4, hour 20, hour 21 and hour 22. For that reason, is necessary don't take into account day 1 and day 3, and generate the next output:
2010 1 2 0 23
2010 1 2 1 223
2010 1 2 2 32
2010 1 2 3 233
2010 1 2 4 544
2010 1 2 5 23
2010 1 2 6 12
2010 1 2 7 12
2010 1 2 8 3
2010 1 2 9 34
2010 1 2 10 54
2010 1 2 11 565
2010 1 2 12 2
2010 1 2 13 334
2010 1 2 14 23
2010 1 2 15 223
2010 1 2 16 32
2010 1 2 17 233
2010 1 2 18 544
2010 1 2 19 12
2010 1 2 20 12
2010 1 2 21 3
2010 1 2 22 34
2010 1 2 23 54
2010 1 4 0 34
2010 1 4 1 54
2010 1 4 2 565
2010 1 4 3 2
2010 1 4 4 334
2010 1 4 5 23
2010 1 4 6 223
2010 1 4 7 32
2010 1 4 8 233
2010 1 4 9 544
2010 1 4 10 12
2010 1 4 11 12
2010 1 4 12 3
2010 1 4 13 34
2010 1 4 14 54
2010 1 4 15 565
2010 1 4 16 2
2010 1 4 17 334
2010 1 4 18 23
2010 1 4 19 223
2010 1 4 20 32
2010 1 4 21 233
2010 1 4 22 544
2010 1 4 23 344
Thanks in advance for your help!
댓글 수: 0
답변 (4개)
Marieke Klijn
2016년 9월 21일
Hi Miguel,
For your question I loaded your copied data and named it 'data'. I assume that every day contains 24 hours (0-23) and each day has a number which is unique per dataset. So I loop over each day and check if there are 24 hour entries. If this is not the case then the rows are removed from the dataset.
days = unique(data(:,3));
for ii = 1:length(days)
hours_in_day = find(data(:,3) == days(ii));
if length(hours_in_day) < 24
data(hours_in_day,:) = [];
end
clear hours_in_day
end
Hope this works for you!
Cheers, Marieke
댓글 수: 3
Geoff Hayes
2016년 9월 21일
Miguel - depending upon your version of MATLAB, you could use hist or http://www.mathworks.com/help/matlab/ref/histogram.html to determine the number of recorded hours in each day. For example, using your above data and hist we would see that
[hourCount,days] = hist(data(:,3),1:max(data(:,3)))
returns
hourCount =
22 24 20 24
days =
1 2 3 4
So we know that the first and third days are incomplete. We could then remove these days from the data by checking to see which element of column three is a member of those days that are incomplete and set it to an empty element (which removes it from the array)
data(ismember(data(:,3),days(hourCount ~= 24)),:) = [];
In the above, we rely on the fact that
ismember(data(:,3),days(hourCount ~= 24))
returns a logical array of ones and zeros for each row of data where a one indicate that the row is missing an hour and a zero indicates otherwise. There may be a more efficient way of doing the above..I seem to recall that ismember can be expensive.
George
2016년 9월 21일
편집: George
2016년 9월 21일
- Find groups on year, month, day.
- Validate that your hours meets what you need (e.g., numel == 24 or somesuch)
- Extract valid dates
This should mostly do it.
weather = readtable('weather.xlsx');
G = findgroups(weather.Year, weather.Month, weather.Day);
validDays = splitapply(@(x) (numel(x) == 24), weather.Hour, G);
weather(validDays(G),:)
Where this may fall down would be if you had days with a dupe reading, but still 24 readings. You can do different anonymous functino if that concerns you.
댓글 수: 0
Andrei Bobrov
2016년 9월 22일
[~,~,c] = unique(EXAMPLE(:,1:3),'rows');
t = nonzeros((1:max(c))'.*(accumarray(c,1) == 24));
out = EXAMPLE(ismember(c,t),:);
댓글 수: 0
참고 항목
카테고리
Help Center 및 File Exchange에서 Dates and Time에 대해 자세히 알아보기
제품
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!