if it is greater than 75 percent missing data
조회 수: 2 (최근 30일)
이전 댓글 표시
Hello there
I have a matrix of (10200x7) from 01-01-2019 to 29-02-2020, which is iqual to the hourly temperature measured. The data is organizaed in the first 6 columns with date information (yy-mm-dd-hh-mm-ss) and then my temperature data (column 7). I already got the nanmean dayli values for a new matrix. But I would like to know how can I get a conditional, for example: if more than 75% of my data each 24 hours is missing, give me NaN for that day. If not, give me nanmean value.
I'm trying something like this, but I'm pretty sure I'm on the wrong way
%first I find the hourly values positions after 0 hs and before 23 hs
x = find(files_mat{1,1}(:,5)>= 0 {1,1}(:,5)<=23);
%%then, I'm trying to get the conditional to the tempearture column
if isnan (files_mat{1,1}(x,7))<= 0.75
M_24h
elseif isnan (files_mat{1,1}(x,7))>= 0.75
M_24h
Thank you
댓글 수: 2
Rik
2020년 10월 26일
I don't fully understand the shape of your data. But if you have an array, you can call isnan and get out a logical array. Using mean you can find the percentage of NaNs in each row, and using the comparator you will end up with a logical vector you can use to index into your output array and correct all non-NaNs.
I'm not posting this as an answer, as I can't really write a working example for you without understanding your data.
채택된 답변
Peter Perkins
2020년 11월 19일
I think this would be much simpler if you created a timetable by combining the first 6 vars in the table. Actually, I have no idea if you even have a table, but you should. Use readtable to read in your data.
Given a timetable, this is a one-liner: use retime to aggregate to daily means, except use your own custom function that returns NaN if there's more than 18 missing hourly values. Something like this:
ttDaily = retime(ttHourly,'daily',myFun)
function y = myFun(x)
if sum(isnan(x)) > .75*length(x)
y = NaN;
else
y = mean(x,'omitnan');
end
댓글 수: 0
추가 답변 (1개)
Adam Danz
2020년 10월 26일
편집: Adam Danz
2020년 10월 27일
Here's an approach you can apply to your data,
% Create a table of year-month-day-hour-min-second-rain data with some of the hour data missing (NaN)
dateTimeVals = datetime(1999,1,1) + hours(1:5000)';
rain = rand(size(dateTimeVals));
T = array2table([datevec(dateTimeVals), rain], 'VariableNames',{'Y','M','D','H','Mn','S','rain'});
% Replace some Hour data with missing vals
rng('default')
T.H(randi(height(T),1,3000)) = NaN;
% Show top of table
head(T)
% Add column to identify valid hour-data
T.dateValid = ~isnan(T.H);
head(T)
% Compute the percentage of valid hour-data per day
% This is achieved by applying mean() to the binary
% 'dateValid' column for each day.
Tgs = groupsummary(T,{'Y','M','D'},'mean', 'dateValid');
head(Tgs)
% This shows that for the first day of the first month of 1999,
% 47.8% of hour-data are non-nan. Let's confirm that manually.
daySelection = T.Y==1999 & T.M==1 & T.D==1;
mean(T.dateValid(daySelection),'omitnan') * 100
% To identify which days have >75% hour-data,
Tgs(Tgs.mean_dateValid > 0.75, 1:3)
댓글 수: 0
참고 항목
카테고리
Help Center 및 File Exchange에서 Data Preprocessing에 대해 자세히 알아보기
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!