I'm working with data that should follow some restrictions. For example, the water level in a tank is measured in %, so it cannot be more than 100; however, due to measurement errors, a reading can come out as 120, for example. How should I work with this data if I don't want wrong data to mess up further statistical calculations? Is it correct to leave the data blank? And if so, what should the placeholder be: NaN? Thank you for your help!

Accepted Answer

Alfonso on 15 May 2018
If you are saving all the water level values in a 1xn or nx1 array (a row or column vector), you could delete the incorrect values like this:
i = 1;
while i <= length(data_array)   % walk through the array of water level values
    if data_array(i) > 100      % if the value is above the maximum, remove it
        data_array(i) = [];     % empty/delete the value
    end
    i = i + 1;
end

5 Comments

I received an error message in this part of the code:
elseif nn == 5 || nn == 20 || nn == 23
    for ii = 1:rows
        if A(ii,nn) <= qin250
            B(ii,nn) = A(ii,nn);
        elseif A(ii,nn) > qin250
            B(ii,nn) = [];
        end
    end
--- A null assignment can have only one non-colon index.
Error in refine_data (line 85) B(ii,nn) = [];
Never delete elements of an array in a loop. You will get it wrong, just as Alfonso has done. There are ways to make it safe (usually by working in reverse) but it is most likely you'll get it wrong.
Let's follow Alfonso's algorithm with the array [5 200 150 10]:
  • i = 1, data_array = [5 200 150 10]. Therefore data_array(i) is 5. The if test is false, nothing happens. Increment i.
  • i = 2, data_array = [5 200 150 10]. data_array(i) is 200, the if test is true, we delete data_array(2), therefore data_array is now [5 150 10]. Increment i.
  • i = 3, data_array = [5 150 10]. data_array(i) is 10! Notice that we've skipped the 150, which will never be deleted.
The algorithm could be saved by not incrementing i when a deletion is performed, but MATLAB has a much simpler and faster way of achieving the deletion:
data_array(data_array > 100) = []; %delete all elements greater than 100 all in one go.
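As a rough sketch of the two options mentioned above, the snippet below runs a repaired loop (advance i only when nothing was deleted) next to the logical-indexing form on the example array from the walkthrough. The variable names looped and vectorized are made up for this illustration and do not come from the thread.

```matlab
data_array = [5 200 150 10];   % example values from the walkthrough above

% Loop variant: only advance i when the current element was kept.
looped = data_array;
i = 1;
while i <= numel(looped)
    if looped(i) > 100
        looped(i) = [];        % the next element slides into position i
    else
        i = i + 1;             % move on only if position i was kept
    end
end

% Vectorized variant: keep the elements that pass the check.
vectorized = data_array(data_array <= 100);

disp(looped)       % shows 5 and 10
disp(vectorized)   % shows 5 and 10
```

Both variants keep only 5 and 10 here; the vectorized form is shorter and leaves no index bookkeeping to get wrong.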
Guillaume on 17 May 2018
Edited: Guillaume on 17 May 2018
How should I work with this data if I don't want wrong data to mess up further statistical calculations?
Regardless of the flawed algorithm, in my opinion, the spirit of Alfonso's answer is correct. While you could indeed replace the data with NaN, since you know that the measurement is incorrect and you can't recover what it should be, you should simply discard the measurement.
I received an error message in this part of the code:
Your implementation has the same issue as Alfonso's algorithm. Your ii will become out of sync with the actual rows once you've deleted something. In addition, you're trying to delete a single element in a row of a matrix, which is not possible because it would leave a hole in the matrix. What you want to do is delete the whole row.
The efficient and safe way to do that in MATLAB:
B = A;
B(any(B(:, [5, 20, 23]) > qin250, 2), :) = []; %delete all rows for which any of column 5, 20 or 23 is greater than qin250
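To see what the row mask does, here is a scaled-down sketch on a hypothetical 4-by-3 matrix. The names limit, cols and badRows and all the values are invented for illustration; they stand in for qin250 and columns 5, 20 and 23 of the real data. The last two lines show an alternative, assuming you would rather keep the matrix shape and mark single bad entries as NaN instead of dropping whole rows.

```matlab
% Hypothetical data and threshold, chosen only for illustration.
A = [ 1 10 3;
      2 99 4;
      5 20 6;
      7 30 8];
limit = 50;      % stands in for qin250
cols  = [2 3];   % stands in for columns [5, 20, 23]

% One logical value per row: true if any checked column exceeds the limit.
badRows = any(A(:, cols) > limit, 2);

B = A;
B(badRows, :) = [];    % deleting whole rows keeps B rectangular

C = A;
C(A > limit) = NaN;    % alternative: same size as A, bad entries marked
```

In this example only row 2 is flagged, so B ends up 3-by-3, while C stays 4-by-3 with a single NaN at C(2,2).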
Thanks for your help, Guillaume! It was simple and worked pretty well!


More Answers (1)

Steven Lord on 17 May 2018


If you use NaN or missing (which for numeric arrays are the same thing) to represent the missing data in your data set, you can use some of the functions designed for handling missing data to remove, replace, or perform computations ignoring the missing data.
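A minimal sketch of that approach, assuming a reasonably recent MATLAB release in which rmmissing and fillmissing are available. The readings in level are hypothetical, and treating values below 0 as invalid is an added assumption; the original question only mentions values above 100.

```matlab
level = [42 120 87 101 63];              % hypothetical readings in percent

% Flag physically impossible values (the lower bound is an assumption).
level(level > 100 | level < 0) = NaN;

avgLevel = mean(level, 'omitnan');       % average over the valid readings only
maxLevel = max(level);                   % max skips NaN by default
cleaned  = rmmissing(level);             % drop the NaN entries
filled   = fillmissing(level, 'linear'); % or interpolate across the gaps
```

Keeping NaN in place preserves the position of each reading, which matters if the samples are tied to timestamps; rmmissing gives up that alignment in exchange for a clean vector.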
