"diff" function doesn't work properly with small numbers

Question

0 votes

For some reason, when the difference between points n and n+1 is too small, the diff function returns 0.

There are about 290 data points on the plot, and the precision is 10^(-10). As far as I know, MATLAB works with 16 or 32 digits, so it shouldn't be a problem.

Technically there should be no constant segments on the plot, just increases and decreases in value.

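Pomiary = cisnienie300920151701average300;       % measurement table already in the workspace
Czas = Pomiary{:, 4};                            % time column (treated below as seconds since 1970-01-01)
Temperatura = Pomiary{:, 5};                     % measured values
CzasDMY = Czas / 86400 + datenum(1970, 1, 1);    % convert seconds to MATLAB datenum
y = Temperatura;
x = CzasDMY;
ydiff = diff(y,1);                               % first differences, one element shorter than y
wieksze = (ydiff > 0);                           % "greater": next value is larger
mniejsze = (ydiff < 0);                          % "smaller": next value is smaller
gora = y;                                        % "up" series
dol = y;                                         % "down" series
gora(~wieksze) = NaN;                            % blank out everything that is not increasing
dol(~mniejsze) = NaN;                            % blank out everything that is not decreasing
plot(x,y,'b',x, gora, 'r', x, dol, 'g');
grid on;
xlim tight;
xlim("auto");
ylim("auto");
legend("Constant", "Increasing", "Decreasing");
legend("Position", [0.15754,0.1468,0.20438,0.12165]);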

Comments

dpb on 23 Dec 2025

Edited: dpb on 23 Dec 2025

whos -file x
  Name        Size            Bytes  Class     Attributes

  x         288x1              2304  double              
whos -file y
  Name        Size            Bytes  Class     Attributes

  y         288x1              2304  double              
d=dir('cisni*.mat');
whos('-file',d.name)
  Name                                 Size            Bytes  Class    Attributes

  cisnienie300920151701average300       -              16275  table              
load x
load y
X=[x y];
fprintf('%.12f %.12f\n',X(1:10,:).')
736236.625000000000 1022.575847900000
736236.628472222248 1022.556940100000
736236.631944444496 1022.566749900000
736236.635416666628 1022.576562000000
736236.638888888876 1022.592748100000
736236.642361111124 1022.587627700000
736236.645833333372 1022.544010600000
736236.649305555504 1022.502575900000
736236.652777777752 1022.478764300000
736236.656250000000 1022.475430000000
dy=diff(y);
iy=find(dy==0);
nnz(iy)
ans = 5

This shows there are 5 separate repeated instances in the y vector.

iy
iy = 5×1
    38
    91
   144
   198
   251

This shows that, in this data set at least, no value is repeated more than twice in a row, so the averaging technique in the earlier Answer would produce a series with no zero differences, if that is the ultimate goal.

Why that matters, rather than simply accepting the result as is, is so far unclear. But, as noted, the problem is not in diff() or machine precision; the data have been rounded such that there really are identical values.

fprintf('%.14f\n',y(iy(1)+[-1:2]))

1023.03861350000000
1023.01522350000005
1023.01522350000005
1022.96080629999994

plot(x(iy(1)+[-1:2]),y(iy(1)+[-1:2]),'*-')

This reproduces exactly the problem illustrated before: the data are identical to machine precision because the values were rounded to seven (7) decimal digits, and when they were read into memory from the input file containing those values, they were stored identically. Ergo, the diff() between those consecutive positions is, as returned, identically zero.

As my Answer over the same subset of the data shows, your only choices, if you find this result unacceptable, are either to provide the data with full precision as input, in the hope that the original values differ in later digits before rounding, or, as illustrated there, to interpolate across the duplicated values so that the second (repeated) value changes and a subsequent diff() is nonzero. The caveats noted there are still in play, of course.

The basic answer is that your data are, indeed, not changing at every point in either a positive or negative direction but are unchanging over at least two consecutive positions and diff() is just doing its job.
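To make the rounding argument concrete, here is a minimal sketch. The two "raw" readings are made-up values (an assumption, not taken from the data set) chosen so that they differ only beyond the seventh decimal place. Once they pass through a seven-decimal text representation, they parse to the same double and diff() returns exactly zero.

```matlab
% Hypothetical unrounded readings (assumed values, not from the data set)
raw = [1023.01522350123; 1023.01522349987];

% Simulate a text export with seven decimals, then read the text back
txt = {sprintf('%.7f', raw(1)); sprintf('%.7f', raw(2))};
rounded = str2double(txt);

disp(txt)
fprintf('diff of raw values:     %g\n', diff(raw));
fprintf('diff of rounded values: %g\n', diff(rounded));
```

The raw difference is on the order of 1e-9 and is easily representable, so it is the rounding in the file, not diff(), that removes it.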

Fangjun Jiang on 23 Dec 2025

Edited: Fangjun Jiang on 23 Dec 2025

@dpb, @Sylwester, there is no problem with diff(), and there is no problem with data accuracy or precision. It is a visual misconception.

First, as @dpb pointed out, in the whole set of 288 data points there are only 5 places where the data value is unchanged and thus regarded as a "Constant" trend.

@Sylwester's reasoning was this: plot all the data in BLUE, plot all the "Increasing" trend data in RED, and plot all the "Decreasing" trend data in GREEN. Since RED and GREEN overwrite BLUE, the resulting plot should show almost no BLUE, because only 5 of the 288 data points are "Constant".

But the resulting plot shows a lot of BLUE. So @Sylwester thought there was a problem in diff().

But there is no problem with the diff() function. It is just a visual misconception, caused by how plot(time,data,'r') connects data points with the line style and color when the "data" set contains NaN points.

I changed only this line:

plot(x,y,'b.',x, gora, 'r+', x, dol, 'g*');

and the resulting plot gives the correct visual impression (that there is almost no BLUE "Constant" data).
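A rough way to check this without looking at the figure is to count how many non-NaN points in gora have NaN on both sides, since a line-only style needs two adjacent valid points to draw anything. The sketch below uses a toy zig-zag vector (assumed values); the variable names valid and isolated are mine.

```matlab
% Toy zig-zag series (assumed values): every step alternates up/down
y = [1 3 2 4 3 5].';
ydiff = diff(y);
wieksze = (ydiff > 0);

gora = y;
gora(~wieksze) = NaN;            % same masking as in the question

valid = ~isnan(gora);
% A point is isolated when both neighbours are NaN (or do not exist)
isolated = valid & ~[false; valid(1:end-1)] & ~[valid(2:end); false];

fprintf('%d of %d red points are isolated\n', nnz(isolated), nnz(valid));
```

Isolated points produce no visible red line with the 'r' style, so the blue line underneath shows through even though the data there are not constant.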


Answer 1

Paul on 23 Dec 2025

1 vote

x.mat
y.mat

The data in gora and dol are on the plot as can be seen below when using markers. However, if the y-data pattern is

increasing->decreasing->increasing ...

then gora and dol will each have the pattern data->NaN->data ...

and so the data points in gora and dol won't be connected on the plot (and won't be visible at all without markers).

load x
load y
ydiff=diff(y,1);
wieksze = (ydiff > 0);
mniejsze = (ydiff < 0);
gora = y;
dol = y;
gora(~wieksze) = NaN;
dol(~mniejsze) = NaN;
figure
plot(x,y,'b',x, gora, 'r-o', x, dol, 'g-x');
xlim([7.3623688,7.3623691]*1e5)

[Figure: zoomed plot of y with gora and dol drawn using markers]

xl = xlim;
counts = (1:numel(x)).';
index = x>xl(1) & x < xl(2);
format long
[counts(index),x(index),y(index),gora(index),dol(index),wieksze(index),mniejsze(index)]

ans = 9×7
1.0e+05 *

   0.000750000000000   7.362368819444445   0.010231555306000   0.010231555306000                 NaN   0.000010000000000                   0
   0.000760000000000   7.362368854166666   0.010232377609000                 NaN   0.010232377609000                   0   0.000010000000000
   0.000770000000000   7.362368888888889   0.010232186155000   0.010232186155000                 NaN   0.000010000000000                   0
   0.000780000000000   7.362368923611111   0.010232346412000                 NaN   0.010232346412000                   0   0.000010000000000
   0.000790000000000   7.362368958333334   0.010232249740000   0.010232249740000                 NaN   0.000010000000000                   0
   0.000800000000000   7.362368993055555   0.010232331298000                 NaN   0.010232331298000                   0   0.000010000000000
   0.000810000000000   7.362369027777778   0.010232141998000   0.010232141998000                 NaN   0.000010000000000                   0
   0.000820000000000   7.362369062500000   0.010232185551000                 NaN   0.010232185551000                   0   0.000010000000000
   0.000830000000000   7.362369097222222   0.010231700000000   0.010231700000000                 NaN   0.000010000000000                   0

Comments

Fangjun Jiang on 23 Dec 2025

Yes, @Paul, you explained perfectly how plot() behaves with data that contain NaN points, using this extreme "increasing->decreasing->increasing" case.

Nothing is wrong. I call it a visual misconception by the OP.
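If colored lines rather than markers are wanted, one possible approach (not the one used in this thread, just a sketch) is to draw each step as its own two-point segment, separated by NaN, so that every rising or falling step is visible even when the trend alternates. The helper name segs and the toy data are assumptions.

```matlab
% Toy zig-zag data (assumed values)
x = (1:6).';
y = [1 3 2 4 3 5].';

up   = find(diff(y) > 0);     % left index of each rising step
down = find(diff(y) < 0);     % left index of each falling step

% Each row [left right NaN] becomes one separate two-point segment
segs = @(v, k) reshape([v(k), v(k+1), nan(numel(k), 1)].', [], 1);

plot(x, y, 'b', segs(x, up), segs(y, up), 'r', segs(x, down), segs(y, down), 'g');
legend('Constant', 'Increasing', 'Decreasing');
```

With this layout only steps where diff(y) is exactly zero keep the blue color of the underlying line.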


Answer 2

Fangjun Jiang on 22 Dec 2025

0 votes

The data values and the results make sense. Based on your example data, there is no problem with using diff() to process it.

%%
format long
y=[36 1023.08766260000
    37 1023.03861350000
    38 1023.01522350000
    39 1023.01522350000
    40 1022.96080630000]
y = 5×2
1.0e+03 *

   0.036000000000000   1.023087662600000
   0.037000000000000   1.023038613500000
   0.038000000000000   1.023015223500000
   0.039000000000000   1.023015223500000
   0.040000000000000   1.022960806300000
ydiff=diff(y,1)
ydiff = 4×2
   1.000000000000000  -0.049049100000047
   1.000000000000000  -0.023389999999949
   1.000000000000000                   0
   1.000000000000000  -0.054417200000103
wieksze = (ydiff > 0)
wieksze = 4×2 logical array
   1   0
   1   0
   1   0
   1   0
mniejsze = (ydiff < 0)
mniejsze = 4×2 logical array
   0   1
   0   1
   0   0
   0   1

By default, MATLAB uses 64-bit floating-point (double) values to represent numeric data.

Near the value 1023, the spacing between adjacent doubles is about 1.1e-13, which is sufficient to resolve your data precision of 1e-10.

The problem you observed comes from your raw data. Note that y(3,2) and y(4,2) are exactly the same, as can be seen by inspection.

eps(1023)
ans = 
     1.136868377216160e-13

Check the documentation for eps(); it will help you understand the issue better.

doc eps
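As a quick check of the numbers above, the following sketch compares a 1e-10 step (the precision mentioned in the question) against the double spacing near 1023. The variable names are only illustrative.

```matlab
v = 1023;
spacing = eps(v);                 % gap between v and the next larger double

fprintf('eps(1023)           = %.3e\n', spacing);
fprintf('1e-10 step resolved : %d\n', (v + 1e-10) ~= v);
fprintf('eps/4 step resolved : %d\n', (v + spacing/4) ~= v);
```

A step of 1e-10 is roughly a thousand times larger than the spacing, so it survives; only changes well below eps(1023) are lost to rounding.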

Comments

Fangjun Jiang on 22 Dec 2025

The output of diff() is one element shorter than its input. Your code doesn't seem to account for this.

diff(1:3)
ans = 1×2
     1     1
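If the flags need to line up with the original vector, one option is to pad the comparison with a single false. The sketch below shows both possible alignments on a toy vector (assumed values); the names roseFromPrev and risesToNext are hypothetical.

```matlab
y = [5 7 7 6].';
d = diff(y);                      % numel(y) - 1 elements; d(i) = y(i+1) - y(i)

% Pad at the front: the flag sits on point i+1 (it rose from the previous point)
roseFromPrev = [false; d > 0];

% Pad at the end: the flag sits on point i (the next point is higher)
risesToNext  = [d > 0; false];

disp([y, roseFromPrev, risesToNext])
```

Which alignment is appropriate depends on whether a point should be colored by the step leading into it or the step leaving it.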

Fangjun Jiang on 23 Dec 2025

Edited: Fangjun Jiang on 23 Dec 2025

The length difference of 1 between the input and output of the diff() function is not an issue either in this case.

There is no issue with the diff() function or with data accuracy/precision. The OP has a visual misconception caused by the way plot(time,data,'b') connects data points with color and line style when the "data" set contains NaN points.


Answer 3

dpb on 22 Dec 2025

Edited: dpb on 23 Dec 2025

0 votes

X=[
36 1023.08766260000
37 1023.03861350000
38 1023.01522350000
39 1023.01522350000
40 1022.96080630000];
dx=diff(X)
dx = 4×2
    1.0000   -0.0490
    1.0000   -0.0234
    1.0000         0
    1.0000   -0.0544

As hypothesized above, some of the temperature/pressure values are identical owing to the apparent rounding to seven (7) decimal digits.

You would have to have at least one more decimal place in the above between the 3rd and 4th data values in order for the difference to not be identically zero.

If you're transferring data from one place to another, avoid this by not using text files; save the full internal precision by using .mat files, or a binary-format transfer if the data come from some external source. Besides retaining full precision (note that precision does not necessarily imply accuracy), this is much more efficient in speed and memory/disk space.
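If a text file cannot be avoided, a commonly used fallback (my suggestion, not part of the advice above) is to write 17 significant digits, which is enough for a double to survive the round trip through text. The value below is an assumed example.

```matlab
v = 1023.01522350123;                       % assumed example value

short = str2double(sprintf('%.7f', v));     % seven decimals, as in the posted data
full  = str2double(sprintf('%.17g', v));    % 17 significant digits

fprintf('seven decimals round-trips: %d\n', short == v);
fprintf('17 digits round-trips:      %d\n', full == v);
```

This only preserves what was measured; it does not add accuracy, and the binary formats mentioned above remain the more compact choice.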

As for your comment above about the values, that "They are meant to be the same, The issue is that for some reason function for marking if value increased/decreased has holes in it and skips points unless difference is high enough", that makes no sense at all: the two values are identically the same, so how can there be any sense of the change that "increased/decreased" implies?

If you're trying to measure an overall change, then diff is entirely the wrong function, as it works on a pointwise basis and so will indeed notice any points for which the difference is actually zero.

Looking at your small subsample of data

plot(X(:,1),X(:,2),'*-')

indeed, there is an overall negative trend, but it isn't uniformly decreasing at every point, just overall. If you want indications of trends excluding such points, you'd have to do something like find the inflection points and then (say) the two points on either side and then use the adjusted temperature to compute the change.

Note that you would also have to locate any locations of more than two successive points being the same and then do something over those ranges. Also, in doing something like this you'll run into the issue that @Fangjun Jiang raised about the differenced vector being shorter than the original so the points are offset by one in the addressing.

For the simple example here

ix=find(dx(:,2)==0); % locate the zero difference

fprintf('%d %15.10f\n',X(ix+[0:1],:).') % display the two rows with identical values

38 1023.0152235000
39 1023.0152235000

X(:,3)=X(:,2); % augment the X array with a copy of column 2

X(ix+1,3)=mean(X(ix+[0 2],3)); % replace the repeated value with the mean of its neighbors (linear interpolation)

hold on

plot(X(:,1),X(:,3),'rx-')

legend('Original','Interpolated','location','northeast')

[Figure: original data with the interpolated series overlaid]

diff(X)

ans = 4×3

    1.0000   -0.0490   -0.0490
    1.0000   -0.0234   -0.0234
    1.0000         0   -0.0272
    1.0000   -0.0544   -0.0272

Now you don't have any zeros in the 3rd column diff().
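For runs of more than two identical values, the same idea could be generalized with interp1, roughly as sketched here: drop every point that equals its predecessor and interpolate linearly back onto the full x grid. This is only a sketch under the assumption that the first and last points are not part of a repeated run, since interp1 returns NaN outside the range of the kept points by default.

```matlab
x = (36:40).';
y = [1023.0876626; 1023.0386135; 1023.0152235; 1023.0152235; 1022.9608063];

keep = [true; diff(y) ~= 0];           % false for any point equal to its predecessor
yi = interp1(x(keep), y(keep), x);     % linear interpolation onto all x

disp(diff(yi).')
```

For the five-point example this reproduces the mean-of-neighbors replacement used above, and the same caveats about altering the data apply.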
