How to fit a line through this data "smartly" ?
조회 수: 8 (최근 30일)
이전 댓글 표시
I have the x-y data shown in the figure below (for the curious, it is the logarithm of the amplitude of a Hilbert transform).
I'm trying to write an algorithm to automatically get the slope of the initial portion of the plot, i.e. before the noise overcomes the signal.
What's causing the task to be even more challenging, are the small saw-tooth jumps that you can see in the zoomed-in portion on the right.
I tried several approaches, but could not find a reliable way to automatically fit a line only of the left-hand side linear portion.
Do you have any suggestion?
PS: I know it would help, but I don't (and cannot have) the curve fitting toolbox.
채택된 답변
Star Strider
2019년 1월 13일
편집: Star Strider
2019년 1월 13일
One approach (using the Signal Processing Toolbox medfilt1 function):
D = load('data.mat');
X = D.X;
Y = D.Y;
Ts = mean(diff(X));
YF = medfilt1(Y, 2500);
dYdX = gradient(YF,Ts);
gt0 = find(dYdX > -0.001, 1, 'first');
b = polyfit(X(1:gt0),Y(1:gt0),1);
Yf = polyval(b, X(1:gt0));
figure
plot(X, Y)
hold on
plot(X(1:gt0), Yf,, ':r', 'LineWidth',2)
hold off
text(10, -5, sprintf('y = %6.3f x %6.3f', b))
This code does a median filter (medfilt1) on ‘Y’, then uses the gradient function to calculate the derivative. It then uses a relatively unsophisticated threshold (the find call) to determine the end of the relatively linear section of the data, does a linear regression on that section, and plots the result.
I cannot determine how well it will work with any other data you have, so you will likely need to experiment with it.
EDIT — (13 Jan 2019 at 18:05)
Added plot image.
EDIT #2 — (13 jan 2019 at 18:50)
Another option, using the Signal Processing Toolbox findchangepts function:
D = load('data.mat');
X = D.X;
Y = D.Y;
ipt = findchangepts(X, 'Statistic','std');
b = polyfit(X(1:ipt), Y(1:ipt), 1);
Yf = polyval(b, X(1:ipt));
figure
plot(X, Y)
hold on
plot(X(1:ipt), Yf, ':r', 'LineWidth',2)
hold off
text(10, -5, sprintf('y = %6.3f x %6.3f', b))
This may be more robust. It produces similar statistics and plot.
댓글 수: 3
Image Analyst
2019년 1월 13일
You said in your comment to me that you're getting a new curve from the server every 10 minutes. When you run an experiment there are some parameters you want to keep constant, and others that you allow to vary. Have you thought about whether, theoretically, it is better to compute the slope between fixed limits, or whether you want to allow the right-most index of the fitting data to vary in a way that depends on the variable data itself? I suppose arguments could be made either way, depending on what the overall situation is (which we don't know but you do).
추가 답변 (2개)
John D'Errico
2019년 1월 13일
편집: John D'Errico
2019년 1월 13일
A serious problem is your data also starts with a strongly nonlinear portion.
You essentially need to trim it at both ends. I'd suggest the trick is to use a Savitsky-Golay style filter, combined with a simple call to median. Median will implicitly trim those ends for you, and because we use a filter of a fairly short span, that allows median to do some smoothing too. And the linear fits are short enough they will give you sufficient points inside that quasi-linear region.
The step is constant at 0.002.
dx = 0.002;
nfilt = 20;
M = [dx*(0:nfilt-1)',ones(nfilt,1)];
Mp = pinv(M);
slopeFilter = Mp(1,:);
allSlopes = conv(Y,slopeFilter,'valid');
histogram(allSlopes,1000)
linearSegmentSlope = median(allSlopes)
linearSegmentSlope =
0.15569
As you see, the histogram has a very high spike, as we would expect. A simple median will give us a good estimate now. You could change the filter length, and the result should be fairly robust to that value, because the essentially linear segment isquite long.
댓글 수: 4
John D'Errico
2019년 1월 13일
Yes. That makes some sense. If you know the quasi-linear segment will have a negative slope, I suppose it makes sense to exclude the positive slopes. I might still use median to do the final part though, not mean. At the least, I would use a trimmed mean. Discard say the lowest and highest 25% of the slope estimates, then compute the mean of those that remain.
Image Analyst
2019년 1월 13일
I plotted the data then it looked like the squirrely stuff stopped around element 2300 or so, and started to get fairly linear after that. So I found the slopes from element 2300 to the end of the data and plotted it.
fontSize = 15;
s = load('data.mat')
x = s.X;
y = s.Y;
subplot(1, 2, 1);
plot(x, y, 'b-');
grid on;
title('Original Data', 'FontSize', fontSize);
xlabel('x', 'FontSize', fontSize);
ylabel('y', 'FontSize', fontSize);
startingElement = 2310;
for k = 2310 : 100 : length(x) - 1
thisX = x((startingElement-10):k);
thisY = y((startingElement-10):k);
coefficients = polyfit(thisX, thisY, 1);
slope(k) = coefficients(1);
end
slope(1) = slope(2);
subplot(1, 2, 2);
mask = slope < 0;
x2 = x(mask);
y2 = slope(mask);
plot(x2, y2, 'b-');
grid on;
caption = sprintf('Slope from element %d to x', startingElement);
title(caption, 'FontSize', fontSize);
xlabel('x', 'FontSize', fontSize);
ylabel('Slope', 'FontSize', fontSize);
As you can see there is sort of a flat place for slopes, where they're fairly constant, but not quite. So it kind of becomes a judgment call as to what slope you want to use, or where you want to stipulate that the slopes are starting to move away from a "constant" value. There is no clear winner - it's not like the slopes are totally constant and then abruptly switch to a different value. There is more of a gradual change of slope as you progress into the noisy area. Since there is no clear winner, you can just pick wherever you want to stop. How about around x=30? Anything wrong with that value of -0.165?
참고 항목
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!