What is Extremum Seeking Control | Data-Driven Control

From the series: Data-Driven Control

Brian Douglas

Get an introduction to extremum seeking control, an adaptive control method for finding an optimal control input or set of system parameters without needing a model of your system. It works even when the system dynamics change over time or when the cost or objective function isn't quadratic. You'll see how to build the algorithm one component at a time in Simulink® to highlight the benefits and drawbacks of this method.

Published: 2 Jul 2021

In this video, I want to introduce an adaptive control method called extremum seeking control. We’re going to build up the algorithm in a way that I think will motivate each of the components and hopefully, highlight some of the overall benefits and drawbacks of this method. I think extremum seeking control is a really interesting and intuitive controller, so I hope you stick around for it. I’m Brian, and welcome to a MATLAB Tech Talk.

To begin, let’s start with this generic system. There are signals, u, entering into the system and it produces some output, y. With feedback control, we’re looking to design a controller that can use the outputs in some way to determine what the correct inputs are that ultimately get the system to behave the way we want. If we want to take an optimal approach to solving this problem, we need to set up some kind of cost function that we want to minimize or an objective function that we want to maximize and then find the parameters or the system inputs that do just that.

For example, a linear quadratic regulator is an optimal way to find the gain matrix for full state feedback. We set up a quadratic cost function that takes into account system error and actuator effort and then along with a model of the system dynamics we can find the gain matrix that perfectly blends effort and error together to produce the minimum overall cost.
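As a rough sketch of that offline workflow, here's what LQR design looks like in MATLAB for a hypothetical double-integrator plant. The A, B, Q, and R values below are just illustrative assumptions, and the lqr function ships with Control System Toolbox:

```matlab
% Hypothetical double-integrator plant: xdot = A*x + B*u
A = [0 1; 0 0];
B = [0; 1];

% Quadratic cost weights: Q penalizes state error, R penalizes actuator effort
Q = diag([10 1]);
R = 1;

% Static full state feedback gain: u = -K*x minimizes the integral of
% x'*Q*x + u'*R*u (lqr requires Control System Toolbox)
K = lqr(A, B, Q, R);
```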

LQR requires a linear model of the system; in order to do this optimization, we need to know the dynamics f(x, u), which for LQR must take the linear form ẋ = Ax + Bu. Also, the optimization is done offline and produces static gains that won't change over time even if the system dynamics do. And the cost function has to be, by definition, quadratic.

So, the question is what if you don’t have a model of your system, or if the system dynamics change over time and so a static gain set won’t be sufficient, or if the cost or objective function that you’re trying to optimize isn’t quadratic? Then LQR isn’t a good optimal solution.

For example, take an anti-lock braking system for your car. The input into this system is the amount of brake pressure to apply, and the output is the deceleration of the vehicle. The question is how hard should we press the brakes to maximize deceleration? And the answer isn't obvious. If you apply too little brake pressure, you're not slowing down as fast as you can, and too much pressure causes the wheels to start to skid, which also reduces your braking effectiveness. In fact, this interaction between the tire and the road is governed by a curve that looks something like this. So, there is a perfect amount of brake pressure that will cause the perfect amount of wheel slip that will maximize the braking force.

This is not a quadratic function. Also, this curve changes based on the road and tire conditions. And not only is it changing, but to create a model of this system would require knowledge of the road surface, the melting characteristics of rubber, and so much more. So, because of these conditions, this is not a good candidate for LQR.
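To give a rough feel for that curve, tire force versus wheel slip is often approximated with something like Pacejka's magic formula. Here's a minimal MATLAB sketch with made-up coefficients, just to show the non-quadratic peak:

```matlab
% Illustrative tire force vs. wheel slip using a simplified "magic formula";
% the coefficients here are made up, not real tire data
Bstiff = 10; C = 1.9; D = 1.0;          % stiffness, shape, and peak factors
slip = linspace(0, 1, 200);             % slip ratio from 0 (rolling) to 1 (locked)
F = D * sin(C * atan(Bstiff * slip));   % normalized braking force
plot(slip, F), xlabel('wheel slip'), ylabel('normalized braking force')
```

Note the single peak at a moderate slip value: press harder than that and the braking force actually drops off.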

Now, there are different ways to solve a problem like this, but I want to talk particularly about perturb and observe type algorithms. These algorithms don't require a system model, they don't require quadratic objective functions, and they can run in real time and adapt to changing system dynamics.

Also, another benefit is that at their core, perturb and observe algorithms just make a ton of sense. The basic algorithm works like this. Start with an initial guess; in our case, that's a specific brake pressure. Record the current objective value, which for us is the deceleration of the vehicle. Now, we perturb our best guess by stepping in one direction, let's say increasing the brake pressure a little bit. Then we check to see if the objective increased or decreased to determine whether that step was in the right direction. For example, if the vehicle is decelerating more than it was previously, then we know we've marched up the hill in the right direction and we keep that value. We then take another step, increasing the brake pressure some more and checking the result. Eventually, we'll reach the optimal braking pressure. At this point, we'll step past it and realize that the objective has decreased and therefore we've gone too far. Then we step back in the other direction.

And we have to keep stepping back and forth around the maximum point because a change in the environment or the system dynamics could move that maximum point and we always want to be probing for it and tracking it. This is the basic idea of how we can find an optimal solution for a dynamic system without needing a model.
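Just to make that concrete, here's a minimal MATLAB sketch of a perturb and observe loop; the quadratic objective and the step size are stand-ins I made up for the real brake-pressure-to-deceleration map:

```matlab
% Minimal perturb and observe: take a step, keep going if the objective
% improved, reverse direction if it got worse
J = @(u) 20 - 0.2*(u - 10).^2;   % stand-in objective with its maximum at u = 10
u = 5;                           % initial guess, e.g., a specific brake pressure
step = 0.5;                      % fixed perturbation size
Jprev = J(u);
for k = 1:100
    u = u + step;                % perturb the guess in the current direction
    Jnow = J(u);
    if Jnow < Jprev              % objective decreased: we stepped the wrong way
        step = -step;            % so reverse direction
    end
    Jprev = Jnow;
end
% u ends up oscillating around the optimum at u = 10
```

Notice that even after it finds the peak, the loop keeps stepping back and forth around u = 10, which is exactly the probing behavior we want for a changing system.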

Now, there are some drawbacks to this very simple algorithm, one of which is that taking large steps will converge faster but will produce an average objective value that is lower than it could be, since you'd be jumping further away from the optimal value with every step. Now, there are algorithms that try to estimate the gradient of the function and then use that to adjust the step size so that when you're near the optimal point it doesn't take as large of steps. One version of this type of gradient-estimating algorithm is extremum seeking, which is what we're going to talk about for the rest of this video.

All right, so let’s head over to Simulink to build this controller.

I have here the plant that we wish to control and it takes a single input, u, and it produces a single output.  Now, in this case, the output is the objective that we’re trying to maximize, however, in general this might not be the case. We may take the outputs from the system and combine them with other signals in some way to generate an objective function, but to keep this simple, I’ve wrapped it all into this one system. So, we’re trying to find u that will maximize the output.

Inside this function you can see that it's just a really basic quadratic equation. So, it's not a model of a real system, but it'll still help us see what extremum seeking is doing. I'll show you what the result of this system looks like by ramping the input from 0 with a slope of 2 units per second. We'll run this real quick and check out the scope. The yellow line is the input ramping up from zero, and you can see that right at u = 10, the output reaches a maximum of 20. This is the optimal condition for this particular system.
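For reference, here's the same experiment as a MATLAB script. The exact coefficients in my Simulink function aren't important, so treat this quadratic, chosen to peak at 20 when u = 10, as an assumption:

```matlab
% Assumed quadratic plant, built to peak at y = 20 when u = 10
plant = @(u) 20 - 0.2*(u - 10).^2;

t = 0:0.01:10;      % simulation time, seconds
u = 2*t;            % ramp the input from 0 at 2 units per second
y = plant(u);
plot(t, u, t, y)    % the input crosses u = 10 at t = 5, where y peaks at 20
legend('input u', 'output y')
```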

To relate this back to the braking example, this is saying that a brake pressure of 10 units would produce the maximum deceleration. Anything more or less than this would result in a vehicle that takes longer to stop.

Now, let’s pretend that we don’t actually know that an input of 10 is the optimal value and instead we have an initial guess of say 5. We want our controller to automatically determine if 5 is too low, too high, or just right, and the way we’re going to do that is to add a sine wave to this value. Basically, instead of feeding in a constant five, we’re going to add a higher frequency ripple onto the signal that will perturb it slightly higher and lower. I’m choosing a sine wave of 30 rad/s with amplitude of 0.3 for this example, but this is one of several places where the designer can tune the controller to their plant.

Let’s run this now and check the scope. The yellow line is the input signal with that sine wave ripple and the blue line is the system output. Since 5 is lower than the optimal input, we’d expect these two signals to be in phase. That is, when we increase the input a little bit, the objective also increases in phase with it. And if we change the input to 12, which is too high, we’d expect the two signals to be out of phase.

The trouble is that these two signals are hard to compare in this state since they are offset from each other. Luckily, we don’t necessarily care about the absolute value of these signals; we want to know how a change in the input signal creates a change in the output signal. The change in the input is easy, it’s just the sine wave itself. But getting the change in the output is actually pretty easy also. We can do that by adding a high pass filter. I’m choosing one with a cutoff frequency of 5 rad/s. This will essentially block the low frequency information, like the offset from zero, and pass the 30 rad/s signal through without affecting it much.
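In discrete time, a washout filter like this is only a few lines. Here's a sketch of that stage; the sample time and the test signal are my own assumptions:

```matlab
% First-order high-pass (washout) filter with a 5 rad/s cutoff
dt = 0.001;                     % assumed sample time, seconds
wc = 5;                         % cutoff frequency, rad/s
alpha = 1/(1 + wc*dt);          % discrete-time filter coefficient

t = 0:dt:5;
y = 15 + sin(30*t);             % example: 30 rad/s ripple riding on an offset
yHP = zeros(size(y));
for k = 2:length(y)
    % passes the high-frequency ripple, blocks the near-constant offset
    yHP(k) = alpha*(yHP(k-1) + y(k) - y(k-1));
end
plot(t, y, t, yHP)              % yHP is just the ripple, centered on zero
```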

And check it out. The offset is removed by the high pass filter and we can now see clearly how the system output changes when we change the system input. Since these two signals are completely out of phase with each other we know we need to lower the input value to reach the maximum output. So, a value of 12 is too high.

And if we change the input back to 5, we see that these two signals are now in phase with each other, therefore, we know that the current input value of 5 is too low. But how do we get our controller to understand this kind of logic?

It turns out, we can just multiply the two signals together, because if the two signals mostly have the same sign, like they do when they’re in phase, then the product will be mostly positive.

And indeed that is the case when the input is too low. You can see here that the signal is almost entirely positive. Well, let me hide the yellow signal because it’s cluttering up the scope. And now you can see that the signal is mostly above zero.

And if we set our input to 12, you can see that the product is mostly below zero since the two signals now have mostly opposite signs.  

Now we can integrate this signal. The summation will tend to rise when the input is too low and tend to decrease when the input is too high. And we can see that here: the output from this integral is tending down, indicating that 12 is too high and should be lowered.

Not only does the summation increase when the input is too low and decrease if it’s too high, but the speed with which it increases and decreases is proportional to the gradient, or slope of the objective function.  To show you what I mean, let’s set the input to a ramp and watch the summation as the input value sweeps through the optimal input.  And how cool is this. So, in our case with a quadratic function, the summation increases and decreases faster when we’re further from the optimal value, and the steps get finer as we reach that goal, and at the optimal value, the sum stays constant. That’s pretty awesome!
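Here's the quick averaging argument for why that works (this is the standard extremum seeking analysis, not something shown on screen). Expanding the objective around the current estimate û, high-pass filtering away the constant term, and demodulating by the sine wave:

```latex
\begin{aligned}
J(\hat{u} + a\sin\omega t) &\approx J(\hat{u}) + J'(\hat{u})\,a\sin\omega t \\
\underbrace{J'(\hat{u})\,a\sin\omega t}_{\text{after high-pass}} \cdot \sin\omega t
  &= \tfrac{a}{2}\,J'(\hat{u})\,(1 - \cos 2\omega t)
\end{aligned}
```

The cos 2ωt term averages to zero over a dither period, leaving (a/2)·J′(û): the integrator drifts at a rate proportional to the local slope of the objective, which is exactly what we just saw on the scope.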

In this way, we can feed back this summation as our best estimate of the input, and the system will ultimately converge on the optimal condition. This particular setup is taking its time to get there, so another tuning adjustment we can make is to add a gain to the summation, which will allow us to speed up or slow down convergence. We have to be a little careful here because speeding it up too much will cause instability, but this value looks OK for us. The input converges on 10, which produces the maximum output of 20.
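And if you'd rather see the whole loop as a script than as Simulink blocks, here's a minimal MATLAB sketch of everything we just built; the plant, sample time, and gains are my own assumptions, not the exact values from the model:

```matlab
% Minimal extremum seeking loop:
% dither -> plant -> high-pass -> demodulate -> integrate -> new estimate
J  = @(u) 20 - 0.2*(u - 10).^2;    % assumed quadratic plant, peak of 20 at u = 10
dt = 0.001;  t = 0:dt:20;
a  = 0.3;  w = 30;                 % dither amplitude and frequency (rad/s)
wc = 5;  alpha = 1/(1 + wc*dt);    % 5 rad/s high-pass filter coefficient
kint = 25;                         % integrator gain; too large and it goes unstable

uhat  = 5;                         % initial guess for the optimal input
yprev = J(uhat);                   % seed the filter memory to avoid a startup jump
yHP   = 0;
uhist = zeros(size(t));
for k = 1:length(t)
    d     = a*sin(w*t(k));             % perturbation riding on the estimate
    y     = J(uhat + d);               % apply the perturbed input to the plant
    yHP   = alpha*(yHP + y - yprev);   % high-pass: keep only the output ripple
    yprev = y;
    uhat  = uhat + kint*yHP*d*dt;      % demodulate (multiply) and integrate
    uhist(k) = uhat;
end
plot(t, uhist)                         % uhat converges on 10, where the plant peaks
```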

And what’s cool about this controller is that it can adapt to changing plant dynamics, as long as those dynamics are relatively slow compared to the convergence rate of the controller. The plant dynamics can’t change faster than the controller can converge; otherwise, it’ll continuously lag behind the maximum value.

Let me now show you a quick example of the controller adapting to a changing plant by changing our plant equation to be a function of time. Here, I’m basically shifting the quadratic curve to the right as time increases, which means our controller needs to constantly increase the input value u in order to maintain the maximum output value of 20. So, let’s run this. You can see the output stays really close to the maximum value, but the input is constantly changing; it’s tracking the value that creates that maximum output.
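To reproduce that in the script above, the only structural change is making the objective a function of time; the drift rate here is a made-up assumption:

```matlab
% Drifting plant: the peak location moves right at 0.5 units per second, so the
% controller has to track u* = 10 + 0.5*t to hold the output at its maximum of 20
J = @(u, t) 20 - 0.2*(u - (10 + 0.5*t)).^2;
```

Inside the loop, the plant call becomes y = J(uhat + d, t(k)), and uhat keeps climbing to follow the moving peak while the output stays near 20.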

All right, so, these are the basic components that make up the extremum seeking controller. And, if you have Simulink Control Design, you can just pull an Extremum Seeking Control block into Simulink rather than build it all out yourself.

And some of the benefits of using this block are that some error checking is done for you, such as making sure that the frequencies of the modulating sine wave and the filters aren’t stepping on each other. Also, this block can handle multi-input, multi-output systems, and it exposes all of the configurable parameters in a single interface. But as you can see, the logic that it implements is exactly what we just walked through. Now, there is this extra low pass filter that can be used if there is high frequency measurement noise in your system, but I left it off since my example didn’t have any measurement noise. Otherwise, it’s the same.

Ok, hopefully, you can see that we could use an algorithm like this to do something like track the ideal brake force that will stop a car in the shortest distance on an unknown surface. And if you’re interested in seeing that in action, in the description there is a link to another video that shows how to use extremum seeking control for anti-lock braking.  

All right, before we end this video, real quick I want to talk about some of the drawbacks of this method so that you’ll be more prepared to decide if it’s right for your control problem. For one, this method will only converge on a local optimum. If your system has multiple optima, then you need to make sure you initialize it such that it finds the global optimum.

Also, even though it is a relatively straightforward controller, it is more complicated than a simple perturb and observe algorithm. We have a lot of tuning parameters that we need to tweak to get a result that converges quickly and robustly on the optimal solution.  

And finally, we need a plant that responds quickly to input changes, so that we can actually observe the perturbation, but a plant whose dynamics don’t change too quickly over time so that the controller is capable of tracking the optimal point.  

But despite all of this, I hope you can see that this is a pretty powerful controller if you’re dealing with a system that is hard to model and changes over time. I think a good way to get more experience with extremum seeking control is to try it out yourself. I’ve left links to examples and other documentation that should provide you with a good start.  

All right, that’s where I’m going to leave this video. If you don’t want to miss any future Tech Talk videos, don’t forget to subscribe to this channel. Also, if you want to check out my channel, Control System Lectures, I cover more control theory topics there as well. Thanks for watching, and I’ll see you next time.