Hey Nik,
The example you linked trains agents to maximize coverage, but if you want agents to move to specific predefined destinations in the shortest time, you’ll need to modify the reward function and action space.
Steps to Modify the Example
Define Destinations
Store your 10 destinations in a 10-by-2 matrix, one [x, y] row per destination:
destinations = [2,2; 11,2; 3,6; ...]; % Add all 10 destinations as [x, y] rows
Assign Each Agent a Destination
- You can randomly assign a destination to each agent at the start of each episode (see the sketch after this list).
- You can also assign destinations dynamically based on a policy.
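A minimal sketch of random assignment at episode reset, assuming the destinations matrix above; numAgents and agentDestinations are placeholder names, not from the MathWorks example:
% Sketch: pick one random destination row per agent at episode reset.
numAgents = 3;                                     % placeholder: your agent count
idx = randi(size(destinations, 1), numAgents, 1);  % one random row index per agent
agentDestinations = destinations(idx, :);          % numAgents-by-2 [x, y] targets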
Modify the Reward Function
- Give a negative reward based on the distance to the target.
- Give a high reward when the agent reaches the destination.
Example:
function reward = getReward(agentPos, destination)
    distance = norm(agentPos - destination); % Euclidean distance to the target
    reward = -distance;                      % Penalize distance to encourage the shortest path
    if distance < 0.5                        % Agent is within the goal radius
        reward = reward + 100;               % Large bonus for reaching the destination
    end
end
Modify the State Space
- Instead of observing area coverage, define each agent's observation as its own (x, y) position plus its target's (x, y) position, as sketched below.
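A minimal sketch of the matching observation specification from the Reinforcement Learning Toolbox, assuming a 4-element observation vector [agentX; agentY; targetX; targetY]; the limits are placeholders for your map size:
% Sketch: observation spec for [agentX; agentY; targetX; targetY].
% The limits below assume a 12-by-12 map; adjust to your grid.
obsInfo = rlNumericSpec([4 1], ...
    'LowerLimit', [0; 0; 0; 0], ...
    'UpperLimit', [12; 12; 12; 12]);
obsInfo.Name = 'agent and target positions';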
Modify the Training Environment
- Instead of rewarding area coverage, reward reaching the goal quickly; a small per-step penalty also encourages shorter paths (see the sketch after this list).
- Ensure the action space includes movements toward the destination (e.g., discrete up/down/left/right steps).
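A minimal sketch of the per-step logic inside a custom environment's step function, reusing getReward from above; agentPos, destination, and stepPenalty are placeholders for your environment's state:
% Sketch: step logic with a time penalty and early termination on arrival.
stepPenalty = -0.1;                          % small cost per step -> shorter paths
reward = getReward(agentPos, destination) + stepPenalty;
isDone = norm(agentPos - destination) < 0.5; % end the episode when the goal is reached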
Run Training
Modify the reinforcement learning setup from the MathWorks example and train with PPO or another RL algorithm; a sketch of the setup follows.
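A minimal sketch of the PPO setup, assuming the obsInfo spec above, a discrete action space, and a custom environment object env; all names and numbers here are placeholders, not values from the MathWorks example:
% Sketch: PPO agent and training loop (Reinforcement Learning Toolbox).
% 'env' is a placeholder for your custom environment object.
actInfo = rlFiniteSetSpec(1:4);               % e.g., 4 discrete move actions
agent = rlPPOAgent(obsInfo, actInfo);         % default actor/critic networks
trainOpts = rlTrainingOptions( ...
    'MaxEpisodes', 1000, ...
    'MaxStepsPerEpisode', 200, ...
    'StopTrainingCriteria', 'AverageReward', ...
    'StopTrainingValue', 90);                 % placeholder stopping threshold
trainStats = train(agent, env, trainOpts);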
This should help your agents learn the fastest paths to their destinations. Follow me so you can message me anytime with future MATLAB questions.