Discarded Messages with SPMD and labReceive ... why?

EvanThomas on 18 July 2022 (edited by EvanThomas on 20 July 2022)
I am using SPMD and trying to get some workers to communicate with each other. There is a flag they need to send and receive: whichever worker finishes and commits its job first sends out the flag, which the remaining workers should receive so that they do not commit their own work.
Here is some abstract code that hopefully gets across what I am trying to do. I would have thought the labBarrier at the bottom would ensure that every worker finishing second or later receives the flag from the first worker to finish. Some do, but I also get many warning messages similar to the following:
Lab 1:
Warning: An incoming message was discarded from lab 2 (tag: 2)
Some workers are indeed missing the message, even when they finish seconds after the flag was sent out.
How does labSend work? Am I missing something here?
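```matlab
% Body of the spmd block. agentVec (the worker indices) and Updates
% (one entry per worker) are assumed to be defined earlier.

% Emulating workers doing some variable-time task
pause(randi([1 15]));

% See if other workers got there first and sent an update
for i = 1:length(agentVec)
    if i ~= labindex && labProbe(i, 2)
        [Updates(i), srcWkrIdx, tag] = labReceive(i, 2);
    end
end
Updates(labindex) = 0;   % ignore this worker's own entry

if ~any(Updates)
    % Commit work
    flag = 1;
else
    % Otherwise take a nap
    flag = 0;
end
labSend(flag, agentVec(agentVec ~= labindex), 2);
labBarrier;
```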
3 Comments
EvanThomas on 19 July 2022 (edited by EvanThomas on 19 July 2022)
Hi Edric, thanks for the response. You can ignore the matching end, as this isn't the code I am running; it was just meant to be a simple, abstract example demonstrating the concept of what I am trying to do. (I have since edited the question and removed the extra "end".)
Right now, the flag is literally just a 1 or 0, which seems about as small as a message can get. Is it still possible that MPI considers this "large"?
I get the impression that MPI isn't a very reliable communication tool in terms of predictable behaviour. Is this generally true? If so, perhaps I can't achieve what I am hoping for.
Edric Ellis on 20 July 2022
I would actually say exactly the opposite - MPI is (generally) very reliable and predictable. I shall post an answer with a suggestion as to how you might proceed.
In the code that you've written, each worker is guaranteed to labSend to each other worker. However, each worker is not guaranteed to labReceive from each other worker, so mismatched sends and receives are guaranteed.
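To make the point about matching concrete, here is a minimal sketch (not code from the thread) of an exchange in which every send has exactly one matching receive. It uses labSendReceive in numlabs-1 rounds: in round r each worker sends to the worker r places ahead and receives from the worker r places behind, wrapping around, so no message is left unreceived. The flag value is a hypothetical stand-in. The trade-off is that every worker now waits for every other worker, so this synchronizes the workers much as a reduction function would.

```matlab
spmd
    % Hypothetical flag: only worker 1 claims to have committed its work
    myFlag = double(labindex == 1);

    % flags(k) will hold the flag sent by worker k
    flags = zeros(1, numlabs);
    flags(labindex) = myFlag;

    % Round r: send to the worker r places ahead and receive from the
    % worker r places behind, so each send meets exactly one receive
    for r = 1:numlabs-1
        dest = mod(labindex - 1 + r, numlabs) + 1;
        src  = mod(labindex - 1 - r, numlabs) + 1;
        flags(src) = labSendReceive(dest, src, myFlag);
    end

    % Every worker now holds the same, complete set of flags
    someoneCommitted = any(flags);
end
```

Because the receive is unconditional, no labProbe is needed and no message is left pending when the spmd block ends.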


Answers (1)

Edric Ellis on 20 July 2022
Using conditional receives in this way is not a robust way to get the workers to collaborate - you have an ordering problem that cannot be solved. I think you can probably achieve your goal by using one of the "reduction" functions which are designed to collect together results from multiple workers. In particular, you could try gcat to allow each worker to find out what happened on every other worker. gcat (effectively) collects values from all workers and concatenates them together on each worker. In this way, you don't need the labBarrier call either. Something a bit like this:
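```matlab
myResult = doSomeWork();       % doSomeWork is a placeholder for each worker's task
allResults = gcat(myResult);   % every worker receives every worker's result
% Now, choose what to do based on the results from all workers.
```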
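A slightly fuller sketch of this suggestion, applied to the "first to finish commits" problem from the question. It assumes that task duration measured with tic/toc is an acceptable proxy for finishing order (the labBarrier before tic only aligns the start times), and it replaces doSomeWork with a random pause. gcat gives every worker the full list of times, so each worker can decide locally whether it is the one that commits. As noted above, gcat waits for all workers.

```matlab
spmd
    labBarrier;                 % align start times so durations are comparable
    t0 = tic;
    pause(randi([1 3]));        % stand-in for doSomeWork()
    myTime = toc(t0);

    % Concatenate every worker's time into a row vector on every worker
    allTimes = gcat(myTime, 2);

    % The fastest worker commits; min picks the lowest index on an exact tie
    [~, winner] = min(allTimes);
    if labindex == winner
        flag = 1;   % commit work
    else
        flag = 0;   % do not commit
    end
end
```

Since every worker sees the same allTimes, they all agree on the winner without any point-to-point messages.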
1 Comment
EvanThomas on 20 July 2022 (edited by EvanThomas on 20 July 2022)
Thanks again for the feedback. Unfortunately, I'm not sure that will work for me: it looks like SPMD waits until it hears back from all workers, which makes sense given that gcat concatenates all their responses and they run asynchronously. So, if I understand correctly, the function containing gcat won't complete until all workers are done. That takes away the asynchronous behaviour I need in the next step.
For example, whenever Agent A is done, it needs data only from the workers that finished before it. So I would need a "partial" gcat, or some way to concatenate results from just the subset of workers that finished before Agent A. I'm not sure that is possible, though. Hopefully my description makes sense.
I felt like labSend and labReceive would be the only way to accomplish this. Unfortunately, that is not working either.
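One possible alternative for this asynchronous requirement, not suggested in the thread and outside spmd altogether: run each agent as a separate parfeval task and collect results on the client with fetchNext, which returns results in the order the tasks complete. Each time a task finishes, the client already holds the results of every task that finished earlier, which is the "partial gcat" described above. The difference is that the decision is made on the client rather than on the workers. The task function agentTask, the agent count, and the returned value are hypothetical placeholders; the sketch assumes Parallel Computing Toolbox and an available pool.

```matlab
function order = demoCompletionOrder()
% Sketch: process agent results in completion order with parfeval/fetchNext.
% agentTask and numAgents are hypothetical placeholders.
pool = gcp();          % start or reuse a parallel pool
numAgents = 4;

% Launch one asynchronous task per agent; each returns one output.
% Looping backwards sizes the futures array on the first iteration.
for k = numAgents:-1:1
    futures(k) = parfeval(pool, @agentTask, 1, k);
end

finished = {};                  % results from agents that are done so far
order = zeros(1, numAgents);    % agent indices in completion order
for n = 1:numAgents
    % Block until the next unread task completes; idx is its position
    % in futures and result is its single output
    [idx, result] = fetchNext(futures);
    % 'finished' holds only results from agents that completed before idx
    fprintf('Agent %d finished; %d agent(s) finished before it\n', ...
        idx, numel(finished));
    finished{end+1} = result; %#ok<AGROW>
    order(n) = idx;
end
end

function out = agentTask(k)
% Hypothetical agent work: a random delay, then a result tagged with k
pause(randi([1 5]));
out = k;
end
```

If an agent's follow-up step must itself run on a worker, the client could launch it with another parfeval call, passing in the results collected so far.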
