Why is it not possible to use breakpoints with Parallel Computing Toolbox workers?

Realising that it does not seem possible to use breakpoints in applications that use Parallel Computing Toolbox workers (https://uk.mathworks.com/matlabcentral/answers/459249-how-to-get-parpool-to-pause-at-breakpoints) is a massive blow, in my view.
- Why is that a problem? Are there any plans to support it?
- I understand that workers are commonly mapped to local machine cores, but doesn't the client, which does support breakpoints, also have to run on one of the available cores?
- So what is the difference that makes it possible to run the debugger on the client but not on the workers?
This is a massive disappointment. I cannot see how you can sell Parallel Computing Toolbox with this restriction. People working with MATLAB get used to the comfort of the debugger, which cannot be replaced by printf-style debugging. In the bigger picture, this limits MATLAB to smaller single-threaded applications, yet we know that any somewhat larger application requires multithreading. I started a project as a single-threaded application with the idea of converting it to multithreading at some point, and that point has now come. But after running into the breakpoint restriction, I realised that I have no arguments to "sell" this approach within the company and no justification for asking to buy more Parallel Computing Toolbox licences.

Accepted Answer

Walter Roberson on 6 Feb 2022
Parallel pools are not restricted to running on local cores: they can also run on compute clusters; https://www.mathworks.com/help/parallel-computing/run-code-on-parallel-pools.html
Parallel workers do not just run on "cores": each worker is associated with a different process.
The client does not have direct access to the address space of the worker: all data transfer is done by using inter-process messaging. (A different kind of messaging is used for SPMD than is used for parfor / parfeval).
It would not necessarily be impossible for MathWorks to provide a method to debug a remote process, but it would not be trivial.
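The separation between client and worker can be illustrated with a small, language-neutral sketch. It is written in Python rather than MATLAB and is not how Parallel Computing Toolbox is implemented; the names `WORKER_SOURCE` and `run_on_worker` are hypothetical. It only shows the general point: a worker in its own process keeps its variables in its own address space, and the client sees nothing except what the worker sends back in a message.

```python
import json
import subprocess
import sys

# Source code executed inside a separate worker process (hypothetical worker).
WORKER_SOURCE = """
import json, sys
task = json.loads(sys.stdin.read())           # message received from the client
local_state = [x * x for x in task["data"]]   # exists only in the worker's memory
json.dump({"result": sum(local_state)}, sys.stdout)  # the only data sent back
"""


def run_on_worker(data):
    """Send a task to a fresh worker process and return its reply message."""
    completed = subprocess.run(
        [sys.executable, "-c", WORKER_SOURCE],
        input=json.dumps({"data": data}),  # request travels over a pipe
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(completed.stdout)    # reply travels back over a pipe


reply = run_on_worker([1, 2, 3])
print(reply["result"])
# The worker's variable "local_state" never existed in the client process.
print("local_state" in globals())
```

A client-side debugger attached to the client process would have no view of `local_state` here, which is the same kind of obstacle described above for workers.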
  3 Comments
Walter Roberson on 6 Feb 2022
If I recall correctly, the communication with workers is by way of TCP for parfor and parfeval. The same mechanism is used for compute clusters and local processes.
This differs from spmd, which uses a message passing library that has an internal optimization layer that can use shared memory if the processes are on the same system.
When different processes are being used, the operating system does not allow direct access from the client to the memory of the worker, except under limited situations. Generally speaking, a process is able to directly access the memory only of a process it has directly started on the same host, unless the starting process is a privileged process, in which case it may be able to access the memory of another process on the same host.
The matter gets complicated when you are using a system that consists of several compute nodes tied together by high speed interconnects, with Non-Uniform Memory Access (NUMA): the nodes might each be running their own operating system instance, or there might be a single unified operating system. Large scale computers are hard to get right, especially if you need shared memory.
Because the same design is used for compute clusters as is used for local workers, and compute clusters are typically remote systems that you cannot directly start processes on, it is not generally possible for MATLAB to be the parent process of the workers. (And the system calls to gain access to the memory of a child process are quite different between Windows and Unix).
So we are left with two possibilities:
1. MathWorks could rewrite the code to make internal access to local workers fundamentally different from access to non-local workers, and use operating-system-dependent debugger facilities to drive the workers in order to provide MATLAB debugger services to local workers; or
2. MathWorks could write an inter-process (probably TCP-based) layer that could request access to variables and call stacks. This might potentially be somewhat easier: there just might be a fair bit of the needed framework already in place.
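To make the second possibility more concrete, here is a minimal sketch of what a request/reply inspection layer could look like in principle. It is written in Python, uses pipes instead of TCP to stay short, and is purely an assumption for illustration: the message format, `WORKER_SOURCE`, and `WorkerInspector` are hypothetical and say nothing about MATLAB's internals. The worker pauses at a fixed point and answers requests for a variable value or for its call stack until the client tells it to continue.

```python
import json
import subprocess
import sys

# Hypothetical worker: pauses inside compute() and serves inspection requests.
WORKER_SOURCE = r"""
import json, sys, traceback

def compute(n):
    partial = sum(range(n))        # worker-local variable
    return serve(locals())         # "breakpoint": wait for the client

def serve(frame_locals):
    for line in iter(sys.stdin.readline, ""):
        request = json.loads(line)
        if request["cmd"] == "get":
            reply = {"value": frame_locals.get(request["name"])}
        elif request["cmd"] == "stack":
            reply = {"stack": [f.name for f in traceback.extract_stack()]}
        else:                      # any other command resumes execution
            break
        print(json.dumps(reply), flush=True)

compute(5)
"""


class WorkerInspector:
    """Client side of the sketch: one JSON request and one JSON reply per line."""

    def __init__(self):
        self.proc = subprocess.Popen(
            [sys.executable, "-c", WORKER_SOURCE],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
        )

    def _send(self, message):
        self.proc.stdin.write(json.dumps(message) + "\n")
        self.proc.stdin.flush()

    def request(self, **message):
        """Send an inspection request and block until the worker replies."""
        self._send(message)
        return json.loads(self.proc.stdout.readline())

    def resume(self):
        """Let the worker run to completion and return its exit code."""
        self._send({"cmd": "continue"})
        self.proc.stdin.close()
        code = self.proc.wait()
        self.proc.stdout.close()
        return code


inspector = WorkerInspector()
print(inspector.request(cmd="get", name="partial"))
print(inspector.request(cmd="stack"))
inspector.resume()
```

Even this toy version has to decide where the worker pauses, how values are serialized, and how the worker is resumed, which hints at why such a layer is not a simple extension of local debugging.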
My memory is that historically MATLAB has been internally structured as several Java threads, with Java as the controlling process layer. However, we also know that MathWorks has been rewriting MATLAB to remove Java. That rewrite could potentially provide new opportunities for layering in remote debugging, but it could also potentially remove opportunities instead: we do not have sufficient information about the old internal structure and the rewrite process to know.
Java is being removed for several reasons:
  • the performance of Java is considered to be too slow
  • the Java object model is considered too limiting
  • Java added a per-desktop (including per-virtual-machine) license for execution; before that, Java developers had to pay for a developer license but end users did not have to pay Oracle to execute programs built on Java. Oracle is known for being aggressive in license compliance, and compliance audits are an utter pain. (And remember this is per desktop or VM, so my Windows boot partition would need a separate license from my macOS install on the same host, which would need a different license from Parallels running Windows importing from my Windows boot partition... and then there are the old boot partitions that I keep around in case I need to execute something not compatible with current releases... or because I want to test out compatibility of a new OS before committing to it. Oracle wants licenses for each of those, even though they cannot be run simultaneously...)
  • User interfaces are being rewritten in HTML5, which provides increased flexibility and customization possibilities, and which is also part of a market trend towards running software remotely such as cloud services or Software As A Service.
Note that in all this, I am not saying that progress is not happening behind the scenes (I would not know): I am saying that it is not just a simple extension to local debugging. Not having access to the address space of the workers is a Challenge.
Dusko Vujadinovic on 7 Feb 2022
Thanks a lot for the very detailed and comprehensive answer. Much appreciated. I now see the scale of the problem.
Indeed, it looks very difficult to support the debugger in a compute cluster environment, with the Java problems and so on.
But the question is whether it is worth having the ambition to support all of that, or whether the requirements could be scaled down to supporting a multi-process application running on a single core of a local computer.
I come from the wireless modem world. I see great potential for MATLAB there, with code generation, the Parallel Computing Toolbox features, and the fixed-point framework support, to massively accelerate the modem development process. But this lack of debugging support for multi-process applications reduces that potential.
A modem is typically built by mapping the signal processing and the control code onto a few cores. Irrespective of MATLAB, I do not think that people like me expect to have a debugger running over multiple cores. It is not reasonable to run a real-time application and debug it by stepping through and setting breakpoints across multiple cores. The system is debugged in a simulation model on a single core first, before it is deployed on the real target. Logs and traces, rather than the debugger itself, are used for debugging on the real-time target.
However, the simulation model is run as a multi-process rather than a single-process application, and that is the expectation for the MATLAB simulation model as well. It is the multi-process design that reflects the real-time dynamics of the application; multi-core is another level, where I do not expect much help from MATLAB. Running the MATLAB application on multiple cores is "only" a matter of speeding it up.
Therefore, supporting the MATLAB debugger with multiple processes on a single core would probably be the most that people coming from my world expect. Of course, MATLAB is used in a number of fields I have no clue about, so I cannot see the big picture. This feedback is just my small input as a MATLAB customer from one specific area.

Release

R2018a
