Why is it not possible to use breakpoints with Parallel Toolbox workers?
Realising that it does not seem possible to use breakpoints in applications that use Parallel Toolbox workers (https://uk.mathworks.com/matlabcentral/answers/459249-how-to-get-parpool-to-pause-at-breakpoints) is, in my view, a massive blow.
- Why is that a problem? Are there any plans to support it?
- I understand that workers are commonly mapped to local machine cores, but doesn't the client, which does support breakpoints, also have to run on one of the available cores?
- So what is the difference that makes it possible to run the debugger on the client but not on the workers?
This is a massive disappointment. I cannot see how you can sell Parallel Toolbox with this restriction. People working with MATLAB get used to the comfort of a debugger, and that cannot be replaced by printf-style debugging. In the bigger picture, this limits MATLAB to smaller, single-threaded applications, yet we know that any somewhat larger application requires multithreading. I started a project as a single-threaded application with the idea of converting it to multithreading at some point, and that point has now come. But after running into the breakpoint restriction, I realised that I have no arguments to "sell" this approach within the company and no justification for asking to buy more Parallel Toolbox licences.
Accepted Answer
Walter Roberson
6 February 2022
Parallel pools are not restricted to running on local cores: they can also run on compute clusters (see https://www.mathworks.com/help/parallel-computing/run-code-on-parallel-pools.html).
Parallel workers do not merely run on "cores": each worker is a separate process.
The client has no direct access to the address space of a worker: all data transfer is done through inter-process messaging. (SPMD uses a different kind of messaging than parfor / parfeval.)
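To illustrate the separate-address-space point in a language-neutral way, here is a minimal sketch in Python, not MATLAB, and not how Parallel Computing Toolbox is implemented. Python's multiprocessing module stands in for a parallel pool and a Pipe stands in for the inter-process messaging. The worker changes its own copy of a variable; the client only learns the new value because the worker sends it as a message, and the client's own copy is never touched.

```python
import multiprocessing as mp

# Module-level state: every process ends up with its own private copy of this dict.
counter = {"value": 0}


def worker_task(conn):
    """Runs in a separate process: mutate the worker's copy, then report back."""
    counter["value"] += 100        # changes only the worker's address space
    conn.send(counter["value"])    # a message is the only way the client finds out
    conn.close()


def run_demo():
    parent_conn, child_conn = mp.Pipe()
    proc = mp.Process(target=worker_task, args=(child_conn,))
    proc.start()
    reported = parent_conn.recv()  # value arrives as a copy inside a message
    proc.join()
    return reported, counter["value"]  # the client's copy is unchanged


if __name__ == "__main__":
    reported, local = run_demo()
    print(f"worker reported {reported}, client still sees {local}")
    # prints "worker reported 100, client still sees 0"
```

A debugger attached to the client has the same problem as the client code here: the worker's variables live in another process, so there is nothing local for it to inspect.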
It would not necessarily be impossible for MathWorks to provide a way to debug a remote process, but it would not be trivial.
Comments: 3
Walter Roberson
6 February 2022
If I recall correctly, communication with workers for parfor and parfeval goes over TCP. The same mechanism is used for compute clusters and for local processes.
This differs from spmd, which uses a message-passing library with an internal optimization layer that can use shared memory when the processes are on the same system.
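The shared-memory optimization can be illustrated, again only by analogy, with Python's multiprocessing.shared_memory module (Python 3.8 or later). This is not MATLAB's message-passing library; it simply shows that two processes on the same host can map the same block of memory, so bytes written by one are visible to the other without being sent as a message. A worker on a remote cluster node could not share memory this way.

```python
import multiprocessing as mp
from multiprocessing import shared_memory


def fill_block(name, size):
    """Worker side: attach to an existing shared block by name and write into it."""
    shm = shared_memory.SharedMemory(name=name)
    for i in range(size):
        shm.buf[i] = i * 2         # lands directly in memory the client also maps
    shm.close()                    # detach; the client still owns the block


def run_demo(size=8):
    shm = shared_memory.SharedMemory(create=True, size=size)
    try:
        proc = mp.Process(target=fill_block, args=(shm.name, size))
        proc.start()
        proc.join()                # wait for the worker; no data message is sent
        data = bytes(shm.buf[:size])
        return data
    finally:
        shm.close()
        shm.unlink()               # free the block once both sides are done


if __name__ == "__main__":
    print(list(run_demo()))        # prints [0, 2, 4, 6, 8, 10, 12, 14]
```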
When different processes are used, the operating system does not allow the client direct access to a worker's memory, except in limited situations. Generally speaking, a process can directly access only the memory of a process that it started itself on the same host, unless it is a privileged process, in which case it may be able to access the memory of other processes on the same host.
The matter gets more complicated on systems that consist of several compute nodes tied together by high-speed interconnects with Non-Uniform Memory Access (NUMA): each node might run its own operating system instance, or there might be a single unified operating system. Large-scale computers are hard to get right, especially if you need shared memory.
Because the same design is used for compute clusters as for local workers, and compute clusters are typically remote systems on which you cannot directly start processes, it is not generally possible for MATLAB to be the parent process of the workers. (And the system calls for accessing the memory of a child process are quite different between Windows and Unix.)
So we are left with two possibilities:
1. MathWorks could rewrite the code so that internal access to local workers is fundamentally different from access to non-local workers, and use operating-system-dependent debugger facilities to drive the local workers in order to provide MATLAB debugger services for them; or
2. MathWorks could write an inter-process (probably TCP-based) layer that could request access to variables and call stacks. This might be somewhat easier: a fair amount of the needed framework might already be in place.
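Option 2 could look roughly like the following minimal Python sketch. Everything in it is hypothetical: the InspectionServer class, the one-line JSON request format, and the get_var / stack commands are invented for illustration and say nothing about how MathWorks would actually do it. For brevity the "worker" agent runs in a thread of the same process, but the client reaches it only through a TCP socket, so the same protocol would work if the agent lived in a separate worker process.

```python
import json
import socket
import threading
import traceback


class InspectionServer:
    """Hypothetical worker-side agent that answers debugger-style queries over TCP."""

    def __init__(self, workspace):
        self.workspace = workspace  # stands in for the worker's variables
        # Port 0 asks the OS for any free port; create_server also starts listening.
        self.sock = socket.create_server(("127.0.0.1", 0))
        self.port = self.sock.getsockname()[1]

    def handle(self, request):
        """Turn one decoded request into a reply dictionary."""
        cmd = request.get("cmd")
        if cmd == "get_var":
            name = request.get("name")
            if name in self.workspace:
                return {"ok": True, "value": self.workspace[name]}
            return {"ok": False, "error": f"no variable {name!r}"}
        if cmd == "stack":
            # Function names on the serving thread's stack, outermost first.
            # A real debugger would need the stack of the computing thread instead.
            return {"ok": True, "frames": [f.name for f in traceback.extract_stack()]}
        return {"ok": False, "error": f"unknown command {cmd!r}"}

    def serve(self, count=1):
        """Accept `count` connections: one JSON line in, one JSON line out each."""
        for _ in range(count):
            conn, _addr = self.sock.accept()
            with conn, conn.makefile("rw") as stream:
                reply = self.handle(json.loads(stream.readline()))
                stream.write(json.dumps(reply) + "\n")
                stream.flush()

    def close(self):
        self.sock.close()


def query(port, request):
    """Client side: send one request to the agent and wait for its reply."""
    with socket.create_connection(("127.0.0.1", port)) as conn, conn.makefile("rw") as stream:
        stream.write(json.dumps(request) + "\n")
        stream.flush()
        return json.loads(stream.readline())


if __name__ == "__main__":
    server = InspectionServer({"x": 42, "label": "iteration 7"})
    agent = threading.Thread(target=server.serve, args=(2,))
    agent.start()
    print(query(server.port, {"cmd": "get_var", "name": "x"}))
    print(query(server.port, {"cmd": "stack"}))
    agent.join()
    server.close()
```

The hard parts that this sketch skips are exactly the ones described above: pausing the computation at a breakpoint, reporting the stack of the thread that is actually doing the work, and serializing arbitrary MATLAB values rather than JSON-friendly ones.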
My memory is that MATLAB has historically been structured internally as several Java threads, with Java as the controlling process layer. However, we also know that MathWorks has been rewriting MATLAB to remove Java. That rewrite could provide new opportunities for layering in remote debugging, but it could also remove opportunities: we do not have enough information about the old internal structure or the rewrite process to know.
Java is being removed for several reasons:
- Java's performance is considered too slow
- the Java object model is considered too limiting
- Java licensing changed to require a per-desktop (including per-virtual-machine) license for execution; before that, Java developers had to pay for a developer license, but end users did not have to pay Oracle to run programs built on Java. Oracle is known for being aggressive about license compliance, and compliance audits are an utter pain. (And remember that this is per desktop or VM, so my Windows boot partition would need a separate license from my MacOS install on the same host, which would need a different license from Parallels running Windows imported from my Windows boot partition... and then there are the old boot partitions that I keep around in case I need to run something that is not compatible with current releases, or because I want to test the compatibility of a new OS before committing to it. Oracle wants licenses for each of those, even though they cannot be run simultaneously...)
- User interfaces are being rewritten in HTML5, which provides greater flexibility and customization, and which is also part of a market trend towards running software remotely, such as cloud services or Software as a Service.
Note that in all this, I am not saying that progress is not happening behind the scenes (I would not know): I am saying that it is not just a simple extension of local debugging. Not having access to the address space of the workers is a challenge.