Issues with reproducibility in multistart with parallelization
I am running several model fits (1139) using multistart with parallelization. I first ran with 50 start points (including my initial guess). I then wanted to re-run with 150 start points, to compare the reduction in fval. So, when running the first 50, I batched my fits manually across a handful of nodes, and saved the rng state to a mat file. When running the 150, I batched the fits in the same way, except for a set of about 100 where that node was unavailable, and loaded the rng state.
To test the effect of the node (I think I read the node influences the number generator, and was curious what would happen with reproducibility) I ran that set of 100 fits across two nodes, loading the same state each time. In this case, I got identical outputs, identical function values.
However, I did not get good reproducibility from the 50 start point run to the 150 start point run. 27% of the 1139 fits were worse (had higher function values in a minimization problem) than the 50 start point fits. I also found that of the 1139 fits, 17% had an fval more than 1% higher, and 3% had an fval more than 10% higher. I thought maybe it's rounding or something, but this seems pretty high.
What am I missing? How can I make these fits reproducible?
Answers (1)
I don't see why you would expect agreement between a 50-point multistart and a 150-point multistart. Only if both versions succeed in finding the global minimum would the results be guaranteed to agree.
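If the goal is for the 150-point result never to be worse than the 50-point result, one option is to make the start sets nested by construction rather than relying on the random state. The following is a minimal sketch in base MATLAB, not the poster's setup: the objective, the bounds, the seed, and the use of fminsearch as the local solver are all assumptions chosen for illustration.

```matlab
% Hypothetical 1-D objective with several local minima (not from the thread)
objective = @(x) sin(3*x) + 0.1*(x - 1).^2;

rng(0, 'twister');                  % fixed seed so the point set is repeatable
lb = -5; ub = 5;                    % assumed search bounds
allStarts = lb + (ub - lb) * rand(150, 1);   % draw all 150 points once

% Run a deterministic local solver from every start point
fvals = zeros(150, 1);
for k = 1:150
    [~, fvals(k)] = fminsearch(objective, allStarts(k));
end

best50  = min(fvals(1:50));         % "50-point run" = first 50 of the same set
best150 = min(fvals);               % "150-point run" = the full set
```

Because the first 50 start points are literally a subset of the 150 and the local solver is deterministic, best150 cannot exceed best50 here. With the Global Optimization Toolbox, a fixed matrix of points can be handed to MultiStart through a CustomStartPointSet object, which should allow the same nesting, though that was not tested in this thread.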
Comments: 4
If you have the same initial random state, wouldn't the 50-point multistart be contained within the first 50 start points of the 150-point multistart?
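This can be checked in base MATLAB without assuming anything about MultiStart itself. The sketch below restores the same generator state twice and draws 50 and then 150 three-dimensional points in a single call to rand each time; the seed and the dimension are arbitrary assumptions. Whether MultiStart builds its start points this way is not established in this thread.

```matlab
rng(1, 'twister');
A = rand(50, 3);      % 50 points in 3 dimensions (dimension is an assumption)

rng(1, 'twister');
B = rand(150, 3);     % 150 points drawn from the same initial state

% rand fills its output column by column, so only the first column lines up
sameFirstColumn = isequal(A(:, 1), B(1:50, 1));
sameFirst50Rows = isequal(A, B(1:50, :));
```

The first column matches because both calls consume the same first 50 draws, but the second column of A uses draws 51 to 100 while the second column of B uses draws 151 to 300. Identical initial state therefore does not by itself imply that the smaller point set is a prefix of the larger one.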
I'm sure there are ways to examine directly what the starting points were in each case. However, your intuition seems to be that because 150 is an integer multiple of 50, the search domain will be subdivided in a way that includes the original 50. That reasoning doesn't necessarily work, e.g.,
linspace(0,1, 3)
linspace(0,1, 6)
Here 6 is an integer multiple of 3, but the second linspace does not include all points from the first linspace. The 0.5 is missing.
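The same point can be verified programmatically. This is a small sketch; the tolerance value is an assumption, used only to avoid exact floating-point comparison.

```matlab
a = linspace(0, 1, 3);   % 0, 0.5, 1
b = linspace(0, 1, 6);   % 0, 0.2, 0.4, 0.6, 0.8, 1

tol = 1e-12;             % assumed tolerance for comparing doubles
% For each point of a, test whether some point of b lies within tol of it
contained = arrayfun(@(v) any(abs(b - v) < tol), a);
missing = a(~contained); % points of the coarse grid absent from the fine grid
```

Here missing contains only 0.5, since the endpoints 0 and 1 appear in both grids.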
Renee
September 2, 2025
Matt J
September 3, 2025
You are quite welcome, but when/if you are convinced this is the correct answer please accept-click it.