Issues with reproducibility in MultiStart with parallelization

I am running 1139 model fits using MultiStart with parallelization. I first ran them with 50 start points (including my initial guess). I then wanted to re-run them with 150 start points to see how much fval would decrease. For the 50-point run, I batched the fits manually across a handful of nodes and saved the rng state to a MAT-file. For the 150-point run, I batched the fits the same way and loaded the saved rng state, except that the node for one set of about 100 fits was unavailable, so that set ran on a different node.
To test the effect of the node (I believe I read that the node can influence the random number generator, and I was curious how that would affect reproducibility), I ran that set of 100 fits on two different nodes, loading the same rng state each time. Both runs gave identical outputs and identical function values.
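The save-and-restore step described above can be sketched in base MATLAB as follows. This is a minimal sketch, not the poster's actual script: the seed 42, the 'twister' generator, and the file name rngstate.mat are hypothetical. It only shows that restoring a saved state reproduces the same draws; it does not run MultiStart or parallel workers.

```matlab
% Save the generator settings (type, seed, state) before the first run.
rng(42, 'twister');                 % hypothetical seed and generator
s = rng;                            % struct with fields Type, Seed, State
save('rngstate.mat', 's');          % hypothetical file name

firstDraws = rand(1, 5);            % draws made right after saving

% Later (e.g. in a new session): reload the saved settings and draw again.
loaded = load('rngstate.mat');
rng(loaded.s);
secondDraws = rand(1, 5);

sameDraws = isequal(firstDraws, secondDraws)   % true: same state, same stream
```

Restoring the state makes the stream of random numbers repeat, but whether two runs see the same start points also depends on how many numbers each run draws and in what layout.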
However, the 150-point run did not reproduce the 50-point results well. In this minimization problem, 27% of the 1139 fits were worse (had a higher function value) with 150 start points than with 50. Of the 1139 fits, 17% had an fval more than 1% higher, and 3% had an fval 10% higher. I wondered whether it was rounding or something similar, but these numbers seem high.
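One way to compute percentages like these from two vectors of per-fit minima is sketched below. The four fval values are made-up placeholders, not the poster's data, and the sketch assumes no fval from the 50-point run is zero, since it divides by it.

```matlab
% Hypothetical per-fit minima from the two runs (one entry per model fit).
fval50  = [10.0; 4.0; 7.5; 2.0];
fval150 = [10.0; 4.2; 7.0; 2.5];

% Relative change of each fit; assumes fval50 contains no zeros.
relIncrease = (fval150 - fval50) ./ abs(fval50);

fracWorse  = mean(fval150 > fval50)      % share of fits with a higher fval
fracOver1  = mean(relIncrease > 0.01)    % share more than 1% higher
fracOver10 = mean(relIncrease > 0.10)    % share more than 10% higher
```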
What am I missing? How can I make these fits reproducible?

Answers (1)

Matt J, answered 2 September 2025 (edited 2 September 2025)

1 vote

I don't see why you would expect agreement between a 50-point multistart and a 150-point multistart. Only if both versions succeed in finding the global minimum would the results be guaranteed to agree.

4 Comments

Renee, 2 September 2025 (edited 2 September 2025)
I thought the minimum from a 150-point multistart would be equal to or less than the minimum from the 50-point multistart if the initial random state was the same. Is that not right? If you have the same initial random state, wouldn't the 50-point multistart be contained within the first 50 start points of the 150-point multistart?
It's not changing my results much; I just want to make sure I understand, for reproducibility purposes!
On second thought, maybe the start points are determined such that the sets would be different.
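Even with the same seed, whether the first 50 points of a 150-point draw match a 50-point draw depends on how the points are generated. A minimal sketch, assuming (as a hypothetical simplification, not documented MultiStart internals) that k start points in n dimensions come from a single call rand(k, n): because MATLAB fills a matrix column by column, the first 50 rows of a 150-by-n draw differ from a 50-by-n draw whenever n > 1. The dimension n = 3 and seed 0 are arbitrary.

```matlab
n = 3;                          % hypothetical number of fitted parameters

rng(0, 'twister');
A = rand(50, n);                % 50 points from a fixed state

rng(0, 'twister');
B = rand(150, n);               % 150 points from the same state

sameFirstColumn = isequal(A(:,1), B(1:50,1))   % true: both use draws 1..50
sameFirst50     = isequal(A, B(1:50,:))        % false: A(:,2) uses draws 51..100,
                                               % B(1:50,2) uses draws 151..200
```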
Matt J
Quoting Renee: "If you have the same initial random state, wouldn't the 50-point multistart be contained within the first 50 start points of the 150-point multistart?"
I'm sure there are ways to examine directly what the starting points were in each case. However, your intuition seems to be that because 150 is an integer multiple of 50, the search domain will be subdivided in a way that includes the original 50. That reasoning doesn't necessarily work, e.g.,
linspace(0,1, 3)
ans = 1×3
0 0.5000 1.0000
linspace(0,1, 6)
ans = 1×6
0 0.2000 0.4000 0.6000 0.8000 1.0000
Here 6 is an integer multiple of 3, but the second linspace does not include all points from the first linspace. The 0.5 is missing.
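The same check can be written with ismembertol, together with one extra case chosen for illustration: a uniform grid with n1 points on [0, 1] is contained in one with n2 points only when (n2 - 1) is a multiple of (n1 - 1). So linspace(0,1,5) contains every point of linspace(0,1,3), while linspace(0,1,6) does not.

```matlab
coarse = linspace(0, 1, 3);     % 0, 0.5, 1
fine6  = linspace(0, 1, 6);     % spacing 0.2, so 0.5 is not a grid point
fine5  = linspace(0, 1, 5);     % spacing 0.25, so 0.5 is a grid point

% ismembertol compares within a small floating-point tolerance.
in6 = ismembertol(coarse, fine6)   % 1 0 1: the 0.5 is missing
in5 = ismembertol(coarse, fine5)   % 1 1 1: every coarse point is present
```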
Renee
That must be it. I was assuming the start points were random for some reason. This makes sense, thank you.
Matt J
You are quite welcome, but if and when you are convinced this is the correct answer, please click Accept.


Category: Global or Multiple Starting Point Search

Asked: 2 September 2025

Last comment: 3 September 2025
