Reproducibility of convolutional neural network training with GPU
Hello,
I am training a CNN on my local GPU (to speed up training) for a classification problem, and I would like to try different parameterizations. To avoid variability due to different data and/or weight initialization, I reset the random seeds each time before training:
% Initialize the random seeds (so the same dataset on the same
% architecture should lead to a predictable result)
rng(0);
%parallel.gpu.rng(0, 'CombRecursive');
randStream = parallel.gpu.RandStream('CombRecursive', 'Seed', 0);
parallel.gpu.RandStream.setGlobalStream(randStream);
% Train the CNN network
net = trainNetwork(TR.data,TR.reference,layers,options);
The problem is that when using the GPU I get different results on each execution, even though I initialize the GPU random seed to the same value. The strange thing is that if I use the CPU instead, I do get reproducible results. Am I doing something wrong with the GPU random seed initialization? Is there a known problem in this situation, or something I am missing?
Thanks in advance.
PS: I am using MATLAB R2017b.
Accepted Answer
Joss Knight
20 September 2018
Use of the GPU has non-deterministic behaviour. You cannot guarantee identical results when training your network, because the outcome depends on the whims of floating-point precision and on parallel computations of the form (a + b) + c ~= a + (b + c).
Most of our GPU algorithms are in fact deterministic but a few are not, for instance, backward convolution.
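The non-associativity Joss mentions is easy to demonstrate in a few lines of MATLAB. Because IEEE 754 doubles round after every operation, the result of a sum depends on the order of accumulation:

```matlab
% Floating-point addition is not associative: rounding after each
% operation makes the result depend on the accumulation order.
a = 0.1; b = 0.2; c = 0.3;
isequal((a + b) + c, a + (b + c))   % logical 0 (false)
% The two sums differ by one unit in the last place:
((a + b) + c) - (a + (b + c))       % approx 1.1102e-16
```

On a GPU, thousands of threads accumulate partial sums in whatever order they happen to finish, so each run can produce a slightly different ordering and therefore slightly different gradients and weights.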
14 Comments
Very interesting and good to know! Thank you.
I am encountering the same issue, and I am very surprised and, I should say, very disappointed by MathWorks. As a MATLAB user since version 3.5, I cannot imagine that people developing software can accept their code not being reproducible. It's a joke! MathWorks has to correct this bug or propose a solution to customers: what about moving single-precision GPU code to double precision, now that this is available? (And you claim it comes from the whims of floating-point precision.)
Can you let us know what non-deterministic behaviour it is that you're experiencing, specifically? As far as I'm aware deep learning training is the only place this happens, and that particular behaviour is true across all the deep learning frameworks because they use the same underlying NVIDIA library that has this behaviour. Maybe there is some randomness in your particular application that we're missing?
Hello,
@Joss Knight (or any other MATLAB staff member), my colleague referred to this link and said that it is now possible to achieve deterministic results in TensorFlow for deep learning algorithms on the GPU.
Is this something that MATLAB will be / is able to implement in the near future?
Thanks,
Barry
Joss Knight
3 September 2020
Edited: Joss Knight
3 September 2020
I believe we have a plan to add support for deterministic training in a future release. As I say, as far as I know backward convolution and backward max-pooling are the only sources of indeterminism (other than certain kinds of parallel training) which means the problem is limited to training a deep network. If you know of other sources let me know.
@Joss Knight Repeatability and reproducibility are extremely important. How can someone even consider using MATLAB deep learning software for serious science if repeating the experiment yields slightly different results every time? I hope the plan to add deterministic behaviour to future releases happens sooner rather than later. It's unfortunate that this was not made a priority in the 2021 release.
People use TensorFlow and PyTorch all the time for serious science, and they have the exact same issue, so I guess people don't consider it that bad a problem. You should only see this indeterminism during training, which is typically initialized with random numbers anyway.
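As the original poster observed, training on the CPU is bit-for-bit reproducible. Until deterministic GPU training is available, one workaround (at a significant speed cost) is to pin the execution environment to the CPU in trainingOptions. A minimal sketch, reusing the asker's TR struct and layers and assuming an SGDM solver; the solver and epoch count are illustrative, not from the thread:

```matlab
% Fix the seed, then force training onto the CPU, where the
% reduction order is deterministic and runs are repeatable.
rng(0);
options = trainingOptions('sgdm', ...
    'ExecutionEnvironment', 'cpu', ...   % avoid GPU non-determinism
    'MaxEpochs', 10, ...
    'Shuffle', 'never');                 % keep mini-batch order fixed
net = trainNetwork(TR.data, TR.reference, layers, options);
```

'Shuffle','never' removes another source of run-to-run variation; alternatively, keep shuffling but reset rng to the same seed before each run so the shuffle order itself is repeatable.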
Aled Catherall
4 February 2022
Edited: Aled Catherall
4 February 2022
@Joss Knight - Has progress been made on fixing this issue? The lack of deterministic, repeatable training is proving to be quite a problem for some applications. For example, when I make a small change to the input data or the network, I want to know that differences in my results are due to the changes I have made and not to the vagaries of non-deterministic floating-point arithmetic. An update on this issue would be welcome, thanks.
Also, please note that you shouldn't be using the term "random numbers" but rather pseudorandom numbers, since they are generated by MATLAB from a deterministic algorithm and not by a stochastic process (like nuclear decay).
We are working on a solution and will let you know when it lands!
Joss Knight: I'm looking forward to seeing it soon. Please hurry
@Joss Knight, can you perhaps link some references that say that backward convolution and backward max pooling are non-deterministic?
@Joss Knight have you found a solution?
I am also facing the same problem
More Answers (0)