Making use of multiple hard drives to avoid IO bottlenecks?

Science Machine on 20 Jun 2022
Commented: Science Machine on 21 Jun 2022
I am reading in a lot of data (1.5 TB), so I would like to minimize disk IO.
  • I have 4 NVMe drives (2 TB each)
  • 'a lot' of RAM (okay, a lot = 128 GB, which may in fact not be that much)
  • I have data that I would like to postprocess in MATLAB
  • I am using parfor loops to read the data
Typically, I would put all the data on 1 drive. Even though NVMe drive IO is quite quick (~4000 MB/s), my question is:
  • Would it make sense to distribute the data to be postprocessed across all 4 drives, from which MATLAB would then read it, in order to minimize IO bottlenecks?
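For scale, here is a back-of-the-envelope estimate using the figures above. It assumes decimal units (1 TB = 1e12 bytes, 1 MB = 1e6 bytes), sustained sequential reads at the quoted rate, and perfectly linear scaling across drives, which real hardware is unlikely to fully reach. The variable names are only illustrative.

```matlab
% Rough read-time estimate from the numbers in the question.
dataBytes   = 1.5e12;   % 1.5 TB of data (decimal units assumed)
rateBytes   = 4000e6;   % ~4000 MB/s per NVMe drive (figure quoted above)
numDrives   = 4;

tOneDrive   = dataBytes / rateBytes;    % seconds if everything sits on one drive
tFourDrives = tOneDrive / numDrives;    % ideal case: all four drives read in parallel

fprintf('One drive:   %.1f s\n', tOneDrive);
fprintf('Four drives: %.1f s (ideal upper bound on the gain)\n', tFourDrives);
```

Since 128 GB of RAM is far less than 1.5 TB, the data has to be streamed through in pieces either way, so the read rate matters for the whole run.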

Accepted Answer

Walter Roberson on 20 Jun 2022
You should ideally distribute the data to different drives and distribute the drives to different controllers.
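A minimal sketch of what distributing the data could look like on the MATLAB side. The drive folder names (`nvme0` to `nvme3`), the file names, and the file contents are all hypothetical; temporary folders stand in for the four mount points so the sketch runs anywhere. Files are assigned to drives round-robin, then read in a parfor loop. Note that parfor does not guarantee which worker gets which iteration, so this only makes it likely, not certain, that concurrent reads land on different drives.

```matlab
% Hypothetical per-drive folders; replace with the real mount points.
% Temporary folders are used here as stand-ins for the four NVMe drives.
base = tempname;
driveNames = {'nvme0', 'nvme1', 'nvme2', 'nvme3'};
driveRoots = cell(size(driveNames));
for d = 1:numel(driveNames)
    driveRoots{d} = fullfile(base, driveNames{d});
    mkdir(driveRoots{d});
end

% Create small stand-in data files, assigned to drives round-robin.
numFiles  = 8;
filePaths = cell(1, numFiles);
for k = 1:numFiles
    d = mod(k - 1, numel(driveRoots)) + 1;   % drive index 1..4
    filePaths{k} = fullfile(driveRoots{d}, sprintf('chunk_%03d.bin', k));
    fid = fopen(filePaths{k}, 'w');
    fwrite(fid, k * ones(1, 1000), 'double');
    fclose(fid);
end

% Read the files in parallel; each iteration opens one file on one drive.
checksums = zeros(1, numFiles);
parfor k = 1:numFiles
    fid  = fopen(filePaths{k}, 'r');
    data = fread(fid, Inf, 'double');
    fclose(fid);
    checksums(k) = sum(data);                % placeholder for real postprocessing
end
```

Whether this actually beats a single drive depends on the controller and lane layout, so it is worth timing both layouts with tic/toc on a representative subset of the data.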
However, you might be constrained by your architecture. I seem to recall reading about some architectures that could only handle three full-width PCIe devices, so the fourth one had to run at half speed. You also need to take into account that the other drives on your system will need some lanes. PCIe cannot allocate (for example) 12 lanes to one device and 2 to each of two other devices for a total of 16: if I recall correctly, you can only allocate powers of 2, so the first device could get 8 and the other two 2 each, with the remaining 4 unused.
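The lane example above can be worked through as a short calculation. This assumes the power-of-two rule as recalled in this answer; what a given motherboard actually allows depends on its chipset and firmware settings, so treat it as an illustration of the arithmetic only.

```matlab
% Worked version of the 12 + 2 + 2 lane example, assuming link widths
% must be powers of two (the rule as recalled above, not a platform spec).
totalLanes = 16;
requested  = [12 2 2];                      % lanes we would like per device

granted = 2 .^ floor(log2(requested));      % largest power of two <= each request
unused  = totalLanes - sum(granted);        % lanes left over

disp(granted);   % 8 2 2
disp(unused);    % 4
```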
You might be interested in some of the Linus Tech Tips videos, as in some of them he shows how difficult it can be to max out drives.
The reviews seem to say that in the mass market these days (as opposed to very low-volume specialty manufacturers), the Samsung 9x0 drives have close to the best read rates (though not always the best write rates compared to some of the small manufacturers).
While I am on the topic: anyone using external enclosures and needing high performance should look seriously at some of the Thunderbolt 4 NAS or DAS units. The performance ratings for well-designed enclosures are sometimes several times what you would get from low-cost mass-market drives.
  Comments: 3
Walter Roberson on 21 Jun 2022
If the cluster is a cloud computing service that emulates drives over some internal layer, then getting separate hardware would probably require a specific service agreement.
If the cluster can give you multiple drives, each on a separate controller, you would typically prefer that. If you are using spinning-platter drives, then two drives per controller is commonly the most efficient.
Science Machine on 21 Jun 2022
Great, thanks 😊

Release: R2021b
