Loading large binary files in Matlab, quickly

Josh, 20 August 2018
Latest comment: OCDER, 12 September 2018
I have some pretty massive data files (256 channels, on the order of 75-100 million samples per channel) in int16 format. They are written in flat binary format with the channels interleaved sample by sample, so the structure is: CH1S1, CH2S1, CH3S1, ..., CH256S1, CH1S2, CH2S2, ...
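To make the layout concrete, here is a small sketch (not from the thread): it writes a hypothetical 4-channel, 5-sample file in the interleaved order described above and reads it back as an [nChan, Inf] matrix, which puts each channel in its own row. The tiny sizes, the test values and the temporary file are assumptions for illustration only.

```matlab
% Sketch: the interleaved layout CH1S1, CH2S1, ..., CHnS1, CH1S2, ...
% Hypothetical tiny file: nChan = 4, nSamp = 5 (the real data is 256 x ~1e8).
nChan = 4;
nSamp = 5;

% Test data where data(ch, s) = 100*ch + s, so each value identifies itself.
[s, ch] = meshgrid(1:nSamp, 1:nChan);
data = int16(100*ch + s);            % nChan x nSamp

% fwrite writes in column-major order, i.e. data(:,1) first, which is
% exactly CH1S1, CH2S1, ..., CHnS1, CH1S2, ...
fname = [tempname '.bin'];
fid = fopen(fname, 'w');
fwrite(fid, data, 'int16');
fclose(fid);

% Reading back as [nChan, Inf] de-interleaves everything in one contiguous
% read: row k of the result is channel k.
fid = fopen(fname, 'r');
block = fread(fid, [nChan, Inf], '*int16');
fclose(fid);
delete(fname);

ch3 = block(3, :);                   % all samples of channel 3
```

The point of the example is that a contiguous read of whole time steps returns every channel at once, with no skipping involved.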
I need to read in each channel separately, filter and offset-correct it, then save it. My current bottleneck is loading each channel, which takes about 7-8 minutes; scaled up 256 times, that is roughly 30-34 hours just to load the data! I am trying to use fread's skip argument to jump over the other channels while reading each one. I have the following code in a loop over all 256 channels:
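offset = i - 1;                                            % zero-based channel index (i = 1..nChan)
fseek(fid, offset*2, 'bof');                               % move to this channel's first sample (int16 = 2 bytes)
dat = fread(fid, [1, nSampsTotal], '*int16', (nChan-1)*2); % read one sample, then skip the other nChan-1 channels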
From what I have read, this is usually the fastest way to load part of a large binary file, but is the file simply too large to do this any faster? Any suggestions would be much appreciated!
System details: MATLAB R2017a, Windows 7, 64-bit
Comments: 4
dpb, 20 August 2018
How much RAM do you actually have? It sounds like the performance hit probably comes from being swapped in and out of virtual memory; fread itself is pretty quick for straight data transfers to and from memory.
Does the processing require the whole time series in memory, or can you do it piecewise on each channel?
You may just have a system limitation here...
Josh, 20 August 2018 (edited 20 August 2018)
We have 32GB installed, but it is a shared computer, so the availability varies.
For the processing, I only need one channel at a time, but for the filtering and offset correction I'm doing, I need the entire time series of each channel to avoid filtering artifacts that might arise from splitting it.
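Whether splitting causes artifacts depends on the filter. For a causal filter applied with MATLAB's `filter`, passing the final state `zf` of one chunk in as the initial state `zi` of the next gives the same output as filtering the whole series in one call. The sketch below checks that on a synthetic signal; the coefficients `b` and `a`, the signal and the chunk length are hypothetical stand-ins, not the filter used in the question. Zero-phase filtering with `filtfilt` runs backwards over the whole series and does not split this simply.

```matlab
% Sketch: chunked causal filtering with the filter state carried across chunks.
% b, a, the test signal and chunkLen are hypothetical stand-ins.
b = [0.2 0.2];                           % hypothetical first-order low-pass numerator
a = [1 -0.6];                            % hypothetical denominator (pole at 0.6, stable)
n = 10000;
x = sin(0.01*(1:n)) + 0.1*mod(1:n, 7);   % deterministic synthetic signal

yWhole = filter(b, a, x);                % reference: whole series in one call

chunkLen = 1234;                         % deliberately does not divide n
zi = zeros(max(numel(a), numel(b)) - 1, 1);   % start from rest
yChunked = zeros(size(x));
for k = 1:chunkLen:n
    idx = k:min(k + chunkLen - 1, n);
    % zi in = state left by the previous chunk; zi out = state for the next one
    [yChunked(idx), zi] = filter(b, a, x(idx), zi);
end

maxDiff = max(abs(yWhole - yChunked));   % expected to be at rounding level
```

If the offset correction depends on a whole-series statistic such as the channel mean, that part still needs either a separate pass to compute the statistic or the full channel in memory.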
I'm a bit confused about the RAM allocation, though. Since I'm only loading a subset of the data (using fread's skip parameter), it should be doable from a RAM standpoint: for the 37 GB file I'm testing now, one channel out of 256 should be only about 149 MB. Unless fread's skip option allocates memory in a way I don't know about?
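The sizes in this question can be checked with a little arithmetic. This sketch assumes the file is exactly 37 GiB and that the disk and operating system read data in 4 KiB blocks; both values are assumptions chosen for illustration, not measurements from the thread.

```matlab
% Back-of-the-envelope numbers for the 37 GB test file.
% Assumptions (hypothetical): exactly 37 GiB, 256 interleaved int16 channels,
% and disk reads that happen in 4 KiB blocks.
bytesPerSample = 2;                      % int16
nChan          = 256;
fileBytes      = 37 * 2^30;              % 37 GiB
blockBytes     = 4096;                   % assumed read granularity

nSamps         = fileBytes / (nChan * bytesPerSample);  % samples per channel
perChannelMiB  = nSamps * bytesPerSample / 2^20;        % size of one fread result
frameBytes     = nChan * bytesPerSample;                % one time step, all channels
framesPerBlock = blockBytes / frameBytes;               % time steps per 4 KiB block

% Every block holds samples of every channel, so each strided pass still pulls
% (roughly) the whole file from disk. One pass per channel, with no caching:
totalReadTiB   = nChan * fileBytes / 2^40;
```

Under these assumptions a channel has about 77.6 million samples (inside the 75-100 million range above) and one fread result is 148 MiB, which matches the ~149 MB estimate to within rounding, so the skip argument is not what costs memory. The likely cost is I/O: about 9.25 TiB read over 256 passes, because a 37 GB file cannot stay fully cached on a shared 32 GB machine.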

Accepted Answer

OCDER, 20 August 2018
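Seems like you have to use stream processing. Essentially, load N frames of data for all 256 channels, do the processing, save the result, and repeat until done. Going channel by channel and skipping 255 channels x 2 bytes after every sample is slow: the samples of one channel are only 512 bytes apart, so every block the disk delivers contains data for all channels, and each pass ends up reading essentially the whole file. Here are some examples of how to set that up.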
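A minimal sketch of the stream approach, under the assumption that the goal is to de-interleave the file once into one file per channel, so that each channel can afterwards be loaded with a single contiguous read and filtered as a whole. This is one possible variant, not necessarily what the linked examples do; the sizes, chunk length and `ch001.bin`-style file names are hypothetical.

```matlab
% Sketch: one sequential pass that de-interleaves the file into per-channel files.
% Hypothetical small sizes; for the real data nChan = 256 and nFrames is chosen
% so that nChan*nFrames*2 bytes fits comfortably in free RAM.
nChan   = 4;
nSamp   = 1003;
nFrames = 100;                           % time steps read per chunk

% Hypothetical interleaved input file: data(ch, s) = 1000*ch + s.
[s, ch] = meshgrid(1:nSamp, 1:nChan);
data = int16(1000*ch + s);               % nChan x nSamp
inName = [tempname '.bin'];
fid = fopen(inName, 'w');
fwrite(fid, data, 'int16');              % column-major = CH1S1, CH2S1, ...
fclose(fid);

% One output file per channel (hypothetical naming scheme).
outDir = tempname;
mkdir(outDir);
outFid = zeros(1, nChan);
for c = 1:nChan
    outFid(c) = fopen(fullfile(outDir, sprintf('ch%03d.bin', c)), 'w');
end

% Stream through the input once, nFrames time steps at a time.
fid = fopen(inName, 'r');
while true
    block = fread(fid, [nChan, nFrames], '*int16');   % nChan x (<= nFrames)
    if isempty(block)
        break;                           % end of file
    end
    % (per-chunk processing, e.g. filtering with carried state, could go here)
    for c = 1:nChan
        fwrite(outFid(c), block(c, :), 'int16');      % append this channel's samples
    end
end
fclose(fid);
for c = 1:nChan
    fclose(outFid(c));
end

% Each channel is now a single contiguous read.
fid = fopen(fullfile(outDir, 'ch003.bin'), 'r');
ch3 = fread(fid, [1, Inf], '*int16');
fclose(fid);

% Clean up the temporary files.
delete(inName);
delete(fullfile(outDir, 'ch*.bin'));
rmdir(outDir);
```

For the real file, nFrames = 1e6 would give chunks of 256 x 1e6 x 2 bytes = 512 MB. The input is read sequentially exactly once, and the later per-channel loads no longer skip 510 bytes after every sample.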
The other option is to buy more than 64 GB of RAM.
Comments: 7
OCDER, 21 August 2018
Nice! Glad it worked!
OCDER, 12 September 2018
Hi Livio, to get an answer to your problem, please create a separate Question post instead of replying to this thread, which is closed (an answer has been accepted).
Also, in your new Question post, format your code by selecting it and pressing the {} Code button.
This is how to format code:
for j = 1
end

More Answers (0)
