slicing a char array seems to internally allocate far more memory than expected

My laptop has 64 GB of RAM.
I am reading a large text file (> 8 GB) with s = fileread(filename).
The resulting char array is:
K>> whos('s')
  Name      Size                Bytes          Class    Attributes
  s         1x8656516488        17313032976    char
Grabbing everything but the first few characters (e.g., the first 200) results in this:
K>> v = s(201:end);
Requested 8656516288x1 (64.5GB) array exceeds maximum array size preference. Creation of arrays greater than this limit may take a long time and cause MATLAB to become unresponsive.
See array size limit or preference panel for more information.
It seems that slicing a char array allocates 8 bytes per character sliced. Is this expected?
[[Yes, I know of plenty of workarounds (datastores, reading the file in pieces, fscanf, etc.), but this code is already a workaround for the glacial speed of readtable(), so I don't need to have it working. It's just somewhat surprising.]]

Accepted Answer

Walter Roberson on 16 Feb 2022
The indexing 201:end is not internally optimized into passing subsref something like struct('type', '()', 'subs', {{201, ':', 8656516488}}) and expecting the internal implementation of subsref to handle the range.
What is done instead is that the colon operator 201:8656516488 is executed, producing a double precision array of length 8656516288 to hold the indices, and that array is passed to subsref. But that array of indices is 64 gigabytes...
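As a rough check of that figure, here is a small-scale sketch. It assumes the range is materialized as an ordinary double vector, as described above; the variable names are illustrative only.

```matlab
% Build a small index vector the same way 201:end would be expanded.
n = 1000;                      % small stand-in for the real length
idx = 201:n;                   % colon operator yields a double row vector

info = whos('idx');
bytesPerIndex = info.bytes / numel(idx);   % storage cost of one index

% Scale the per-index cost up to the array from the question.
nBig = 8656516488;             % numel(s) reported by whos
numIdx = nBig - 200;           % number of indices in 201:nBig
neededGiB = numIdx * bytesPerIndex / 2^30;

fprintf('class of idx: %s, %d bytes per index\n', class(idx), bytesPerIndex);
fprintf('index vector for 201:%d would need about %.1f GB\n', nBig, neededGiB);
```

The result matches both the element count (8656516288) and the 64.5GB figure in the error message, which suggests the limit is hit by the index vector rather than by the sliced char data.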
If you still need them, copy the first 200 elements out of the array first, and then delete them in place:
s(1:200) = [];
This only requires creating a double-precision index vector of length 200.
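A minimal sketch of this workaround on a small stand-in array (the random test string and the name head are assumptions for illustration; at this size the direct slice also works, so it can serve as a reference):

```matlab
% Small stand-in for the multi-gigabyte char array from the question.
s = char(randi([97 122], 1, 1000));   % 1000 random lowercase letters

expected = s(201:end);   % direct slice, affordable only at small scale

head = s(1:200);         % keep the leading characters if they are needed
s(1:200) = [];           % delete in place; index vector has 200 elements

fprintf('remaining length: %d, matches direct slice: %d\n', ...
        numel(s), isequal(s, expected));
```

Note that the deletion modifies s itself, so this only fits cases where the original array does not need to be preserved.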
  Comments: 3
Dmitry Kaplan on 17 Feb 2022
Thank you. That's exactly the road I was following. MATLAB is getting better at the "tens of gigabytes" scale, but isn't quite there with "hundreds of gigs or more" yet. Multithreaded C (or TypeScript) shreds through that much data, but yes, that's an unfair comparison. Thank you again for your help.


More Answers (1)

Sourav Karmakar on 16 Feb 2022
Hey Dmitry,
I tried to create a char array of size 1x8656516488, but because it exceeds the maximum array size, MATLAB throws an error. For example:
>> s = char(1:8656516488);
Requested 8656516488x1 (64.5GB) array exceeds maximum array size preference (48.0GB). This might cause
MATLAB to become unresponsive.
Since you have 64 GB of RAM, it should produce the same error. Try creating the char array 's' in your MATLAB workspace and check whether you get the same error message, because slicing off only 200 chars does not significantly change the memory required (compare the sizes of 's' and 'v').
You can refer to the following document for more reference:
Hope this helps!
  Comments: 1
Dmitry Kaplan on 16 Feb 2022
Edited: Dmitry Kaplan on 16 Feb 2022
Thank you. I believe that Walter's answer is getting to the crux of the problem. The real issue is that I can easily allocate two 8G int8 arrays; I simply can't copy huge pieces of one into another without the internal allocation of huge double arrays to hold the indices.
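One way to keep the index vectors bounded when copying a large piece of one array into another is to copy in blocks. This is not something proposed in the thread, only a hedged sketch; the names src, dst, startIdx and the block size chunk are hypothetical, and the arrays here are tiny stand-ins.

```matlab
% Small stand-in for a large int8 source array.
src = int8(randi([-128 127], 1, 1000));

startIdx = 201;                    % first element to copy
chunk = 256;                       % hypothetical block size; tune for real data
n = numel(src) - startIdx + 1;     % number of elements to copy

dst = zeros(1, n, 'int8');         % preallocate the destination

% Copy block by block so no index range exceeds chunk elements.
for first = 1:chunk:n
    last = min(first + chunk - 1, n);
    dst(first:last) = src(startIdx + first - 1 : startIdx + last - 1);
end

fprintf('copied %d elements in blocks of at most %d\n', n, chunk);
```

With a block size in the millions, each temporary index vector would stay in the tens of megabytes at most, at the cost of an explicit loop.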


Release

R2019a
