Are there any faster alternatives to readlines?

Question

Rahim Zaman on 6 Feb 2025

https://kr.mathworks.com/matlabcentral/answers/2173741-are-there-any-faster-alternatives-to-readlines

Answered: Steve Eddins on 7 Feb 2025

I have a MATLAB script that uses readlines to read input from two separate short text files: the first file once, and the second file multiple times inside a for loop. I am using the Profiler to optimize my script's run time, and readlines currently accounts for 46% of it (18 s). Are there any faster alternatives to readlines that would shorten the run time?

1 Comment

Steve Eddins on 6 Feb 2025

Can you share more details with us? Why is the second file being read more than once? What is the size of the file being read in the loop, and how many text lines does it contain? How many times does the loop execute? Are you able to post a sample file?
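If the file read inside the loop does not change between iterations (an assumption; the question does not say), one option is to read it once before the loop and reuse the result. A minimal sketch, using the license file from the answers below as a stand-in for the real input; the loop count and the per-iteration computation are hypothetical placeholders:

```matlab
% Stand-in input file; replace with the file that is read inside the loop.
filename = fullfile(matlabroot, "license_agreement.txt");
numIterations = 100;                      % hypothetical loop count

% Read the file once, outside the loop.
lines = readlines(filename);

nonEmptyCounts = zeros(numIterations, 1);
for k = 1:numIterations
    % Hypothetical per-iteration work on the in-memory string array,
    % instead of calling readlines again on every pass.
    nonEmptyCounts(k) = nnz(strlength(lines) > 0);
end
```

If the file's contents do change between iterations, this does not apply, and a faster reader is the remaining option.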

Answer 1

Steve Eddins on 7 Feb 2025

https://kr.mathworks.com/matlabcentral/answers/2173741-are-there-any-faster-alternatives-to-readlines#answer_1559249

Try this:

1. Read the file using fileread.
2. Convert to string.
3. Call split.

Using Walter's idea for a sample text file:

filename = fullfile(matlabroot,"license_agreement.txt");
chars = fileread(filename);
text = string(chars);
lines = split(text,newline);
whos lines
%   Name          Size             Bytes  Class     Attributes
%
%   lines      1627x1             235436  string
lines(1:5)
% ans = 5x1 string array
%     "The MathWorks, Inc.  Software License Agreement "
%     ""
%     "IMPORTANT NOTICE"
%     ""
%     "THIS IS THE SOFTWARE LICENSE AGREEMENT (THE "AGREEMENT") OF THE MATHWORKS, INC."

See my comment under Walter's example for detailed timing comparisons.
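split(text,newline) splits only on the LF character, so for a file with Windows (CRLF) line endings each line would keep a trailing carriage return; the regexp pattern '\r?\n' in Walter's answer handles both cases. A sketch, assuming CRLF handling matters for your files, that normalizes line endings with replace before splitting:

```matlab
filename = fullfile(matlabroot, "license_agreement.txt");
text = string(fileread(filename));

% Convert Windows CRLF line endings to LF so that splitting on newline
% does not leave a trailing carriage return on each line.
text = replace(text, sprintf("\r\n"), newline);

% One string element per line, returned as a column vector.
lines = split(text, newline);
```

The extra replace pass costs some time, so it is worth checking with timeit whether it matters for your files.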

Answer 2

Walter Roberson on 6 Feb 2025

https://kr.mathworks.com/matlabcentral/answers/2173741-are-there-any-faster-alternatives-to-readlines#answer_1559234

For the purposes of the below test, I will assume that it is important that the text be split into lines, but that it is not important whether those lines are represented as a string array or as a cell array of character vectors.

filename = fullfile(matlabroot,"license_agreement.txt");
tic; S1 = readlines(filename); t1 = toc; whos S1
%   Name         Size             Bytes  Class     Attributes
%
%   S1        1627x1             235436  string
tic; S2 = fileread(filename); S2a = regexp(S2, '\r?\n', 'split'); t2 = toc; whos S2a
%   Name      Size               Bytes  Class    Attributes
%
%   S2a       1x1627            360296  cell
tic; fid = fopen(filename); S3 = fread(fid, [1 inf], '*char'); fclose(fid); S3a = regexp(S3, '\r?\n', 'split'); t3 = toc; whos S3a
%   Name      Size               Bytes  Class    Attributes
%
%   S3a       1x1627            360296  cell
tic; fid = fopen(filename); S4 = textscan(fid, '%s', 'Delimiter', '\n'); fclose(fid); S4a = S4{1}; t4 = toc; whos S4a
%   Name         Size             Bytes  Class    Attributes
%
%   S4a       1626x1             354016  cell
format long g
[t1; t2; t3; t4]
% ans = 4×1
%                   0.195489
%                   0.004772
%                   0.001693
%                    0.00203

So the fastest is fread() followed by splitting, then textscan(), then fileread() followed by splitting. readlines is the slowest by a noticeable margin.

tic; S3b = string(S3a); toc
% Elapsed time is 0.000439 seconds.

If a string representation is necessary, converting the cell array of character vectors to a string array takes a small but measurable amount of time.

Note: textscan() handles the end of the file slightly differently from the alternatives, which is why it returns 1626 lines instead of 1627. The file ends in a newline. textscan() consumes that final newline, looks for more content, finds none, and treats the file as finished. The alternatives instead treat the newline as a separator and split at it, so they end up with a final empty string.
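If you want a split-based result to match textscan() here, one option is to drop a single empty final element when the file ends in a newline. A minimal sketch using the fileread/string/split approach:

```matlab
filename = fullfile(matlabroot, "license_agreement.txt");
lines = split(string(fileread(filename)), newline);

% A file that ends in a newline produces an empty final element;
% remove it so the line count matches textscan().
if ~isempty(lines) && lines(end) == ""
    lines(end) = [];
end
```

Only one trailing element is removed, so genuinely blank lines elsewhere in the file are kept.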

1 Comment

Steve Eddins on 7 Feb 2025

Walter, I think your readlines result may be suffering from first-time measurement effects.
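The first call to a function in a MATLAB session can include one-time costs (for example, loading the function into memory, or the file not yet being in the operating system's cache) that later calls do not pay. A minimal sketch of a manual workaround with tic/toc, making one untimed call before measuring; timeit is more robust because it runs the function several times and reports a typical time:

```matlab
filename = fullfile(matlabroot, "license_agreement.txt");

% Untimed warm-up call to absorb first-call costs.
readlines(filename);

% Timed call.
tic;
S1 = readlines(filename);
t1 = toc;
fprintf("readlines after warm-up: %.6f s\n", t1);
```

A single tic/toc measurement is still noisy, so it is best treated as a rough check.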

I took your very nice collection of methods, put them into functions, and timed them using timeit. Looks like readlines is slower, by roughly 2-3x. The other three methods you suggested are all in the same neighborhood and faster than readlines.

Another variation, fileread_string_split_method, looks like it may be faster than the others. The steps:

1. Read the file using fileread.
2. Convert to string.
3. Call split.

I know that the dev team who worked on string and its methods put a lot of effort into optimizing things.

filename = fullfile(matlabroot,"license_agreement.txt");
% Wrap each method in a zero-argument function handle so timeit can call it.
f_readlines_method = @() readlines_method(filename);
f_fileread_regexp_method = @() fileread_regexp_method(filename);
f_fread_regexp_method = @() fread_regexp_method(filename);
f_textscan_method = @() textscan_method(filename);
f_fileread_string_split_method = @() fileread_string_split_method(filename);
% timeit returns the typical run time in seconds.
t_readlines_method = timeit(f_readlines_method)
% t_readlines_method = 0.0025
t_fileread_regexp_method = timeit(f_fileread_regexp_method)
% t_fileread_regexp_method = 0.0010
t_fread_regexp_method = timeit(f_fread_regexp_method)
% t_fread_regexp_method = 9.6474e-04
t_textscan_method = timeit(f_textscan_method)
% t_textscan_method = 0.0014
t_fileread_string_split_method = timeit(f_fileread_string_split_method)
% t_fileread_string_split_method = 7.2574e-04

% Method 1: readlines (string array, one element per line).
function out = readlines_method(filename)
out = readlines(filename);
end
% Method 2: fileread, then split on LF or CRLF with regexp (cell array).
function out = fileread_regexp_method(filename)
out = regexp(fileread(filename), '\r?\n', 'split');
end
% Method 3: fopen/fread, then split on LF or CRLF with regexp (cell array).
function out = fread_regexp_method(filename)
fid = fopen(filename);
chars = fread(fid, [1 inf], '*char');  % whole file as a char row vector
fclose(fid);
out = regexp(chars, '\r?\n', 'split');
end
% Method 4: textscan with newline as the delimiter (cell array).
function out = textscan_method(filename)
fid = fopen(filename);
out_cell = textscan(fid, '%s', 'Delimiter', '\n');
fclose(fid);
out = out_cell{1};
end
% Method 5: fileread, convert to string, split on newline (string array).
function out = fileread_string_split_method(filename)
out = split(string(fileread(filename)),newline);
end
