Help! How to search for a byte sequence xx xx xx xx (hex) in a dat file very fast?
Help! I have a dat file of about 40 MB,
and I want to search it for xx xx xx xx (hex).
I can do it using a for or while loop, but that is too slow because of the 40 million bytes!
How can I speed it up? Thanks!
Accepted Answer
Guillaume
30 Jun 2019
Edited: Guillaume
30 Jun 2019
Unlike per isakson, I'm assuming that you're looking for a byte pattern (given in hexadecimal format) in a binary file. If you're looking for a pattern of hexadecimal characters in a text file see per's answer.
%input
hexpattern = ['41'; 'AB'; 'FF'; '7E']; %you haven't specified how this is stored. Taking a guess
filetosearch = 'C:\somewhere\somefolder\somefile.dat'; %doesn't have to have .dat extension
%read file
fid = fopen(filetosearch, 'r');
assert(fid > 0, 'Failed to open file. Most likely the wrong path was specified');
filecontent = fread(fid, [1 Inf], '*uint8'); %read all bytes at once
fclose(fid);
%pattern search
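patternvalues = uint8(hex2dec(hexpattern)).'; %row vector of the same class as filecontent
patternlocation = strfind(filecontent, patternvalues); %despite its name strfind also works for numeric row vectors (undocumented behaviour)
fprintf('Hex pattern was found at byte(s) %s\n', strjoin(compose('%d', patternlocation), ', ')); %fprintf, so that the result is actually displayed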
Edited, as I got per isakson and dpb mixed up.
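To check the approach on something small before pointing it at a 40 MB file, here is a minimal sketch that writes a few known bytes to a temporary file and searches them. The temporary file and the byte values are made up for illustration. It also reads the bytes as char (an assumption on my part, not what the answer above does), so that strfind is used in its documented text form rather than on numeric input.

```matlab
% Self-contained check of the byte-pattern search on a small temporary file.
% The file name and the byte values are made up for illustration only.
testfile = [tempname, '.dat'];
bytes = uint8([0 1 65 171 255 126 9 9 65 171 255 126]); % 41 AB FF 7E starts at bytes 3 and 9

fid = fopen(testfile, 'w');
assert(fid > 0, 'Failed to create the temporary file');
fwrite(fid, bytes, 'uint8');
fclose(fid);

% Read every byte back as one char, giving a 1xN char row vector.
fid = fopen(testfile, 'r');
assert(fid > 0, 'Failed to open the temporary file');
filecontent = fread(fid, [1 Inf], 'uint8=>char');
fclose(fid);
delete(testfile);

hexpattern = ['41'; 'AB'; 'FF'; '7E'];
pattern = char(hex2dec(hexpattern)).';            % 4x1 column -> 1x4 char row
patternlocation = strfind(filecontent, pattern);  % 1-based byte offsets of each match
fprintf('Hex pattern was found at byte(s) %s\n', strjoin(compose('%d', patternlocation), ', '));
```

The search itself is a single vectorised strfind call, which is where the speed-up over a for or while loop comes from.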
More Answers (2)
per isakson
30 Jun 2019
Edited: per isakson
30 Jun 2019
Your question is very vague and leaves room for interpretation.
I assume that the dat-file is an ordinary text file. I cannot guess in what form you want the hex-strings that are found.
However, I made a little test
- created a 10MB text file, cssm.txt
- created a script, cssm.m
%%
tic
txt = fileread( 'cssm.txt' );
toc
%%
tic
cac = regexp( txt, '([0-9A-F]{2} ){3}[0-9A-F]{2}', 'match' );
toc
- ran cssm
Elapsed time is 0.133106 seconds.
Elapsed time is 0.357219 seconds.
- and peeked at the result
>> cac{[1,2,3601]}
ans =
'01 23 45 67'
ans =
'89 AB CD EF'
ans =
'01 23 45 67'
>>
I doubt that you can do it significantly faster with plain MATLAB on a standard desktop PC.
Triggered by Guillaume's answer: To get the locations of the hex-strings replace
cac = regexp( txt, '([0-9A-F]{2} ){3}[0-9A-F]{2}', 'match' );
by
[cac,loc] = regexp( txt, '([0-9A-F]{2} ){3}[0-9A-F]{2}', 'match', 'start' );
and peek
>> loc([1,2,3601])
ans =
33 2793 9936083
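The pattern above matches any group of four hex bytes. If only one specific group is wanted, which is one possible reading of the question, a literal search might be enough. A minimal sketch, with made-up sample text standing in for the contents of cssm.txt and a made-up target string:

```matlab
% Made-up sample text standing in for the file contents.
txt = sprintf('01 23 45 67 89 AB CD EF\n01 23 45 67 00 11 22 33\n');
target = '01 23 45 67';        % the specific hex-string to look for (assumed)

% Literal search: 1-based character positions where target starts.
loc = strfind(txt, target);

% The same search with regexp; the target is escaped so that it is
% matched literally rather than interpreted as a pattern.
loc_re = regexp(txt, regexptranslate('escape', target), 'start');
```

Both calls return the start positions of the matches in txt, so either can replace the general pattern when the four bytes are known in advance.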
dpb
30 Jun 2019
Edited: dpb
30 Jun 2019
If it's performance you're looking for, pass the job off to a grep utility... there are any number of freeware versions available for Windows, if there isn't one already installed on your system...
ADDENDUM
Although I now seem to recall there may be a FEX (File Exchange) submission in MEX form... I didn't search to see whether there really is one, but it's probably worth doing so...