MATLAB Answers

Help!!! How to search for some xx xx xx xx (hex) in a dat file very fast?

Eric Jiang 29 Jun 2019
Commented: Eric Jiang 3 Jul 2019
Help!!! I have a dat file of about 40 MB.
I want to search it for the byte sequence xx xx xx xx (hex).
I can do it with a for or while loop, but that is too slow for 40 million bytes!
How can I speed it up? Thanks!


Accepted Answer

Guillaume 30 Jun 2019
Edited by Guillaume, 30 Jun 2019
Unlike per isakson, I'm assuming that you're looking for a byte pattern (given in hexadecimal format) in a binary file. If you're looking for a pattern of hexadecimal characters in a text file see per's answer.
%input
hexpattern = ['41'; 'AB'; 'FF'; '7E']; %you haven't specified how this is stored. Taking a guess
filetosearch = 'C:\somewhere\somefolder\somefile.dat'; %doesn't have to have .dat extension
%read file
fid = fopen(filetosearch, 'r');
assert(fid > 0, 'Failed to open file. Most likely the wrong path was specified');
filecontent = fread(fid, [1 Inf], '*uint8'); %read all bytes at once
fclose(fid);
%pattern search
patternvalues = uint8(hex2dec(hexpattern)).'; %row vector of the same class as filecontent
patternlocation = strfind(filecontent, patternvalues); %despite its name strfind also works for numbers
fprintf('Hex pattern was found at byte(s) %s\n', strjoin(compose('%d', patternlocation), ', ')); %fprintf so the result is actually displayed
Edited, as I got per isakson and dpb mixed up.

Comments (3)

Eric Jiang 1 Jul 2019
Thanks, you got what I meant. Because the PC I use is a little old, I got an out-of-memory error when I tried to read all the bytes at once. How can I solve this? I don't understand: the dat file is only 40 MB, so how can it run out of memory?
Guillaume 1 Jul 2019
Did you use the code in my answer? You shouldn't get an out of memory error with it for a 40 MB file.
What does
memory
say?
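As background to the memory question, here is a minimal sketch of why the '*uint8' precision in the answer's fread call matters. The temporary file and the variable names are made up for illustration. Without the asterisk, fread converts every byte to a double, which takes 8 bytes per element, so a 40 MB file would need roughly 320 MB.

```matlab
% Hypothetical demo file: 1000 zero bytes in a temporary location
demofile = [tempname '.dat'];
fid = fopen(demofile, 'w');
fwrite(fid, zeros(1, 1000, 'uint8'), 'uint8');
fclose(fid);

fid = fopen(demofile, 'r');
asdouble = fread(fid, [1 Inf], 'uint8');   % shorthand for 'uint8=>double'
frewind(fid);                              % back to the start of the file
asuint8 = fread(fid, [1 Inf], '*uint8');   % shorthand for 'uint8=>uint8'
fclose(fid);
delete(demofile);

% Compare the memory footprint of the two variables
infodouble = whos('asdouble');
infouint8 = whos('asuint8');
fprintf('double: %d bytes, uint8: %d bytes\n', infodouble.bytes, infouint8.bytes);
```

If the out-of-memory error appeared with a call that lacked the asterisk, switching to '*uint8' may already be enough.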
Eric Jiang 3 Jul 2019
Thank you!!! Your code is very useful, but another problem came up: when I use strfind, the out-of-memory error occurs. I guess the strfind algorithm needs a lot of memory to do the job. How can I solve this problem?
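One possible workaround, which nobody in the thread proposed, is to search the file block by block so that only a small buffer is in memory at any time. The sketch below assumes the same four-byte pattern as the accepted answer; the demo file, the block size and all variable names are hypothetical. The last numel(pattern)-1 bytes of each block are carried over to the next one so that a match straddling a block boundary is not missed.

```matlab
% Pattern as a uint8 row vector (same bytes as in the accepted answer)
pattern = uint8(hex2dec(['41'; 'AB'; 'FF'; '7E'])).';

% Hypothetical demo file with the pattern planted at two known positions
demofile = [tempname '.dat'];
data = zeros(1, 5000, 'uint8');
data(1000:1003) = pattern;      % entirely inside the first block
data(2047:2050) = pattern;      % straddles the 2048-byte block boundary
fid = fopen(demofile, 'w');
fwrite(fid, data, 'uint8');
fclose(fid);

blocksize = 2048;               % assumed value; something like 2^20 suits real files
overlap = numel(pattern) - 1;   % bytes carried over between blocks
locations = [];
carry = zeros(1, 0, 'uint8');   % tail of the previous block
offset = 0;                     % number of file bytes preceding the buffer

fid = fopen(demofile, 'r');
assert(fid > 0, 'Failed to open file.');
while true
    block = fread(fid, [1 blocksize], '*uint8');
    if isempty(block)
        break                   % end of file reached
    end
    buffer = [carry, block];
    % Converting to char keeps the call within documented strfind usage
    hits = strfind(char(buffer), char(pattern));
    locations = [locations, hits + offset]; %#ok<AGROW>
    keep = min(overlap, numel(buffer));
    carry = buffer(end-keep+1:end);
    offset = offset + numel(buffer) - keep;
end
fclose(fid);
delete(demofile);

fprintf('Pattern found at byte(s) %s\n', mat2str(locations));
```

For the demo file this should report positions 1000 and 2047 (one-based byte offsets). Because the carried tail is one byte shorter than the pattern, no match can be counted twice.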


More Answers (2)

per isakson 30 Jun 2019
Edited by per isakson, 30 Jun 2019
Your question is very vague and leaves room for interpretation.
I assume that the dat file is an ordinary text file. I cannot guess in what form you want the hex strings that are found.
However, I made a little test
  • created a 10MB text file, cssm.txt
  • created a script, cssm.m
%%
tic
txt = fileread( 'cssm.txt' );
toc
%%
tic
cac = regexp( txt, '([0-9A-F]{2} ){3}[0-9A-F]{2}', 'match' );
toc
  • ran cssm
Elapsed time is 0.133106 seconds.
Elapsed time is 0.357219 seconds.
  • and peeked at the result
>> cac{[1,2,3601]}
ans =
'01 23 45 67'
ans =
'89 AB CD EF'
ans =
'01 23 45 67'
>>
I doubt that you can do it significantly faster with plain MATLAB on a standard desktop PC.
Triggered by Guillaume's answer: to get the locations of the hex strings, replace
cac = regexp( txt, '([0-9A-F]{2} ){3}[0-9A-F]{2}', 'match' );
by
[cac,loc] = regexp( txt, '([0-9A-F]{2} ){3}[0-9A-F]{2}', 'match', 'start' );
and peek
>> loc([1,2,3601])
ans =
33 2793 9936083
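The regular expression above matches any four space-separated hex bytes. If only one specific sequence is wanted, a literal search is a simpler alternative. The sketch below uses a short made-up string in place of the contents of cssm.txt and borrows the pattern from Guillaume's answer; both are assumptions for illustration.

```matlab
% Hypothetical sample text standing in for the contents of cssm.txt
txt = sprintf('00 11 22 33\n41 AB FF 7E\n89 AB CD EF\n41 AB FF 7E\n');
target = '41 AB FF 7E';        % assumed pattern, written as it appears in the text
loc = strfind(txt, target);    % start index of every literal occurrence
fprintf('Found %d occurrence(s) at character(s) %s\n', numel(loc), mat2str(loc));
```

Note that strfind is case-sensitive, so the target has to use the same letter case and spacing as the file.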



dpb 30 Jun 2019
Edited by dpb, 30 Jun 2019
If it's performance you're looking for, pass the job off to a grep utility. There are any number of freeware versions available for Windows, if one isn't already installed on your system.
ADDENDUM
Although I now seem to recall there may be a FEX (File Exchange) submission in MEX form. I didn't search to see whether there really is one, but it is probably worth doing so.
