Regular expressions on uint8 or single byte characters
조회 수: 5 (최근 30일)
이전 댓글 표시
I have a 200 MB text file encoded in UTF-8. My maximum array size is around 350 MB, so I can safely read it in using fread('filename','*uint8'). For using regular expressions, I need to turn this into a char array, which blows up the array size by at least a factor of two (depending on encoding, but for my application I can ignore all fancy characters), and thus leads to an "out of memory" error.
I wrote some code that breaks up the original array, so that the matching of the regular expressions works on smaller chunks, but I am still wondering: Can I somehow run regular expressions on the uint8 array? Or is there a char-like variable type that only uses 1 byte per character?
댓글 수: 5
dpb
2013년 8월 26일
Instead of 'unit8', try 'uchar' Not sure it'll help but it is at least a character class, not an integer.
Cedric
2013년 8월 27일
편집: Cedric
2013년 8월 27일
Actually, it is simpler to ask what you are trying to match instead of the pattern (copy/paste of chunk of file content or string, and an explanation of what you want to extract). With a little luck, we can perform this using STRFIND (which works on uint8 arrays) or some numeric test on uint8's.
답변 (0개)
참고 항목
카테고리
Help Center 및 File Exchange에서 Characters and Strings에 대해 자세히 알아보기
제품
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!