Reading/fetching text from text/PDF file for pre-processing
이전 댓글 표시
I have text/pdf files which contains millions of words(text). If i use str = extractFileText(filename) then firstly matlab became very slow also some time hancked . Also variable is not able to hold such a large data.
I want to read file word by word so i can filter text and make a smaller array of filtered data. Or i want to make filtered data temp file for next processing of data(as t will be small).
i need help in this also if you have any other solution of my probelm do reply.
댓글 수: 2
Did you try using the function with name, value pair?
for i = 1:numel(pages)
str = extractFileText(filename, 'pages', pages(i)); % get only one page per time
% do whatever you want with str
end
moin khan
2021년 3월 21일
채택된 답변
추가 답변 (0개)
카테고리
도움말 센터 및 File Exchange에서 Text Data Preparation에 대해 자세히 알아보기
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!