Extracting data from PDF files

joseph Frank on 19 Apr 2014
Answered: Christopher Creutzig on 27 Apr 2021
Hi,
I have around 300 PDF files with 19 pages each. From each of them I want to extract part of a table on page 4 in order to build a research data set. Is it possible to do this in MATLAB? If so, which toolboxes and functions do I need? I have MATLAB R2013a.

Accepted Answer

Kristian Gennaci on 21 Apr 2014
Hi Joseph,
Have you tried using this File Exchange submission?
This seems like the most promising solution. Alternatively, if you can convert the tables to an Excel spreadsheet or a CSV file, they can then be parsed easily with MATLAB's Excel/CSV import functions.
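As a rough illustration of the CSV route, here is a minimal sketch. It assumes the table has already been converted to a purely numeric CSV file with one header row; the file contents and column names below are made up, and the file is written to a temporary location only so that the snippet runs on its own. `csvread` is used because it is available in older releases such as R2013a.

```matlab
% Hypothetical stand-in for one converted table; real files would come
% from whatever PDF-to-CSV conversion step is used.
csvFile = [tempname '.csv'];
fid = fopen(csvFile, 'w');
fprintf(fid, 'year,value\n2012,1.5\n2013,2.25\n');
fclose(fid);

% csvread handles numeric data only, so skip the header row.
% The row and column offsets are zero-based.
data = csvread(csvFile, 1, 0);

years  = data(:, 1);   % first column of the table
values = data(:, 2);   % second column of the table

delete(csvFile);       % remove the temporary example file
```

For real data, the same `csvread` call could be placed in a loop over the converted files, appending each result to one combined array.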
I'll let you know if I find any other solutions.
Best,
Kristian

More Answers (1)

Christopher Creutzig on 27 Apr 2021
Just for the record: since R2017b, extractFileText('filename.pdf','Pages',4) from Text Analytics Toolbox returns the text on ("physical") page 4 of the PDF. You can then extract the parts you need with string operations (extractBetween, regexp, etc.).
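A minimal sketch of that workflow follows. The row label "Revenue", the sample page text, and the two-number row layout are assumptions made purely for illustration; the real pattern depends on how the table text comes out of the PDF. The fallback branch exists only so the parsing step can be tried without Text Analytics Toolbox or an actual file.

```matlab
pdfFile = 'filename.pdf';   % placeholder file name

if exist('extractFileText', 'file') && exist(pdfFile, 'file')
    % Text of physical page 4, converted from string to char
    pageText = char(extractFileText(pdfFile, 'Pages', 4));
else
    % Made-up sample text standing in for the extracted page
    pageText = sprintf('Table 2\nRevenue 120.5 130.0\nCosts 80.0 85.5\nNotes');
end

% Capture the two numbers that follow the (hypothetical) row label
tok = regexp(pageText, 'Revenue\s+([0-9.]+)\s+([0-9.]+)', 'tokens', 'once');

% Convert the captured text to numbers; empty if the row was not found
revenue = str2double(tok);
```

To process many files, the same steps could be wrapped in a loop over the result of dir('*.pdf'), collecting one row per file.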
