Convert a table in a pdf to a MATLAB cell structure

Charles D'Onofrio on 29 Jan 2021
Answered: Suraj on 29 Mar 2023
I have a PDF file that contains an Nx9 table of data that I need to turn into a MATLAB cell structure or an Excel file. Some of the (row,column) entries are blank.
So far, I have tried reading the pdf using:
txt = extractFileText('filename.pdf');
This produces a 1x1 string in which runs of spaces break up the rows in a seemingly random order. The (row,column) entries do not appear in a logical position in txt. Is there another command that can read a PDF table?
  Comments: 4
dpb on 30 Jan 2021
Which is, it seems, what the scraping utilities do...get the boundaries of the table as rendered and then suck that area up.
Sim on 12 Mar 2023
I have the same problem @Charles D'Onofrio @dpb @Stephen23...
The following function is not really helpful when a PDF contains tables with blank cells:
txt = extractFileText('filename.pdf');
Has a new tool been created in the meantime, i.e. between January 2021 and today, mid-March 2023?


Answers (2)

the cyclist on 12 Mar 2023
I can strongly recommend using Tabula to first extract the table from the PDF file. Then use a MATLAB function (e.g. readtable) to bring the Tabula output into MATLAB.
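As a rough sketch of the second step, assuming Tabula has already exported the table as a CSV file: the file name and the three columns below are made up for illustration, and the snippet writes its own small stand-in file only so that it can run on its own. In practice that file would come from Tabula's export.

```matlab
% Hypothetical stand-in for a CSV exported by Tabula (name and contents are
% assumptions for illustration, not real Tabula output).
csvFile = fullfile(tempdir, 'tabula_export_example.csv');
fid = fopen(csvFile, 'w');
fprintf(fid, 'ID,Name,Value\n');
fprintf(fid, '1,alpha,3.5\n');
fprintf(fid, '2,,4.1\n');     % blank Name entry
fprintf(fid, '3,gamma,\n');   % blank Value entry
fclose(fid);

% Read the CSV into a table. Blank numeric entries are imported as NaN and
% blank text entries as empty character vectors.
T = readtable(csvFile, 'Delimiter', ',');

% Convert the table to a cell array, one cell per (row,column) entry.
C = table2cell(T);
```

If an Excel file is wanted instead, writetable(T, 'filename.xlsx') should write the same table to a spreadsheet.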
  Comments: 2
Sim on 12 Mar 2023
Edited: Sim on 13 Mar 2023
Thanks a lot @the cyclist! Do you know if Tabula is safe in terms of privacy and confidentiality?
the cyclist on 13 Mar 2023
I haven't used it for data that I would have privacy concerns about, but I think there are strong reasons to believe it is safe:
  • It's open-source, so you can see all the code on GitHub
  • It doesn't seem to send your data anywhere else. Although it might seem like it is sending your data to a web site, it looks to me like it only opens a local browser window.
  • It was first built by journalists, who tend to care about privacy (at least of their own data!)



Suraj on 29 Mar 2023
Hi Charles
Your question seems very similar to one I've answered recently. Please have a look at this answer.
Hope this helps.
