How to read a table from a PDF

Rizwan Khan on 22 Nov 2020
Commented: dpb on 27 Nov 2020
I have a PDF that contains text within a table.
I am able to read the text into a variable, but then I get a single string with all the text in it.
I use extractFileText to read it into a string.
How can I then turn this text into a table?
I've pasted a sample of the string I read in below. It has no table column names, just the actual data.
What I want to do is ignore the first two rows below; after them you see the records (one per line).
Each line needs to be a row in the table, and the delimiter between each column value is the three arrows (which I think are newlines).
Weekly Gazettes 1 ↵↵↵
NEW SOUTH WALES WEEKLY ISSUE ↵ ↵↵↵
3 RIVERS ESTATE, 140 001 976 ↵↵↵374 KALKITE RD KALKITE NSW 2627 ↵↵↵Creditor: CONSULT SURVEY GRA PTY LTD ↵↵↵DEFAULT JUDGEMENT (NSW) 02/11/2020 ↵↵↵00262008/20/163, $113,237.00 ↵↵↵
ABCD PROJECTS, 618 354 331 ↵↵↵8 17 GARTMORE AVE BANKSTOWN NSW 2200 ↵↵↵Creditor: WORKERS COMPENSATION NOMINAL I ↵↵↵DEFAULT JUDGEMENT (NSW) 03/11/2020 ↵↵↵00063818/20/METN, $2,553.00 ↵↵↵
ABOUT CONCRETE CONSTRUCTIONS, 156 080 241 ↵↵↵46 NEW HORIZON AVE BAHRS SCRUB QLD 4207 ↵↵↵Creditor: HUSQVARNA AUSTRALIA PTY LTD ↵↵↵DEFAULT JUDGEMENT (NSW) 03/11/2020 ↵↵↵00223837/20/3, $1,298.00 ↵↵↵
AC SHOPFITTING SPECIALIST, 635 292 376 ↵↵↵12 CURTIN ST CABRAMATTA NSW 2166 ↵↵↵Creditor: WORKERS COMPENSATION NOMINAL I ↵↵↵DEFAULT JUDGEMENT (NSW) 06/11/2020 ↵↵↵00266709/20/METN, $5,191.00 ↵↵↵
ACN 607735080, 607 735 080 ↵↵↵14 BARNES ST WOOLGOOLGA NSW 2456 ↵↵↵Creditor: BIDFOOD AUSTRALIA LTD ↵↵↵DEFAULT JUDGEMENT (NSW) 02/11/2020 ↵↵↵00271889/20/METN, $9,891.00 ↵↵↵
  6 Comments
Stephen23 on 24 Nov 2020
"i guess this remains an open issue, and unsure how to resolve."
You could upload a .mat file of the imported data, just as dpb requested here.
Rizwan Khan on 25 Nov 2020
Dear Sir,
If we look at the text I pasted from the variable, each of those arrows marks the start of a new variable (column value).
How can I loop through them using that arrow (the ↵ symbol) as a delimiter?
Each record is complete after the currency amount.
So the problem is no longer how to read the PDF, which I am already doing; the problem now is how to loop through the string that holds all the PDF content.
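One way to walk through the string piece by piece, as asked above, is sketched here. It is only an illustration under stated assumptions: the arrows are taken to be newline characters (`newline`, i.e. `char(10)`), the sample `str` is rebuilt by hand from two of the pasted records rather than read from the PDF, and a record is treated as complete once a piece contains a `$` sign. The variable names (`vals`, `pieces`, `records`, `current`) are made up for this example.

```matlab
% ASSUMPTION: each arrow in the pasted text is a newline character.
d = string(repmat(newline, 1, 3));      % three newlines between values

% Hand-built stand-in for the string returned by extractFileText
% (two of the pasted records, header lines left out).
vals = [
    "3 RIVERS ESTATE, 140 001 976"
    "374 KALKITE RD KALKITE NSW 2627"
    "Creditor: CONSULT SURVEY GRA PTY LTD"
    "DEFAULT JUDGEMENT (NSW) 02/11/2020"
    "00262008/20/163, $113,237.00"
    "ABCD PROJECTS, 618 354 331"
    "8 17 GARTMORE AVE BANKSTOWN NSW 2200"
    "Creditor: WORKERS COMPENSATION NOMINAL I"
    "DEFAULT JUDGEMENT (NSW) 03/11/2020"
    "00063818/20/METN, $2,553.00"
    ];
str = join(vals + " ", d) + d;          % scalar string, trailing delimiter included

% Cut at every newline, trim surrounding whitespace, drop empty pieces
% (consecutive newlines produce empty pieces).
pieces = strtrim(split(str, newline));
pieces(pieces == "") = [];

records = {};                           % one cell per record
current = strings(0, 1);                % values collected for the record in progress
for k = 1:numel(pieces)
    current(end+1, 1) = pieces(k);
    % ASSUMPTION: the piece holding the dollar amount closes the record.
    if contains(pieces(k), "$")
        records{end+1} = current;
        current = strings(0, 1);
    end
end
```

Because the end of a record is detected from its content, this loop does not depend on every record having the same number of values. If the real string starts with header lines, they would need to be removed from `pieces` before the loop.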


Answers (1)

Mathieu NOE on 23 Nov 2020
Hello,
I don't know where the function extractFileText comes from.
So I did it my way: I converted the PDF to an Excel file (using an online converter), and then it was very easy:
T = readtable('weekly-gazettes-12-11-20-converti.xlsx');
C = table2cell(T)
C =
133×2 cell array
{'ABCD PROJECTS, 618 …'} {'DEFAULT JUDGEMENT (…'}
{'ABOUT CONCRETE CONS…'} {'DEFAULT JUDGEMENT (…'}
{'AC SHOPFITTING SPEC…'} {'DEFAULT JUDGEMENT (…'}
{'ACN 607735080, 607 …'} {'DEFAULT JUDGEMENT (…'}
{'ACP ACCOUNTANTS & C…'} {'DEFAULT JUDGEMENT (…'}
etc......
  9 Comments
Rizwan Khan on 26 Nov 2020
Thanks Mathieu,
Why do I need to use regular expressions if I have a common delimiter between each variable?
Can I somehow just use the delimiter?
dpb on 27 Nov 2020
Sure. See split
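A minimal sketch of the `split` route suggested here, assuming the delimiter is exactly three newline characters, that the first two pieces are the header lines, and that every record has exactly five values. The sample string is typed in by hand from the question, and the column names are invented for illustration; none of this is confirmed against the actual PDF.

```matlab
% ASSUMPTION: the delimiter is three consecutive newline characters.
d = string(repmat(newline, 1, 3));

% Hand-built stand-in for the extracted text: two header lines, two records.
vals = [
    "Weekly Gazettes 1"
    "NEW SOUTH WALES WEEKLY ISSUE"
    "3 RIVERS ESTATE, 140 001 976"
    "374 KALKITE RD KALKITE NSW 2627"
    "Creditor: CONSULT SURVEY GRA PTY LTD"
    "DEFAULT JUDGEMENT (NSW) 02/11/2020"
    "00262008/20/163, $113,237.00"
    "ABCD PROJECTS, 618 354 331"
    "8 17 GARTMORE AVE BANKSTOWN NSW 2200"
    "Creditor: WORKERS COMPENSATION NOMINAL I"
    "DEFAULT JUDGEMENT (NSW) 03/11/2020"
    "00063818/20/METN, $2,553.00"
    ];
str = join(vals + " ", d) + d;

% Split on the delimiter, trim whitespace, drop the empty trailing piece.
parts = strtrim(split(str, d));
parts(parts == "") = [];
parts = parts(3:end);                   % skip the two header lines

% ASSUMPTION: five values per record, so reshape gives one record per row.
nFields = 5;
recs = reshape(parts, nFields, []).';

% Hypothetical column names; the source data has none.
T = array2table(recs, 'VariableNames', ...
    {'NameAndNumber', 'Address', 'Creditor', 'Judgement', 'CaseAndAmount'});
disp(T)
```

The `reshape` call errors if the number of pieces is not a multiple of `nFields`, which serves as a quick check that the five-values-per-record assumption holds for the real file.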

Release: R2020a
