How can I convert a scanned PDF to an image using MATLAB?

How can I import a scanned PDF into MATLAB and convert it to image files?
I tried to use extractFileText() from Text Analytics Toolbox, but it only works for native PDFs and not scanned PDFs:
>> extractFileText('example.pdf')
ans =
<missing>
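Since extractFileText returns <missing> (or an empty string) for an image-only PDF, that result can be used as a quick check for whether a file needs page rendering and OCR instead. The anonymous function isScanned below is a hypothetical helper, not part of Text Analytics Toolbox. It is a sketch that assumes the extracted text is a single string or character vector.

```matlab
% Hypothetical helper (not a toolbox function): treat a PDF as scanned,
% i.e. image-only, when its extracted text is <missing> or only whitespace.
% string() is applied first because ismissing on a char vector works
% element by element, which would break the scalar || operator.
isScanned = @(txt) ismissing(string(txt)) || strlength(strtrim(string(txt))) == 0;

% Usage with Text Analytics Toolbox (assumes example.pdf is on the path):
% txt = extractFileText("example.pdf");
% if isScanned(txt)
%     % fall back to rendering each page as an image and running OCR
% end
```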

Accepted Answer

MathWorks Support Team on 5 Jan 2021
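MATLAB ships with the Apache PDFBox Java library, which can import and render PDF files. Use the following MATLAB function PDFtoImg() to import a scanned PDF and save each page as a separate PNG file:
function images = PDFtoImg(pdfFile)
% PDFtoImg  Render each page of a (scanned) PDF to a 300-dpi PNG file.
%   images = PDFtoImg(pdfFile) returns a string array of the PNG file names,
%   e.g. "<folder>\example.pdf-Page1.png" for page 1.
import org.apache.pdfbox.*
import java.io.*
filename = fullfile(pwd,pdfFile);          % pdfFile is relative to the current folder
jFile = File(filename);
document = pdmodel.PDDocument.load(jFile); % PDFBox 2.x loading API
pdfRenderer = rendering.PDFRenderer(document);
count = document.getNumberOfPages();
images = strings(1,count);                 % one output file name per page
for ii = 1:count
    % PDFBox page indices are zero-based; 300 dpi is suitable for OCR
    bim = pdfRenderer.renderImageWithDPI(ii-1, 300, rendering.ImageType.RGB);
    images(ii) = filename + "-Page" + ii + ".png";
    % Pass a char vector so MATLAB converts it to java.lang.String
    tools.imageio.ImageIOUtil.writeImage(bim, char(images(ii)), 300);
end
document.close()                           % release the PDF file
end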
Notes:
1. It is important to render each PDF page to a separate image. For example, if “example.pdf” contains 13 pages, the function produces 13 PNG files.
2. For subsequent OCR tasks, it is important to render the PDF pages at 300 dpi or higher, as in this line of PDFtoImg():
bim = pdfRenderer.renderImageWithDPI(ii-1, 300, rendering.ImageType.RGB);
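To see what the 300 dpi setting means in pixels: PDF page sizes are measured in points of 1/72 inch, and PDFBox's renderImageWithDPI scales the page by dpi/72. The snippet below is a worked example that assumes a US Letter page (612 × 792 points); that page size is an illustrative assumption, not taken from example.pdf.

```matlab
% Approximate pixel dimensions of a page rendered at a given resolution.
% PDF units are points (1/72 inch), so pixels = points * dpi / 72.
pagePts = [612 792];   % assumed US Letter page, 8.5 x 11 in, in points
dpi = 300;             % resolution used in PDFtoImg for OCR
pagePx = round(pagePts * dpi / 72);
fprintf('%d x %d pixels at %d dpi\n', pagePx(1), pagePx(2), dpi);
% prints: 2550 x 3300 pixels at 300 dpi
```

Because pixel count grows with the square of the resolution, doubling the dpi roughly quadruples memory use and PNG size per page, so higher settings trade resources for OCR accuracy.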
  Comments: 14
Karolina Charaziak on 7 Apr 2022
Edited: Karolina Charaziak on 7 Apr 2022
For a change, I keep getting this error:
Error using PDFtoImg
Too many input arguments.
I am using the script as posted in the MathWorks webinar. Weirdly, it works when I run it as a live script, but I get the error when I just copy and paste it into the Command Window.


More Answers (1)

Zhongcheng Sun on 15 May 2022
I am getting the error "Arguments must contain strings." on the line "pdmodel.PDDocument.load(jFile)".
Could you please help me solve this?
