How To Read PDF Files In Python Using PyPDF2 Library

Reading and writing PDF files in Python is quite easy, and several libraries and packages can help with the task. In this article, I will show you how to read PDF files in Python using the PyPDF2 package.

If you are new to automation, check out our Selenium tutorial, which covers everything from the basics to advanced topics.

Official link for PyPDF2: https://pypi.org/project/PyPDF2/


Step 1 - Install PyPDF2

pip install PyPDF2
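Because PyPDF2 renamed its reading methods between releases, it can help to confirm which version pip actually installed before writing any code. The sketch below is not from the original post; the helper name installed_version is hypothetical, and it relies only on the standard library's importlib.metadata (Python 3.8+), so it runs whether or not PyPDF2 is present.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        # The distribution is not installed in the current environment
        return None


if __name__ == "__main__":
    pypdf2_version = installed_version("PyPDF2")
    if pypdf2_version is None:
        print("PyPDF2 is not installed; run: pip install PyPDF2")
    else:
        print("PyPDF2 version:", pypdf2_version)
```

The code in the next step assumes PyPDF2 2.x or 3.x, where PdfReader and extract_text() are available.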

Step 2 - Write the code below to read the PDF

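import PyPDF2

# Open the file in read-binary mode; "with" closes it automatically
with open("sample.pdf", "rb") as file:
    # Pass the file object to PdfReader. PdfReader, pages and extract_text()
    # replace the older PdfFileReader, getPage, numPages and extractText,
    # which raise an error in PyPDF2 3.x
    reader = PyPDF2.PdfReader(file)

    # len(reader.pages) returns the number of pages in the PDF
    print(len(reader.pages))

    # reader.pages is indexed from 0, so index 0 is the first page
    page1 = reader.pages[0]

    # extract_text() returns the text found on the page
    pdfData = page1.extract_text()

    # Print the data
    print(pdfData)

    # Index 1 is the second page (the PDF must have at least two pages)
    page2 = reader.pages[1]
    print("Data from page 2", page2.extract_text())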
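Hard-coding page indices fails with an IndexError on a PDF that has fewer pages than expected. A minimal sketch that loops over every page instead is shown below; the function names extract_all_text and read_pdf_pages are hypothetical, and it assumes PyPDF2 2.x or 3.x, where PdfReader exposes a pages sequence whose items provide extract_text().

```python
from typing import Iterable, List


def extract_all_text(pages: Iterable) -> List[str]:
    """Return the text of each page, in order.

    `pages` is any iterable of objects with an extract_text() method,
    such as the `pages` attribute of PyPDF2.PdfReader.
    """
    return [page.extract_text() for page in pages]


def read_pdf_pages(path: str) -> List[str]:
    """Open the PDF at `path` and return one text string per page."""
    # Imported inside the function so extract_all_text() can be used
    # (and tested) without PyPDF2 installed
    import PyPDF2

    with open(path, "rb") as file:
        reader = PyPDF2.PdfReader(file)
        # Extract while the file is still open: PdfReader reads page
        # objects from the file on demand
        return extract_all_text(reader.pages)
```

Calling read_pdf_pages("sample.pdf") returns a list with one string per page, so the length of the result equals the page count and no index can go out of range.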

Step 3 - Add assertions to verify the PDF content

import PyPDF2

with open("sample.pdf", "rb") as file:
    reader = PyPDF2.PdfReader(file)

    # Index 1 is the second page of the PDF
    page2 = reader.pages[1]

    pdfData = page2.extract_text()

print(pdfData)

# Assert that the expected keywords appear in the text returned from the PDF
assert "boring" in pdfData

assert "Mukesh" in pdfData
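A plain assert stops at the first missing keyword and does not say which one failed, and extracted PDF text often splits a phrase across lines. The sketch below, which is not part of the original post, uses hypothetical helpers (normalize, missing_keywords, assert_pdf_contains) that collapse whitespace before comparing and report every missing keyword at once. It only works on strings, so it assumes you already have the page text, for example pdfData from the code above.

```python
from typing import Iterable, List


def normalize(text: str) -> str:
    """Collapse runs of whitespace (including line breaks) into single spaces."""
    return " ".join(text.split())


def missing_keywords(text: str, keywords: Iterable[str]) -> List[str]:
    """Return the keywords that do not occur in `text` (case-sensitive).

    Whitespace is normalized on both sides, so a phrase that the PDF
    breaks across two lines still matches.
    """
    clean_text = normalize(text)
    return [word for word in keywords if normalize(word) not in clean_text]


def assert_pdf_contains(text: str, keywords: Iterable[str]) -> None:
    """Fail with a single message that lists every missing keyword."""
    missing = missing_keywords(text, keywords)
    assert not missing, f"Not found in PDF text: {missing}"
```

With this helper, assert_pdf_contains(pdfData, ["boring", "Mukesh"]) performs both checks from the example above in one call.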

I hope this post was useful to you. Keep learning.
