Reading and writing PDF files in Python is quite easy, because Python has several libraries and packages that can help with the task. In this article, I will show you how to read PDF files in Python using the PyPDF2 package.
If you are new to automation, do check out our Selenium tutorial, which covers everything from the basics to advanced topics.
Official link for PyPDF2: https://pypi.org/project/PyPDF2/
How To Read PDF Files In Python Using PyPDF2 Library
Step 1 - Install PyPDF2
pip install PyPDF2
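pip installs the latest release unless you pin a version, and PyPDF2 3.0 removed the older camelCase names (PdfFileReader, getPage, numPages, extractText) in favor of PdfReader, reader.pages and extract_text(). Before you follow an older example, it can help to confirm which version you have. The following is a minimal sketch that uses only the standard library's importlib.metadata (Python 3.8+). The helper names installed_version and major_version are hypothetical:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    # Return the installed version string, or None if the package is missing
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def major_version(version_string: str) -> int:
    # "3.0.1" -> 3; assumes the version starts with a numeric major part
    return int(version_string.split(".")[0])


found = installed_version("PyPDF2")
if found is None:
    print("PyPDF2 is not installed; run: pip install PyPDF2")
elif major_version(found) >= 3:
    print(f"PyPDF2 {found}: use PdfReader, reader.pages and extract_text()")
else:
    print(f"PyPDF2 {found}: an older release; the camelCase names may still work")
```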
Step 2 - Write the code below to read the PDF
import PyPDF2

# Open the file in read-binary mode; "with" closes it automatically
with open("sample.pdf", "rb") as file:
    # PdfReader replaces PdfFileReader, which was removed in PyPDF2 3.0
    reader = PyPDF2.PdfReader(file)
    # len(reader.pages) returns the number of pages in the PDF
    print(len(reader.pages))
    # reader.pages is zero-indexed, so pages[0] is the first page
    page1 = reader.pages[0]
    # extract_text() returns the text of the page as a string
    pdfData = page1.extract_text()
    print(pdfData)
    # pages[1] is the second page
    page2 = reader.pages[1]
    print("Data from page 2", page2.extract_text())
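The code above reads two pages by index, so it fails on a PDF that has only one page. Instead, you can read every page. The following is a minimal sketch that assumes the PdfReader API from PyPDF2 2.x or 3.x (reader.pages and page.extract_text()). The helper names extract_all_pages and read_pdf_text are hypothetical. The PyPDF2 import sits inside read_pdf_text, so the page loop can be tried without PyPDF2 or a real file:

```python
from typing import Iterable, List, Protocol


class PageLike(Protocol):
    """Anything with an extract_text() method, such as a PyPDF2 page object."""

    def extract_text(self) -> str: ...


def extract_all_pages(pages: Iterable[PageLike]) -> List[str]:
    # Return the text of each page, in page order
    return [page.extract_text() for page in pages]


def read_pdf_text(path: str) -> List[str]:
    # Hypothetical convenience wrapper; assumes PyPDF2 2.x or 3.x for PdfReader
    import PyPDF2  # imported lazily so extract_all_pages works without PyPDF2

    with open(path, "rb") as file:
        reader = PyPDF2.PdfReader(file)
        # The list is fully built before the file is closed
        return extract_all_pages(reader.pages)
```

For example, enumerate(read_pdf_text("sample.pdf"), start=1) pairs each page number with its text. A one-page PDF simply yields a one-item list instead of an error.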
Add assertions to verify the PDF content
import PyPDF2

with open("sample.pdf", "rb") as file:
    reader = PyPDF2.PdfReader(file)
    # pages[1] is the second page (indexes start at 0)
    page2 = reader.pages[1]
    pdfData = page2.extract_text()
print(pdfData)
# Assert that the expected keywords appear in the text returned from the PDF
assert "boring" in pdfData
assert "Mukesh" in pdfData
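Bare assert statements stop at the first missing keyword, and as written they print no message. The following is a minimal sketch of a helper that reports every missing keyword at once. It also collapses runs of whitespace before matching, on the assumption that extracted text may contain line breaks where the PDF layout wraps a phrase. The names normalize_whitespace, missing_keywords and assert_contains_all are hypothetical, and the sample string is made up for illustration:

```python
from typing import Iterable, List


def normalize_whitespace(text: str) -> str:
    # Collapse newlines, tabs and repeated spaces into single spaces
    return " ".join(text.split())


def missing_keywords(text: str, keywords: Iterable[str]) -> List[str]:
    # Return the keywords (in the given order) that do not occur in the text
    haystack = normalize_whitespace(text)
    return [kw for kw in keywords if normalize_whitespace(kw) not in haystack]


def assert_contains_all(text: str, keywords: Iterable[str]) -> None:
    # Fail once, listing every missing keyword in the message
    missing = missing_keywords(text, keywords)
    assert not missing, f"Keywords not found in PDF text: {missing}"


# Hypothetical extracted text with a line break and extra spaces
sample = "Automating the\nboring stuff\twith  Mukesh"
print(missing_keywords(sample, ["boring stuff", "Mukesh", "Selenium"]))  # → ['Selenium']
```

In the assertion example, assert_contains_all(pdfData, ["boring", "Mukesh"]) could replace the two separate assert lines. Note that Python skips assert statements when it runs with the -O flag.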
I hope this post was useful to you. Keep learning.