How To Read PDF Files In Python Using PyPDF2 Library

October 12, 2020 by Mukesh Otwani

Reading and writing PDF files in Python is straightforward, and several libraries can handle the task. In this article, I will show you how to read PDF files in Python using the PyPDF2 package.

If you are new to automation, check out our Selenium tutorial, which covers everything from basic to advanced topics.

Official link for PyPDF2: https://pypi.org/project/PyPDF2/

Step 1- Install PyPDF2

pip install PyPDF2
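PyPDF2 renamed parts of its API between major versions, so it can help to confirm which version pip actually installed. The snippet below is a minimal sketch using the standard library's importlib.metadata (Python 3.8 or later); the helper name installed_version is my own and is not part of PyPDF2.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Prints the installed PyPDF2 version string, or None if it is not installed
print(installed_version("PyPDF2"))
```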

Step 2- Write the code below to read a PDF

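import PyPDF2

# Open the file in read-binary mode
file = open("sample.pdf", "rb")

# Pass the file object to PdfReader.
# Note: these are the PyPDF2 3.x names. Older 1.x releases used PdfFileReader,
# getPage, numPages and extractText, which PyPDF2 3.0 removed.
reader = PyPDF2.PdfReader(file)

# reader.pages is indexed from 0, so index 0 is the first page
page1 = reader.pages[0]

# len(reader.pages) returns the number of pages in the PDF
print(len(reader.pages))

# extract_text() returns the text of the page
pdfData = page1.extract_text()

# Print the data
print(pdfData)

# Index 1 is the second page
page2 = reader.pages[1]

print("Data from page 2", page2.extract_text())

# Close the file once reading is done
file.close()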
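When a PDF has more than a couple of pages, fetching each page by index gets repetitive. The sketch below loops over every page instead. It assumes PyPDF2 3.x (PdfReader, reader.pages, extract_text()); the function names extract_all_text and print_pdf_pages are hypothetical helpers, not part of the library.

```python
def extract_all_text(pages):
    """Return the text of every page, in order, as a list of strings."""
    texts = []
    for page in pages:
        # Fall back to an empty string if a page yields no text
        texts.append(page.extract_text() or "")
    return texts


def print_pdf_pages(path):
    """Open the PDF at path and print the text of each page."""
    import PyPDF2  # imported here so the helper above works without PyPDF2

    # The with block closes the file even if reading fails
    with open(path, "rb") as file:
        reader = PyPDF2.PdfReader(file)
        for number, text in enumerate(extract_all_text(reader.pages), start=1):
            print(f"Data from page {number}:", text)
```

Calling print_pdf_pages("sample.pdf") would print each page of the sample file in turn, assuming that file exists in the working directory.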

Add assertions to verify the PDF content

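import PyPDF2

file = open("sample.pdf", "rb")

# PdfReader is the PyPDF2 3.x name (older 1.x releases called it PdfFileReader)
reader = PyPDF2.PdfReader(file)

# Index 1 is the second page of the PDF
page2 = reader.pages[1]

pdfData = page2.extract_text()

file.close()

print(pdfData)

# Assert that the expected keywords appear in the text returned from the PDF
assert "boring" in pdfData

assert "Mukesh" in pdfData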
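Text extracted from a PDF often contains irregular line breaks and spacing, which can make a plain "in" check fail for a phrase that is visibly on the page. A minimal sketch of a more forgiving check follows. The helper missing_keywords and the sample text are hypothetical, and the case-insensitive matching is a design choice of this sketch rather than something the original assertions do.

```python
def missing_keywords(text, keywords):
    """Return the keywords that do not appear in text, ignoring case and spacing."""
    # Collapse runs of whitespace so line breaks inside a phrase do not hide it
    normalized = " ".join(text.split()).lower()
    return [word for word in keywords if word.lower() not in normalized]


# Hypothetical text standing in for the result of extract_text()
sample_text = "This is a\nboring   PDF written by\nMukesh"

missing = missing_keywords(sample_text, ["boring", "Mukesh", "invoice"])
print(missing)  # ['invoice']

# An empty list means every expected keyword was found
assert missing_keywords(sample_text, ["boring", "Mukesh"]) == []
```

Returning the list of missing keywords, rather than asserting one keyword at a time, means a failing check reports everything that is absent in one run.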

I hope this post was useful to you. Keep learning.
