Python scripting
Performance Assessment 3.5 – Web scraping and reading PDF files
Task 1 – Pseudocode
Now you are going to create pseudocode for three functions, readwebpage, parsehtml, and outputquotes, along with the main program. These functions should work together to read a web page, parse its HTML, and output the quotes it contains.
Let’s start with the pseudocode skeleton:
Function readwebpage
End.
Function parsehtml
End.
Function outputquotes
End.
Program Main
End.
Deliverables for Task 1
· Pseudocode for the three functions and the main program
Task 2 – Writing the program for web scraping
Write a program called <Your Name>_PA32 that reads the web page https://quotes.toscrape.com/page/2/, processes the data, and writes the quotes to the screen. Make sure to include your student ID in the first print statement of the program, and output the parsed quotes with their authors.
Create a function named readwebpage which opens the URL https://quotes.toscrape.com/page/2/, parses the data using a second function, and then outputs the quotes and authors using a third function.
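The three-function structure described above could be sketched roughly as follows. This is only one possible layout, not a model answer. It uses the standard-library html.parser and urllib.request modules instead of BeautifulSoup so that it runs without extra installs. It also assumes that each quote on the page sits in a span tag with class "text" and each author in a small tag with class "author". The QuoteParser class, the URL constant, and the main function are hypothetical names chosen for illustration.

```python
from html.parser import HTMLParser
from urllib.request import urlopen

URL = "https://quotes.toscrape.com/page/2/"


class QuoteParser(HTMLParser):
    """Collects (quote, author) pairs.

    Assumption: quotes are in <span class="text"> and authors in
    <small class="author">; adjust if the page markup differs.
    """

    def __init__(self):
        super().__init__()
        self.quotes = []     # finished (quote, author) pairs
        self._field = None   # "text" or "author" while inside such a tag
        self._text = ""
        self._author = ""

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "span" and "text" in classes:
            self._field, self._text = "text", ""
        elif tag == "small" and "author" in classes:
            self._field, self._author = "author", ""

    def handle_data(self, data):
        if self._field == "text":
            self._text += data
        elif self._field == "author":
            self._author += data

    def handle_endtag(self, tag):
        if self._field == "text" and tag == "span":
            self._field = None
        elif self._field == "author" and tag == "small":
            # The author follows the quote text, so the pair is complete here.
            self.quotes.append((self._text.strip(), self._author.strip()))
            self._field = None


def parsehtml(html):
    """Second function: turn raw HTML into a list of (quote, author) pairs."""
    parser = QuoteParser()
    parser.feed(html)
    return parser.quotes


def outputquotes(quotes):
    """Third function: print each quote with its author."""
    for text, author in quotes:
        print(f"{text} - {author}")


def readwebpage(url=URL):
    """First function: download the page, then parse and output it."""
    with urlopen(url) as response:
        html = response.read().decode("utf-8")
    outputquotes(parsehtml(html))


def main():
    print("Student ID: <your student id>")  # placeholder, replace with your own
    readwebpage()
```

Calling main() prints the student ID line first and then one line per quote. Because parsehtml accepts a plain string, it can be tried on a small hand-written HTML snippet before running it against the live page.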
Take a screenshot of your completed program and another of your output.
Deliverables for Task 2
· Screenshot of your completed program and the output
Task 3 – Writing the program to read PDFs
Write a program that uses two functions to pull text information from a PDF document. Pull the data from the PDF file USCensus.pdf into a text file called USCensus_Output.txt.
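A two-function layout for this task might look like the sketch below. It assumes the third-party pypdf package is installed (for example with pip install pypdf); the assignment does not name a PDF library, so treat that choice as an assumption. The function names readpdf, writetext, and main are also hypothetical.

```python
def readpdf(pdf_path):
    """Return the text of every page in the PDF as one string."""
    # Assumption: pypdf is available; it is imported here so that the
    # rest of the module can be loaded without it.
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    # extract_text() can return an empty result for image-only pages.
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def writetext(text, out_path):
    """Write the extracted text to a text file, replacing any old content."""
    with open(out_path, "w", encoding="utf-8") as out_file:
        out_file.write(text)


def main():
    text = readpdf("USCensus.pdf")
    writetext(text, "USCensus_Output.txt")
```

Calling main() with USCensus.pdf in the working directory creates USCensus_Output.txt alongside it. Text extraction quality depends on how the PDF was produced, so the output may need a visual check against the original.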
Take a screenshot of your completed program and another of your output.
Deliverables for Task 3
· Screenshot of your completed program and the output