Can Python be used to scrape unstructured, unformatted text from a PDF document and export it to Excel? I also have the PDFs in HTML format, and I am open to suggestions for oth
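Yes. A common pipeline has three steps: get plain text out of the document, pull the fields you care about out of that text (usually with regular expressions), and write the rows to a file Excel can open. Since the HTML versions are available, they are often easier to work with than the PDFs. The sketch below uses only the standard library: `html.parser` for text extraction and `csv` for output, since Excel opens CSV directly. It assumes the useful data appears as `Label: value` lines. That pattern (`FIELD_RE`) and the helper names `html_to_lines`, `extract_fields` and `write_csv` are hypothetical, so adapt them to whatever structure your documents actually have.

```python
import csv
import io
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text chunks from HTML, skipping <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0  # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        # Keep only non-blank text that is not inside <script>/<style>.
        if not self._skip and data.strip():
            self.chunks.append(data.strip())


def html_to_lines(html):
    """Return the visible text of an HTML string as a list of stripped chunks."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks


# Hypothetical pattern: assumes records look like "Label: value".
FIELD_RE = re.compile(r"^\s*([A-Za-z ]+?)\s*:\s*(.+?)\s*$")


def extract_fields(lines):
    """Turn matching lines into (field, value) tuples; other lines are ignored."""
    rows = []
    for line in lines:
        match = FIELD_RE.match(line)
        if match:
            rows.append((match.group(1), match.group(2)))
    return rows


def write_csv(rows, out):
    """Write a header plus one row per extracted field to a text stream."""
    writer = csv.writer(out)
    writer.writerow(["field", "value"])
    writer.writerows(rows)


if __name__ == "__main__":
    sample = (
        "<html><body><h1>Report</h1><p>Name: Jane Doe</p>"
        "<p>Date: 2021-03-04</p><script>var x = 1;</script></body></html>"
    )
    buffer = io.StringIO()
    write_csv(extract_fields(html_to_lines(sample)), buffer)
    print(buffer.getvalue())
```

To write a real file, open it with `open("output.csv", "w", newline="", encoding="utf-8-sig")`. The BOM in `utf-8-sig` helps Excel detect the encoding. If you need a true `.xlsx`, `pandas.DataFrame(rows, columns=["field", "value"]).to_excel("output.xlsx", index=False)` works when `openpyxl` is installed. For the PDFs themselves, a library such as `pdfplumber` (`page.extract_text()` on each page) can supply the text lines in place of `html_to_lines`. Scanned PDFs with no text layer would need OCR first.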