Ignore all data after References - Python
Question

I am working on a Python project where I need to process data from PDF research papers. I'm able to parse the papers, extract data from them, and identify sections using PyPDF2.

    import PyPDF2

    # Open the PDF in binary mode and create a reader for it
    pdfFileObj = open('fileName.pdf', 'rb')
    pdfReader = PyPDF2.PdfFileReader(pdfFileObj)
    pageCount = pdfReader.numPages

    # Concatenate the extracted text of every page
    count = 0
    text = ''
    while count < pageCount:
        pageObj = pdfReader.getPage(count)
        count += 1
        text += pageObj.extractText()

Every paper contains References at the end of the paper, which I'm able to