I would like to do some analysis on some properties listed in an upcoming auction. Unfortunately, the city running the auction does not publish the information in a structured f
Convert to text with Xpdf using the pdftotext command.
I converted your file with the following:
pdftotext.exe -layout -f 23 -l 510 AuctionBook2013.pdf AuctionBook2013.txt
This conversion keeps the text as close as possible to its original physical layout (due to the -layout option). The -f and -l options give the first and last page numbers of the range of pages to extract.
From there, parsing should be simple -- a number in column 8 indicates the first line of a record, a blank line ends the record. Follow the guide for the exact positioning of elements within a record.
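As a rough sketch of that record-splitting rule, something like the following Python could work. It assumes "column 8" is counted from 1 (so index 7 in the string), and the function name split_records is just something I made up for illustration. It only groups lines into records; pulling individual fields out of each record still depends on the exact positions in the converted text.

```python
from typing import Iterable, Iterator, List

# Assumption: "column 8" in the description is 1-based.
RECORD_START_COL = 8


def split_records(lines: Iterable[str]) -> Iterator[List[str]]:
    """Group lines into records.

    A digit in column 8 opens a record (when none is open),
    and a blank line closes it.
    """
    idx = RECORD_START_COL - 1
    record: List[str] = []
    for raw in lines:
        # pdftotext separates pages with form feed characters
        # (unless -nopgbrk is given); drop them so columns are not shifted.
        line = raw.rstrip("\r\n").replace("\f", "")
        if not line.strip():
            # A blank line ends the current record, if there is one.
            if record:
                yield record
                record = []
            continue
        if record:
            # Inside a record: keep every non-blank line.
            record.append(line)
        elif len(line) > idx and line[idx].isdigit():
            # Outside a record: a digit in column 8 starts a new one.
            record = [line]
        # Any other line outside a record (page headers and so on) is skipped.
    if record:
        # The file may end without a trailing blank line.
        yield record
```

To try it on the converted file, open AuctionBook2013.txt and pass the file object to split_records; each item it yields is the list of raw lines for one record. Note that a digit in column 8 is only treated as a record start when no record is open, so a continuation line that happens to have a digit there does not split the record. If your data has records that are not separated by blank lines, that choice would need to change.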