How to extract table data from PDF as CSV from the command line?

生来不讨喜 2021-02-02 12:13

I want to extract all rows from here while ignoring the column headers as well as all page headers, i.e. Supported Devices.

pdftotext -layout DAC06         


        
5 Answers
  •  野性不改
    2021-02-02 12:29

    As Martin R commented, tabula-java is the new, actively maintained version of tabula-extractor. Version 1.0.0 was released on July 21, 2017.

    Download the jar file and run it with a recent version of Java:

    java -jar ./tabula-1.0.0-jar-with-dependencies.jar \
        --pages=all \
        ./DAC06E7D1302B790429AF6E84696FCFAB20B.pdf \
        > support_devices.csv
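
The CSV produced this way will probably still contain the page title and the column header row once per page, which the question wants to drop. Below is a minimal sketch of a post-processing filter, not something tabula provides. The function name `strip_headers` and the header text `Manufacturer,Model` are hypothetical, so substitute the real header row from your own output. It assumes the header cells contain no commas and that each header repeats verbatim on every page.

```shell
# strip_headers TITLE HEADER_ROW
# Reads CSV on stdin and writes to stdout every row that is neither
# the page title nor the column header row. Both arguments are
# illustrative inputs supplied by the caller.
strip_headers() {
    awk -v title="$1" -v header="$2" '
        {
            norm = $0
            gsub(/["\r]/, "", norm)   # ignore quoting and CR line endings
            sub(/,+$/, "", norm)      # ignore trailing empty cells
        }
        norm == ""     { next }       # blank or all-empty row
        norm == title  { next }       # page header, e.g. Supported Devices
        norm == header { next }       # column header repeated on each page
        { print }                     # data row, passed through unchanged
    '
}
```

Usage, with the placeholder header text replaced by the real one:

```shell
strip_headers 'Supported Devices' 'Manufacturer,Model' \
    < support_devices.csv > rows_only.csv
```

The comparison is done on a normalized copy of each line, so data rows are written out exactly as tabula emitted them.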
    
