How to convert PDF to Excel or CSV in Rails 4

醉梦人生 2021-01-06 15:43

I have searched a lot and have no choice but to ask here. Does anyone know of an online converter with an API, or a gem, that can convert a PDF to an Excel or CSV file?

3 answers
  •  孤街浪徒
    2021-01-06 16:05

    OK, after a lot of research I couldn't find an API, or even proper software, that does this directly. Here is how I did it.

    I first extract the table from the PDF as an HTML table using the pdftables API. It is cheap.

    Then I convert the HTML table to CSV.

    (This is not ideal but it works)

    Here is the code:

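    require 'csv'
    require 'nokogiri'
    require 'httmultiparty'

    class PageTextReceiver
      include HTTMultiParty
      # Not used below, because `post` is given an absolute URL.
      base_uri 'http://localhost:3000'
    
      # Upload the PDF to pdftables and save the HTML response to disk.
      def run
        # Block form closes the PDF file handle once the upload is done.
        response = File.open("/path/to/pdf/uploaded_pdf.pdf", "r") do |pdf|
          PageTextReceiver.post('https://pdftables.com/api?key=myapikey', :query => { f: pdf })
        end
    
        File.open('/path/to/save/as/html/response.html', 'w') do |f|
          f.puts response.body
        end
      end
    
      # Parse the saved HTML and write each table row as one CSV line.
      def convert
        # This path must point at the file written by `run`.
        doc = File.open("/path/to/saved/html/response.html") { |f| Nokogiri::HTML(f) }
        # Keyword options (not a positional hash) so this also works on Ruby 3.
        CSV.open("path/to/csv/t.csv", 'w', col_sep: ",", quote_char: "'", force_quotes: true) do |csv|
          # `//table//tr` also matches rows wrapped in <thead> or <tbody>.
          doc.xpath('//table//tr').each do |row|
            # Collect header cells (<th>) as well as data cells (<td>).
            csv << row.xpath('th|td').map(&:text)
          end
        end
      end
    end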
    

    Now run it like this:

    #> page = PageTextReceiver.new
    #> page.run
    #> page.convert
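
    The `convert` method quotes every field with a single quote (`quote_char`), whereas RFC 4180 and most spreadsheet importers expect double quotes. As a minimal sketch of just the CSV-writing half with Ruby's default double quote, assuming the rows have already been extracted as arrays of strings (`rows_to_csv` is a hypothetical helper, not part of the code above):

```ruby
require 'csv'

# Hypothetical helper: turn already-extracted rows (arrays of cell strings)
# into CSV text, quoting every field with the default double quote.
def rows_to_csv(rows)
  CSV.generate(force_quotes: true) do |csv|
    # Strip stray whitespace that HTML table cells often carry.
    rows.each { |row| csv << row.map { |cell| cell.to_s.strip } }
  end
end

# Sample rows made up for illustration; the comma inside a cell is safe
# because the field is quoted.
rows = [["Item", "Price"], ["Widget, large", "9.50"]]
puts rows_to_csv(rows)
# "Item","Price"
# "Widget, large","9.50"
```

    Building the string in memory first also makes this step easy to check before anything is written to disk.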
    

    It is not refactored; it is just a proof of concept. You will need to consider performance.

    I might use the Sidekiq gem to run it in the background and hand the result back to the main thread.
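
    To illustrate the idea of doing the conversion off the main thread and handing the result back, here is a rough sketch that uses plain Ruby threads and a `Queue` as a stand-in. It is not Sidekiq: Sidekiq runs jobs in a separate process, so there the result would normally be stored somewhere (for example in the database) rather than returned directly. `convert_pdf` is a hypothetical placeholder for the `run` and `convert` steps.

```ruby
# Stand-in for a background job: plain Ruby threads, not Sidekiq.
# `convert_pdf` is a hypothetical placeholder for the run + convert steps;
# here it only derives the CSV path from the PDF path.
def convert_pdf(pdf_path)
  pdf_path.sub(/\.pdf\z/, '.csv')
end

results = Queue.new

worker = Thread.new do
  # Runs off the main thread, so the caller is not blocked meanwhile.
  results << convert_pdf('/path/to/pdf/uploaded_pdf.pdf')
end

# The main thread is free to do other work here.

csv_path = results.pop # blocks until the worker has pushed its result
worker.join
puts csv_path
```

    The queue is what carries the result back; the same shape applies if the worker is later replaced by a real job system.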
