Question
I'm new to Ruby on Rails and wanted to create a crawler that scrapes data and inserts it into the database. I'm currently using Heroku so I can't access the database directly and was wondering what the best way to integrate a crawler script into the RoR framework would be. I would be using an hourly or daily cron to run the script.
Answer 1:
If you are using Rails on Heroku you can just use an ORM adapter like DataMapper or ActiveRecord, which gives you access to your database through an abstraction layer. You can send raw SQL to the database if you need to, but it's usually not recommended, since the ORMs provide pretty much everything you need.
You would basically just create models within your Rails application like normal, along with the associated fields in a table:
rails g model page meta_title:string page_title:string
rake db:migrate # Run this on Heroku too, after pushing your code: "heroku rake db:migrate"
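For reference, the generator above produces a migration roughly like the following (the exact boilerplate varies by Rails version; this sketch assumes Rails 3.1+ with the change method):

class CreatePages < ActiveRecord::Migration
  def change
    create_table :pages do |t|
      t.string :meta_title   # title scraped from the page's meta tag
      t.string :page_title   # visible page title
      t.timestamps           # created_at / updated_at, added by default
    end
  end
end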
Then in your crawler script you can create records by just using your model...
Page.create(:page_title => crawler[:title], :meta_title => crawler[:meta_title])
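To make that concrete, here is a minimal crawler sketch written as a rake task, assuming Nokogiri for parsing; the task name, URL, and selectors are placeholders, not from the question:

# lib/tasks/crawler.rake -- hypothetical task name
require 'open-uri'
require 'nokogiri'

namespace :crawler do
  desc "Scrape one page and store its titles"
  task :run => :environment do
    url = 'http://example.com/'                   # placeholder target
    doc = Nokogiri::HTML(open(url))               # fetch and parse the page

    title_node = doc.at_css('title')              # may be nil if the page has no <title>
    meta_node  = doc.at_css("meta[name='title']")

    Page.create(
      :page_title => (title_node ? title_node.text.strip : nil),
      :meta_title => (meta_node ? meta_node['content'] : nil)
    )
  end
end

Depending on :environment loads the full Rails app, so the Page model and database connection are available; run it with rake crawler:run locally, or heroku rake crawler:run on Heroku.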
Normally you can use Whenever (https://github.com/javan/whenever) to manage your cron jobs, though I'm not sure how it works on Heroku since I haven't set any up there before.
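For what it's worth, a Whenever schedule for the hypothetical task above would look something like this on a box where you control cron:

# config/schedule.rb -- Whenever DSL; crawler:run is the illustrative task from above
every 1.hour do
  rake "crawler:run"
end

Running whenever --update-crontab then writes the corresponding crontab entry. On Heroku itself you don't get a crontab, so a scheduling add-on has to fill that role instead.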
Answer 2:
I'd suggest one of two options:

1. Use a Ruby script that starts with require 'rubygems' plus whatever other helper libraries (Rails, ActiveRecord, and so on) you need to accomplish the task, and then cron that script.

2. If you're using Rails to also serve web apps, use the machine's hosts file so that a wget (or similar) on that machine will properly map requests to that instance of Rails; from there, just set the crawler up as a web app and use the wget command in your cron (see the crontab sketch below). Not terribly efficient, but if you're just looking for something quick and dirty based on an existing setup, that would work nicely. Just make sure to send STDOUT and STDERR to /dev/null so you don't end up amassing cron output.
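A sketch of the crontab entry that the second option describes (the URL is a placeholder that your hosts file would map to the Rails instance):

# run the crawler endpoint hourly; the redirects keep cron from amassing output
0 * * * * wget -q -O - http://myapp.local/crawler/run > /dev/null 2>&1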
Source: https://stackoverflow.com/questions/5332408/insert-into-rails-database