Question
Which would be the best method to save table data obtained while scraping a webpage using casperjs?
Using a json object and store it as a file after serializing.
Using ajax request to php then storing it in a mysql db.
Answer 1:
For simplicity's sake, view CasperJS as a way to get the data, then handle it afterwards in another language. I would go for option #1: get the data in JSON format and save it to a file to work on later.
To do this, you can use the File System API that PhantomJS provides. You can also couple this with CasperJS's cli interface to allow you to pass arguments into the script (a temporary file to write to for example).
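For example, a CasperJS script along these lines would take the output path as a CLI argument and write the scraped table out as JSON (the URL, selectors, and file name here are placeholders; fs.write is PhantomJS's File System API, so this runs under casperjs, not Node):

```javascript
// scrape.js — run as: casperjs scrape.js /tmp/out.json
var fs = require('fs');                    // PhantomJS file system module
var casper = require('casper').create();

var outPath = casper.cli.get(0);           // first positional CLI argument

casper.start('http://example.com/table-page', function() {
    // Collect the table rows inside the page context
    var rows = this.evaluate(function() {
        return Array.prototype.map.call(
            document.querySelectorAll('table tr'),
            function(tr) {
                return Array.prototype.map.call(tr.cells, function(td) {
                    return td.textContent;
                });
            }
        );
    });
    // Serialize and write to the temporary file passed in
    fs.write(outPath, JSON.stringify(rows), 'w');
});

casper.run(function() {
    this.exit();
});
```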
Your script to handle all of this would look like:
- Get a temporary file path (mktemp on Linux systems).
- Call your CasperJS script, passing in that temporary file path as an argument.
- Get your data, write it to that file using the File System API, and exit.
- Read in the file, do work with it (save to database, etc.), then remove the temporary file.
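The steps above can be sketched as a small shell wrapper (run_scrape and scrape.js are hypothetical names; it assumes your CasperJS script writes JSON to the path it receives as its first argument):

```shell
#!/bin/sh
# run_scrape SCRAPER SCRIPT: create a temp file, let the scraper fill it
# with JSON, print the contents (stand-in for "do work"), then clean up.
run_scrape() {
    tmpfile="$(mktemp)"          # 1. get a temporary file path
    "$1" "$2" "$tmpfile"         # 2. call the CasperJS script with that path
    cat "$tmpfile"               # 3./4. read the data back; process it here
    rm -f "$tmpfile"             #      ...then remove the temporary file
}

# typical invocation:
# run_scrape casperjs scrape.js
```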
Answer 2:
I just use the second approach:
First, get the info and store it in a globalInfo variable:
var globalInfo;

casper.thenOpen("www.targetpage.cl/valuableInfo", function() {
    globalInfo = this.evaluate(function() {
        var domInfo = {};
        domInfo.title = "this is the info";
        domInfo.body = "scrap in the dom for info";
        return domInfo;
    });
});
Second, visit a page that stores the captured data:
casper.then(function() {
    casper.thenOpen("www.mipage.com/saveIntheDBonPost.php", {
        method: 'post',
        data: {
            'title': '' + globalInfo.title,
            'body': '' + globalInfo.body
        }
    });
});
www.mipage.com/saveIntheDBonPost.php takes the data in the $_POST parameter and stores it in a DB.
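A minimal sketch of what that receiving script could look like (the database credentials and the scraped_data table are assumptions; a prepared statement is used so the scraped text cannot inject SQL):

```php
<?php
// Hypothetical saveIntheDBonPost.php: read the POSTed fields and
// insert them into an assumed scraped_data(title, body) table.
$pdo = new PDO('mysql:host=localhost;dbname=scraper', 'user', 'password');

$stmt = $pdo->prepare(
    'INSERT INTO scraped_data (title, body) VALUES (:title, :body)'
);
$stmt->execute(array(
    ':title' => isset($_POST['title']) ? $_POST['title'] : '',
    ':body'  => isset($_POST['body'])  ? $_POST['body']  : ''
));
```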
Source: https://stackoverflow.com/questions/18945281/saving-table-data-obtained-while-scraping-a-webpage-using-casperjs