Question
I've been writing a program which extracts data from web searches. To get more data, I'd ideally like to extract more results per query through a script (let's say 100 or so).
My question is, is there a way to modify the URL for Google, Yahoo, or Bing (preference in that order) so that I can get more than 10 results per query?
For Google, appending `&num=99` used to work at one point, but it no longer does :(
I also saw a suggestion to append `&count=50`, but that didn't work on any of the search engines either.
Answer 1:
The reason `num=99` doesn't work for Google is that the `num` parameter's value isn't used directly; instead, it is compared against a list of allowed values.
The allowed values are 10, 20, 30, 40, 50, and 100. Any other value for this field is ignored.
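A minimal sketch of how a script could respect that list, assuming the allowed values above still apply (they may have changed since this answer was written). `snapGoogleNum` and `googleSearchUrl` are hypothetical helper names; the sketch rounds a requested count up to the next allowed value so you get at least as many results as you asked for, capped at 100.

```javascript
// Allowed values for Google's `num` parameter, as listed in the answer.
// Assumption: Google still honors exactly these values.
const GOOGLE_ALLOWED_NUM = [10, 20, 30, 40, 50, 100];

// Hypothetical helper: return the smallest allowed value >= `wanted`,
// or the largest allowed value (100) if `wanted` exceeds all of them.
function snapGoogleNum(wanted) {
  for (const value of GOOGLE_ALLOWED_NUM) {
    if (value >= wanted) return value;
  }
  return GOOGLE_ALLOWED_NUM[GOOGLE_ALLOWED_NUM.length - 1];
}

// Hypothetical helper: build a search URL whose `num` Google will accept.
function googleSearchUrl(query, wanted) {
  const url = new URL("https://www.google.com/search");
  url.searchParams.set("q", query);
  url.searchParams.set("num", String(snapGoogleNum(wanted)));
  return url.toString();
}

console.log(googleSearchUrl("stack overflow", 99));
// https://www.google.com/search?q=stack+overflow&num=100
```

Rounding up is a design choice; if you would rather never exceed the requested count, pick the largest allowed value that is `<=` the request instead.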
For Bing, the parameter is `count=##`, where `##` can be anything from 1 to 100.
For Yahoo, the parameter is `n=##`, where `##` can be anything from 1 to 100.
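The same idea for Bing and Yahoo can be sketched as follows. Only the `count`/`n` parameters and their 1-100 range come from the answer; the endpoints and the query-parameter names (`q` for Bing, `p` for Yahoo) are assumptions, and `ENGINES`, `clampCount`, and `searchUrl` are hypothetical names.

```javascript
// Assumed endpoints and query-parameter names; the count parameters
// (`count` for Bing, `n` for Yahoo) are the ones given in the answer.
const ENGINES = {
  bing: { base: "https://www.bing.com/search", queryParam: "q", countParam: "count" },
  yahoo: { base: "https://search.yahoo.com/search", queryParam: "p", countParam: "n" },
};

// Clamp the requested count into the 1-100 range the answer describes.
function clampCount(wanted) {
  return Math.min(100, Math.max(1, Math.round(wanted)));
}

// Build a results URL for the given engine with a clamped result count.
function searchUrl(engine, query, wanted) {
  const config = ENGINES[engine];
  if (!config) throw new Error(`Unknown engine: ${engine}`);
  const url = new URL(config.base);
  url.searchParams.set(config.queryParam, query);
  url.searchParams.set(config.countParam, String(clampCount(wanted)));
  return url.toString();
}

console.log(searchUrl("bing", "web scraping", 50));
// https://www.bing.com/search?q=web+scraping&count=50
console.log(searchUrl("yahoo", "web scraping", 250));
// https://search.yahoo.com/search?p=web+scraping&n=100
```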
In most cases, the URL parameter will only work if the user hasn't specified the number of search results to show in the search engine's search settings. Otherwise, the settings cookie takes precedence.
Answer 2:
I don't know what programming language you're using, but the general idea is to load the Google search page with the proper cookie settings (that is how the preferences are stored at the time of this writing).
You can set and then view cookies in Google Chrome. To avoid unnecessary cookies, start by opening a new incognito window (Ctrl+Shift+N) and navigating to the search settings (https://www.google.com/preferences).
At the time of writing, you will want to check "Never show instant results" and then adjust the "Results per page" slider to whatever value you want. After clicking "Save" at the bottom, you can view your cookies by opening the developer console (Ctrl+Shift+J) and switching to the Resources tab.
Again, at the time of writing, Google sets two cookies, `NID` and `PREF`. `PREF` is the one we're interested in for changing the number of search results. Here is an example of what it may look like:
`ID=8155cce71859f7d0:U=fe6e69e174148b7b:FF=0:LD=en:NR=40:TM=1379366492:LM=1379366586:SG=2:S=FoybwBhek8noyp0t`
(This value fetches 40 results, as indicated by `NR=40`.)
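The example value above is a colon-separated list of `KEY=VALUE` fields, with `NR` holding the results-per-page setting. A minimal sketch for reading and changing that field, assuming the format shown in this answer (Google has changed its cookies since); `parsePref`, `serializePref`, and `withResultsPerPage` are hypothetical helpers. Whether Google honors a hand-edited value is not something the answer confirms; fields such as `S` may act as a checksum.

```javascript
// Split a PREF value like "ID=...:NR=40:..." into an ordered Map of fields.
function parsePref(value) {
  const fields = new Map();
  for (const pair of value.split(":")) {
    const eq = pair.indexOf("=");
    if (eq === -1) continue; // skip entries without "="
    fields.set(pair.slice(0, eq), pair.slice(eq + 1));
  }
  return fields;
}

// Join the fields back together; Map keeps the original field order.
function serializePref(fields) {
  return Array.from(fields, ([key, val]) => `${key}=${val}`).join(":");
}

// Hypothetical helper: replace the NR (results per page) field in place.
function withResultsPerPage(value, count) {
  const fields = parsePref(value);
  fields.set("NR", String(count));
  return serializePref(fields);
}

const pref =
  "ID=8155cce71859f7d0:U=fe6e69e174148b7b:FF=0:LD=en:NR=40:TM=1379366492:LM=1379366586:SG=2:S=FoybwBhek8noyp0t";
console.log(parsePref(pref).get("NR")); // 40
console.log(withResultsPerPage(pref, 100));
```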
With this cookie name (`PREF`) and its value (shown above), you can send the cookie when requesting a page via wget, curl, etc. In my most recent project related to this, I was using Node with the `request` library.
Here is a snippet showing how you might fetch a Google page with 40 results (a modified example from the `request` documentation):
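```javascript
var request = require('request');
var url = 'https://www.google.com/search';
var j = request.jar();
var cookie = request.cookie('PREF=ID=8155cce71859f7d0:U=fe6e69e174148b7b:FF=0:LD=en:NR=40:TM=1379366492:LM=1379366586:SG=2:S=FoybwBhek8noyp0t');
// Add the PREF cookie to the jar, scoped to the Google URL.
j.setCookie(cookie, url);

// `qs` builds the query string; 'your search terms' is a placeholder.
request({url: url, qs: {q: 'your search terms'}, jar: j},
  function(error, response, body) {
    if (error) return console.error(error);
    // do something with the body (html) of the page!
  });
```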
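The `request` library has since been deprecated. As an alternative sketch, Node's built-in `https` module can send the same cookie in a plain `Cookie` header. `googleRequestOptions` and `fetchResultsPage` are hypothetical helpers, and the `PREF` value you pass in is assumed to come from your own browser session as described above.

```javascript
const https = require("https");

// Build options for https.get() that send the PREF cookie as a Cookie header.
function googleRequestOptions(query, prefValue) {
  const params = new URLSearchParams({ q: query });
  return {
    hostname: "www.google.com",
    path: `/search?${params}`,
    headers: { Cookie: `PREF=${prefValue}` },
  };
}

// Fetch one results page and hand the HTML body to a Node-style callback.
function fetchResultsPage(query, prefValue, callback) {
  https
    .get(googleRequestOptions(query, prefValue), (res) => {
      let body = "";
      res.setEncoding("utf8");
      res.on("data", (chunk) => (body += chunk));
      res.on("end", () => callback(null, body));
    })
    .on("error", callback);
}

// Usage (performs a real network request):
// fetchResultsPage("stack overflow", prefValue, (err, html) => { /* ... */ });
```

Building the options separately from sending the request keeps the cookie handling easy to inspect without touching the network.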
Alternatively, take a look at the man pages for wget and curl. For example, wget provides a `--load-cookies` option that you can use.
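wget's `--load-cookies` option reads a file in the Netscape `cookies.txt` format: one cookie per line, with seven tab-separated fields (domain, whether subdomains match, path, secure flag, expiry as a Unix timestamp, name, value). curl can read the same format via `-b <file>`. A minimal sketch for writing such a file from Node, assuming you already have the `PREF` value; `netscapeCookieLine` and `writeCookieFile` are hypothetical helpers, and the `.google.com` domain in the usage comment is an assumption.

```javascript
const fs = require("fs");

// Format one cookie as a tab-separated Netscape cookies.txt line.
// Simplification: subdomain matching is TRUE only when the domain starts with ".".
function netscapeCookieLine({ domain, path = "/", secure = false, expires, name, value }) {
  const includeSubdomains = domain.startsWith(".") ? "TRUE" : "FALSE";
  return [domain, includeSubdomains, path, secure ? "TRUE" : "FALSE", String(expires), name, value].join("\t");
}

// Write a complete cookie file with the conventional header line.
function writeCookieFile(filename, cookies) {
  const lines = ["# Netscape HTTP Cookie File", ...cookies.map(netscapeCookieLine)];
  fs.writeFileSync(filename, lines.join("\n") + "\n");
}

// Usage (writes cookies.txt in the current directory):
// writeCookieFile("cookies.txt", [{
//   domain: ".google.com",
//   expires: Math.floor(Date.now() / 1000) + 365 * 24 * 3600,
//   name: "PREF",
//   value: prefValue, // the PREF value copied from your browser
// }]);
// Then: wget --load-cookies cookies.txt "https://www.google.com/search?q=test"
```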
You can apply this approach to any other cookie-based website that you need content from. Yahoo! uses cookie-based settings; I'm not sure what Bing uses.
Answer 3:
Add `&n=100` to the links to get a page with 100 results.
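This answer doesn't name the engine; Answer 1 lists `n` as Yahoo's results-per-page parameter. A minimal sketch of adding it to an existing results link, where `withResultCount` is a hypothetical helper and the Yahoo URL in the example is an assumption. It uses `set` rather than string concatenation so a link that already carries an `n` value is overwritten instead of getting a duplicate parameter.

```javascript
// Hypothetical helper: set (or overwrite) the `n` parameter on a results link.
function withResultCount(link, count = 100) {
  const url = new URL(link);
  url.searchParams.set("n", String(count));
  return url.toString();
}

console.log(withResultCount("https://search.yahoo.com/search?p=node.js"));
// https://search.yahoo.com/search?p=node.js&n=100
```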
Source: https://stackoverflow.com/questions/17660910/getting-more-search-results-per-page-via-url