Question
I am trying to access historical Google page rankings or Alexa rankings over time to add some weightings to a search engine I am making for fun. Ideally this would be a separate Python function that takes the URL and the number of days to average over as parameters, so that I could use the result to weight my search results.
I think it could be fun to work on, but I also suspect there may be an easy way to do this through the APIs that some guru could show me, saving me a few sleepless weeks. Can anyone help?
Thanks a lot!
Answer 1:
If you look at the Alexa page for Stack Overflow, you can see that next to the global rank it shows the change in the site's rank over the past three months. This may not be as granular as you would like, but you could scrape this information relatively easily, and I doubt that you would gain much additional information from looking at changes over different lengths of time. The long-term answer is to collect and store the rankings yourself so that you have a historical record going forward.
Update: Here is sample code.
import mechanize
import cookielib
from BeautifulSoup import BeautifulSoup

def changerankscrapper(site):
    """
    Takes a site url, scrapes that site's Alexa page,
    and returns the site's global Alexa rank and the
    change in that rank over the past three months.
    """
    # Create Alexa URL
    url = "http://www.alexa.com/siteinfo/" + site
    # Get HTML
    cj = cookielib.CookieJar()
    mech = mechanize.OpenerFactory().build_opener(mechanize.HTTPCookieProcessor(cj))
    request = mechanize.Request(url)
    response = mech.open(request)
    html = response.read()
    # Parse HTML with BeautifulSoup
    soup = BeautifulSoup(html)
    # Strip thousands separators so that ranks such as "1,234" parse as integers
    globalrank = int(soup.find("strong", {"class": "metricsUrl font-big2 valign"}).text.replace(",", ""))
    changerank = int(soup.find("span", {"class": "change-wrapper change-up"}).text.replace(",", ""))
    return globalrank, changerank

# Example
site = "http://stackoverflow.com/"
globalrank, changerank = changerankscrapper(site)
print(globalrank)
print(changerank)
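For the long-term approach of collecting and storing the rankings yourself, here is a minimal sketch of what the storage side might look like. It assumes you record one rank per site per day in a local SQLite database. The table layout and the helper names (`open_store`, `record_rank`, `average_rank`) are made up for illustration, and the rank values in the example are invented sample numbers, not real Alexa data.

```python
import sqlite3
from datetime import date, timedelta


def open_store(path=":memory:"):
    """Open (or create) a SQLite database holding one rank per site per day."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS ranks (
            site        TEXT    NOT NULL,
            day         TEXT    NOT NULL,
            global_rank INTEGER NOT NULL,
            PRIMARY KEY (site, day)
        )
        """
    )
    return conn


def record_rank(conn, site, global_rank, day=None):
    """Save a site's rank for a day, replacing any value already stored for that day."""
    day = day or date.today()
    conn.execute(
        "INSERT OR REPLACE INTO ranks (site, day, global_rank) VALUES (?, ?, ?)",
        (site, day.isoformat(), global_rank),
    )
    conn.commit()


def average_rank(conn, site, days, today=None):
    """Return the mean stored rank over the last `days` days, or None if nothing is stored."""
    today = today or date.today()
    cutoff = today - timedelta(days=days - 1)
    # ISO dates (YYYY-MM-DD) sort correctly as plain strings, so BETWEEN works on them.
    row = conn.execute(
        "SELECT AVG(global_rank) FROM ranks WHERE site = ? AND day BETWEEN ? AND ?",
        (site, cutoff.isoformat(), today.isoformat()),
    ).fetchone()
    return row[0]


if __name__ == "__main__":
    conn = open_store()
    start = date(2013, 10, 1)
    # Invented sample ranks, one per day, purely to exercise the functions.
    for offset, sample_rank in enumerate([60, 62, 58]):
        record_rank(conn, "stackoverflow.com", sample_rank, start + timedelta(days=offset))
    print(average_rank(conn, "stackoverflow.com", days=3, today=date(2013, 10, 3)))
```

You would call `record_rank` once a day with whatever rank you obtain (for example from the scraper above), and `average_rank(conn, site, days)` then gives the kind of "URL plus number of days" average the question asks for. Days with no stored rank are simply left out of the average.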
Answer 2:
I know of four services or databases that provide online access to historical Alexa ranking data. You may want to check whether the information you need is available for your site:
- http://www.rank2traffic.com/
- http://siterankdata.com/
- http://alexarankhistory.com/
- http://www.alexarankchart.com/
Hope it helps!
Answer 3:
Alexa (via AWS) charges for API access to Alexa rankings. The charge per query is tiny, so you can get hundreds of thousands of ranks relatively cheaply. I used to run a few search directories that indexed Alexa rankings over time, so I have experience with this. The point is, you're being evil by scraping vast amounts of data when you can pay for the legitimate service.
Regarding PageRank... Google do not provide a way to access this data. The sites that offer to show your PageRank use a trick to get the PageRank via the Google Toolbar. So again, this is not legitimate, and I wouldn't count on it for long-term data mining, especially not in bulk quantities.
Besides, PageRank counts for very little these days, since Google now relies on about 200 other factors to rank search results, as opposed to just measuring sites' link authority.
Answer 4:
What kind of Google rankings do you want to access? If it is the Alexa global rank, you will need to buy API access; they offer a trial period so you can get it and test it. If you are looking for PageRank, you can go to timer4web.com. I am not sure whether they provide an API, but you can ask them.
Regards, Kate
Source: https://stackoverflow.com/questions/19215815/possible-to-get-alexa-information-or-google-page-rankings-over-time