I'm new to web programming. I want to build a crawler in Python to crawl the Foursquare social graph.
I've got a "manually" controlled crawler working with the apiv2
library. The main method looks like this:
import apiv2

def main():
    CODE = "******"
    url = "https://foursquare.com/oauth2/authenticate?client_id=****&response_type=code&redirect_uri=****"
    key = "***"
    secret = "****"
    re_uri = "***"
    auth = apiv2.FSAuthenticator(key, secret, re_uri)
    auth.set_token(CODE)
    finder = apiv2.UserFinder(auth)
    # do some requests using the finder, e.g.:
    finder.finde(ANY_USER_ID).mayorships()
    # ... bla bla bla
The problem is that right now I have to open that URL in my browser, pick the CODE out of the redirect URL by hand, paste it back into my program, and run it again. There should be some way to fold that code-fetching step into the program itself so the whole thing runs automatically.
Any instruction or sample code is appreciated.
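(One way I can imagine automating that step: serve the redirect_uri yourself, catch the ?code=... query parameter when Foursquare redirects the browser back, and feed it straight into the program. A rough sketch, assuming the registered redirect_uri is http://localhost:8080/ and Python 3's stdlib http.server:)

import webbrowser
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs, urlencode

CLIENT_ID = "****"
REDIRECT_URI = "http://localhost:8080/"

class CodeCatcher(BaseHTTPRequestHandler):
    code = None

    def do_GET(self):
        # Foursquare redirects the browser here with ?code=... attached
        CodeCatcher.code = parse_qs(urlparse(self.path).query).get("code", [None])[0]
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"Got the code; you can close this window.")

def fetch_code():
    server = HTTPServer(("localhost", 8080), CodeCatcher)  # listen before opening the browser
    webbrowser.open("https://foursquare.com/oauth2/authenticate?" + urlencode({
        "client_id": CLIENT_ID, "response_type": "code", "redirect_uri": REDIRECT_URI}))
    server.handle_request()  # blocks until the single redirect request arrives
    return CodeCatcher.code

(main() could then call auth.set_token(fetch_code()) instead of reading a hand-pasted CODE.)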
You should check out the python-oauth2 module. It seems to be the most stable thing out there.
In particular, this blog post has a really good rundown on how to do OAuth easily with Python. The example code uses the Foursquare API, so I would check that out first.
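If it helps, basic python-oauth2 usage looks roughly like this (a sketch, Python 2 era as in the linked post; the endpoint URL is a placeholder, and the token is whatever the OAuth dance handed back):

import oauth2 as oauth

consumer = oauth.Consumer(key="****", secret="****")
token = oauth.Token(key="****", secret="****")
client = oauth.Client(consumer, token)

# client.request signs the call for you and returns (response_headers, body)
resp, content = client.request("http://api.example.com/endpoint", "GET")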
I recently had to get OAuth working with Dropbox, and wrote this module containing the necessary steps to do the OAuth exchange.
For my system, the simplest thing I could think of was to pickle the OAuth client. My blog package just deserialized the pickled client and requested endpoints with the following function:
get = lambda x: client.request(x, 'GET')[1]
Just make sure your workers have this client object and you should be good to go :-)
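For concreteness, a rough sketch of that pickling approach, assuming python-oauth2's Consumer/Token/Client objects (the filename is arbitrary):

import pickle
import oauth2 as oauth

# One-time setup: build the authorized client and persist it
consumer = oauth.Consumer(key="****", secret="****")
token = oauth.Token(key="****", secret="****")
with open("client.pkl", "wb") as f:
    pickle.dump(oauth.Client(consumer, token), f)

# In each worker: deserialize the client and hit endpoints directly
with open("client.pkl", "rb") as f:
    client = pickle.load(f)
get = lambda x: client.request(x, 'GET')[1]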
Get your app authenticated via OAuth first. Here is an example of how to use OAuth and Python to connect to Twitter: http://popdevelop.com/2010/07/an-example-on-how-to-use-oauth-and-python-to-connect-to-twitter/
You can find more examples at https://code.google.com
Then you can use BeautifulSoup or lxml for HTML parsing, extracting the relevant data from the page source your request returns.
BeautifulSoup Documentation - http://www.crummy.com/software/BeautifulSoup/
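For example, pulling every link out of a fetched page with BeautifulSoup (the bs4 import is shown; the old 3.x series used "from BeautifulSoup import BeautifulSoup", and the filename is a placeholder):

from bs4 import BeautifulSoup

html = open("page.html").read()  # page source you fetched earlier
soup = BeautifulSoup(html, "html.parser")
for a in soup.find_all("a", href=True):
    print(a["href"])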
To download images, videos, etc., you can use openers. Read more about openers at http://docs.python.org/library/urllib2.html
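A quick opener sketch (build_opener lives in urllib.request on Python 3 and in urllib2 on Python 2, as in the linked docs; the URL is a placeholder):

from urllib.request import build_opener

opener = build_opener()
opener.addheaders = [('User-Agent', 'my-crawler/0.1')]  # identify your crawler
data = opener.open('http://example.com/photo.jpg').read()
with open('photo.jpg', 'wb') as f:
    f.write(data)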
You don't have to do it every time. They'll give you a token that is good for X hours/days. Eventually you'll get a 403 HTTP status and you'll need to re-authenticate.
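A hedged sketch of that retry-on-403 pattern; reauthenticate() is a hypothetical stand-in for however your app redoes the OAuth dance, and oauth_token is the query parameter Foursquare's v2 API accepts tokens through:

from urllib.request import urlopen
from urllib.error import HTTPError

def reauthenticate():
    # hypothetical: redo the browser/redirect dance and return a fresh token
    raise NotImplementedError

def api_get(url, token):
    try:
        return urlopen(url + "?oauth_token=" + token).read()
    except HTTPError as e:
        if e.code != 403:
            raise
        # token went stale: re-authenticate once and retry
        return urlopen(url + "?oauth_token=" + reauthenticate()).read()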
Source: https://stackoverflow.com/questions/9038690/how-to-build-a-python-crawler-for-websites-using-oauth2