Well
Mainly you have to separate two things: the 'scraper'/crawler, the Python lib/program/function that downloads the files/data from the webserver, and the parser that reads and interprets that data.
In my case I had to scrape some government info that is 'open' but not download/data friendly.
For this project I used Scrapy [1].
Mainly I set the 'start_urls', which are the URLs my robot will crawl/fetch, and then use a 'parse' function to retrieve/parse that data.
For parsing/retrieving you are going to need an HTML/lxml extractor, as 90% of your data will be HTML; a rough sketch of such a spider is below.
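This is a minimal sketch only, with a made-up spider name, URL and selectors (Scrapy's CSS selectors run on lxml under the hood):

```python
import scrapy

class GovDataSpider(scrapy.Spider):
    # Spider name and start URL are placeholders for illustration.
    name = "govdata"
    start_urls = ["http://example.com/reports"]

    def parse(self, response):
        # Extract the text of every table cell in each row; adjust the
        # selectors to whatever the real page actually looks like.
        for row in response.css("table tr"):
            yield {"cells": row.css("td::text").extract()}
```

You can run it with `scrapy runspider govdata_spider.py -o items.json` to dump whatever the parse function yields.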
Now focusing on your question:
For data crawling
- Scrapy [1]
- Requests [2]
- urllib [3]
For parsing data
- Scrapy (its built-in selectors use lxml) or Scrapy + another parser
- lxml [4]
- Beautiful Soup [5] (a small Requests + Beautiful Soup sketch follows this list)
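If Scrapy is heavier than you need, the same download-then-parse split works with Requests [2] for fetching and Beautiful Soup [5] (or lxml [4]) for parsing. A rough sketch, with a placeholder URL and a trivial extraction:

```python
import requests
from bs4 import BeautifulSoup

# Placeholder URL; swap in the page you actually need to scrape.
response = requests.get("http://example.com/reports")
soup = BeautifulSoup(response.text, "html.parser")

# Collect the text of every link on the page as a simple example.
links = [a.get_text(strip=True) for a in soup.find_all("a")]
print(links)
```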
And please remember, crawling and scraping are not only for the web; emails too. You can check another question about that here: [6]
[1] - http://scrapy.org/
[2] - http://docs.python-requests.org/en/latest/
[3] - http://docs.python.org/library/urllib.html
[4] - http://lxml.de/
[5] - http://www.crummy.com/software/BeautifulSoup/
[6] - Python read my outlook email mailbox and parse messages