I\'d like to build a webapp to help other students at my university create their schedules. To do that I need to crawl the master schedules (one huge html page) as well as a lin
I liked using BeatifulSoup for extracting html data
It's as easy as this:
from BeautifulSoup import BeautifulSoup import urllib ur = urllib.urlopen("http://pragprog.com/podcasts/feed.rss") soup = BeautifulSoup(ur.read()) items = soup.findAll('item') urls = [item.enclosure['url'] for item in items]