Check out the Natural Language Toolkit (NLTK). It's a very useful Python library if you're doing any text processing.
Then look at this paper by H. P. Luhn (1958). It describes a naive but effective method of generating summaries of text.
Use the nltk.probability.FreqDist object to track how often words appear in the text, then score each sentence according to how many of the most frequent words appear in it. Select the sentences with the best scores and voila, you have a summary of the document. Something along the lines of the sketch below.
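Here's a rough sketch of that idea. The stopword filtering, the "top 50 frequent words" cutoff, and the three-sentence summary length are just illustrative choices on my part, not anything prescribed by Luhn or NLTK (you'll also need to have run `nltk.download('punkt')` and `nltk.download('stopwords')` once beforehand):

```python
from nltk.probability import FreqDist
from nltk.corpus import stopwords
from nltk.tokenize import sent_tokenize, word_tokenize

def summarize(text, num_sentences=3, num_keywords=50):
    # Count word frequencies, ignoring stopwords and punctuation.
    stop = set(stopwords.words('english'))
    words = [w.lower() for w in word_tokenize(text)
             if w.isalpha() and w.lower() not in stop]
    freq = FreqDist(words)
    keywords = {w for w, _ in freq.most_common(num_keywords)}

    # Score each sentence by how many of the frequent words it contains.
    scored = []
    for i, sent in enumerate(sent_tokenize(text)):
        score = sum(1 for w in word_tokenize(sent.lower()) if w in keywords)
        scored.append((score, i, sent))

    # Keep the best-scoring sentences, restored to document order.
    best = sorted(sorted(scored, reverse=True)[:num_sentences],
                  key=lambda t: t[1])
    return ' '.join(sent for _, _, sent in best)
```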
I suspect NLTK has some way of loading documents from the web and getting all of the HTML tags out of the way. I haven't done that kind of thing myself, but if you look up the corpus readers you might find something helpful.
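For what it's worth, NLTK's old HTML-cleaning helper was dropped in later releases, so one common approach (not NLTK-specific) is to fetch the page yourself and strip the markup with BeautifulSoup before feeding the text to the summarizer. The URL below is just a placeholder:

```python
from urllib.request import urlopen
from bs4 import BeautifulSoup

def fetch_plain_text(url):
    html = urlopen(url).read()
    # get_text() drops the tags and keeps the readable content.
    return BeautifulSoup(html, 'html.parser').get_text(separator=' ')

# text = fetch_plain_text('http://example.com/article.html')
# print(summarize(text))
```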