I second the recommendation for Python (or Beautiful Soup). I'm currently in the middle of a small screen-scraping project in Python, and Python 3's automatic handling of things like cookie-based authentication (through `http.cookiejar.CookieJar` and `urllib.request`) is greatly simplifying things. Python supports the more advanced features you might need, such as regular expressions, and it lets you get a project like this working quickly, without much overhead spent on low-level details. It's also relatively cross-platform.
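To give a rough idea of what the cookie handling looks like, here is a minimal sketch using only the standard library. The helper names (`make_cookie_opener`, `fetch`, `extract_links`) are my own for illustration, not part of any library, and the regex is a deliberately naive pattern that only matches double-quoted `href` attributes.

```python
import re
import urllib.request
from http.cookiejar import CookieJar


def make_cookie_opener():
    """Return (opener, jar). The opener saves cookies from responses into
    the jar and sends them back on later requests made through it."""
    jar = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def fetch(opener, url):
    """Fetch a page through the cookie-aware opener and return its text.
    Assumes the page is UTF-8; undecodable bytes are replaced."""
    with opener.open(url) as response:
        return response.read().decode("utf-8", errors="replace")


# Naive pattern for illustration only: matches double-quoted href values.
# A real HTML parser (such as Beautiful Soup) copes better with messy markup.
HREF_RE = re.compile(r'href="([^"]+)"', re.IGNORECASE)


def extract_links(html):
    """Return every double-quoted href value found in the HTML text."""
    return HREF_RE.findall(html)
```

In use, you would call `make_cookie_opener()` once, request the login page (or POST the login form) with `fetch(opener, ...)`, and then keep using the same opener for the pages you want to scrape. Any session cookie the site sets should be resent automatically, though sites with JavaScript-driven logins may need more than this.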