I use Hpricot on Ruby. As an example this is a snippet of code that I use to retrieve all book titles from the six pages of my HireThings account (as they don't seem to provide a single page with this information):
pagerange = 1..6
proxy = Net::HTTP::Proxy(proxy, port, user, pwd)
proxy.start('www.hirethings.co.nz') do |http|
pagerange.each do |page|
resp, data = http.get "/perth_dotnet?page=#{page}"
if resp.class == Net::HTTPOK
(Hpricot(data)/"h3 a").each { |a| puts a.innerText }
end
end
end
It's pretty much complete. All that comes before this are library imports and the settings for my proxy.