Using proxies is, by far, the most common way to tackle this problem. There are also higher-level solutions that offer a sort of "page downloading as a service", guaranteeing that you get "clean" pages (no 404s, etc.). One of these is Crawlera (provided by my company), but there may be others.
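
To make the proxy approach concrete, here is a minimal sketch that rotates requests across a pool of proxies in round-robin order, using only Python's standard library. It assumes the goal is to spread requests over several proxies. The proxy addresses and the helper names (`next_proxy`, `build_opener_for`, `fetch`) are made up for illustration, so treat it as a starting point rather than a finished solution.

```python
import itertools
import urllib.request

# Hypothetical proxy addresses; replace them with proxies you actually have access to.
PROXIES = [
    "http://proxy1.example.com:8080",
    "http://proxy2.example.com:8080",
]

# Endless round-robin iterator over the proxy pool.
_proxy_cycle = itertools.cycle(PROXIES)


def next_proxy():
    """Return the next proxy URL in round-robin order."""
    return next(_proxy_cycle)


def build_opener_for(proxy_url):
    """Build a urllib opener that routes HTTP and HTTPS traffic through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


def fetch(url, timeout=10):
    """Fetch url through the next proxy in the rotation and return the body as bytes."""
    opener = build_opener_for(next_proxy())
    with opener.open(url, timeout=timeout) as response:
        return response.read()
```

Each call to `fetch` goes out through a different proxy than the previous one. A real crawler would probably also retry failed requests on another proxy and drop proxies that keep failing, which this sketch does not attempt.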