Heritrix has a fairly steep learning curve, but it can be configured so that only the homepage and any page that "looks like" an about page (matched with a regex filter) get crawled.
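To give an idea of what such a regex filter might look like, here is a minimal sketch in plain Java. It does not use the Heritrix API; the class `AboutPageFilter`, its method names, and the pattern itself are all hypothetical and only illustrate the "homepage or about-like page" rule. The pattern is a guess at common about-page paths and would need tuning for the sites you actually crawl.

```java
import java.net.URI;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Heritrix): decides whether a URL is in scope.
final class AboutPageFilter {

    // Assumed definition of "looks like an about page": paths such as
    // /about, /about-us.html, /company/ or /who-we-are. Adjust as needed.
    private static final Pattern ABOUT_PATH = Pattern.compile(
            "^/(about([-_]?us)?|company|who-we-are)(\\.[a-z]+|/)?$",
            Pattern.CASE_INSENSITIVE);

    // The homepage is taken to be an empty path or a single slash.
    static boolean isHomepage(String path) {
        return path.isEmpty() || path.equals("/");
    }

    // Returns true only for the homepage or an about-like page.
    static boolean shouldCrawl(String url) {
        URI uri;
        try {
            uri = URI.create(url);
        } catch (IllegalArgumentException e) {
            return false; // malformed URL: stay out of scope
        }
        String path = uri.getPath();
        if (path == null) {
            return false; // opaque URIs such as mailto: have no path
        }
        return isHomepage(path) || ABOUT_PATH.matcher(path).matches();
    }
}
```

For example, `AboutPageFilter.shouldCrawl("http://example.com/about-us.html")` is accepted, while `AboutPageFilter.shouldCrawl("http://example.com/products/widget")` is rejected. The same regular expression could then be carried over into whatever regex-based scope rule your crawler offers.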
More open source Java (web) crawlers: http://java-source.net/open-source/crawlers