Scraping pages that do not seem to have URLs

前端未结

关注

 1  1386

I\'m trying to scrape these listings and provide more exposure for these job listings on a site that belongs to a client of mine. The issue is that I need to be able to link to

相关标签:

1条回答

爱一瞬间的悲伤

2021-01-23 09:11

The page you refer to appears to be generated by an Oracle product, so one would think they'd be willing to construct a web form properly (and with reference to accessibility concerns). They haven't, so it occurs to me that either their engineer was having a bad day, or they are deliberately making it (slightly) harder to scrape.

The reason your browser shows no href when you hover over those links is that there isn't one. What the page does instead is to use JavaScript to capture the click event, populate a POST form with some hidden values, and call the submit method programmatically. This can cause problems with screen-readers and other accessibility devices, as well as causing problems with the way in which back buttons have to re-submit the page.

The good news is that constructions of this kind can usually be scraped by creating a form yourself, either using a real one on a third party page, or via a crawler library. If you post the right values to the target URI, reverse-engineered from examining the page's script, the resulting document should be the "linked" page you expect.

0 讨论(0)
发布评论:

提交评论
- 加载中...