Method to detect a parked page?

Submitted by 不打扰是莪最后的温柔 on 2019-11-30 03:46:32

Here is a test that I think may catch a decent number of them. It takes advantage of the fact that nobody actually wants to run real web sites on their parked domains, so it looks for wildcarding of both the subdomain and the path. Let's say we have this URL in our system:

http://www.example.com/method-to-detect-parked

First I would fetch the actual URL and hash the response, or keep a copy of it for comparison.

My second check would be to request:

http://random.example.com/random

If the response matches the original page, or the request succeeds at all, you have a pretty good indicator that the page is parked. If it fails, I might check the subdomain and the path individually. If the page randomly changes some elements, you may want to choose a few items to compare instead of the whole page. For example, make a list of the links included in the page and compare those, or maybe the title tag.
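The comparison described above can be sketched roughly as follows. This is only an illustration using the Python standard library: the function names are made up for the example, replacing a leading "www." with the random label is an assumption, and comparing the title plus the set of link targets is just one way to ignore elements that change on every load.

```python
import secrets
from html.parser import HTMLParser
from urllib.parse import urlsplit, urlunsplit


def random_probe_url(url):
    """Build a URL with a random subdomain and a random path on the same domain."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    # Assumption: a leading "www." is the only label that should be replaced.
    # Any port, query string or fragment in the original URL is dropped.
    if host.startswith("www."):
        host = host[4:]
    label = secrets.token_hex(8)
    path = "/" + secrets.token_hex(8)
    return urlunsplit((parts.scheme, f"{label}.{host}", path, "", ""))


class _SignatureParser(HTMLParser):
    """Collects the <title> text and every <a href> target."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = set()
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.add(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def page_signature(html):
    """Reduce a page to (title, set of link targets) for comparison."""
    parser = _SignatureParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip(), frozenset(parser.links)


def probably_parked(original_html, probe_html):
    """True when the original page and the random-URL page share a signature."""
    return page_signature(original_html) == page_signature(probe_html)
```

You would fetch the original URL and the result of random_probe_url() with whatever HTTP client you already use, then pass both bodies to probably_parked(). A failed request for the probe URL is the normal case for a real site.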

I would say that you'll have to examine the WHOIS records for the sites in question and/or the actual content of the pages and develop some heuristics as to what constitutes a "parked page".

Take goooogle.com: its WHOIS record shows that it is owned by "Privacy Protection" and that its DNS servers are ns1/ns2.fastpark.net. If you look at the source for the site, they're silly enough to have a CSS file named "style_park.css" :)
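A rule base like that could start out as small as the sketch below. The two rules are just the ones from the goooogle.com example, the names are invented for illustration, and the nameserver list is assumed to come from whatever WHOIS or DNS lookup you already have.

```python
# Hypothetical starter rules, taken from the goooogle.com example only.
PARKING_NAMESERVER_SUFFIXES = ("fastpark.net",)
PARKING_HTML_MARKERS = ("style_park.css",)


def parking_signals(nameservers, html):
    """Return a list of the rules that matched, empty if none did."""
    signals = []
    for ns in nameservers:
        name = ns.lower().rstrip(".")
        for suffix in PARKING_NAMESERVER_SUFFIXES:
            # Match the domain itself or any host under it, not look-alikes.
            if name == suffix or name.endswith("." + suffix):
                signals.append(f"nameserver:{name}")
    lowered = html.lower()
    for marker in PARKING_HTML_MARKERS:
        if marker in lowered:
            signals.append(f"html:{marker}")
    return signals
```

Returning the matched rules rather than a plain yes/no makes it easier to see which entries in the list are still pulling their weight as it grows.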

All in all, I don't think you'll be able to come up with a generic way to do it. You'll probably end up with some ever-evolving rule base or blacklist.

You could just rely on your users to "Report this link"... which would put it into a queue to review later?

Look at the creation date of the DNS/WHOIS record, and compare it to the date the link was added. If the DNS record is newer, that's a link that needs manual checking.
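Once you have both dates, the check itself is a one-liner. The sketch below assumes both values are available as ISO 8601 strings with a UTC offset; how you get the creation date out of a WHOIS record is left out, since the format varies by registry.

```python
from datetime import datetime


def registered_after_link(domain_created, link_added):
    """True when the domain was (re)registered after the link was stored.

    Both arguments are ISO 8601 strings that include a UTC offset,
    for example "2019-11-30T03:46:32+00:00".
    """
    created = datetime.fromisoformat(domain_created)
    added = datetime.fromisoformat(link_added)
    return created > added
```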

Or: check http://example.com/ and http://example.com/xxxxxxrandomstringxxxxx. If those two pages are identical, you've got some sort of problem that needs manual checking. Either the primary page you wanted to link to is broken, or the domain is parked and all pages return the same content. This test is not 100% reliable, because some parked pages echo back elements from the URL.
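A rough sketch of that check, using only urllib from the standard library. The function names are invented for the example, and stripping the random string out of the second page before comparing is just one guess at how to cope with pages that echo the URL back; it will not catch every variation.

```python
import secrets
from urllib.error import HTTPError
from urllib.request import urlopen


def fetch_text(url, timeout=10):
    """Download a page and return its body as text."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def identical_for_random_path(base_url, fetch=fetch_text):
    """True when the root page and a random path return the same body."""
    token = "xxxxxx" + secrets.token_hex(8) + "xxxxx"
    base = base_url.rstrip("/")
    root = fetch(base + "/")
    try:
        probe = fetch(base + "/" + token)
    except HTTPError:
        # A normal site usually answers 404 for a path that does not exist.
        return False
    # Drop the echoed random string so it does not hide an otherwise equal page.
    return root == probe.replace(token, "")
```

The fetch parameter is only there so the comparison can be tried against canned responses without touching the network. A True result means the link goes into the manual-check queue, not that the domain is definitely parked.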

If you just want to check an existing website, a service like http://www.linkalarm.com/ does this well.
