I often have to work with fragile legacy websites that break in unexpected ways when logic or configuration is updated.
I don't have the time or the knowledge of the system needed to write a Selenium script. Besides, I don't want to check a specific use case; I want to verify every link and page on the site.
I would like to create an automated system test that spiders through a site and checks for broken links and crashes. Ideally, there would be a tool I could use to achieve this. It should have as many of the following features as possible, in descending order of priority:
- Triggered via script
- Does not require human interaction
- Follows all links, including anchor tags and links to CSS and JS files
- Produces a log of all 404s, 500s, etc. that it finds
- Can be deployed locally to check sites on intranets
- Supports cookie/form-based authentication
- Free/Open source
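Following "all links" in the sense of the list above mostly means collecting every `href` and `src` on a page, not just the ones on `<a>` tags. Below is a minimal sketch using Python's standard-library `html.parser`. The tag/attribute table is an assumed starting point rather than a complete list, and `LinkExtractor`/`extract_links` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse

# Tag -> attribute holding a URL worth checking (an assumed starting list; extend as needed).
URL_ATTRS = {
    "a": "href",
    "area": "href",    # image-map regions
    "link": "href",    # stylesheets, icons
    "script": "src",
    "img": "src",
    "iframe": "src",
    "frame": "src",
}


class LinkExtractor(HTMLParser):
    """Collect absolute http(s) URLs referenced by anchors, stylesheets, scripts, images and frames."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        wanted = URL_ATTRS.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name != wanted or not value:
                continue
            # Resolve relative URLs and drop "#fragment", which is never sent to the server.
            absolute, _fragment = urldefrag(urljoin(self.base_url, value.strip()))
            # Skip mailto:, javascript: and other schemes that cannot be fetched over HTTP.
            if urlparse(absolute).scheme in ("http", "https") and absolute not in self.urls:
                self.urls.append(absolute)


def extract_links(html, base_url):
    """Return the checkable URLs found in one HTML page, in document order."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.urls
```

For example, `extract_links(html, "http://intranet.example/index.html")` (a hypothetical host) resolves `/about`, `css/site.css` and `app.js` against that page and folds a `#top` link back into the page's own URL.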
There are many partial solutions out there, like FitNesse, Firefox's LinkChecker and the W3C link checker, but none of them do everything I need.
I would like to use this test with projects using a range of technologies and platforms, so the more portable the solution the better.
I realise this is no substitute for proper system testing, but it would be very useful if I had a convenient and automatable way of verifying that no part of the site was obviously broken.
I use Xenu's Link Sleuth for this sort of thing. It quickly checks any site for dead links and the like: just point it at a URI and it will spider all the links on that site.
Description from the site:
Xenu's Link Sleuth (TM) checks Web sites for broken links. Link verification is done on "normal" links, images, frames, plug-ins, backgrounds, local image maps, style sheets, scripts and Java applets. It displays a continuously updated list of URLs which you can sort by different criteria. A report can be produced at any time.
It meets all your requirements apart from being scriptable, as it's a Windows app that has to be started manually.
We use and really like Linkchecker:
http://wummel.github.io/linkchecker/
It's open-source, Python, command-line, internally deployable, and outputs to a variety of formats. The developer has been very helpful when we've contacted him with issues.
We have a Ruby script that queries our database of internal websites, kicks off LinkChecker with appropriate parameters for each site, and parses the XML that LinkChecker gives us to create a custom error report for each site in our CMS.
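The script described above is Ruby; as a rough illustration of the same pipeline in Python (one LinkChecker run per site, then failures pulled out of its XML), something like the following could work. Only the `-o xml` output option is taken from LinkChecker's documented output types. The XML element names (`urldata`, `url`, `parent`, and `valid` with a `result` attribute) are assumptions to verify against a real report from your version, and the function names are hypothetical.

```python
import subprocess
import xml.etree.ElementTree as ET


def run_linkchecker(site_url, executable="linkchecker"):
    """Run LinkChecker on one site and return its XML report as text."""
    completed = subprocess.run(
        [executable, "-o", "xml", site_url],  # "-o xml" selects LinkChecker's XML output format
        capture_output=True,
        text=True,
        check=False,  # a non-zero exit can simply mean "broken links found", so inspect the report
    )
    return completed.stdout


def broken_links(xml_report):
    """Return (url, parent, result) tuples for every entry the report marks as invalid.

    ASSUMPTION: entries are <urldata> elements with <url>, <parent> and
    <valid result="..."> children, where <valid> holds "0" for a failure.
    Compare against a real report from your LinkChecker version before relying on it.
    """
    root = ET.fromstring(xml_report)
    failures = []
    for entry in root.iter("urldata"):
        valid = entry.find("valid")
        if valid is None or (valid.text or "").strip() != "0":
            continue
        failures.append((
            entry.findtext("url", default="").strip(),
            entry.findtext("parent", default="").strip(),
            valid.get("result", ""),
        ))
    return failures
```

Keeping the parsing separate from the subprocess call means the report handling can be tested against saved reports without re-crawling a site.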
What part of your list does the W3C link checker not meet? That would be the one I would use.
Alternatively, twill (Python-based) is an interesting little language for this kind of thing. It has a link-checker module, but I don't think it works recursively, so it's not so good for spidering. You could modify it if you're comfortable with that, and I could be wrong: there might be a recursive option. Worth checking out, anyway.
You might want to try using wget for this. It can spider a site, including the "page requisites" (the images, stylesheets and other files a page needs in order to display properly), and can be configured to log errors. I don't know whether it will give you enough information, but it's free and available on Windows (via Cygwin) as well as Unix.
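To trigger wget from a build step, a small Python wrapper is enough. The flags used (`--spider`, `--recursive`, `--page-requisites`, `--no-verbose`, `--output-file`, `--level`) are standard wget options, and the exit-status table follows the wget manual. The function names and the default log file name are hypothetical. Broken links end up in the log file, and exit status 8 indicates that a server returned an error response somewhere in the crawl.

```python
import subprocess

# Exit statuses listed in the wget manual. Apart from 0 and 1, lower numbers take
# precedence, so a network failure (4) can hide a server error response (8).
WGET_EXIT_STATUS = {
    0: "no problems occurred",
    1: "generic error",
    2: "parse error (command line, .wgetrc or .netrc)",
    3: "file I/O error",
    4: "network failure",
    5: "SSL verification failure",
    6: "username/password authentication failure",
    7: "protocol error",
    8: "server issued an error response (e.g. 404 or 500)",
}


def wget_spider_command(site_url, log_file="wget-spider.log", depth=None):
    """Build a wget command that crawls site_url without keeping pages, logging to log_file."""
    cmd = [
        "wget",
        "--spider",           # check pages instead of saving them
        "--recursive",        # follow links
        "--page-requisites",  # also fetch images, stylesheets, scripts, ...
        "--no-verbose",       # one line per request keeps the log readable
        f"--output-file={log_file}",
    ]
    if depth is not None:
        cmd.append(f"--level={depth}")  # wget's default recursion depth is 5
    cmd.append(site_url)
    return cmd


def spider_with_wget(site_url, log_file="wget-spider.log"):
    """Run the crawl and return (exit status, meaning)."""
    completed = subprocess.run(wget_spider_command(site_url, log_file), check=False)
    return completed.returncode, WGET_EXIT_STATUS.get(completed.returncode, "unknown status")
```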
InSite is a commercial program that seems to do what you want (haven't used it).
If I was in your shoes, I'd probably write this sort of spider myself...
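For anyone taking that route, a bare-bones same-host spider fits in Python's standard library. This is a sketch under simplifying assumptions: it does not read robots.txt, run JavaScript or follow `url()` references inside CSS; `urlopen` follows redirects silently; and `crawl`, `fetch_url` and the `max_requests` cap are hypothetical names chosen for illustration. It returns the URLs that produced 4xx/5xx responses or failed to connect.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.error import HTTPError
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class UrlCollector(HTMLParser):
    """Gather every href/src attribute value on a page, whatever tag it is on."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.found.append(value.strip())


def fetch_url(url, timeout=10):
    """Return (status, content_type, html_text); status is None when the request fails outright."""
    try:
        with urlopen(url, timeout=timeout) as resp:  # redirects are followed automatically
            ctype = resp.headers.get_content_type()
            body = resp.read().decode("utf-8", errors="replace") if ctype == "text/html" else ""
            return resp.status, ctype, body
    except HTTPError as err:  # 4xx and 5xx responses arrive as exceptions
        return err.code, "", ""
    except OSError:  # URLError, refused connections, timeouts, ...
        return None, "", ""


def crawl(start_url, fetch=fetch_url, max_requests=500):
    """Breadth-first crawl of start_url's host; return {url: status} for every failed request."""
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    problems = {}
    requests_made = 0
    while queue and requests_made < max_requests:
        url = queue.popleft()
        requests_made += 1
        status, ctype, body = fetch(url)
        if status is None or status >= 400:
            problems[url] = status
            continue
        # Off-site links and non-HTML resources are checked but not parsed for more links.
        if ctype != "text/html" or urlparse(url).netloc != host:
            continue
        collector = UrlCollector()
        collector.feed(body)
        for raw in collector.found:
            link, _fragment = urldefrag(urljoin(url, raw))
            if urlparse(link).scheme in ("http", "https") and link not in seen:
                seen.add(link)
                queue.append(link)
    return problems
```

Because `crawl` takes the fetch function as a parameter, it can be exercised offline with a fake fetcher, or given a fetcher that carries login cookies. A build script could then fail whenever the returned dict is non-empty.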
I'm not sure Checkbot supports form authentication, but it will handle cookies if you can get it going on the site; otherwise, I think it does everything on your list. I've used it as a step in a build process before to check that nothing was broken on a site. There's an example of its output on the website.
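On the cookie/form authentication requirement in general: with Python's standard library, one approach is to POST the login form once through an opener that holds a `CookieJar`, then reuse that opener for every later request so the session cookie travels with it. This is a sketch only. `logged_in_opener` is a hypothetical name, the login URL and form field names depend entirely on the site, and logins that need CSRF tokens or JavaScript require more than this.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def logged_in_opener(login_url, form_fields, timeout=10):
    """POST a login form and return (opener, jar); the opener sends the resulting cookies."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    # Supplying data makes urllib send a POST with an application/x-www-form-urlencoded body.
    body = urllib.parse.urlencode(form_fields).encode("ascii")
    with opener.open(login_url, data=body, timeout=timeout) as resp:
        resp.read()  # any Set-Cookie headers have now been stored in jar
    return opener, jar
```

Usage might look like `opener, jar = logged_in_opener("http://intranet.example/login", {"username": "tester", "password": "secret"})` (hypothetical URL and fields), after which `opener.open(url)` sends the session cookie with each request.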
I have always liked linklint for checking links on a site. However, I don't think it meets all your criteria, particularly where pages depend on JavaScript. I also think it will miss images referenced from inside CSS.
But for spidering all anchors, it works great.
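The CSS gap is straightforward to close in a home-grown checker: fetch each stylesheet and pull out its `url(...)` and `@import` references. Below is a regex-based sketch that assumes fairly conventional CSS. It is not a full CSS parser (escaped characters, for instance, are not handled), and `css_references` is a hypothetical name.

```python
import re
from urllib.parse import urljoin

COMMENT_RE = re.compile(r"/\*.*?\*/", re.DOTALL)
# url(...) with optional matching quotes, e.g. url(img/a.png) or url("img/a.png").
URL_RE = re.compile(r"""url\(\s*(['"]?)(.*?)\1\s*\)""", re.IGNORECASE)
# The quoted-string form of @import; @import url(...) is already matched by URL_RE.
IMPORT_RE = re.compile(r"""@import\s+(['"])(.*?)\1""", re.IGNORECASE)


def css_references(css_text, stylesheet_url):
    """Return absolute URLs for images, fonts and imported sheets referenced by a stylesheet."""
    css_text = COMMENT_RE.sub("", css_text)  # commented-out rules are not real references
    found = []
    for pattern in (URL_RE, IMPORT_RE):
        for match in pattern.finditer(css_text):
            target = match.group(2).strip()
            # data: URIs carry their content inline, so there is nothing to fetch.
            if not target or target.lower().startswith("data:"):
                continue
            # Relative URLs in CSS resolve against the stylesheet's URL, not the page's.
            absolute = urljoin(stylesheet_url, target)
            if absolute not in found:
                found.append(absolute)
    return found
```

Calling `css_references(css_text, "http://intranet.example/css/site.css")` (a hypothetical host) on a rule such as `background: url("../img/bg.png")` yields `http://intranet.example/img/bg.png`, which can then be checked like any other link.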
Try SortSite. It's not free, but seems to do everything you need and more.
Alternatively, PowerMapper from the same company has a similar-but-different approach. The latter will give you less information about detailed optimisation of your pages, but will still identify any broken links, etc.
Disclaimer: I have a financial interest in the company that makes these products.
Try http://www.thelinkchecker.com. It is an online application that checks the number of outgoing links, page rank and anchors. I think this is the solution you need.
Source: https://stackoverflow.com/questions/1596518/automated-link-checker-for-system-testing