Given a list of urls, I would like to check that each url:
I actually wrote something in PHP that does this over a database of 5k+ URLs. I used the PEAR class HTTP_Request, which has a method called getResponseCode(). I just iterate over the URLs, call getResponseCode() on each one, and evaluate the result.
However, it doesn't work for FTP addresses, URLs that don't begin with http or https (unconfirmed, but I believe that's the case), or sites with invalid security certificates; in those cases it just returns 0. A 0 is also returned for server-not-found, since there's no HTTP status code for that.
And it's probably easier than cURL, since you just include a few files and call a single function to get an integer status code back.
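The loop ends up looking something like the sketch below (simplified: the $urls array stands in for the database rows, and it assumes the PEAR HTTP_Request package is installed):

```php
<?php
// Rough sketch of the loop described above, not the exact production script.
require_once 'HTTP/Request.php';

$urls = array(
    'http://example.com/',
    'http://example.org/does-not-exist',
);

foreach ($urls as $url) {
    $req = new HTTP_Request($url);
    $result = $req->sendRequest();

    if (PEAR::isError($result)) {
        // DNS failure, refused connection, bad certificate, etc.
        echo $url . ' -> error: ' . $result->getMessage() . "\n";
        continue;
    }

    // getResponseCode() returns the HTTP status code as an integer.
    echo $url . ' -> ' . $req->getResponseCode() . "\n";
}
```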
Look into cURL. There's a library for PHP.
There's also a command-line version of cURL, so you could even write the script in bash.
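On the PHP side, a status-code check with the curl extension might look roughly like this (a sketch; the URL list and the 10-second timeout are just placeholders):

```php
<?php
// Check each URL with a HEAD request and report the HTTP status code.
$urls = array('http://example.com/', 'https://example.org/missing');

foreach ($urls as $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);          // send HEAD, skip the body
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // don't echo the response
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow redirects
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);           // placeholder timeout

    curl_exec($ch);
    $code = curl_getinfo($ch, CURLINFO_HTTP_CODE);   // 0 if no response came back
    curl_close($ch);

    echo $url . ' -> ' . $code . "\n";
}
```

Some servers handle HEAD requests poorly, so drop CURLOPT_NOBODY and do a normal GET if you see unexpected 405s.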
You only need a bash script to do this. Please check my answer on a similar post here. It is a one-liner that reuses HTTP connections to dramatically improve speed, retries n times on temporary errors, and follows redirects.
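I won't reproduce that exact one-liner here, but a simplified version of the same idea looks roughly like this (urls.txt is a placeholder; a plain loop like this spawns one curl per URL, so it loses the connection-reuse speedup of the linked answer):

```bash
#!/usr/bin/env bash
# Simplified sketch: print "status_code URL" for every URL in urls.txt.
# -s silences progress output, -o discards the body, -L follows redirects,
# --retry 3 retries transient failures, -m 10 caps each request at 10 seconds.
while read -r url; do
    code=$(curl -s -o /dev/null -L --retry 3 -m 10 -w '%{http_code}' "$url")
    echo "$code $url"
done < urls.txt
```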