I would like to create a batch script that goes through 20,000 links in a DB and weeds out all the 404s and such. How would I get the HTTP status code for a remote URL?
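cURL would be perfect, but since you don't have it, you'll have to get down and dirty with sockets. The technique is: parse the URL, open a TCP socket to the host, send a HEAD request, and read the status code from the first line of the response.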
Here is a quick example:
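<?php
// Split the URL into the pieces needed for the request.
$url = parse_url('http://www.example.com/index.html');
$host = $url['host'];
// parse_url() omits keys that are absent from the URL, so supply defaults.
$port = isset($url['port']) ? $url['port'] : 80;
$path = isset($url['path']) ? $url['path'] : '/';
// Only append the query string when there is one.
if (isset($url['query']))
    $path .= '?' . $url['query'];
// HEAD asks for the headers only, so the body is never transferred.
$request = "HEAD $path HTTP/1.1\r\n"
    ."Host: $host\r\n"
    ."Connection: close\r\n"
    ."\r\n";
$address = gethostbyname($host);
$socket = socket_create(AF_INET, SOCK_STREAM, SOL_TCP);
if ($socket === false || !socket_connect($socket, $address, $port))
    die("Could not connect to $host");
socket_write($socket, $request, strlen($request));
// The status line looks like "HTTP/1.1 200 OK"; the code is the second field.
// explode() replaces split(), which was removed in PHP 7.
$response = explode(' ', socket_read($socket, 1024));
print "<p>Response: ". $response[1] ."</p>\r\n";
socket_close($socket);
?>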
UPDATE: I've added a few lines to parse the URL
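The example above takes the second space-separated field of the response as the status code, which only works if the server actually answers with a well-formed status line. A slightly more defensive way to pull the code out might look like the sketch below. The function names `parse_status_code` and `is_broken_link` are made up for illustration, and treating every 4xx/5xx code as "broken" is an assumption you may want to adjust for your own cleanup rules.

```php
<?php
// Hypothetical helper: extract the numeric status code from a raw HTTP response.
// Returns null when the first line is not a valid status line.
function parse_status_code($rawResponse)
{
    // Only the first line matters, e.g. "HTTP/1.1 404 Not Found".
    $statusLine = strtok($rawResponse, "\r\n");
    if ($statusLine !== false
        && preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})(?:\s|$)#', $statusLine, $m)) {
        return (int) $m[1];
    }
    return null;
}

// Assumption: any 4xx or 5xx code, or an unparseable response, marks the link as broken.
function is_broken_link($statusCode)
{
    return $statusCode === null || $statusCode >= 400;
}
```

With this in place, the `explode()` line in the example could be swapped for `parse_status_code(socket_read($socket, 1024))`, and the batch script can decide what to delete based on `is_broken_link()`.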
This page looks like it has a pretty good setup to download a page using either curl or fsockopen, and can get the HTTP headers using either method (which is what you want, really).
After using that method, you'd want to check $output['info']['http_code'] to get the status code.
Hope that helps.
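For reference, the cURL side of that idea can be sketched roughly as follows. This is not the code from the linked page, just a minimal version that assumes the cURL extension is loaded; the function name `curl_http_status` is hypothetical. `CURLOPT_NOBODY` makes cURL issue a HEAD request, and `curl_getinfo()` with `CURLINFO_HTTP_CODE` reports the status code, which is presumably what ends up in `http_code` on that page.

```php
<?php
// Hypothetical helper: returns the HTTP status code as an int,
// or null if the cURL extension is missing or the request fails.
function curl_http_status($url, $timeout = 10)
{
    if (!function_exists('curl_init')) {
        return null;
    }
    $ch = curl_init($url);
    if ($ch === false) {
        return null;
    }
    curl_setopt($ch, CURLOPT_NOBODY, true);          // send HEAD instead of GET
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // do not echo the response
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);

    // curl_exec() returns false when no response was received at all.
    $ok = curl_exec($ch);
    $code = ($ok === false) ? null : (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);

    // The handle is released when $ch goes out of scope.
    return $code;
}
```

Calling `curl_http_status('http://www.example.com/index.html')` should give an integer such as 200 or 404, or null if the host could not be reached.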
If I'm not mistaken, none of PHP's built-in functions return the HTTP status of a remote URL, so the best option would be to use sockets to open a connection to the server, send a request, and parse the response status:
pseudo code:
parse url => $host, $port, $path
$request = "GET $path HTTP/1.0\r\nHost: $host\r\n\r\n";
$fp = fsockopen($host, $port, $errno, $errstr, $timeout), check for any errors
fwrite($fp, $request)
$headers = "";
while (!feof($fp)) {
    $headers .= fgets($fp, 4096);
    $status = <parse $headers>
    if (<status read>)
        break;
}
fclose($fp)
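Filled in as real PHP, the pseudo code might look something like this. It is only a sketch: the function name `http_status` is invented, it handles plain HTTP on port 80 by default (no HTTPS), and it assumes the status line is the first line the server sends, so a single `fgets()` is enough instead of the loop.

```php
<?php
// Hypothetical helper: returns the status code as an int, or null on failure.
function http_status($url, $timeout = 10)
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['host'])) {
        return null;
    }
    $host = $parts['host'];
    $port = isset($parts['port']) ? $parts['port'] : 80;  // plain HTTP only
    $path = isset($parts['path']) ? $parts['path'] : '/';
    if (isset($parts['query'])) {
        $path .= '?' . $parts['query'];
    }

    // @ suppresses the warning; failure is reported through the return value.
    $fp = @fsockopen($host, $port, $errno, $errstr, $timeout);
    if (!$fp) {
        return null;
    }
    stream_set_timeout($fp, $timeout);  // also bound the time spent reading

    fwrite($fp, "GET $path HTTP/1.0\r\nHost: $host\r\n\r\n");

    // The status line is the first line of the response.
    $statusLine = fgets($fp, 4096);
    fclose($fp);

    if ($statusLine !== false
        && preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', $statusLine, $m)) {
        return (int) $m[1];
    }
    return null;
}
```

For the 20,000 links, you would call this once per URL in a loop and record the result. A null return means the URL could not be parsed or the server could not be reached, which you will probably want to treat separately from a genuine 404.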
Another option is to use an existing HTTP client class in PHP that can return the headers without fetching the full page content; there should be a few open source classes available on the net.
You can use PEAR's HTTP::head function.
http://pear.php.net/manual/en/package.http.http.head.php
A quick bit of googling found this link: http://www.webmasterworld.com/forum88/12559.htm. The most up-to-date version is near the bottom.