How do you get the HTTP status code for a remote domain in PHP?

不思量自难忘° 2020-12-31 23:32

I would like to create a batch script that goes through 20,000 links in a DB and weeds out all the 404s and such. How would I get the HTTP status code for a remote URL?

5 answers
  • 2020-12-31 23:53

    cURL would be perfect, but since you don't have it, you'll have to get down and dirty with sockets. The technique is:

    1. Open a socket to the server.
    2. Send an HTTP HEAD request.
    3. Parse the response.

    Here is a quick example:

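    <?php

    $url = parse_url('http://www.example.com/index.html');

    // parse_url() leaves out keys for missing components, so supply defaults.
    $host  = $url['host'];
    $port  = $url['port'] ?? 80;
    $path  = $url['path'] ?? '/';
    $query = isset($url['query']) ? '?' . $url['query'] : '';

    $request = "HEAD $path$query HTTP/1.1\r\n"
              ."Host: $host\r\n"
              ."Connection: close\r\n"
              ."\r\n";

    $address = gethostbyname($host);
    $socket = socket_create(AF_INET, SOCK_STREAM, SOL_TCP);
    if ($socket === false || !socket_connect($socket, $address, $port)) {
        die("Could not connect to $host:$port\n");
    }

    socket_write($socket, $request, strlen($request));

    // The status line looks like "HTTP/1.1 200 OK"; the code is the second token.
    // split() was removed in PHP 7, so use explode() instead.
    $response = explode(' ', socket_read($socket, 1024));

    print "<p>Response: ". $response[1] ."</p>\r\n";

    socket_close($socket);

    ?>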
    

    UPDATE: I've added a few lines to parse the URL
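
    Step 3 (parsing the response) can be made a little stricter than taking the second space-separated token. The following is only a sketch: parse_status_code() is a hypothetical helper name, not a built-in function, and it assumes the raw response starts with a status line such as "HTTP/1.1 404 Not Found".

```php
<?php

// Hypothetical helper (not part of PHP): extract the numeric status code
// from a raw HTTP response. Returns null when the text does not begin
// with something that looks like an HTTP status line.
function parse_status_code(string $response): ?int
{
    // Matches e.g. "HTTP/1.1 404 Not Found" or "HTTP/2 200".
    if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})\b#', $response, $m)) {
        return (int) $m[1];
    }
    return null;
}
```

    Passing it the string returned by socket_read() gives an integer that can be compared directly, for example parse_status_code($raw) === 404, while a null result flags a response that was not HTTP at all.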

  • 2020-12-31 23:53

    This page looks like it has a pretty good setup to download a page using either curl or fsockopen, and can get the HTTP headers using either method (which is what you want, really).

    After using that method, you'd want to check $output['info']['http_code'] to get the data you want.
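
    For the cURL route, the 'http_code' key is one of the fields in the array that curl_getinfo() returns. Below is a rough sketch of how such a lookup might be written when the cURL extension is available. The function names curl_status_code() and status_from_info() are made up for illustration and are not taken from the linked page.

```php
<?php

// Hypothetical helper: interpret the array returned by curl_getinfo().
// cURL reports http_code 0 when no HTTP response was received.
function status_from_info(array $info): ?int
{
    $code = (int) ($info['http_code'] ?? 0);
    return $code > 0 ? $code : null;
}

// Hypothetical helper: send a HEAD request and return the status code,
// or null if the request failed. Requires the cURL extension.
function curl_status_code(string $url, int $timeout = 10): ?int
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_NOBODY         => true,     // HEAD request, skip the body
        CURLOPT_RETURNTRANSFER => true,     // do not echo any output
        CURLOPT_CONNECTTIMEOUT => $timeout,
        CURLOPT_TIMEOUT        => $timeout,
    ]);
    curl_exec($ch);
    return status_from_info(curl_getinfo($ch));
}
```

    Some servers answer HEAD requests differently from GET requests, so a link that looks broken here may be worth re-checking with a normal GET.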

    Hope that helps.

  • 2021-01-01 00:07

    If I'm not mistaken, none of PHP's built-in functions return the HTTP status of a remote URL, so the best option would be to use sockets: open a connection to the server, send a request, and parse the status out of the response:

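    A sketch in PHP ($url is assumed to hold the link being checked):

    $parts = parse_url($url);
    $host = $parts['host'];
    $port = $parts['port'] ?? 80;
    $path = $parts['path'] ?? '/';
    $request = "GET $path HTTP/1.0\r\nHost: $host\r\n\r\n";
    $fp = fsockopen($host, $port, $errno, $errstr, 10); // 10-second timeout
    if (!$fp) {
        die("Connection failed: $errstr ($errno)\n");
    }
    fwrite($fp, $request);
    $status = null;
    while (($line = fgets($fp, 4096)) !== false) {
        // The status line looks like "HTTP/1.0 404 Not Found"
        if (preg_match('#^HTTP/\d\.\d\s+(\d{3})#', $line, $m)) {
            $status = (int) $m[1];
            break;
        }
    }
    fclose($fp);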
    

    Another option is to use an already-built HTTP client class in PHP that can return the headers without fetching the full page content. There should be a few open source classes available on the net.

  • 2021-01-01 00:08

    You can use PEAR's HTTP::head function.
    http://pear.php.net/manual/en/package.http.http.head.php

  • 2021-01-01 00:14

    A quick bit of googling found this link: http://www.webmasterworld.com/forum88/12559.htm. The most up-to-date version is near the bottom.
