PHP file_exists() for URL/robots.txt returns false

Submitted by 馋奶兔 on 2019-12-11 08:26:19

Question


I tried to use file_exists(URL/robots.txt) to check whether the file exists on randomly chosen websites, but I always get false.

How do I check whether the robots.txt file exists?

I don't want to start the download before I check.

Will fopen() do the trick? According to the manual, it returns a file pointer resource on success, or FALSE on error.

I guess I could write something like:

$f=@fopen($url,"r"); 
if($f) ...

My output:

http://www1.macys.com/robots.txt maybe it's not there
http://www.intend.ro/robots.txt maybe it's not there
http://www.emag.ro/robots.txt maybe it's not there
http://www1.bloomingdales.com/robots.txt maybe it's not there

My code:

try {
            if (file_exists($file)) 
                {
                echo 'exists'.PHP_EOL;
                $curl_tool = new CurlTool();
                $content = $curl_tool->fetchContent($file);
                //if the file exists on local disk, delete it
                if (file_exists(CRAWLER_FILES . 'robots_' . $website_id . '.txt'))
                    unlink(CRAWLER_FILES . 'robots_' . $website_id . '.txt');
                echo CRAWLER_FILES . 'robots_' . $website_id . '.txt', $content . PHP_EOL;
                file_put_contents(CRAWLER_FILES . 'robots_' . $website_id . '.txt', $content);
            }
            else
            {
                echo 'maybe it\'s not there'.PHP_EOL;
            }
        } catch (Exception $e) {
            echo 'EXCEPTION ' . $e . PHP_EOL;
        }

Answer 1:


file_exists() cannot be used on resources on other websites; it is intended for the local filesystem. Have a look here on how to perform the check properly.

As others have mentioned in the comments, and as the link says, it's (probably) easiest to use the get_headers() function to do this:

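// get_headers() returns an array of header lines (or false on failure);
// with the second argument set, the status line is still at index 0.
$headers = @get_headers($url, 1);
if ($headers === false || strpos($headers[0], '404') !== false) {
    // the request failed or the server answered 404: robots.txt is not there
    // ... you get the idea ...
} else {
    // ... your code ...
}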
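If the site redirects (for example from http:// to https://), get_headers() returns the status lines of every hop, so looking only at the first line can mislead. A minimal sketch of a helper that picks the last status line is below; the name finalStatusCode() is made up for illustration and is not part of PHP.

```php
<?php
/**
 * Hypothetical helper: return the status code of the final response in a
 * get_headers()-style array, or null when no status line is present.
 */
function finalStatusCode(array $headers): ?int
{
    $code = null;
    foreach ($headers as $key => $line) {
        // Status lines sit under integer keys, in associative mode too.
        if (is_int($key) && is_string($line)
            && preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', $line, $m)) {
            $code = (int) $m[1]; // a later status line overwrites earlier redirects
        }
    }
    return $code;
}
```

It could then be used as: $headers = @get_headers($url); $exists = $headers !== false && finalStatusCode($headers) === 200;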



Answer 2:


Just to second what other people said:

it's best to use cURL in PHP to find out whether http://example.com/robots.txt returns a 404 status code. If it does, the file does not exist. If it returns 200, it exists.

Be wary of custom 404 pages, though; I've never checked what status code they return.
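A minimal sketch of that cURL check, assuming the curl extension is loaded. The function names headStatusCode() and looksLikeExistingFile() and the 10-second timeout are illustrative choices, not anything from the question. It sends a HEAD request, so nothing is downloaded before the check; some servers answer HEAD differently from GET, so treat the result as a hint.

```php
<?php
/**
 * Hypothetical helper: send a HEAD request and return the final HTTP status
 * code, or null if the request itself failed (DNS error, timeout, ...).
 */
function headStatusCode(string $url, int $timeoutSeconds = 10): ?int
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_NOBODY         => true, // HEAD request: headers only, no body
        CURLOPT_FOLLOWLOCATION => true, // follow redirects to the final URL
        CURLOPT_RETURNTRANSFER => true, // do not echo anything
        CURLOPT_TIMEOUT        => $timeoutSeconds,
    ]);
    if (curl_exec($ch) === false) {
        return null;
    }
    return (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
}

/** Per the answer above: 200 means the file exists; anything else does not count. */
function looksLikeExistingFile(?int $statusCode): bool
{
    return $statusCode === 200;
}
```

Usage would look like: if (looksLikeExistingFile(headStatusCode($file))) { /* fetch and save robots.txt */ }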




Answer 3:


The http:// wrapper does not support stat() functionality, which file_exists() needs; you will need to check the HTTP response code from e.g. cURL.

From the PHP manual for file_exists(): As of PHP 5.0.0, this function can also be used with some URL wrappers. Refer to Supported Protocols and Wrappers to determine which wrappers support stat() family of functionality.
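The difference is easy to see side by side. This is a small sketch; the temporary file and the example.com URL are placeholders chosen for illustration.

```php
<?php
// Local file: the plain-files wrapper supports stat(), so file_exists() works.
$localPath = tempnam(sys_get_temp_dir(), 'robots_'); // creates an empty temp file
$localExists = file_exists($localPath);
unlink($localPath);

// Remote URL: the http:// wrapper has no stat() support, so file_exists()
// reports false whatever the server would answer.
$remoteExists = file_exists('http://example.com/robots.txt');

var_dump($localExists, $remoteExists);
```

This matches the behaviour described in the question: a false result for an http:// URL says nothing about whether robots.txt is actually there.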



Source: https://stackoverflow.com/questions/11966187/php-file-exists-for-url-robots-txt-returns-false
