Question
I tried to use file_exists(URL/robots.txt) to see whether the file exists on randomly chosen websites, and I get a false response.
How do I check whether the robots.txt file exists?
I don't want to start the download before I check.
Will fopen() do the trick? Its documentation says: "Returns a file pointer resource on success, or FALSE on error."
I guess I could use something like:
$f=@fopen($url,"r");
if($f) ...
My code and its output:
http://www1.macys.com/robots.txt maybe it's not there
http://www.intend.ro/robots.txt maybe it's not there
http://www.emag.ro/robots.txt maybe it's not there
http://www1.bloomingdales.com/robots.txt maybe it's not there
try {
    if (file_exists($file))
    {
        echo 'exists'.PHP_EOL;
        $curl_tool = new CurlTool();
        $content = $curl_tool->fetchContent($file);
        // if the file exists on local disk, delete it
        if (file_exists(CRAWLER_FILES . 'robots_' . $website_id . '.txt'))
            unlink(CRAWLER_FILES . 'robots_' . $website_id . '.txt');
        echo CRAWLER_FILES . 'robots_' . $website_id . '.txt', $content . PHP_EOL;
        file_put_contents(CRAWLER_FILES . 'robots_' . $website_id . '.txt', $content);
    }
    else
    {
        echo 'maybe it\'s not there'.PHP_EOL;
    }
} catch (Exception $e) {
    echo 'EXCEPTION ' . $e . PHP_EOL;
}
Answer 1:
file_exists cannot be used on resources on other websites. It's intended for the local filesystem. Have a look here on how to perform the check properly.
As others have mentioned in the comments, and as the link says, it's (probably) easiest to use the get_headers function to do this:
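// get_headers() returns an array of header lines (or false on failure),
// so the status line in element 0 is what gets searched for "404".
$headers = @get_headers($url);
if ($headers === false || strpos($headers[0], "404") !== false) {
    ... your code ...
} else {
    ... you get the idea ...
}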
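A slightly fuller sketch of the get_headers approach, assuming allow_url_fopen is enabled. The helper names statusCodeFromHeaders and remoteFileExists are hypothetical, not part of PHP. get_headers returns an array of header lines, and when redirects are followed it contains one status line per hop, so this sketch reads the last status line rather than only the first element.

```php
<?php
// Hypothetical helper: extract the final HTTP status code from the
// list of header lines returned by get_headers().
function statusCodeFromHeaders(array $headers): ?int
{
    $code = null;
    foreach ($headers as $line) {
        // Status lines look like "HTTP/1.1 200 OK". With redirects there is
        // one per hop, so keep overwriting and end up with the last one.
        if (is_string($line) && preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $code = (int) $m[1];
        }
    }
    return $code; // null when no status line was found
}

// Hypothetical helper: true only when the final response status is 200.
function remoteFileExists(string $url): bool
{
    $headers = @get_headers($url); // false when the request fails
    if ($headers === false) {
        return false;
    }
    return statusCodeFromHeaders($headers) === 200;
}
```

A call such as remoteFileExists('http://www.emag.ro/robots.txt') could then replace the file_exists($file) test in the question. Note that get_headers sends a GET request by default, so the server may still start sending the body.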
Answer 2:
Just to second what other people said,
it's best to use cURL in PHP to find out whether http://example.com/robots.txt returns a 404 status code. If it does, the file does not exist. If it returns a 200, it exists.
Be wary of custom 404 pages though; I've never looked into what they return.
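A minimal sketch of the cURL check described above, assuming the curl extension is loaded. The function name httpStatusViaCurl and the timeout and redirect limits are illustrative choices, not something the answer specifies. It sends a HEAD request, so no body is downloaded before the status code is known.

```php
<?php
// Hypothetical helper: return the HTTP status code for $url using a HEAD
// request, or null when the request itself fails (DNS error, refused
// connection, timeout, ...).
function httpStatusViaCurl(string $url, int $timeoutSeconds = 10): ?int
{
    $ch = curl_init($url);
    if ($ch === false) {
        return null;
    }
    curl_setopt_array($ch, [
        CURLOPT_NOBODY         => true, // HEAD request: headers only
        CURLOPT_RETURNTRANSFER => true, // return output instead of echoing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects, e.g. http to https
        CURLOPT_MAXREDIRS      => 5,    // assumed limit
        CURLOPT_TIMEOUT        => $timeoutSeconds,
    ]);
    $result = curl_exec($ch);
    if ($result === false) {
        return null; // transport-level failure, no HTTP status available
    }
    return (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
}
```

For example, httpStatusViaCurl('http://example.com/robots.txt') === 200 would indicate that the file is there. A site that serves its custom error page with status 200 would still be counted as existing; this sketch does not try to detect that case.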
Answer 3:
The http:// wrapper does not support stat() functionality, which file_exists() needs; you will need to check the HTTP response code from e.g. cURL.
From the PHP manual entry for file_exists(): "As of PHP 5.0.0, this function can also be used with some URL wrappers. Refer to Supported Protocols and Wrappers to determine which wrappers support stat() family of functionality."
Source: https://stackoverflow.com/questions/11966187/php-file-exists-for-url-robots-txt-returns-false