Prevent loading from remote source if file is larger than a given size

Submitted by 爱⌒轻易说出口 on 2019-12-06 10:42:59

Question


Let's say I want to load XML files from a remote server only if they are at most 10MB in size.

Something like

$xml_file = "http://example.com/largeXML.xml";// size= 500MB

//PRACTICAL EXAMPLE: $xml_file = "http://www.cs.washington.edu/research/xmldatasets/data/pir/psd7003.xml";// size= 683MB

 /*GOAL: Prevent this large file from being loaded by DOMDocument, without having to load the file first and then check its size*/

$dom =  new DOMDocument();

$dom->load($xml_file /*LOAD only IF the file_size is <= 10MB....else...echo 'File is too large'*/);

How can this be achieved? Any idea, alternative, or best approach would be highly appreciated.

I checked "PHP: Remote file size without downloading file", but when I try something like

var_dump(
    curl_get_file_size(
        "http://www.dailymotion.com/rss/user/dialhainaut/"
    )
);

I get string 'unknown' (length=7)

When I try with get_headers as suggested below, the Content-Length is missing in the headers, so this will not work reliably either.

Please advise how to determine the length and avoid passing the file to DOMDocument if it exceeds 10MB.


Answer 1:


OK, finally working. The headers solution was never going to work broadly, because servers can omit Content-Length. In this solution, we open a file handle and read the XML line by line until the running total reaches the $max_B threshold. If the file is too big, we still pay the overhead of reading it up to the 10MB mark, but it works as expected. If the file is smaller than $max_B, processing proceeds.

$xml_file = "http://www.dailymotion.com/rss/user/dialhainaut/";
//$xml_file = "http://www.cs.washington.edu/research/xmldatasets/data/pir/psd7003.xml";

$fh = fopen($xml_file, "r");  

if($fh){
    $file_string = '';
    $total_B = 0;
    $max_B = 10485760;
    //run through lines of the file, concatenating them into a string
    while (!feof($fh)){
        if(($line = fgets($fh)) !== false){
            $total_B += strlen($line);
            if($total_B < $max_B){
                $file_string .= $line;
            } else {
                break;
            }
        }
    } 

    if($total_B < $max_B){
        echo 'File ok. Total size = '.$total_B.' bytes. Proceeding...';
        //proceed
        $dom = new DOMDocument();
        $dom->loadXML($file_string); //NOTE the method change because we're loading from a string   

    } else {
        //reject
        echo 'File too big! Max size = '.$max_B.' bytes.';  
    }

    fclose($fh);

} else {
    echo 'Could not open the remote file!';
}
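
The same cap can also be enforced without the line loop. What follows is only a minimal sketch, assuming a hypothetical helper named read_capped that is not part of the answer above. It asks stream_get_contents for one byte more than the limit, so a result longer than the limit means the source is too big. The php://memory stream stands in for the remote handle that fopen($xml_file, "r") would return.

```php
<?php
// Hypothetical helper: returns the stream contents, or null if they exceed $maxBytes.
function read_capped($stream, int $maxBytes): ?string
{
    // Ask for one extra byte: if it arrives, the source is larger than the limit.
    $data = stream_get_contents($stream, $maxBytes + 1);
    if ($data === false || strlen($data) > $maxBytes) {
        return null;
    }
    return $data;
}

// Stand-in for a remote handle; with a URL this would be fopen($xml_file, 'r').
$small = fopen('php://memory', 'w+');
fwrite($small, '<root><item/></root>');
rewind($small);

$xml = read_capped($small, 10485760);
fclose($small);

if ($xml !== null) {
    // At this point the string can be handed to $dom->loadXML($xml).
    echo 'File ok. Read ' . strlen($xml) . " bytes.\n";
} else {
    echo "File too big!\n";
}
```

As with the loop above, an oversized file still costs a download of up to the limit, but never more than one byte beyond it.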



Answer 2:


10MB is equal to 10485760 bytes. If Content-Length is not specified, the function falls back to cURL, which downloads the body and reports how many bytes were received. I got this code from somewhere on SO but cannot remember where:

function get_filesize($url) {
    $headers = get_headers($url, 1);
    // Header names are case-insensitive, so check both common spellings
    if (isset($headers['Content-Length'])) return $headers['Content-Length'];
    if (isset($headers['Content-length'])) return $headers['Content-length'];
    // Fallback: download the body with cURL and count the bytes received
    $c = curl_init();
    curl_setopt_array($c, array(
        CURLOPT_URL => $url,
        CURLOPT_RETURNTRANSFER => true,
        // The header must stay on a single line; line breaks inside it would corrupt the request
        CURLOPT_HTTPHEADER => array('User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1.3) Gecko/20090824 Firefox/3.5.3'),
    ));
    curl_exec($c);
    return curl_getinfo($c, CURLINFO_SIZE_DOWNLOAD);
}
$filesize = get_filesize("http://www.dailymotion.com/rss/user/dialhainaut/");
if ($filesize <= 10485760) {
    echo 'Fine';
} else {
    echo $filesize . ' bytes: file is too big';
}
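
One detail worth sketching: when get_headers($url, 1) follows a redirect, a repeated header such as Content-Length comes back as an array rather than a string, and servers differ in how they capitalise the name. The helper below, content_length_from_headers, is a hypothetical name introduced here for illustration, and the sample header arrays are written by hand rather than captured from a real server.

```php
<?php
// Hypothetical helper: extracts a byte count from the array returned by get_headers($url, 1).
// Returns null when no usable Content-Length is present.
function content_length_from_headers(array $headers): ?int
{
    // Header names are case-insensitive, so normalise the keys first.
    $headers = array_change_key_case($headers, CASE_LOWER);
    if (!isset($headers['content-length'])) {
        return null;
    }
    $value = $headers['content-length'];
    // After redirects, repeated headers are reported as an array;
    // the last entry belongs to the final response.
    if (is_array($value)) {
        $value = end($value);
    }
    $value = trim((string) $value);
    return preg_match('/^\d+$/', $value) ? (int) $value : null;
}

// Hand-written sample inputs (assumptions for illustration, not real server output).
$direct     = [0 => 'HTTP/1.1 200 OK', 'Content-Length' => '2048'];
$redirected = [0 => 'HTTP/1.1 301 Moved Permanently', 1 => 'HTTP/1.1 200 OK',
               'content-length' => ['178', '734003200']];
$missing    = [0 => 'HTTP/1.1 200 OK', 'Content-Type' => 'application/rss+xml'];

foreach ([$direct, $redirected, $missing] as $sample) {
    $size = content_length_from_headers($sample);
    echo $size === null ? "size unknown\n" : "$size bytes\n";
}
```

A null result signals that the size has to be established some other way, which is the case the question ran into with the Dailymotion feed.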





Answer 3:


Edit: new answer, a bit of a workaround.
You can't check the length of the DOM elements, but you can make a HEAD request and get the file size for the URL:

<?php

function i_hope_this_works( $XmlUrl ) {
    // Assume failure: -1 means the size could not be determined
    $result = -1;

    $request = curl_init( $XmlUrl );

    // Send a HEAD request, so a 1 GB file costs no more than a 1 KB file
    curl_setopt( $request, CURLOPT_NOBODY, true );
    curl_setopt( $request, CURLOPT_HEADER, true );
    curl_setopt( $request, CURLOPT_RETURNTRANSFER, true );
    curl_setopt( $request, CURLOPT_FOLLOWLOCATION, true );
    // Any User-Agent string will do; some servers reject requests without one
    curl_setopt( $request, CURLOPT_USERAGENT, 'Mozilla/5.0' );

    $requesteddata = curl_exec( $request );
    // Status of the final response, after any redirects were followed
    $status = (int) curl_getinfo( $request, CURLINFO_HTTP_CODE );
    curl_close( $request );

    if( $requesteddata ) {
        $content_length = -1;

        // With redirects the output holds one header block per response; inspect only the last
        $blocks = preg_split( "/\r?\n\r?\n/", trim( $requesteddata ) );
        $last_block = end( $blocks );
        if( preg_match( "/^Content-Length:\s*(\d+)/mi", $last_block, $matches ) ) {
            $content_length = (int) $matches[1];
        }

        // 200 means OK; the 3xx codes are redirects
        if( $status == 200 || ($status > 300 && $status <= 308) ) {
            $result = $content_length;
        }
    }

    return $result;
}
?>

You should now be able to get the file size for any URL with:

$file_size = i_hope_this_works('yourURLasString');
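
When the HEAD response carries no Content-Length, as with the feed in the question, the limit can instead be enforced during the download itself. The following is a rough sketch under stated assumptions, not a tested production routine: make_capped_writer and fetch_capped are hypothetical names introduced here. It relies on the documented behaviour of CURLOPT_WRITEFUNCTION, where a callback that returns anything other than the number of bytes it was handed causes cURL to abort the transfer.

```php
<?php
// Hypothetical helper: builds a cURL write callback that stores the body in $buffer
// and aborts the transfer once more than $maxBytes would have been stored.
function make_capped_writer(int $maxBytes, string &$buffer): callable
{
    return function ($ch, string $chunk) use ($maxBytes, &$buffer): int {
        if (strlen($buffer) + strlen($chunk) > $maxBytes) {
            // A return value different from strlen($chunk) makes cURL abort.
            return -1;
        }
        $buffer .= $chunk;
        return strlen($chunk);
    };
}

// Hypothetical wrapper: returns the body, or null if the download failed or was cut off.
function fetch_capped(string $url, int $maxBytes): ?string
{
    $buffer = '';
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_WRITEFUNCTION, make_capped_writer($maxBytes, $buffer));
    // With a write callback set, curl_exec() returns true on success and false on abort or error.
    $ok = curl_exec($ch);
    return $ok ? $buffer : null;
}

// Usage (not executed here because it needs network access):
// $xml = fetch_capped('yourURLasString', 10485760);
// if ($xml !== null) { $dom = new DOMDocument(); $dom->loadXML($xml); }
```

The callback is kept separate from the cURL setup so that the size logic can be exercised by calling it directly, without a network connection.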


Source: https://stackoverflow.com/questions/36761377/prevent-loading-from-remote-source-if-file-is-larger-than-a-given-size
