Question
I need to get the size of a file larger than 2 GB (I'm testing with a 4.6 GB file). Is there any way to do this without an external program?
Current status:
- filesize(), stat() and fseek() fail
- fread() and feof() work
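It is possible to get the file size by reading the whole file content, but that is extremely slow:
$fp = fopen($file, 'rb');
$size = (float) 0;
$chunksize = 1024 * 1024;
while (!feof($fp)) {
    // count the bytes actually read; the last chunk is usually shorter than $chunksize
    $size += (float) strlen(fread($fp, $chunksize));
}
fclose($fp);
return $size;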
I know how to get it on 64-bit platforms (using fseek($fp, 0, SEEK_END) and ftell()), but I need a solution for 32-bit platforms.
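For reference, the 64-bit approach mentioned above can be sketched as follows. This is a minimal sketch assuming a 64-bit PHP build (PHP_INT_SIZE === 8); the function name filesize_via_seek is hypothetical.

```php
<?php
// Hypothetical helper: returns the size of $path in bytes using fseek()/ftell(),
// or false on failure. Only trustworthy when PHP integers are 64-bit.
function filesize_via_seek(string $path)
{
    if (PHP_INT_SIZE < 8) {
        return false; // ftell() cannot represent offsets past 2 GB on 32-bit builds
    }
    $fp = fopen($path, 'rb');
    if ($fp === false) {
        return false;
    }
    // fseek() returns 0 on success and -1 on failure
    $size = (fseek($fp, 0, SEEK_END) === 0) ? ftell($fp) : false;
    fclose($fp);
    return $size;
}
```

On a 32-bit build this helper deliberately returns false instead of a wrapped value.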
Solution: I've started an open-source project for this: Big File Tools.
Big File Tools is a collection of hacks needed to manipulate files over 2 GB in PHP (even on 32-bit systems).
- answer: https://stackoverflow.com/a/35233556/631369
- github: https://github.com/jkuchar/BigFileTools
Answer 1:
Here's one possible method:
It first attempts to use a platform-appropriate shell command (Windows shell substitution modifiers, or the stat command on *nix/Mac). If that fails, it tries COM (on Windows), and finally falls back to filesize().
/*
* This software may be modified and distributed under the terms
* of the MIT license.
*/
function filesize64($file)
{
static $iswin;
if (!isset($iswin)) {
$iswin = (strtoupper(substr(PHP_OS, 0, 3)) == 'WIN');
}
static $exec_works;
if (!isset($exec_works)) {
$exec_works = (function_exists('exec') && !ini_get('safe_mode') && @exec('echo EXEC') == 'EXEC');
}
// try a shell command
if ($exec_works) {
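if ($iswin) {
    $cmd = "for %F in (\"$file\") do @echo %~zF";
} elseif (PHP_OS == 'Darwin') {
    // BSD stat on macOS uses -f %z instead of GNU's -c%s
    $cmd = "stat -f %z " . escapeshellarg($file);
} else {
    $cmd = "stat -c%s " . escapeshellarg($file);
}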
@exec($cmd, $output);
if (is_array($output) && ctype_digit($size = trim(implode("\n", $output)))) {
return $size;
}
}
// try the Windows COM interface
if ($iswin && class_exists("COM")) {
try {
$fsobj = new COM('Scripting.FileSystemObject');
$f = $fsobj->GetFile( realpath($file) );
$size = (string) $f->Size; // ctype_digit() below expects a string
} catch (Exception $e) {
$size = null;
}
if (ctype_digit($size)) {
return $size;
}
}
// if all else fails
return filesize($file);
}
Answer 2:
I've started a project called Big File Tools. It is proven to work on Linux, Mac and Windows (including 32-bit variants) and provides byte-precise results even for huge files (>4 GB). Internally it uses brick/math, an arbitrary-precision arithmetic library.
Install it using Composer:
composer require jkuchar/bigfiletools
and use it:
<?php
$file = BigFileTools\BigFileTools::createDefault()->getFile(__FILE__);
echo $file->getSize() . " bytes\n";
The result is a BigInteger, so you can compute with it:
$sizeInBytes = $file->getSize();
$sizeInMegabytes = $sizeInBytes->toBigDecimal()->dividedBy(1024*1024, 2, \Brick\Math\RoundingMode::HALF_DOWN);
echo "Size is $sizeInMegabytes megabytes\n";
Big File Tools internally uses drivers to reliably determine the exact file size on all platforms. Here is the list of available drivers (updated 2016-02-05):
| Driver | Time (s) ↓ | Runtime requirements | Platform |
| --- | --- | --- | --- |
| CurlDriver | 0.00045299530029297 | cURL extension | - |
| NativeSeekDriver | 0.00052094459533691 | - | - |
| ComDriver | 0.0031449794769287 | COM+.NET extension | Windows only |
| ExecDriver | 0.042937040328979 | exec() enabled | Windows, Linux, OS X |
| NativeRead | 2.7670161724091 | - | - |
You can use BigFileTools with any of these drivers explicitly; otherwise the fastest available one is chosen by default (BigFileTools::createDefault()):
use BigFileTools\BigFileTools;
use BigFileTools\Driver;
$bigFileTools = new BigFileTools(new Driver\CurlDriver());
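The driver idea is essentially a chain of strategies tried from fastest to slowest. The sketch below shows that pattern in plain PHP; the interface SizeDriver, the classes SeekDriver and ReadDriver, and the function size_with_drivers are all hypothetical and are not Big File Tools' actual API.

```php
<?php
// Hypothetical strategy interface: return the size as a decimal string, or null if unavailable.
interface SizeDriver
{
    public function size(string $path): ?string;
}

// Fast path: seek to the end and ask for the position (exact on 64-bit builds only).
final class SeekDriver implements SizeDriver
{
    public function size(string $path): ?string
    {
        if (PHP_INT_SIZE < 8 || ($fp = @fopen($path, 'rb')) === false) {
            return null;
        }
        $pos = fseek($fp, 0, SEEK_END) === 0 ? ftell($fp) : false;
        fclose($fp);
        return $pos === false ? null : (string) $pos;
    }
}

// Slow fallback: read the whole file and count bytes; the float counter
// stays exact for sizes below 2^53 bytes.
final class ReadDriver implements SizeDriver
{
    public function size(string $path): ?string
    {
        if (($fp = @fopen($path, 'rb')) === false) {
            return null;
        }
        $total = 0.0;
        while (!feof($fp)) {
            $chunk = fread($fp, 1024 * 1024);
            if ($chunk === false) {
                fclose($fp);
                return null;
            }
            $total += strlen($chunk);
        }
        fclose($fp);
        return sprintf('%.0f', $total);
    }
}

// Try each driver in order and return the first answer.
function size_with_drivers(string $path, SizeDriver ...$drivers): ?string
{
    foreach ($drivers as $driver) {
        $size = $driver->size($path);
        if ($size !== null) {
            return $size;
        }
    }
    return null;
}
```

Passing the drivers from fastest to slowest mirrors the default selection described in the answer.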
Answer 3:
<?php
######################################################################
# Human size for files smaller or bigger than 2 GB on 32 bit Systems #
# size.php - 1.1 - 17.01.2012 - Alessandro Marinuzzi - www.alecos.it #
######################################################################
function showsize($file) {
if (strtoupper(substr(PHP_OS, 0, 3)) == 'WIN') {
if (class_exists("COM")) {
$fsobj = new COM('Scripting.FileSystemObject');
$f = $fsobj->GetFile(realpath($file));
$file = (string) $f->Size; // convert the COM variant to a plain numeric string
} else {
$file = trim(exec("for %F in (\"" . $file . "\") do @echo %~zF"));
}
} elseif (PHP_OS == 'Darwin') {
$file = trim(shell_exec("stat -f %z " . escapeshellarg($file)));
} elseif ((PHP_OS == 'Linux') || (PHP_OS == 'FreeBSD') || (PHP_OS == 'Unix') || (PHP_OS == 'SunOS')) {
$file = trim(shell_exec("stat -c%s " . escapeshellarg($file)));
} else {
$file = filesize($file);
}
if ($file < 1024) {
echo $file . ' Bytes';
} elseif ($file < 1048576) {
echo round($file / 1024, 2) . ' KB';
} elseif ($file < 1073741824) {
echo round($file / 1048576, 2) . ' MB';
} elseif ($file < 1099511627776) {
echo round($file / 1073741824, 2) . ' GB';
} elseif ($file < 1125899906842624) {
echo round($file / 1099511627776, 2) . ' TB';
} elseif ($file < 1152921504606846976) {
echo round($file / 1125899906842624, 2) . ' PB';
} elseif ($file < 1180591620717411303424) {
echo round($file / 1152921504606846976, 2) . ' EB';
} elseif ($file < 1208925819614629174706176) {
echo round($file / 1180591620717411303424, 2) . ' ZB';
} else {
echo round($file / 1208925819614629174706176, 2) . ' YB';
}
}
?>
Use it as follows:
<?php include("php/size.php"); ?>
And wherever you need it:
<?php showsize("files/VeryBigFile.rar"); ?>
Improvements are welcome!
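The long if/elseif ladder in showsize() can be replaced by a loop over the unit names. A minimal sketch, assuming the size arrives as a number or numeric string (as returned by the shell commands above); human_size is a hypothetical name, and unlike showsize() it returns the text instead of echoing it.

```php
<?php
// Hypothetical helper: format a byte count (int, float or numeric string) with binary (1024) units.
function human_size($bytes, int $precision = 2): string
{
    $units = ['Bytes', 'KB', 'MB', 'GB', 'TB', 'PB', 'EB', 'ZB', 'YB'];
    $value = (float) $bytes; // floats avoid 32-bit integer overflow
    $i = 0;
    // divide by 1024 until the value fits the current unit (or we run out of units)
    while ($value >= 1024 && $i < count($units) - 1) {
        $value /= 1024;
        $i++;
    }
    return round($value, $precision) . ' ' . $units[$i];
}
```

For example, human_size('3554287616') gives "3.31 GB".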
Answer 4:
I found a nice, slim solution (Linux/Unix only) to get the size of large files with 32-bit PHP:
$file = "/path/to/my/file.tar.gz";
$filesize = exec("stat -c %s " . escapeshellarg($file));
Treat $filesize as a string: casting it to int yields PHP_INT_MAX if the size is larger than PHP_INT_MAX.
Even when it is handled as a string, the following human-readable formatting works:
echo formatBytes($filesize);
function formatBytes($size, $precision = 2) {
    $base = log($size) / log(1024);
    $suffixes = array('', 'k', 'M', 'G', 'T');
    return round(pow(1024, $base - floor($base)), $precision) . $suffixes[(int) floor($base)];
}
So my output for a file larger than 4 GB is:
4.46G
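Answer 5:
$file_size = sprintf("%u", filesize($working_dir . "\\" . $file));
This works for me on a Windows box. Note that it only recovers sizes below 4 GB, because "%u" merely reinterprets a wrapped negative 32-bit result as unsigned.
I was looking through the PHP bug tracker here: https://bugs.php.net/bug.php?id=63618 and found this solution.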
Answer 6:
If the file is available on an FTP server, you could use fsockopen() to ask the server for its size:
$socket = fsockopen($hostName, 21);
$t = fgets($socket, 128);
fwrite($socket, "USER $myLogin\r\n");
$t = fgets($socket, 128);
fwrite($socket, "PASS $myPass\r\n");
$t = fgets($socket, 128);
fwrite($socket, "TYPE I\r\n"); // many servers refuse SIZE in ASCII mode
$t = fgets($socket, 128);
fwrite($socket, "SIZE $fileName\r\n");
$t = fgets($socket, 128);
$fileSize=floatval(str_replace("213 ","",$t));
echo $fileSize;
fwrite($socket, "QUIT\r\n");
fclose($socket);
(Found as a comment on the ftp_size page)
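Because the SIZE reply arrives as text, it helps to check the reply code before trusting the number. A minimal sketch, assuming a single-line reply of the form "213 <size>" (the success reply defined for SIZE in RFC 3659); parse_ftp_size_reply is a hypothetical name. Keeping the digits as a string avoids integer overflow on 32-bit PHP.

```php
<?php
// Hypothetical helper: extract the byte count from an FTP SIZE reply line.
// Returns the size as a decimal string, or null if the server reported an error.
function parse_ftp_size_reply(string $line): ?string
{
    // "213" is the success code for SIZE; anything else (e.g. "550 ...") is a failure
    if (preg_match('/^213 (\d+)\s*$/', $line, $m) !== 1) {
        return null;
    }
    return $m[1];
}
```

In the snippet above, it would replace the str_replace()/floatval() line: $fileSize = parse_ftp_size_reply($t);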
Answer 7:
You may want to add some alternatives to the function you use, such as calling system commands like "dir" / "ls" and parsing the information from their output. These have security implications, of course, so check them carefully, and fall back to the slow method only as a last resort.
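As an illustration of the "ls" route, the sketch below runs ls -ln through exec() with an escaped argument and takes the fifth column, which holds the size in bytes for a regular file in the usual long-listing layout on Linux and macOS. It is a sketch under assumptions: exec() must be enabled, the listing must follow that layout, and size_via_ls is a hypothetical name. The size stays a string, so 32-bit integer limits do not apply.

```php
<?php
// Hypothetical helper: file size via `ls -ln` (Unix-like systems only).
// Returns the size as a decimal string, or null if the command or parsing fails.
function size_via_ls(string $path): ?string
{
    if (!function_exists('exec')) {
        return null;
    }
    $output = [];
    $status = 1;
    // -n prints numeric uid/gid, so the owner/group columns never contain spaces
    exec('ls -ln ' . escapeshellarg($path) . ' 2>/dev/null', $output, $status);
    if ($status !== 0 || !isset($output[0])) {
        return null;
    }
    // columns: mode, links, uid, gid, size, ...
    $fields = preg_split('/\s+/', trim($output[0]));
    return (isset($fields[4]) && ctype_digit($fields[4])) ? $fields[4] : null;
}
```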
Answer 8:
One option would be to seek to the 2 GB mark and then read the rest of the file from there:
function getTrueFileSize($filename) {
$size = filesize($filename);
if ($size === false) {
$fp = fopen($filename, 'r');
if (!$fp) {
return false;
}
$offset = PHP_INT_MAX - 1;
$size = (float) $offset;
if (fseek($fp, $offset) !== 0) { // fseek() returns 0 on success, -1 on failure
fclose($fp);
return false;
}
$chunksize = 8192;
while (!feof($fp)) {
$size += strlen(fread($fp, $chunksize));
}
fclose($fp);
} elseif ($size < 0) {
// Handle overflowed integer...
$size = sprintf("%u", $size);
}
return $size;
}
So basically that seeks to just below the largest positive signed integer representable in PHP (about 2 GB on a 32-bit system), and then reads from there on in 8 KB blocks (which should be a fair trade-off between memory efficiency and disk transfer efficiency).
Also note that I'm not adding $chunksize to $size. The reason is that fread() may return fewer bytes than $chunksize (at the end of the file, for example). So instead, use strlen() to determine the length of the string actually read.
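To see why the byte count should come from strlen() rather than from $chunksize, compare the two counters on a small file. A minimal sketch; count_bytes is a hypothetical helper and the 10000-byte file is only test data.

```php
<?php
// Hypothetical helper: read $path in $chunk-sized pieces and return
// [bytes actually read, bytes assumed by adding $chunk on every iteration].
function count_bytes(string $path, int $chunk): array
{
    $fp = fopen($path, 'rb');
    $actual = 0.0;   // float so the counter does not overflow on 32-bit builds
    $assumed = 0.0;
    while (!feof($fp)) {
        $data = fread($fp, $chunk);
        $actual += strlen($data);  // the last read is usually shorter than $chunk
        $assumed += $chunk;        // the naive approach over-counts here
    }
    fclose($fp);
    return [$actual, $assumed];
}
```

With a 10000-byte file and 4096-byte chunks, the first counter is exactly 10000 while the naive one overshoots.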
Answer 9:
When IEEE 754 doubles are used (as on almost all systems), file sizes below 2^53 bytes (about 9 × 10^15 bytes, roughly 9 PB) fit into a double as exact numbers, and adding such integer values introduces no loss of precision as long as the results stay below that bound.
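The limit is easy to check: integers are exact in an IEEE 754 double up to 2^53 = 9007199254740992, and the first gap appears just above that. A short demonstration, assuming PHP's float is the usual IEEE 754 double:

```php
<?php
$limit = 2.0 ** 53;              // 9007199254740992.0
// Below the limit, adding 1 is exact:
var_dump(($limit - 2.0) + 1.0 === $limit - 1.0); // bool(true)
// At the limit, 2^53 + 1 has no exact representation and rounds back to 2^53:
var_dump($limit + 1.0 === $limit);               // bool(true)
// A realistic large file size stays exact:
var_dump(10082006833.0 + 1.0 === 10082006834.0); // bool(true)
```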
Answer 10:
You can't reliably get the size of a file on a 32-bit system by checking whether filesize() returns a negative number, as some answers suggest. The 32-bit result wraps around every 4 GB: for a file between 4 and 6 GB filesize() reports a positive number, from 6 to 8 GB a negative one, from 8 to 10 GB a positive one again, and so on. It loops, in a manner of speaking.
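The wrap-around can be reproduced on a 64-bit PHP build by keeping only the low 32 bits of a size and reinterpreting them as a signed integer, which is effectively what a 32-bit filesize() result does. A minimal sketch; wrap_int32 is a hypothetical helper and it needs 64-bit integers to run.

```php
<?php
// Hypothetical helper: simulate storing $n in a signed 32-bit integer.
// Requires a 64-bit PHP build (PHP_INT_SIZE === 8).
function wrap_int32(int $n): int
{
    $low = $n & 0xFFFFFFFF;                                 // keep the low 32 bits
    return $low >= 0x80000000 ? $low - 0x100000000 : $low;  // top bit set => negative
}

$gb = 1024 ** 3;
foreach ([1, 3, 5, 7, 9] as $n) {
    printf("%d GB -> %d\n", $n, wrap_int32($n * $gb));
}
```

The loop prints 1073741824 for 1, 5 and 9 GB and -1073741824 for 3 and 7 GB, which is the positive/negative alternation described above.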
So you're stuck using an external command that works reliably on your 32-bit system.
However, you can reliably check whether a file is bigger than a certain size, even for very big files.
The following seeks to 50 MB and tries to read one byte. It is very fast on my low-spec test machine and works reliably even when the size is much greater than 2 GB.
You can use this to check whether a file is greater than 2147483647 bytes (2147483647 is the maximum integer on 32-bit systems) by changing $offset accordingly, and then handle the file differently or have your app issue a warning.
function isTooBig($file){
$fh = @fopen($file, 'r');
if(! $fh){ return false; }
$offset = 50 * 1024 * 1024; //50 megs
$tooBig = false;
if(fseek($fh, $offset, SEEK_SET) === 0){
if(strlen(fread($fh, 1)) === 1){
$tooBig = true;
}
} //Otherwise we couldn't seek there so it must be smaller
fclose($fh);
return $tooBig;
}
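The same seek-and-read-one-byte trick generalizes to any threshold: reading one byte at offset N succeeds only if the file holds more than N bytes. A minimal sketch; is_larger_than is a hypothetical name, and on a 32-bit build $bytes cannot exceed PHP_INT_MAX (2147483647).

```php
<?php
// Hypothetical helper: true if $file is strictly larger than $bytes bytes.
function is_larger_than(string $file, int $bytes): bool
{
    $fh = @fopen($file, 'rb');
    if ($fh === false) {
        return false;
    }
    $larger = false;
    // Seeking past EOF can still succeed on plain files, so the one-byte read is the real test.
    if (fseek($fh, $bytes, SEEK_SET) === 0) {
        $byte = fread($fh, 1);
        $larger = ($byte !== false && strlen($byte) === 1);
    }
    fclose($fh);
    return $larger;
}
```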
Answer 11:
The code below works for any file size on any PHP version, OS, web server or platform, as long as the file can be requested over HTTP from the same machine.
// http head request to local file to get file size
$opts = array('http'=>array('method'=>'HEAD'));
$context = stream_context_create($opts);
// change the URL below to the URL of your file. DO NOT change it to a file path.
// you MUST use a http:// URL for your file for a http request to work
// SECURITY - you must add a .htaccess rule which denies all requests for this database file except those coming from local ip 127.0.0.1.
// $tmp will be empty, since it's a HEAD request: no data is actually downloaded, we only want the file size
$tmp = file_get_contents('http://127.0.0.1/pages-articles.xml.bz2', false, $context);
$size = null;
foreach ($http_response_header as $rcd) {
if (stripos(trim($rcd), "Content-Length:") === 0) {
$size = floatval(trim(str_ireplace("Content-Length:", "", $rcd)));
}
}
echo "File size = $size bytes";
// example output:
// File size = 10082006833 bytes
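The header loop can be made a little more defensive: keep the number as a string, take the last match (after a redirect, the header array holds the headers of every response in turn), and return null when no Content-Length is present. A minimal sketch; content_length_from_headers is a hypothetical helper that takes the array PHP places in $http_response_header.

```php
<?php
// Hypothetical helper: extract Content-Length from an array of raw header lines
// (the format of $http_response_header). With redirects, the last match wins.
function content_length_from_headers(array $headers): ?string
{
    $length = null;
    foreach ($headers as $line) {
        if (preg_match('/^Content-Length:\s*(\d+)\s*$/i', trim($line), $m) === 1) {
            $length = $m[1]; // kept as a string: no 32-bit overflow
        }
    }
    return $length;
}
```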
Answer 12:
I iterated on the BigFileTools class/answer:
- an option to disable the cURL method, because some platforms (Synology NAS, for example) don't support the file:// protocol in cURL
- an extra non-POSIX but more accurate implementation of sizeExec: by using stat instead of du, it returns the actual file size instead of the size on disk
- correct size results for big files (>4 GB) in sizeNativeSeek, which is still almost as fast
- an option to print debug messages
<?php
/**
* Class for manipulating files bigger than 2GB
* (currently supports only getting filesize)
*
* @author Honza Kuchař
* @license New BSD
* @encoding UTF-8
* @copyright Copyright (c) 2013, Jan Kuchař
*/
class BigFileTools {
/**
* Absolute file path
* @var string
*/
protected $path;
/**
* Use in BigFileTools::$mathLib if you want to use BCMath for mathematical operations
*/
const MATH_BCMATH = "BCMath";
/**
* Use in BigFileTools::$mathLib if you want to use GMP for mathematical operations
*/
const MATH_GMP = "GMP";
/**
* Which mathematical library to use for mathematical operations
* @var string (one of the constants BigFileTools::MATH_*)
*/
public static $mathLib;
/**
* If none of the fast modes is available to compute the file size, BigFileTools falls back to a very slow
* method: reading the file from byte 0 to the end. If you want to enable this behavior,
* switch fastMode to false (default is true)
* @var bool
*/
public static $fastMode = true;
//on some platforms like Synology NAS DS214+ DSM 5.1 FTP Protocol for curl is not working or disabled
// you will get an error like "Protocol file not supported or disabled in libcurl"
public static $FTPProtocolCurlEnabled = false;
public static $debug=false; //shows some debug/error messages
public static $posix=true; //more portable, but it reports the size on disk rather than the actual file size, so it can be off by up to one cluster
/**
* Initialization of class
* Do not call directly.
*/
static function init() {
if (function_exists("bcadd")) {
self::$mathLib = self::MATH_BCMATH;
} elseif (function_exists("gmp_add")) {
self::$mathLib = self::MATH_GMP;
} else {
throw new BigFileToolsException("You have to install BCMath or GMP. These mathematical libraries are used for size computation.");
}
}
/**
* Create BigFileTools from $path
* @param string $path
* @return BigFileTools
*/
static function fromPath($path) {
return new self($path);
}
static function debug($msg) {
if (self::$debug) echo $msg . PHP_EOL;
}
/**
* Gets basename of file (example: for file.txt will return "file.txt")
* @return string
*/
public function getBaseName() {
return pathinfo($this->path, PATHINFO_BASENAME);
}
/**
* Gets extension of file (example: for file.txt will return "txt")
* @return string
*/
public function getExtension() {
return pathinfo($this->path, PATHINFO_EXTENSION);
}
/**
* Gets filename of file without extension (example: for file.txt will return "file")
* @return string
*/
public function getFilename() {
return pathinfo($this->path, PATHINFO_FILENAME);
}
/**
* Gets the directory part of the path (example: for /home/test/file.txt will return /home/test)
* The returned directory is absolute, because the stored path is absolute.
* @return string
*/
public function getDirname() {
return pathinfo($this->path, PATHINFO_DIRNAME);
}
/**
* Gets md5 checksum of file content
* @return string
*/
public function getMd5() {
return md5_file($this->path);
}
/**
* Gets sha1 checksum of file content
* @return string
*/
public function getSha1() {
return sha1_file($this->path);
}
/**
* Constructor - do not call directly
* @param string $path
*/
function __construct($path, $absolutizePath = true) {
if (!static::isReadableFile($path)) {
throw new BigFileToolsException("File not found at $path");
}
if($absolutizePath) {
$this->setPath($path);
}else{
$this->setAbsolutePath($path);
}
}
/**
* Tries to absolutize path and then updates instance state
* @param string $path
*/
function setPath($path) {
$this->setAbsolutePath(static::absolutizePath($path));
}
/**
* Sets absolute path
* @param string $path
*/
function setAbsolutePath($path) {
$this->path = $path;
}
/**
* Gets current filepath
* @return string
*/
function getPath($a = "") {
if($a != "") {
trigger_error("getPath with absolutizing argument is deprecated!", E_USER_DEPRECATED);
}
return $this->path;
}
/**
* Converts relative path to absolute
*/
static function absolutizePath($path) {
$path = realpath($path);
if(!$path) {
// TODO: use hack like http://stackoverflow.com/questions/4049856/replace-phps-realpath or http://www.php.net/manual/en/function.realpath.php#84012
// probably as an optional feature that can be turned on when you know what you are doing
throw new BigFileToolsException("Not possible to resolve absolute path.");
}
return $path;
}
static function isReadableFile($file) {
// Do not use is_file
// @link https://bugs.php.net/bug.php?id=27792
// $readable = is_readable($file); // does not always return correct value for directories
$fp = @fopen($file, "r"); // must be file and must be readable
if($fp) {
fclose($fp);
return true;
}
return false;
}
/**
* Moves file to new location / rename
* @param string $dest
*/
function move($dest) {
if (move_uploaded_file($this->path, $dest)) {
$this->setPath($dest);
return TRUE;
} else {
@unlink($dest); // needed in PHP < 5.3 & Windows; intentionally @
if (rename($this->path, $dest)) {
$this->setPath($dest);
return TRUE;
} else {
if (copy($this->path, $dest)) {
unlink($this->path); // delete file
$this->setPath($dest);
return TRUE;
}
return FALSE;
}
}
}
/**
* Changes path of this file object
* @param string $dest
*/
function relocate($dest) {
trigger_error("Relocate is deprecated!", E_USER_DEPRECATED);
$this->setPath($dest);
}
/**
* Size of file
*
* Profiling results:
* sizeCurl 0.00045299530029297
* sizeNativeSeek 0.00052094459533691
* sizeCom 0.0031449794769287
* sizeExec 0.042937040328979
* sizeNativeRead 2.7670161724091
*
* @return string | float
* @throws BigFileToolsException
*/
public function getSize($float = false) {
if ($float == true) {
return (float) $this->getSize(false);
}
$return = $this->sizeCurl();
if ($return) {
$this->debug("sizeCurl succeeded");
return $return;
}
$this->debug("sizeCurl failed");
$return = $this->sizeNativeSeek();
if ($return) {
$this->debug("sizeNativeSeek succeeded");
return $return;
}
$this->debug("sizeNativeSeek failed");
$return = $this->sizeCom();
if ($return) {
$this->debug("sizeCom succeeded");
return $return;
}
$this->debug("sizeCom failed");
$return = $this->sizeExec();
if ($return) {
$this->debug("sizeExec succeeded");
return $return;
}
$this->debug("sizeExec failed");
if (!self::$fastMode) {
$return = $this->sizeNativeRead();
if ($return) {
$this->debug("sizeNativeRead succeeded");
return $return;
}
$this->debug("sizeNativeRead failed");
}
throw new BigFileToolsException("Cannot determine the size of file $this->path!");
}
// <editor-fold defaultstate="collapsed" desc="size* implementations">
/**
* Returns file size by using native fseek function
* @see http://www.php.net/manual/en/function.filesize.php#79023
* @see http://www.php.net/manual/en/function.filesize.php#102135
* @return string | bool (false when fail)
*/
protected function sizeNativeSeek() {
$fp = fopen($this->path, "rb");
if (!$fp) {
return false;
}
flock($fp, LOCK_SH);
$result= fseek($fp, 0, SEEK_END);
if ($result===0) {
if (PHP_INT_SIZE < 8) {
// 32bit
$return = 0.0;
$step = 0x7FFFFFFF;
while ($step > 0) {
if (0 === fseek($fp, - $step, SEEK_CUR)) {
$return += floatval($step);
} else {
$step >>= 1;
}
}
}
else { //64bit
$return = ftell($fp);
}
}
else $return = false;
flock($fp, LOCK_UN);
fclose($fp);
return $return;
}
/**
* Returns file size by using native fread function
* @see http://stackoverflow.com/questions/5501451/php-x86-how-to-get-filesize-of-2gb-file-without-external-program/5504829#5504829
* @return string | bool (false when fail)
*/
protected function sizeNativeRead() {
$fp = fopen($this->path, "rb");
if (!$fp) {
return false;
}
flock($fp, LOCK_SH);
rewind($fp);
$offset = PHP_INT_MAX - 1;
$size = (string) $offset;
if (fseek($fp, $offset) !== 0) {
flock($fp, LOCK_UN);
fclose($fp);
return false;
}
$chunksize = 1024 * 1024;
while (!feof($fp)) {
$read = strlen(fread($fp, $chunksize));
if (self::$mathLib == self::MATH_BCMATH) {
$size = bcadd($size, $read);
} elseif (self::$mathLib == self::MATH_GMP) {
$size = gmp_add($size, $read);
} else {
throw new BigFileToolsException("No mathematical library available");
}
}
if (self::$mathLib == self::MATH_GMP) {
$size = gmp_strval($size);
}
flock($fp, LOCK_UN);
fclose($fp);
return $size;
}
/**
* Returns file size using curl module
* @see http://www.php.net/manual/en/function.filesize.php#100434
* @return string | bool (false when fail or cUrl module not available)
*/
protected function sizeCurl() {
// curl solution - cross platform and really cool :)
if (self::$FTPProtocolCurlEnabled && function_exists("curl_init")) {
$ch = curl_init("file://" . $this->path);
curl_setopt($ch, CURLOPT_NOBODY, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HEADER, true);
$data = curl_exec($ch);
if ($data=="" || empty($data)) $this->debug(stripslashes(curl_error($ch)));
curl_close($ch);
if ($data !== false && preg_match('/Content-Length: (\d+)/', $data, $matches)) {
return (string) $matches[1];
}
} else {
return false;
}
}
/**
* Returns file size by using external program (exec needed)
* @see http://stackoverflow.com/questions/5501451/php-x86-how-to-get-filesize-of-2gb-file-without-external-program/5502328#5502328
* @return string | bool (false when fail or exec is disabled)
*/
protected function sizeExec() {
// filesize using exec
if (!function_exists("exec")) {
return false;
}
$escapedPath = escapeshellarg($this->path);
if (strtoupper(substr(PHP_OS, 0, 3)) == 'WIN') { // Windows
// Try using the NT substitution modifier %~z
$size = trim(exec("for %F in ($escapedPath) do @echo %~zF"));
} elseif (self::$posix) { // other OS, portable
// POSIX du -k reports the size on disk in 1024-byte blocks
$size = (float) trim(exec("du -k $escapedPath | cut -f1")) * 1024; //make it bytes
} else { // other OS, GNU stat: exact file size in bytes
$size = trim(exec('stat ' . $escapedPath . ' | grep -i -o -E "Size: ([0-9]+)" | cut -d" " -f2'));
}
if (self::$debug) var_dump($size);
return $size;
}
/**
* Returns file size by using Windows COM interface
* @see http://stackoverflow.com/questions/5501451/php-x86-how-to-get-filesize-of-2gb-file-without-external-program/5502328#5502328
* @return string | bool (false when fail or COM not available)
*/
protected function sizeCom() {
if (!class_exists("COM")) {
return false;
}
try {
// Use the Windows COM interface
$fsobj = new COM('Scripting.FileSystemObject');
if (dirname($this->path) == '.')
$this->path = ((substr(getcwd(), -1) == DIRECTORY_SEPARATOR) ? getcwd() . basename($this->path) : getcwd() . DIRECTORY_SEPARATOR . basename($this->path));
$f = $fsobj->GetFile($this->path);
return (string) $f->Size;
} catch (Exception $e) {
// COM call failed: return false so getSize() tries the next method
return false;
}
}
// </editor-fold>
}
BigFileTools::init();
class BigFileToolsException extends Exception{}
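The 32-bit branch of sizeNativeSeek() works by seeking to the end and then walking back to offset 0 in the largest steps fseek() accepts, halving the step whenever a seek would land before the start of the file; the sum of the successful steps is the size. A standalone sketch of that loop, assuming a plain local file; size_by_seeking_back and its $maxStep parameter are hypothetical. The total is a float so it cannot overflow on 32-bit builds.

```php
<?php
// Hypothetical standalone version of the walk-back loop in sizeNativeSeek().
function size_by_seeking_back(string $path, int $maxStep = 0x7FFFFFFF)
{
    $fp = fopen($path, 'rb');
    if ($fp === false) {
        return false;
    }
    if (fseek($fp, 0, SEEK_END) !== 0) {
        fclose($fp);
        return false;
    }
    $total = 0.0;
    $step = $maxStep;
    while ($step > 0) {
        // A seek before offset 0 fails and leaves the position unchanged.
        if (fseek($fp, -$step, SEEK_CUR) === 0) {
            $total += $step;   // moved back $step bytes
        } else {
            $step >>= 1;       // too far: try half the distance
        }
    }
    fclose($fp);
    return $total;
}
```

Each step size is retried until it fails, so the number of seeks grows only with the number of bits in the size, not with the size itself.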
Answer 13:
Well, the easiest way to do that would be to add 2^32 to a negative result (on x86 platforms, where a long is 32 bits):
if($size < 0) $size = pow(2,32) + $size;
Example: Big_File.exe is about 3.3 GB (3,554,287,616 bytes). Your function returns -740679680, so you add 2^32 (4294967296) and get 3554287616.
You get a negative number because a 32-bit signed integer reserves the top bit for the sign, which leaves 2^31 (2,147,483,648 = 2 GB) as the limit in either direction: the largest positive value is 2^31 - 1 and the smallest negative value is -2^31. When a value passes the maximum, it doesn't stop; the carry flips the sign bit and the number becomes negative. In simpler words, when you exceed the maximum positive number you wrap around to the most negative number, so 2147483647 + 1 = -2147483648. Further additions move back towards zero and then towards the maximum again.
As you can see, it is like a circle, with the highest and lowest numbers closing the loop.
A 32-bit integer can hold 2^32 = 4294967296 (4 GB) distinct values, so as long as the real size is below that, this simple trick always works. For larger numbers you must know how many times the value has passed the wrap-around point, multiply that count by 2^32, and add it to the corrected (non-negative) result:
$size = pow(2,32) * $loops_count + $size;
Of course, this is quite hard to do with the basic PHP functions, because no function tells you how many times the value has wrapped around, so this won't work for files over 4 GB.
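Written out, the correction is: take the low 32 bits as an unsigned value (add 2^32 when the result is negative), then add 2^32 once for every full wrap. A minimal sketch; reconstruct_size is a hypothetical helper, and $loops has to come from somewhere else, which plain PHP cannot tell you.

```php
<?php
// Hypothetical helper: rebuild a file size from a wrapped signed 32-bit value
// and the number of complete 4 GB wraps ($loops), which must be known separately.
function reconstruct_size(int $wrapped, int $loops = 0): float
{
    $twoPow32 = 4294967296.0;                       // 2^32 as a float (safe on 32-bit builds)
    $unsigned = $wrapped < 0 ? $wrapped + $twoPow32 : (float) $wrapped;
    return $loops * $twoPow32 + $unsigned;
}
```

For example, reconstruct_size(-740679680) gives 3554287616, matching the Big_File.exe example, and reconstruct_size(-1073741824, 1) gives 7516192768 (7 GB).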
Answer 14:
I wrote a function that returns the exact file size and is quite fast:
function file_get_size($file) {
//open file in binary mode
$fh = fopen($file, "rb");
if (!$fh) {
return false;
}
//declare some variables
$size = "0";
$char = "";
//set file pointer to 0; I'm a little bit paranoid, you can remove this
fseek($fh, 0, SEEK_SET);
//count of successful 1 MB jumps
$count = 0;
while (true) {
//jump 1 MB forward in file
fseek($fh, 1048576, SEEK_CUR);
//check if we actually left the file
if (($char = fgetc($fh)) !== false) {
//if not, go on
$count ++;
} else {
//else jump back where we were before leaving and exit loop
fseek($fh, -1048576, SEEK_CUR);
break;
}
}
//we made $count successful jumps, each advancing 1048577 bytes (1 MB plus the byte read by fgetc),
//so the file is at least $count * 1048577 bytes large
$size = bcmul("1048577", $count);
//now count the last few bytes; they're always less than 1048576 so it's quite fast
$fine = 0;
while(false !== ($char = fgetc($fh))) {
$fine ++;
}
//and add them
$size = bcadd($size, $fine);
fclose($fh);
return $size;
}
Source: https://stackoverflow.com/questions/5501451/php-x86-how-to-get-filesize-of-2-gb-file-without-external-program