Question
I created a file transfer program that uploads files (huge files, about 4 GB) using HTML5 chunking. Each chunk is 100 MB (I chose this size for no particular reason; when I tried 10 MB, it made no real difference as far as I can tell).
Each chunk uploads correctly. But once the upload finishes, I try to merge the chunks back into one file, and that takes a very long time. If I refresh the uploader's web UI, it does not respond until the merge has finished.
My merge code looks something like this:
$final_file_path = fopen($target_path.$file_name, "ab");
// Reconstructed file
for ($i = 0; $i <= $file_total_chunk; $i++) {
    $file_chunk = $target_path.$file_name.$i;
    if ( $final_file_path ) {
        // Read binary input stream and append it to temp file
        $in = fopen($file_chunk, "rb");
        if ( $in ) {
            //while ( $buff = fread( $in, 1048576 ) ) {
            while ( $buff = fread( $in, 104857600 ) ) {
                fwrite($final_file_path, $buff);
            }
        }
        if(fclose($in)) {
            unlink($file_chunk);
        }
    }
}
fclose($final_file_path);
Is there any way to do this efficiently and fast? I'm using PHP.
Thank you
Answer 1:
You probably should think about splitting the upload and the concatenation process into two separate processes. The uploading and informing the user that the file has been uploaded (via the web page) can be done together and the backend processing should probably be done in a completely separate process.
I'd look at setting up a job queue to handle the concatenation process: the PHP upload script, once completed, puts a job in the queue, and a daemon running on the server spawns a worker to do the concatenation.
Personally, I'd have the worker do the concatenation using cat.
$> cat chunk_1 chunk_2 ... chunk_n > uploaded_file_name
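If you still wanted to do this from PHP, you could do something like:
$files = array();
for ($i = 0; $i <= $file_total_chunk; $i++) {
    // escapeshellarg() quotes each path so the shell cannot interpret it
    $files[] = escapeshellarg($target_path.$file_name.$i);
}
// Here $final_file_path is the destination path (a string), not the file handle from the question
$catCmd = "cat " . implode(" ", $files) . " > " . escapeshellarg($final_file_path);
exec($catCmd);
Make sure you've sanitized your filenames (for example with escapeshellarg()); otherwise it will be possible to inject arbitrary commands that get executed on the command line here.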
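If shelling out is not an option (for example, exec is disabled on the host), the merge can also stay in PHP without reading 100 MB at a time into a string. The following is a minimal sketch under stated assumptions, not a definitive implementation: mergeChunks is a hypothetical helper name, and it assumes the chunk naming from the question (the final path with the chunk index appended, starting at 0). It relies on stream_copy_to_stream, which copies between two open handles in small internal buffers.

```php
<?php
// Hypothetical helper (not from the original post): merges <base>0 .. <base>N into <base>.
function mergeChunks(string $basePath, int $lastChunkIndex): int
{
    // "wb" truncates any partial result left over from an earlier failed attempt.
    $out = fopen($basePath, 'wb');
    if ($out === false) {
        throw new RuntimeException("Cannot open target file: $basePath");
    }

    $totalBytes = 0;
    $chunkPaths = [];
    for ($i = 0; $i <= $lastChunkIndex; $i++) {
        $chunkPath = $basePath . $i;
        $in = fopen($chunkPath, 'rb');
        if ($in === false) {
            fclose($out);
            throw new RuntimeException("Cannot open chunk: $chunkPath");
        }
        // Streams the chunk into the target without building a large PHP string.
        $copied = stream_copy_to_stream($in, $out);
        fclose($in);
        if ($copied === false) {
            fclose($out);
            throw new RuntimeException("Copy failed for chunk: $chunkPath");
        }
        $totalBytes += $copied;
        $chunkPaths[] = $chunkPath;
    }
    fclose($out);

    // Remove the chunks only after every one of them has been merged.
    foreach ($chunkPaths as $chunkPath) {
        unlink($chunkPath);
    }
    return $totalBytes;
}
```

Opening the target with "wb" instead of "ab", and deleting chunks only at the end, are design choices made here so that a failed merge can simply be retried. Whether this is faster than cat depends on the disk; either way, the merge still belongs outside the upload request.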
Answer 2:
If you don't want to wait while PHP's exec function runs, you can use a Gearman work queue with asynchronous responses from the workers. Inside the worker you can use @hafichuk's solution. A queue also makes your whole application more scalable.
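As a rough sketch of how the upload script and a worker might communicate, the snippet below encodes a merge job as JSON and validates it on the worker side. The job name merge_chunks, the payload keys, and all function names are assumptions made for illustration. The Gearman calls (doBackground, addFunction, work) come from the PECL gearman extension and need a running gearmand server, so the worker loop is shown only as comments.

```php
<?php
// All names here (merge_chunks, base_path, last_chunk) are illustrative assumptions.

// Upload side: describe the merge as a small JSON payload.
function encodeMergeJob(string $basePath, int $lastChunkIndex): string
{
    return json_encode(['base_path' => $basePath, 'last_chunk' => $lastChunkIndex]);
}

// Worker side: decode and validate the payload before touching the filesystem.
function decodeMergeJob(string $workload): array
{
    $job = json_decode($workload, true);
    if (!is_array($job)
        || !isset($job['base_path'], $job['last_chunk'])
        || !is_string($job['base_path'])
        || !is_int($job['last_chunk'])
        || $job['last_chunk'] < 0) {
        throw new InvalidArgumentException('Malformed merge job payload');
    }
    return $job;
}

// Upload side; requires the PECL gearman extension and a running gearmand.
// doBackground() returns a job handle immediately instead of waiting for the result.
function enqueueMerge(GearmanClient $client, string $basePath, int $lastChunkIndex): string
{
    return $client->doBackground('merge_chunks', encodeMergeJob($basePath, $lastChunkIndex));
}

// Worker side (run as a separate long-lived CLI process):
// $worker = new GearmanWorker();
// $worker->addServer();
// $worker->addFunction('merge_chunks', function (GearmanJob $job) {
//     $task = decodeMergeJob($job->workload());
//     // ... concatenate the chunks here, e.g. with cat as in @hafichuk's answer ...
// });
// while ($worker->work());
```

Because the upload request only enqueues the job, the web UI can respond right away and poll for completion separately.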
Source: https://stackoverflow.com/questions/14205445/how-to-merge-chunks-of-file-result-of-html5-chunking-into-one-file-fast-and-ef