I search for a good way to copy a file (binary or text). I\'ve written several samples, everyone works. But I want hear the opinion of seasoned programmers.
I missin
For those who like boost:
boost::filesystem::path mySourcePath("foo.bar");
boost::filesystem::path myTargetPath("bar.foo");
// Variant 1: Overwrite existing
boost::filesystem::copy_file(mySourcePath, myTargetPath, boost::filesystem::copy_option::overwrite_if_exists);
// Variant 2: Fail if exists
boost::filesystem::copy_file(mySourcePath, myTargetPath, boost::filesystem::copy_option::fail_if_exists);
Note that boost::filesystem::path is also available as wpath for Unicode. And that you could also use
using namespace boost::filesystem
if you do not like those long type names
I'm not quite sure what a "good way" of copying a file is, but assuming "good" means "fast", I could broaden the subject a little.
Current operating systems have long been optimized to deal with run of the mill file copy. No clever bit of code will beat that. It is possible that some variant of your copy techniques will prove faster in some test scenario, but they most likely would fare worse in other cases.
Typically, the sendfile
function probably returns before the write has been committed, thus giving the impression of being faster than the rest. I haven't read the code, but it is most certainly because it allocates its own dedicated buffer, trading memory for time. And the reason why it won't work for files bigger than 2Gb.
As long as you're dealing with a small number of files, everything occurs inside various buffers (the C++ runtime's first if you use iostream
, the OS internal ones, apparently a file-sized extra buffer in the case of sendfile
). Actual storage media is only accessed once enough data has been moved around to be worth the trouble of spinning a hard disk.
I suppose you could slightly improve performances in specific cases. Off the top of my head:
copy_file
sequentially (though you'll hardly notice the difference as long as the file fits in the OS cache)But all that is outside the scope of a general purpose file copy function.
So in my arguably seasoned programmer's opinion, a C++ file copy should just use the C++17 file_copy
dedicated function, unless more is known about the context where the file copy occurs and some clever strategies can be devised to outsmart the OS.
I want to make the very important note that the LINUX method using sendfile() has a major problem in that it can not copy files more than 2GB in size! I had implemented it following this question and was hitting problems because I was using it to copy HDF5 files that were many GB in size.
http://man7.org/linux/man-pages/man2/sendfile.2.html
sendfile() will transfer at most 0x7ffff000 (2,147,479,552) bytes, returning the number of bytes actually transferred. (This is true on both 32-bit and 64-bit systems.)
Copy a file in a sane way:
#include <fstream>
int main()
{
std::ifstream src("from.ogv", std::ios::binary);
std::ofstream dst("to.ogv", std::ios::binary);
dst << src.rdbuf();
}
This is so simple and intuitive to read it is worth the extra cost. If we were doing it a lot, better to fall back on OS calls to the file system. I am sure boost
has a copy file method in its filesystem class.
There is a C method for interacting with the file system:
#include <copyfile.h>
int
copyfile(const char *from, const char *to, copyfile_state_t state, copyfile_flags_t flags);
Qt has a method for copying files:
#include <QFile>
QFile::copy("originalFile.example","copiedFile.example");
Note that to use this you have to install Qt (instructions here) and include it in your project (if you're using Windows and you're not an administrator, you can download Qt here instead). Also see this answer.
Too many!
The "ANSI C" way buffer is redundant, since a FILE
is already buffered. (The size of this internal buffer is what BUFSIZ
actually defines.)
The "OWN-BUFFER-C++-WAY" will be slow as it goes through fstream
, which does a lot of virtual dispatching, and again maintains internal buffers or each stream object. (The "COPY-ALGORITHM-C++-WAY" does not suffer this, as the streambuf_iterator
class bypasses the stream layer.)
I prefer the "COPY-ALGORITHM-C++-WAY", but without constructing an fstream
, just create bare std::filebuf
instances when no actual formatting is needed.
For raw performance, you can't beat POSIX file descriptors. It's ugly but portable and fast on any platform.
The Linux way appears to be incredibly fast — perhaps the OS let the function return before I/O was finished? In any case, that's not portable enough for many applications.
EDIT: Ah, "native Linux" may be improving performance by interleaving reads and writes with asynchronous I/O. Letting commands pile up can help the disk driver decide when is best to seek. You might try Boost Asio or pthreads for comparison. As for "can't beat POSIX file descriptors"… well that's true if you're doing anything with the data, not just blindly copying.