Tried and true simple file copying code in C?

醉梦人生 2020-12-03 09:09

This looks like a simple question, but I didn't find anything similar here.

Since there is no file copy function in C, we have to implement file copying ourselves,

7 answers
  • Depending on what you mean by copying a file, it is certainly far from trivial. If you mean copying the content only, then there is almost nothing to do. But generally, you need to copy the metadata of the file, and that's surely platform dependent. I don't know of any C library which does what you want in a portable manner. Just handling the filename by itself is no trivial matter if you care about portability.

    In C++, there is the Boost.Filesystem library.
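
    To make the point about metadata concrete, here is a minimal POSIX-only sketch that copies just the permission bits and the access/modification times between two open descriptors. The name copy_basic_metadata is hypothetical, and ownership, ACLs and extended attributes are deliberately left out; it assumes a system that provides fstat, fchmod and futimens.

```c
#define _POSIX_C_SOURCE 200809L  /* ask for POSIX.1-2008 declarations */
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copy the mode bits and the access/modification
   times from the open descriptor fdin to the open descriptor fdout.
   Returns 0 on success, -1 with errno set on failure. */
int copy_basic_metadata(int fdin, int fdout) {
    struct stat st;
    struct timespec times[2];

    if (fstat(fdin, &st) == -1)
        return -1;
    /* Keep only the permission, set-id and sticky bits. */
    if (fchmod(fdout, st.st_mode & 07777) == -1)
        return -1;
    times[0] = st.st_atim;  /* access time */
    times[1] = st.st_mtim;  /* modification time */
    return futimens(fdout, times);
}
```

    None of this is available in ISO C itself, which is why a portable answer is hard to give.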

  • 2020-12-03 09:14

    Here is a very easy and clear example: Copy a file. Since it is written in ANSI-C without any particular function calls I think this one would be pretty much portable.

  • 2020-12-03 09:14

    The accepted answer, written by Steve Jessop, does not address the first part of the question. Jonathan Leffler's answer does, but gets it wrong. The code should be written as:

    while ((n = fread(buffer, 1, sizeof(buffer), f1)) > 0) {
        if (fwrite(buffer, n, 1, f2) != 1) {
            /* write error: handle it here, then stop copying */
            break;
        }
    }
    if (ferror(f1)) {
        /* read error: handle it here */
    }
    

    Explanation:

    1. sizeof(char) is 1 by definition, always. It does not matter how many bits a char has: 8 (in most cases), 9, 11 or 32 (on some DSPs, for example); the size of char is one. Writing sizeof(char) is not an error, just redundant code.
    2. The fwrite function writes up to nmemb (third argument) elements of the specified size (second argument); it is not required to write exactly nmemb elements. To fix this, you must either write the rest of the data that was read, or write a single element of size n and let fwrite do all the work. (Whether fwrite must write all the data is open to debate, but in my version a short write cannot go unnoticed: anything less than the full element is reported as a failure.)
    3. You should test for read errors too: just test ferror(f1) after the loop.

    Note that you will probably want to disable buffering on both the input and output files to avoid triple buffering: once in f1's buffer on read, once in our own buffer, and once in f2's buffer on write:

    setvbuf(f1, NULL, _IONBF, 0);
    setvbuf(f2, NULL, _IONBF, 0);
    

    (Internal buffers should, probably, be of size BUFSIZ.)
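
    Wrapped into complete functions, the loop above might look like the following. This is only a sketch: the names copy_stream and copy_file are hypothetical, the buffer size BUFSIZ is an arbitrary choice, and the setvbuf calls are left out for brevity.

```c
#include <stdio.h>

/* Hypothetical helper: copy everything from `in` to `out`.
   Returns 0 on success, -1 on a read or write error. */
int copy_stream(FILE *in, FILE *out) {
    char buffer[BUFSIZ];
    size_t n;

    while ((n = fread(buffer, 1, sizeof buffer, in)) > 0) {
        /* One element of size n: fwrite returns 1 only if all n bytes
           were accepted. */
        if (fwrite(buffer, n, 1, out) != 1)
            return -1;
    }
    if (ferror(in))
        return -1;
    /* Flush so that a deferred write error is reported here. */
    return fflush(out) == EOF ? -1 : 0;
}

/* Hypothetical helper: copy the file named src to the file named dst,
   overwriting dst if it exists. Returns 0 on success, -1 on error. */
int copy_file(const char *src, const char *dst) {
    FILE *in = fopen(src, "rb");
    if (in == NULL)
        return -1;
    FILE *out = fopen(dst, "wb");
    if (out == NULL) {
        fclose(in);
        return -1;
    }
    int rc = copy_stream(in, out);
    fclose(in);
    if (fclose(out) == EOF)
        rc = -1;
    return rc;
}
```

    Checking the result of fclose on the output matters because buffered data may only reach the file at that point.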

  • 2020-12-03 09:16

    One thing I found when implementing my own file copy, and it seems obvious but it's not: I/O operations are slow. You can pretty much predict your copy's speed from how many of them you do. So clearly you need to do as few of them as possible.

    The best results I found were when I got myself a ginormous buffer, read the entire source file into it in one I/O, then wrote the entire buffer back out in one I/O. If I had to do it in even 10 batches, it got way slow. Trying to read and write each byte individually, like a naive coder might try first, was just painful.
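
    A rough sketch of that whole-file approach, using only stdio, is shown below. The name copy_file_one_shot is hypothetical. It assumes the file fits in memory and that its size fits in a long, and it finds the size with fseek/ftell, which works for regular files on mainstream platforms although ISO C does not strictly guarantee SEEK_END for binary streams.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: copy src to dst through one buffer that holds
   the whole file. Returns 0 on success, -1 on any error. */
int copy_file_one_shot(const char *src, const char *dst) {
    FILE *in = NULL, *out = NULL;
    char *buf = NULL;
    long size;
    int rc = -1;

    in = fopen(src, "rb");
    if (in == NULL) goto done;

    /* Find the file size by seeking to the end. */
    if (fseek(in, 0L, SEEK_END) != 0) goto done;
    size = ftell(in);
    if (size < 0) goto done;
    rewind(in);

    /* malloc(0) may return NULL, so always ask for at least one byte. */
    buf = malloc(size > 0 ? (size_t)size : 1);
    if (buf == NULL) goto done;

    /* One read request for the whole file. */
    if (fread(buf, 1, (size_t)size, in) != (size_t)size) goto done;

    out = fopen(dst, "wb");
    if (out == NULL) goto done;

    /* One write request for the whole file. */
    if (size > 0 && fwrite(buf, (size_t)size, 1, out) != 1) goto done;
    rc = 0;

done:
    free(buf);
    if (in != NULL) fclose(in);
    if (out != NULL && fclose(out) == EOF) rc = -1;
    return rc;
}
```

    For very large files a fixed buffer of a few hundred kilobytes or a few megabytes is the safer trade-off, since the allocation here grows with the file.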

  • 2020-12-03 09:17

    The size of each read needs to be a multiple of 512 (the sector size); 4096 is a good choice.

  • 2020-12-03 09:32

    As far as the actual I/O goes, the code I've written a million times in various guises for copying data from one stream to another goes something like this. It returns 0 on success, or -1 with errno set on error (in which case any number of bytes might have been copied).

    Note that for copying regular files, you can skip the EAGAIN stuff, since regular files are always blocking I/O. But inevitably if you write this code, someone will use it on other types of file descriptors, so consider it a freebie.

    GNU cp does a file-specific optimisation that I haven't bothered with here: for long blocks of zero bytes, instead of writing them you just extend the output file by seeking past the end.

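    #include <errno.h>    // errno, EINTR, EAGAIN
    #include <fcntl.h>    // open and the O_* flags
    #include <poll.h>     // poll, struct pollfd
    #include <stdlib.h>   // malloc, free
    #include <unistd.h>   // read, write, close

    void block(int fd, int event) {
        struct pollfd topoll;   // "struct" is required in C
        topoll.fd = fd;
        topoll.events = event;
        poll(&topoll, 1, -1);
        // no need to check errors - if the stream is bust then the
        // next read/write will tell us
    }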
    
    int copy_data_buffer(int fdin, int fdout, void *buf, size_t bufsize) {
        for(;;) {
           const char *pos; // char pointer: arithmetic on void * is not standard C
           // read data to buffer
           ssize_t bytestowrite = read(fdin, buf, bufsize);
           if (bytestowrite == 0) break; // end of input
           if (bytestowrite == -1) {
               if (errno == EINTR) continue; // signal handled
               if (errno == EAGAIN) {
                   block(fdin, POLLIN);
                   continue;
               }
               return -1; // error
           }
    
           // write data from buffer
           pos = buf;
           while (bytestowrite > 0) {
               ssize_t bytes_written = write(fdout, pos, bytestowrite);
               if (bytes_written == -1) {
                   if (errno == EINTR) continue; // signal handled
                   if (errno == EAGAIN) {
                       block(fdout, POLLOUT);
                       continue;
                   }
                   return -1; // error
               }
               bytestowrite -= bytes_written;
               pos += bytes_written;
           }
        }
        return 0; // success
    }
    
    // Default value. I think it will get close to maximum speed on most
    // systems, short of using mmap etc. But porters / integrators
    // might want to set it smaller, if the system is very memory
    // constrained and they don't want this routine to starve
    // concurrent ops of memory. And they might want to set it larger
    // if I'm completely wrong and larger buffers improve performance.
    // It's worth trying several MB at least once, although with huge
    // allocations you have to watch for the Linux
    // "crash on access instead of returning NULL" behaviour for failed malloc.
    #ifndef FILECOPY_BUFFER_SIZE
        #define FILECOPY_BUFFER_SIZE (64*1024)
    #endif
    
    int copy_data(int fdin, int fdout) {
        // optional exercise for reader: take the file size as a parameter,
        // and don't use a buffer any bigger than that. This prevents 
        // memory-hogging if FILECOPY_BUFFER_SIZE is very large and the file
        // is small.
        for (size_t bufsize = FILECOPY_BUFFER_SIZE; bufsize >= 256; bufsize /= 2) {
            void *buffer = malloc(bufsize);
            if (buffer != NULL) {
                int result = copy_data_buffer(fdin, fdout, buffer, bufsize);
                free(buffer);
                return result;
            }
        }
        // could use a stack buffer here instead of failing, if desired.
        // 128 bytes ought to fit on any stack worth having, but again
        // this could be made configurable.
        return -1; // errno is ENOMEM
    }
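
    The zero-block optimisation mentioned above could be sketched roughly as follows. This is an illustration of the idea, not GNU cp's actual code: the name copy_data_sparse and the 4096-byte block size are assumptions, it only makes sense when fdout is a seekable regular file, and it skips the EAGAIN handling.

```c
#define _POSIX_C_SOURCE 200809L  /* ask for POSIX.1-2008 declarations */
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if the n bytes at p are all zero, 0 otherwise. */
static int is_all_zero(const char *p, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (p[i] != 0) return 0;
    return 1;
}

/* Hypothetical helper: copy fdin to fdout, seeking over all-zero
   blocks instead of writing them. Returns 0 on success, -1 on error. */
int copy_data_sparse(int fdin, int fdout) {
    char buf[4096];
    for (;;) {
        ssize_t n = read(fdin, buf, sizeof buf);
        if (n == 0) break;                 /* end of input */
        if (n == -1) {
            if (errno == EINTR) continue;  /* signal handled */
            return -1;
        }
        if (is_all_zero(buf, (size_t)n)) {
            /* Leave a hole in the output instead of writing zeros. */
            if (lseek(fdout, (off_t)n, SEEK_CUR) == (off_t)-1) return -1;
            continue;
        }
        const char *pos = buf;
        while (n > 0) {
            ssize_t w = write(fdout, pos, (size_t)n);
            if (w == -1) {
                if (errno == EINTR) continue;
                return -1;
            }
            pos += w;
            n -= w;
        }
    }
    /* Seeking alone does not extend a file, so set the final length
       explicitly in case the input ended with a zero block. */
    off_t end = lseek(fdout, 0, SEEK_CUR);
    if (end == (off_t)-1) return -1;
    return ftruncate(fdout, end);
}
```

    Whether the skipped ranges actually save disk space depends on the filesystem; the file contents read back the same either way.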
    

    To open the input file:

    // O_BINARY is not part of POSIX; where it is missing, define it as 0.
    int fdin = open(infile, O_RDONLY|O_BINARY, 0);
    if (fdin == -1) return -1;
    

    Opening the output file is tricksy. As a basis, you want:

    int fdout = open(outfile, O_WRONLY|O_BINARY|O_CREAT|O_TRUNC, 0777);
    if (fdout == -1) {
        close(fdin);
        return -1;
    }
    

    But there are confounding factors:

    • you need to special-case when the files are the same, and I can't remember how to do that portably.
    • if the output filename is a directory, you might want to copy the file into the directory.
    • if the output file already exists (open with O_EXCL to determine this and check for EEXIST on error), you might want to do something different, as cp -i does.
    • you might want the permissions of the output file to reflect those of the input file.
    • you might want other platform-specific meta-data to be copied.
    • you may or may not wish to unlink the output file on error.

    Obviously the answers to all these questions could be "do the same as cp". In which case the answer to the original question is "ignore everything I or anyone else has said, and use the source of cp".
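
    For the first and third points in the list, a POSIX-only sketch might look like this. It is not portable beyond POSIX, and the names output_is_input and open_new_output are hypothetical. The same-file test compares device and inode numbers, and it has to run before the output is opened with O_TRUNC, otherwise the input is already gone.

```c
#define _POSIX_C_SOURCE 200809L  /* ask for POSIX.1-2008 declarations */
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if outfile names the same file that
   fdin has open, 0 if it does not (or does not exist yet), and -1 on
   any other error. */
int output_is_input(int fdin, const char *outfile) {
    struct stat in, out;

    if (fstat(fdin, &in) == -1)
        return -1;
    if (stat(outfile, &out) == -1)
        return errno == ENOENT ? 0 : -1;
    /* Same device and same inode means same file. */
    return in.st_dev == out.st_dev && in.st_ino == out.st_ino;
}

/* Hypothetical helper: create outfile only if it does not exist yet.
   Returns a descriptor, or -1; errno == EEXIST means the file was
   already there, which is the point where cp -i would ask. */
int open_new_output(const char *outfile, mode_t mode) {
    return open(outfile, O_WRONLY | O_CREAT | O_EXCL, mode);
}
```

    A caller would check output_is_input first, then try open_new_output, and fall back to the O_TRUNC open only once it has decided that overwriting is acceptable.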

    Btw, getting the filesystem's cluster size is next to useless. You'll almost always see speed increasing with buffer size long after you've passed the size of a disk block.
