fseek to a 32-bit unsigned offset

Question

I am reading a file format (TIFF) that has 32-bit unsigned offsets from the beginning of the file.

Unfortunately, the prototype for fseek, the usual way I would go to a particular file offset, is:

int fseek ( FILE * stream, long int offset, int origin );

so the offset is signed. How should I handle this situation? Should I be using a different function for seeking?

Answer 1:

You can try lseek64() (see its man page):

  #define _LARGEFILE64_SOURCE     /* See feature_test_macros(7) */
  #include <sys/types.h>
  #include <unistd.h>

  off64_t lseek64(int fd, off64_t offset, int whence);

With

  int fd = fileno (stream);

Notes from the GNU C Library manual, "Setting the File Position of a Descriptor":

This function is similar to the lseek function. The difference is that the offset parameter is of type off64_t instead of off_t which makes it possible on 32 bit machines to address files larger than 2^31 bytes and up to 2^63 bytes. The file descriptor filedes must be opened using open64 since otherwise the large offsets possible with off64_t will lead to errors with a descriptor in small file mode.

When the source file is compiled with _FILE_OFFSET_BITS == 64 on a 32 bits machine this function is actually available under the name lseek and so transparently replaces the 32 bit interface.

On the relationship between fd and stream, from "Streams and File Descriptors":

Since streams are implemented in terms of file descriptors, you can extract the file descriptor from a stream and perform low-level operations directly on the file descriptor. You can also initially open a connection as a file descriptor and then make a stream associated with that file descriptor.
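As a minimal sketch of the _FILE_OFFSET_BITS route described in the quoted notes, the following works purely on a file descriptor, so lseek itself takes a 64-bit off_t and no lseek64 call is needed. The helper name read_at_u32 is hypothetical and not part of any library. If the descriptor comes from fileno(stream), keep in mind that the stream has its own buffer, so mixing stdio reads with descriptor-level seeks needs care.

```c
#define _FILE_OFFSET_BITS 64  /* must precede every #include */
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Compile-time check that off_t really is 64 bits wide here. */
_Static_assert(sizeof(off_t) >= 8, "off_t must hold a 32-bit unsigned offset");

/* Hypothetical helper: read `len` bytes starting at an unsigned 32-bit
 * offset from the beginning of the file.
 * Returns 0 on success, -1 on a seek error, read error, or short file. */
static int read_at_u32(int fd, uint32_t offset, void *buf, size_t len)
{
    /* Widening uint32_t to a 64-bit signed off_t cannot overflow. */
    if (lseek(fd, (off_t)offset, SEEK_SET) == (off_t)-1)
        return -1;

    unsigned char *p = buf;
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n <= 0)
            return -1;          /* error or premature end of file */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```

A caller would pass a descriptor obtained from open() together with an offset taken from the TIFF header or an IFD entry.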

Answer 2:

After studying this question more deeply and considering the other comments and answers (thank you), I think the simplest approach is to do two seeks if the offset is greater than 2147483647 bytes. This allows me to keep the offsets as uint32_t and continue using fseek. The positioning code is therefore like this:

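// note: error handling code omitted
// note: offset - 2147483647 fits in a 32-bit long for every offset up to 4294967294;
//       the single value 4294967295 would need a third seek
uint32_t offset = ...; // (whatever it is)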
if( offset > 2147483647 ){
   fseek( file, 2147483647, SEEK_SET );
   fseek( file, (long int)( offset - 2147483647 ), SEEK_CUR );
} else {
   fseek( file, (long int) offset, SEEK_SET );
}
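The same idea can be sketched as a helper that checks every return value and steps forward in a loop, which also covers an offset of exactly 4294967295. This is a variation on the code above, not the answerer's own implementation, and the name seek_u32 is hypothetical. Whether a relative seek past 2147483647 succeeds when long is 32 bits depends on the C library: POSIX allows fseek to fail with EOVERFLOW when the resulting position cannot be represented in a long, so the return value is worth checking.

```c
#include <limits.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: position `file` at an unsigned 32-bit offset from
 * the start of the file. Returns 0 on success, -1 if any fseek fails. */
static int seek_u32(FILE *file, uint32_t offset)
{
    unsigned long remaining = offset;  /* unsigned long is at least 32 bits */

    /* Start from the beginning so that every later step is relative. */
    if (fseek(file, 0L, SEEK_SET) != 0)
        return -1;

    /* Advance in steps of at most LONG_MAX bytes. With a 64-bit long this
     * loop runs at most once; with a 32-bit long it can take up to three
     * steps, because 4294967295 = 2147483647 + 2147483647 + 1. */
    while (remaining > 0) {
        long step = remaining > (unsigned long)LONG_MAX
                        ? LONG_MAX
                        : (long)remaining;
        if (fseek(file, step, SEEK_CUR) != 0)
            return -1;
        remaining -= (unsigned long)step;
    }
    return 0;
}
```

A call such as seek_u32(file, offset) would then replace the if/else block, with a nonzero result treated as an unreadable offset.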

The problem with using 64-bit types is that the code might be running on a 32-bit architecture (among other things). There is a function fsetpos which uses an opaque type fpos_t to manage arbitrarily large offsets, but that brings with it a range of complexities, not least that a valid fpos_t value can only be obtained from an earlier call to fgetpos. Although fsetpos might make sense if I were truly using offsets of arbitrarily large size, I know that the largest possible offset fits in a uint32_t, so the double seek meets that need.

Note that this solution allows all TIFF files to be handled on a 32-bit system. The advantage of this is obvious if you consider commercial programs like PixInsight. PixInsight can only handle TIFF files smaller than 2147483648 bytes when running on 32-bit systems. To handle full sized TIFF files, a user has to use the 64-bit version of PixInsight on a 64-bit computer. This is probably because the PixInsight programmers used a 64-bit type to handle the offsets internally. Since my solution only uses 32-bit types, I can handle full-sized TIFF files on a 32-bit system (as long as the underlying operating system can handle files that large).

Source: https://stackoverflow.com/questions/47739641/fseek-to-a-32-bit-unsigned-offset

Tags

file

types

tiff