Difference between “file pointer”, “stream”, “file descriptor” and … “file”?

那年仲夏 提交于 2020-06-16 09:51:09

问题


There are a few related concepts out there, namely file pointer, stream and file descriptor. I know that a file pointer is a pointer to the data type FILE (declared in e.g. FILE.h and struct_FILE.h). I know a file descriptor is an int, e.g. member _fileno of FILE (and _IO_FILE).

As for the subtle difference between stream and file, I am still learning.

But from here, I am not clear if there is yet another type of entity to which the "file status flags" apply. Concretely, I wouldn't know if "file status flags" apply to a FILE, to a file descriptor, or what. I am looking for official references that show the specifics.

Related:

What's the difference between a file descriptor and file pointer?

Whats is difference between file descriptor and file pointer?

What is the concept behind file pointer or the stream pointer?

Specification of file descriptors (I asked this)

difference between file descriptor and socket file descriptor


回答1:


File Handle

When you visit a web site for the first time, the site might provide your browser with a cookie. The value of this cookie will automatically be provided to the web site on future requests by the browser.

The value of this cookie is likely gibberish to you, but it has meaning to that one specific web server. It's called a session id, and it's a key to look up a record in some kind of database. This record is called a session.

Sessions allow the web server to react to one request based on earlier requests and the consequences of earlier requests. For example, it allows the server to know that the browser provided credentials to the server in an earlier request, and that these credentials were successfully authenticated. This is why you don't need to resupply your credentials every time you want to post/vote/edit as a specific user on StackOverflow.

The cookie's value, the session id, is an opaque value. It doesn't have any meaning to you. The only way it's useful is by providing it back to the web server that gave it to you. Giving it to another web server isn't going to achieve anything useful. It's just a means of identifying a resource that exists in another system.

When that other system is an operating system, we call these resource-identifying opaque values "handles". This is by no means the only time the word handle is used this way, but it's the most common. In much the same way that a session id cookie provides the web server a way of linking web requests together, a handle provides the OS a way of linking system calls together. There are handles for all kinds of resources. There are window handles. There are handles for allocated memory buffers. And there are file handles.

By using the same file handle across multiple calls to read or write, the OS knows where the previous one left off and thus from where to continue. It also knows that you have access to the file from which you are reading or to which you are writing because those checks were done when the file was opened.

File handles aren't just for plain files. A file handle can also reference a pipe, a socket, or one of a number of other things. Once the handle is created, you just have to tell the OS you want to read from it or write to it, and it will use the handle to look up the information it needs to do that.


File Descriptor

This is the name given to file handles in the unix world. open(2) is said to return a file descriptor. read(2) is said to take a file descriptor.


FILE* aka FILE Pointer aka File Pointer

This is also a file handle. But unlike a file descriptor, it's not from the OS. A FILE* is a C library file handle. You can't pass a FILE* to read(2) (a system call) any more than you can pass a file descriptor to fread(3) (a C library function).

You should never access the members of FILE, assuming it even has any. Like all handles, it's meant to be opaque to those receiving it. It's meant to be a box into which you can't see. Code that breaks this convention isn't portable and can break at any time.

Most C library file handles reference an object that includes a file descriptor. (Ones returned by fmemopen and open_memstream don't.) It also includes support for buffering, and possibly more.


File Status Flags

This is not a term you will ever need to use. It's my first time hearing it. Or maybe I just forgot hearing it because it's not important. In the linked document, it's used to refer to a group of constants. Various system calls can be provided some combinations of some of the constants in this group for certain arguments. Refer to the documentation of each system to see what flags it can accept, and what meanings those flags has to it.


Stream

Earlier, I compared file handles to session ids. If a session id allows a web server to look up a session, what is a file handle used to look up? The documentation for the C library I/O functions calls it a stream.

A stream is a loose term that usually refers to a sequence of indeterminate length. It's a term commonly used in communication to refer to the data being communicated between a writer/sender/producer and a reader/receiver/consumer.

A stream is accessed sequentially, whether it's out of necessity or because it's convenient. The possibility of jumping to a different point in the stream doesn't automatically disqualify the use of the term. Like I mentioned above, it's a loose term.

The length of a stream is often unknown. It might be even be unknown to the sender. Take for example a task producing a stream on the fly, possibly from other streams. A stream could even be infinitely long. Sometimes, the length of the stream is knowable, but simply disregarded. And sometimes, the length is known but not in usable units. A program reading lines of variable length from a stream probably can't do anything useful with the length of the stream in bytes.

Take two programs communicating via a pipe like in cat <file1 | cat >file2. We can refer to the data going through the pipe as a stream. The sender may or may not know how many bytes/lines/messages it will eventually send. The sender will send some bytes and later some more, until it eventually signals that no more will follow. The reader often has no idea how many bytes/lines/messages will eventually be sent by the producer. It will get some bytes and later some more, until it's eventually notified that the end of the stream has been reached.

Sometimes, it's more about how the data is treated. For example, reading from a file is often treated as reading from a stream. While it's possible to obtain the length of a file, this information is often disregarded. Instead, the programs that disregard this information just keeps pulling bytes or lines from the file handle until it receives an indication that it reached the end of the stream.

Random access is an example of a file not being treated as a stream. Random access refers to the practice of retrieving data from arbitrary locations of the file. One might do this when one has an index of what's found in the file. An index is some mapping between a key and the location of the item identified by that key in the file. For example, if I know the data pertaining to a user is found at a certain location in a file, I can request that portion of the file from the OS rather than reading the file from the start.



来源:https://stackoverflow.com/questions/61911258/difference-between-file-pointer-stream-file-descriptor-and-file

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!