A related question is here, but my question is different.
But, I'd like to know more about the internals of getchar() and stdin. I know that getchar() just ultimately c
getchar()'s input is line-buffered, and the input buffer is limited; usually it's 4 kB. What you see at first is the echo of each character you're typing. When you press ENTER, getchar() starts returning characters up to the LF (which is converted to CR-LF). If you keep pressing keys without LF for some time, echoing stops after 4096 characters, and you have to press ENTER to continue.
I know that getchar() just ultimately calls fgetc(stdin).
Not necessarily. getchar and getc might as well expand to the actual procedure of reading from a file, with fgetc implemented as
int fgetc(FILE *fp)
{
    return getc(fp);
}
Hey, there's nothing in the buffer, so let stdin gather what the user types. [...] it seems this is more of a behavioral artifact of stdin rather than fgetc().
I can only tell you what I know, and that is how Unix/Linux works. On that platform, a FILE (including the thing that stdin points to) holds a file descriptor (an int) that is passed to the OS to indicate from which input source the FILE gets data, plus a buffer and some other bookkeeping stuff.
The "gather" part then means "call the read system call on the file descriptor to fill the buffer again". This varies per implementation of C, though.
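To make the "descriptor plus buffer" picture concrete, here is a minimal sketch of a toy reader. The names toy_file, toy_init, toy_getc and TOY_EOF are made up for illustration, and the 64-byte buffer is an arbitrary choice; real C libraries are organized differently and handle many more cases (locking, ungetc, error flags).

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>   /* read(), ssize_t */

#define TOY_EOF (-1)

/* Hypothetical stand-in for FILE: a descriptor, a buffer, and bookkeeping. */
struct toy_file {
    int fd;                 /* descriptor handed to the OS */
    unsigned char buf[64];  /* user-space buffer (real ones are larger) */
    size_t pos;             /* index of the next unread byte in buf */
    size_t len;             /* number of valid bytes in buf */
};

void toy_init(struct toy_file *f, int fd) {
    assert(f != NULL);
    f->fd = fd;
    f->pos = 0;
    f->len = 0;
}

/* Return the next byte, refilling the buffer with read() when it is empty. */
int toy_getc(struct toy_file *f) {
    assert(f != NULL);
    if (f->pos == f->len) {
        /* The "gather" step: ask the OS for more data. */
        ssize_t n = read(f->fd, f->buf, sizeof f->buf);
        if (n <= 0)
            return TOY_EOF;  /* end of input or read error */
        f->pos = 0;
        f->len = (size_t)n;
    }
    return f->buf[f->pos++];
}
```

Only the call that finds the buffer empty ends up in read(); every other call is served from memory. When the descriptor refers to a terminal in its default mode, that read() is the call that waits until a whole line has been entered.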
The behaviour you're observing has nothing to do with C and getchar(), but with the teletype (TTY) subsystem in the OS kernel.
For this you need to know how processes get their input from your keyboard and how they write their output to your terminal window (I assume you use UNIX, and the following explanations apply specifically to UNIX-like systems such as Linux and macOS):
The box entitled "Terminal" in the above diagram is your terminal window, e.g. xterm, iTerm, or Terminal.app. In the old times, terminals were separate hardware devices, consisting of a keyboard and a screen, and they were connected to a (possibly remote) computer over a serial line (RS-232). Every character typed on the terminal keyboard was sent over this line to the computer and consumed by an application that was connected to the terminal. And every character that the application produced as output was sent over the same line to the terminal, which displayed it on the screen.
Nowadays, terminals are not hardware devices anymore, but they moved "inside" the computer and became processes that are referred to as terminal emulators. xterm, iTerm2, Terminal.app, etc., are all terminal emulators.
However, the communication mechanism between applications and terminal emulators stayed the same as it was for hardware terminals. Terminal emulators emulate hardware terminals. That means, from the point of view of an application, talking to a terminal emulator today (e.g. iTerm2) works the same as talking to a real terminal (e.g. a DEC VT100) back in 1979. This mechanism was left unchanged so that applications developed for hardware terminals would still work with software terminal emulators.
So how does this communication mechanism work? UNIX has a subsystem called TTY in the kernel (TTY stands for teletype, which was the earliest form of computer terminals that didn't even have a screen, just a keyboard and a printer). You can think of TTY as a generic driver for terminals. TTY reads bytes from the port to which a terminal is connected (coming from the keyboard of the terminal), and writes bytes to this port (being sent to the display of the terminal).
There is a TTY instance for every terminal that is connected to a computer (or for every terminal emulator process running on the computer). Therefore, a TTY instance is also referred to as a TTY device (from the point of view of an application, talking to a TTY instance is like talking to a terminal device). In the UNIX manner of making driver interfaces available as files, these TTY devices are surfaced as /dev/tty* in some form, for example, on macOS they are /dev/ttys001, /dev/ttys002, etc.
An application can have its standard streams (stdin, stdout, stderr) directed to a TTY device (in fact, this is the default, and you can find out to which TTY device your shell is connected with the tty command). This means that whatever the user types on the keyboard becomes the standard input of the application, and whatever the application writes to its standard output is sent to the terminal screen (or terminal window of a terminal emulator). All this happens through the TTY device, that is, the application only communicates with the TTY device (this type of driver) in the kernel.
Now, the crucial point: the TTY device does more than just passing every input character to the standard input of the application. By default, the TTY device applies a so-called line discipline to the received characters. That means, it locally buffers them and interprets delete, backspace and other line editing characters, and only passes them to standard input of the application when it receives a carriage return or line feed, which means that the user has finished entering and editing a whole line.
That means until the user hits return, getchar() doesn't see anything in stdin. It's as if nothing had been typed so far. Only when the user hits return does the TTY device send these characters to the standard input of the application, where getchar() immediately reads them.
In that sense, there is nothing special about the behaviour of getchar(). It just immediately reads characters in stdin as they become available. The line buffering that you observe happens in the TTY device in the kernel.
Now to the interesting part: this TTY device can be configured. You can do it, for example, from a shell with the stty command. This allows you to configure almost every aspect of the line discipline that the TTY device applies to incoming characters. Or you can disable any processing whatsoever by setting the TTY device to raw mode. In this case, the TTY device forwards every received character immediately to stdin of the application without any form of editing.
If you enable raw mode in the TTY device, you will see that getchar() immediately receives every character that you type on the keyboard. The following C program demonstrates this:
#include <stdio.h>
#include <stdlib.h>   // exit()
#include <termios.h>  // tcgetattr(), tcsetattr(), cfmakeraw()
#include <unistd.h>   // STDIN_FILENO, isatty(), ttyname()

int main(void) {
    struct termios tty_opts_backup, tty_opts_raw;

    if (!isatty(STDIN_FILENO)) {
        fprintf(stderr, "Error: stdin is not a TTY\n");
        exit(1);
    }
    printf("stdin is %s\n", ttyname(STDIN_FILENO));

    // Back up current TTY settings
    if (tcgetattr(STDIN_FILENO, &tty_opts_backup) != 0) {
        perror("tcgetattr");
        exit(1);
    }

    // Change TTY settings to raw mode, starting from a copy of the current
    // settings so that no field of tty_opts_raw is left uninitialized
    tty_opts_raw = tty_opts_backup;
    cfmakeraw(&tty_opts_raw);
    tcsetattr(STDIN_FILENO, TCSANOW, &tty_opts_raw);

    // Read and print characters from stdin until Ctrl-C (0x03) or end of input
    int c, i = 1;
    while ((c = getchar()) != EOF && c != 3) {
        printf("%d. 0x%02x (0%02o)\r\n", i++, c, c);
    }
    if (c == 3) {
        printf("You typed 0x03 (003). Exiting.\r\n");
    }

    // Restore previous TTY settings
    tcsetattr(STDIN_FILENO, TCSANOW, &tty_opts_backup);
    return 0;
}
The program sets the current process's TTY device to raw mode, then uses getchar() to read and print characters from stdin in a loop. The characters are printed as ASCII codes in hexadecimal and octal notation. The program specially interprets the ETX character (ASCII code 0x03) as a trigger to terminate. You can produce this character on your keyboard by typing Ctrl-C.
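Why Ctrl-C shows up as 0x03 follows from the ASCII layout: holding Ctrl clears the upper bits of the key's code, so 'C' (0x43) becomes 0x03. In raw mode the TTY device no longer turns that byte into an interrupt signal, so it reaches getchar() like any other character. The small helper below, ctrl_key, is a name chosen for this illustration and only accepts the ASCII range '@' through '_' (which includes the uppercase letters).

```c
#include <assert.h>

/* Control code that a terminal sends for Ctrl plus the given key:
   the key's ASCII code with everything above the low five bits cleared. */
int ctrl_key(char key) {
    assert(key >= '@' && key <= '_');  /* '@', 'A'..'Z', '[', '\', ']', '^', '_' */
    return key & 0x1f;
}
```

With this, the loop condition in the program above could compare against ctrl_key('C') instead of the bare constant 3, and other keys such as Ctrl-D (0x04) could be recognized in the same way.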