How does a C/C++ compiler handle the escape character ("\") in source code? How is the compiler's grammar written to process that character? What does the compiler do after encountering it?
Most compilers are divided into stages. The first stage of the compiler front end is called the lexical analyzer, or scanner. This part of the compiler reads the actual characters and creates tokens. It has a state machine which decides, upon seeing a backslash, whether the backslash stands for itself (for example, the second backslash in "\\") or modifies the character that follows it. The scanner then passes the result to the next part of the compiler (the parser): either a literal backslash or the character the escape sequence denotes (such as a tab or a newline). The same state machine is what lets the scanner group several source characters into a single token.
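To make the state machine concrete, here is a minimal sketch in C of how a scanner might decode the body of a string literal. The names (decode_escapes, scan_state) are made up for illustration, and only a handful of escapes are handled; a real compiler covers many more and reports unknown ones.

```c
#include <stddef.h>

/* Hypothetical two-state scanner for the text between the quotes. */
enum scan_state { NORMAL, AFTER_BACKSLASH };

/* Decodes the raw source characters in src into out and returns the number
 * of characters written, not counting the terminating NUL. The caller must
 * provide room for strlen(src) + 1 bytes; decoding never grows the text. */
size_t decode_escapes(const char *src, char *out)
{
    enum scan_state state = NORMAL;
    size_t n = 0;

    for (; *src != '\0'; ++src) {
        if (state == NORMAL) {
            if (*src == '\\')
                state = AFTER_BACKSLASH;  /* remember it, emit nothing yet */
            else
                out[n++] = *src;          /* ordinary character */
        } else {
            switch (*src) {
            case 'n':  out[n++] = '\n'; break;
            case 't':  out[n++] = '\t'; break;
            case 'a':  out[n++] = '\a'; break;
            case '\\': out[n++] = '\\'; break;
            case '"':  out[n++] = '"';  break;
            /* Assumption: unknown escapes just keep the character. */
            default:   out[n++] = *src; break;
            }
            state = NORMAL;
        }
    }
    out[n] = '\0';
    return n;
}
```

Feeding it the four source characters a, \, t, b yields three decoded characters: a, a tab, and b. The parser never sees the backslash.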
An interesting note on this subject is Ken Thompson's Reflections on Trusting Trust [PDF link].
The paper describes exactly one way a compiler could handle this problem. It shows that a C compiler written in C need not contain an explicit translation of the escape codes into ASCII values, and how to bootstrap a new escape code into the compiler so that the ASCII value for the new code is likewise implicit.
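The bootstrapping idea can be sketched roughly like this. This is an illustration in the spirit of the paper's example, not its actual code; the function names are hypothetical, and the value 11 for vertical tab assumes ASCII.

```c
/* Stage 1: the compiler that builds this source does not know '\v' yet,
 * so the value has to be spelled out once as a number (11 is VT in ASCII). */
int escape_value_stage1(int c)
{
    if (c == '\\') return '\\';
    if (c == 'n')  return '\n';
    if (c == 'v')  return 11;
    return c;
}

/* Stage 2: once a compiler built from stage 1 exists, the source can be
 * rewritten to use '\v' itself. The number 11 no longer appears anywhere
 * in the source; the knowledge lives on only in the compiler binary. */
int escape_value_stage2(int c)
{
    if (c == '\\') return '\\';
    if (c == 'n')  return '\n';
    if (c == 'v')  return '\v';
    return c;
}
```

Notice that the existing cases are already self-referential: the code "knows" what '\n' means only because the compiler that compiled it already knew.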
It generally escapes the following character:
- In a string literal or character literal, it escapes the next character. For example, \a means 'alert' (flashing the terminal, beeping, or whatever), \n means 'linefeed', and \xNUM means the character whose code is the hexadecimal number NUM.
- If it appears as the last visible character before a newline, whether within a string or not (and even within a single-line // comment!), it acts as a line continuation: the following newline character is ignored, and the next line is merged with the current line.
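The line-continuation behaviour is easy to try out. Here is a small sketch in C; the names are made up for the example, and GCC and Clang will typically warn about the multi-line comment.

```c
/* Backslash-newline inside a string literal: the two source lines are
 * spliced before the literal is tokenized, so no newline ends up in it. */
static const char spliced[] = "abc\
def";

/* Backslash-newline in a macro definition: the usual way to continue one. */
#define ADD3(a, b, c) \
    ((a) + (b) + (c))

/* Backslash at the end of a // comment: the next line joins the comment. */
int continued_comment_demo(void)
{
    int value = 1;
    // the assignment on the next line is swallowed by this comment \
    value = 2;
    return value;  /* still 1 */
}
```

The string holds the six characters abcdef, and the function returns 1 because the assignment never reaches the compiler proper.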
An escape character together with the character that follows it (like \n) is a single character to the C compiler: the scanner presents it to the parser as one character token, so the parser needs no special syntax rules for the escape character.
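You can check the "single character" claim from inside a program, since the lengths and values the compiler computes already reflect the decoded characters. A small sketch, with hypothetical function names:

```c
#include <stddef.h>
#include <string.h>

/* "a\tb" is written with four source characters between the quotes,
 * but holds three characters: a, TAB, b. */
size_t tab_literal_length(void) { return strlen("a\tb"); }

/* "\n" occupies two bytes: the newline and the terminating NUL. */
size_t newline_array_size(void) { return sizeof "\n"; }

/* '\x41' is one character whose code is hexadecimal 41. */
int hex_escape_value(void) { return '\x41'; }
```

These give 3, 2 and 0x41 respectively (the last is 'A' if the execution character set is ASCII).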
Source: https://stackoverflow.com/questions/323407/whats-the-magic-behind-escape-character