The precedence among the syntax rules of translation is specified by the following phases.
1. Physical source file characters are mapped, in an implementation-defined manner,
to the basic source character set (introducing new-line characters for end-of-line
indicators) if necessary. The set of physical source file characters accepted is
implementation-defined. Any source file character not in the basic source character
set is replaced by the universal-character-name that designates that character.
An implementation may use any internal encoding, so long as an actual extended
character encountered in the source file, and the same extended character expressed
in the source file as a universal-character-name (e.g., using the \uXXXX notation),
are handled equivalently, except where this replacement is reverted ([lex.pptoken])
in a raw string literal.
2. Each instance of a backslash character (\)
immediately followed by a new-line character is deleted, splicing
physical source lines to form logical source lines. Only the last
backslash on any physical source line shall be eligible for being part
of such a splice. Except for splices reverted in a raw string literal, if a splice results in
a character sequence that matches the
syntax of a universal-character-name, the behavior is
undefined. A source file that is not empty and that does not end in a new-line
character, or that ends in a new-line character immediately preceded by a
backslash character before any such splicing takes place,
shall be processed as if an additional new-line character were appended
to the file.
3. The source file is decomposed into preprocessing tokens and sequences of
white-space characters (including comments). A source file shall not end in a
partial preprocessing token or in a partial comment.[7] Each comment is replaced
by one space character. New-line characters are retained. Whether each nonempty
sequence of white-space characters other than new-line is retained or replaced
by one space character is unspecified. The process of dividing a source file's
characters into preprocessing tokens is context-dependent.
[Example: See the handling of < within a #include preprocessing
directive. — end example]
4. Preprocessing directives are executed, macro invocations are
expanded, and _Pragma unary operator expressions are executed. If a character sequence that matches the syntax of a
universal-character-name is produced by token
concatenation, the behavior is undefined. A
#include preprocessing directive causes the named header or
source file to be processed from phase 1 through phase 4, recursively. All preprocessing directives are then deleted.
5. Each basic source character set member in a character literal or a string
literal, as well as each escape sequence and universal-character-name in a
character literal or a non-raw string literal, is converted to the corresponding
member of the execution character set ([lex.ccon], [lex.string]); if
there is no corresponding member, it is converted to an implementation-defined member other
than the null (wide) character.[8]
6. Adjacent string literal tokens are concatenated.
7. White-space characters separating tokens are no longer significant. Each
preprocessing token is converted into a token. The resulting tokens are
syntactically and semantically analyzed and translated as a translation unit.
[Note: The process of analyzing and translating the tokens may occasionally
result in one token being replaced by a sequence of other
tokens ([temp.names]). — end note]
It is implementation-defined whether the sources for
module units and header units
on which the current translation unit has an interface
dependency ([module.unit], [module.import])
are required to be available.
[Note: Source files, translation
units and translated translation units need not necessarily be stored as
files, nor need there be any one-to-one correspondence between these
entities and any external representation. The description is conceptual
only, and does not specify any particular implementation. — end note]
8. Translated translation units and instantiation units are combined as
follows: each translated translation unit is examined to produce a list of
required instantiations, the definitions of the required templates are
located, and all the required instantiations are performed to produce
instantiation units. [Note: These are similar
to translated translation units, but contain no references to
uninstantiated templates and no template definitions. — end note] The
program is ill-formed if any instantiation fails.
9. All external entity references are resolved. Library
components are linked to satisfy external references to
entities not defined in the current translation. All such translator
output is collected into a program image which contains information
needed for execution in its execution environment.
[7] A partial preprocessing token would arise from a source file
ending in the first portion of a multi-character token that requires a
terminating sequence of characters, such as a header-name
that is missing the closing " or >. A partial comment
would arise from a source file ending with an unclosed /* comment.
[8] An implementation need not convert all non-corresponding source
characters to the same execution character.