1. | Physical source file characters are mapped, in an
implementation-defined manner,
to the translation character set ([lex.charset])
(introducing new-line characters for end-of-line indicators). The set of physical source file characters accepted is implementation-defined. |
2. | Each sequence of a backslash character (\)
immediately followed by
zero or more whitespace characters other than new-line followed by
a new-line character is deleted, splicing
physical source lines to form logical source lines. Only the last
backslash on any physical source line shall be eligible for being part
of such a splice. Except for splices reverted in a raw string literal, if a splice results in
a character sequence that matches the
syntax of a universal-character-name, the behavior is
undefined. A source file that is not empty and that does not end in a new-line
character, or that ends in a splice,
shall be processed as if an additional new-line character were appended
to the file. |
3. | The source file is decomposed into preprocessing
tokens ([lex.pptoken]) and sequences of whitespace characters
(including comments).
Each comment is replaced by one space character. New-line characters are
retained. Whether each nonempty sequence of whitespace characters other
than new-line is retained or replaced by one space character is
unspecified. As characters from the source file are consumed
to form the next preprocessing token
(i.e., not being consumed as part of a comment or other forms of whitespace),
except when matching a
c-char-sequence,
s-char-sequence,
r-char-sequence,
h-char-sequence, or
q-char-sequence,
universal-character-names are recognized and
replaced by the designated element of the translation character set. The process of dividing a source file's
characters into preprocessing tokens is context-dependent. |
4. | Preprocessing directives are executed, macro invocations are
expanded, and _Pragma unary operator expressions are executed. A #include preprocessing directive causes the named header or
source file to be processed from phase 1 through phase 4, recursively. All preprocessing directives are then deleted. |
5. | For a sequence of two or more adjacent string-literal tokens,
a common encoding-prefix is determined
as specified in [lex.string]. |
6. | Adjacent string-literal tokens are concatenated ([lex.string]). |
7. | Whitespace characters separating tokens are no longer
significant. Each preprocessing token is converted into a
token ([lex.token]). The resulting tokens are syntactically and
semantically analyzed and translated as a translation unit. [Note 1: The process of analyzing and translating the tokens can occasionally
result in one token being replaced by a sequence of other
tokens ([temp.names]). — end note]
It is
implementation-defined
whether the sources for
module units and header units
on which the current translation unit has an interface
dependency ([module.unit], [module.import])
are required to be available. [Note 2: Source files, translation
units and translated translation units need not necessarily be stored as
files, nor need there be any one-to-one correspondence between these
entities and any external representation. The description is conceptual
only, and does not specify any particular implementation. — end note] |
8. | Translated translation units and instantiation units are combined
as follows:
Each translated translation unit is examined to
produce a list of required instantiations. [Note 4: This can include
instantiations which have been explicitly
requested ([temp.explicit]). — end note]
The definitions of the
required templates are located. It is implementation-defined whether the
source of the translation units containing these definitions is required
to be available.
All the required instantiations
are performed to produce
instantiation units.
The
program is ill-formed if any instantiation fails. |
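9. | All external entity references are resolved. Library components are linked to satisfy external references to entities not defined in the current translation. All such translator output is collected into a program image which contains information needed for execution in its execution environment. |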
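The effect of some of the early phases can be observed from ordinary code. The following is a minimal sketch, not normative text; the names `ANSWER`, `spliced`, and `greeting` are hypothetical and chosen only for illustration. It relies on line splicing (phase 2), comment replacement (phase 3), and the handling of adjacent string-literal tokens (phase 5 onward).

```cpp
#include <string_view>

// Phase 2: the backslash and the new-line that follows it are deleted, so the
// two physical lines form one logical line and the macro body is `40 + 2`.
#define ANSWER 40 + \
               2

// Phase 3: the comment is replaced by one space character, so `int` and
// `spliced` remain two separate preprocessing tokens.
constexpr int/* comment */spliced = ANSWER;

// Adjacent string-literal tokens end up as a single string.
constexpr std::string_view greeting = "phases " "of " "translation";

static_assert(spliced == 42);
static_assert(greeting == "phases of translation");
```

Because every check is a `static_assert`, a conforming implementation accepts the file only if the phases behave as described.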
character | glyph | ||
U+0009 | character tabulation | ||
U+000b | line tabulation | ||
U+000c | form feed | ||
U+0020 | space | ||
U+000a | line feed | new-line | |
U+0021 | exclamation mark | ! | |
U+0022 | quotation mark | " | |
U+0023 | number sign | # | |
U+0025 | percent sign | % | |
U+0026 | ampersand | & | |
U+0027 | apostrophe | ' | |
U+0028 | left parenthesis | ( | |
U+0029 | right parenthesis | ) | |
U+002a | asterisk | * | |
U+002b | plus sign | + | |
U+002c | comma | , | |
U+002d | hyphen-minus | - | |
U+002e | full stop | . | |
U+002f | solidus | / | |
U+0030 .. U+0039 | digit zero .. nine | 0 1 2 3 4 5 6 7 8 9 | |
U+003a | colon | : | |
U+003b | semicolon | ; | |
U+003c | less-than sign | < | |
U+003d | equals sign | = | |
U+003e | greater-than sign | > | |
U+003f | question mark | ? | |
U+0041 .. U+005a | latin capital letter a .. z | A B C D E F G H I J K L M N O P Q R S T U V W X Y Z | |
U+005b | left square bracket | [ | |
U+005c | reverse solidus | \ | |
U+005d | right square bracket | ] | |
U+005e | circumflex accent | ^ | |
U+005f | low line | _ | |
U+0061 .. U+007a | latin small letter a .. z | a b c d e f g h i j k l m n o p q r s t u v w x y z | |
U+007b | left curly bracket | { | |
U+007c | vertical line | | | |
U+007d | right curly bracket | } | |
U+007e | tilde | ~ |
alignas | constinit | false | public | true | |
alignof | const_cast | float | register | try | |
asm | continue | for | reinterpret_cast | typedef | |
auto | co_await | friend | requires | typeid | |
bool | co_return | goto | return | typename | |
break | co_yield | if | short | union | |
case | decltype | inline | signed | unsigned | |
catch | default | int | sizeof | using | |
char | delete | long | static | virtual | |
char8_t | do | mutable | static_assert | void | |
char16_t | double | namespace | static_cast | volatile | |
char32_t | dynamic_cast | new | struct | wchar_t | |
class | else | noexcept | switch | while | |
concept | enum | nullptr | template | ||
const | explicit | operator | this | ||
consteval | export | private | thread_local | ||
constexpr | extern | protected | throw |
and | and_eq | bitand | bitor | compl | not | |
not_eq | or | or_eq | xor | xor_eq |
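As an illustration of the alternative tokens listed above, the sketch below pairs each one with its primary spelling in a comment. The function names are hypothetical. Some compilers are known to need a conformance option before they accept these spellings, so treat this as a sketch that assumes a conforming mode.

```cpp
// Each alternative token behaves like the primary token named in the comment.
constexpr bool both(bool a, bool b) { return a and b; }                 // a && b
constexpr bool either(bool a, bool b) { return a or b; }                // a || b
constexpr bool negate(bool a) { return not a; }                         // !a
constexpr bool differ(int a, int b) { return a not_eq b; }              // a != b
constexpr unsigned mask(unsigned x, unsigned m) { return x bitand m; }  // x & m
constexpr unsigned merge(unsigned x, unsigned y) { return x bitor y; }  // x | y
constexpr unsigned toggle(unsigned x, unsigned y) { return x xor y; }   // x ^ y
constexpr unsigned flip(unsigned x) { return compl x; }                 // ~x

// The compound-assignment forms work the same way.
constexpr unsigned accumulate(unsigned x) {
    x or_eq 1u;   // x |= 1u
    x and_eq 3u;  // x &= 3u
    x xor_eq 2u;  // x ^= 2u
    return x;
}

static_assert(mask(0b1100u, 0b1010u) == 0b1000u);
static_assert(merge(0b1100u, 0b1010u) == 0b1110u);
static_assert(toggle(0b1100u, 0b1010u) == 0b0110u);
static_assert(accumulate(4u) == 3u);
```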
Suffix | decimal-literal | integer-literal other than decimal-literal |
none | int, long int, long long int | int, unsigned int, long int, unsigned long int, long long int, unsigned long long int |
u or U | unsigned int, unsigned long int, unsigned long long int | unsigned int, unsigned long int, unsigned long long int |
l or L | long int, long long int | long int, unsigned long int, long long int, unsigned long long int |
Both u or U and l or L | unsigned long int, unsigned long long int | unsigned long int, unsigned long long int |
ll or LL | long long int | long long int, unsigned long long int |
Both u or U and ll or LL | unsigned long long int | unsigned long long int |
z or Z | the signed integer type corresponding to std::size_t ([support.types.layout]) | the signed integer type corresponding to std::size_t, std::size_t |
Both u or U and z or Z | std::size_t | std::size_t |
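The type selection in the table above can be checked with `decltype`. The following is a sketch under stated assumptions: the hexadecimal check is guarded because it only holds where `int` and `unsigned int` are 32 bits wide, and the `z` suffix check is guarded by the `__cpp_size_t_suffix` feature-test macro. The name `decimal_stays_signed` is hypothetical.

```cpp
#include <climits>
#include <cstddef>
#include <type_traits>

// For small values the suffix alone selects the type.
static_assert(std::is_same_v<decltype(1), int>);
static_assert(std::is_same_v<decltype(1u), unsigned int>);
static_assert(std::is_same_v<decltype(1l), long int>);
static_assert(std::is_same_v<decltype(1ul), unsigned long int>);
static_assert(std::is_same_v<decltype(1ll), long long int>);
static_assert(std::is_same_v<decltype(1ull), unsigned long long int>);

// An unsuffixed decimal-literal only ever gets a signed type from its list.
constexpr bool decimal_stays_signed = std::is_signed_v<decltype(4294967295)>;
static_assert(decimal_stays_signed);

#if INT_MAX == 2147483647 && UINT_MAX == 4294967295
// Assumption: 32-bit int. The same value written in hexadecimal does not fit
// in int, so the next candidate in its list, unsigned int, is chosen.
static_assert(std::is_same_v<decltype(0xFFFFFFFF), unsigned int>);
#endif

#if defined(__cpp_size_t_suffix)
// Only checked where the implementation advertises the z suffix.
static_assert(std::is_same_v<decltype(1uz), std::size_t>);
#endif
```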
Encoding prefix | Kind | Type | Associated character encoding | Example |
none | ordinary character literal | char | ordinary literal encoding | 'v' |
none | non-encodable ordinary character literal | int | ordinary literal encoding | '\U0001F525' |
none | ordinary multicharacter literal | int | ordinary literal encoding | 'abcd' |
L | wide character literal | wchar_t | wide literal encoding | L'w' |
L | non-encodable wide character literal | wchar_t | wide literal encoding | L'\U0001F32A' |
L | wide multicharacter literal | wchar_t | wide literal encoding | L'abcd' |
u8 | UTF-8 character literal | char8_t | UTF-8 | u8'x' |
u | UTF-16 character literal | char16_t | UTF-16 | u'y' |
U | UTF-32 character literal | char32_t | UTF-32 | U'z' |
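A short sketch of how the encoding prefix determines the type of a character literal, using the single-character examples from the table above. Multicharacter and non-encodable literals are left out because implementations commonly warn about them. The `char8_t` check is guarded by the `__cpp_char8_t` feature-test macro, and the name `fire` is hypothetical.

```cpp
#include <type_traits>

static_assert(std::is_same_v<decltype('v'), char>);
static_assert(std::is_same_v<decltype(L'w'), wchar_t>);
static_assert(std::is_same_v<decltype(u'y'), char16_t>);
static_assert(std::is_same_v<decltype(U'z'), char32_t>);

#if defined(__cpp_char8_t)
// Only checked where char8_t is available as a distinct type.
static_assert(std::is_same_v<decltype(u8'x'), char8_t>);
#endif

// With the UTF-16 and UTF-32 prefixes the value is the code point itself.
static_assert(u'y' == 0x79);
static_assert(U'z' == 0x7A);

// A code point outside the basic character set, written as a
// universal-character-name in a UTF-32 character literal.
constexpr char32_t fire = U'\U0001F525';
static_assert(fire == 0x1F525);
```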
character | simple-escape-sequence | ||
U+000a | line feed | \n | |
U+0009 | character tabulation | \t | |
U+000b | line tabulation | \v | |
U+0008 | backspace | \b | |
U+000d | carriage return | \r | |
U+000c | form feed | \f | |
U+0007 | alert | \a | |
U+005c | reverse solidus | \\ | |
U+003f | question mark | \? | |
U+0027 | apostrophe | \' | |
U+0022 | quotation mark | \" |
floating-point-suffix | type | |
none | double | |
f or F | float | |
l or L | long double |
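The suffix-to-type mapping above can be confirmed with `decltype`. This is a minimal sketch; the name `suffix_selects_float` is hypothetical.

```cpp
#include <type_traits>

// No suffix: double.
static_assert(std::is_same_v<decltype(1.0), double>);

// f or F: float.
static_assert(std::is_same_v<decltype(1.0f), float>);
static_assert(std::is_same_v<decltype(1.0F), float>);

// l or L: long double.
static_assert(std::is_same_v<decltype(1.0l), long double>);
static_assert(std::is_same_v<decltype(1.0L), long double>);

// The suffix, not the written value, decides the type.
constexpr bool suffix_selects_float = std::is_same_v<decltype(1e300 * 0.0f), double>
                                      && std::is_same_v<decltype(0.5f), float>;
static_assert(suffix_selects_float);
```

The last check also shows that mixing an unsuffixed literal with a `float` operand yields `double`, since the unsuffixed operand is already a `double`.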
Encoding prefix | Kind | Type | Associated character encoding | Examples |
none | ordinary string literal | array of n const char | ordinary literal encoding | "ordinary string" R"(ordinary raw string)" |
L | wide string literal | array of n const wchar_t | wide literal encoding | L"wide string" LR"w(wide raw string)w" |
u8 | UTF-8 string literal | array of n const char8_t | UTF-8 | u8"UTF-8 string" u8R"x(UTF-8 raw string)x" |
u | UTF-16 string literal | array of n const char16_t | UTF-16 | u"UTF-16 string" uR"y(UTF-16 raw string)y" |
U | UTF-32 string literal | array of n const char32_t | UTF-32 | U"UTF-32 string" UR"z(UTF-32 raw string)z" |
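The array types in the table above, and the difference between ordinary and raw string literals, can be sketched as follows. A string literal is an lvalue, so `decltype` yields a reference to the array type. The `char8_t` check is guarded by `__cpp_char8_t`, and the names `cooked_size`, `raw_size`, and `delimited_size` are hypothetical.

```cpp
#include <cstddef>
#include <type_traits>

// n counts the terminating null character, so "abc" has n == 4.
static_assert(std::is_same_v<decltype("abc"), const char (&)[4]>);
static_assert(std::is_same_v<decltype(L"abc"), const wchar_t (&)[4]>);
static_assert(std::is_same_v<decltype(u"abc"), const char16_t (&)[4]>);
static_assert(std::is_same_v<decltype(U"abc"), const char32_t (&)[4]>);

#if defined(__cpp_char8_t)
// Only checked where char8_t is available as a distinct type.
static_assert(std::is_same_v<decltype(u8"abc"), const char8_t (&)[4]>);
#endif

// In an ordinary string literal \n is one character; in a raw string literal
// the backslash and the n are kept as two separate characters.
constexpr std::size_t cooked_size = sizeof("a\nb");     // a, line feed, b, null
constexpr std::size_t raw_size = sizeof(R"(a\nb)");     // a, \, n, b, null
constexpr std::size_t delimited_size = sizeof(R"w(x)w"); // x, null

static_assert(cooked_size == 4);
static_assert(raw_size == 5);
static_assert(delimited_size == 2);
```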