5 Lexical conventions [lex]

5.1 Separate translation [lex.separate]

The text of the program is kept in units called source files in this International Standard.
A source file together with all the headers and source files included via the preprocessing directive #include, less any source lines skipped by any of the conditional inclusion preprocessing directives, is called a translation unit.
[Note
:
A C++ program need not all be translated at the same time.
end note
]
[Note
:
Previously translated translation units and instantiation units can be preserved individually or in libraries.
The separate translation units of a program communicate ([basic.link]) by (for example) calls to functions whose identifiers have external linkage, manipulation of objects whose identifiers have external linkage, or manipulation of data files.
Translation units can be separately translated and then later linked to produce an executable program.
end note
]

5.2 Phases of translation [lex.phases]

The precedence among the syntax rules of translation is specified by the following phases.12
  1. 1.
    Physical source file characters are mapped, in an implementation-defined manner, to the basic source character set (introducing new-line characters for end-of-line indicators) if necessary.
    The set of physical source file characters accepted is implementation-defined.
    Any source file character not in the basic source character set is replaced by the universal-character-name that designates that character.
    An implementation may use any internal encoding, so long as an actual extended character encountered in the source file, and the same extended character expressed in the source file as a universal-character-name (e.g., using the \uXXXX notation), are handled equivalently except where this replacement is reverted ([lex.pptoken]) in a raw string literal.
  2. 2.
    Each instance of a backslash character (\) immediately followed by a new-line character is deleted, splicing physical source lines to form logical source lines.
    Only the last backslash on any physical source line shall be eligible for being part of such a splice.
    Except for splices reverted in a raw string literal, if a splice results in a character sequence that matches the syntax of a universal-character-name, the behavior is undefined.
    A source file that is not empty and that does not end in a new-line character, or that ends in a new-line character immediately preceded by a backslash character before any such splicing takes place, shall be processed as if an additional new-line character were appended to the file.
  3. 3.
    The source file is decomposed into preprocessing tokens and sequences of white-space characters (including comments).
    A source file shall not end in a partial preprocessing token or in a partial comment.13
    Each comment is replaced by one space character.
    New-line characters are retained.
    Whether each nonempty sequence of white-space characters other than new-line is retained or replaced by one space character is unspecified.
    The process of dividing a source file's characters into preprocessing tokens is context-dependent.
    [Example
    :
    See the handling of < within a #include preprocessing directive.
    end example
    ]
  4. 4.
    Preprocessing directives are executed, macro invocations are expanded, and _­Pragma unary operator expressions are executed.
    If a character sequence that matches the syntax of a universal-character-name is produced by token concatenation, the behavior is undefined.
    A #include preprocessing directive causes the named header or source file to be processed from phase 1 through phase 4, recursively.
    All preprocessing directives are then deleted.
  5. 5.
    Each source character set member in a character literal or a string literal, as well as each escape sequence and universal-character-name in a character literal or a non-raw string literal, is converted to the corresponding member of the execution character set ([lex.ccon], [lex.string]); if there is no corresponding member, it is converted to an implementation-defined member other than the null (wide) character.14
  6. 6.
    Adjacent string literal tokens are concatenated.
  7. 7.
    White-space characters separating tokens are no longer significant.
    Each preprocessing token is converted into a token.
    The resulting tokens are syntactically and semantically analyzed and translated as a translation unit.
    [Note
    :
    The process of analyzing and translating the tokens may occasionally result in one token being replaced by a sequence of other tokens ([temp.names]).
    end note
    ]
    [Note
    :
    Source files, translation units and translated translation units need not necessarily be stored as files, nor need there be any one-to-one correspondence between these entities and any external representation.
    The description is conceptual only, and does not specify any particular implementation.
    end note
    ]
  8. 8.
    Translated translation units and instantiation units are combined as follows:
    [Note
    :
    Some or all of these may be supplied from a library.
    end note
    ]
    Each translated translation unit is examined to produce a list of required instantiations.
    [Note
    :
    This may include instantiations which have been explicitly requested.
    end note
    ]
    The definitions of the required templates are located.
    It is implementation-defined whether the source of the translation units containing these definitions is required to be available.
    [Note
    :
    An implementation could encode sufficient information into the translated translation unit so as to ensure the source is not required here.
    end note
    ]
    All the required instantiations are performed to produce instantiation units.
    [Note
    :
    These are similar to translated translation units, but contain no references to uninstantiated templates and no template definitions.
    end note
    ]
    The program is ill-formed if any instantiation fails.
  9. 9.
    All external entity references are resolved.
    Library components are linked to satisfy external references to entities not defined in the current translation.
    All such translator output is collected into a program image which contains information needed for execution in its execution environment.
Implementations must behave as if these separate phases occur, although in practice different phases might be folded together.
A partial preprocessing token would arise from a source file ending in the first portion of a multi-character token that requires a terminating sequence of characters, such as a header-name that is missing the closing " or >.
A partial comment would arise from a source file ending with an unclosed /* comment.
An implementation need not convert all non-corresponding source characters to the same execution character.

5.3 Character sets [lex.charset]

The basic source character set consists of 96 characters: the space character, the control characters representing horizontal tab, vertical tab, form feed, and new-line, plus the following 91 graphical characters:15
a b c d e f g h i j k l m n o p q r s t u v w x y z
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
0 1 2 3 4 5 6 7 8 9
_ { } [ ] # ( ) < > % : ; . ? * + - / ^ & | ~ ! = , \ " '
The universal-character-name construct provides a way to name other characters.
hex-quad:
	hexadecimal-digit hexadecimal-digit hexadecimal-digit hexadecimal-digit
universal-character-name:
	\u hex-quad
	\U hex-quad hex-quad
The character designated by the universal-character-name \UNNNNNNNN is that character whose character short name in ISO/IEC 10646 is NNNNNNNN; the character designated by the universal-character-name \uNNNN is that character whose character short name in ISO/IEC 10646 is 0000NNNN.
If the hexadecimal value for a universal-character-name corresponds to a surrogate code point (in the range 0xD800–0xDFFF, inclusive), the program is ill-formed.
Additionally, if the hexadecimal value for a universal-character-name outside the c-char-sequence, s-char-sequence, or r-char-sequence of a character or string literal corresponds to a control character (in either of the ranges 0x00–0x1F or 0x7F–0x9F, both inclusive) or to a character in the basic source character set, the program is ill-formed.16
The basic execution character set and the basic execution wide-character set shall each contain all the members of the basic source character set, plus control characters representing alert, backspace, and carriage return, plus a null character (respectively, null wide character), whose value is 0.
For each basic execution character set, the values of the members shall be non-negative and distinct from one another.
In both the source and execution basic character sets, the value of each character after 0 in the above list of decimal digits shall be one greater than the value of the previous.
The execution character set and the execution wide-character set are implementation-defined supersets of the basic execution character set and the basic execution wide-character set, respectively.
The values of the members of the execution character sets and the sets of additional members are locale-specific.
The glyphs for the members of the basic source character set are intended to identify characters from the subset of ISO/IEC 10646 which corresponds to the ASCII character set.
However, because the mapping from source file characters to the source character set (described in translation phase 1) is specified as implementation-defined, an implementation is required to document how the basic source characters are represented in source files.
A sequence of characters resembling a universal-character-name in an r-char-sequence does not form a universal-character-name.

5.4 Preprocessing tokens [lex.pptoken]

preprocessing-token:
	header-name
	identifier
	pp-number
	character-literal
	user-defined-character-literal
	string-literal
	user-defined-string-literal
	preprocessing-op-or-punc
	each non-white-space character that cannot be one of the above
Each preprocessing token that is converted to a token shall have the lexical form of a keyword, an identifier, a literal, an operator, or a punctuator.
A preprocessing token is the minimal lexical element of the language in translation phases 3 through 6.
The categories of preprocessing token are: header names, identifiers, preprocessing numbers, character literals (including user-defined character literals), string literals (including user-defined string literals), preprocessing operators and punctuators, and single non-white-space characters that do not lexically match the other preprocessing token categories.
If a ' or a " character matches the last category, the behavior is undefined.
Preprocessing tokens can be separated by white space; this consists of comments, or white-space characters (space, horizontal tab, new-line, vertical tab, and form-feed), or both.
As described in [cpp], in certain circumstances during translation phase 4, white space (or the absence thereof) serves as more than preprocessing token separation.
White space can appear within a preprocessing token only as part of a header name or between the quotation characters in a character literal or string literal.
If the input stream has been parsed into preprocessing tokens up to a given character:
  • If the next character begins a sequence of characters that could be the prefix and initial double quote of a raw string literal, such as R", the next preprocessing token shall be a raw string literal.
    Between the initial and final double quote characters of the raw string, any transformations performed in phases 1 and 2 (universal-character-names and line splicing) are reverted; this reversion shall apply before any d-char, r-char, or delimiting parenthesis is identified.
    The raw string literal is defined as the shortest sequence of characters that matches the raw-string pattern
    encoding-prefix R raw-string
    
  • Otherwise, if the next three characters are <​::​ and the subsequent character is neither : nor >, the < is treated as a preprocessing token by itself and not as the first character of the alternative token <:.
  • Otherwise, the next preprocessing token is the longest sequence of characters that could constitute a preprocessing token, even if that would cause further lexical analysis to fail, except that a header-name is only formed within a #include directive.
[Example
:
#define R "x"
const char* s = R"y";           // ill-formed raw string, not "x" "y"
end example
]
[Example
:
The program fragment 0xe+foo is parsed as a preprocessing number token (one that is not a valid floating or integer literal token), even though a parse as three preprocessing tokens 0xe, +, and foo might produce a valid expression (for example, if foo were a macro defined as 1).
Similarly, the program fragment 1E1 is parsed as a preprocessing number (one that is a valid floating literal token), whether or not E is a macro name.
end example
]
[Example
:
The program fragment x+++++y is parsed as x ++ ++ + y, which, if x and y have integral types, violates a constraint on increment operators, even though the parse x ++ + ++ y might yield a correct expression.
end example
]

5.5 Alternative tokens [lex.digraph]

Alternative token representations are provided for some operators and punctuators.17
In all respects of the language, each alternative token behaves the same, respectively, as its primary token, except for its spelling.18
The set of alternative tokens is defined in Table 1.
Table 1 — Alternative tokens
Alternative
Primary
Alternative
Primary
Alternative
Primary
<%
{
and
&&
and_­eq
&=
%>
}
bitor
|
or_­eq
|=
<:
[
or
||
xor_­eq
^=
:>
]
xor
^
not
!
%:
#
compl
~
not_­eq
!=
%:%:
##
bitand
&
These include “digraphs” and additional reserved words.
The term “digraph” (token consisting of two characters) is not perfectly descriptive, since one of the alternative preprocessing-tokens is %:%: and of course several primary tokens contain two characters.
Nonetheless, those alternative tokens that aren't lexical keywords are colloquially known as “digraphs”.
Thus the “stringized” values of [ and <: will be different, maintaining the source spelling, but the tokens can otherwise be freely interchanged.

5.6 Tokens [lex.token]

token:
	identifier
	keyword
	literal
	operator
	punctuator
There are five kinds of tokens: identifiers, keywords, literals,19 operators, and other separators.
Blanks, horizontal and vertical tabs, newlines, formfeeds, and comments (collectively, “white space”), as described below, are ignored except as they serve to separate tokens.
[Note
:
Some white space is required to separate otherwise adjacent identifiers, keywords, numeric literals, and alternative tokens containing alphabetic characters.
end note
]
Literals include strings and character and numeric literals.

5.7 Comments [lex.comment]

The characters /* start a comment, which terminates with the characters */.
These comments do not nest.
The characters // start a comment, which terminates immediately before the next new-line character.
If there is a form-feed or a vertical-tab character in such a comment, only white-space characters shall appear between it and the new-line that terminates the comment; no diagnostic is required.
[Note
:
The comment characters //, /*, and */ have no special meaning within a // comment and are treated just like other characters.
Similarly, the comment characters // and /* have no special meaning within a /* comment.
end note
]

5.9 Preprocessing numbers [lex.ppnumber]

pp-number:
	digit
	. digit
	pp-number digit
	pp-number identifier-nondigit
	pp-number ' digit
	pp-number ' nondigit
	pp-number e sign
	pp-number E sign
	pp-number p sign
	pp-number P sign
	pp-number .
Preprocessing number tokens lexically include all integer literal tokens and all floating literal tokens.
A preprocessing number does not have a type or a value; it acquires both after a successful conversion to an integer literal token or a floating literal token.

5.10 Identifiers [lex.name]

identifier:
	identifier-nondigit
	identifier identifier-nondigit
	identifier digit
identifier-nondigit:
	nondigit
	universal-character-name
nondigit: one of
	a b c d e f g h i j k l m
	n o p q r s t u v w x y z
	A B C D E F G H I J K L M
	N O P Q R S T U V W X Y Z _
digit: one of
	0 1 2 3 4 5 6 7 8 9
An identifier is an arbitrarily long sequence of letters and digits.
Each universal-character-name in an identifier shall designate a character whose encoding in ISO 10646 falls into one of the ranges specified in Table 2.
The initial element shall not be a universal-character-name designating a character whose encoding falls into one of the ranges specified in Table 3.
Upper- and lower-case letters are different.
All characters are significant.21
Table 2 — Ranges of characters allowed
00A8
00AA
00AD
00AF
00B2-00B5
00B7-00BA
00BC-00BE
00C0-00D6
00D8-00F6
00F8-00FF
0100-167F
1681-180D
180F-1FFF
200B-200D
202A-202E
203F-2040
2054
2060-206F
2070-218F
2460-24FF
2776-2793
2C00-2DFF
2E80-2FFF
3004-3007
3021-302F
3031-D7FF
F900-FD3D
FD40-FDCF
FDF0-FE44
FE47-FFFD
10000-1FFFD
20000-2FFFD
30000-3FFFD
40000-4FFFD
50000-5FFFD
60000-6FFFD
70000-7FFFD
80000-8FFFD
90000-9FFFD
A0000-AFFFD
B0000-BFFFD
C0000-CFFFD
D0000-DFFFD
E0000-EFFFD
Table 3 — Ranges of characters disallowed initially (combining characters)
0300-036F
1DC0-1DFF
20D0-20FF
FE20-FE2F
The identifiers in Table 4 have a special meaning when appearing in a certain context.
When referred to in the grammar, these identifiers are used explicitly rather than using the identifier grammar production.
Unless otherwise specified, any ambiguity as to whether a given identifier has a special meaning is resolved to interpret the token as a regular identifier.
Table 4 — Identifiers with special meaning
override
final
In addition, some identifiers are reserved for use by C++ implementations and shall not be used otherwise; no diagnostic is required.
  • Each identifier that contains a double underscore __ or begins with an underscore followed by an uppercase letter is reserved to the implementation for any use.
  • Each identifier that begins with an underscore is reserved to the implementation for use as a name in the global namespace.
On systems in which linkers cannot accept extended characters, an encoding of the universal-character-name may be used in forming valid external identifiers.
For example, some otherwise unused character or sequence of characters may be used to encode the \u in a universal-character-name.
Extended characters may produce a long external identifier, but C++ does not place a translation limit on significant characters for external identifiers.
In C++, upper- and lower-case letters are considered different for all identifiers, including external identifiers.

5.11 Keywords [lex.key]

The identifiers shown in Table 5 are reserved for use as keywords (that is, they are unconditionally treated as keywords in phase 7) except in an attribute-token:
Table 5 — Keywords
alignas
const_­cast
for
public
thread_­local
alignof
continue
friend
register
throw
asm
decltype
goto
reinterpret_­cast
true
auto
default
if
requires
try
bool
delete
inline
return
typedef
break
do
int
short
typeid
case
double
long
signed
typename
catch
dynamic_­cast
mutable
sizeof
union
char
else
namespace
static
unsigned
char16_­t
enum
new
static_­assert
using
char32_­t
explicit
noexcept
static_­cast
virtual
class
export
nullptr
struct
void
concept
extern
operator
switch
volatile
const
false
private
template
wchar_­t
constexpr
float
protected
this
while
[Note
:
The export and register keywords are unused but are reserved for future use.
end note
]
Furthermore, the alternative representations shown in Table 6 for certain operators and punctuators ([lex.digraph]) are reserved and shall not be used otherwise:
Table 6 — Alternative representations
and
and_­eq
bitand
bitor
compl
not
not_­eq
or
or_­eq
xor
xor_­eq

5.12 Operators and punctuators [lex.operators]

The lexical representation of C++ programs includes a number of preprocessing tokens which are used in the syntax of the preprocessor or are converted into tokens for operators and punctuators:
preprocessing-op-or-punc: one of
	{ 	} 	[ 	] 	# 	## 	( 	)
	<: 	:> 	<% 	%> 	%: 	%:%: 	; 	: 	...
	new 	delete 	? 	:: 	. 	.*
	+ 	- 	* 	/ 	% 	^	& 	| 	~
	! 	= 	< 	> 	+= 	-= 	*= 	/= 	%=
	^= 	&= 	|= 	<< 	>> 	>>= 	<<= 	== 	!=
	<= 	>= 	&& 	|| 	++ 	-- 	, 	->* 	->
	and 	and_eq 	bitand 	bitor 	compl 	not 	not_eq
	or 	or_eq 	xor 	xor_eq
Each preprocessing-op-or-punc is converted to a single token in translation phase 7.

5.13 Literals [lex.literal]

5.13.1 Kinds of literals [lex.literal.kinds]

The term “literal” generally designates, in this International Standard, those tokens that are called “constants” in ISO C.

5.13.2 Integer literals [lex.icon]

integer-literal:
	binary-literal integer-suffix
	octal-literal integer-suffix
	decimal-literal integer-suffix
	hexadecimal-literal integer-suffix
binary-literal:
	0b binary-digit
	0B binary-digit
	binary-literal ' binary-digit
octal-literal:
	0
	octal-literal ' octal-digit
decimal-literal:
	nonzero-digit
	decimal-literal ' digit
hexadecimal-literal:
	hexadecimal-prefix hexadecimal-digit-sequence
binary-digit:
	0
	1
octal-digit: one of
	0  1  2  3  4  5  6  7
nonzero-digit: one of
	1  2  3  4  5  6  7  8  9
hexadecimal-prefix: one of
	0x  0X
hexadecimal-digit-sequence:
	hexadecimal-digit
	hexadecimal-digit-sequence ' hexadecimal-digit
hexadecimal-digit: one of
	0  1  2  3  4  5  6  7  8  9
	a  b  c  d  e  f
	A  B  C  D  E  F
integer-suffix:
	unsigned-suffix long-suffix 
	unsigned-suffix long-long-suffix 
	long-suffix unsigned-suffix 
	long-long-suffix unsigned-suffix
unsigned-suffix: one of
	u  U
long-suffix: one of
	l  L
long-long-suffix: one of
	ll  LL
An integer literal is a sequence of digits that has no period or exponent part, with optional separating single quotes that are ignored when determining its value.
An integer literal may have a prefix that specifies its base and a suffix that specifies its type.
The lexically first digit of the sequence of digits is the most significant.
A binary integer literal (base two) begins with 0b or 0B and consists of a sequence of binary digits.
An octal integer literal (base eight) begins with the digit 0 and consists of a sequence of octal digits.23
A decimal integer literal (base ten) begins with a digit other than 0 and consists of a sequence of decimal digits.
A hexadecimal integer literal (base sixteen) begins with 0x or 0X and consists of a sequence of hexadecimal digits, which include the decimal digits and the letters a through f and A through F with decimal values ten through fifteen.
[Example
:
The number twelve can be written 12, 014, 0XC, or 0b1100.
The integer literals 1048576, 1'048'576, 0X100000, 0x10'0000, and 0'004'000'000 all have the same value.
end example
]
The type of an integer literal is the first of the corresponding list in Table 7 in which its value can be represented.
Table 7 — Types of integer literals
Suffix
Decimal literal
Binary, octal, or hexadecimal literal
none
int
int
long int
unsigned int
long long int
long int
unsigned long int
long long int
unsigned long long int
u or U
unsigned int
unsigned int
unsigned long int
unsigned long int
unsigned long long int
unsigned long long int
l or L
long int
long int
long long int
unsigned long int
long long int
unsigned long long int
Both u or U
unsigned long int
unsigned long int
and l or L
unsigned long long int
unsigned long long int
ll or LL
long long int
long long int
unsigned long long int
Both u or U
unsigned long long int
unsigned long long int
and ll or LL
If an integer literal cannot be represented by any type in its list and an extended integer type can represent its value, it may have that extended integer type.
If all of the types in the list for the integer literal are signed, the extended integer type shall be signed.
If all of the types in the list for the integer literal are unsigned, the extended integer type shall be unsigned.
If the list contains both signed and unsigned types, the extended integer type may be signed or unsigned.
A program is ill-formed if one of its translation units contains an integer literal that cannot be represented by any of the allowed types.
The digits 8 and 9 are not octal digits.

5.13.3 Character literals [lex.ccon]

character-literal:
	encoding-prefix ' c-char-sequence '
encoding-prefix: one of
	u8  u  U  L
c-char-sequence:
	c-char
	c-char-sequence c-char
c-char:
	any member of the source character set except
		the single-quote ', backslash \, or new-line character
	escape-sequence
	universal-character-name
escape-sequence:
	simple-escape-sequence
	octal-escape-sequence
	hexadecimal-escape-sequence
simple-escape-sequence: one of
	\'  \"  \?  \\
	\a  \b  \f  \n  \r  \t  \v
octal-escape-sequence:
	\ octal-digit
	\ octal-digit octal-digit
	\ octal-digit octal-digit octal-digit
hexadecimal-escape-sequence:
	\x hexadecimal-digit
	hexadecimal-escape-sequence hexadecimal-digit
A character literal is one or more characters enclosed in single quotes, as in 'x', optionally preceded by u8, u, U, or L, as in u8'w', u'x', U'y', or L'z', respectively.
A character literal that does not begin with u8, u, U, or L is an ordinary character literal.
An ordinary character literal that contains a single c-char representable in the execution character set has type char, with value equal to the numerical value of the encoding of the c-char in the execution character set.
An ordinary character literal that contains more than one c-char is a multicharacter literal.
A multicharacter literal, or an ordinary character literal containing a single c-char not representable in the execution character set, is conditionally-supported, has type int, and has an implementation-defined value.
A character literal that begins with u8, such as u8'w', is a character literal of type char, known as a UTF-8 character literal.
The value of a UTF-8 character literal is equal to its ISO 10646 code point value, provided that the code point value is representable with a single UTF-8 code unit (that is, provided it is in the C0 Controls and Basic Latin Unicode block).
If the value is not representable with a single UTF-8 code unit, the program is ill-formed.
A UTF-8 character literal containing multiple c-chars is ill-formed.
A character literal that begins with the letter u, such as u'x', is a character literal of type char16_­t.
The value of a char16_­t character literal containing a single c-char is equal to its ISO 10646 code point value, provided that the code point is representable with a single 16-bit code unit.
(That is, provided it is a basic multi-lingual plane code point.)
If the value is not representable within 16 bits, the program is ill-formed.
A char16_­t character literal containing multiple c-chars is ill-formed.
A character literal that begins with the letter U, such as U'y', is a character literal of type char32_­t.
The value of a char32_­t character literal containing a single c-char is equal to its ISO 10646 code point value.
A char32_­t character literal containing multiple c-chars is ill-formed.
A character literal that begins with the letter L, such as L'z', is a wide-character literal.
A wide-character literal has type wchar_­t.24
The value of a wide-character literal containing a single c-char has value equal to the numerical value of the encoding of the c-char in the execution wide-character set, unless the c-char has no representation in the execution wide-character set, in which case the value is implementation-defined.
[Note
:
The type wchar_­t is able to represent all members of the execution wide-character set (see [basic.fundamental]).
end note
]
The value of a wide-character literal containing multiple c-chars is implementation-defined.
Certain non-graphic characters, the single quote ', the double quote ", the question mark ?,25 and the backslash \, can be represented according to Table 8.
The double quote " and the question mark ?, can be represented as themselves or by the escape sequences \" and \? respectively, but the single quote ' and the backslash \ shall be represented by the escape sequences \' and \\ respectively.
Escape sequences in which the character following the backslash is not listed in Table 8 are conditionally-supported, with implementation-defined semantics.
An escape sequence specifies a single character.
Table 8 — Escape sequences
new-line
NL(LF)
\n
horizontal tab
HT
\t
vertical tab
VT
\v
backspace
BS
\b
carriage return
CR
\r
form feed
FF
\f
alert
BEL
\a
backslash
\
\\
question mark
?
\?
single quote
'
\'
double quote
"
\"
octal number
ooo
\ooo
hex number
hhh
\xhhh
The escape \ooo consists of the backslash followed by one, two, or three octal digits that are taken to specify the value of the desired character.
The escape \xhhh consists of the backslash followed by x followed by one or more hexadecimal digits that are taken to specify the value of the desired character.
There is no limit to the number of digits in a hexadecimal sequence.
A sequence of octal or hexadecimal digits is terminated by the first character that is not an octal digit or a hexadecimal digit, respectively.
The value of a character literal is implementation-defined if it falls outside of the implementation-defined range defined for char (for character literals with no prefix) or wchar_­t (for character literals prefixed by L).
[Note
:
If the value of a character literal prefixed by u, u8, or U is outside the range defined for its type, the program is ill-formed.
end note
]
A universal-character-name is translated to the encoding, in the appropriate execution character set, of the character named.
If there is no such encoding, the universal-character-name is translated to an implementation-defined encoding.
[Note
:
In translation phase 1, a universal-character-name is introduced whenever an actual extended character is encountered in the source text.
Therefore, all extended characters are described in terms of universal-character-names.
However, the actual compiler implementation may use its own native character set, so long as the same results are obtained.
end note
]
They are intended for character sets where a character does not fit into a single byte.
Using an escape sequence for a question mark is supported for compatibility with ISO C++ 2014 and ISO C.

5.13.4 Floating literals [lex.fcon]

floating-literal:
	decimal-floating-literal
	hexadecimal-floating-literal
decimal-floating-literal:
	fractional-constant exponent-part floating-suffix
	digit-sequence exponent-part floating-suffix
hexadecimal-floating-literal:
	hexadecimal-prefix hexadecimal-fractional-constant binary-exponent-part floating-suffix
	hexadecimal-prefix hexadecimal-digit-sequence binary-exponent-part floating-suffix
fractional-constant:
	digit-sequence . digit-sequence
	digit-sequence .
hexadecimal-fractional-constant:
	hexadecimal-digit-sequence . hexadecimal-digit-sequence
	hexadecimal-digit-sequence .
exponent-part:
	e sign digit-sequence
	E sign digit-sequence
binary-exponent-part:
	p sign digit-sequence
	P sign digit-sequence
sign: one of
	+  -
digit-sequence:
	digit
	digit-sequence ' digit
floating-suffix: one of
	f  l  F  L
A floating literal consists of an optional prefix specifying a base, an integer part, a radix point, a fraction part, an e, E, p or P, an optionally signed integer exponent, and an optional type suffix.
The integer and fraction parts both consist of a sequence of decimal (base ten) digits if there is no prefix, or hexadecimal (base sixteen) digits if the prefix is 0x or 0X.
The floating literal is a decimal floating literal in the former case and a hexadecimal floating literal in the latter case.
Optional separating single quotes in a digit-sequence or hexadecimal-digit-sequence are ignored when determining its value.
[Example
:
The floating literals 1.602'176'565e-19 and 1.602176565e-19 have the same value.
end example
]
Either the integer part or the fraction part (not both) can be omitted.
Either the radix point or the letter e or E and the exponent (not both) can be omitted from a decimal floating literal.
The radix point (but not the exponent) can be omitted from a hexadecimal floating literal.
The integer part, the optional radix point, and the optional fraction part, form the significand of the floating literal.
In a decimal floating literal, the exponent, if present, indicates the power of 10 by which the significand is to be scaled.
In a hexadecimal floating literal, the exponent indicates the power of 2 by which the significand is to be scaled.
[Example
:
The floating literals 49.625 and 0xC.68p+2 have the same value.
end example
]
If the scaled value is in the range of representable values for its type, the result is the scaled value if representable, else the larger or smaller representable value nearest the scaled value, chosen in an implementation-defined manner.
The type of a floating literal is double unless explicitly specified by a suffix.
The suffixes f and F specify float, the suffixes l and L specify long double.
If the scaled value is not in the range of representable values for its type, the program is ill-formed.

5.13.5 String literals [lex.string]

string-literal:
	encoding-prefix " s-char-sequence "
	encoding-prefix R raw-string
s-char-sequence:
	s-char
	s-char-sequence s-char
s-char:
	any member of the source character set except
		the double-quote ", backslash \, or new-line character
	escape-sequence
	universal-character-name
raw-string:
	" d-char-sequence ( r-char-sequence ) d-char-sequence "
r-char-sequence:
	r-char
	r-char-sequence r-char
r-char:
	any member of the source character set, except
		a right parenthesis ) followed by the initial d-char-sequence
		(which may be empty) followed by a double quote ".
d-char-sequence:
	d-char
	d-char-sequence d-char
d-char:
	any member of the basic source character set except:
		space, the left parenthesis (, the right parenthesis ), the backslash \,
		and the control characters representing horizontal tab,
		vertical tab, form feed, and newline.
A string-literal is a sequence of characters (as defined in [lex.ccon]) surrounded by double quotes, optionally prefixed by R, u8, u8R, u, uR, U, UR, L, or LR, as in "...", R"(...)", u8"...", u8R"**(...)**", u"...", uR"*~(...)*~", U"...", UR"zzz(...)zzz", L"...", or LR"(...)", respectively.
A string-literal that has an R in the prefix is a raw string literal.
The d-char-sequence serves as a delimiter.
The terminating d-char-sequence of a raw-string is the same sequence of characters as the initial d-char-sequence.
A d-char-sequence shall consist of at most 16 characters.
[Note
:
The characters '(' and ')' are permitted in a raw-string.
Thus, R"delimiter((a|b))delimiter" is equivalent to "(a|b)".
end note
]
[Note
:
A source-file new-line in a raw string literal results in a new-line in the resulting execution string literal.
Assuming no whitespace at the beginning of lines in the following example, the assert will succeed:
const char* p = R"(a\
b
c)";
assert(std::strcmp(p, "a\\\nb\nc") == 0);
end note
]
[Example
:
The raw string
R"a(
)\
a"
)a"
is equivalent to "\n)\\\na\"\n".
The raw string
R"(x = "\"y\"")"
is equivalent to "x = \"\\\"y\\\"\"".
end example
]
After translation phase 6, a string-literal that does not begin with an encoding-prefix is an ordinary string literal, and is initialized with the given characters.
A string-literal that begins with u8, such as u8"asdf", is a UTF-8 string literal.
Ordinary string literals and UTF-8 string literals are also referred to as narrow string literals.
A narrow string literal has type “array of n const char”, where n is the size of the string as defined below, and has static storage duration.
For a UTF-8 string literal, each successive element of the object representation has the value of the corresponding code unit of the UTF-8 encoding of the string.
A string-literal that begins with u, such as u"asdf", is a char16_­t string literal.
A char16_­t string literal has type “array of n const char16_­t”, where n is the size of the string as defined below; it is initialized with the given characters.
A single c-char may produce more than one char16_­t character in the form of surrogate pairs.
A string-literal that begins with U, such as U"asdf", is a char32_­t string literal.
A char32_­t string literal has type “array of n const char32_­t”, where n is the size of the string as defined below; it is initialized with the given characters.
A string-literal that begins with L, such as L"asdf", is a wide string literal.
A wide string literal has type “array of n const wchar_­t”, where n is the size of the string as defined below; it is initialized with the given characters.
In translation phase 6, adjacent string-literals are concatenated.
If both string-literals have the same encoding-prefix, the resulting concatenated string literal has that encoding-prefix.
If one string-literal has no encoding-prefix, it is treated as a string-literal of the same encoding-prefix as the other operand.
If a UTF-8 string literal token is adjacent to a wide string literal token, the program is ill-formed.
Any other concatenations are conditionally-supported with implementation-defined behavior.
[Note
:
This concatenation is an interpretation, not a conversion.
Because the interpretation happens in translation phase 6 (after each character from a string literal has been translated into a value from the appropriate character set), a string-literal's initial rawness has no effect on the interpretation or well-formedness of the concatenation.
end note
]
Table 9 has some examples of valid concatenations.
Table 9 — String literal concatenations
Source
Means
Source
Means
Source
Means
u"a"
u"b"
u"ab"
U"a"
U"b"
U"ab"
L"a"
L"b"
L"ab"
u"a"
"b"
u"ab"
U"a"
"b"
U"ab"
L"a"
"b"
L"ab"
"a"
u"b"
u"ab"
"a"
U"b"
U"ab"
"a"
L"b"
L"ab"
Characters in concatenated strings are kept distinct.
[Example
:
"\xA" "B"
contains the two characters '\xA' and 'B' after concatenation (and not the single hexadecimal character '\xAB').
end example
]
After any necessary concatenation, in translation phase 7, '\0' is appended to every string literal so that programs that scan a string can find its end.
Escape sequences and universal-character-names in non-raw string literals have the same meaning as in character literals, except that the single quote ' is representable either by itself or by the escape sequence \', and the double quote " shall be preceded by a \, and except that a universal-character-name in a char16_­t string literal may yield a surrogate pair.
In a narrow string literal, a universal-character-name may map to more than one char element due to multibyte encoding.
The size of a char32_­t or wide string literal is the total number of escape sequences, universal-character-names, and other characters, plus one for the terminating U'\0' or L'\0'.
The size of a char16_­t string literal is the total number of escape sequences, universal-character-names, and other characters, plus one for each character requiring a surrogate pair, plus one for the terminating u'\0'.
[Note
:
The size of a char16_­t string literal is the number of code units, not the number of characters.
end note
]
Within char32_­t and char16_­t string literals, any universal-character-names shall be within the range 0x0 to 0x10FFFF.
The size of a narrow string literal is the total number of escape sequences and other characters, plus at least one for the multibyte encoding of each universal-character-name, plus one for the terminating '\0'.
Evaluating a string-literal results in a string literal object with static storage duration, initialized from the given characters as specified above.
Whether all string literals are distinct (that is, are stored in nonoverlapping objects) and whether successive evaluations of a string-literal yield the same or a different object is unspecified.
[Note
:
The effect of attempting to modify a string literal is undefined.
end note
]

5.13.6 Boolean literals [lex.bool]

boolean-literal:
	false
	true
The Boolean literals are the keywords false and true.
Such literals are prvalues and have type bool.

5.13.7 Pointer literals [lex.nullptr]

pointer-literal:
	nullptr
The pointer literal is the keyword nullptr.
It is a prvalue of type std​::​nullptr_­t.
[Note
:
std​::​nullptr_­t is a distinct type that is neither a pointer type nor a pointer to member type; rather, a prvalue of this type is a null pointer constant and can be converted to a null pointer value or null member pointer value.
end note
]

5.13.8 User-defined literals [lex.ext]

user-defined-literal:
	user-defined-integer-literal
	user-defined-floating-literal
	user-defined-string-literal
	user-defined-character-literal
user-defined-integer-literal:
	decimal-literal ud-suffix
	octal-literal ud-suffix
	hexadecimal-literal ud-suffix
	binary-literal ud-suffix
user-defined-floating-literal:
	fractional-constant exponent-part ud-suffix
	digit-sequence exponent-part ud-suffix
	hexadecimal-prefix hexadecimal-fractional-constant binary-exponent-part ud-suffix
	hexadecimal-prefix hexadecimal-digit-sequence binary-exponent-part ud-suffix
user-defined-string-literal:
	string-literal ud-suffix
user-defined-character-literal:
	character-literal ud-suffix
ud-suffix:
	identifier
If a token matches both user-defined-literal and another literal kind, it is treated as the latter.
[Example
:
123_­km is a user-defined-literal, but 12LL is an integer-literal.
end example
]
The syntactic non-terminal preceding the ud-suffix in a user-defined-literal is taken to be the longest sequence of characters that could match that non-terminal.
To determine the form of this call for a given user-defined-literal L with ud-suffix X, the literal-operator-id whose literal suffix identifier is X is looked up in the context of L using the rules for unqualified name lookup.
Let S be the set of declarations found by this lookup.
S shall not be empty.
If L is a user-defined-integer-literal, let n be the literal without its ud-suffix.
If S contains a literal operator with parameter type unsigned long long, the literal L is treated as a call of the form
operator "" X(nULL)
Otherwise, S shall contain a raw literal operator or a literal operator template but not both.
If S contains a raw literal operator, the literal L is treated as a call of the form
operator "" X("n")
Otherwise (S contains a literal operator template), L is treated as a call of the form
operator "" X<'', '', ... ''>()
where n is the source character sequence .
[Note
:
The sequence can only contain characters from the basic source character set.
end note
]
If L is a user-defined-floating-literal, let f be the literal without its ud-suffix.
If S contains a literal operator with parameter type long double, the literal L is treated as a call of the form
operator "" X(fL)
Otherwise, S shall contain a raw literal operator or a literal operator template but not both.
If S contains a raw literal operator, the literal L is treated as a call of the form
operator "" X("f")
Otherwise (S contains a literal operator template), L is treated as a call of the form
operator "" X<'', '', ... ''>()
where f is the source character sequence .
[Note
:
The sequence can only contain characters from the basic source character set.
end note
]
If L is a user-defined-string-literal, let str be the literal without its ud-suffix and let len be the number of code units in str (i.e., its length excluding the terminating null character). The literal L is treated as a call of the form
operator "" X(str, len)
If L is a user-defined-character-literal, let ch be the literal without its ud-suffix.
S shall contain a literal operator whose only parameter has the type of ch and the literal L is treated as a call of the form
operator "" X(ch)
[Example
:
long double operator "" _w(long double);
std::string operator "" _w(const char16_t*, std::size_t);
unsigned operator "" _w(const char*);
int main() {
  1.2_w;      // calls operator "" _­w(1.2L)
  u"one"_w;   // calls operator "" _­w(u"one", 3)
  12_w;       // calls operator "" _­w("12")
  "two"_w;    // error: no applicable literal operator
}
end example
]
In translation phase 6, adjacent string literals are concatenated and user-defined-string-literals are considered string literals for that purpose.
During concatenation, ud-suffixes are removed and ignored and the concatenation process occurs as described in [lex.string].
At the end of phase 6, if a string literal is the result of a concatenation involving at least one user-defined-string-literal, all the participating user-defined-string-literals shall have the same ud-suffix and that suffix is applied to the result of the concatenation.
[Example
:
int main() {
  L"A" "B" "C"_x; // OK: same as L"ABC"_­x
  "P"_x "Q" "R"_y;// error: two different ud-suffixes
}
end example
]