5 Lexical conventions [lex]

5.3 Character sets [lex.charset]

The basic source character set consists of 96 characters: the space character, the control characters representing horizontal tab, vertical tab, form feed, and new-line, plus the following 91 graphical characters:9
a b c d e f g h i j k l m n o p q r s t u v w x y z
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
0 1 2 3 4 5 6 7 8 9
_ { } [ ] # ( ) < > % : ; . ? * + - / ^ & | ~ ! = , \ " '
The universal-character-name construct provides a way to name other characters.
hex-quad:
	hexadecimal-digit hexadecimal-digit hexadecimal-digit hexadecimal-digit
universal-character-name:
	\u hex-quad
	\U hex-quad hex-quad
The character designated by the universal-character-name \ U00NNNNNN is that character that has U+NNNNNN as a code point short identifier; the character designated by the universal-character-name \uNNNN is that character that has U+NNNN as a code point short identifier.
If a universal-character-name does not correspond to a code point in ISO/IEC 10646 or if a universal-character-name corresponds to a surrogate code point, the program is ill-formed.
Additionally, if a universal-character-name outside the c-char-sequence, s-char-sequence, or r-char-sequence of a character or string literal corresponds to a control character or to a character in the basic source character set, the program is ill-formed.10
[Note
:
ISO/IEC 10646 code points are within the range 0x0-0x10FFFF (inclusive).
A surrogate code point is a value in the range 0xD800-0xDFFF (inclusive).
A control character is a character whose code point is in either of the ranges 0x0-0x1F or 0x7F-0x9F (both inclusive).
end note
]
The basic execution character set and the basic execution wide-character set shall each contain all the members of the basic source character set, plus control characters representing alert, backspace, and carriage return, plus a null character (respectively, null wide character), whose value is 0.
For each basic execution character set, the values of the members shall be non-negative and distinct from one another.
In both the source and execution basic character sets, the value of each character after 0 in the above list of decimal digits shall be one greater than the value of the previous.
The execution character set and the execution wide-character set are implementation-defined supersets of the basic execution character set and the basic execution wide-character set, respectively.
The values of the members of the execution character sets and the sets of additional members are locale-specific.
The glyphs for the members of the basic source character set are intended to identify characters from the subset of ISO/IEC 10646 which corresponds to the ASCII character set.
However, because the mapping from source file characters to the source character set (described in translation phase 1) is specified as implementation-defined, an implementation is required to document how the basic source characters are represented in source files.
A sequence of characters resembling a universal-character-name in an r-char-sequence ([lex.string]) does not form a universal-character-name.