Annex E (informative) Conformance with UAX #31 [uaxid]

E.1 General [uaxid.general]

This Annex describes the choices made in applying UAX #31 (“Unicode Identifier and Pattern Syntax”) to C++, in terms of the requirements from UAX #31 and whether each applies to this document.

This document conforms to UAX #31 by meeting the requirements R1 “Default Identifiers” and R4 “Equivalent Normalized Identifiers”.

The other requirements from UAX #31, also listed below, are either alternatives not taken or do not apply to this document.

E.2 R1 Default identifiers [uaxid.def]

E.2.1 General [uaxid.def.general]

UAX #31 specifies a default syntax for identifiers based on properties from the Unicode Character Database, UAX #44.

The general syntax is

<Identifier> := <Start> <Continue>* (<Medial> <Continue>+)*

where <Start> has the XID_Start property, <Continue> has the XID_Continue property, and <Medial> is a list of characters permitted between continue characters.

For C++ we add the character U+005F LOW LINE (_) to the set of permitted <Start> characters; the <Medial> set is empty, and the <Continue> characters are unmodified.

In the grammar used in UAX #31, this is

<Identifier> := <Start> <Continue>*
<Start> := XID_Start + U+005F
<Continue> := <Start> + XID_Continue

This is described in the C++ grammar in [lex.name], where an identifier is formed either from an identifier-start alone or from an identifier followed by an identifier-continue.
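The grammar above can be sketched as a small checker. This is only an illustration, assuming C++17: the predicates is_xid_start and is_xid_continue are hypothetical stand-ins that cover ASCII only, whereas a real implementation would consult the XID_Start and XID_Continue properties from the Unicode Character Database. The names is_start, is_continue, and is_identifier are likewise chosen for this sketch and do not appear in this document.

```cpp
#include <string_view>

// Hypothetical ASCII-only stand-in for the XID_Start property.
constexpr bool is_xid_start(char32_t c) {
    return (c >= U'a' && c <= U'z') || (c >= U'A' && c <= U'Z');
}

// Hypothetical ASCII-only stand-in for the XID_Continue property.
constexpr bool is_xid_continue(char32_t c) {
    return is_xid_start(c) || (c >= U'0' && c <= U'9') || c == U'_';
}

// <Start> := XID_Start + U+005F
constexpr bool is_start(char32_t c) {
    return is_xid_start(c) || c == U'_';
}

// <Continue> := <Start> + XID_Continue
constexpr bool is_continue(char32_t c) {
    return is_start(c) || is_xid_continue(c);
}

// <Identifier> := <Start> <Continue>*
constexpr bool is_identifier(std::u32string_view s) {
    if (s.empty() || !is_start(s.front())) {
        return false;
    }
    for (char32_t c : s.substr(1)) {
        if (!is_continue(c)) {
            return false;
        }
    }
    return true;
}

static_assert(is_identifier(U"_tmp1"));   // U+005F is a permitted <Start>
static_assert(!is_identifier(U"1tmp"));   // a digit is <Continue> only
```

Because the <Medial> set is empty, the checker needs no special handling for characters that are allowed only between continue characters.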

E.2.2 R1b Stable identifiers [uaxid.def.stable]

An implementation of UAX #31 may choose to guarantee that identifiers are stable across versions of the Unicode Standard.

Under such a guarantee, once a string qualifies as an identifier, it does so in all future versions.

C++ does not make this guarantee, except to the extent that UAX #31 guarantees the stability of the XID_Start and XID_Continue properties.

E.3 R2 Immutable identifiers [uaxid.immutable]

An implementation may choose to guarantee that the set of identifiers will never change by fixing the set of code points allowed in identifiers forever.

C++ chooses not to make this guarantee.

As scripts are added to Unicode, additional characters in those scripts may become available for use in identifiers.

E.4 R3 Pattern_White_Space and Pattern_Syntax characters [uaxid.pattern]

UAX #31 describes how formal languages such as computer languages should describe and implement their use of whitespace and syntactically significant characters during the processes of lexing and parsing.

This document does not claim conformance with this requirement from UAX #31.

E.5 R4 Equivalent normalized identifiers [uaxid.eqn]

UAX #31 requires that implementations describe how identifiers are compared and considered equivalent.

This document requires that identifiers be in Normalization Form C (NFC); therefore, identifiers that compare the same under NFC are equivalent.

This is described in [lex.name].
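As an illustration of why a normalization form is fixed, the following sketch (assuming C++17) compares two code point sequences that render alike. It performs no normalization itself; the variable names are chosen for this example only. The precomposed U+00E9 is already in NFC, while the sequence U+0065 U+0301 is not, since NFC composes it to U+00E9.

```cpp
#include <string_view>

// U+00E9 LATIN SMALL LETTER E WITH ACUTE: a single code point, in NFC.
constexpr std::u32string_view composed = U"\u00E9";

// U+0065 followed by U+0301 COMBINING ACUTE ACCENT: not in NFC.
constexpr std::u32string_view decomposed = U"e\u0301";

// The two spellings differ as code point sequences.
static_assert(composed != decomposed);
static_assert(composed.size() == 1);
static_assert(decomposed.size() == 2);
```

Requiring NFC means only the first spelling can appear in an identifier, so identifier comparison can proceed code point by code point.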

E.6 R5 Equivalent case-insensitive identifiers [uaxid.eqci]

This document considers case to be significant in identifier comparison, and does not do any case folding.

This requirement from UAX #31 does not apply to this document.
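A minimal sketch of the consequence, assuming C++14 or later: names that differ only in case denote different entities. The function name distinct_names is chosen for this example only.

```cpp
// Three distinct identifiers; no case folding is applied when comparing them.
constexpr int distinct_names() {
    int value = 1;
    int Value = 2;
    int VALUE = 4;
    return value + Value + VALUE;
}

static_assert(distinct_names() == 7, "each spelling names its own variable");
```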

E.7 R6 Filtered normalized identifiers [uaxid.filter]

If any characters are excluded from normalization, UAX #31 requires a precise specification of those exclusions.

This document does not make any such exclusions.

E.8 R7 Filtered case-insensitive identifiers [uaxid.filterci]

C++ identifiers are case sensitive, and therefore this requirement from UAX #31 does not apply.

E.9 R8 Hashtag identifiers [uaxid.hashtag]

There are no hashtags in C++, so this requirement from UAX #31 does not apply.