6 Basics [basic]

6.9 Program execution [basic.exec]

6.9.1 Sequential execution [intro.execution]

An instance of each object with automatic storage duration is associated with each entry into its block.
Such an object exists and retains its last-stored value during the execution of the block and while the block is suspended (by a call of a function, suspension of a coroutine ([expr.await]), or receipt of a signal).
A constituent expression is defined as follows:
[Example 1: struct A { int x; }; struct B { int y; struct A a; }; B b = { 5, { 1+1 } };
The constituent expressions of the initializer used for the initialization of b are 5 and 1+1.
— end example]
The immediate subexpressions of an expression E are
A subexpression of an expression E is an immediate subexpression of E or a subexpression of an immediate subexpression of E.
[Note 1: 
Expressions appearing in the compound-statement of a lambda-expression are not subexpressions of the lambda-expression.
— end note]
The potentially-evaluated subexpressions of an expression, conversion, or initializer E are
  • the constituent expressions of E and
  • the subexpressions thereof that are not subexpressions of a nested unevaluated operand ([expr.context]).
A full-expression is
If a language construct is defined to produce an implicit call of a function, a use of the language construct is considered to be an expression for the purposes of this definition.
Conversions applied to the result of an expression in order to satisfy the requirements of the language construct in which the expression appears are also considered to be part of the full-expression.
For an initializer, performing the initialization of the entity (including evaluating default member initializers of an aggregate) is also considered part of the full-expression.
[Example 2: struct S { S(int i): I(i) { } // full-expression is initialization of I int& v() { return I; } ~S() noexcept(false) { } private: int I; }; S s1(1); // full-expression comprises call of S​::​S(int) void f() { S s2 = 2; // full-expression comprises call of S​::​S(int) if (S(3).v()) // full-expression includes lvalue-to-rvalue and int to bool conversions, // performed before temporary is deleted at end of full-expression { } bool b = noexcept(S(4)); // exception specification of destructor of S considered for noexcept // full-expression is destruction of s2 at end of block } struct B { B(S = S(0)); }; B b[2] = { B(), B() }; // full-expression is the entire initialization // including the destruction of temporaries — end example]
[Note 2: 
The evaluation of a full-expression can include the evaluation of subexpressions that are not lexically part of the full-expression.
For example, subexpressions involved in evaluating default arguments ([dcl.fct.default]) are considered to be created in the expression that calls the function, not the expression that defines the default argument.
— end note]
Reading an object designated by a volatile glvalue ([basic.lval]), modifying an object, calling a library I/O function, or calling a function that does any of those operations are all side effects, which are changes in the state of the execution environment.
Evaluation of an expression (or a subexpression) in general includes both value computations (including determining the identity of an object for glvalue evaluation and fetching a value previously assigned to an object for prvalue evaluation) and initiation of side effects.
When a call to a library I/O function returns or an access through a volatile glvalue is evaluated, the side effect is considered complete, even though some external actions implied by the call (such as the I/O itself) or by the volatile access may not have completed yet.
Sequenced before is an asymmetric, transitive, pair-wise relation between evaluations executed by a single thread ([intro.multithread]), which induces a partial order among those evaluations.
Given any two evaluations A and B, if A is sequenced before B (or, equivalently, B is sequenced after A), then the execution of A shall precede the execution of B.
If A is not sequenced before B and B is not sequenced before A, then A and B are unsequenced.
[Note 3: 
The execution of unsequenced evaluations can overlap.
— end note]
Evaluations A and B are indeterminately sequenced when either A is sequenced before B or B is sequenced before A, but it is unspecified which.
[Note 4: 
Indeterminately sequenced evaluations cannot overlap, but either can be executed first.
— end note]
An expression X is said to be sequenced before an expression Y if every value computation and every side effect associated with the expression X is sequenced before every value computation and every side effect associated with the expression Y.
Every value computation and side effect associated with a full-expression is sequenced before every value computation and side effect associated with the next full-expression to be evaluated.36
Except where noted, evaluations of operands of individual operators and of subexpressions of individual expressions are unsequenced.
[Note 5: 
In an expression that is evaluated more than once during the execution of a program, unsequenced and indeterminately sequenced evaluations of its subexpressions need not be performed consistently in different evaluations.
— end note]
The value computations of the operands of an operator are sequenced before the value computation of the result of the operator.
If a side effect on a memory location ([intro.memory]) is unsequenced relative to either another side effect on the same memory location or a value computation using the value of any object in the same memory location, and they are not potentially concurrent ([intro.multithread]), the behavior is undefined.
[Note 6: 
The next subclause imposes similar, but more complex restrictions on potentially concurrent computations.
— end note]
[Example 3: void g(int i) { i = 7, i++, i++; // i becomes 9 i = i++ + 1; // the value of i is incremented i = i++ + i; // undefined behavior i = i + 1; // the value of i is incremented } — end example]
When invoking a function (whether or not the function is inline), every argument expression and the postfix expression designating the called function are sequenced before every expression or statement in the body of the called function.
For each function invocation or evaluation of an await-expression F, each evaluation that does not occur within F but is evaluated on the same thread and as part of the same signal handler (if any) is either sequenced before all evaluations that occur within F or sequenced after all evaluations that occur within F;37 if F invokes or resumes a coroutine ([expr.await]), only evaluations subsequent to the previous suspension (if any) and prior to the next suspension (if any) are considered to occur within F.
Several contexts in C++ cause evaluation of a function call, even though no corresponding function call syntax appears in the translation unit.
[Example 4: 
Evaluation of a new-expression invokes one or more allocation and constructor functions; see [expr.new].
For another example, invocation of a conversion function ([class.conv.fct]) can arise in contexts in which no function call syntax appears.
— end example]
The sequencing constraints on the execution of the called function (as described above) are features of the function calls as evaluated, regardless of the syntax of the expression that calls the function.
If a signal handler is executed as a result of a call to the std​::​raise function, then the execution of the handler is sequenced after the invocation of the std​::​raise function and before its return.
[Note 7: 
When a signal is received for another reason, the execution of the signal handler is usually unsequenced with respect to the rest of the program.
— end note]
36)36)
As specified in [class.temporary], after a full-expression is evaluated, a sequence of zero or more invocations of destructor functions for temporary objects takes place, usually in reverse order of the construction of each temporary object.
37)37)
In other words, function executions do not interleave with each other.

6.9.2 Multi-threaded executions and data races [intro.multithread]

6.9.2.1 General [intro.multithread.general]

A thread of execution (also known as a thread) is a single flow of control within a program, including the initial invocation of a specific top-level function, and recursively including every function invocation subsequently executed by the thread.
[Note 1: 
When one thread creates another, the initial call to the top-level function of the new thread is executed by the new thread, not by the creating thread.
— end note]
Every thread in a program can potentially access every object and function in a program.38
Under a hosted implementation, a C++ program can have more than one thread running concurrently.
The execution of each thread proceeds as defined by the remainder of this document.
The execution of the entire program consists of an execution of all of its threads.
[Note 2: 
Usually the execution can be viewed as an interleaving of all its threads.
However, some kinds of atomic operations, for example, allow executions inconsistent with a simple interleaving, as described below.
— end note]
Under a freestanding implementation, it is implementation-defined whether a program can have more than one thread of execution.
For a signal handler that is not executed as a result of a call to the std​::​raise function, it is unspecified which thread of execution contains the signal handler invocation.
38)38)
An object with automatic or thread storage duration ([basic.stc]) is associated with one specific thread, and can be accessed by a different thread only indirectly through a pointer or reference ([basic.compound]).

6.9.2.2 Data races [intro.races]

The value of an object visible to a thread T at a particular point is the initial value of the object, a value assigned to the object by T, or a value assigned to the object by another thread, according to the rules below.
[Note 1: 
In some cases, there might instead be undefined behavior.
Much of this subclause is motivated by the desire to support atomic operations with explicit and detailed visibility constraints.
However, it also implicitly supports a simpler view for more restricted programs.
— end note]
Two expression evaluations conflict if one of them modifies ([defns.access]) a memory location ([intro.memory]) and the other one reads or modifies the same memory location.
[Note 2: 
A modification can still conflict even if it does not alter the value of any bits.
— end note]
The library defines a number of atomic operations ([atomics]) and operations on mutexes ([thread]) that are specially identified as synchronization operations.
These operations play a special role in making assignments in one thread visible to another.
A synchronization operation on one or more memory locations is either a consume operation, an acquire operation, a release operation, or both an acquire and release operation.
A synchronization operation without an associated memory location is a fence and can be either an acquire fence, a release fence, or both an acquire and release fence.
In addition, there are relaxed atomic operations, which are not synchronization operations, and atomic read-modify-write operations, which have special characteristics.
[Note 3: 
For example, a call that acquires a mutex will perform an acquire operation on the locations comprising the mutex.
Correspondingly, a call that releases the same mutex will perform a release operation on those same locations.
Informally, performing a release operation on A forces prior side effects on other memory locations to become visible to other threads that later perform a consume or an acquire operation on A.
“Relaxed” atomic operations are not synchronization operations even though, like synchronization operations, they cannot contribute to data races.
— end note]
All modifications to a particular atomic object M occur in some particular total order, called the modification order of M.
[Note 4: 
There is a separate order for each atomic object.
There is no requirement that these can be combined into a single total order for all objects.
In general this will be impossible since different threads can observe modifications to different objects in inconsistent orders.
— end note]
A release sequence headed by a release operation A on an atomic object M is a maximal contiguous sub-sequence of side effects in the modification order of M, where the first operation is A, and every subsequent operation is an atomic read-modify-write operation.
Certain library calls synchronize with other library calls performed by another thread.
For example, an atomic store-release synchronizes with a load-acquire that takes its value from the store ([atomics.order]).
[Note 5: 
Except in the specified cases, reading a later value does not necessarily ensure visibility as described below.
Such a requirement would sometimes interfere with efficient implementation.
— end note]
[Note 6: 
The specifications of the synchronization operations define when one reads the value written by another.
For atomic objects, the definition is clear.
All operations on a given mutex occur in a single total order.
Each mutex acquisition “reads the value written” by the last mutex release.
— end note]
An evaluation A carries a dependency to an evaluation B if
  • the value of A is used as an operand of B, unless: or
  • A writes a scalar object or bit-field M, B reads the value written by A from M, and A is sequenced before B, or
  • for some evaluation X, A carries a dependency to X, and X carries a dependency to B.
[Note 7: 
“Carries a dependency to” is a subset of “is sequenced before”, and is similarly strictly intra-thread.
— end note]
An evaluation A is dependency-ordered before an evaluation B if
  • A performs a release operation on an atomic object M, and, in another thread, B performs a consume operation on M and reads the value written by A, or
  • for some evaluation X, A is dependency-ordered before X and X carries a dependency to B.
[Note 8: 
The relation “is dependency-ordered before” is analogous to “synchronizes with”, but uses release/consume in place of release/acquire.
— end note]
An evaluation A inter-thread happens before an evaluation B if
  • A synchronizes with B, or
  • A is dependency-ordered before B, or
  • for some evaluation X
    • A synchronizes with X and X is sequenced before B, or
    • A is sequenced before X and X inter-thread happens before B, or
    • A inter-thread happens before X and X inter-thread happens before B.
[Note 9: 
The “inter-thread happens before” relation describes arbitrary concatenations of “sequenced before”, “synchronizes with” and “dependency-ordered before” relationships, with two exceptions.
The first exception is that a concatenation never ends with “dependency-ordered before” followed by “sequenced before”.
The reason for this limitation is that a consume operation participating in a “dependency-ordered before” relationship provides ordering only with respect to operations to which this consume operation actually carries a dependency.
The reason that this limitation applies only to the end of such a concatenation is that any subsequent release operation will provide the required ordering for a prior consume operation.
The second exception is that a concatenation never consist entirely of “sequenced before”.
The reasons for this limitation are (1) to permit “inter-thread happens before” to be transitively closed and (2) the “happens before” relation, defined below, provides for relationships consisting entirely of “sequenced before”.
— end note]
An evaluation A happens before an evaluation B (or, equivalently, B happens after A) if
  • A is sequenced before B, or
  • A inter-thread happens before B.
The implementation shall ensure that no program execution demonstrates a cycle in the “happens before” relation.
[Note 10: 
This cycle would otherwise be possible only through the use of consume operations.
— end note]
An evaluation A simply happens before an evaluation B if either
  • A is sequenced before B, or
  • A synchronizes with B, or
  • A simply happens before X and X simply happens before B.
[Note 11: 
In the absence of consume operations, the happens before and simply happens before relations are identical.
— end note]
An evaluation A strongly happens before an evaluation D if, either
  • A is sequenced before D, or
  • A synchronizes with D, and both A and D are sequentially consistent atomic operations ([atomics.order]), or
  • there are evaluations B and C such that A is sequenced before B, B simply happens before C, and C is sequenced before D, or
  • there is an evaluation B such that A strongly happens before B, and B strongly happens before D.
[Note 12: 
Informally, if A strongly happens before B, then A appears to be evaluated before B in all contexts.
Strongly happens before excludes consume operations.
— end note]
A visible side effect A on a scalar object or bit-field M with respect to a value computation B of M satisfies the conditions:
  • A happens before B and
  • there is no other side effect X to M such that A happens before X and X happens before B.
The value of a non-atomic scalar object or bit-field M, as determined by evaluation B, is the value stored by the visible side effect A.
[Note 13: 
If there is ambiguity about which side effect to a non-atomic object or bit-field is visible, then the behavior is either unspecified or undefined.
— end note]
[Note 14: 
This states that operations on ordinary objects are not visibly reordered.
This is not actually detectable without data races, but is needed to ensure that data races, as defined below, and with suitable restrictions on the use of atomics, correspond to data races in a simple interleaved (sequentially consistent) execution.
— end note]
The value of an atomic object M, as determined by evaluation B, is the value stored by some unspecified side effect A that modifies M, where B does not happen before A.
[Note 15: 
The set of such side effects is also restricted by the rest of the rules described here, and in particular, by the coherence requirements below.
— end note]
If an operation A that modifies an atomic object M happens before an operation B that modifies M, then A is earlier than B in the modification order of M.
[Note 16: 
This requirement is known as write-write coherence.
— end note]
If a value computation A of an atomic object M happens before a value computation B of M, and A takes its value from a side effect X on M, then the value computed by B is either the value stored by X or the value stored by a side effect Y on M, where Y follows X in the modification order of M.
[Note 17: 
This requirement is known as read-read coherence.
— end note]
If a value computation A of an atomic object M happens before an operation B that modifies M, then A takes its value from a side effect X on M, where X precedes B in the modification order of M.
[Note 18: 
This requirement is known as read-write coherence.
— end note]
If a side effect X on an atomic object M happens before a value computation B of M, then the evaluation B takes its value from X or from a side effect Y that follows X in the modification order of M.
[Note 19: 
This requirement is known as write-read coherence.
— end note]
[Note 20: 
The four preceding coherence requirements effectively disallow compiler reordering of atomic operations to a single object, even if both operations are relaxed loads.
This effectively makes the cache coherence guarantee provided by most hardware available to C++ atomic operations.
— end note]
[Note 21: 
The value observed by a load of an atomic depends on the “happens before” relation, which depends on the values observed by loads of atomics.
The intended reading is that there must exist an association of atomic loads with modifications they observe that, together with suitably chosen modification orders and the “happens before” relation derived as described above, satisfy the resulting constraints as imposed here.
— end note]
Two actions are potentially concurrent if
  • they are performed by different threads, or
  • they are unsequenced, at least one is performed by a signal handler, and they are not both performed by the same signal handler invocation.
The execution of a program contains a data race if it contains two potentially concurrent conflicting actions, at least one of which is not atomic, and neither happens before the other, except for the special case for signal handlers described below.
Any such data race results in undefined behavior.
[Note 22: 
It can be shown that programs that correctly use mutexes and memory_order​::​seq_cst operations to prevent all data races and use no other synchronization operations behave as if the operations executed by their constituent threads were simply interleaved, with each value computation of an object being taken from the last side effect on that object in that interleaving.
This is normally referred to as “sequential consistency”.
However, this applies only to data-race-free programs, and data-race-free programs cannot observe most program transformations that do not change single-threaded program semantics.
In fact, most single-threaded program transformations remain possible, since any program that behaves differently as a result has undefined behavior.
— end note]
Two accesses to the same object of type volatile std​::​sig_atomic_t do not result in a data race if both occur in the same thread, even if one or more occurs in a signal handler.
For each signal handler invocation, evaluations performed by the thread invoking a signal handler can be divided into two groups A and B, such that no evaluations in B happen before evaluations in A, and the evaluations of such volatile std​::​sig_atomic_t objects take values as though all evaluations in A happened before the execution of the signal handler and the execution of the signal handler happened before all evaluations in B.
[Note 23: 
Compiler transformations that introduce assignments to a potentially shared memory location that would not be modified by the abstract machine are generally precluded by this document, since such an assignment might overwrite another assignment by a different thread in cases in which an abstract machine execution would not have encountered a data race.
This includes implementations of data member assignment that overwrite adjacent members in separate memory locations.
Reordering of atomic loads in cases in which the atomics in question might alias is also generally precluded, since this could violate the coherence rules.
— end note]
[Note 24: 
It is possible that transformations that introduce a speculative read of a potentially shared memory location do not preserve the semantics of the C++ program as defined in this document, since they potentially introduce a data race.
However, they are typically valid in the context of an optimizing compiler that targets a specific machine with well-defined semantics for data races.
They would be invalid for a hypothetical machine that is not tolerant of races or provides hardware race detection.
— end note]

6.9.2.3 Forward progress [intro.progress]

The implementation may assume that any thread will eventually do one of the following:
[Note 1: 
This is intended to allow compiler transformations such as removal, merging, and reordering of empty loops, even when termination cannot be proven.
An affordance is made for trivial infinite loops, which cannot be removed nor reordered.
— end note]
Executions of atomic functions that are either defined to be lock-free ([atomics.flag]) or indicated as lock-free ([atomics.lockfree]) are lock-free executions.
  • If there is only one thread that is not blocked ([defns.block]) in a standard library function, a lock-free execution in that thread shall complete.
    [Note 2: 
    Concurrently executing threads might prevent progress of a lock-free execution.
    For example, this situation can occur with load-locked store-conditional implementations.
    This property is sometimes termed obstruction-free.
    — end note]
  • When one or more lock-free executions run concurrently, at least one should complete.
    [Note 3: 
    It is difficult for some implementations to provide absolute guarantees to this effect, since repeated and particularly inopportune interference from other threads could prevent forward progress, e.g., by repeatedly stealing a cache line for unrelated purposes between load-locked and store-conditional instructions.
    For implementations that follow this recommendation and ensure that such effects cannot indefinitely delay progress under expected operating conditions, such anomalies can therefore safely be ignored by programmers.
    Outside this document, this property is sometimes termed lock-free.
    — end note]
During the execution of a thread of execution, each of the following is termed an execution step:
  • termination of the thread of execution,
  • performing an access through a volatile glvalue, or
  • completion of a call to a library I/O function, a synchronization operation, or an atomic operation.
An invocation of a standard library function that blocks ([defns.block]) is considered to continuously execute execution steps while waiting for the condition that it blocks on to be satisfied.
[Example 1: 
A library I/O function that blocks until the I/O operation is complete can be considered to continuously check whether the operation is complete.
Each such check consists of one or more execution steps, for example using observable behavior of the abstract machine.
— end example]
[Note 4: 
Because of this and the preceding requirement regarding what threads of execution have to perform eventually, it follows that no thread of execution can execute forever without an execution step occurring.
— end note]
A thread of execution makes progress when an execution step occurs or a lock-free execution does not complete because there are other concurrent threads that are not blocked in a standard library function (see above).
For a thread of execution providing concurrent forward progress guarantees, the implementation ensures that the thread will eventually make progress for as long as it has not terminated.
[Note 5: 
This applies regardless of whether or not other threads of execution (if any) have been or are making progress.
To eventually fulfill this requirement means that this will happen in an unspecified but finite amount of time.
— end note]
It is implementation-defined whether the implementation-created thread of execution that executes main ([basic.start.main]) and the threads of execution created by std​::​thread ([thread.thread.class]) or std​::​jthread ([thread.jthread.class]) provide concurrent forward progress guarantees.
General-purpose implementations should provide these guarantees.
For a thread of execution providing parallel forward progress guarantees, the implementation is not required to ensure that the thread will eventually make progress if it has not yet executed any execution step; once this thread has executed a step, it provides concurrent forward progress guarantees.
[Note 6: 
This does not specify a requirement for when to start this thread of execution, which will typically be specified by the entity that creates this thread of execution.
For example, a thread of execution that provides concurrent forward progress guarantees and executes tasks from a set of tasks in an arbitrary order, one after the other, satisfies the requirements of parallel forward progress for these tasks.
— end note]
For a thread of execution providing weakly parallel forward progress guarantees, the implementation does not ensure that the thread will eventually make progress.
[Note 7: 
Threads of execution providing weakly parallel forward progress guarantees cannot be expected to make progress regardless of whether other threads make progress or not; however, blocking with forward progress guarantee delegation, as defined below, can be used to ensure that such threads of execution make progress eventually.
— end note]
Concurrent forward progress guarantees are stronger than parallel forward progress guarantees, which in turn are stronger than weakly parallel forward progress guarantees.
[Note 8: 
For example, some kinds of synchronization between threads of execution might only make progress if the respective threads of execution provide parallel forward progress guarantees, but will fail to make progress under weakly parallel guarantees.
— end note]
When a thread of execution P is specified to block with forward progress guarantee delegation on the completion of a set S of threads of execution, then throughout the whole time of P being blocked on S, the implementation shall ensure that the forward progress guarantees provided by at least one thread of execution in S is at least as strong as P's forward progress guarantees.
[Note 9: 
It is unspecified which thread or threads of execution in S are chosen and for which number of execution steps.
The strengthening is not permanent and not necessarily in place for the rest of the lifetime of the affected thread of execution.
As long as P is blocked, the implementation has to eventually select and potentially strengthen a thread of execution in S.
— end note]
Once a thread of execution in S terminates, it is removed from S.
Once S is empty, P is unblocked.
[Note 10: 
A thread of execution B thus can temporarily provide an effectively stronger forward progress guarantee for a certain amount of time, due to a second thread of execution A being blocked on it with forward progress guarantee delegation.
In turn, if B then blocks with forward progress guarantee delegation on C, this can also temporarily provide a stronger forward progress guarantee to C.
— end note]
[Note 11: 
If all threads of execution in S finish executing (e.g., they terminate and do not use blocking synchronization incorrectly), then P's execution of the operation that blocks with forward progress guarantee delegation will not result in P's progress guarantee being effectively weakened.
— end note]
[Note 12: 
This does not remove any constraints regarding blocking synchronization for threads of execution providing parallel or weakly parallel forward progress guarantees because the implementation is not required to strengthen a particular thread of execution whose too-weak progress guarantee is preventing overall progress.
— end note]
An implementation should ensure that the last value (in modification order) assigned by an atomic or synchronization operation will become visible to all other threads in a finite period of time.

6.9.3 Start and termination [basic.start]

6.9.3.1 main function [basic.start.main]

A program shall contain exactly one function called main that belongs to the global scope.
Executing a program starts a main thread of execution ([intro.multithread], [thread.threads]) in which the main function is invoked.
It is implementation-defined whether a program in a freestanding environment is required to define a main function.
[Note 1: 
In a freestanding environment, startup and termination is implementation-defined; startup contains the execution of constructors for non-local objects with static storage duration; termination contains the execution of destructors for objects with static storage duration.
— end note]
An implementation shall not predefine the main function.
Its type shall have C++ language linkage and it shall have a declared return type of type int, but otherwise its type is implementation-defined.
An implementation shall allow both
  • a function of () returning int and
  • a function of (int, pointer to pointer to char) returning int
as the type of main ([dcl.fct]).
In the latter form, for purposes of exposition, the first function parameter is called argc and the second function parameter is called argv, where argc shall be the number of arguments passed to the program from the environment in which the program is run.
If argc is nonzero these arguments shall be supplied in argv[0] through argv[argc-1] as pointers to the initial characters of null-terminated multibyte strings (ntmbss) ([multibyte.strings]) and argv[0] shall be the pointer to the initial character of an ntmbs that represents the name used to invoke the program or "".
The value of argc shall be non-negative.
The value of argv[argc] shall be 0.
Recommended practice: Any further (optional) parameters should be added after argv.
The function main shall not be named by an expression.
The linkage ([basic.link]) of main is implementation-defined.
A program that defines main as deleted or that declares main to be inline, static, constexpr, or consteval is ill-formed.
The function main shall not be a coroutine ([dcl.fct.def.coroutine]).
The main function shall not be declared with a linkage-specification ([dcl.link]).
A program that declares
  • a variable main that belongs to the global scope, or
  • a function main that belongs to the global scope and is attached to a named module, or
  • a function template main that belongs to the global scope, or
  • an entity named main with C language linkage (in any namespace)
is ill-formed.
The name main is not otherwise reserved.
[Example 1: 
Member functions, classes, and enumerations can be called main, as can entities in other namespaces.
— end example]
Terminating the program without leaving the current block (e.g., by calling the function std​::​exit(int) ([support.start.term])) does not destroy any objects with automatic storage duration ([class.dtor]).
If std​::​exit is invoked during the destruction of an object with static or thread storage duration, the program has undefined behavior.
A return statement ([stmt.return]) in main has the effect of leaving the main function (destroying any objects with automatic storage duration) and calling std​::​exit with the return value as the argument.
If control flows off the end of the compound-statement of main, the effect is equivalent to a return with operand 0 (see also [except.handle]).

6.9.3.2 Static initialization [basic.start.static]

Variables with static storage duration are initialized as a consequence of program initiation.
Variables with thread storage duration are initialized as a consequence of thread execution.
Within each of these phases of initiation, initialization occurs as follows.
Constant initialization is performed if a variable or temporary object with static or thread storage duration is constant-initialized ([expr.const]).
If constant initialization is not performed, a variable with static storage duration ([basic.stc.static]) or thread storage duration ([basic.stc.thread]) is zero-initialized ([dcl.init]).
Together, zero-initialization and constant initialization are called static initialization; all other initialization is dynamic initialization.
All static initialization strongly happens before ([intro.races]) any dynamic initialization.
[Note 1: 
The dynamic initialization of non-block variables is described in [basic.start.dynamic]; that of static block variables is described in [stmt.dcl].
— end note]
An implementation is permitted to perform the initialization of a variable with static or thread storage duration as a static initialization even if such initialization is not required to be done statically, provided that
  • the dynamic version of the initialization does not change the value of any other object of static or thread storage duration prior to its initialization, and
  • the static version of the initialization produces the same value in the initialized variable as would be produced by the dynamic initialization if all variables not required to be initialized statically were initialized dynamically.
[Note 2: 
As a consequence, if the initialization of an object obj1 refers to an object obj2 potentially requiring dynamic initialization and defined later in the same translation unit, it is unspecified whether the value of obj2 used will be the value of the fully initialized obj2 (because obj2 was statically initialized) or will be the value of obj2 merely zero-initialized.
For example, inline double fd() { return 1.0; } extern double d1; double d2 = d1; // unspecified: // either statically initialized to 0.0 or // dynamically initialized to 0.0 if d1 is // dynamically initialized, or 1.0 otherwise double d1 = fd(); // either initialized statically or dynamically to 1.0
— end note]

6.9.3.3 Dynamic initialization of non-block variables [basic.start.dynamic]

Dynamic initialization of a non-block variable with static storage duration is unordered if the variable is an implicitly or explicitly instantiated specialization, is partially-ordered if the variable is an inline variable that is not an implicitly or explicitly instantiated specialization, and otherwise is ordered.
[Note 1: 
A non-inline explicit specialization of a templated variable has ordered initialization.
— end note]
A declaration D is appearance-ordered before a declaration E if
  • D appears in the same translation unit as E, or
  • the translation unit containing E has an interface dependency on the translation unit containing D,
in either case prior to E.
Dynamic initialization of non-block variables V and W with static storage duration are ordered as follows:
  • If V and W have ordered initialization and the definition of V is appearance-ordered before the definition of W, or if V has partially-ordered initialization, W does not have unordered initialization, and for every definition E of W there exists a definition D of V such that D is appearance-ordered before E, then
    • if the program does not start a thread ([intro.multithread]) other than the main thread ([basic.start.main]) or V and W have ordered initialization and they are defined in the same translation unit, the initialization of V is sequenced before the initialization of W;
    • otherwise, the initialization of V strongly happens before the initialization of W.
  • Otherwise, if the program starts a thread other than the main thread before either V or W is initialized, it is unspecified in which threads the initializations of V and W occur; the initializations are unsequenced if they occur in the same thread.
  • Otherwise, the initializations of V and W are indeterminately sequenced.
[Note 2: 
This definition permits initialization of a sequence of ordered variables concurrently with another sequence.
— end note]
A non-initialization odr-use is an odr-use ([basic.def.odr]) not caused directly or indirectly by the initialization of a non-block static or thread storage duration variable.
It is implementation-defined whether the dynamic initialization of a non-block non-inline variable with static storage duration is sequenced before the first statement of main or is deferred.
If it is deferred, it strongly happens before any non-initialization odr-use of any non-inline function or non-inline variable defined in the same translation unit as the variable to be initialized.39
It is implementation-defined in which threads and at which points in the program such deferred dynamic initialization occurs.
Recommended practice: An implementation should choose such points in a way that allows the programmer to avoid deadlocks.
[Example 1: // - File 1 - #include "a.h" #include "b.h" B b; A::A() { b.Use(); } // - File 2 - #include "a.h" A a; // - File 3 - #include "a.h" #include "b.h" extern A a; extern B b; int main() { a.Use(); b.Use(); }
It is implementation-defined whether either a or b is initialized before main is entered or whether the initializations are delayed until a is first odr-used in main.
In particular, if a is initialized before main is entered, it is not guaranteed that b will be initialized before it is odr-used by the initialization of a, that is, before A​::​A is called.
If, however, a is initialized at some point after the first statement of main, b will be initialized prior to its use in A​::​A.
— end example]
It is implementation-defined whether the dynamic initialization of a non-block inline variable with static storage duration is sequenced before the first statement of main or is deferred.
If it is deferred, it strongly happens before any non-initialization odr-use of that variable.
It is implementation-defined in which threads and at which points in the program such deferred dynamic initialization occurs.
It is implementation-defined whether the dynamic initialization of a non-block non-inline variable with thread storage duration is sequenced before the first statement of the initial function of a thread or is deferred.
If it is deferred, the initialization associated with the entity for thread t is sequenced before the first non-initialization odr-use by t of any non-inline variable with thread storage duration defined in the same translation unit as the variable to be initialized.
It is implementation-defined in which threads and at which points in the program such deferred dynamic initialization occurs.
If the initialization of a non-block variable with static or thread storage duration exits via an exception, the function std​::​terminate is called ([except.terminate]).
39)39)
A non-block variable with static storage duration having initialization with side effects is initialized in this case, even if it is not itself odr-used ([basic.def.odr], [basic.stc.static]).

6.9.3.4 Termination [basic.start.term]

Constructed objects ([dcl.init]) with static storage duration are destroyed and functions registered with std​::​atexit are called as part of a call to std​::​exit ([support.start.term]).
The call to std​::​exit is sequenced before the destructions and the registered functions.
[Note 1: 
Returning from main invokes std​::​exit ([basic.start.main]).
— end note]
Constructed objects with thread storage duration within a given thread are destroyed as a result of returning from the initial function of that thread and as a result of that thread calling std​::​exit.
The destruction of all constructed objects with thread storage duration within that thread strongly happens before destroying any object with static storage duration.
If the completion of the constructor or dynamic initialization of an object with static storage duration strongly happens before that of another, the completion of the destructor of the second is sequenced before the initiation of the destructor of the first.
If the completion of the constructor or dynamic initialization of an object with thread storage duration is sequenced before that of another, the completion of the destructor of the second is sequenced before the initiation of the destructor of the first.
If an object is initialized statically, the object is destroyed in the same order as if the object was dynamically initialized.
For an object of array or class type, all subobjects of that object are destroyed before any block variable with static storage duration initialized during the construction of the subobjects is destroyed.
If the destruction of an object with static or thread storage duration exits via an exception, the function std​::​terminate is called ([except.terminate]).
If a function contains a block variable of static or thread storage duration that has been destroyed and the function is called during the destruction of an object with static or thread storage duration, the program has undefined behavior if the flow of control passes through the definition of the previously destroyed block variable.
[Note 2: 
Likewise, the behavior is undefined if the block variable is used indirectly (e.g., through a pointer) after its destruction.
— end note]
If the completion of the initialization of an object with static storage duration strongly happens before a call to std​::​atexit (see <cstdlib>, [support.start.term]), the call to the function passed to std​::​atexit is sequenced before the call to the destructor for the object.
If a call to std​::​atexit strongly happens before the completion of the initialization of an object with static storage duration, the call to the destructor for the object is sequenced before the call to the function passed to std​::​atexit.
If a call to std​::​atexit strongly happens before another call to std​::​atexit, the call to the function passed to the second std​::​atexit call is sequenced before the call to the function passed to the first std​::​atexit call.
If there is a use of a standard library object or function not permitted within signal handlers ([support.runtime]) that does not happen before ([intro.multithread]) completion of destruction of objects with static storage duration and execution of std​::​atexit registered functions ([support.start.term]), the program has undefined behavior.
[Note 3: 
If there is a use of an object with static storage duration that does not happen before the object's destruction, the program has undefined behavior.
Terminating every thread before a call to std​::​exit or the exit from main is sufficient, but not necessary, to satisfy these requirements.
These requirements permit thread managers as static-storage-duration objects.
— end note]
Calling the function std​::​abort() declared in <cstdlib> terminates the program without executing any destructors and without calling the functions passed to std​::​atexit() or std​::​at_quick_exit().