# Modular Transactions: Bounding Mixed Races in Space and Time

Brijesh Dongol University of Surrey, UK b.dongol@surrey.ac.uk Radha Jagadeesan DePaul University, USA rjagadeesan@cs.depaul.edu James Riely DePaul University, USA jriely@cs.depaul.edu

# Abstract

We define *local transactional race freedom* (LTRF), which provides a *programmer model* for *software transactional memory*. LTRF programs satisfy the *SC-LTRF* property, thus allowing the programmer to focus on sequential executions in which transactions execute atomically. Unlike previous results, *SC-LTRF* does not require global race freedom. We also provide a lower-level *implementation model* to reason about *quiescence fences* and validate numerous compiler optimizations.

CCS Concepts • Theory of computation  $\rightarrow$  Parallel computing models; Abstraction;

## 1 Introduction

For concurrent programs communicating via a shared-memory subsystem that includes locks, the *SC-DRF* property states that a *Data Race Free* program can be fully understood by considering only executions that are *Sequentially Consistent*, meaning that the shared-memory subsystem can be modeled as a standard sequential store [3].

For programs that use transactions to augment or replace locking, the analogous SC-TRF property states that for *Transactionally Race-Free* programs, it suffices to consider executions that are SC and where transactions are executed atomically. For TRF programs, SC-TRF implies *opacity* [15, 16], which generalizes SC-DRF to include aborted and live transactions. SC-TRF is a conditional form of *operational refinement*: for TRF programs, "every behavior a user can observe of a program using a TM implementation can also be observed when the program uses an abstract TM that executes each block atomically" [22].

Dongol is supported by EPSRC grant EP/R032556/1. Jagadeesan and Riely are supported by National Science Foundation CCR-1617175.

PPoPP '19, February 16-20, 2019, Washington, DC, USA

© 2019 Association for Computing Machinery.

ACM ISBN 978-1-4503-6225-2/19/02...\$15.00

https://doi.org/10.1145/3293883.3295708

Reasoning with SC-TRF is powerful, particularly for *mixed-mode access*, where a single location is accessed both transactionally and nontransactionally. A common idiom is *privatization*, shown in the following program.

atomic<sub>a</sub> { if !y then  $x \coloneqq 1$  } || atomic<sub>b</sub> {  $y \coloneqq 1$  };  $x \coloneqq 2$

Here, there are two threads, separated by parallel composition. Transactions are denoted by atomic blocks, with transaction names as subscripts to facilitate discussion. The first thread atomically reads y and updates x if y is 0 (the initial value). The second thread atomically writes y, then executes a plain (nontransactional) write to x.

Reasoning sequentially and assuming all transactions commit, it is impossible for the program to terminate with x = 1, since the atomic blocks must appear to occur in some serial order. Suppose *a* serializes first: then the write of 1 to *x*, denoted  $\langle Wx1 \rangle$ , must precede  $\langle Wx2 \rangle$ , and the final result is 2. Suppose *b* serializes first: then there will be no  $\langle Wx1 \rangle$ , since the only available value for *y* is 1.

Thus, the atomic blocks are used to synchronize threads. If the plain write  $x \coloneqq 2$  is replaced with some costly computation, the privatization idiom can be used to reduce computational costs inside atomic blocks.

The reverse idiom is *publication*, exemplified by:

 $x \coloneqq 1$ ; atomic<sub>a</sub> {  $y \coloneqq 1$  } || atomic<sub>b</sub> {  $z \coloneqq 2$ ; if y then  $z \coloneqq x$  }

Reasoning as before, it is impossible for the program to terminate with z = 0. Suppose transaction a serializes first: then b must see both  $\langle Wx1 \rangle$  and  $\langle Wy1 \rangle$  and therefore end by writing  $\langle Wz1 \rangle$ . Suppose b serializes first: then there will be no second write to z, since the only available value for y is the initial value 0, and thus the last write to z is  $\langle Wz2 \rangle$ .
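The sequential arguments for both idioms can be checked by brute force. The following Python sketch is illustrative only and is not part of the formal model: it treats each atomic block (and each plain write) as one indivisible step on a dictionary store, enumerates every interleaving that respects program order, and collects the final values. The helper names `interleavings` and `final_values` and the step functions are hypothetical.

```python
from itertools import permutations

def interleavings(threads):
    """Yield every merge of the threads' step lists that preserves program order."""
    tags = [i for i, steps in enumerate(threads) for _ in steps]
    for order in sorted(set(permutations(tags))):
        cursor = [0] * len(threads)
        merged = []
        for i in order:
            merged.append(threads[i][cursor[i]])
            cursor[i] += 1
        yield merged

def final_values(threads, loc):
    """Collect the final value of `loc` over all sequential executions."""
    finals = set()
    for steps in interleavings(threads):
        store = {"x": 0, "y": 0, "z": 0}   # all locations start at 0
        for step in steps:
            step(store)                    # each step runs atomically
        finals.add(store[loc])
    return finals

# Privatization: atomic_a { if !y then x := 1 } || atomic_b { y := 1 }; x := 2
def priv_a(s):
    if not s["y"]:
        s["x"] = 1

def priv_b(s):
    s["y"] = 1

def priv_plain(s):
    s["x"] = 2

# Publication: x := 1; atomic_a { y := 1 } || atomic_b { z := 2; if y then z := x }
def pub_plain(s):
    s["x"] = 1

def pub_a(s):
    s["y"] = 1

def pub_b(s):
    s["z"] = 2
    if s["y"]:
        s["z"] = s["x"]

privatization = final_values([[priv_a], [priv_b, priv_plain]], "x")
publication = final_values([[pub_plain, pub_a], [pub_b]], "z")
print(privatization, publication)
```

Under these assumptions, the privatization program can only end with x = 2, and the publication program never ends with z = 0, matching the case analysis above.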

It is a direct consequence of sequential reasoning that these outcomes must be forbidden. In the implementation of Software Transactional Memory (STM), many performance enhancements, such as optimistic execution, can result in a failure of SC-TRF, allowing behaviors such as those above. This has led to a tension between the programmer model and the implementation of STMs, resulting in a large literature on the subject, with many competing notions of *transactional race* that abstract away implementation details to a greater or lesser degree [1, 17, 24, 28].

In this paper, we emphasize the *programmer model*, developing a high-level definition of a transactional race that makes mixed-mode idioms safe by definition (§2 and §4). We attempt to make the programmer model as broadly applicable as possible by adapting the notion of *local* data race developed by Dolan et al. [9]. At the same time, we show that our model is efficiently implementable, in that it avoids common pitfalls that overly constrain the STM implementation, such as *publication by antidependency* or *global lock atomicity* (§3). Our programmer model disables common compiler optimizations, so we develop a slightly more concrete *implementation model* that supports compiler optimizations (§5). We describe how to compile our model to x86 and ARMv8 (§6). We are inspired by Khyzha et al. [22], who followed the same agenda for global races using a model similar to our implementation model; we discuss related work in §7.

In addition to providing a novel programmer model, this paper extends existing work in several ways.

*Local Race Freedom.* The SC-TRF property is a *global* property: a race anywhere in the program is sufficient to nullify the TRF property, typically resulting in undefined semantics. Recently, Dolan et al. [9] proposed *local* DRF as an alternative to global DRF for programs that synchronize via Java-like volatiles. We propose the first *local TRF* property.

Local TRF is strictly more expressive than the global TRF models considered in prior work. As a result, we are able to provide an SC-*L*TRF guarantee, which applies to many additional programs. Consider the variant of the well-known *independent reads of independent writes* example below.

atomic {  $x := 1$  } || atomic {  $y := 1$  } || atomic {  $r_1 := x$  };  $z := 1$ ; atomic {  $r_2 := y$  } || atomic {  $q_1 := y$  };  $z := 2$ ; atomic {  $q_2 := x$  }    (IRIW)

The following outcome cannot occur sequentially.

$$\begin{array}{c} Wx1 \longrightarrow Rx1 \longrightarrow Wz1 \longrightarrow Ry0 \\ \hline Wy1 \longrightarrow Ry1 \longrightarrow Wz2 \longrightarrow Rx0 \end{array}$$

If the writes to *z* are removed, then SC-TRF reasoning allows a programmer to conclude that this sequence of reads cannot occur. However, with the writes to *z* included, SC-TRF reasoning says nothing about this program, since, by any definition, there is a race on *z*. SC-LTRF allows us to ignore this race. Since no transactional variable is involved in a race, we are guaranteed that every execution of this program behaves as though the *transactional portion* were executed sequentially with no interleaving of transactions. This example illustrates *spatial locality*.
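The claim that the outcome is impossible sequentially can be confirmed by exhaustive enumeration. The sketch below is an illustration under simplifying assumptions: each step is a single access, which suffices here because every atomic block in IRIW contains exactly one access. The step encoding and the names `interleave` and `outcomes` are hypothetical.

```python
def interleave(threads):
    """Yield all merges of the step lists that keep each list's own order."""
    if not any(threads):
        yield []
        return
    for i, steps in enumerate(threads):
        if steps:
            rest = threads[:i] + [steps[1:]] + threads[i + 1:]
            for tail in interleave(rest):
                yield [steps[0]] + tail

# A step is ("w", location, value) or ("r", register, location).
IRIW = [
    [("w", "x", 1)],
    [("w", "y", 1)],
    [("r", "r1", "x"), ("w", "z", 1), ("r", "r2", "y")],
    [("r", "q1", "y"), ("w", "z", 2), ("r", "q2", "x")],
]

def outcomes(threads):
    """Return every (r1, r2, q1, q2) reachable by a sequential execution."""
    seen = set()
    for steps in interleave(threads):
        store = {"x": 0, "y": 0, "z": 0}
        regs = {}
        for kind, target, arg in steps:
            if kind == "w":
                store[target] = arg
            else:
                regs[target] = store[arg]
        seen.add((regs["r1"], regs["r2"], regs["q1"], regs["q2"]))
    return seen

results = outcomes(IRIW)
print((1, 0, 1, 0) in results)
```

The tuple (1, 0, 1, 0) corresponds to the outcome shown above; the script prints `False`, since no interleaving produces it.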

To understand the temporal flavor of locality, consider the following program that uses IRIW as a parallel component.

 $x := -1$ ; atomic {  $F ++$  } ||  $x := -2$ ; atomic {  $F ++$  } || atomic {  $r := F$  }; if  $r = 2$  then IRIW

Again, standard SC-TRF reasoning says nothing about this program, since there are races on *x*. But there is no race on *x* or *y* after the guard r = 2 becomes true; SC-*L*TRF allows us to reason sequentially from that point, ensuring that IRIW behaves as expected.

Thus, by adapting the notion of locality from [9], we enable *modular reasoning with transactions* by isolating transactional races from other data races, in both space and time.

**Defined Behavior for Racy Programs.** Most prior models based on SC-TRF either give undefined semantics for programs with races or assume that the underlying memory model is sequentially consistent. We define the semantics of programs using the relaxed memory model of [9], and thus give a defined semantics for racy programs using a realistic memory model.

**Implementation-Level Reasoning.** Most prior work relies on programmers to place quiescence fences to guarantee safety [22, 27, 34–36]. We connect our high-level model to this previous work by developing a lower-level implementation model that includes explicit fences. Our lower-level model assumes only that the underlying transactional machinery provides order between transactions that have a direct dependency, e.g., as in the publication idiom. We note that hardware transactions [5, 6, 10] support the ordering assumptions of our lower-level model. Fences are necessary only to provide order when there is no direct dependency, as in the privatization idiom. We provide a correctness criterion to realize our abstract programming model, and compare the fences required to realize our high-level model to previous approaches.

In addition to building on these aspects of prior work on SC-TRF, we prove that common compiler optimizations are sound under LTRF. Beyond *all* of the optimizations validated by LDRF [9], we also validate some optimizations specific to transactions, inspired by optimizations that are sound with respect to locks [27]. For example, we show that empty transactions can be elided, that the scope of transactions can be increased, and that adjacent transactions can be combined.

## 2 Programmer Model

Dolan et al. [9] give a semantics for a language using Java-like volatiles for synchronization. We adapt their semantics to *isolated* transactions [13, 26] (where plain actions may not be causally interleaved with transactional actions). Transactions are more general than volatiles in several ways:

- A transaction may abort.
- A transaction may both read and write.
- A transaction may access more than one location.
- The same location may be used in both transactional and plain accesses.

We give the semantics of a program as a set of *traces*, each of which is a sequence of *actions* (e.g., read, write, transaction begin). Dolan et al. [9] give both an operational semantics generating traces and an axiomatic semantics defined over event graphs. We concentrate on the axiomatic treatment, treating actions as events in an event graph and deriving orders over these actions. We use the words *trace* and *execution* interchangeably, preferring "trace" when the exact sequence of actions is relevant and "execution" when it suffices to consider the derived relations.

We have designed the semantics so that transactions behave exactly like the volatiles of [9] for *degenerate* traces in which each transaction contains a single read or write action, transactional and nontransactional locations are disjoint, and each transaction is committed and contiguous.

In this section, we present a programmer model that validates mixed-mode idioms such as privatization, but fails to validate common compiler optimizations. In §5, we give a low-level model that validates compiler optimizations, but only conditionally validates mixed-mode idioms.

**Actions.** The syntax of actions is as follows.

| $a, b, c \in Act$     | (Action Id) | $\alpha ::= \langle a:sWxvq \rangle$    | (Write)  |
|-----------------------|-------------|-----------------------------------------|----------|
| $s, t \in Thread$     | (Thread Id) | $\quad\mid \langle a:sRxvq \rangle$     | (Read)   |
| $x, y \in Loc$        | (Location)  | $\quad\mid \langle a:sB \rangle$        | (Begin)  |
| $v, w \in \mathbb{Z}$ | (Value)     | $\quad\mid \langle a:sCb \rangle$       | (Commit) |
| $q, p \in \mathbb{Q}$ | (Timestamp) | $\quad\mid \langle a:sAb \rangle$       | (Abort)  |

Action ids are unique identifiers for actions. Thread ids include the reserved thread id init, used for initialization. To simplify the definition of initialization, we assume that the set of locations is finite. We take values to be integers and timestamps to be rationals, as in [9].

The write action  $\langle a:sWxvq \rangle$  denotes a write of v to x by thread s, with action name a. Likewise,  $\langle a:sRxvq \rangle$  denotes a read. The timestamp q is used to encode relations between these actions, as detailed below.

The begin action  $\langle b:sB \rangle$  denotes the beginning of a transaction by thread *s*, with action name *b*. We also use *b* as the transaction name. The commit action  $\langle a:sCb \rangle$  denotes the commit of the transaction named *b*. Likewise,  $\langle a:sAb \rangle$  denotes the abort of *b*. We refer to commits and aborts collectively as *resolution* actions.

We often drop components of the action syntax that are not interesting for the discussion at hand, e.g., we may write  $\langle a:sWxvq \rangle$  as either  $\langle a \rangle$ ,  $\langle a:s \rangle$ ,  $\langle Wx \rangle$ ,  $\langle Wxv \rangle$ , or  $\langle Wxq \rangle$ .

**Traces and Transactions.** A *trace* is a finite sequence of actions  $\alpha_1\alpha_2 \cdots \alpha_n$ . We use  $\sigma$ ,  $\rho$  to range over traces. We only consider *well-formed* traces (defined below), which begin with an *initializing transaction* of the form  $\langle b:initB \rangle$   $\langle initWx_1v_10 \rangle \cdots \langle initWx_nv_n0 \rangle \langle initCb \rangle$ , which contains exactly one write for each location, at timestamp 0. Here init is a reserved thread name. In examples, we usually omit this initializing transaction, assuming that all locations are initialized to 0.

Each trace  $\sigma = \alpha_1 \alpha_2 \cdots \alpha_n$  generates a total order  $\xrightarrow{\text{index}}_{\sigma}$ , where  $\alpha_i \xrightarrow{\text{index}}_{\sigma} \alpha_j$  iff i < j. Usually, the trace is clear from context and we drop the subscript, preferring  $\xrightarrow{\text{index}}$  to  $\xrightarrow{\text{index}}_{\sigma}$ . We adopt this convention throughout, dropping the subscript in definitions as well as examples.

We derive several other relations from a trace, including *initialization order*, *program order*, *write-to-write order* (aka *coherence*) and *write-to-read order* (aka *reads-from*).

- $\langle a:s \rangle \xrightarrow{\text{init}} \langle b:t \rangle$  iff  $s = \text{init} \neq t$ .
- $\langle a:s \rangle \xrightarrow{\text{po}} \langle b:t \rangle$  iff  $a \xrightarrow{\text{index}} b$  and s = t.
- $\langle a: Wxq \rangle \xrightarrow{ww} \langle b: Wyp \rangle$  iff x = y and q < p.
- $\langle a:Wxvq \rangle \xrightarrow{\text{wr}} \langle b:Rywp \rangle$  iff x = y, v = w and q = p.

All of these relations are irreflexive.  $\xrightarrow{\text{po}}$  and  $\xrightarrow{\text{ww}}$  are transitive. The domain and range are disjoint for  $\xrightarrow{\text{init}}$  and  $\xrightarrow{\text{wr}}$ .

In the context of a trace, we often refer to actions by name. For example, we prefer " $a \xrightarrow{\text{po}} b$ " to " $\langle a \rangle \xrightarrow{\text{po}} \langle b \rangle$ ". We also write " $a = \langle sWxvq \rangle$ " rather than " $\exists i. \alpha_i = \langle a:sWxvq \rangle$ ".
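The four derived relations can be computed directly from a trace. The following Python sketch is one possible encoding, not the paper's formalization: the `Action` record and the `derive` function are hypothetical names, relations are represented as sets of name pairs, and the example trace omits the begin and commit actions of the initializing transaction for brevity.

```python
from fractions import Fraction
from typing import NamedTuple, Optional

class Action(NamedTuple):
    name: str                       # action id
    thread: str                     # thread id ("init" is reserved)
    kind: str                       # "W", "R", "B", "C" or "A"
    loc: Optional[str] = None       # location (reads and writes only)
    val: Optional[int] = None       # value (reads and writes only)
    ts: Optional[Fraction] = None   # timestamp (reads and writes only)

def derive(trace):
    """Return init, po, ww and wr as sets of (name, name) pairs."""
    pairs = [(i, a, j, b) for i, a in enumerate(trace) for j, b in enumerate(trace)]
    init = {(a.name, b.name) for _, a, _, b in pairs
            if a.thread == "init" and b.thread != "init"}
    po = {(a.name, b.name) for i, a, j, b in pairs
          if i < j and a.thread == b.thread}
    ww = {(a.name, b.name) for _, a, _, b in pairs
          if a.kind == "W" and b.kind == "W" and a.loc == b.loc and a.ts < b.ts}
    wr = {(a.name, b.name) for _, a, _, b in pairs
          if a.kind == "W" and b.kind == "R"
          and (a.loc, a.val, a.ts) == (b.loc, b.val, b.ts)}
    return init, po, ww, wr

# Hypothetical trace: init writes x = 0; thread s writes x = 1; thread t reads 1.
trace = [
    Action("i", "init", "W", "x", 0, Fraction(0)),
    Action("a", "s", "W", "x", 1, Fraction(1)),
    Action("b", "t", "R", "x", 1, Fraction(1)),
]
init, po, ww, wr = derive(trace)
print(init, po, ww, wr)
```

Timestamps are modeled with `Fraction` to reflect the choice of rationals; in this small trace, the read `b` is related to the write `a` by write-to-read order because location, value and timestamp all agree.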

We take the name of the begin action to be the unique id for each transaction. We say that action *a belongs to* transaction *b* if  $\langle b:B \rangle \xrightarrow{po} a$  and there is no commit or abort action *c* such that  $b \xrightarrow{po} c \xrightarrow{po} a$ . We say that *a* is *transactional* if it belongs to some transaction, and *plain* otherwise.

Each trace induces an equivalence over action names, relating actions that belong to the same transaction:

 $a \stackrel{\text{tx}}{\sim} b$  iff a = b or a and b belong to the same transaction.

Note that plain actions are included in  $\stackrel{\text{tx}}{\sim}$ , although they only relate to themselves.

There are three possible states for transactions: *committed*, *aborted* and *live*. Committed and aborted transactions are *resolved*. Committed and live transactions are *nonaborted*. We use the same terminology to refer to all of the actions in a transaction; thus, we may use "aborted write action" to refer to a write action that belongs to an aborted transaction.

We visualize traces as graphs. For example, the trace  $\langle a:initB \rangle \langle initWx00 \rangle \langle initWy00 \rangle \langle initCa \rangle \langle b:sB \rangle \langle sWy11 \rangle \langle sWx11 \rangle \langle sCb \rangle \langle c:tB \rangle \langle tRy11 \rangle \langle tAc \rangle \langle d:tWx22 \rangle$  is visualized as:

To avoid clutter, we drop the label on  $\xrightarrow{po}$  and elide the initializing transaction. Instead of including explicit begin and resolution actions, we visualize transactions using boxes. Committed and live transactions are drawn in solid boxes, colored blue. Aborted transactions are drawn in dashed boxes, colored red.
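Transaction membership and transaction state for the example trace above can be computed with a single pass. This is a minimal sketch that assumes the trace already satisfies the bracketing conditions on begin and resolution actions; locations, values and timestamps are omitted because they play no role here. The `Action` record, the `classify` function, and the names given to actions that the example leaves unnamed (`i1`, `i2`, `ca`, `w1`, `w2`, `cb`, `r1`, `ac`) are hypothetical.

```python
from typing import NamedTuple, Optional

class Action(NamedTuple):
    name: str
    thread: str
    kind: str                    # "W", "R", "B", "C" or "A"
    tx: Optional[str] = None     # for "C"/"A": name of the matching begin

def classify(trace):
    """Map each read/write to its transaction (None if plain) and each
    transaction to "live", "committed" or "aborted"."""
    open_tx, owner, state = {}, {}, {}
    for act in trace:
        if act.kind == "B":
            open_tx[act.thread] = act.name
            state[act.name] = "live"
        elif act.kind in ("C", "A"):
            state[act.tx] = "committed" if act.kind == "C" else "aborted"
            del open_tx[act.thread]
        else:
            owner[act.name] = open_tx.get(act.thread)
    return owner, state

trace = [
    Action("a", "init", "B"), Action("i1", "init", "W"), Action("i2", "init", "W"),
    Action("ca", "init", "C", "a"),
    Action("b", "s", "B"), Action("w1", "s", "W"), Action("w2", "s", "W"),
    Action("cb", "s", "C", "b"),
    Action("c", "t", "B"), Action("r1", "t", "R"), Action("ac", "t", "A", "c"),
    Action("d", "t", "W"),
]
owner, state = classify(trace)
print(owner, state)
```

In this encoding, the final write `d` is plain, and the read in transaction `c` is an aborted read.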

*Well-Formedness.* A trace is *well-formed* if each of the following holds:

- WF<sub>1</sub>. The trace starts with an initializing transaction.
- WF<sub>2</sub>. Action names are unique: if  $a \xrightarrow{\text{index}} b$ , then  $a \neq b$ .
- $WF_{3}$ . Write timestamps are per-location unique:
  - If  $a = \langle Wxq \rangle$  and  $b = \langle Wxq \rangle$ , then a = b.
- WF<sub>4</sub>. Each begin action has at most one resolution, and each resolution has exactly one begin action.
- WF<sub>5</sub>. Each resolution follows its begin in  $\xrightarrow{\text{po}}$ , without an intervening begin or resolution.

- WF<sub>6</sub>. If *b* is a read, then there is some *a* such that  $a \xrightarrow{\text{wr}} b$ .
- WF<sub>7</sub>. If  $a \xrightarrow{\text{wr}} b$  and *a* is aborted or live, then  $a \stackrel{\text{tx}}{\sim} b$ .
- WF<sub>8</sub>. If  $a \xrightarrow{\text{wr}} b$ , then  $a \xrightarrow{\text{index}} b$ .
- WF<sub>9</sub>. If *b* is transactional, then there is no committed or live  $c \xrightarrow{index} b$  such that  $b \xrightarrow{ww} c$ .
- WF<sub>10</sub>. If *b* is transactional and there is some transactional  $a \xrightarrow{\text{wr}} b$ , then there is no committed or live  $c \xrightarrow{\text{index}} b$  such that  $a \xrightarrow{\text{ww}} c$ .
- WF<sub>11</sub>. If *b* is transactional and there is some  $a \xrightarrow{\text{wr}} b$ , then there is no  $c \stackrel{\text{tx}}{\sim} b$  such that  $c \xrightarrow{\text{index}} b$  and  $a \xrightarrow{\text{ww}} c$ .

WF<sub>1</sub> ensures that locations are initialized. WF<sub>2</sub>–WF<sub>3</sub> ensure that action names and timestamps are unique. WF<sub>4</sub>–WF<sub>5</sub> ensure proper bracketing for transactions. These conditions also preclude nesting of transactions — we leave the treatment of nested transactions to future work. WF<sub>6</sub> ensures that all reads are fulfilled. WF<sub>7</sub> ensures that aborted and live writes are not visible outside the transaction.

WF<sub>8</sub>–WF<sub>11</sub> constrain the interleavings allowed in a trace. For the most part, we view traces as abstract execution graphs, where transactions are expressed as multiple  $\xrightarrow{po}$-contiguous actions. In execution graphs, time is *relative*: it is expressed as the *happens-before* relation, which captures causal relations between actions. At the concrete level of a trace, time is *absolute*: it is expressed by order in the sequence. Viewed as execution graphs, WF<sub>8</sub>–WF<sub>11</sub> are redundant with respect to consistency criteria given below. These conditions, instead, constrain the concrete representation of the execution graph as a trace, enabling inductive reasoning that mirrors the operational reasoning of [9].

WF<sub>8</sub> ensures that reads only see the absolute past: reads are not allowed to "see the future". This condition is guaranteed by the operational semantics of [9], but here must be stated explicitly. There is no similar requirement that writes respect absolute time. They may appear out of order. For example, we allow the trace  $\langle Wx22 \rangle \langle Wx11 \rangle$ .
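The asymmetry between reads and writes can be made concrete with a small checker. The sketch below covers only WF<sub>3</sub>, WF<sub>6</sub> and WF<sub>8</sub>, ignores transactions and the initializing transaction, and uses integer timestamps for brevity; the `Access` record and the `violations` function are hypothetical names.

```python
from typing import NamedTuple

class Access(NamedTuple):
    name: str
    kind: str   # "W" or "R"
    loc: str
    val: int
    ts: int

def violations(trace):
    """Return the subset of {"WF3", "WF6", "WF8"} violated by the trace."""
    bad = set()
    writes = [(i, a) for i, a in enumerate(trace) if a.kind == "W"]
    stamps = [(a.loc, a.ts) for _, a in writes]
    if len(stamps) != len(set(stamps)):
        bad.add("WF3")                  # two writes share a location and timestamp
    for j, b in enumerate(trace):
        if b.kind != "R":
            continue
        sources = [i for i, a in writes
                   if (a.loc, a.val, a.ts) == (b.loc, b.val, b.ts)]
        if not sources:
            bad.add("WF6")              # read is not fulfilled by any write
        elif any(i > j for i in sources):
            bad.add("WF8")              # read sees a write that appears later
    return bad

out_of_order_writes = [Access("a", "W", "x", 2, 2), Access("b", "W", "x", 1, 1)]
future_read = [Access("b", "R", "x", 1, 1), Access("a", "W", "x", 1, 1)]
print(violations(out_of_order_writes), violations(future_read))
```

The first trace corresponds to  $\langle Wx22 \rangle \langle Wx11 \rangle$  and passes all three checks; the second places a read before the write it reads from and is rejected by the WF<sub>8</sub> check.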

WF<sub>9</sub>–WF<sub>11</sub> constrain the interleaving of the actions from different transactions. There is no analogue of these rules in [9] since volatiles are expressed as a single action. WF<sub>9</sub> forbids  $\langle cWx22 \rangle \langle bWx11 \rangle$  when both are transactional; we ignore aborted writes because they are not visible to other transactions. WF<sub>10</sub> forbids  $\langle aWx11 \rangle \langle cWx22 \rangle \langle bRx11 \rangle$  when all three are transactional. WF<sub>11</sub> forbids  $\langle aWx11 \rangle \langle cWx22 \rangle \langle bRx11 \rangle$  when  $c \stackrel{\text{tx}}{\sim} b$ .

**Antidependencies.** An antidependency relates a read to any write that cannot precede it. We use  $\xrightarrow{\text{rw}}$  to represent antidependency as *read-to-write order* (aka *from-read*). Ignoring transactions,  $b \xrightarrow{\text{rw}} c$  whenever  $a \xrightarrow{\text{wr}} b$  and  $a \xrightarrow{\text{ww}} c$ , for some *a*.

As we shall see, antidependencies are not allowed to contradict the happens-before order, which defines causality. The end result is that stale reads are precluded. For example, consider the trace  $\langle a:sWx1 \rangle \langle c:sWx2 \rangle \langle b:sRx1 \rangle$ . This trace should not be allowed, since it reads 1 after writing 1 and then 2 in the same thread. Because  $c \xrightarrow{po} b \xrightarrow{rw} c$ , this trace, shown on the left below, will not be considered consistent:

[Figure: two copies of the execution  $a{:}\,Wx1 \to c{:}\,Wx2 \to b{:}\,Rx1$ . On the left, all three actions are plain; on the right, *c* belongs to an aborted transaction.]

Aborted transactions complicate the definition of antidependency. For example, if *c* is part of an aborted transaction, as shown on the right, then the outcome should be allowed. Note that if *b* and *c* belonged to the same aborted transaction, then the execution would be disallowed by condition  $WF_{11}$  in the definition of well-formed trace.

Thus we arrive at the following definition:

 $b \xrightarrow{rw} c$  iff  $a \xrightarrow{wr} b$  and  $a \xrightarrow{ww} c$ , for some *a*, and *c* is either plain or nonaborted.
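The definition can be exercised on the stale-read trace discussed above. The following sketch assumes accesses are encoded as plain tuples `(name, kind, loc, val, ts)` and that the caller supplies the set of writes belonging to aborted transactions; the function name `antidependencies` and this encoding are hypothetical.

```python
def antidependencies(accesses, aborted=frozenset()):
    """Compute read-to-write order.

    accesses: iterable of (name, kind, loc, val, ts) with kind "W" or "R".
    aborted:  names of actions that belong to aborted transactions.
    """
    writes = [a for a in accesses if a[1] == "W"]
    reads = [a for a in accesses if a[1] == "R"]
    # write-to-read: same location, value and timestamp
    wr = {(w[0], r[0]) for w in writes for r in reads if w[2:] == r[2:]}
    # write-to-write: same location, strictly increasing timestamp
    ww = {(w1[0], w2[0]) for w1 in writes for w2 in writes
          if w1[2] == w2[2] and w1[4] < w2[4]}
    # b rw c iff a wr b and a ww c for some a, and c is not an aborted write
    return {(b, c) for (a, b) in wr for (a2, c) in ww
            if a == a2 and c not in aborted}

trace = [("a", "W", "x", 1, 1), ("c", "W", "x", 2, 2), ("b", "R", "x", 1, 1)]
left = antidependencies(trace)                   # c is plain
right = antidependencies(trace, aborted={"c"})   # c is an aborted write
print(left, right)
```

When `c` is plain, the edge from `b` to `c` exists and, together with program order from `c` to `b`, makes the trace inconsistent; when `c` is aborted, no antidependency is generated.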

*Lifted Relations.* A common technique to enforce transactional atomicity is to *lift* orders from individual actions to the level of transactions [6, 10, 32]. Notationally, we indicate a lifted relation by prefixing "l". For example, the lifting of  $\xrightarrow{\text{wr}}$  is written  $\xrightarrow{\text{lwr}}$ . We also use two variants.

- $\xrightarrow{\text{lR}}$  is the lifting of relation  $\xrightarrow{\text{R}}$ .
- $\xrightarrow{\text{xR}}$  restricts  $\xrightarrow{\text{lR}}$  to transactions.
- $\xrightarrow{\text{cR}}$  restricts  $\xrightarrow{\text{xR}}$  to nonaborted transactions.

For any relation  $\xrightarrow{R}$ , the definitions are as follows.

- $a \xrightarrow{\text{lR}} b$  iff  $a \xrightarrow{\text{R}} b$  or  $a' \xrightarrow{\text{R}} b'$  for some  $a' \stackrel{\text{tx}}{\sim} a \stackrel{\text{tx}}{\not\sim} b \stackrel{\text{tx}}{\sim} b'$ .
- $a \xrightarrow{\text{xR}} b$  iff  $a \xrightarrow{\text{lR}} b$  and a, b are transactional.
- $a \xrightarrow{\text{cR}} b$  iff  $a \xrightarrow{\text{xR}} b$  and a, b are committed or live.

Consider the following execution, where we label the individual actions of *b*.

[Figure: transaction *b* contains  $b_1{:}\,Wy1 \to b_2{:}\,Wx1$ ; another thread performs  $c{:}\,Ry1 \to d{:}\,Wx2$ ; the cross-thread edges are  $b_1 \xrightarrow{\text{wr}} c$  and  $b_2 \xrightarrow{\text{ww}} d$ .]

We have  $b_1 \xrightarrow{\text{wr}} c$  but not  $b_2 \xrightarrow{\text{wr}} c$ . In the lifted relation both of these hold; in particular, we have  $b_2 \xrightarrow{\text{lwr}} c$ . Similarly, we have  $b_1 \xrightarrow{\text{lww}} d$  but not  $b_1 \xrightarrow{\text{ww}} d$ . The "x" variants exclude d. The "c" variants exclude both c and d.
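Lifting and its two restrictions can be sketched over sets of pairs. The code below is an illustration under stated assumptions: the lifting rule is read as requiring that the two endpoints lie in different transaction classes, plain actions form singleton classes, and the transaction containing `c` is taken to be aborted, as the text implies. The names `lift`, `restrict` and `tx_of` are hypothetical.

```python
def lift(rel, tx_of):
    """Lift rel to transaction classes; tx_of maps each action to its class."""
    lifted = set(rel)
    for a1, b1 in rel:
        if tx_of[a1] == tx_of[b1]:
            continue                      # only pairs in different classes are lifted
        for a in tx_of:
            for b in tx_of:
                if tx_of[a] == tx_of[a1] and tx_of[b] == tx_of[b1]:
                    lifted.add((a, b))
    return lifted

def restrict(rel, allowed):
    """Keep only the pairs whose endpoints are both in `allowed`."""
    return {(a, b) for a, b in rel if a in allowed and b in allowed}

tx_of = {"b1": "b", "b2": "b", "c": "c", "d": "d"}   # d is plain: its own class
transactional = {"b1", "b2", "c"}
nonaborted = {"b1", "b2"}          # assumption: c's transaction is aborted

wr = {("b1", "c")}
ww = {("b2", "d")}
lwr, lww = lift(wr, tx_of), lift(ww, tx_of)
xwr, xww = restrict(lwr, transactional), restrict(lww, transactional)
cwr, cww = restrict(xwr, nonaborted), restrict(xww, nonaborted)
print(lwr, lww, xwr, xww, cwr, cww)
```

In this encoding, the "x" variants drop every pair involving the plain write `d`, and the "c" variants additionally drop every pair involving `c`.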

Summarizing the relations defined thus far, we have:

- $\xrightarrow{index}$  is the absolute order of events in a trace.
- $\xrightarrow{\text{init}}$  relates initialization events to other events.
- $\xrightarrow{\text{po}}$  restricts  $\xrightarrow{\text{index}}$  to events of the same thread.
- $\xrightarrow{\text{ww}}$  is write-to-write order, derived from timestamps.
- $\xrightarrow{\text{wr}}$  is write-to-read order, derived from timestamps.
- $\xrightarrow{\text{rw}}$  is read-to-write order, derived from  $\xrightarrow{\text{ww}}$  and  $\xrightarrow{\text{wr}}$ .

Lifting is only applied to the last three relations.

**Happens-Before.** The *happens-before* order,  $\xrightarrow{\text{hb}}$ , is a partial order that captures dependency, or causality, between actions. It serves a crucial role in understanding distributed systems. In the next subsection, happens-before is used to define *consistent* executions that obey the intended notion of causality. In §4, happens-before is also used to define *data races*. By varying the definition of happens-before, we change the definition of both consistency and raciness.

We define  $\xrightarrow{hb}$  to be the least relation that is closed with respect to the following.

$$\begin{array}{ll} a \xrightarrow{\text{hb}} c \text{ if } a \,( \xrightarrow{\text{init}} \cup \xrightarrow{\text{po}} \cup \xrightarrow{\text{cwr}} \cup \xrightarrow{\text{cww}} )\, c & (\text{HB}_{\text{BASE}}) \\ a \xrightarrow{\text{hb}} c \text{ if } a \xrightarrow{\text{hb}} b \xrightarrow{\text{hb}} c & (\text{HB}_{\text{TRANS}}) \\ a \xrightarrow{\text{hb}} c \text{ if } c \text{ is plain, } a \xrightarrow{\text{lww}} c \text{ and } a \xrightarrow{\text{crw}} b \xrightarrow{\text{hb}} c & (\text{HB}_{\text{WW}}) \end{array}$$

We discuss variations of  $HB_{ww}$  at the end of this section. We discuss an alternative model without  $HB_{ww}$  in §5.

By  $HB_{BASE}$ , happens-before includes initialization order, program order, lifted write-to-write order and lifted write-to-read order.  $HB_{TRANS}$  says that happens-before is transitive. These rules are adapted from the analogous rules in [9]. The only subtlety of these rules lies in the choice of lifted relation in  $HB_{BASE}$ ; note that we restrict  $HB_{BASE}$  to include order only from committed and live transactions. We discuss the reason for this in the next subsection.

HB<sub>ww</sub> is designed to ensure that *privatization* is considered *race-free*. Roughly, two actions are *racing* if they touch a common location, neither is aborted, one is a write, and they are not ordered by  $\xrightarrow{\text{hb}}$ . HB<sub>ww</sub> only applies when *a* and *b* are live or committed. If *c* is also live or committed, then this rule adds nothing: HB<sub>BASE</sub> already gives us  $a \xrightarrow{\text{hb}} c$  since  $a \xrightarrow{\text{cww}} c$ .

**Example 2.1.** Recall the *privatization* example from §1.

atomic<sub>a</sub> { if !y then  $x \coloneqq 1$  } || atomic<sub>b</sub> {  $y \coloneqq 1$  };  $x \coloneqq 2$

[Figure: execution with  $a{:}\,Ry0 \to Wx1$  in one transaction and  $b{:}\,Wy1 \to c{:}\,Wx2$  in the other thread, with edges  $a \xrightarrow{\text{crw}} b$  and  $a \xrightarrow{\text{lww}} c$ .]

Without HB<sub>ww</sub>, there would be a race between  $\langle Wx1 \rangle$  and  $\langle Wx2 \rangle$ . By including  $a \xrightarrow{lww} c$  in happens-before, we ensure that this execution is considered race free.
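The effect of HB<sub>ww</sub> on this example can be reproduced with a small fixpoint computation. The sketch below takes the base edges and the lifted relations as given inputs rather than deriving them from a trace; the initializing transaction is elided, so the base edges are just program order. The function `happens_before`, its `use_hb_ww` switch, and the names `a1` and `a2` for the read and write of transaction *a* are hypothetical.

```python
def happens_before(base, lww, crw, plain, use_hb_ww=True):
    """Least relation containing `base` and closed under HB_TRANS and HB_WW."""
    hb = set(base)                                                   # HB_BASE
    while True:
        new = {(a, c) for a, b in hb for b2, c in hb if b == b2}     # HB_TRANS
        if use_hb_ww:
            new |= {(a, c) for a, b in crw for b2, c in hb
                    if b == b2 and c in plain and (a, c) in lww}     # HB_WW
        if new <= hb:
            return hb
        hb |= new

# Example 2.1: a1 = Ry0 and a2 = Wx1 (transaction a), b = Wy1, c = plain Wx2.
base = {("a1", "a2"), ("b", "c")}          # program order only
crw = {("a1", "b"), ("a2", "b")}           # a reads y before b overwrites it
lww = {("a1", "c"), ("a2", "c")}           # a's write to x precedes c's
plain = {"c"}

with_rule = happens_before(base, lww, crw, plain)
without_rule = happens_before(base, lww, crw, plain, use_hb_ww=False)
print(("a2", "c") in with_rule, ("a2", "c") in without_rule)
```

With the rule enabled, the two writes to x are ordered by happens-before; with it disabled, they are unordered in both directions and would therefore race.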

Order from  $HB_{ww}$  can cascade, as in the following.

atomic<sub>a</sub> { if !y then x := 1 } || atomic<sub>b</sub> { y := 1 }; atomic<sub>a'</sub> { if !y' then x' := 1 } || atomic<sub>b'</sub> { y' := 1 }; x' := 2; x := 2



*Consistency.* We say that an execution is *consistent* iff it is well-formed and the following hold.

$$\begin{array}{ll} (\xrightarrow{\text{hb}} \cup \xrightarrow{\text{lwr}} \cup \xrightarrow{\text{xrw}}) \text{ is acyclic.} & (\text{Causality}) \\ (\xrightarrow{\text{hb}} ; \xrightarrow{\text{lww}}) \text{ is irreflexive.} & (\text{Coherence}) \\ (\xrightarrow{\text{hb}} ; \xrightarrow{\text{lrw}}) \text{ is irreflexive.} & (\text{Observation}) \\ (\xrightarrow{\text{crw}} ; \xrightarrow{\text{hb}} ; \xrightarrow{\text{lww}}) \text{ is irreflexive.} & (\text{Anti}_{ww}) \end{array}$$

CAUSALITY, COHERENCE and OBSERVATION all appear in [9]. We discuss ANTI<sub>ww</sub> below and in Example 3.5.

**Example 2.2.** Consider the variant of Example 2.1, in which the writes on *x* are given the reverse order in  $\xrightarrow{\text{lww}}$ .

atomic<sub>a</sub> { if !y then  $x \coloneqq 2$  } || atomic<sub>b</sub> {  $y \coloneqq 1$  };  $x \coloneqq 1$

[Figure: execution with  $a{:}\,Ry0 \to Wx2$  in one transaction and  $b{:}\,Wy1 \to c{:}\,Wx1$  in the other thread, with edges  $a \xrightarrow{\text{crw}} b$  and  $c \xrightarrow{\text{lww}} a$ .]

Intuitively, this execution should be disallowed since  $\xrightarrow{\text{lww}}$  seems to order the writes incorrectly. ANTI<sub>ww</sub> forbids it.

Technically, this execution must be disallowed in order to establish the SC-LTRF theorem, which states that any race can be discovered in a sequential execution. To see the issue, note that the two writes on *x* are not ordered by  $\xrightarrow{\text{hb}}$  (HB<sub>ww</sub> does not apply here); thus they are in a race. SC-LTRF requires, therefore, that we find a sequential execution of this program that also exhibits a race. But this is impossible: any sequential execution must have *a* before *b*, and therefore before *c*, and thus  $a \xrightarrow{\text{lww}} c$ . But in this case, HB<sub>ww</sub> adds order between *a* and *c*, eliminating the race.
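The ANTI<sub>ww</sub> check on Examples 2.1 and 2.2 amounts to composing three relations and testing for a reflexive pair. The sketch below writes the relations out by hand for both examples, with the initializing transaction elided; `a1` and `a2` are hypothetical names for the read and write of transaction *a*, and `compose` and `irreflexive` are hypothetical helpers.

```python
def compose(*rels):
    """Relational composition r1; r2; ...; rn over sets of pairs."""
    result = set(rels[0])
    for rel in rels[1:]:
        result = {(a, c) for a, b in result for b2, c in rel if b == b2}
    return result

def irreflexive(rel):
    return all(a != b for a, b in rel)

# In both examples, a reads y before b overwrites it.
crw = {("a1", "b"), ("a2", "b")}

# Example 2.1: a's write to x precedes c's, and HB_WW orders a before c.
hb_21 = {("a1", "a2"), ("b", "c"), ("a1", "c"), ("a2", "c")}
lww_21 = {("a1", "c"), ("a2", "c")}

# Example 2.2: c's write to x precedes a's, and HB_WW does not apply.
hb_22 = {("a1", "a2"), ("b", "c")}
lww_22 = {("c", "a1"), ("c", "a2")}

anti_ww_21 = irreflexive(compose(crw, hb_21, lww_21))
anti_ww_22 = irreflexive(compose(crw, hb_22, lww_22))
print(anti_ww_21, anti_ww_22)
```

For Example 2.1 the composition is empty, so the axiom holds. For Example 2.2 the chain from *a* through *b* and *c* returns to *a*, so the composition contains a reflexive pair and the execution is rejected.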

As noted in [9], since  $\xrightarrow{po} \subseteq \xrightarrow{hb}$  and  $\xrightarrow{wr} \subseteq \xrightarrow{lwr}$ , the inclusion of  $\xrightarrow{lwr}$  in CAUSALITY forbids "load buffering," shown on the left below, which is allowed by many other models.



On the other hand, the model does allow "store buffering," shown in the middle above, since plain antidependencies only have an irreflexivity requirement in OBSERVATION, not an acyclicity requirement.

We do not include aborted transactions in HB<sub>BASE</sub>; in conjunction with OBSERVATION, doing so would cause publication through aborted reads. To see this, consider the execution on the right above, which is allowed by our model, but would be disallowed if  $\xrightarrow{\text{hb}}$  included  $\xrightarrow{\text{xwr}}$  rather than  $\xrightarrow{\text{cwr}}$ .

Were we to use  $\xrightarrow{\text{crw}}$  in CAUSALITY, the execution on the left below would be allowed. But this execution violates opacity, which requires a total order among all transactions (including aborted transactions) that is consistent with happens-before order [15, 16]. Therefore the execution must be forbidden. If the writes are plain, however, this execution is similar to the store buffering example, and should be allowed. Thus, it would be too strong to use  $\xrightarrow{\text{lrw}}$  in CAUSALITY, or to require acyclicity of  $(\xrightarrow{\text{hb}} \cup \xrightarrow{\text{lrw}})$  in OBSERVATION. Similarly, we cannot use  $\xrightarrow{\text{lww}}$  in CAUSALITY or require acyclicity of  $(\xrightarrow{\text{hb}} \cup \xrightarrow{\text{lww}})$  in COHERENCE. In either case, we would rule out the execution on the right.




As noted in [9], the notion of coherence in LTRF is stronger than that of Java, which allows the execution on the left below. On the other hand, LTRF coherence is not as strong as coherence in hardware models and C++ atomics, which forbid the execution on the right; allowing such executions is necessary to support compiler optimizations, such as common subexpression elimination [9, 31].



Anti-Dependence vs Happens-Before. HB<sub>ww</sub> adds to  $\xrightarrow{hb}$  the minimal order needed to validate privatization. There is a design space of choices for additional constraints that can be imposed on the compositions of  $\xrightarrow{crw}$  and  $\xrightarrow{hb}$ .

**Example 2.3.** There are six variants, each of which we illustrate with an example. For completeness, we include  $HB_{ww}$  with a variant of Example 2.1. Following Example 2.2, many of these require an additional antidependency axiom. The exceptions involve  $\xrightarrow{lwr}$ , for which CAUSALITY suffices.

$$a \xrightarrow{\text{hb}} c \text{ if } c \text{ is plain, } a \xrightarrow{\text{lrw}} c \text{ and } a \xrightarrow{\text{crw}} b \xrightarrow{\text{hb}} c \quad (\text{HB}_{RW})$$
$$(\xrightarrow{\text{crw}}; \xrightarrow{\text{hb}}; \xrightarrow{\text{lrw}}) \text{ is irreflexive} \qquad (\text{ANTI}_{RW})$$

$$\begin{array}{ll} \text{atomic}_a \{ r := y; q := x \} & \qquad (a: Ry0 \to Rx0) \\ \| \text{ atomic}_b \{ y := 1 \}; x := 1 & \qquad (b: Wy1) \to c: Wx1 \end{array}$$

$$a \xrightarrow{\text{hb}} c \text{ if } c \text{ is plain, } a \xrightarrow{\text{lwr}} c \text{ and } a \xrightarrow{\text{crw}} b \xrightarrow{\text{hb}} c \quad (\text{HB}_{WR})$$

$$\begin{array}{ll} \text{atomic}_a \{ r \coloneqq y; x \coloneqq 1 \} & \qquad (a: Ry0 \to Wx1) \\ \| \text{ atomic}_b \{ y \coloneqq 1 \}; q \coloneqq x & \qquad (b: Wy1) \to c: Rx1 \end{array}$$

$$a \xrightarrow{\text{hb}} c \text{ if } a \text{ is plain, } a \xrightarrow{\text{lww}} c \text{ and } a \xrightarrow{\text{hb}} b \xrightarrow{\text{crw}} c \quad (\text{HB}'_{ww})$$
$$(\xrightarrow{\text{hb}}; \xrightarrow{\text{crw}}; \xrightarrow{\text{lww}}) \text{ is irreflexive} \qquad (\text{ANTI}'_{ww})$$

$$\begin{array}{ll} x := 1; \text{ atomic}_b \{ r := y \} & \qquad a: Wx1 \to (b: Ry0) \\ \| \text{ atomic}_c \{ x := 2; y := 1 \} & \qquad (c: Wx2 \to Wy1) \end{array}$$


$$a \xrightarrow{\text{hb}} c \text{ if } a \text{ is plain, } a \xrightarrow{\text{lwr}} c \text{ and } a \xrightarrow{\text{hb}} b \xrightarrow{\text{crw}} c \quad (\text{HB}'_{\text{WR}})$$

$$\begin{array}{ll} x \coloneqq 1; \text{ atomic}_b \{ r \coloneqq y \} & \qquad a: Wx1 \to (b: Ry0) \\ \| \text{ atomic}_c \{ q \coloneqq x; y \coloneqq 1 \} & \qquad (c: Rx1 \to Wy1) \end{array}$$

## 3 STM Design

We consider several examples from the literature to argue that the ordering required by our model does not impair efficient implementations of Software Transactional Memory.

**Example 3.1.** In accordance with [27, Figure 12], our model does not enforce *publication by antidependence*: The final outcome r = q = 0 is permitted in the program (left), as shown by the allowable execution (right).

$$\begin{array}{ll} x := 1; \text{ atomic}_a \{ r := y \} & \qquad Wx1 \longrightarrow (a: Ry0) \\ & \qquad \uparrow \text{lrw} \qquad \downarrow \text{crw} \\ \| \text{ atomic}_b \{ q := x; y := 1 \} & \qquad (b: Rx0 \longrightarrow Wy1) \end{array}$$

Note that if  $\xrightarrow{\text{hb}}$  were to include  $\xrightarrow{\text{crw}}$ , then this execution would be forbidden by OBSERVATION. Note also that this execution is forbidden by any model that enforces  $\text{ANTI}'_{RW}$ , from Example 2.3.

**Example 3.2.** In accordance with [27, Figure 11], our model does not enforce *global lock atomicity*: The final outcome r = q = 0 is possible in the program below.

$$\begin{array}{ll} x := 1; \text{ atomic}_a \{ y := 1 \}; r := z & \qquad Wx1 \longrightarrow (a: Wy1) \longrightarrow Rz0 \\ & \qquad \uparrow \text{lrw} \qquad\qquad \downarrow \text{lrw} \\ \| \text{ atomic}_b \{ q := x; z := 1 \} & \qquad (b: Rx0 \longrightarrow Wz1) \end{array}$$

This execution is allowed by all variants discussed in Example 2.3, including  $ANTI'_{RW}$ .

**Example 3.3.** We now consider the limitations of our approach. Menon et al. [27] describe an idiom for *benign racy publication*. This outcome is considered desirable, yet our model forbids it: The final outcome q = 0 is *not* possible for the following program.

$$\begin{array}{ll} x \coloneqq 1; \operatorname{atomic}_{a} \{ y \coloneqq 1 \} & \qquad Wx1 \longrightarrow a \colon Wy1 \\ \| \; q \coloneqq 2; \operatorname{atomic}_{b} \{ r \coloneqq x; & \qquad \uparrow \text{lrw} \qquad \downarrow \text{cwr} \\ \quad \text{if } y \text{ then } q \coloneqq r \} & \qquad b \colon Rx0 \longrightarrow Ry1 \end{array}$$

The outcome is only allowed if b reads 0 for x and 1 for y, but this execution is disallowed by OBSERVATION.

Note that, in accordance with the name, the program is not race-free: the execution in which b reads 0 for y has a race on x. Thus, there is no canonical answer as to whether this execution is indeed benign and should be allowed.

**Example 3.4.** The literature describes a class of STMs that implement *eager versioning*: they create an undo log for each write and perform writes as they are encountered (as opposed to during commit). If the transaction aborts, the updates are rolled back to their original logged values. Shpeisman et al. [34] describe potential issues with eager versioning in a mixed-mode SC setting. In our relaxed memory setting, we show that these have natural explanations.

Consider the following program.

Under SC, the final value r = 0 is considered to be problematic [34, Figure 3a] since it follows from a scenario in which the non-transactional write  $\langle Wx2 \rangle$  is lost, known as a *speculative lost update*. Assuming SC, suppose transaction *a* executes its write to *x*, then the second thread executes its first two writes. Since transaction *a* aborts, the write to *x* would be rolled back to 0. Transaction *b* would then skip over the update to *x* (because it now observes y = 1). This allows r = q = 0.

In our setting, the final value q = 0 is immediately disallowed by HB<sub>BASE</sub> and CAUSALITY. Moreover, the first thread may read either 0 or 2 for *x*, whereas the second thread must read 2 for *x*, i.e., non-transactional write  $\langle Wx2 \rangle$  is not lost.

$$\begin{array}{c} (a: Ry0 \longrightarrow Wx1) \longrightarrow (b: Ry1) \longrightarrow Rx0 \\ \downarrow \text{ww} \qquad \uparrow \text{lwr} \\ Wx2 \longrightarrow Wy1 \longrightarrow Rx2 \end{array}$$

The scenario above may also result in executions such as:

$$\begin{array}{c} (a: Ry0 \longrightarrow Wx1) \longrightarrow (b: Ry0 \longrightarrow Wx1) \longrightarrow Rx2 \\ \downarrow \text{ww} \qquad \uparrow \text{wr} \\ Wx2 \longrightarrow Wy1 \longrightarrow Rx2 \end{array}$$

where transaction *a* successfully writes  $\langle Wx1 \rangle$ . Again, the non-transactional write  $\langle Wx2 \rangle$  is available for the final reads in both threads.

**Example 3.5.** Analogous to eager versioning is a class of STMs that implement *lazy versioning*, which cache writes locally within a transaction and update shared memory during the transaction's commit operation. Shpeisman et al. [34] discuss potential problems with lazy versioning in a mixed-mode setting. We consider the most interesting of these below.

Suppose z is an array in the program below.

$$\begin{array}{l} \text{atomic}_a \{ r := x; x := 42 \}; r_1 := z[r]; r_2 := z[r]; z[r] := 0 \\ \| \text{ atomic}_b \{ q := x; \text{if } q \neq 42 \text{ then } z[q] := z[q] + 1 \} \end{array}$$

The first thread atomically caches x and privatizes it by setting it to a special value (denoted here by 42). From a programmer's perspective, z[r] should not be read by other threads. However, in a lazy-versioning STM, transaction *b* may have been serialized before transaction *a*, yet contain a *buffered write* to z[q]. Thus, the reads of z[r] may race with the buffered write to z[q]. A consequence of this is the execution below, where the two reads of z[0] return different values.

$$\begin{array}{l} (a: Rx0 \longrightarrow Wx42) \longrightarrow Rz[0]0 \longrightarrow Rz[0]1 \longrightarrow Wz[0]0 \\ (b: Rx0 \longrightarrow Rz[0]0 \longrightarrow Wz[0]1) \\ \text{with } Wz[0]1 \xrightarrow{\text{lww}} Wz[0]0 \end{array}$$

The final outcome  $r_1 \neq r_2$  is considered problematic in [34]. This outcome is disallowed by any variant of our model that includes ANTI<sub>RW</sub> (Example 2.3). By ANTI<sub>ww</sub>, the execution becomes inconsistent if we reverse the  $\xrightarrow{lww}$  order above. Thus, the outcome  $z[0] \neq 0$  is forbidden by our model. This outcome is also considered problematic in [34].

## 4 Local Transactional Race Freedom

We introduce the concepts behind localising data race freedom (LDRF [9]) by example. Consider the program:

$$\begin{array}{l} x \coloneqq 1; y \coloneqq 1; \text{atomic}_a \{ F \coloneqq 1 \}; z \coloneqq 1 \\ \| \; y \coloneqq 2; \text{atomic}_b \{ r \coloneqq F \}; z \coloneqq 2; \text{if } r \text{ then } w \coloneqq x + y - y \end{array}$$

Consider the case where b reads F from a, as depicted below. We leave the write-to-write orders and the values of the last four actions of the second thread unspecified.

$$\begin{array}{l} Wx1 \to Wy1 \to a:WF1 \to Wz1 \\ Wy2 \to b:RF1 \to Wz2 \to Rx \to Ry \to Ry \to Ww \end{array}$$

There are write-write races between  $\langle Wy1 \rangle$  and  $\langle Wy2 \rangle$ , and between  $\langle Wz1 \rangle$  and  $\langle Wz2 \rangle$ . By some definitions of race, the write  $\langle Wy1 \rangle$  is also racing with the two reads of *y*. Thus, a global notion of race-freedom does not allow one to conclude anything about this program. A localised notion, however, would allow one to deduce that  $\langle Wx1 \rangle$  is correctly published to the second thread. Moreover, the two reads of *y* must see the same value and hence, the value written to *w* must be 1.

LDRF is defined relative to (1) a set  $\Sigma$  of traces, generated by the semantics of a program, (2) a set *L* of locations, and (3) a trace  $\sigma \in \Sigma$ , denoting a partial execution. For the example,  $\Sigma$  is fixed by the program. Let  $L = \{x, y, F\}$ . A race is an *L*-race if it involves a location in *L*; thus the race between  $\langle W z 1 \rangle$  and  $\langle W z 2 \rangle$  is not considered an *L*-race.

Now consider the trace  $\sigma = \langle Wx1 \rangle \langle Wy1 \rangle \langle a:B \rangle \langle WF1 \rangle \langle Ca \rangle \langle Wy2 \rangle \langle b:B \rangle \langle RF1 \rangle \langle Cb \rangle$  that linearizes the execution above. This  $\sigma$  contains an *L*-race between  $\langle Wy1 \rangle$  and  $\langle Wy2 \rangle$ . Nonetheless,  $\sigma$  is *L*-stable for  $\Sigma$  because there is no  $\sigma \rho \in \Sigma$  that includes an *L*-race between any action of  $\sigma$  and an action of  $\rho$ . It is important to note the definition of stability is relative to the set  $\Sigma$ . Trace  $\sigma$  is stable for *this program*, but would not be stable if, for example, the program is modified so that the first thread reads y after writing z := 1.

Having fixed  $\sigma$ , we now consider the *L*-sequential extensions of this prefix. These extensions are constrained to obey the sequential semantics for locations in *L*. Extensions that do not touch *L*, such as the writes to *z*, are unconstrained.

The SC-LDRF theorem says that either every extension of  $\sigma$  is *L*-sequential, or there is some *L*-sequential extension with an *L*-race. Since no *L*-sequential extension has a race, the program must behave sequentially from  $\sigma$ , guaranteeing that the read of *x* sees 1, that the two reads of *y* see the same value, and thus that the value written for *w* is 1.
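The sequential reasoning that the theorem licenses can be checked by brute force. The following sketch is only an illustration under simplifying assumptions: it enumerates every sequentially consistent interleaving of the whole example program (a larger set than the *L*-sequential extensions of $\sigma$), treats each single-action transaction as one atomic step, and uses hypothetical helper names such as `interleavings` and `guarded_read`.

```python
def interleavings(first, second):
    """Yield every merge of two step lists that preserves each list's order."""
    if not first or not second:
        yield list(first) + list(second)
        return
    for rest in interleavings(first[1:], second):
        yield [first[0]] + rest
    for rest in interleavings(first, second[1:]):
        yield [second[0]] + rest


def write(loc, val):
    def step(mem, reg):
        mem[loc] = val
    return step


def read_flag(mem, reg):
    reg["r"] = mem["F"]              # body of atomic_b


def guarded_read(loc, name):
    def step(mem, reg):
        if reg["r"]:                 # reads only happen on the "if r" branch
            reg[name] = mem[loc]
    return step


def write_w(mem, reg):
    if reg["r"]:
        mem["w"] = reg["x"] + reg["y1"] - reg["y2"]   # w := x + y - y


thread1 = [write("x", 1), write("y", 1), write("F", 1), write("z", 1)]
thread2 = [write("y", 2), read_flag, write("z", 2),
           guarded_read("x", "x"), guarded_read("y", "y1"),
           guarded_read("y", "y2"), write_w]

outcomes = set()
for schedule in interleavings(thread1, thread2):
    mem = {"x": 0, "y": 0, "z": 0, "F": 0, "w": 0}
    reg = {}
    for step in schedule:
        step(mem, reg)
    outcomes.add((reg["r"], mem["w"]))   # (value read for F, final value of w)
```

In every interleaving where the second thread reads 1 for F, the final value of w is 1, matching the conclusion drawn above for the concurrent execution.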

The use of L in the definitions serves as an obvious spatial bound on races. The temporal bounds are less direct: By semantic fiat, future races can be ignored, since reads cannot see the future. By L-stability, past races are also excluded.

From D to T. Locations used to store data are often disjoint from locations used to perform synchronization. In TRF, a single location may serve both purposes. This is the chief difficulty in extending LDRF to LTRF. Consider the program x := 1; atomic { x := 2 } || atomic { r := x } with executions:

(1)  $a:Wx1 \longrightarrow b:Wx2$ , with  $a \xrightarrow{wr} c:Rx1$

(2)  $a':Wx1 \longrightarrow b':Wx2$ , with  $b' \xrightarrow{wr} c':Rx2$

(3)  $c:Wx1 \longrightarrow b':Wx2$

Since  $\xrightarrow{wr}$  only creates happens-before order between committed transactions, there is a race in execution (1) but not (2). Consider the linearizations in which the read occurs last in the trace. We analyze by setting  $L = \{x\}$ . In trace *abc*, *c* is not *L*-sequential, whereas in *a'b'c'*, *c'* is *L*-sequential. In the SC-DRF theorem of [9], it is required that whenever there is a nonsequential racy read at the end of a trace, such as *c*, we must be able to find a trace with a sequential read, such as *c'*, that preserves the race. But here, this is impossible.

Note, however, that *ac* is *L*-sequential and has an *L*-race. In generalizing the SC-DRF theorem of [9] to mixed accesses, we must consider such prefixes. When transactional and plain accesses are disjoint this is not necessary, since well-formedness already guarantees sequential order between transactions. But well-formedness does not constrain interactions between transactional and plain accesses.

Intuitively, [9] proves that data races can be discovered by sequential reasoning. In the case of transactions, this is not enough. We must also have that all data races can be discovered by *executing transactions one-at-a-time*. To achieve this, we generalize the theorem to allow *permutations* that preserve order while ensuring that all actions of a transaction are *contiguous* in the trace.

*L-Races.* Two actions are in *L-conflict* if they both access the same  $x \in L$ , at least one is plain, at least one is a write, and neither is aborted.

We say that (b, c) is an *L*-race if *b* and *c* are in *L*-conflict and  $b \xrightarrow{index} c$ , but not  $b \xrightarrow{hb} c$ . Two transactional actions cannot be in a race.
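The two definitions translate directly into a small checker. This is a sketch under stated assumptions: the `Action` record and its field names are hypothetical, begin/commit markers are omitted, and $\xrightarrow{hb}$ is supplied as an already transitively closed set of pairs rather than derived from the axioms.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Action:
    name: str
    kind: str              # "R" or "W"
    loc: str
    plain: bool = True     # False for a transactional action
    aborted: bool = False


def l_conflict(b, c, L):
    """Same location in L, at least one plain, at least one write, neither aborted."""
    return (b.loc == c.loc and b.loc in L
            and (b.plain or c.plain)
            and "W" in (b.kind, c.kind)
            and not (b.aborted or c.aborted))


def l_races(trace, hb, L):
    """trace is in index order; hb is a set of (name, name) pairs."""
    return [(b.name, c.name) for b, c in combinations(trace, 2)
            if l_conflict(b, c, L) and (b.name, c.name) not in hb]


# The trace sigma from the example above, without the B and C markers.
trace = [
    Action("Wx1", "W", "x"),
    Action("Wy1", "W", "y"),
    Action("WF1", "W", "F", plain=False),
    Action("Wy2", "W", "y"),
    Action("RF1", "R", "F", plain=False),
]
hb = {("Wx1", "Wy1"), ("Wx1", "WF1"), ("Wy1", "WF1"), ("Wy2", "RF1"),
      ("WF1", "RF1"), ("Wx1", "RF1"), ("Wy1", "RF1")}
races = l_races(trace, hb, L={"x", "y", "F"})
```

Because `combinations` pairs each action only with later ones, the check is directional: a pair is reported only when the earlier action fails to happen-before the later one. The two accesses to F are both transactional, so they are never in conflict.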

In global DRF, conflicting actions must be ordered by  $\xrightarrow{hb}$ ; local DRF additionally constrains the direction of the order. This captures one form of temporal locality: future actions cannot causally affect the past.

*L*-Sequentiality and *L*-Stability. For  $L \subseteq Loc$ , we say that *c* is *L*-sequential if *c* does not touch any location in *L*, or if *c* is a B, C, or A action, or if we have both of the following:

1. there is no  $b \xrightarrow{\text{index}} c$  such that  $c \xrightarrow{ww} b$ , and
2. if  $a \xrightarrow{wr} c$  then there is no  $b \xrightarrow{\text{index}} c$  such that  $a \xrightarrow{ww} b$ .

Condition (1) applies when c is a write; it ensures that the timestamp chosen for *c* is larger than all preceding timestamps. Condition (2) applies when c is a read; it ensures that *c* reads from the preceding write with the largest timestamp.
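The timestamp reading of the two conditions can be sketched as follows. The encoding is an assumption made for illustration: each write carries its own timestamp, each read carries the timestamp of the write it reads from, and $\xrightarrow{ww}$ on a location is taken to be timestamp order.

```python
def is_l_sequential(trace, i, L):
    """Check whether trace[i] is L-sequential.

    Each entry is a dict with key "kind" in {"R", "W", "B", "C", "A"};
    reads and writes also carry "loc" and a timestamp "ts".
    """
    c = trace[i]
    if c["kind"] in ("B", "C", "A") or c["loc"] not in L:
        return True
    earlier = [b["ts"] for b in trace[:i]
               if b["kind"] == "W" and b["loc"] == c["loc"]]
    if c["kind"] == "W":
        # Condition (1): no preceding write is ww-after c.
        return all(ts < c["ts"] for ts in earlier)
    # Condition (2): no preceding write is ww-after the write that c reads from.
    return all(ts <= c["ts"] for ts in earlier)


# x := 1; atomic { x := 2 } || atomic { r := x }, with the read placed last.
abc = [{"kind": "W", "loc": "x", "ts": 1},
       {"kind": "W", "loc": "x", "ts": 2},
       {"kind": "R", "loc": "x", "ts": 1}]      # c reads the stale write
abc_prime = abc[:2] + [{"kind": "R", "loc": "x", "ts": 2}]

stale_read_ok = is_l_sequential(abc, 2, {"x"})
fresh_read_ok = is_l_sequential(abc_prime, 2, {"x"})
```

With this encoding, a read of the older write at the end of the trace is *L*-weak, whereas a read of the newest preceding write is *L*-sequential.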

An action that is not *L*-sequential is *L*-weak. Any *L*-weak action participates in an L-race: for writes, this follows from COHERENCE; for reads, from OBSERVATION.

Let  $\Sigma$  be a set of traces. A trace  $\sigma$  is *L*-stable for  $\Sigma$  if for every *L*-sequential  $\rho$  such that  $\sigma \rho \in \Sigma$ , there is no  $a \in \sigma$  and  $b \in \rho$  such that (a, b) is an *L*-race.

Transactional L-Sequentiality and L-Stability. Transaction b is contiguous if  $\langle b:sB \rangle \xrightarrow{index} \langle c:t \rangle$  and  $s \neq t$  imply that either  $\langle Cb \rangle \xrightarrow{index} c$ ,  $\langle Ab \rangle \xrightarrow{index} c$ , or there are no actions of s after c, i.e.,  $c \xrightarrow{index} \langle d:s' \rangle$  implies  $s \neq s'$ .

Note that contiguity allows multiple live transactions.
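Contiguity can be checked by a direct transcription of the definition. In this sketch the trace encoding (triples of thread, kind, and transaction name) is a hypothetical choice made for illustration.

```python
def is_contiguous(trace, txn):
    """trace: list of (thread, kind, txn_name); kind is "B", "C", "A", "R" or "W"."""
    begin = next(i for i, (_, kind, t) in enumerate(trace)
                 if kind == "B" and t == txn)
    thread = trace[begin][0]
    resolved = next((i for i, (_, kind, t) in enumerate(trace)
                     if kind in ("C", "A") and t == txn), None)
    for j in range(begin + 1, len(trace)):
        if trace[j][0] == thread:
            continue                  # only actions of other threads matter
        if resolved is not None and resolved < j:
            continue                  # txn committed or aborted before this action
        if any(trace[k][0] == thread for k in range(j + 1, len(trace))):
            return False              # the owning thread resumes after the interruption
    return True


interrupted = [("s", "B", "b"), ("s", "W", "b"), ("t", "W", None), ("s", "C", "b")]
two_live = [("s", "B", "b"), ("s", "W", "b"), ("t", "B", "c"), ("t", "R", "c")]
```

The trace `two_live` contains two unresolved transactions, yet both are contiguous, since the first thread takes no further steps once the second begins.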

A trace is transactionally L-sequential if every action is *L*-sequential and every transaction is contiguous.

A trace  $\sigma$  is *transactionally L*-*stable* for  $\Sigma$  if it is *L*-stable for  $\Sigma$ , every transaction is both contiguous and resolved, and there is no  $\beta \in \sigma$ ,  $\sigma \rho \in \Sigma$ , and  $\alpha \in \rho$  such that  $\alpha$  touches a variable in *L* and  $\alpha \xrightarrow{\text{xrw}} \beta$ .

The last condition ensures that a stable state is "future proof" by making all new conflicting transactions serialize afterwards.

Closure Conditions on Programs. The SC-LTRF theorem requires that we relate an arbitrary execution to one that is transactionally L-sequential. To ensure that such an execution exists, we assume that the semantics of programs is closed under certain operations.

We first give some preliminary definitions.

Let  $\stackrel{\text{act}}{\sim}$  relate actions with the same thread and location:

$$\langle a:s W x v q \rangle \stackrel{\text{act}}{\sim} \langle a':s' W x' v' q' \rangle$$
 if  $a = a', s = s'$  and  $x = x'$   
 $\langle a:s R x v q \rangle \stackrel{\text{act}}{\sim} \langle a':s' R x' v' q' \rangle$  if  $a = a', s = s'$  and  $x = x'$ 

A set  $\Sigma$  of traces is *sequentially-closed* if whenever a trace  $\sigma \alpha \in \Sigma$  includes a *Loc*-weak action  $\alpha$ , there exists a *Loc*-sequential action  $\alpha' \stackrel{\text{act}}{\sim} \alpha$  such that  $\sigma \alpha' \in \Sigma$ .

For  $a \in \sigma$ , let  $\sigma \downarrow a$  be the subsequence of  $\sigma$  obtained by removing all the events that causally follow *a*:

$$b \notin (\sigma \downarrow a) \text{ iff } a \mathrel{(\xrightarrow{\text{hb}} \cup \xrightarrow{\text{lwr}} \cup \xrightarrow{\text{xrw}})^{+}} b$$

We say that a set of traces  $\Sigma$  is *causally closed* iff for any  $\sigma \in \Sigma$ , for any  $a \in \sigma$ ,  $\sigma \downarrow a \in \Sigma$ .

Intuitively,  $\sigma \downarrow a$  removes the "causal upclosure" of *a* from  $\sigma$ . Significantly, if  $(b, \alpha)$  is an *L*-race in  $\sigma \alpha$ , then  $b \in \sigma \alpha \downarrow \alpha$ . This property does not hold for the "causal downclosure."
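The operation $\sigma \downarrow a$ amounts to deleting everything reachable from *a*. A minimal sketch, assuming that the union $\xrightarrow{hb} \cup \xrightarrow{lwr} \cup \xrightarrow{xrw}$ is supplied as a set of generating edges (the function name `causal_cut` and the action names are hypothetical):

```python
def causal_cut(trace, edges, a):
    """Return trace without the events reachable from a by one or more edges."""
    succ = {}
    for src, dst in edges:
        succ.setdefault(src, set()).add(dst)
    reached, stack = set(), list(succ.get(a, ()))
    while stack:
        node = stack.pop()
        if node not in reached:
            reached.add(node)
            stack.extend(succ.get(node, ()))
    # a itself is kept unless it lies on a cycle, mirroring the transitive closure.
    return [event for event in trace if event not in reached]


sigma = ["Wx1", "Wy1", "WF1", "Wy2", "RF1"]
edges = {("Wx1", "Wy1"), ("Wy1", "WF1"), ("Wy2", "RF1"), ("WF1", "RF1")}
cut_at_wy1 = causal_cut(sigma, edges, "Wy1")
cut_at_rf1 = causal_cut(sigma, edges, "RF1")
```

Cutting at the last action removes nothing, so any earlier action that races with it survives the cut, as the remark above requires.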

For any consistent trace  $\sigma$ , we say that  $\rho$  is an order-preserving permutation of  $\sigma$  if  $\rho$  is a well-formed permutation of  $\sigma$  and  $\xrightarrow{po}_{\rho} = \xrightarrow{po}_{\sigma}$ .

If a trace is consistent, then any order-preserving permutation is also consistent, since the derived orders coincide. In addition, any consistent trace has an order-preserving permutation with contiguous transactions.

We say that  $\Sigma$  is valid as the *semantics of a program* if (1) every  $\sigma \in \Sigma$  is consistent, (2)  $\Sigma$  is sequentially closed, (3)  $\Sigma$  is causally closed, and (4)  $\Sigma$  is closed under order-preserving permutation.

**Theorem 4.1** (SC-LTRF). Fix  $\Sigma$  to be the semantics of a program. Fix  $\sigma \rho \alpha \in \Sigma$  such that

- $\sigma$  is transactionally *L*-stable,
- $\rho$  is transactionally L-sequential in  $\sigma \rho$ ,
- $\rho$  has no L-races in  $\sigma \rho$ , and
- $\alpha$  is L-weak in  $\sigma \rho \alpha$ .

Then, there are  $b \in \rho$ ,  $\alpha' \stackrel{\text{act}}{\sim} \alpha$  and  $\sigma \rho' \alpha' \in \Sigma$  such that:

- $\rho' \alpha'$  is transactionally L-sequential in  $\sigma \rho' \alpha'$ , and
- $(b, \alpha')$  is an L-race in  $\sigma \rho' \alpha'$ .

With respect to the SC-LDRF theorem in [9], the SC-LTRF result differs in that we allow  $\rho' \neq \rho$  and use the *transactional* variants of *L*-stability and *L*-sequentiality, which require that we only consider traces with contiguous transactions. In an *L*-stable trace, all transactions must also be resolved. In the degenerate case, with only contiguous committed singleton transactions, the definitions of SC-LDRF and SC-LTRF coincide.

For example, consider the (IRIW) program from the introduction. Reasoning sequentially, we know that we cannot read 1 followed by 0 in both threads. SC-LDRF validates this reasoning for concurrent executions. Likewise, the publication and privatization examples from the introduction have the expected behavior. As a further example in this vein, consider the following program.

$$\begin{array}{l} \text{atomic}_a \{ \text{if } !y \text{ then while } x \text{ do skip} \} \\ \| \text{ atomic}_b \{ y \coloneqq 1 \}; x \coloneqq 1 \end{array}$$

If it is possible for a to read 0 for y and then 1 for x, then a becomes a *doomed transaction*, which can never commit. By sequential reasoning, this is impossible, and therefore, by SC-LTRF, it is impossible in our model.
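The sequential argument can be spelled out by enumerating the three positions at which transaction *a* may execute atomically. This is a sketch with hypothetical step names, in which *a* is modeled as a single step that records the values it reads.

```python
def run(order):
    """Execute the steps "a", "b", "x" in the given order; return a's reads (y, x)."""
    mem = {"x": 0, "y": 0}
    reads = None
    for step in order:
        if step == "a":
            y = mem["y"]
            # a only inspects x on the branch where it read 0 for y.
            reads = (y, mem["x"]) if y == 0 else (y, None)
        elif step == "b":
            mem["y"] = 1              # atomic_b { y := 1 }
        else:
            mem["x"] = 1              # the plain write x := 1
    return reads


# Program order of the second thread places b before x := 1.
orders = [("a", "b", "x"), ("b", "a", "x"), ("b", "x", "a")]
observed = {run(order) for order in orders}
```

No order yields the pair (0, 1), which is the combination of reads that would doom the transaction.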

It is worth emphasizing that the SC-LTRF theorem includes aborted and live transactions, and thus guarantees opacity. In addition, the following result shows that aborted transactions can be ignored.

**Theorem 4.2.** If  $\sigma$  is consistent then so is  $\sigma$  with aborted transactions removed.

## 5 Implementation Model

An optimization is *valid* as long as it creates no new behaviors. As noted in §2, LDRF prevents reads from being reordered with later writes. Thus we cannot transform r:=z; x:=1 to x:=1; r:=z. Unfortunately, the reverse transformation *also* fails in our programmer model, due to the order created by HB<sub>ww</sub>. Consider the following variant of privatization:

$$\begin{array}{l} z \coloneqq 1; \text{atomic}_a \{ \text{if } !y \text{ then } x \coloneqq 1 \} \\ \| \text{ atomic}_b \{ y \coloneqq 1 \}; x \coloneqq 2; r \coloneqq z \end{array}$$

The second thread must read  $\langle Rz1 \rangle$ . If not, we would obtain the following execution, which is disallowed by OBSERVATION.

$$\begin{array}{l} Wz1 \longrightarrow (a: Ry0 \longrightarrow Wx1) \\ (b: Wy1) \longrightarrow Wx2 \longrightarrow Rz0 \\ \text{with } a \xrightarrow{crw} b \text{ and } Wx1 \xrightarrow{ww} Wx2 \end{array} \qquad (\ddagger)$$

Note that  $\langle Wx2 \rangle \xrightarrow{ww} \langle Wx1 \rangle$  is ruled out by ANTI<sub>ww</sub>, and so we must have  $\langle Wx1 \rangle \xrightarrow{ww} \langle Wx2 \rangle$ , as shown. By HB<sub>ww</sub>, we have  $\langle Wx1 \rangle \xrightarrow{hb} \langle Wx2 \rangle$ , and thus by transitivity,  $\langle Wz1 \rangle \xrightarrow{hb} \langle Rz0 \rangle$ . OBSERVATION rules out the execution, since  $\langle Rz0 \rangle \xrightarrow{rw} \langle Wz1 \rangle$ .

However if we replace " $x \coloneqq 2$ ;  $r \coloneqq z$ " by " $r \coloneqq z$ ;  $x \coloneqq 2$ " in the program above, then the second thread may read  $\langle Rz0 \rangle$ , since we no longer have  $\langle Wz1 \rangle \xrightarrow{hb} \langle Rz0 \rangle$ . The resulting allowed execution shows that the optimization is not valid:

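$$\begin{array}{l} Wz1 \longrightarrow (a: Ry0 \longrightarrow Wx1) \\ (b: Wy1) \longrightarrow Rz0 \longrightarrow Wx2 \\ \text{with } a \xrightarrow{crw} b,\ Rz0 \xrightarrow{rw} Wz1 \text{ and } Wx1 \xrightarrow{ww} Wx2 \end{array}$$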

In this section, we consider an "implementation" model that removes HB<sub>ww</sub>. Since HB<sub>ww</sub> is designed to allow non-racy privatization, it should not be surprising that privatization is racy in the implementation model. To enable the removal of such races, we add the new action  $\langle sQx \rangle$  to model a *quiescence fence* [36] for thread *s* on location *x*.

Note that our implementation model is still fairly abstract. We assume that the underlying transactional machinery provides order between transactions that have a direct dependency, as in the publication idiom. Quiescence fences are necessary only to provide order when there is no direct dependency, as in the privatization idiom.

$$\alpha := \cdots | \langle a:sQx \rangle \qquad (\text{Quiescence fence})$$

A quiescence fence  $\langle Qx \rangle$  may not be interleaved with a transaction that touches *x*. We therefore add the following requirement to *well-formedness*:

WF<sub>12</sub>. If  $\langle b:B \rangle \xrightarrow{\text{index}} \langle Qx \rangle$  then either  $\langle Cb \rangle \xrightarrow{\text{index}} \langle Qx \rangle$ ,  $\langle Ab \rangle \xrightarrow{\text{index}} \langle Qx \rangle$  or *b* neither reads nor writes *x*.

In addition, quiescence fences create order. In the definition of *happens-before*, we replace HB<sub>ww</sub> by the following.

$$\langle a:Cb \rangle \xrightarrow{\text{hb}} \langle c:Qx \rangle \text{ if } a \xrightarrow{\text{index}} c \text{ and } b \text{ touches } x \quad (\text{HB}_{CQ})$$
$$\langle c:Qx \rangle \xrightarrow{\text{hb}} \langle b:B \rangle \text{ if } c \xrightarrow{\text{index}} b \text{ and } b \text{ touches } x \quad (\text{HB}_{QB})$$

Because we have removed  $HB_{ww}$ , we also drop  $ANTI_{ww}$  from the definition of a *consistent* execution. The remaining definitions are unchanged in the implementation model.
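The order contributed by fences can be computed from a trace in one pass over pairs. The sketch below assumes a hypothetical trace encoding (triples of identifier, kind, and argument) and a table `touches` mapping each transaction to the locations it reads or writes; the placement of the fence in the privatization example is chosen for illustration.

```python
def fence_hb_edges(trace, touches):
    """trace: list of (id, kind, arg).

    kind "B" or "C" takes a transaction name as arg; kind "Q" takes a location.
    touches: dict from transaction name to the set of locations it touches.
    """
    edges = set()
    for i, (id_i, kind_i, arg_i) in enumerate(trace):
        for id_j, kind_j, arg_j in trace[i + 1:]:
            if kind_i == "C" and kind_j == "Q" and arg_j in touches[arg_i]:
                edges.add((id_i, id_j))       # HB_CQ: commit before a later fence
            if kind_i == "Q" and kind_j == "B" and arg_i in touches[arg_j]:
                edges.add((id_i, id_j))       # HB_QB: fence before a later begin
    return edges


# atomic_a { if !y then x := 1 } || atomic_b { y := 1 }; Qx; x := 2
privatization = [("Ba", "B", "a"), ("Ca", "C", "a"),
                 ("Bb", "B", "b"), ("Cb", "C", "b"),
                 ("Q", "Q", "x"), ("Wx2", "W", "x")]
touches = {"a": {"x", "y"}, "b": {"y"}}
edges = fence_hb_edges(privatization, touches)
```

Only the commit of *a* is ordered before the fence, since *b* does not touch *x*; the plain write after the fence then follows *a* by program order and transitivity.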

**Relating implementation and programmer models.** The implementation model allows executions that are not allowed by the programmer model. Since  $ANTI_{ww}$  is removed, Example 2.2 is allowed in the implementation model; however, there is no matching execution in the programmer model: If *a* precedes *b*, then the read of *a* is invalidated by OBSERVATION. If *b* precedes *a*, the write-to-write order is invalidated by COHERENCE. Since HB<sub>ww</sub> is removed, ( $\ddagger$ ) is allowed in the implementation model; however, there is no matching execution in the programmer model: If *a* precedes *b*, then the read of *z* is invalidated by OBSERVATION. If *b* precedes *a*, then the read of *y* is invalidated by WF<sub>10</sub>.

We say that  $\sigma$  has a *mixed race* if there is some  $L \subseteq Loc$  such that  $\sigma$  includes an action in an *L*-race between a transactional write and a plain write.

The following lemma establishes that the implementation and programmer models coincide for programs without mixed races. Therefore, for *mixed-race free* programs in the implementation model, SC-LTRF holds. Khyzha et al. [22] establish a similar result for global TRF.

**Lemma 5.1.** Let  $\sigma$  be an execution in the implementation model without mixed races. Let  $\rho$  be the induced execution in the programmer model obtained by dropping all the quiescence fences in  $\sigma$ . If  $\sigma$  is consistent, then so is  $\rho$ .

**Suborders.** The quiescence fence  $\langle Qx \rangle$  has the same ordering properties as a committed transaction that writes *x*:  $\langle a:B \rangle \langle Qx \rangle \langle Ca \rangle$ . For the purpose of studying compiler optimizations, we therefore encode quiescence fences as writing transactions. With this convention, we do not mention  $\langle Qx \rangle$  explicitly in the following development. The treatment follows [9] closely, including much of the notation and proofs. We need only adapt their definitions to work up to  $\overset{\text{tx}}{\sim}$ .

Let  $TAct = \{\langle B \rangle, \langle C \rangle, \langle A \rangle\}$ . Define the following subsets of  $\xrightarrow{po} \setminus (Act \times TAct \cup TAct \times Act)$ , i.e., the portion of  $\xrightarrow{po}$  that does not involve the transactional boundaries. In the following definitions, we quantify universally over  $a, b \in Act \setminus TAct$ ; all other actions are quantified existentially.

We say action *a conflicts with b* iff they access the same location and at least one of *a* or *b* is a write.

$$\begin{array}{l} a \xrightarrow{\text{po-T}} b \text{ iff } a \xrightarrow{\text{po}} b,\ a \overset{\text{tx}}{\not\sim} b,\ b \overset{\text{tx}}{\sim} \langle B \rangle, \text{ and } b \overset{\text{tx}}{\sim} \langle W \rangle \\ a \xrightarrow{\text{poT-}} b \text{ iff } a \xrightarrow{\text{po}} b,\ a \overset{\text{tx}}{\not\sim} b, \text{ and } a \overset{\text{tx}}{\sim} \langle B \rangle \\ a \xrightarrow{\text{poTT}} b \text{ iff } a \xrightarrow{\text{po}} b,\ a \xrightarrow{\text{poT-}} b \text{ and } a \xrightarrow{\text{po-T}} b \\ a \xrightarrow{\text{poRW}} b \text{ iff } a \xrightarrow{\text{po}} b,\ a = \langle R \rangle, \text{ and } b = \langle W \rangle \\ a \xrightarrow{\text{poCon}} b \text{ iff } a \xrightarrow{\text{po}} b \text{ and } a \text{ conflicts with } b \end{array}$$

The relations  $\xrightarrow{po-T}$ ,  $\xrightarrow{poT-}$ , and  $\xrightarrow{poTT}$  do not relate actions from the same transaction.  $\xrightarrow{po-T}$  is that subset of  $\xrightarrow{po}$  that ends in a transactional action of a writing transaction;  $\xrightarrow{poT-}$  is the subset of  $\xrightarrow{po}$  that begins in a resolved transactional action; whereas  $\xrightarrow{poTT}$  is the subset of  $\xrightarrow{po}$  that begins and ends in transactional actions, with the target being a writing transaction. The targets of relations  $\xrightarrow{poTT}$  and  $\xrightarrow{po-T}$  are restricted to transactions that contain a write action; this restriction mirrors the treatment of read actions of volatiles in [9] and ensures that read-only transactions have greater flexibility in commuting earlier in program order.  $\xrightarrow{poRW}$  is that subset of  $\xrightarrow{po}$  that relates a read to a later write, and  $\xrightarrow{poCon}$  restricts  $\xrightarrow{po}$  to conflicting actions.

In the supplementary material for this paper, we describe an equivalent definition of consistency that uses only these suborders instead of the full  $\xrightarrow{\text{po}}$ . This characterization of consistency is useful for proving the correctness of the optimizations enumerated in the next subsection.

**Compiler optimizations.** Consider a program transformation  $P \triangleright Q$ , where Q is a program obtained from P by reordering its statements. To validate the transformation, for any execution  $\rho$  of Q, we must associate a corresponding execution  $\sigma$  of P. We consider three flavors.

In the first method, the transformation is correct if there is no change in transactional actions, and

$$\begin{array}{c} (\xrightarrow{\text{po-T}}_{\sigma}, \xrightarrow{\text{poT-}}_{\sigma}, \xrightarrow{\text{poTT}}_{\sigma}, \xrightarrow{\text{poRW}}_{\sigma}, \xrightarrow{\text{poCon}}_{\sigma}) \\ = (\xrightarrow{\text{po-T}}_{\rho}, \xrightarrow{\text{poT-}}_{\rho}, \xrightarrow{\text{poTT}}_{\rho}, \xrightarrow{\text{poRW}}_{\rho}, \xrightarrow{\text{poCon}}_{\rho}) \end{array}$$

This allows, for example, the reordering of independent writes and of independent reads. Dolan et al. [9] show how to prove the validity of some peephole optimizations using this flexibility: redundant load, store forwarding, dead store elimination, common subexpression elimination, constant propagation and loop invariant code motion. We show that:

$$P; \text{ atomic } \{Q\} \triangleright \text{ atomic } \{Q\}; P$$

if *Q* is read-only, *P* is write-only and there are no conflicts between *P* and *Q*. For correctness, note that the  $\xrightarrow{\text{poTT}}$  and  $\xrightarrow{\text{po-T}}$  relations do not target read-only transactions. The absence of conflict between *P* and *Q* ensures the preservation of  $\xrightarrow{\text{poCon}}$ . Moreover,  $\xrightarrow{\text{poRW}}$  is preserved because *P* is write-only.

Secondly, we validate transformations, such as the roach motel optimization, where the only change is an increase in the scope of transactions; i.e., when *P* and *Q* are nontransactional:

$$P; \text{ atomic } \{R\}; Q \triangleright \text{ atomic } \{P; R; Q\}.$$

Given  $\rho$  from atomic { *P*; *R*; *Q* }, we establish the consistency of the corresponding  $\sigma$  from *P*; atomic { *R* }; *Q* by showing that all relevant orders of  $\sigma$  are contained in those of  $\rho$ .

Thirdly, we validate the fusion of adjacent transactions:

$$\text{atomic } \{P\}; \text{ atomic } \{Q\} \triangleright \text{ atomic } \{P; Q\}$$

Given  $\rho$  from atomic { *P*; *Q* }, we build  $\sigma$  for atomic { *P* }; atomic { *Q* } by adding two adjacent transactional events. On the other hand, the converse transformation is not validated. This is because we need to remove the two extra events to build a witness execution of atomic { *P*; *Q* } from a given execution of atomic { *P* }; atomic { *Q* }. These events are not necessarily adjacent; so, the validity of the constructed execution cannot be established in general.

We can similarly establish that empty transactions can be elided, i.e.,

$$P; \text{ atomic } \{\}; Q \triangleright P; Q.$$
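The syntactic side of the scope-changing rewrites above can be sketched in a few lines of Python. The representation (a program as a list of `Stmt` and `Atomic` nodes) and the function names `elide_empty`, `fuse_adjacent` and `widen` are assumptions made for illustration and do not come from the paper. The sketch only performs the rewrites; it does not check the semantic conditions that justify them.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Stmt:
    """A plain (nontransactional) statement, identified by a label (hypothetical encoding)."""
    label: str


@dataclass(frozen=True)
class Atomic:
    """A transaction whose body is a sequence of plain statements (hypothetical encoding)."""
    body: Tuple[Stmt, ...]


Node = Union[Stmt, Atomic]


def elide_empty(prog: List[Node]) -> List[Node]:
    """P; atomic {}; Q  |>  P; Q: drop transactions with an empty body."""
    return [n for n in prog if not (isinstance(n, Atomic) and not n.body)]


def fuse_adjacent(prog: List[Node]) -> List[Node]:
    """atomic {P}; atomic {Q}  |>  atomic {P; Q}: merge runs of adjacent transactions."""
    out: List[Node] = []
    for n in prog:
        if isinstance(n, Atomic) and out and isinstance(out[-1], Atomic):
            out[-1] = Atomic(out[-1].body + n.body)
        else:
            out.append(n)
    return out


def widen(prog: List[Node], i: int) -> List[Node]:
    """P; atomic {R}; Q  |>  atomic {P; R; Q}: absorb the plain neighbours of prog[i]."""
    if not isinstance(prog[i], Atomic):
        raise TypeError("prog[i] must be a transaction")
    lo = i
    while lo > 0 and isinstance(prog[lo - 1], Stmt):
        lo -= 1
    hi = i
    while hi + 1 < len(prog) and isinstance(prog[hi + 1], Stmt):
        hi += 1
    body = tuple(prog[lo:i]) + prog[i].body + tuple(prog[i + 1:hi + 1])
    return prog[:lo] + [Atomic(body)] + prog[hi + 1:]
```

Each function applies its rewrite only in the validated direction; in particular, there is deliberately no function that splits a transaction in two.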

## 6 Compilation

Dolan et al. [9] show that the LDRF memory model can be compiled efficiently to both x86-TSO and AArch64/ARMv8.

Compilation of LDRF to x86-TSO requires no additional fencing. Therefore non-volatile reads/writes execute with native performance.

Because ARMv8 allows load buffering (which is disallowed by LDRF), compilation to ARMv8 requires some fencing, even for non-volatile reads/writes. [9] discusses two compilation schemes and studies their performance on several benchmarks with differing patterns of access. The performance penalty is 2.5% for one compilation strategy and 0.6% for the other. These results demonstrate that non-volatile access is not appreciably slowed by the insertion of fences to prevent load buffering.

The compilation results for plain variables carry over to our model, which differs from [9] primarily in the style of synchronization: [9] uses volatile variables, whereas we use transactions. In both x86-TSO and ARMv8 models, there are fences before and after successful transactions (see [6]), making the fencing behavior similar to that of volatile variables.

Both x86-TSO and ARMv8 validate our *implementation* model, assuming we include fences to prevent load-buffering in ARMv8, as described above.

In x86-TSO, the $\xrightarrow{\text{crw}}$ order is included in $\xrightarrow{\text{hb}}$. Thus, it is straightforward to establish that x86-TSO validates even the strongest variant of our *programmer* model, which includes HB<sub>ww</sub>, HB<sub>Rw</sub>, HB<sub>WR</sub> and their prime variants. Like our programmer model, x86-TSO validates privatization (Example 2.1). Like models that include ANTI'<sub>RW</sub>, x86-TSO imposes publication by antidependence (Example 3.1). Neither of these examples requires quiescent fences on x86-TSO.

It is not immediately obvious whether ARMv8 realizes our *programmer* model. In ARMv8,  $\stackrel{ob}{\longrightarrow}$  plays the role of  $\stackrel{hb}{\longrightarrow}$ . The  $\stackrel{crw}{\longrightarrow}$  relation is included in  $\stackrel{ob}{\longrightarrow}$  when the source and target come from different threads, known as *external from-read*. As a result, ARMv8 gives the same strong result as x86-TSO for Examples 2.1 and 3.1.

We expect that software transactional memories will realize the *implementation* model of §5, rather than the *programmer* model. As a result, it will be necessary for either the programmer or compiler to insert quiescent fences in order to realize our *programmer* model. Our results provide a correctness criterion: when are there sufficient fences to guarantee the absence of data races in the *implementation* model. As we discuss in §7, our work on placing quiescent fences is compatible with, and builds on, the extensive literature exploring this topic.

## 7 Related Work and Conclusions

Transactions [12, 18, 33] are motivated by the issues that arise with lock-based programming. See [14, 16, 17, 23] for textbook-style presentations. Hardware transactional models that integrate with relaxed memory are available for Pentium, Power and ARMv8 (in design) [5, 6, 10]. Software transactional memory achieves transactional guarantees without the limitations of the "bounded" and "best-effort" hardware transactional model, e.g., the C++ design of transactions [29] in C11 [4], Haskell transactions in GHC 6.4, experimental designs for Java [20] and C# [2].

Inspired by Dalessandro et al. [7] and Grossman et al. [13], we use memory orders to integrate transactions into the relaxed memory model of Dolan et al. [9].

In order to permit compiler optimizations, the LDRF model of [9] is more liberal than sequential consistency. Yet it eschews the speculative reads found in many models [19, 21, 25]. There is a rich design space for such "intermediate" models. Ou and Demsky [30] include a survey of this work.

Transactional sequential consistency is similar to the strong semantics [1], StrongBasic semantics [28], strong isolation [17], and transactional memory with store atomicity [24]. Opacity [15, 16] and *TMS2* [8] treat aborted transactions in this context (see [11] for a survey).

Our model of SC-TDRF replaces the global real-time order by memory orders. We exploit the LDRF framework [9] to achieve a modular form of LTRF that is insensitive to races that are spatially and temporally isolated from the transactions under consideration. LDRF is defined operationally in [9], using machine states. We give an axiomatic account. The two approaches are equivalent if every machine state is derivable from the initial state.

Our results in §5 show that our model does not suffer from "optimization obstruction" [35]. Prior work, e.g., [22, 34, 35], requires that programmers place quiescence fences in order to guarantee safe privatization. Our low level model illustrates the correctness criteria for such techniques.

In Spear et al. [35], transactions can optionally be marked with annotations corresponding to publishing/privatizing transactions. The weakest ordering $\xrightarrow{st_s}$ in [35] is the smallest transitive relation that includes transactional ordering and ensures that $a \xrightarrow{st_s} c$ in the cases when: (1) *a* is an acquire transaction, $a \xrightarrow{po} c$, and $a \xrightarrow{t} c$, or (2) there is some release transaction *b* such that $a \xrightarrow{po} b$ and either $b \xrightarrow{lwr} c$ or *a* is transactionally ordered before *c*. There are two kinds of fences in the implementation level model of §5, namely the explicit quiescence fences $\langle Qx \rangle$, and the implicit memory fences arising from our transactional abstraction. In each case, we can deduce $a \xrightarrow{st_s} c$, thus showing that our requirements for synchronization are no stronger than those of [35].

Our treatment of the implementation model is inspired by Khyzha et al. [22]. They divide actions into request/response pairs such that transactional response actions may abort. Our treatment is more abstract. We record all failed requests using a single abort action. Our commit action corresponds to the commit request in [22]. All of our other actions correspond to a response in [22].

## References

- [1] M. Abadi, A. Birrell, T. Harris, and M. Isard. 2011. Semantics of Transactional Memory and Automatic Mutual Exclusion. ACM Trans. Program. Lang. Syst. 33, 1, Article 2 (Jan. 2011), 50 pages.
- [2] M. Abadi, T. Harris, and M. Mehrara. 2009. Transactional memory with strong atomicity using off-the-shelf memory protection hardware. In *PPoPP*, D. A. Reed and V. Sarkar (Eds.). ACM, 185–196.
- [3] S. V. Adve and M. D. Hill. 1990. Weak Ordering-a New Definition. In Proceedings of the 17th Annual International Symposium on Computer Architecture (ISCA '90). ACM, New York, NY, USA, 2-14.
- [4] H.-J. Boehm and S. V. Adve. 2008. Foundations of the C++ concurrency memory model. In PLDI. ACM, 68–78.
- [5] H. W. Cain, M. M. Michael, B. Frey, C. May, D. Williams, and H. Q. Le. 2013. Robust architectural support for transactional memory in the Power architecture. In *ISCA*, A. Mendelson (Ed.). ACM, 225–236.
- [6] N. Chong, T. Sorensen, and J. Wickerson. 2018. The semantics of transactions and weak memory in x86, Power, ARM, and C++. In *PLDI*. ACM, 211–225.
- [7] L. Dalessandro, M. L. Scott, and M. F. Spear. 2010. Transactions As the Foundation of a Memory Consistency Model. In *DISC (LNCS)*. Springer-Verlag, Berlin, Heidelberg, 20–34.
- [8] S. Doherty, L. Groves, V. Luchangco, and M. Moir. 2013. Towards formally specifying and verifying transactional memory. *Formal Asp. Comput.* 25, 5 (2013), 769–799.
- [9] S. Dolan, K. C. Sivaramakrishnan, and A. Madhavapeddy. 2018. Bounding data races in space and time. In *PLDI*, Jeffrey S. Foster and Dan Grossman (Eds.). ACM, 242–255.
- [10] B. Dongol, R. Jagadeesan, and J. Riely. 2018. Transactions in relaxed memory architectures. *PACMPL* 2, POPL (2018), 18:1–18:29.
- [11] D. Dziuma, P. Fatourou, and E. Kanellou. 2015. Consistency for Transactional Memory Computing. Springer International Publishing, Cham, 3–31.
- [12] J. E. Gottschlich and H.-J. Boehm. 2013. Generic programming needs transactional memory. In *The 8th ACM SIGPLAN Workshop on Transactional Computing*.
- [13] D. Grossman, J. Manson, and W. Pugh. 2006. What do high-level memory models mean for transactions?. In *Memory System Performance and Correctness*. ACM, New York, NY, USA, 62–69.
- [14] D. Grossman, V. Menon, S. Srinivas, and C. Zilles. 2007. Transactional Memory in Managed Runtimes - Hardware/Software View. https://www.microarch.org/micro40
- [15] R. Guerraoui and M. Kapalka. 2008. On the Correctness of Transactional Memory. In PPoPP. ACM, New York, NY, USA, 175–184.
- [16] R. Guerraoui and M. Kapalka. 2010. Principles of Transactional Memory. Morgan & Claypool Publishers.
- [17] T. Harris, J. Larus, and R. Rajwar. 2010. Transactional Memory, 2nd edition. Morgan & Claypool Publishers.
- [18] M. Herlihy and J. E. B. Moss. 1993. Transactional Memory: Architectural Support for Lock-Free Data Structures. In *ISCA*, A. J. Smith (Ed.). ACM, 289–300.
- [19] R. Jagadeesan, C. Pitcher, and J. Riely. 2010. Generative Operational Semantics for Relaxed Memory Models. In ESOP. 307–326.
- [20] S. Jagannathan, J. Vitek, A. Welc, and A. Hosking. 2005. A transactional object calculus. *Science of Computer Programming* 57, 2 (2005), 164 – 186.
- [21] J. Kang, C.-K. Hur, O. Lahav, V. Vafeiadis, and D. Dreyer. 2017. A promising semantics for relaxed-memory concurrency. In *POPL*, Giuseppe Castagna and Andrew D. Gordon (Eds.). ACM, 175–189.
- [22] A. Khyzha, H. Attiya, A. Gotsman, and N. Rinetzky. 2018. Safe privatization in transactional memory. In *PPoPP*. ACM, 233–245.
- [23] J. Larus and C. Kozyrakis. 2008. Transactional Memory. Commun. ACM 51, 7 (July 2008), 80–88.
- [24] J.-W. Maessen and Arvind. 2007. Store Atomicity for Transactional Memory. Electronic Notes in Theoretical Computer Science 174, 9 (2007), 117–137. Proceedings of the Thread Verification Workshop (TV 2006).
- [25] J. Manson, W. Pugh, and S. V. Adve. 2005. The Java memory model. In POPL. 378–391.
- [26] M. Martin, C. Blundell, and E. Lewis. 2006. Subtleties of Transactional Memory Atomicity Semantics. *IEEE Comput. Archit. Lett.* 5, 2 (July 2006), 17–17.
- [27] V. Menon, S. Balensiefer, T. Shpeisman, A.-R. Adl-Tabatabai, R. L. Hudson, B. Saha, and A. Welc. 2008. Practical Weak-atomicity Semantics for Java STM. In SPAA. ACM, New York, NY, USA, 314–325.
- [28] K. F. Moore and D. Grossman. 2008. High-level small-step operational semantics for transactions. In *POPL*, G. C. Necula and P. Wadler (Eds.). ACM, 51–62.
- [29] Y. Ni, A. Welc, A.-R. Adl-Tabatabai, M. Bach, S. Berkowits, J. Cownie, R. Geva, S. Kozhukow, R. Narayanaswamy, J. Olivier, S. Preis, B. Saha, A. Tal, and X. Tian. 2008. Design and Implementation of Transactional Constructs for C/C++. *SIGPLAN Not.* 43, 10 (Oct. 2008), 195–212.
- [30] Peizhao Ou and Brian Demsky. 2018. Towards Understanding the Costs of Avoiding Out-of-thin-air Results. *Proc. ACM Program. Lang.* 2, OOPSLA, Article 136 (Oct. 2018), 29 pages.
- [31] W. Pugh. 1999. Fixing the Java Memory Model. In Proceedings of the ACM 1999 Conference on Java Grande (JAVA '99). ACM, New York, NY, USA, 89–98.
- [32] A. Raad, O. Lahav, and V. Vafeiadis. 2018. On Parallel Snapshot Isolation and Release/Acquire Consistency. In ESOP (Lecture Notes in Computer Science), Vol. 10801. Springer, 940–967.
- [33] N. Shavit and D. Touitou. 1995. Software Transactional Memory. In PODC. ACM, New York, NY, USA, 204–213.
- [34] T. Shpeisman, V. Menon, A.-R. Adl-Tabatabai, S. Balensiefer, D. Grossman, R. L. Hudson, K. F. Moore, and B. Saha. 2007. Enforcing isolation and ordering in STM. In *PLDI*, J. Ferrante and K. S. McKinley (Eds.). ACM, 78–88.
- [35] M. F. Spear, L. Dalessandro, V. J. Marathe, and M. L. Scott. 2008. Ordering-Based Semantics for Software Transactional Memory. In *PODS*, T. P. Baker, A. Bui, and S. Tixeuil (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 275–294.
- [36] M. F. Spear, V. J. Marathe, L. Dalessandro, and M. L. Scott. 2007. Privatization Techniques for Software Transactional Memory. In *PODC*. ACM, New York, NY, USA, 338–339.

# A Proof of SC-LTRF Theorem

We begin with an example to explain the last condition in the definition of transactional *L*-stability.

**Example A.1.** Recall the definition of transactional *L*-stability: A trace is *transactionally L-stable for* $\Sigma$ if it is *L*-stable for $\Sigma$, every transaction is both contiguous and resolved, and there are no $\sigma \rho \in \Sigma$, $\beta \in \sigma$, and $\alpha \in \rho$ such that $\alpha$ touches a variable in *L* and $\alpha \stackrel{\text{xrw}}{\longrightarrow} \beta$.

To see the need for the last requirement, consider the following consistent execution:

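$$\begin{array}{l} Wx1 \longrightarrow \boxed{a: Wx2} \\ \boxed{b: Rx1 \longrightarrow Wy1} \end{array}$$

(Boxes delimit transactions. The read $Rx1$ of *b* takes its value from $Wx1$, so it is anti-dependency ordered before the write $Wx2$ of *a*.)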

Take *L* = {*y*} and consider the decomposition in which $\sigma$ contains the top thread, $\rho$ contains the read of the bottom thread, and $\alpha$ is the write. Ignoring initialization, we have $\sigma = \langle sWx1 \rangle \langle a:sB \rangle \langle sWx2 \rangle \langle sCa \rangle$, $\rho = \langle b:tB \rangle \langle tRx1 \rangle$, and $\alpha = \langle tWy1 \rangle$.

This particular decomposition invalidates the theorem, since we must remove *a* from  $\sigma$  in order to linearize *b*, yet *a* occurs in  $\sigma$ .

The last requirement forbids this decomposition. In considering the trace where *a* occurs before *b*, we must include *a* in $\rho$, not $\sigma$.

In order to prove the theorem, we first establish several lemmas. The first two concern causal closure. Recall that  $\sigma \downarrow a$  is the subtrace of  $\sigma$  that discards all causal dependents of *a*, defined as:

$$b \notin (\sigma \downarrow a) \text{ iff } a \;(\xrightarrow{\text{hb}} \cup \xrightarrow{\text{lwr}} \cup \xrightarrow{\text{xrw}})^+\; b$$

In the rest of the appendix, we use the notation  $\sigma \downarrow \rho$  to stand for:

$$b \notin (\sigma \downarrow \rho) \text{ iff } (\forall a \in \rho)\; a \;(\xrightarrow{\text{hb}} \cup \xrightarrow{\text{lwr}} \cup \xrightarrow{\text{xrw}})^+\; b$$

It is immediate that  $\sigma \downarrow \rho$  is invariant under permutations of  $\rho$ .

Note that  $a \in \sigma \downarrow a$ . In the case where *a* is transactional, the effect of  $\sigma \downarrow a$  is to remove all the dependent transactions that read from the transaction, and also the anti-dependent transactions. Thus, for any transactional  $b \stackrel{\text{tx}}{\sim} a$  that is a read, there are no transactional conflicting writes in  $\sigma \downarrow a$  with a later timestamp.
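As a concrete reading of the definition of $\sigma \downarrow a$, the following Python sketch computes the subtrace by discarding every event reachable from *a* along the union of the three relations. The encoding (events as strings, relations as collections of pairs) and the name `causal_closure` are assumed here for illustration only.

```python
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]


def causal_closure(trace: List[str], a: str, *relations: Iterable[Edge]) -> List[str]:
    """Return the subtrace of `trace` without the causal dependents of `a`.

    An event b is dropped iff a R+ b, where R is the union of `relations`
    (standing in for hb, lwr and xrw in this hypothetical encoding).
    """
    succ: Dict[str, Set[str]] = {}
    for rel in relations:
        for src, dst in rel:
            succ.setdefault(src, set()).add(dst)

    # Depth-first search from a; `dependents` collects every b with a R+ b.
    dependents: Set[str] = set()
    stack = list(succ.get(a, ()))
    while stack:
        b = stack.pop()
        if b not in dependents:
            dependents.add(b)
            stack.extend(succ.get(b, ()))

    # Keep the original trace order for the surviving events.
    return [e for e in trace if e not in dependents]
```

When the union of the relations is acyclic, *a* is never reachable from itself, so *a* survives in the result.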

The first lemma shows that  $\sigma$  is included in  $\sigma \rho \alpha \downarrow \alpha$  whenever it is stable.

**Lemma A.2.** Suppose $\sigma$ is transactionally *L*-stable, $\rho$ is transactionally *L*-sequential in $\sigma\rho$, and $\alpha$ touches a location in *L*. Then $\sigma$ is a prefix of $\sigma\rho\alpha \downarrow \alpha$.

*Proof.* We show that for all  $b \in \sigma$ 

$$\neg(\alpha \;(\xrightarrow{\text{hb}} \cup \xrightarrow{\text{lwr}} \cup \xrightarrow{\text{xrw}})^+\; b)$$

It suffices to prove  $\forall c \in \sigma \rho \alpha$ , if  $a \in \sigma$  is such that

$$c \;(\xrightarrow{\text{hb}} \cup \xrightarrow{\text{lwr}} \cup \xrightarrow{\text{xrw}})\; a,$$

then  $c \in \sigma$ . We proceed by cases:

- $c \xrightarrow{\text{xrw}} a$ . By the assumption that  $\rho$  is transactionally *L*-sequential. (This requires the assumption that  $\alpha$  touches a location in *L*.)
- $c \xrightarrow{\text{lwr}} a$ . Since  $\sigma$  is a prefix of  $\sigma \rho \alpha$ , the result follows by WF<sub>8</sub>.
- $c \xrightarrow{\text{hb}} a$. There are two subcases.
  - $c \xrightarrow{\text{hb}} a$ by item HB<sub>BASE</sub>. The required result follows by WF<sub>9</sub>–WF<sub>11</sub>.
  - $c \xrightarrow{\text{hb}} a$ by item HB<sub>ww</sub>. The required result follows by transactional *L*-stability of $\sigma$. □

The next lemma establishes that causal closure preserves transactional *L*-sequentiality.

**Lemma A.3.** Suppose $\sigma$ is transactionally L-stable, $\rho$ is transactionally L-sequential in $\sigma\rho$, and $\alpha$ touches a location in L. Then $\sigma\rho\alpha \downarrow \alpha = \sigma\rho'\alpha$, where $\rho'\alpha$ is transactionally L-sequential in $\sigma\rho$.

*Proof.* If $\alpha \notin \sigma$, then the result is trivial, since $\sigma \downarrow \alpha = \sigma$. Thus, assume $\alpha \in \sigma$. Using Lemma A.2, we know that $\sigma \rho \alpha \downarrow \alpha$ includes $\sigma$. Thus we can fix $\rho'$ so that $\sigma \rho \alpha \downarrow \alpha = \sigma \rho' \alpha$.

First, we show that $\sigma \rho' \alpha$ is well-formed. WF<sub>1</sub>–WF<sub>5</sub> and WF<sub>7</sub>–WF<sub>8</sub> follow from the well-formedness of $\sigma$. WF<sub>6</sub> follows since $\pi \downarrow \alpha$ is closed under the predecessors of $\xrightarrow{\text{lwr}}$, for any $\pi$. WF<sub>9</sub> and WF<sub>10</sub> follow since $\pi \setminus (\pi \downarrow \alpha)$ is closed under $\stackrel{\text{tx}}{\sim}$. WF<sub>11</sub> follows, since it is preserved under removal of actions.

Consistency of  $\sigma \rho' \alpha$  follows from the consistency of  $\sigma \rho \alpha$  since all relations on  $\sigma \rho' \alpha$  are subrelations of  $\sigma \rho \alpha$ .

Transactional *L*-sequentiality of  $\sigma \rho' \alpha$  follows from transactional *L*-sequentiality of  $\sigma \rho \alpha$ .

The next lemma establishes that any *L*-weak action participates in an *L*-race. The proof mirrors the last two paragraphs of the proof of Theorem 13 of [9]. Instead of reasoning operationally, we use the consistency axioms COHERENCE and CAUSALITY.

**Lemma A.4.** Suppose  $\sigma$  is L-stable,  $\rho$  is L-sequential and  $\sigma\rho\langle c \rangle \in \Sigma$ . If c is L-weak then there exists some  $b \in \rho$  such that (b, c) is an L-race.

*Proof.* Suppose *c* is *L*-weak. Then, there exists a write action  $b \xrightarrow{index} c$  such that either

- $c \xrightarrow{ww} b$ , or
- $a \xrightarrow{\text{wr}} c$  and  $a \xrightarrow{\text{ww}} b$ ; thus,  $c \xrightarrow{\text{rw}} b$ .

If  $b \xrightarrow{hb} c$ , we have a contradiction, either because

- $c \xrightarrow{ww} b$  contradicts the irreflexivity of  $(\xrightarrow{hb}; \xrightarrow{lww})$ , or
- $c \xrightarrow{rw} b$  contradicts the irreflexivity of  $(\xrightarrow{hb}; \xrightarrow{lrw})$ .

So, (b, c) is an *L*-race. Further, *b* cannot be in  $\sigma$  since  $\sigma$  is *L*-stable; therefore, *b* must be in  $\rho$ , as required.

The next lemma says that every execution has an order-preserving permutation with contiguous transactions. The proof formalizes the following argument: All the ordering between transactional actions is reflected in the causality order. Furthermore, two actions in the same transaction are treated identically by the causality order. Consequently, since the causality order is acyclic, we can use a linearization of it to achieve contiguity in transactions.

**Lemma A.5.** Let $\Sigma$ be the semantics of a program, and fix $\sigma \rho \in \Sigma$. Suppose that all transactions of $\sigma$ are contiguous and no transactions of $\sigma$ are live in $\sigma$. Then there exists an order-preserving permutation $\sigma \pi$ of $\sigma \rho$ such that $\sigma \pi \in \Sigma$ and $\sigma \pi$ has contiguous transactions.

*Proof.* By CAUSALITY, $(\stackrel{\text{hb}}{\longrightarrow} \cup \stackrel{\text{lwr}}{\longrightarrow} \cup \stackrel{\text{xrw}}{\longrightarrow})$ is acyclic. Thus, we can extend $(\stackrel{\text{hb}}{\longrightarrow} \cup \stackrel{\text{lwr}}{\longrightarrow} \cup \stackrel{\text{xrw}}{\longrightarrow})^*$ to a total order over the actions of $\sigma\rho$. Fix such a total order (with the initializing begin transaction as minimal element), and let *R* be the suborder that includes only nontransactional actions and begin actions. We extend *R* to a total order over the actions of $\sigma\rho$ as follows. Define $a \trianglelefteq b$ when one of the following holds:

$$a \in \sigma \land b \in \rho \tag{1}$$

$$a \stackrel{\text{tx}}{\sim} a' R b' \stackrel{\text{tx}}{\sim} b \tag{2}$$

$$a \stackrel{\text{tx}}{\sim} b \wedge a \xrightarrow{\text{index}}_{\sigma\rho} b \tag{3}$$

Condition (1) ensures that the actions in $\sigma$ are ordered before those in $\rho$. Condition (2) ensures that the actions in a transaction of $\sigma\rho$ are treated identically by $\trianglelefteq$ with respect to actions outside the transaction (recall that $\stackrel{\text{tx}}{\sim}$ relates each nontransactional action to itself). Condition (3) forces order within a transaction of $\sigma\rho$ to coincide with the order from $\xrightarrow{\text{index}}_{\sigma\rho}$.

It is clear that $\trianglelefteq$ induces a total order on the actions of $\sigma\rho$ with contiguous transactions. Supposing that the trace ordered by $\trianglelefteq$ is well formed, it is trivial to show that it is consistent, since no orders are changed. Because the semantics of a program must be closed with respect to order-preserving permutation, we further have that the trace belongs to $\Sigma$.

Thus, to prove the lemma it suffices to show that the trace ordered by $\trianglelefteq$ is well-formed. We consider each of the well-formedness criteria given in §2.

WF<sub>1</sub> follows from the choice of *R*.

WF<sub>2</sub>–WF<sub>4</sub> and WF<sub>6</sub>–WF<sub>7</sub> follow from the well-formedness of  $\sigma \rho$ .

WF<sub>5</sub> holds due to well formedness of  $\sigma \rho$  and (3).

If both actions are nontransactional, WF<sub>8</sub> follows from well-formedness of  $\sigma\rho$ . If both are transactional, it follows because  $\xrightarrow{\text{cwr}}$  is included in  $\xrightarrow{\text{hb}}$ . Suppose the write is transactional and the read is not. Then the begin is ordered with respect to the read in the lifted relation  $\xrightarrow{\text{lwr}}$ . Using (2) and (3), the result holds. The argument is symmetric for the case where the read is transactional and the write is not.

For WF<sub>9</sub>, if *a*, *b* are conflicting transactional writes, then $a \xrightarrow{\text{hb}} b$ or $b \xrightarrow{\text{hb}} a$. In the former case, $a \trianglelefteq b$, by definition of *R*. The case for $b \xrightarrow{\text{hb}} a$ is symmetric.

For WF<sub>10</sub>, let *a*, *b* be conflicting transactional writes such that $a \xrightarrow{ww} b$ and let $a \xrightarrow{wr} c$. Thus, $c \xrightarrow{rw} b$. Since *c* is also transactional we have $c \stackrel{\text{xrw}}{\longrightarrow} b$. Thus, $c \trianglelefteq b$ by the definition of *R*.

For WF<sub>11</sub>, let *b* be transactional and $a \xrightarrow{\text{wr}} b$ and $a \xrightarrow{\text{ww}} c$ and $c \stackrel{\text{tx}}{\sim} b$. If $c \trianglelefteq b$, then $c \xrightarrow{\text{index}} b$, contradicting WF<sub>11</sub> on $\sigma \rho$.
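The idea behind the construction in the proof of Lemma A.5 can be sketched in Python: collapse each transaction to a single block, linearize the blocks along the causality order, and expand each block in its original index order. This is a simplified illustration, not the construction itself. It omits condition (1), which orders $\sigma$ before $\rho$, and it assumes that the causality relation lifted to blocks is acyclic. The encoding and the name `contiguous_linearization` are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, Iterable, List, Tuple


def contiguous_linearization(
    trace: List[str],
    tx_of: Dict[str, str],
    causality: Iterable[Tuple[str, str]],
) -> List[str]:
    """Reorder `trace` so that every transaction is contiguous.

    `tx_of` maps a transactional event to its transaction name; events
    missing from the map are nontransactional and form singleton blocks
    (event names and transaction names are assumed to be disjoint).
    `causality` stands in for the union of hb, lwr and xrw.
    """
    def block(event: str) -> str:
        return tx_of.get(event, event)

    # Quotient graph: one node per block, edges lifted from `causality`.
    sorter = TopologicalSorter({block(e): set() for e in trace})
    for src, dst in causality:
        if block(src) != block(dst):
            sorter.add(block(dst), block(src))  # block(src) must come first

    # Inside a block, keep the original index order, as in condition (3).
    members: Dict[str, List[str]] = {}
    for e in trace:
        members.setdefault(block(e), []).append(e)

    # static_order raises graphlib.CycleError if the lifted relation is cyclic.
    return [e for b in sorter.static_order() for e in members[b]]
```

The relative order of causally unrelated blocks is left to the topological sort, which mirrors the free choice of a total order extending the causality relation in the proof.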

The next lemma shows that races are preserved by delaying the timestamp of writes. The intuition is that delaying the timestamp of a write can only decrease happens before. Note that only the timestamp of the last write is increased, and since timestamps are rationals, it is straightforward to change a timestamp so that the execution under consideration remains consistent.

The key step in the proof is the inductive case for  $HB_{ww}$ , which requires  $ANTI_{ww}$ .

**Lemma A.6.** Let  $\sigma = \pi \alpha$  be a consistent execution such that  $(\beta, \alpha)$  is an L-race in  $\sigma$  between two writes. Let  $\rho = \pi \alpha'$  where  $\alpha' \stackrel{\text{act}}{\sim} \alpha$  and  $\alpha'$  has a later timestamp.

Then $\rho$ is consistent and $(\beta, \alpha')$ is an L-race in $\rho$.

*Proof.* Since $\rho = \pi \alpha'$ and $a \xrightarrow{\text{hb}}_{\rho} c$ implies $a \xrightarrow{\text{index}}_{\rho} c$, it is not possible that $\alpha' \xrightarrow{\text{hb}}_{\rho} c$ or $\alpha \xrightarrow{\text{hb}}_{\sigma} c$. We call this property *Terminal*.

We show that  $a \xrightarrow{hb}_{\rho} c$  implies  $a \xrightarrow{hb}_{\sigma} c$ , for any a, c.

The proof proceeds by induction on the definition of $\xrightarrow{\text{hb}}$. The empty relation satisfies the hypothesis. For the inductive step, we have three cases. If $a \xrightarrow{hb}_{\rho} c$ and $\alpha' \neq c$, then $a \xrightarrow{hb}_{\pi} c$ and therefore $a \xrightarrow{hb}_{\sigma} c$. Thus, we need only consider cases where $\alpha' = c$.

- For HB<sub>BASE</sub>, note that $\xrightarrow{\text{init}}_{\sigma} = \xrightarrow{\text{init}}_{\rho}$ and $\xrightarrow{\text{po}}_{\sigma} = \xrightarrow{\text{po}}_{\rho}$. If $\alpha'$ is transactional, then, by construction, $\alpha$ must also be transactional. In this case, using *Terminal*, we deduce that $\xrightarrow{\text{cwr}}_{\sigma} = \xrightarrow{\text{cwr}}_{\rho}$. Since $\alpha$ and $\alpha'$ are transactional writes on the same variable, using *Terminal*, we deduce that $\xrightarrow{\text{cww}}_{\sigma} = \xrightarrow{\text{cww}}_{\rho}$. If $\alpha'$ is nontransactional, then modifying the timestamp of $\alpha$ has no effect on any of the relations in HB<sub>BASE</sub>.
- HB<sub>TRANS</sub> follows immediately by induction.
- For HB<sub>ww</sub>, suppose that *a* and *b* are nonaborted and transactional,  $\alpha'$  is plain,  $a \stackrel{\text{lww}}{\longrightarrow}_{\rho} \alpha'$ ,  $a \stackrel{\text{crw}}{\longrightarrow}_{\rho} b$  and  $b \stackrel{\text{hb}}{\longrightarrow}_{\rho} \alpha'$ .

We have $a \xrightarrow{\text{crw}}_{\pi} b$ and therefore $a \xrightarrow{\text{crw}}_{\sigma} b$. Since $\alpha \stackrel{\text{act}}{\sim} \alpha'$, we know that $\alpha$ and $\alpha'$ have the same name. Applying the induction hypothesis to $b \xrightarrow{\text{hb}}_{\rho} \alpha'$, we have $b \xrightarrow{\text{hb}}_{\sigma} \alpha$.

Note that the timestamp of $\alpha$ cannot be less than that of *a*. If it were, we would have $\alpha \xrightarrow{lww}_{\sigma} a$ and $a \xrightarrow{crw}_{\sigma} b \xrightarrow{hb}_{\sigma} \alpha$. Thus, $\sigma$ would fail to be consistent by ANTI<sub>ww</sub>.

Since the timestamp of $\alpha$ must be greater than that of *a*, we have $a \xrightarrow{lww}_{\sigma} \alpha$. Thus, by HB<sub>ww</sub>, we have $a \xrightarrow{hb}_{\sigma} \alpha$, as required.

Well-formedness of  $\rho$  is immediate.

Consistency of  $\rho$  follows from  $\xrightarrow{\text{hb}}_{\rho} \subseteq \xrightarrow{\text{hb}}_{\sigma}$ , using the consistency of  $\sigma$ .

The raciness of $(\beta, \alpha')$ in $\rho$ also follows from $\xrightarrow{\text{hb}}_{\rho} \subseteq \xrightarrow{\text{hb}}_{\sigma}$, using the fact that $(\beta, \alpha)$ is an *L*-race in $\sigma$.

The next lemma shows that races are preserved by delaying the timestamp of some reads. The intuition again is that delaying the timestamp can only decrease happens before. The sole case when delaying the timestamp of a read can actually increase happens before is when the read is transactional and the newly matched write is also transactional. The hypothesis of the following lemma rules out this problematic case.

**Lemma A.7.** Let  $\sigma = \pi \alpha$  be a consistent execution such that  $(\beta, \alpha)$  is an L-race in  $\sigma$ ,  $\beta$  is a write and  $\alpha$  is a read. Let  $\rho = \pi \alpha'$  where  $\alpha' \stackrel{\text{act}}{\sim} \alpha$  and  $\alpha'$  has a later timestamp.

Suppose that the writes satisfying  $\alpha$  and  $\alpha'$  are nontransactional when  $\alpha$  is transactional (and therefore  $\alpha'$  is transactional).

Then $\rho$ is consistent and $(\beta, \alpha')$ is an *L*-race in $\rho$.

*Proof.* The proof is similar to the proof of Lemma A.6.

For HB<sub>BASE</sub>, the result follows since the writes matching  $\alpha$ ,  $\alpha'$  are not transactional when  $\alpha$  and  $\alpha'$  are transactional.

For rule HB<sub>ww</sub>, the result follows since  $\alpha$  and  $\alpha'$  are not writes.

**Lemma A.8.** Fix  $\Sigma$  to be the semantics of a program. Fix  $\sigma \rho \alpha \in \Sigma$  such that

- $\sigma$ is transactionally L-stable,
- $\rho$ is transactionally L-sequential in $\sigma \rho$, and
- $\rho$ has no L-races in $\sigma \rho$.

Then, there is $\sigma \rho' \alpha \in \Sigma$ such that

- $\rho'$ is transactionally L-sequential in $\sigma \rho'$,
- $\rho' \alpha$ has contiguous transactions, and
- $\rho'$ is an order-preserving permutation of a subsequence of $\rho$.

*Proof.* If $\alpha$ is nontransactional or a begin action, setting $\rho' = \rho$ meets the requirements. Thus we suppose that $\alpha$ is transactional, belonging to transaction *a* of thread *s*.

Let  $\pi = \sigma \rho \alpha \downarrow a$ . Let  $\rho'$  be derived from  $\pi$  by permuting the events of the open transaction *a* to the end.

This order preserving permutation establishes contiguity of transactions in  $\rho' \alpha$ .

Next, we show that  $\sigma \rho' \alpha$  is well-formed. WF<sub>1</sub>–WF<sub>7</sub> are immediate. WF<sub>8</sub> follows because the writes of *a* are only read by actions of *a* by WF<sub>7</sub>. WF<sub>9</sub> and WF<sub>10</sub> follow because  $\rho'$  is derived from  $\pi$ . WF<sub>11</sub> is inherited from well-formedness of  $\sigma \rho \alpha$ .

Finally, we show that  $\rho'$  is *L*-sequential in  $\sigma \rho'$ . We proceed by contradiction. There are two cases to consider. Let *c* be an arbitrary action in  $\rho'$ .

- *c* touches a location in *L* and there is a $b \xrightarrow{index} c$ such that $c \xrightarrow{ww} b$. Since $\sigma \rho \alpha$ is well formed, this can only happen if *c* is in open transaction *a* and *c* was before *b* in $\sigma \rho \alpha$. We reason by cases based on whether *b* is transactional.
  - If *b* is transactional, $c \xrightarrow{hb} b$; so $b \notin \pi$.
  - If *b* is not transactional, then since $b \notin \pi$, we deduce that $\neg(c \xrightarrow{hb} b)$. So, since there are no data races in $\rho$, we deduce that $b \xrightarrow{hb} c$, which contradicts COHERENCE of $\sigma\rho$.
- *c* touches a location in *L*, $a \xrightarrow{\text{wr}} c$, and there is $b \xrightarrow{\text{index}} c$ such that $a \xrightarrow{\text{ww}} b$. We reason by cases based on whether *b* is transactional.
  - If *b* is transactional, $c \xrightarrow{\text{xrw}} b$; so $b \notin \pi$.
  - If *b* is not transactional, then since $b \notin \pi$, we deduce that $\neg(c \xrightarrow{hb} b)$. So, since there are no data races in $\rho$, we deduce that $b \xrightarrow{hb} c$, which contradicts OBSERVATION of $\sigma\rho$.

We now turn to the theorem.

**Theorem 4.1.** Fix  $\Sigma$  to be the semantics of a program. Fix  $\sigma \rho \alpha \in \Sigma$  such that

- $\sigma$  is transactionally L-stable,
- $\rho$  is transactionally L-sequential in  $\sigma \rho$ ,
- $\rho$  has no L-races in  $\sigma \rho$ , and
- $\alpha$  is *L*-weak in  $\sigma \rho \alpha$ .

Then, there are  $b \in \rho$ ,  $\alpha' \stackrel{\text{act}}{\sim} \alpha$  and  $\sigma \rho' \alpha' \in \Sigma$  such that

- $\rho' \alpha'$  is transactionally L-sequential in  $\sigma \rho' \alpha'$ , and
- $(b, \alpha')$  is an L-race in  $\sigma \rho' \alpha'$ .

*Proof.* By Lemma A.8, we can assume without loss of generality that  $\sigma \rho \alpha$  has contiguous transactions.

Choose *b* as follows. Since  $\alpha$  is *L*-weak, by Lemma A.4, we know that there is some *b* such that  $(b, \alpha)$  is an *L*-race. By the definition of stability, we know that *b* must occur in  $\rho$ .

Choose $\alpha'$ as follows. Since $\Sigma$ is *sequentially-closed*, there must be an *L*-sequential action $\alpha' \stackrel{\text{act}}{\sim} \alpha$ such that $\sigma \rho \alpha' \in \Sigma$.

Choose $\rho'$ as follows. By Lemma A.3, there is some $\rho'$ such that $\sigma\rho\alpha \downarrow \alpha' = \sigma\rho'\alpha'$. Since $\Sigma$ is *causally-closed*, we know that $\sigma\rho'\alpha' \in \Sigma$. Since $\rho'$ is a subsequence of $\rho$, all transactions of $\rho'$ are contiguous. By construction, using Lemma A.3, we know that $\rho'\alpha'$ is *L*-sequential in $\sigma\rho'\alpha'$. Thus, $\rho'\alpha'$ is transactionally *L*-sequential in $\sigma\rho'\alpha'$.

We need only show that  $(b, \alpha')$  is an *L*-race. We proceed by cases.

- Suppose that *α* is a B, C, or A action. This is not possible since these actions are always *L*-sequential.
- Suppose that $\alpha$ is a write. The result follows from Lemma A.6.
- Suppose that $\alpha$ is a nontransactional read. The result follows from Lemma A.7.
- Finally, suppose that *α* is a transactional read.

  The write matching *α* must be nontransactional. Otherwise WF<sub>10</sub> guarantees that *α* would be *L*-sequential.

  The write matching $\alpha'$ must be nontransactional. Otherwise it would follow $\alpha$ in $\xrightarrow{\text{xrw}}$, and thus must have been removed from the causal closure. (This case corresponds to executions illustrated at the beginning of the paragraph labelled "*From* D *to* T" on page 8.) Given that the fulfilling writes for $\alpha$ and $\alpha'$ are not transactional, the hypotheses of Lemma A.7 are satisfied, yielding the required result. $\Box$

#### **B** Aborted Transactions

**Theorem 4.2.** If  $\sigma$  is consistent then so is  $\sigma$  with aborted transactions removed.

*Proof.* Let  $\rho$  be any well-formed and consistent trace. Then:

- $\rho$  without *a* is well-formed in the case that  $a = \langle R \rangle$  or  $a = \langle W \rangle$  and *a* is not the source of an  $\xrightarrow{xwr}$  edge.
- by WF<sub>7</sub>, if  $a = \langle W \rangle$  is in an aborted transaction, any read of *a* is also in the same aborted transaction.
- $\rho$ with $\langle B \rangle$ (and any matching $\langle end \rangle$) removed is also well-formed.

Let $\sigma$ be a well-formed and consistent trace. Let us write $\sigma \setminus A$ for $\sigma$ with aborted transactions removed. By the above observations, $\sigma \setminus A$ is well-formed. Consistency of $\sigma \setminus A$ follows from the consistency of $\sigma$ because the relations on $\sigma \setminus A$ are merely the restriction of those in $\sigma$ to a subset of events. $\Box$
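The restriction step in this proof can be illustrated with a small sketch. The event encoding (`Event` with a transaction name and a status) and the helper `remove_aborted` are assumptions made for this illustration and are not notation from the paper. The sketch only shows that removing the events of aborted transactions restricts a relation to the surviving events, so the result is a sub-relation of the original.

```python
from typing import NamedTuple, Optional


class Event(NamedTuple):
    name: str
    txn: Optional[str]  # transaction name, or None for a plain event (hypothetical encoding)
    status: str         # "committed", "aborted", "live", or "plain"


def remove_aborted(events, relation):
    """Drop events of aborted transactions and restrict `relation` to the rest."""
    kept = [e for e in events if e.status != "aborted"]
    kept_names = {e.name for e in kept}
    restricted = {(a, b) for (a, b) in relation
                  if a in kept_names and b in kept_names}
    return kept, restricted


# A toy trace: transaction a aborts, transaction b commits, r2 is plain.
events = [
    Event("w1", "a", "aborted"),
    Event("r1", "a", "aborted"),
    Event("w2", "b", "committed"),
    Event("r2", None, "plain"),
]
order = {("w1", "r1"), ("w1", "w2"), ("w2", "r2")}

kept, restricted = remove_aborted(events, order)
print([e.name for e in kept], restricted)
```

Because `restricted` is always a subset of `order`, any cycle in the restricted relation would already be a cycle in the original one, which is the shape of the consistency argument above.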

#### **C** Technical Development for §5

The intuition behind the proof of Lemma 5.1 is that the extra explicit ordering in an implementation race free execution compensates for the specified extra HB<sub>ww</sub> and ANTI<sub>ww</sub> axioms in the programmer model.

**Lemma 5.1.** Let  $\sigma$  be an execution in the implementation model without mixed races. Let  $\rho$  be the induced execution in the programmer model obtained by dropping all the quiescence fences in  $\sigma$ . If  $\sigma$  is consistent, then so is  $\rho$ .

*Proof.* Well-formedness of  $\rho$  is immediate.

Consistency of $\rho$ follows if we can show that the orders in $\rho$ agree with those in $\sigma$. Thus, it suffices to show that $\sigma$ satisfies HB<sub>ww</sub> and ANTI<sub>ww</sub>. We proceed as follows.

To show HB<sub>ww</sub>, let *c* be plain, $a \xrightarrow{-lww} c$, and $a \xrightarrow{crw} b \xrightarrow{hb} c$ in $\sigma$. Then, by implementation race freedom, we must have $a \xrightarrow{hb} c$, otherwise *a* and *c* would be racing.

To show ANTI<sub>ww</sub>, suppose  $a \xrightarrow{crw} b \xrightarrow{hb} c \xrightarrow{-lww} a$  in  $\sigma$ . By implementation race freedom, we must have  $c \xrightarrow{hb} a$ . However, this leads to a cycle in  $\xrightarrow{crw} \cup \xrightarrow{hb}$ , contradicting the observation axiom of  $\sigma$ .

**Suborders** We follow [9] in providing an alternate characterization of  $\xrightarrow{hb}$  in the implementation model. Recall that the  $\xrightarrow{hb}$  relation in the implementation model does not include HB<sub>ww</sub>.

Let $\xrightarrow{\text{swe}} = (\xrightarrow{\text{cwr}} \cup \xrightarrow{\text{-cww}}) \setminus \xrightarrow{\text{po}}$ be the external transactional communication relation, which captures the basic ingredients in the $\xrightarrow{hb}$ relation across threads, namely external transactional reads-from and external transactional coherence.

Let $\xrightarrow{\text{hbe}} = \xrightarrow{\text{po-T}}; (\xrightarrow{\text{swe}}; \xrightarrow{\text{poTT}})^{\star}; \xrightarrow{\text{swe}}; \xrightarrow{\text{poT-}}$ be the external component of $\xrightarrow{\text{hb}}$, which captures how synchronization propagates across different threads.

These definitions provide a clean decomposition of $\xrightarrow{\text{hb}}$.
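To make the shape of the external component concrete, here is a minimal sketch that evaluates a composition of the form "program order into a transaction, then a chain of external communication steps, then program order out of a transaction" on a four-event example. The function names, the parameter names `po_in`, `po_mid`, `po_out`, and the toy events are assumptions for illustration; relations are encoded as Python sets of pairs.

```python
def compose(r, s):
    """Relational composition r ; s on sets of (source, target) pairs."""
    return {(a, d) for (a, b) in r for (c, d) in s if b == c}


def star(r, events):
    """Reflexive-transitive closure of r over the given events."""
    closure = {(e, e) for e in events} | set(r)
    while True:
        bigger = closure | compose(closure, closure)
        if bigger == closure:
            return closure
        closure = bigger


def hbe(po_in, swe, po_mid, po_out, events):
    """Evaluate po_in ; (swe ; po_mid)* ; swe ; po_out."""
    loop = star(compose(swe, po_mid), events)
    return compose(po_in, compose(loop, compose(swe, po_out)))


# Thread 1: plain p1, then transactional t1. Thread 2: transactional t2, then plain p2.
events = {"p1", "t1", "t2", "p2"}
swe = {("t1", "t2")}  # t2 reads from (or is coherence-after) t1, across threads
result = hbe({("p1", "t1")}, swe, set(), {("t2", "p2")}, events)
print(result)  # {('p1', 'p2')}
```

In this example the only cross-thread ordering obtained is from `p1` to `p2`, which passes through the single external communication edge between the two transactional events.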

**Lemma C.1** (Characterizing hb).  $\xrightarrow{hb} = \xrightarrow{init} \cup \xrightarrow{hbe} \cup \xrightarrow{po}$ 

*Proof.* The inclusion of  $\xrightarrow{\text{init}} \cup \xrightarrow{\text{hbe}} \cup \xrightarrow{\text{po}} \subseteq \xrightarrow{\text{hb}}$  is immediate.

For the converse direction, the following calculations are immediate.

$$\begin{array}{rcl} \xrightarrow{\text{init}}; \xrightarrow{\text{hb}} & \subseteq & \xrightarrow{\text{init}} \\ \xrightarrow{\text{poT-}}; \xrightarrow{\text{po-T}} & \subseteq & \xrightarrow{\text{poTT}} \\ \xrightarrow{\text{po}}; \xrightarrow{\text{hbe}}; \xrightarrow{\text{po}} & \subseteq & \xrightarrow{\text{hbe}} \end{array}$$

Thus we are able to deduce that $\xrightarrow{\text{hbe}}; \xrightarrow{\text{hbe}} \subseteq \xrightarrow{\text{hbe}}$ as follows:

$$\begin{array}{rl} & \xrightarrow{\text{hbe}}; \xrightarrow{\text{hbe}} \\ = & \xrightarrow{\text{po-T}}; (\xrightarrow{\text{swe}}; \xrightarrow{\text{poTT}})^{\star}; \xrightarrow{\text{swe}}; \xrightarrow{\text{poT-}}; \xrightarrow{\text{po-T}}; (\xrightarrow{\text{swe}}; \xrightarrow{\text{poTT}})^{\star}; \xrightarrow{\text{swe}}; \xrightarrow{\text{poT-}} \\ \subseteq & \xrightarrow{\text{po-T}}; (\xrightarrow{\text{swe}}; \xrightarrow{\text{poTT}})^{\star}; \xrightarrow{\text{swe}}; \xrightarrow{\text{poTT}}; (\xrightarrow{\text{swe}}; \xrightarrow{\text{poTT}})^{\star}; \xrightarrow{\text{swe}}; \xrightarrow{\text{poT-}} \\ \subseteq & \xrightarrow{\text{po-T}}; (\xrightarrow{\text{swe}}; \xrightarrow{\text{poTT}})^{\star}; \xrightarrow{\text{swe}}; \xrightarrow{\text{poT-}} \\ = & \xrightarrow{\text{hbe}} \end{array}$$

Hence,  $\xrightarrow{\text{init}} \cup \xrightarrow{\text{hbe}} \cup \xrightarrow{\text{po}}$  is transitive. The proof is completed by noting that  $\xrightarrow{\text{cwr}} \cup \xrightarrow{\text{xrw}} \subseteq \xrightarrow{\text{hbe}} \cup \xrightarrow{\text{po}}$ .

They also provide an alternative characterization of consistency in the implementation model<sup>1</sup>.

Let $\xrightarrow{\text{wre}} = \xrightarrow{\text{wr}} \setminus \xrightarrow{\text{po}}$ be the external portion of the write-to-read relation, and $\xrightarrow{\text{xrwe}} = \xrightarrow{\text{xrw}} \setminus \xrightarrow{\text{po}}$ be the external portion of the transactional read-to-write relation.

**Lemma C.2.** An execution is consistent in the implementation model iff the following hold.

$$( \xrightarrow{\text{hbe}} \cup \xrightarrow{\text{poT-}} \cup \xrightarrow{\text{po-T}} \cup \xrightarrow{\text{poRW}} \cup \xrightarrow{\text{wre}} \cup \xrightarrow{\text{xrwe}} ) \text{ is acyclic.}$$

$$( \xrightarrow{\text{init}} \cup \xrightarrow{\text{hbe}} \cup \xrightarrow{\text{poCon}} ); \xrightarrow{-lww} \text{ is irreflexive.}$$

$$( \xrightarrow{\text{init}} \cup \xrightarrow{\text{hbe}} \cup \xrightarrow{\text{poCon}} ); \xrightarrow{lrw} \text{ is irreflexive.}$$

*Proof.* For causality, we need that  $(\xrightarrow{hb} \cup \xrightarrow{xrw})$  is acyclic. We deduce:

$$\begin{array}{rl} & \xrightarrow{\text{hb}} \cup \xrightarrow{\text{lwr}} \cup \xrightarrow{\text{xrw}} \text{ is acyclic.} \\ \Leftrightarrow & \xrightarrow{\text{init}} \cup \xrightarrow{\text{hbe}} \cup \xrightarrow{\text{po}} \cup \xrightarrow{\text{lwr}} \cup \xrightarrow{\text{xrw}} \text{ is acyclic.} \\ \Leftrightarrow & \xrightarrow{\text{hbe}} \cup \xrightarrow{\text{po}} \cup \xrightarrow{\text{lwr}} \cup \xrightarrow{\text{xrw}} \text{ is acyclic.} \\ \Leftrightarrow & \xrightarrow{\text{hbe}} \cup \xrightarrow{\text{po}} \cup \xrightarrow{\text{wre}} \cup \xrightarrow{\text{xrwe}} \text{ is acyclic.} \end{array}$$

The first step follows from Lemma C.1; the second since  $\xrightarrow{\text{init}}$  is acyclic, and the last from definitions of  $\xrightarrow{\text{wre}}$ ,  $\xrightarrow{\text{xrwe}}$ .

Consider a cycle in the last relation above. Without loss of generality, assume that every two adjacent elements of the cycle are in different threads. All the relations other than $\xrightarrow{\text{wre}}$ use transactional events. So, if we have two adjacent events $a \xrightarrow{\text{po}} b$, neither of which is transactional, the cycle contains $c_1 \xrightarrow{\text{wre}} a \xrightarrow{\text{po}} b \xrightarrow{\text{wre}} c_2$. Thus, we deduce that $a \xrightarrow{\text{poRW}} b$.

<sup>1</sup>We include $\xrightarrow{\text{init}}$ to be consistent with [9]. It can be removed since the initializing transaction has only one write per location; thus, initialization actions are not the target of any of our relations.

In the last three items, we use Lemma C.1 for the alternative characterization of $\xrightarrow{\text{hb}}$. We can replace $\xrightarrow{\text{po}}$ by $\xrightarrow{\text{poCon}}$ by the following reasoning. If $a \mathrel{(\xrightarrow{-lww} \cup \xrightarrow{lrw})} b$, then *a* and *b* access the same location and at least one is a write.
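The conditions of Lemma C.2 are acyclicity of a union of relations and irreflexivity of a composition. A minimal sketch of how such conditions can be checked on finite relations is given below; the encoding of relations as sets of pairs, the function names, and the toy edges are assumptions for illustration only.

```python
def compose(r, s):
    """Relational composition r ; s on sets of (source, target) pairs."""
    return {(a, d) for (a, b) in r for (c, d) in s if b == c}


def is_irreflexive(r):
    """True iff r relates no event to itself."""
    return all(a != b for (a, b) in r)


def is_acyclic(*relations):
    """True iff the union of the given relations has no cycle (depth-first search)."""
    edges = set().union(*relations)
    succ = {}
    for a, b in edges:
        succ.setdefault(a, set()).add(b)
        succ.setdefault(b, set())
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {n: WHITE for n in succ}

    def visit(n):
        colour[n] = GREY
        for m in succ[n]:
            if colour[m] == GREY:
                return False  # back edge: a cycle exists
            if colour[m] == WHITE and not visit(m):
                return False
        colour[n] = BLACK
        return True

    return all(colour[n] != WHITE or visit(n) for n in succ)


# Toy relations: a write ordered before a read that is itself ordered before that write.
ordered_before = {("w", "r")}
read_before_write = {("r", "w")}
third_condition_fails = not is_irreflexive(compose(ordered_before, read_before_write))

chain_ok = is_acyclic({("a", "b")}, {("b", "c")})
loop_bad = is_acyclic({("a", "b")}, {("b", "a")})
print(third_condition_fails, chain_ok, loop_bad)  # True True False
```

The recursive search is adequate for small examples; a large execution graph would call for an iterative traversal.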

The following lemma addresses the infrastructure needed for reordering transformations.

**Lemma C.3.** Let $\sigma$, $\rho$ be well-formed executions with the same events that agree on the $\xrightarrow{\text{init}}$, $\xrightarrow{\text{ww}}$, $\xrightarrow{\text{wr}}$, and $\stackrel{\text{tx}}{\sim}$ relations and satisfy:

$$\begin{array}{c} (\xrightarrow{\text{po-T}}_{\sigma}, \xrightarrow{\text{poT-}}_{\sigma}, \xrightarrow{\text{poTT}}_{\sigma}, \xrightarrow{\text{poRW}}_{\sigma}, \xrightarrow{\text{poCon}}_{\sigma}, \xrightarrow{\text{swe}}_{\sigma}) \\ = (\xrightarrow{\text{po-T}}_{\rho}, \xrightarrow{\text{poT-}}_{\rho}, \xrightarrow{\text{poTT}}_{\rho}, \xrightarrow{\text{poRW}}_{\rho}, \xrightarrow{\text{poCon}}_{\rho}, \xrightarrow{\text{swe}}_{\rho}) \end{array}$$

Then,  $\sigma$  is consistent iff  $\rho$  is consistent.

*Proof.* We first show that the happens-before relations of $\sigma$, $\rho$ coincide. Since $\xrightarrow{\text{swe}}$ coincides for $\sigma$, $\rho$, $\xrightarrow{\text{hbe}}$ coincides for $\sigma$, $\rho$. The result is immediate using Lemma C.1.

Since $\sigma$, $\rho$ also agree on all the base relations $\xrightarrow{\text{init}}$, $\xrightarrow{\text{ww}}$, $\xrightarrow{\text{wr}}$, and $\stackrel{\text{tx}}{\sim}$, they also agree on all the derived lifted relations. The result follows.

The following lemma addresses the infrastructure needed for roach-motel transformations.

**Lemma C.4.** Let $\sigma$, $\rho$ be well-formed executions with the same events that agree on the $\xrightarrow{\text{init}}$, $\xrightarrow{\text{ww}}$, $\xrightarrow{\text{wr}}$, and $\xrightarrow{\text{po}}$ relations. Let the $\stackrel{\text{tx}}{\sim}$ relation of $\rho$ be a superset of the $\stackrel{\text{tx}}{\sim}$ relation of $\sigma$.

Then, if  $\rho$  is consistent, so is  $\sigma$ .

*Proof.* Since the $\stackrel{tx}{\sim}$ relation of $\sigma$ is a subset of the $\stackrel{tx}{\sim}$ relation of $\rho$, and $\sigma$, $\rho$ agree on all the base relations $\stackrel{\text{init}}{\longrightarrow}$, $\stackrel{\text{ww}}{\longrightarrow}$, $\stackrel{\text{wr}}{\longrightarrow}$, and $\stackrel{\text{po}}{\longrightarrow}$, we deduce that all lifted relations of $\sigma$ are a subset of the lifted relations of $\rho$ and $\stackrel{\text{hb}}{\longrightarrow}_{\sigma} \subseteq \stackrel{\text{hb}}{\longrightarrow}_{\rho}$.

Consistency of  $\sigma$  follows from the consistency of  $\rho$ .

The following lemma addresses the infrastructure needed for fusion transformations.

**Lemma C.5.** Let  $\rho$  be a consistent, well-formed execution with transaction *a* in *s*. Let *b* be a new name. Let  $\sigma$  be derived from  $\rho$  by:

- introducing $\langle a:sC \rangle \langle b:sB \rangle$ between the begin and end of transaction *a*, and
- replacing the end (commit/abort) of *a*, if any, by an end (commit/abort) of *b*.

Then, $\sigma$ is well-formed and consistent.

*Proof.* Well-formedness of  $\sigma$  follows from the well-formedness of  $\rho$ . WF<sub>1</sub>–WF<sub>8</sub> are unaffected by the changes. Any violation of WF<sub>9</sub>–WF<sub>11</sub> in  $\sigma$  induces a violation of the same in  $\rho$ .

All orders in $\sigma$ restricted to the actions from $\rho$ are contained in the corresponding orders on $\rho$. Any simple cycle in any of the consistency criteria on $\sigma$ induces a simple cycle in $\rho$ with the new actions replaced by $\langle a:sB \rangle$. Thus, consistency of $\sigma$ follows from consistency of $\rho$.

The following lemma addresses the infrastructure needed for removing empty transactions.

**Lemma C.6.** Let  $\rho = \rho' \alpha \beta \rho''$  be a consistent, well-formed execution, where  $\alpha$  is an action of s that is not part of any transaction.

Let *b* be a new name. Let $\sigma = \rho' \alpha \langle b:s B \rangle \langle b:s C \rangle \beta \rho''$. Then, $\sigma$ is well-formed and consistent.

*Proof.* Well-formedness of  $\sigma$  follows immediately from the well-formedness of  $\rho$ . WF<sub>1</sub>–WF<sub>8</sub> are unaffected by the changes. Any violation of WF<sub>9</sub>–WF<sub>11</sub> in  $\sigma$  induces a violation of the same in  $\rho$ .

The new actions in $\sigma$ only participate in the $\xrightarrow{\text{po}}$ order, where they have a unique predecessor and successor. All orders in $\sigma$ restricted to the actions from $\rho$ are contained in the corresponding orders on $\rho$. Any simple cycle in any of the consistency criteria on $\sigma$ induces a simple cycle in $\rho$ with the new actions replaced by $\alpha\beta$.

#### **D** Additional Examples

The next two examples discuss aborted transactions.

**Example D.1** (Opaque writes). Final outcome r = 1 is not permitted in the program below.

 $\operatorname{atomic}_a \{ x \coloneqq 1; \operatorname{abort} \} \parallel \operatorname{atomic}_b \{ r \coloneqq x \}$ 

This is trivial to justify by well-formedness (condition 7) since  $\xrightarrow{wr}$  cannot originate from an aborted transaction.

**Example D.2** (Race-free speculation). The only permitted final outcome is r = 2.

 $\begin{array}{l} \operatorname{atomic}_a \left\{ x{+}; \, y{+} \right\} \\ \| \operatorname{atomic}_b \left\{ \operatorname{if} x \neq y \text{ then } \left\{ z \coloneqq 1; \operatorname{abort} \right\} \right\} \| z \coloneqq 2; \, r \coloneqq z \end{array}$ 

Since the guard of transaction *b* will never hold, the program is race free, and hence it will never execute *z*:=1. This means that there is no danger that the abort will undo the nontransactional write to *z*. In particular, for every execution, $\langle Wz2 \rangle$ obscures the read of *z* in the third thread.
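The claim can be checked by brute force in the programmer's view, where each transaction executes atomically. The sketch below enumerates the sequentially consistent interleavings of Example D.2 under three assumptions that are not taken from the paper: the body of transaction *a* is read as incrementing *x* and *y*, an abort is modeled by restoring a snapshot of the store, and all locations start at 0. It does not model the relaxed executions of the paper's semantics.

```python
from itertools import permutations


def txn_a(store):
    # atomic_a: increment x and y in one atomic step (assumed reading of the body)
    store["x"] += 1
    store["y"] += 1


def txn_b(store):
    # atomic_b: if x != y then { z := 1; abort }
    if store["x"] != store["y"]:
        snapshot = dict(store)
        store["z"] = 1
        store.clear()
        store.update(snapshot)  # abort: roll back the transaction's writes


def write_z(store):
    store["z"] = 2


def read_z(store):
    store["r"] = store["z"]


def outcomes():
    """Final values of r over all atomic, sequentially consistent interleavings."""
    steps = {"a": txn_a, "b": txn_b, "c1": write_z, "c2": read_z}
    results = set()
    for order in permutations(steps):
        if order.index("c1") > order.index("c2"):
            continue  # respect program order in the third thread
        store = {"x": 0, "y": 0, "z": 0, "r": 0}
        for name in order:
            steps[name](store)
        results.add(store["r"])
    return results


print(outcomes())  # {2}
```

Since *x* and *y* are equal at every transaction boundary, the guard of `txn_b` is false in every interleaving, and the read of *z* always sees the value written by the third thread.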

**Example D.3** (Dirty reads). Final result x = 0 and y = 1 is forbidden.

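 $\operatorname{atomic}_a \{ \operatorname{if} !y \text{ then } x \coloneqq 1; \operatorname{abort} \}; \operatorname{atomic}_b \{ \operatorname{if} !y \text{ then } x \coloneqq 1 \} \parallel \operatorname{if} x = 1 \text{ then } y \coloneqq 1$ 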

The result would be possible if the second thread observes the write of x in transaction a, then updates y. Since a rolls back, it will restore x's value back to 0, causing transaction b to skip over the update to x on re-execution. However, in our model such an execution is not possible since non-transactional events cannot read from live or aborted transactions.

**Example D.4** (No overlapped writes). Final result r = 0 is forbidden in the program below, where z is an array. The result would be possible if transaction a initializes z[y] and then publishes it by writing it to shared *volatile* variable x.

Since lazy versioning copies cached values in any order, the second thread may see the update to x before it sees the update to z[y]. In our model, this results in the execution below.

| Program | Execution |
|---------|-----------|
| $\operatorname{atomic}_a \{ y \coloneqq 4;\ z[y] \coloneqq 1;\ x \coloneqq 4 \}$ | $a: Ry4 \rightarrow Wz[4]1 \rightarrow Wx4$ |
| $\parallel\ r \coloneqq 1;\ \operatorname{atomic} \{ q \coloneqq x \};$ | cross-thread edges: $\xrightarrow{cwr}$, $\xrightarrow{lrw}$ |
| $\operatorname{if}\ q \neq 0 \text{ then } r \coloneqq z[q]$ | $Rx4 \rightarrow Rz[4]0$ |

Since we model volatile accesses as a singleton committed transaction, we obtain an edge  $\xrightarrow{cwr}$  to the read of *x* in the second thread, which violates axiom (OBSERVATION).