## Robustness against Power is $\operatorname{PSpace}$ -complete

Egor Derevenetc<sup>1,2</sup> Roland Meyer<sup>1</sup>

<sup>1</sup>University of Kaiserslautern

<sup>2</sup>Fraunhofer ITWM

WEACON Kaiserslautern 13.06.2014





#### Introduction

## Power Architecture Robustness

#### Deciding Robustness

Characterization of Violating Computations Normal-Form Computations Generating Normal-Form Computations Checking Cyclicity of Happens-Before Relation Complexity

### Conclusion

- Related Work
- Summary

### Example (Message Passing Program)

Consider the multithreaded program (initially, x = y = 0):

Thread 1:Thread 2: $a: mem[x] \leftarrow 1$  $c: r_1 \leftarrow mem[y]$  $b: mem[y] \leftarrow 1$  $d: r_2 \leftarrow mem[x]$ Assumption:  $r_1 = 1$  implies  $r_2 = 1$ .

### Example (Message Passing Program)

Consider the multithreaded program (initially, x = y = 0):

Thread 1:Thread 2: $a: mem[x] \leftarrow 1$  $c: r_1 \leftarrow mem[y]$  $b: mem[y] \leftarrow 1$  $d: r_2 \leftarrow mem[x]$ Assumption:  $r_1 = 1$  implies  $r_2 = 1$ .

### Sequential Consistency (SC) [Lamport, 1979]

- Instructions are executed in order.
- Writes to memory are immediately visible to all threads.
- $\Rightarrow$  The assumption holds.

### Example (Message Passing Program)

Consider the multithreaded program (initially, x = y = 0):

Thread 1:Thread 2: $a: mem[x] \leftarrow 1$  $c: r_1 \leftarrow mem[y]$  $b: mem[y] \leftarrow 1$  $d: r_2 \leftarrow mem[x]$ Assumption: $r_1 = 1$  implies  $r_2 = 1$ .

### Sequential Consistency (SC) [Lamport, 1979]

- Instructions are executed in order.
- Writes to memory are immediately visible to all threads.
- $\Rightarrow$  The assumption holds.

### Power Architecture by IBM et al. [Sarkar et al., 2011]

- Independent instructions can be executed out of order.
- Writes can be seen by different threads in different order.
- $\Rightarrow$  The assumption does not hold.

How a thread executes an instruction on Power:

How a thread executes an instruction on Power:

First, it fetches it. Instructions must be fetched in the program order, one after another.

How a thread executes an instruction on Power:

- First, it fetches it. Instructions must be fetched in the program order, one after another.
- Next, it performs the computation prescribed by the instruction's semantics. Results of instructions, on which the current one depends, must be already computed.

How a thread executes an instruction on Power:

- First, it fetches it. Instructions must be fetched in the program order, one after another.
- Next, it performs the computation prescribed by the instruction's semantics. Results of instructions, on which the current one depends, must be already computed.
- ► Finally, it commits the instruction. Similarly, all instruction's dependencies must be committed earlier.

How a thread executes an instruction on Power:

- First, it fetches it. Instructions must be fetched in the program order, one after another.
- Next, it performs the computation prescribed by the instruction's semantics. Results of instructions, on which the current one depends, must be already computed.
- ► Finally, it commits the instruction. Similarly, all instruction's dependencies must be committed earlier.

One thread can execute multiple instructions in parallel.

How a thread executes an instruction on Power:

- First, it fetches it. Instructions must be fetched in the program order, one after another.
- Next, it performs the computation prescribed by the instruction's semantics. Results of instructions, on which the current one depends, must be already computed.
- Finally, it commits the instruction. Similarly, all instruction's dependencies must be committed earlier.

One thread can execute multiple instructions in parallel.

Example (Thread 2 of Message Passing Program)

 $c: r_1 \leftarrow \operatorname{mem}[y]; d: r_2 \leftarrow \operatorname{mem}[x].$ 

How a thread executes an instruction on Power:

- First, it fetches it. Instructions must be fetched in the program order, one after another.
- Next, it performs the computation prescribed by the instruction's semantics. Results of instructions, on which the current one depends, must be already computed.
- Finally, it commits the instruction. Similarly, all instruction's dependencies must be committed earlier.

One thread can execute multiple instructions in parallel.

Example (Thread 2 of Message Passing Program)

$$c: r_1 \leftarrow \operatorname{mem}[y]; d: r_2 \leftarrow \operatorname{mem}[x].$$

Example (Computation of Thread 2)  $\beta := \text{fetch}(c) \cdot \text{fetch}(d) \cdot \text{load}(c) \cdot \text{load}(d) \cdot \text{commit}(d) \cdot \text{commit}(c).$ 

How memory works on Power:

How memory works on Power:

A thread loads the value written by the last store to the same address propagated to this thread.

How memory works on Power:

- A thread loads the value written by the last store to the same address propagated to this thread.
- A committed store is immediately propagated to its own thread and can be later propagated to some other threads.

How memory works on Power:

- A thread loads the value written by the last store to the same address propagated to this thread.
- A committed store is immediately propagated to its own thread and can be later propagated to some other threads.
- Stores to the same address are globally ordered (coherence order) and can be propagated only in this order.

How memory works on Power:

- A thread loads the value written by the last store to the same address propagated to this thread.
- A committed store is immediately propagated to its own thread and can be later propagated to some other threads.
- Stores to the same address are globally ordered (coherence order) and can be propagated only in this order.

Example (Thread 1 of Message Passing Program)

a: mem[x]  $\leftarrow$  1; b: mem[y]  $\leftarrow$  1.

How memory works on Power:

- A thread loads the value written by the last store to the same address propagated to this thread.
- A committed store is immediately propagated to its own thread and can be later propagated to some other threads.
- Stores to the same address are globally ordered (coherence order) and can be propagated only in this order.

```
Example (Thread 1 of Message Passing Program)
```

```
a: mem[x] \leftarrow 1; b: mem[y] \leftarrow 1.
```

### Example (Computation of Thread 1)

 $\alpha := fetch(a) \cdot commit(a) \cdot prop(a, 1) \cdot fetch(b) \cdot commit(b) \cdot prop(b, 1) \cdot prop(b, 2).$ 

# Power Architecture 4/4 Example (Message Passing Program)

Initially, 
$$x = y = 0$$
.Thread 1:Thread 2: $a: mem[x] \leftarrow 1$  $c: r_1 \leftarrow mem[y]$  $b: mem[y] \leftarrow 1$  $d: r_2 \leftarrow mem[x]$ Assumption: $r_1 = 1$  implies  $r_2 = 1$ .

Example (Computation of the Program on Power)  $\tau := \alpha \cdot \beta = \text{fetch}(a) \cdot \text{commit}(a) \cdot \text{prop}(a, 1) \cdot \text{fetch}(b) \cdot \text{commit}(b) \cdot \text{prop}(b, 1) \cdot \text{prop}(b, 2) \cdot \text{fetch}(c) \cdot \text{fetch}(d) \cdot \text{load}(c) \cdot \text{load}(d) \cdot \text{commit}(d) \cdot \text{commit}(c).$ 

- Load *c* reads value 1 written by *b*.
- Load d reads the initial value 0, as store a was never propagated to Thread 2.
- $\Rightarrow$  The assumption does not hold.

### **Robustness Problem**

Check, whether a given program has the same behaviors under SC and under Power.

### Robustness Problem

Check, whether a given program has the same behaviors under SC and under Power. Behavior is the control and data dependencies between instructions.

### Robustness Problem

Check, whether a given program has the same behaviors under SC and under Power. Behavior is the control and data dependencies between instructions.

### **Our Solution**

Reduce robustness checking to an emptiness check for an intersection of languages:

$$\mathcal{L} \cap \mathcal{R} \stackrel{?}{=} \emptyset.$$

 Computations violating SC (if any) have a representative in a normal form.

### Robustness Problem

Check, whether a given program has the same behaviors under SC and under Power. Behavior is the control and data dependencies between instructions.

### **Our Solution**

Reduce robustness checking to an emptiness check for an intersection of languages:

$$\mathcal{L} \cap \mathcal{R} \stackrel{?}{=} \emptyset.$$

- Computations violating SC (if any) have a representative in a normal form.
- ► Language *L* consists of all normal-form computations.

### Robustness Problem

Check, whether a given program has the same behaviors under SC and under Power. Behavior is the control and data dependencies between instructions.

### **Our Solution**

Reduce robustness checking to an emptiness check for an intersection of languages:

$$\mathcal{L} \cap \mathcal{R} \stackrel{?}{=} \emptyset.$$

- Computations violating SC (if any) have a representative in a normal form.
- ► Language *L* consists of all normal-form computations.
- $\cap \mathcal{R}$  filters only violating computations.

### Robustness Problem

Check, whether a given program has the same behaviors under SC and under Power. Behavior is the control and data dependencies between instructions.

### **Our Solution**

Reduce robustness checking to an emptiness check for an intersection of languages:

# $\mathcal{L}\cap \mathcal{R}\stackrel{?}{=} \emptyset.$

- Computations violating SC (if any) have a representative in a normal form.
- Language  $\mathcal{L}$  consists of all normal-form computations.
- $\cap \mathcal{R}$  filters only violating computations.
- Decide  $\mathcal{L} \cap \mathcal{R} \stackrel{?}{=} \emptyset$ .

#### Introduction

Power Architecture Robustness

#### Deciding Robustness

Characterization of Violating Computations Normal-Form Computations Generating Normal-Form Computations Checking Cyclicity of Happens-Before Relation Complexity

#### Conclusion

- Related Work
- Summary

#### Introduction

Power Architecture Robustness

#### Deciding Robustness

### Characterization of Violating Computations

Normal-Form Computations Generating Normal-Form Computations Checking Cyclicity of Happens-Before Relation Complexity

#### Conclusion

- Related Work
- Summary

A computation violates SC iff it has cyclic happens-before relation.

A computation violates SC iff it has cyclic happens-before relation.

Example (Happens-Before Relation of Computation  $\tau$ )

|                   | Thread 1                          | Thread 2                            |
|-------------------|-----------------------------------|-------------------------------------|
| init <sub>x</sub> | $a: \text{ mem}[x] \leftarrow 1$  | $d: r_2 \leftarrow \texttt{mem}[x]$ |
| $init_y$          | $b: \texttt{mem}[y] \leftarrow 1$ | $c\colon r_1 \gets \texttt{mem}[y]$ |

A computation violates SC iff it has cyclic happens-before relation.

Example (Happens-Before Relation of Computation  $\tau$ )

|                   | Thread 1                      | Thread 2                                                               |
|-------------------|-------------------------------|------------------------------------------------------------------------|
| init <sub>x</sub> | a: mem[x] $\leftarrow 1$      | $d: r_2 \leftarrow \texttt{mem}[x]$                                    |
| init <sub>y</sub> | $po \ b: mem[y] \leftarrow 1$ | $\begin{array}{c} po \\ c: r_1 \leftarrow \texttt{mem}[y] \end{array}$ |

Happens-before relation is a union of four relations:

Program order — textual ordering of instructions.

A computation violates SC iff it has cyclic happens-before relation.

Example (Happens-Before Relation of Computation  $\tau$ )

 $\begin{array}{c|c} & \text{Thread 1} & \text{Thread 2} \\ \text{init}_x & \overbrace{co} \\ & a: & \text{mem}[x] \leftarrow 1 \\ & po \\ & \text{init}_y & \overbrace{co} \\ & b: & \text{mem}[y] \leftarrow 1 \end{array} & \begin{array}{c} d: & r_2 \leftarrow \text{mem}[x] \\ & po \\ & po \\ & c: & r_1 \leftarrow \text{mem}[y] \end{array}$ 

- Program order textual ordering of instructions.
- Coherence order ordering of stores to the same address.

A computation violates SC iff it has cyclic happens-before relation.

Example (Happens-Before Relation of Computation  $\tau$ )



- Program order textual ordering of instructions.
- Coherence order ordering of stores to the same address.
- Source order which store is read by which load.

A computation violates SC iff it has cyclic happens-before relation.

Example (Happens-Before Relation of Computation  $\tau$ )



- Program order textual ordering of instructions.
- Coherence order ordering of stores to the same address.
- Source order which store is read by which load.
- Conflict order which stores overwrite the value read by a load.

#### Introduction

Power Architecture Robustness

#### Deciding Robustness

Characterization of Violating Computations

#### Normal-Form Computations

Generating Normal-Form Computations Checking Cyclicity of Happens-Before Relation Complexity

#### Conclusion

- Related Work
- Summary

# Normal-Form Computations 1/4

Definition

A computation  $\tau := \tau_1 \cdots \tau_n$  is in normal form of degree *n*, if

# Normal-Form Computations 1/4

### Definition

A computation  $\tau := \tau_1 \cdots \tau_n$  is in normal form of degree *n*, if

• there are no fetch events in  $\tau_2 \cdots \tau_n$ ,

### Definition

A computation  $\tau := \tau_1 \cdots \tau_n$  is in normal form of degree *n*, if

- there are no fetch events in  $\tau_2 \cdots \tau_n$ ,
- events in each part τ<sub>1</sub>...τ<sub>n</sub> occur in the order in which corresponding fetch events occur in τ<sub>1</sub>.

### Definition

A computation  $\tau := \tau_1 \cdots \tau_n$  is in normal form of degree *n*, if

- there are no fetch events in  $\tau_2 \cdots \tau_n$ ,
- events in each part τ<sub>1</sub>...τ<sub>n</sub> occur in the order in which corresponding fetch events occur in τ<sub>1</sub>.

#### Theorem

If a program has computations with cyclic happens-before relation, it has one in the normal form of degree (number of threads + 3).

### Definition

A computation  $\tau := \tau_1 \cdots \tau_n$  is in normal form of degree *n*, if

- there are no fetch events in  $\tau_2 \cdots \tau_n$ ,
- events in each part τ<sub>1</sub>...τ<sub>n</sub> occur in the order in which corresponding fetch events occur in τ<sub>1</sub>.

#### Theorem

If a program has computations with cyclic happens-before relation, it has one in the normal form of degree (number of threads + 3).

### Proof Idea.

Take a shortest computation with cyclic happens-before relation and transform it to the normal form.

Lemma

Given a non-empty valid computation, there is a thread, such that deletion of all events belonging to its last fetched instruction produces a valid computation.

#### Lemma

Given a non-empty valid computation, there is a thread, such that deletion of all events belonging to its last fetched instruction produces a valid computation.

#### Example

 $\tau = \text{fetch}(a) \cdot \text{commit}(a) \cdot \text{prop}(a, 1) \cdot \underline{\text{fetch}(b)} \cdot \underline{\text{commit}(b)} \cdot \underline{\text{prop}(b, 1)} \cdot \underline{\text{prop}(b, 2)} \cdot \underline{\text{fetch}(c)} \cdot \underline{\text{fetch}(d)} \cdot \underline{\text{load}(c)} \cdot \underline{\text{load}(d)} \cdot \underline{\text{commit}(d)} \cdot \underline{\text{commit}(c)}.$ 



#### Lemma

Given a non-empty valid computation, there is a thread, such that deletion of all events belonging to its last fetched instruction produces a valid computation.

#### Example



 $\operatorname{commit}(d) \cdot \operatorname{commit}(c).$ 



1. Let  $\tau := \tau_1 \cdot e_1 \cdot \tau_2 \cdot e_2 \cdots \tau_n$  be a shortest computation with cyclic happens-before relation.

- 1. Let  $\tau := \tau_1 \cdot e_1 \cdot \tau_2 \cdot e_2 \cdots \tau_n$  be a shortest computation with cyclic happens-before relation.
- 2. Let  $e_1 \cdots e_{n-1}$  be the deleted events.

- 1. Let  $\tau := \tau_1 \cdot e_1 \cdot \tau_2 \cdot e_2 \cdots \tau_n$  be a shortest computation with cyclic happens-before relation.
- 2. Let  $e_1 \cdots e_{n-1}$  be the deleted events.
- 3. Assume all fetch events are in  $\tau_1 \cdot e_1$  (one can always move them to the front).

- 1. Let  $\tau := \tau_1 \cdot e_1 \cdot \tau_2 \cdot e_2 \cdots \tau_n$  be a shortest computation with cyclic happens-before relation.
- 2. Let  $e_1 \cdots e_{n-1}$  be the deleted events.
- 3. Assume all fetch events are in  $\tau_1 \cdot e_1$  (one can always move them to the front).
- 4. Computation  $\tau' := \tau_1 \cdot \tau_2 \cdots \tau_n$  is shorter than  $\tau$

- 1. Let  $\tau := \tau_1 \cdot e_1 \cdot \tau_2 \cdot e_2 \cdots \tau_n$  be a shortest computation with cyclic happens-before relation.
- 2. Let  $e_1 \cdots e_{n-1}$  be the deleted events.
- 3. Assume all fetch events are in  $\tau_1 \cdot e_1$  (one can always move them to the front).
- 4. Computation  $\tau' := \tau_1 \cdot \tau_2 \cdots \tau_n$  is shorter than  $\tau$ ,  $\Rightarrow$  not violating.

- 1. Let  $\tau := \tau_1 \cdot e_1 \cdot \tau_2 \cdot e_2 \cdots \tau_n$  be a shortest computation with cyclic happens-before relation.
- 2. Let  $e_1 \cdots e_{n-1}$  be the deleted events.
- 3. Assume all fetch events are in  $\tau_1 \cdot e_1$  (one can always move them to the front).
- 4. Computation  $\tau' := \tau_1 \cdot \tau_2 \cdots \tau_n$  is shorter than  $\tau$ ,  $\Rightarrow$  not violating.
- 5. Let  $\sigma$  be a sequentially consistent version of  $\tau'$ .

- 1. Let  $\tau := \tau_1 \cdot e_1 \cdot \tau_2 \cdot e_2 \cdots \tau_n$  be a shortest computation with cyclic happens-before relation.
- 2. Let  $e_1 \cdots e_{n-1}$  be the deleted events.
- 3. Assume all fetch events are in  $\tau_1 \cdot e_1$  (one can always move them to the front).
- 4. Computation  $\tau' := \tau_1 \cdot \tau_2 \cdots \tau_n$  is shorter than  $\tau$ ,  $\Rightarrow$  not violating.
- 5. Let  $\sigma$  be a sequentially consistent version of  $\tau'$ .
- 6. Reorder events in the way they follow in  $\sigma$ :

$$\tau'' := \sigma \downarrow \tau_1 \cdot e_1 \cdots \sigma \downarrow \tau_n.$$

- 1. Let  $\tau := \tau_1 \cdot e_1 \cdot \tau_2 \cdot e_2 \cdots \tau_n$  be a shortest computation with cyclic happens-before relation.
- 2. Let  $e_1 \cdots e_{n-1}$  be the deleted events.
- 3. Assume all fetch events are in  $\tau_1 \cdot e_1$  (one can always move them to the front).
- 4. Computation  $\tau' := \tau_1 \cdot \tau_2 \cdots \tau_n$  is shorter than  $\tau$ ,  $\Rightarrow$  not violating.
- 5. Let  $\sigma$  be a sequentially consistent version of  $\tau'$ .
- 6. Reorder events in the way they follow in  $\sigma$ :

$$\tau'' := \sigma \downarrow \tau_1 \cdot e_1 \cdots \sigma \downarrow \tau_n.$$

#### Lemma

Computation au''

is in normal form,

- 1. Let  $\tau := \tau_1 \cdot e_1 \cdot \tau_2 \cdot e_2 \cdots \tau_n$  be a shortest computation with cyclic happens-before relation.
- 2. Let  $e_1 \cdots e_{n-1}$  be the deleted events.
- 3. Assume all fetch events are in  $\tau_1 \cdot e_1$  (one can always move them to the front).
- 4. Computation  $\tau' := \tau_1 \cdot \tau_2 \cdots \tau_n$  is shorter than  $\tau$ ,  $\Rightarrow$  not violating.
- 5. Let  $\sigma$  be a sequentially consistent version of  $\tau'$ .
- 6. Reorder events in the way they follow in  $\sigma$ :

$$\tau'' := \sigma \downarrow \tau_1 \cdot e_1 \cdots \sigma \downarrow \tau_n.$$

#### Lemma

Computation au''

- is in normal form,
- has the same happens-before relation as au

- 1. Let  $\tau := \tau_1 \cdot e_1 \cdot \tau_2 \cdot e_2 \cdots \tau_n$  be a shortest computation with cyclic happens-before relation.
- 2. Let  $e_1 \cdots e_{n-1}$  be the deleted events.
- 3. Assume all fetch events are in  $\tau_1 \cdot e_1$  (one can always move them to the front).
- 4. Computation  $\tau' := \tau_1 \cdot \tau_2 \cdots \tau_n$  is shorter than  $\tau$ ,  $\Rightarrow$  not violating.
- 5. Let  $\sigma$  be a sequentially consistent version of  $\tau'$ .
- 6. Reorder events in the way they follow in  $\sigma$ :

$$\tau'' := \sigma \downarrow \tau_1 \cdot e_1 \cdots \sigma \downarrow \tau_n.$$

#### Lemma

Computation au''

- is in normal form,
- has the same happens-before relation as  $\tau$ ,  $\Rightarrow$  violating.

#### Example

A shortest computation with cyclic happens-before relation:

- $\tau = (\text{fetch}(c) \cdot \text{fetch}(d) \cdot \text{fetch}(a)) \cdot \text{fetch}(b)$ 
  - $\cdot \quad (\operatorname{commit}(a) \cdot \operatorname{prop}(a, 1)) \cdot \underline{\operatorname{commit}(b)} \cdot \underline{\operatorname{prop}(b, 1)} \cdot \underline{\operatorname{prop}(b, 2)}$
  - $(load(c) \cdot load(d) \cdot commit(d) \cdot commit(c))$

### Example

The shortened computation:

- $\tau' = (fetch(c) \cdot fetch(d) \cdot fetch(a))$ 
  - $(\operatorname{commit}(a) \cdot \operatorname{prop}(a, 1))$
  - $(load(c) \cdot load(d) \cdot commit(d) \cdot commit(c))$

### Example

The shortened computation:

- $\tau' = (fetch(c) \cdot fetch(d) \cdot fetch(a))$ 
  - $(\operatorname{commit}(a) \cdot \operatorname{prop}(a, 1))$
  - ·  $(load(c) \cdot load(d) \cdot commit(d) \cdot commit(c))$

Matching sequentially consistent computation:

- $\sigma = \operatorname{fetch}(c) \cdot \operatorname{load}(c) \cdot \operatorname{commit}(c)$ 
  - fetch(d) load(d) commit(d)
  - $fetch(a) \cdot commit(a) \cdot prop(a, 1) \cdot prop(a, 2)$

#### Example

A shortest computation with cyclic happens-before relation:

- $\tau = (\operatorname{fetch}(c) \cdot \operatorname{fetch}(d) \cdot \operatorname{fetch}(a)) \cdot \operatorname{fetch}(b)$ 
  - $\cdot \quad (\operatorname{commit}(a) \cdot \operatorname{prop}(a, 1)) \cdot \underline{\operatorname{commit}(b)} \cdot \underline{\operatorname{prop}(b, 1)} \cdot \underline{\operatorname{prop}(b, 2)}$
  - ·  $(load(c) \cdot load(d) \cdot commit(d) \cdot commit(c))$

Matching sequentially consistent computation:

- $\sigma = \operatorname{fetch}(c) \cdot \operatorname{load}(c) \cdot \operatorname{commit}(c)$ 
  - $fetch(d) \cdot load(d) \cdot commit(d)$
  - $fetch(a) \cdot commit(a) \cdot prop(a, 1) \cdot prop(a, 2)$

- $\tau'' = (\text{fetch}(c) \cdot \text{fetch}(d) \cdot \text{fetch}(a)) \cdot \text{fetch}(b)$ 
  - ·  $(\operatorname{commit}(a) \cdot \operatorname{prop}(a, 1)) \cdot \operatorname{commit}(b) \cdot \operatorname{prop}(b, 1) \cdot \operatorname{prop}(b, 2)$
  - $(load(c) \cdot commit(c) \cdot load(d) \cdot commit(d))$

### Example

A shortest computation with cyclic happens-before relation:

- $\tau = (\text{fetch}(c) \cdot \text{fetch}(d) \cdot \text{fetch}(a)) \cdot \text{fetch}(b)$ 
  - $(\operatorname{commit}(a) \cdot \operatorname{prop}(a, 1)) \cdot \operatorname{commit}(b) \cdot \operatorname{prop}(b, 1) \cdot \operatorname{prop}(b, 2)$
  - $(load(c) \cdot load(d) \cdot commit(d) \cdot commit(c))$

Matching sequentially consistent computation:

- $\sigma = \operatorname{fetch}(c) \cdot \operatorname{load}(c) \cdot \operatorname{commit}(c)$ 
  - fetch(d) · load(d) · commit(d)
  - $fetch(a) \cdot commit(a) \cdot prop(a, 1) \cdot prop(a, 2)$

- $\tau'' = (\text{fetch}(c) \cdot \text{fetch}(d) \cdot \text{fetch}(a)) \cdot \text{fetch}(b)$ 
  - (commit(a) prop(a, 1)) commit(b) prop(b, 1) prop(b, 2)
  - (load(c) commit(c) load(d) commit(d))

### Example

A shortest computation with cyclic happens-before relation:

 $\tau = (\operatorname{fetch}(c) \cdot \operatorname{fetch}(d) \cdot \operatorname{fetch}(a)) \cdot \operatorname{fetch}(b)$ 

- $(\operatorname{commit}(a) \cdot \operatorname{prop}(a, 1)) \cdot \operatorname{commit}(b) \cdot \operatorname{prop}(b, 1) \cdot \operatorname{prop}(b, 2)$
- $(load(c) \cdot load(d) \cdot commit(d) \cdot commit(c))$

Matching sequentially consistent computation:

$$\sigma = \operatorname{fetch}(c) \cdot \operatorname{load}(c) \cdot \operatorname{commit}(c)$$

- fetch(d) load(d) commit(d)
- fetch(a) commit(a) prop(a, 1) prop(a, 2)

$$\tau'' = (\text{fetch}(c) \cdot \text{fetch}(d) \cdot \text{fetch}(a)) \cdot \text{fetch}(b)$$

- (commit(a) prop(a, 1)) commit(b) prop(b, 1) prop(b, 2)
- (load(c) commit(c) load(d) commit(d))

### Example

A shortest computation with cyclic happens-before relation:

- $\tau = (\operatorname{fetch}(c) \cdot \operatorname{fetch}(d) \cdot \operatorname{fetch}(a)) \cdot \operatorname{fetch}(b)$ 
  - $\cdot \quad (\operatorname{commit}(a) \cdot \operatorname{prop}(a, 1)) \cdot \operatorname{commit}(b) \cdot \operatorname{prop}(b, 1) \cdot \operatorname{prop}(b, 2)$
  - $(load(c) \cdot load(d) \cdot commit(d) \cdot commit(c))$

Matching sequentially consistent computation:

- $\sigma = \operatorname{fetch}(c) \cdot \operatorname{load}(c) \cdot \operatorname{commit}(c)$ 
  - fetch(d) load(d) commit(d)
  - $fetch(a) \cdot commit(a) \cdot prop(a, 1) \cdot prop(a, 2)$

- $\tau'' = (\operatorname{fetch}(c) \cdot \operatorname{fetch}(d) \cdot \operatorname{fetch}(a)) \cdot \operatorname{fetch}(b)$ 
  - $(\operatorname{commit}(a) \cdot \operatorname{prop}(a, 1)) \cdot \operatorname{commit}(b) \cdot \operatorname{prop}(b, 1) \cdot \operatorname{prop}(b, 2)$
  - (load(c) commit(c) load(d) commit(d))

### Example

A shortest computation with cyclic happens-before relation:

 $\tau = (\operatorname{fetch}(c) \cdot \operatorname{fetch}(d) \cdot \operatorname{fetch}(a)) \cdot \operatorname{fetch}(b)$ 

- $\cdot \quad (\operatorname{commit}(a) \cdot \operatorname{prop}(a, 1)) \cdot \underline{\operatorname{commit}(b)} \cdot \underline{\operatorname{prop}(b, 1)} \cdot \underline{\operatorname{prop}(b, 2)}$
- $(load(c) \cdot load(d) \cdot commit(d) \cdot commit(c))$

Matching sequentially consistent computation:

$$\sigma = \operatorname{fetch}(c) \cdot \operatorname{load}(c) \cdot \operatorname{commit}(c)$$

- fetch(d) load(d) commit(d)
- fetch(a) commit(a) prop(a, 1) prop(a, 2)

$$\tau'' = (\text{fetch}(c) \cdot \text{fetch}(d) \cdot \text{fetch}(a)) \cdot \text{fetch}(b)$$

- $(\operatorname{commit}(a) \cdot \operatorname{prop}(a, 1)) \cdot \operatorname{commit}(b) \cdot \operatorname{prop}(b, 1) \cdot \operatorname{prop}(b, 2)$
- (load(c) commit(c) load(d) commit(d))

### Example

A shortest computation with cyclic happens-before relation:

- $\tau = (\operatorname{fetch}(c) \cdot \operatorname{fetch}(d) \cdot \operatorname{fetch}(a)) \cdot \operatorname{fetch}(b)$ 
  - $(\operatorname{commit}(a) \cdot \operatorname{prop}(a, 1)) \cdot \operatorname{commit}(b) \cdot \operatorname{prop}(b, 1) \cdot \operatorname{prop}(b, 2)$
  - ·  $(load(c) \cdot load(d) \cdot commit(d) \cdot commit(c))$

Matching sequentially consistent computation:

- $\sigma = fetch(c) \cdot load(c) \cdot commit(c)$ 
  - fetch(d) load(d) commit(d)
  - $fetch(a) \cdot commit(a) \cdot prop(a, 1) \cdot prop(a, 2)$

$$\tau'' = (\operatorname{fetch}(c) \cdot \operatorname{fetch}(d) \cdot \operatorname{fetch}(a)) \cdot \operatorname{fetch}(b)$$

- (commit(a) prop(a, 1)) commit(b) prop(b, 1) prop(b, 2)
- $(load(c) \cdot commit(c) \cdot load(d) \cdot commit(d))$

#### Introduction

Power Architecture Robustness

#### Deciding Robustness

Characterization of Violating Computations Normal-Form Computations

#### Generating Normal-Form Computations

Checking Cyclicity of Happens-Before Relation Complexity

#### Conclusion

- Related Work
- Summary

Describe the language  ${\mathcal L}$  of all normal-form computations of a given degree.

Describe the language  ${\mathcal L}$  of all normal-form computations of a given degree.

We need a language class that

- ▶ includes *L*,
- ▶ is closed under intersection with regular languages ( $\mathcal{L} \cap \mathcal{R}$ ),
- has decidable emptiness problem  $(\mathcal{L} \cap \mathcal{R} \stackrel{?}{=} \emptyset)$ .

Describe the language  ${\boldsymbol{\mathcal L}}$  of all normal-form computations of a given degree.

We need a language class that

- ▶ includes *L*,
- ▶ is closed under intersection with regular languages ( $\mathcal{L} \cap \mathcal{R}$ ),
- has decidable emptiness problem  $(\mathcal{L} \cap \mathcal{R} \stackrel{?}{=} \emptyset)$ .

### Properties of $\mathcal{L}$

 Number of concurrently executed instructions is unbounded not regular.

Describe the language  ${\mathcal L}$  of all normal-form computations of a given degree.

We need a language class that

- ▶ includes *L*,
- ▶ is closed under intersection with regular languages  $(\mathcal{L} \cap \mathcal{R})$ ,
- has decidable emptiness problem  $(\mathcal{L} \cap \mathcal{R} \stackrel{?}{=} \emptyset)$ .

#### Properties of $\mathcal{L}$

- Number of concurrently executed instructions is unbounded
   not regular.
- Can include computations like (fetch)<sup>n</sup> · (load)<sup>n</sup> · (commit)<sup>n</sup> ⇒ not even context-free.

Solution

Define  $\mathcal{L}$  as a language of a multiheaded automaton.

#### Solution

Define  $\mathcal{L}$  as a language of a multiheaded automaton.

Definition (Multiheaded Automaton)

An *n*-headed automaton is an extension of NFA generating n parts of a computation simultaneously, one by each head.

Solution

Define  $\mathcal{L}$  as a language of a multiheaded automaton.

Definition (Multiheaded Automaton)

An *n*-headed automaton is an extension of NFA generating n parts of a computation simultaneously, one by each head.

Example (Generating  $\tau''$  with a 3-headed Automaton)

 $\tau'' =$ 

.

Solution

Define  $\mathcal{L}$  as a language of a multiheaded automaton.

Definition (Multiheaded Automaton)

An *n*-headed automaton is an extension of NFA generating n parts of a computation simultaneously, one by each head.

Example (Generating au'' with a 3-headed Automaton)

 $\tau'' = \text{fetch}(c)$ 

.

Solution

Define  $\mathcal{L}$  as a language of a multiheaded automaton.

Definition (Multiheaded Automaton)

An *n*-headed automaton is an extension of NFA generating n parts of a computation simultaneously, one by each head.

Example (Generating au'' with a 3-headed Automaton)

 $\tau'' = \text{fetch}(c)$ 

· load(c)

Solution

Define  $\mathcal{L}$  as a language of a multiheaded automaton.

Definition (Multiheaded Automaton)

An *n*-headed automaton is an extension of NFA generating n parts of a computation simultaneously, one by each head.

Example (Generating au'' with a 3-headed Automaton)

 $\tau'' = \text{fetch}(c)$ 

·  $load(c) \cdot commit(c)$ 

### Solution

Define  $\mathcal{L}$  as a language of a multiheaded automaton.

Definition (Multiheaded Automaton)

An *n*-headed automaton is an extension of NFA generating n parts of a computation simultaneously, one by each head.

- $\tau'' = \operatorname{fetch}(c) \cdot \operatorname{fetch}(d)$ 
  - ·  $load(c) \cdot commit(c)$

### Solution

Define  $\mathcal{L}$  as a language of a multiheaded automaton.

## Definition (Multiheaded Automaton)

An *n*-headed automaton is an extension of NFA generating n parts of a computation simultaneously, one by each head.

- $\tau'' = \operatorname{fetch}(c) \cdot \operatorname{fetch}(d)$ 
  - $\cdot \quad \mathsf{load}(c) \cdot \mathsf{commit}(c) \cdot \mathsf{load}(d)$

### Solution

Define  $\mathcal{L}$  as a language of a multiheaded automaton.

## Definition (Multiheaded Automaton)

An *n*-headed automaton is an extension of NFA generating n parts of a computation simultaneously, one by each head.

- $\tau'' = \operatorname{fetch}(c) \cdot \operatorname{fetch}(d)$ 
  - ·  $load(c) \cdot commit(c) \cdot load(d) \cdot commit(d)$

### Solution

٠

Define  $\mathcal{L}$  as a language of a multiheaded automaton.

## Definition (Multiheaded Automaton)

An *n*-headed automaton is an extension of NFA generating n parts of a computation simultaneously, one by each head.

Example (Generating au'' with a 3-headed Automaton)

$$\tau'' = \operatorname{fetch}(c) \cdot \operatorname{fetch}(d) \cdot \operatorname{fetch}(a)$$

·  $load(c) \cdot commit(c) \cdot load(d) \cdot commit(d)$ 

### Solution

Define  $\mathcal{L}$  as a language of a multiheaded automaton.

## Definition (Multiheaded Automaton)

An *n*-headed automaton is an extension of NFA generating n parts of a computation simultaneously, one by each head.

- $\tau'' = \operatorname{fetch}(c) \cdot \operatorname{fetch}(d) \cdot \operatorname{fetch}(a)$ 
  - commit(a)
  - ·  $load(c) \cdot commit(c) \cdot load(d) \cdot commit(d)$

### Solution

Define  $\mathcal{L}$  as a language of a multiheaded automaton.

## Definition (Multiheaded Automaton)

An *n*-headed automaton is an extension of NFA generating n parts of a computation simultaneously, one by each head.

- $\tau'' = \operatorname{fetch}(c) \cdot \operatorname{fetch}(d) \cdot \operatorname{fetch}(a)$ 
  - $\cdot$  commit(a)  $\cdot$  prop(a, 1)
  - $\cdot \quad \mathsf{load}(c) \cdot \mathsf{commit}(c) \cdot \mathsf{load}(d) \cdot \mathsf{commit}(d)$

### Solution

Define  $\mathcal{L}$  as a language of a multiheaded automaton.

## Definition (Multiheaded Automaton)

An *n*-headed automaton is an extension of NFA generating n parts of a computation simultaneously, one by each head.

- $\tau'' = \text{fetch}(c) \cdot \text{fetch}(d) \cdot \text{fetch}(a) \cdot \text{fetch}(b)$ 
  - $\cdot$  commit(a)  $\cdot$  prop(a, 1)
  - ·  $load(c) \cdot commit(c) \cdot load(d) \cdot commit(d)$

### Solution

Define  $\mathcal{L}$  as a language of a multiheaded automaton.

## Definition (Multiheaded Automaton)

An *n*-headed automaton is an extension of NFA generating n parts of a computation simultaneously, one by each head.

- $\tau'' = \text{fetch}(c) \cdot \text{fetch}(d) \cdot \text{fetch}(a) \cdot \text{fetch}(b)$ 
  - $\cdot$  commit(a)  $\cdot$  prop(a, 1)  $\cdot$  commit(b)
  - ·  $load(c) \cdot commit(c) \cdot load(d) \cdot commit(d)$

### Solution

Define  $\mathcal{L}$  as a language of a multiheaded automaton.

## Definition (Multiheaded Automaton)

An *n*-headed automaton is an extension of NFA generating n parts of a computation simultaneously, one by each head.

- $\tau'' = \operatorname{fetch}(c) \cdot \operatorname{fetch}(d) \cdot \operatorname{fetch}(a) \cdot \operatorname{fetch}(b)$ 
  - $\cdot$  commit(a)  $\cdot$  prop(a, 1)  $\cdot$  commit(b)  $\cdot$  prop(b, 1)
  - ·  $load(c) \cdot commit(c) \cdot load(d) \cdot commit(d)$

### Solution

Define  $\mathcal{L}$  as a language of a multiheaded automaton.

## Definition (Multiheaded Automaton)

An *n*-headed automaton is an extension of NFA generating n parts of a computation simultaneously, one by each head.

- $\tau'' = \text{fetch}(c) \cdot \text{fetch}(d) \cdot \text{fetch}(a) \cdot \text{fetch}(b)$ 
  - $\cdot \quad \operatorname{commit}(a) \cdot \operatorname{prop}(a, 1) \cdot \operatorname{commit}(b) \cdot \operatorname{prop}(b, 1) \cdot \operatorname{prop}(b, 2)$
  - ·  $load(c) \cdot commit(c) \cdot load(d) \cdot commit(d)$

#### Introduction

Power Architecture Robustness

#### Deciding Robustness

Characterization of Violating Computations Normal-Form Computations Generating Normal-Form Computations Checking Cyclicity of Happens-Before Relation

Complexity

#### Conclusion

- Related Work
- Summary

## Checking Cyclicity of Happens-Before Relation

### Example (Happens-Before Relation of $\tau''$ )



Solution

## Checking Cyclicity of Happens-Before Relation

### Example (Happens-Before Relation of $\tau''$ )



## Solution

The multiheaded automaton in each thread picks two instructions in program order.

# Checking Cyclicity of Happens-Before Relation

### Example (Happens-Before Relation of $\tau''$ )



## Solution

- The multiheaded automaton in each thread picks two instructions in program order.
- Finite automata check edges between picked instructions from different threads.

#### Introduction

Power Architecture Robustness

#### Deciding Robustness

Characterization of Violating Computations Normal-Form Computations Generating Normal-Form Computations Checking Cyclicity of Happens-Before Relation Complexity

#### Conclusion

- Related Work
- Summary

## Complexity

### Theorem

Assuming finite memory, robustness is PSPACE-complete.

Proof.

## Complexity

### Theorem

Assuming finite memory, robustness is PSPACE-complete.

Proof.

• Upper bound:  $\mathcal{L} \cap \mathcal{R} \stackrel{?}{=} \emptyset$ .

## Complexity

### Theorem

Assuming finite memory, robustness is PSPACE-complete.

Proof.

- Upper bound:  $\mathcal{L} \cap \mathcal{R} \stackrel{?}{=} \emptyset$ .
- Lower bound: SC state reachability [Kozen, 1977].

#### Introduction

Power Architecture

#### **Deciding Robustness**

Characterization of Violating Computations Normal-Form Computations Generating Normal-Form Computations Checking Cyclicity of Happens-Before Relation Complexity

#### Conclusion

Related Work Summary

- Robustness
  - Characterization: [Shasha and Snir, 1988].
  - Monitoring algorithms:
    - [Burckhardt and Musuvathi, 2008] (TSO-only, broken),
    - [Burnim et al., 2011] (TSO, PSO).
  - Static overapproximation and fence insertion: [Alglave and Maranget, 2011] (TSO, Power).
  - Decidability
    - ▶ [Bouajjani et al., 2011] (TSO, PSPACE-completeness),
    - [Bouajjani et al., 2013] (TSO, reduction to SC reachability, fence insertion).

- Robustness
  - Characterization: [Shasha and Snir, 1988].
  - Monitoring algorithms:
    - [Burckhardt and Musuvathi, 2008] (TSO-only, broken),
    - [Burnim et al., 2011] (TSO, PSO).
  - Static overapproximation and fence insertion: [Alglave and Maranget, 2011] (TSO, Power).
  - Decidability
    - ▶ [Bouajjani et al., 2011] (TSO, PSPACE-completeness),
    - [Bouajjani et al., 2013] (TSO, reduction to SC reachability, fence insertion).
- State reachability
  - ▶ PSPACE for SC [Kozen, 1977],
  - ▶ Non-primitive recursive for TSO [Atig et al., 2010].

- Robustness
  - Characterization: [Shasha and Snir, 1988].
  - Monitoring algorithms:
    - [Burckhardt and Musuvathi, 2008] (TSO-only, broken),
    - [Burnim et al., 2011] (TSO, PSO).
  - Static overapproximation and fence insertion: [Alglave and Maranget, 2011] (TSO, Power).
  - Decidability
    - ▶ [Bouajjani et al., 2011] (TSO, PSPACE-completeness),
    - [Bouajjani et al., 2013] (TSO, reduction to SC reachability, fence insertion).
- State reachability
  - PSPACE for SC [Kozen, 1977],
  - ▶ Non-primitive recursive for TSO [Atig et al., 2010].
- Power models:
  - [Sarkar et al., 2011] (operational),
  - [Mador-Haim et al., 2012] (axiomatic),
  - [Alglave et al., 2013] (overview, newer axiomatic),
  - [Maranget et al., ] (tutorial, with ARM).

## Reduction of Robustness to Language Emptiness

- Look only for normal-form violating computations.
- Use multiheaded automata to generate normal-form computations.
- Check cyclicity of happens-before by regular intersection.

## Reduction of Robustness to Language Emptiness

- Look only for normal-form violating computations.
- Use multiheaded automata to generate normal-form computations.
- Check cyclicity of happens-before by regular intersection.

### Robustness against Power is $\ensuremath{\operatorname{PSPACE}}$ -complete

- Upper bound: reduction to language emptiness.
- Lower bound: sequentially consistent state reachability.

## Reduction of Robustness to Language Emptiness

- Look only for normal-form violating computations.
- Use multiheaded automata to generate normal-form computations.
- Check cyclicity of happens-before by regular intersection.

### Robustness against Power is $\ensuremath{\operatorname{PSPACE}}$ -complete

- Upper bound: reduction to language emptiness.
- Lower bound: sequentially consistent state reachability.

## First decidability result for Power!

## Reduction of Robustness to Language Emptiness

- Look only for normal-form violating computations.
- Use multiheaded automata to generate normal-form computations.
- Check cyclicity of happens-before by regular intersection.

### Robustness against Power is PSPACE-complete

- Upper bound: reduction to language emptiness.
- Lower bound: sequentially consistent state reachability.

## First decidability result for Power!

Thank you for your attention. Questions? derevenetc@cs.uni-kl.de

## References I

- Alglave, J. and Maranget, L. (2011).
   Stability in weak memory models.
   In CAV, volume 6806 of LNCS, pages 50–66. Springer.
- Alglave, J., Maranget, L., and Tautschnig, M. (2013). Herding cats. CoRR, abs/1308.6810.
- Atig, M. F., Bouajjani, A., Burckhardt, S., and Musuvathi, M. (2010).
   On the verification problem for weak memory models.

In POPL, pages 7-18. ACM.

 Bouajjani, A., Derevenetc, E., and Meyer, R. (2013). Checking and enforcing robustness against TSO. In ESOP, LNCS, pages 533–553. Springer.

## References II

- Bouajjani, A., Meyer, R., and Möhlmann, E. (2011).
   Deciding robustness against Total Store Ordering.
   In *ICALP*, volume 6756 of *LNCS*, pages 428–440. Springer.
- Burckhardt, S. and Musuvathi, M. (2008).
   Effective program verification for relaxed memory models.
   In CAV, volume 5123 of LNCS, pages 107–120. Springer.
- Burnim, J., Stergiou, C., and Sen, K. (2011). Sound and complete monitoring of sequential consistency for relaxed memory models.

In TACAS, volume 6605 of LNCS, pages 11-25. Springer.

**Kozen**, D. (1977).

Lower bounds for natural proof systems. In *FOCS*, pages 254–266. IEEE.

## References III

Lamport, L. (1979).

How to make a multiprocessor computer that correctly executes multiprocess programs.

IEEE Transactions on Computers, 28(9):690–691.

 Mador-Haim, S., Maranget, L., Sarkar, S., Memarian, K., Alglave, J., Owens, S., Alur, R., Martin, M. M. K., Sewell, P., and Williams, D. (2012).
 An axiomatic memory model for power multiprocessors. In CAV, pages 495–512. Springer.



Maranget, L., Sarkar, S., and Sewell, P. A tutorial introduction to the ARM and POWER relaxed memory models. https://www.cl.cam.ac.uk/~pes20/ppc-supplemental/ test7.pdf. Draft.

## References IV

```
    Sarkar, S., Sewell, P., Alglave, J., Maranget, L., and Williams,
D. (2011).
    Understanding POWER multiprocessors.
    In PLDI, pages 175–186. ACM.
```

```
    Shasha, D. and Snir, M. (1988).
    Efficient and correct execution of parallel programs that share memory.
    ACM TOPLAS, 10(2):282–312.
```