# Computer-aided Verification and Construction under Relaxed Memory Models

Graduate School Weak Consistency (weacon)

Roland Meyer

Technische Universität Kaiserslautern

#### Concurrent Programs with Shared Memory

- Finite number of shared variables  $\{x, y, x_1, \ldots\}$
- Finite data domain  $\{d, d_0, d_1, \ldots\}$
- Finite number of finite-control threads  $T_1, \ldots, T_n$  with operations:

$$x = y = 0$$

Thread 1

 $a: x = 1$ 
 $b: if(y == 0)$ {

 $c: crit. sect. 1$ 
 $d:$ }

Thread 2

 $p: y = 1$ 
 $q: if(x == 0)$ {

 $r: crit. sect. 2$ 
 $s:$ }

Dekker's mutual exclusion protocol.

- Threads directly write to and read from memory
- Classical interleaving semantics
  - Computations of different threads are shuffled
  - Program order is preserved for each thread

| x = y = 0                                           |                                                   |                   | Mem           |
|-----------------------------------------------------|---------------------------------------------------|-------------------|---------------|
| Thread 1 $a: x = 1$                                 | Thread 2 $p: v = 1$                               | Thread 1 $pc = a$ | <i>x</i><br>0 |
| a: x = 1<br>b: if(y == 0){<br>c: crit.sect.1<br>d:} | $q: if(x == 0) \{$<br>r: crit. sect. 2<br>$s: \}$ | Thread 2 $pc = p$ | <i>y</i><br>0 |

- Threads directly write to and read from memory
- Classical interleaving semantics
  - Computations of different threads are shuffled
  - Program order is preserved for each thread

| x = y = 0                                                |                                             |                   | Mem           |
|----------------------------------------------------------|---------------------------------------------|-------------------|---------------|
| Thread 1 $a: x = 1$                                      | Thread 2 $p: y = 1$                         | Thread 1 $pc = b$ | x<br>1        |
| a: x = 1<br>b: if (y == 0) {<br>c: crit. sect. 1<br>d: } | $q: if(x == 0)$ { $r: crit. sect. 2$ $s:$ } | Thread 2 $pc = p$ | <i>y</i><br>0 |

- Threads directly write to and read from memory
- Classical interleaving semantics
  - Computations of different threads are shuffled
  - Program order is preserved for each thread

| x = y = 0                                             |                                         |                   | Mem           |
|-------------------------------------------------------|-----------------------------------------|-------------------|---------------|
| Thread 1 $a: x = 1$                                   | Thread 2  p: y = 1                      | Thread 1 $pc = c$ | x<br>1        |
| a: x = 1<br>b: if(y == 0){<br>c: crit. sect. 1<br>d:} | q: if(x == 0){<br>r: crit.sect.2<br>s:} | Thread 2 $pc = p$ | <i>y</i><br>0 |

- Threads directly write to and read from memory
- Classical interleaving semantics
  - Computations of different threads are shuffled
  - Program order is preserved for each thread

| x = y = 0                                             |                                         |                   | Mem    |
|-------------------------------------------------------|-----------------------------------------|-------------------|--------|
| Thread 1 $a: x = 1$                                   | Thread 2  p: y = 1                      | Thread 1 $pc = c$ | x<br>1 |
| a: x = 1<br>b: if(y == 0){<br>c: crit. sect. 1<br>d:} | q: if(x == 0){<br>r: crit.sect.2<br>s:} | Thread 2 $pc = q$ | у<br>1 |

- Threads directly write to and read from memory
- Classical interleaving semantics
  - Computations of different threads are shuffled
  - Program order is preserved for each thread

| x = y = 0                     |                                                   | Throad 1                                                   | Mem     |
|-------------------------------|---------------------------------------------------|------------------------------------------------------------|---------|
| Thread 1                      | Thread 2                                          | pc = c                                                     | n holds |
| a: x = 1<br>$b: if(y == 0)$ { | p: y = 1<br>$q: if(x == 0)$ {<br>r: crit. sect. 2 | Thread 1 $pc = c$ $tilde{val} exclusion  Mutual exclusion$ | у       |
| c: crit.sect.1<br>d:}         | r: crit.sect.2<br>s:}                             | Muttle q                                                   | 1       |

- Sequential Consistency forbids compiler and hardware optimizations
- Hence is not implemented by any processor
- Processors have various buffers to reduce latency of memory accesses
- Behavior captured by relaxed memory models
- Here: Total Store Ordering (TSO) memory model

- TSO architectures have write buffers
- FIFO buffers that store writes for later execution
- Read takes value from memory if no write to that variable is buffered
- Otherwise read value of last write to that variable in the buffer

| x =                                                   | y = 0                                                       |                   | Mem               |
|-------------------------------------------------------|-------------------------------------------------------------|-------------------|-------------------|
| Thread 1                                              | Thread 2                                                    | Thread 1 $pc = a$ | <i>x</i><br>0     |
| a: x = 1<br>b: if(y == 0){<br>c: crit. sect. 1<br>d:} | p: y = 1<br>$q: if(x == 0)$ {<br>r: crit. sect. 2<br>$s:$ } | Thread 2 $pc = p$ | <br><i>y</i><br>0 |

- TSO architectures have write buffers
- FIFO buffers that store writes for later execution
- Read takes value from memory if no write to that variable is buffered
- Otherwise read value of last write to that variable in the buffer

| x = y = 0                                             |                                                       |                   |        | Mem           |
|-------------------------------------------------------|-------------------------------------------------------|-------------------|--------|---------------|
| Thread 1                                              | Thread 2                                              | Thread 1 $pc = b$ | w(x,1) | <i>x</i><br>0 |
| a: x = 1<br>b: if(y == 0){<br>c: crit. sect. 1<br>d:} | p: y = 1<br>q: if(x == 0){<br>r: crit. sect. 2<br>s:} | Thread 2 $pc = p$ |        | у<br>О        |

- TSO architectures have write buffers
- FIFO buffers that store writes for later execution
- Read takes value from memory if no write to that variable is buffered
- Otherwise read value of last write to that variable in the buffer

| x = y = 0                                             |                                                       |                   |        | Mem           |
|-------------------------------------------------------|-------------------------------------------------------|-------------------|--------|---------------|
| Thread 1                                              | Thread 2                                              | Thread 1 $pc = c$ | w(x,1) | <i>x</i><br>0 |
| a: x = 1<br>b: if(y == 0){<br>c: crit. sect. 1<br>d:} | p: y = 1<br>q: if(x == 0){<br>r: crit. sect. 2<br>s:} | Thread 2 $pc = p$ |        | <i>y</i><br>0 |

- TSO architectures have write buffers
- FIFO buffers that store writes for later execution
- Read takes value from memory if no write to that variable is buffered
- Otherwise read value of last write to that variable in the buffer

| x =                                                   | y = 0                                                |                   |        | Mem           |
|-------------------------------------------------------|------------------------------------------------------|-------------------|--------|---------------|
| Thread 1                                              | Thread 2                                             | Thread 1 $pc = c$ | w(x,1) | <i>x</i><br>0 |
| a: x = 1<br>b: if(y == 0){<br>c: crit. sect. 1<br>d:} | p: y = 1<br>q: if(x == 0){<br>r: crit.sect. 2<br>s:} | Thread 2 $pc = q$ | w(y,1) | у<br>О        |

- TSO architectures have write buffers
- FIFO buffers that store writes for later execution
- Read takes value from memory if no write to that variable is buffered
- Otherwise read value of last write to that variable in the buffer

| x = y = 0                                             |                                                       |                   |         | Mem           |
|-------------------------------------------------------|-------------------------------------------------------|-------------------|---------|---------------|
| Thread 1                                              | Thread 2                                              | Thread 1 $pc = c$ | w(x, 1) | <i>x</i><br>0 |
| a: x = 1<br>b: if(y == 0){<br>c: crit. sect. 1<br>d:} | p: y = 1<br>q: if(x == 0){<br>r: crit. sect. 2<br>s:} | Thread 2 $pc = q$ |         | у<br>1        |

- TSO architectures have write buffers
- FIFO buffers that store writes for later execution
- Read takes value from memory if no write to that variable is buffered
- Otherwise read value of last write to that variable in the buffer

| x =                                                   | y = 0                                               |                   |        | Mem           |
|-------------------------------------------------------|-----------------------------------------------------|-------------------|--------|---------------|
| Thread 1                                              | Thread 2                                            | Thread 1 $pc = c$ | w(x,1) | <i>x</i><br>0 |
| a: x = 1<br>b: if(y == 0){<br>c: crit. sect. 1<br>d:} | p: y = 1<br>q: if(x == 0){<br>r: crit.sect.2<br>s:} | Thread 2 $pc = r$ |        | у<br>1        |

- TSO architectures have write buffers
- FIFO buffers that store writes for later execution
- Read takes value from memory if no write to that variable is buffered
- Otherwise read value of last write to that variable in the buffer

| x =                                                   | y = 0                                                 |                   | Mem        |
|-------------------------------------------------------|-------------------------------------------------------|-------------------|------------|
| Thread 1                                              | Thread 2                                              | Thread 1 $pc = c$ | x<br>1     |
| a: x = 1<br>b: if(y == 0){<br>c: crit. sect. 1<br>d:} | p: y = 1<br>q: if(x == 0){<br>r: crit. sect. 2<br>s:} | Thread 2 $pc = r$ | <br>у<br>1 |

- TSO architectures have write buffers
- FIFO buffers that store writes for later execution
- Read takes value from memory if no write to that variable is buffered
- Otherwise read value of last write to that variable in the buffer

| x = y = 0                           |                                                       | 111                                 | Mem           |
|-------------------------------------|-------------------------------------------------------|-------------------------------------|---------------|
| Thread 1 $a: x = 1$                 | Thread 2  p: y = 1                                    | Thread 1 $pc = c$ $clusion fails!!$ | X             |
| b: if(y == 0){ c: crit. sect. 1 d:} | p: y = 1<br>q: if(x == 0){<br>r: crit. sect. 2<br>s:} | Mutual ex                           | <i>y</i><br>1 |

Relaxed executions may lead to bad behavior

Relaxed executions may lead to bad behavior

If this is the real world, why does anything work?

Relaxed executions may lead to bad behavior

If this is the real world, why does anything work?

**Theorem [Adve, Hill 1993]** If a program is data-race-free, then SC and TSO semantics coincide.

Relaxed executions may lead to bad behavior

If this is the real world, why does anything work?

**Theorem [Adve, Hill 1993]** If a program is data-race-free, then SC and TSO semantics coincide.

So, go and write data-race-free programs!

Relaxed executions may lead to bad behavior

If this is the real world, why does anything work?

**Theorem [Adve, Hill 1993]** If a program is data-race-free, then SC and TSO semantics coincide.

So, go and write data-race-free programs!

Works in 90% of the cases

Relaxed executions may lead to bad behavior

If this is the real world, why does anything work?

**Theorem [Adve, Hill 1993]** If a program is data-race-free, then SC and TSO semantics coincide.

So, go and write data-race-free programs!

Works in 90% of the cases

Performance-critical code has data races

Relaxed executions may lead to bad behavior

If this is the real world, why does anything work?

**Theorem [Adve, Hill 1993]** If a program is data-race-free, then SC and TSO semantics coincide.

So, go and write data-race-free programs!

Works in 90% of the cases

Performance-critical code has data races

Concurrency libraries Operating systems HPC@Fraunhofer ITWM

Relaxed executions may lead to bad behavior

If this is the real world, why does anything work?

**Theorem [Adve, Hill 1993]** If a program is data-race-free, then SC and TSO semantics coincide.

So, go and write data-race-free programs!

Works in 90% of the cases

Performance-critical code has data races

Concurrency libraries Operating systems HPC@Fraunhofer ITWM

This is where our verification techniques apply

#### Outline

- Shared Memory Concurrency
  - Sequential Consistency Semantics
  - Total Store Ordering Semantics
- Reachability
- Robustness
- Synchronization Inference

May 2014

## Reachability

[Atig, Bouajjani, Burckhardt, Musuvathi, POPL'10]

#### State Reachability Problem

Consider a memory model MM

State Reachability Problem for MM

**Input**: Program P and a (control + memory) state s.

**Problem**: Is s reachable when P is run under MM?

#### State Reachability Problem

Consider a memory model MM

State Reachability Problem for MM

**Input**: Program P and a (control + memory) state s.

**Problem**: Is s reachable when P is run under MM?

#### Decidability / Complexity ?

Each thread is finite-state

For the SC memory model, this problem is PSPACE-complete

May 2014

#### State Reachability Problem

Consider a memory model MM

State Reachability Problem for MM

**Input**: Program P and a (control + memory) state s.

**Problem**: Is s reachable when P is run under MM?

#### Decidability / Complexity ?

Each thread is finite-state

- For the SC memory model, this problem is PSPACE-complete
- Non-trivial for relaxed memory models:

$$Paths_{TSO}(P) = Closure_{TSO}(Paths_{SC}(P))$$
 is non-regular

## Reachability

[Atig, Bouajjani, Burckhardt, Musuvathi, POPL'10]

#### Decidability:

Simulation of TSO semantics by Lossy Channel Systems

#### Decidability of State Reachability for TSO

#### Theorem [ABBM 2010]

The state reachability problem for TSO is reducible to the control-state reachability problem for LCS.

#### Decidability of State Reachability for TSO

#### Theorem [ABBM 2010]

The state reachability problem for TSO is reducible to the control-state reachability problem for LCS.

#### Theorem [Abdulla, Jonsson 1993]

The control-state reachability problem for LCS is decidable.

#### Decidability of State Reachability for TSO

#### Theorem [ABBM 2010]

The state reachability problem for TSO is reducible to the control-state reachability problem for LCS.

#### Theorem [Abdulla, Jonsson 1993]

The control-state reachability problem for LCS is decidable.

#### **Corollary**

The state reachability problem for TSO is decidable.

#### From TSO to LCS 1/5

```
Thread 1: x = 1; y = 1; x = 2; y = 2; y = 3;

Thread 2: if (x == 2) { if (y == 0) {...} }
```

#### Write buffers are perfect FIFO channels



May 2014

#### From TSO to LCS 1/5

Thread 1: 
$$x = 1$$
;  $y = 1$ ;  $x = 2$ ;  $y = 2$ ;  $y = 3$ ;  
Thread 2: if  $(x == 2)$  { if  $(y == 0)$  {...} }

#### Write buffers are perfect FIFO channels



Mem

#### From TSO to LCS 1/5

Thread 1: 
$$x = 1$$
;  $y = 1$ ;  $x = 2$ ;  $y = 2$ ;  $y = 3$ ;  
Thread 2: if  $(x == 2)$  { if  $(y == 0)$  {...} }

#### Write buffers are perfect FIFO channels



Mem

Thread 1: 
$$x = 1$$
;  $y = 1$ ;  $x = 2$ ;  $y = 2$ ;  $y = 3$ ;  
Thread 2: if  $(x == 2)$  { if  $(y == 0)$  {...} }

#### Write buffers are perfect FIFO channels



Mem

Thread 1: 
$$x = 1$$
;  $y = 1$ ;  $x = 2$ ;  $y = 2$ ;  $y = 3$ ;  
Thread 2: if  $(x == 2)$  { if  $(y == 0)$  {...} }

#### Write buffers are perfect FIFO channels



Thread 1: 
$$x = 1; y = 1; x = 2; y = 2; y = 3;$$
 Thread 2: if  $(x == 2)$  { if  $(y == 0)$  {...} }

#### Write buffers are perfect FIFO channels



Thread 2 reads x = 2

Thread 1: 
$$x = 1$$
;  $y = 1$ ;  $x = 2$ ;  $y = 2$ ;  $y = 3$ ; Thread 2: if  $(x == 2)$  { if  $(y == 0)$  {...} }

#### Write buffers are perfect FIFO channels



Thread 2 deadlocks as y = 1

```
Thread 1: x = 1; y = 1; x = 2; y = 2; y = 3;

Thread 2: if (x == 2) { if (y == 0) {...} }
```

- Write buffers made for batch processing
- Batch processing is similar to lossiness
- So assume write buffers are lossy FIFO channels



Mem

```
Thread 1: x = 1; y = 1; x = 2; y = 2; y = 3;

Thread 2: if (x == 2) { if (y == 0) {...} }
```

- Write buffers made for batch processing
- Batch processing is similar to lossiness
- So assume write buffers are lossy FIFO channels



Mem

```
Thread 1: x = 1; y = 1; x = 2; y = 2; y = 3;

Thread 2: if (x == 2) { if (y == 0) {...} }
```

- Write buffers made for batch processing
- Batch processing is similar to lossiness
- So assume write buffers are lossy FIFO channels



Mem

```
Thread 1: x = 1; y = 1; x = 2; y = 2; y = 3;

Thread 2: if (x == 2) { if (y == 0) {...} }
```

- Write buffers made for batch processing
- Batch processing is similar to lossiness
- So assume write buffers are lossy FIFO channels



Mem

```
Thread 1: x = 1; y = 1; x = 2; y = 2; y = 3;

Thread 2: if (x == 2) { if (y == 0) {...} }
```

- Write buffers made for batch processing
- Batch processing is similar to lossiness
- So assume write buffers are lossy FIFO channels



Mem

```
Thread 1: x = 1; y = 1; x = 2; y = 2; y = 3;

Thread 2: if (x == 2) { if (y == 0) {...} }
```

- Write buffers made for batch processing
- Batch processing is similar to lossiness
- So assume write buffers are lossy FIFO channels



Mem

```
Thread 1:
               x = 1: y = 1: x = 2: y = 2: y = 3:
Thread 2: if (x == 2) { if (y == 0) {...} }
```

- Write buffers made for batch processing
- Batch processing is similar to lossiness
- So assume write buffers are lossy FIFO channels



This is wrong! Lost the effect of w(y, 1).

Mem

2 х

#### Channel = sequence of memory states + lossiness

$$x = 2$$
  $x = 2$   $x = 2$   $x = 1$   $x = 1$   
 $y = 3$   $y = 2$   $y = 1$   $y = 1$   $y = 0$ 

#### Mem

0

X

0 y

Mem

0

x

$$w(y,3)\ w(y,2)\ w(x,2)\ w(y,1)\ w(x,1)$$

#### Channel = sequence of memory states + lossiness

$$x = 2$$
  $x = 2$   $x = 2$   $x = 1$   $x = 1$   
 $y = 3$   $y = 2$   $y = 1$   $y = 1$   $y = 0$ 

#### Mem

0

X

0 y

Mem

0

X

$$\mathsf{TSO}\ \mathsf{buffer} = \mathsf{perfect}\ \mathsf{FIFO}\ \mathsf{channel}$$

$$w(y,3)\ w(y,2)\ w(x,2)\ w(y,1)\ w(x,1)$$

#### Channel = sequence of memory states + lossiness

$$x = 2$$
  $x = 2$   $x = 2$   $x = 1$   $x = 1$   
 $y = 3$   $y = 2$   $y = 1$   $y = 0$ 

#### Mem

0

x

0 y

Mem

0

X

$$\mathsf{TSO}\ \mathsf{buffer} = \mathsf{perfect}\ \mathsf{FIFO}\ \mathsf{channel}$$

$$w(y,3)\ w(y,2)\ w(x,2)\ w(y,1)\ w(x,1)$$

#### Channel = sequence of memory states + lossiness

$$x = 2$$
  $x = 2$   $x = 2$   
 $y = 3$   $y = 2$   $y = 1$ 

#### Mem

0

X

0

У

#### Mem

1

X

$$\mathsf{TSO}\ \mathsf{buffer} = \mathsf{perfect}\ \mathsf{FIFO}\ \mathsf{channel}$$

$$w(y,3)\ w(y,2)\ w(x,2)\ w(y,1)$$

#### Channel = sequence of memory states + lossiness

$$x = 2$$
  $x = 2$   $x = 2$   
 $y = 3$   $y = 2$   $y = 1$ 

#### Mem

1

x

0

У

#### Mem

1

X

$$\mathsf{TSO}\ \mathsf{buffer} = \mathsf{perfect}\ \mathsf{FIFO}\ \mathsf{channel}$$

$$w(y,3)\ w(y,2)\ w(x,2)\ w(y,1)$$

#### Channel = sequence of memory states + lossiness

$$x = 2$$
  $x = 2$   
 $y = 3$   $y = 2$ 



#### Mem

1

X

0 y

#### Mem

2

X

L

$$\mathsf{TSO}\ \mathsf{buffer} = \mathsf{perfect}\ \mathsf{FIFO}\ \mathsf{channel}$$

$$w(y,3)\ \textcolor{red}{w(y,2)}\ w(x,2)$$

#### Channel = sequence of memory states + lossiness



#### Mem

1

X

1 y

#### Mem

2

X

1

Lossiness = unobservable memory states

$$\mathsf{TSO}\ \mathsf{buffer} = \mathsf{perfect}\ \mathsf{FIFO}\ \mathsf{channel}$$

 $w(y,3)\ \textcolor{red}{w(y,2)}$ 

#### Channel = sequence of memory states + lossiness



#### Mem

2

x

1

y

#### Mem

2

X



- Write: Compute a new memory state; send it to the channel
- Read: Check the channel/memory
- Memory update: Receive a state; copy it to the memory

Problem: Interference between threads?



- Write: Compute a new memory state; send it to the channel
- Read: Check the channel/memory
- Memory update: Receive a state; copy it to the memory

Problem: Interference between threads?

Each thread guesses writes of other threads

| Thread | $\rightarrow$ |  | $\rightarrow$ | Memory |
|--------|---------------|--|---------------|--------|
|        |               |  |               |        |

- Write: Compute a new memory state; send it to the channel
- Read: Check the channel/memory
- Memory update: Receive a state; copy it to the memory
- Guessed Write: Send the guessed state to the channel

Problem: Interference between threads?

Each thread guesses writes of other threads

| Thread | $\rightarrow$ |  | $\rightarrow$     | Memory |
|--------|---------------|--|-------------------|--------|
| Thread | $\rightarrow$ |  | $\longrightarrow$ | Memor  |

- Write: Compute a new memory state; send it to the channel
- Read: Check the channel/memory
- Memory update: Receive a state; copy it to the memory
- Guessed Write: Send the guessed state to the channel

Check that all threads agree on their guesses

Problem: Interference between threads?

Each thread guesses writes of other threads

Thread  $\rightarrow$   $\rightarrow$  Memory

- Write: Compute a new memory state; send it to the channel
- Read: Check the channel/memory
- Memory update: Receive a state; copy it to the memory
- Guessed Write: Send the guessed state to the channel

Check that all threads agree on their guesses

Synchronization of the LCS over send actions

#### Theorem [ABBM 2010]

The state reachability problem for TSO is reducible to the control-state reachability problem for LCS.

## State Reachability: Conclusion

- Decidable for TSO (and beyond)
- But it is a hard problem non-primitive recursive
- However, it is possible to have efficient analysis techniques
- Abstraction-based techniques:

```
e.g., [Kuperstein, Vechev, Yahav, PLDI'11]
```

Symbolic techniques:

```
[Abdulla, Atig, Chen, Leonardson, Rezine, TACAS'12]
[Linden, Wolper, SPIN'10'11]
```

# Robustness

```
[Bouajjani, M., Möhlmann, ICALP'11]
```

[Bouajjani, Derevenetc, M., ESOP'13]

Idea of robustness:

TSO behavior that deviates from SC is a programming error

Idea of robustness:

TSO behavior that deviates from SC is a programming error

What is the notion of behavior?

Idea of robustness:

TSO behavior that deviates from SC is a programming error

What is the notion of behavior?

#### **Trace Robustness:**

TSO- and SC-traces are the same [Shasha, Snir'88]

Idea of robustness:

TSO behavior that deviates from SC is a programming error

What is the notion of behavior?

#### **Trace Robustness:**

TSO- and SC-traces are the same [Shasha, Snir'88]

Good: Allows for quite relaxed behaviors

Idea of robustness:

TSO behavior that deviates from SC is a programming error

What is the notion of behavior?

#### **Trace Robustness:**

TSO- and SC-traces are the same [Shasha, Snir'88]

Good: Allows for quite relaxed behaviors

Very Good: Only PSPACE-complete

#### Computation = sequence of actions as seen by memory

| x = y = 0        |                                                   |  |  |
|------------------|---------------------------------------------------|--|--|
| Thread 1         | Thread 2                                          |  |  |
| a: x = 1         | p: y = 1<br>$q: if(x == 0)$ {<br>r: crit. sect. 2 |  |  |
| $b: if(y == 0){$ | $q: if(x == 0)\{$                                 |  |  |
| c: crit.sect.1   | r: crit.sect.2                                    |  |  |
| d: }             | s: }                                              |  |  |
|                  |                                                   |  |  |

|          | Mem   |
|----------|-------|
| Thread 1 | <br>X |
| pc = a   | <br>0 |
|          |       |
| Thread 2 | <br>У |
| pc = p   | 0     |
|          |       |

#### Computation = sequence of actions as seen by memory

| x = y = 0        |                                                   |  |  |
|------------------|---------------------------------------------------|--|--|
| Thread 1         | Thread 2                                          |  |  |
| a: x = 1         | p: y = 1<br>$q: if(x == 0)$ {<br>r: crit. sect. 2 |  |  |
| $b: if(y == 0){$ | $q: if(x == 0)\{$                                 |  |  |
| c: crit.sect.1   | r: crit.sect.2                                    |  |  |
| d: }             | s: }                                              |  |  |
|                  |                                                   |  |  |

|                   |        | Mem           |
|-------------------|--------|---------------|
| Thread 1 $pc = b$ | w(x,1) | <i>x</i><br>0 |
| Thread 2 $pc = p$ |        | у<br>0        |

#### Computation = sequence of actions as seen by memory

| x = y = 0         |                                                   |  |  |
|-------------------|---------------------------------------------------|--|--|
| Thread 1          | Thread 2                                          |  |  |
| a: x = 1          | p: y = 1<br>$q: if(x == 0)$ {<br>r: crit. sect. 2 |  |  |
| $b: if(y == 0)\{$ | $q: if(x == 0)\{$                                 |  |  |
| c: crit. sect. 1  | r: crit.sect.2                                    |  |  |
| d: }              | s: }                                              |  |  |
|                   |                                                   |  |  |

Thread 1 
$$pc = c$$
  $w(x,1)$   $x$   $0$ 

Thread 2  $y$   $0$ 
 $pc = p$ 

Mem

#### Computation = sequence of actions as seen by memory

| x = y = 0         |                                                   |  |  |
|-------------------|---------------------------------------------------|--|--|
| Thread 1          | Thread 2                                          |  |  |
| a: x = 1          | p: y = 1<br>$q: if(x == 0)$ {<br>r: crit. sect. 2 |  |  |
| $b: if(y == 0)\{$ | $q: if(x == 0)\{$                                 |  |  |
| c: crit. sect. 1  | r: crit.sect.2                                    |  |  |
| d: }              | s:}                                               |  |  |
|                   |                                                   |  |  |

|          |        | Mem           |
|----------|--------|---------------|
| Thread 1 | w(x,1) | <i>x</i><br>0 |
| pc = c   |        | O             |
| Thread 2 | w(y,1) | y             |
| pc = q   | (y, 1) | Ü             |

| x = y = 0         |                                                   |  |  |
|-------------------|---------------------------------------------------|--|--|
| Thread 1          | Thread 2                                          |  |  |
| a: x = 1          | p: y = 1<br>$q: if(x == 0)$ {<br>r: crit. sect. 2 |  |  |
| $b: if(y == 0)\{$ | $q: if(x == 0)\{$                                 |  |  |
| c: crit. sect. 1  | r: crit.sect.2                                    |  |  |
| d: }              | s: }                                              |  |  |
|                   |                                                   |  |  |

Thread 1
$$pc = c$$

$$w(x,1)$$

$$x$$

$$0$$
Thread 2
$$pc = q$$

$$1$$

$$r(y,0) \cdot w(y,1)$$

| x = y = 0                                             |                                                     |                   |         | Mem           |  |
|-------------------------------------------------------|-----------------------------------------------------|-------------------|---------|---------------|--|
| Thread 1                                              | Thread 2                                            | Thread 1 $pc = c$ | w(x, 1) | <i>x</i><br>0 |  |
| a: x = 1<br>b: if(y == 0){<br>c: crit. sect. 1<br>d:} | p: y = 1<br>q: if(x == 0){<br>r: crit.sect.2<br>s:} | Thread 2 $pc = r$ |         | у<br>1        |  |

$$r(y,0) \cdot w(y,1) \cdot r(x,0)$$

$$r(y,0) \cdot w(y,1) \cdot r(x,0) \cdot w(x,1)$$

$$r(y,0)\cdot w(y,1)\cdot r(x,0)\cdot w(x,1)$$

Traces abstract computations to happens before dependencies

$$\mathsf{Trace}(r(y,0)\cdot w(y,1)\cdot r(x,0)\cdot w(x,1))$$

Traces abstract computations to happens before dependencies

• Program order: Order of actions issued by a thread

Trace
$$(r(y,0) \cdot w(y,1) \cdot r(x,0) \cdot w(x,1))$$

$$w(x,1) \downarrow \qquad \qquad \downarrow w(y,1)$$

$$r(y,0) \downarrow \qquad \qquad \downarrow r(x,0)$$

#### Traces abstract computations to happens before dependencies

- Program order: Order of actions issued by a thread
- Store order: Order of writes to a variable

Trace
$$(r(y,0) \cdot w(y,1) \cdot r(x,0) \cdot w(x,1))$$

$$w(x,1) 
\downarrow r(y,0) 
\downarrow r(x,0)$$

#### Traces abstract computations to happens before dependencies

- Program order: Order of actions issued by a thread
- Store order: Order of writes to a variable
- Source relation: write is source of read.

Trace
$$(r(y,0) \cdot w(y,1) \cdot r(x,0) \cdot w(x,1))$$

$$\begin{array}{c} w(x,1) \\ r(y,0) \end{array} \qquad \begin{array}{c} w(y,1) \\ r(x,0) \end{array}$$

#### Traces abstract computations to happens before dependencies

- Program order: Order of actions issued by a thread
- Store order: Order of writes to a variable
- Source relation: write is source of read.
- Conflict relation: read is overwritten by write.

Trace
$$(r(y,0) \cdot w(y,1) \cdot r(x,0) \cdot w(x,1))$$

$$w(x,1) \qquad w(y,1)$$

$$r(y,0) \qquad r(x,0)$$

Consider a memory model MM

Trace Robustness Problem against MM

**Input**: Program P.

**Problem**: Does  $Traces_{MM}(P) \subseteq Traces_{SC}(P)$  hold?

Consider a memory model MM

Trace Robustness Problem against MM

**Input**: Program P.

**Problem**: Does  $Traces_{MM}(P) \subseteq Traces_{SC}(P)$  hold?

Decidability / Complexity ?

Consider a memory model MM

Trace Robustness Problem against MM

**Input**: Program P.

**Problem**: Does  $Traces_{MM}(P) \subseteq Traces_{SC}(P)$  hold?

Decidability / Complexity ?

Proof method

Theorem [Shasha, Snir 1988]

Program P is robust against MM iff all traces in Traces<sub>MM</sub>(P) are acyclic.

Consider a memory model MM

Trace Robustness Problem against MM

**Input**: Program P.

**Problem**: Does  $Traces_{MM}(P) \subseteq Traces_{SC}(P)$  hold?

Decidability / Complexity ?

Proof method

Theorem [Shasha, Snir 1988]

Program P is robust against MM iff all traces in Traces<sub>MM</sub>(P) are acyclic.

Shasha and Snir do not give an algorithm to find cyclic traces!

# Robustness

```
[Bouajjani, M., Möhlmann, ICALP'11]
[Bouajjani, Derevenetc, M., ESOP'13]
```

#### Upper Bound:

Combinatorics

From Robustness to SC Reachability

### **Deciding Robustness**



### **Deciding Robustness**



Understand shape of minimal violations

### **Deciding Robustness**



Understand shape of minimal violations

Check whether computation of this shape exists

# Robustness

```
[Bouajjani, M., Möhlmann, ICALP'11]
[Bouajjani, Derevenetc, M., ESOP'13]
```

#### Upper Bound:

Combinatorics — Locality and Attacks

From Robustness to SC Reachability

#### Goal: Locality

We can restrict ourselves to violations where only one thread reorders its actions.

Proof tool: Minimal violations

Number of inversions (out-of-program-order placements) minimal among all violating computations

Consider minimal violation  $\alpha \cdot \mathbf{b} \cdot \beta \cdot \mathbf{a} \cdot \gamma$  where  $\mathbf{b}$  has overtaken  $\mathbf{a}$ 

Consider minimal violation  $\alpha \cdot b \cdot \beta \cdot a \cdot \gamma$  where b has overtaken a. Then b and a have happens before path through  $\beta$ 

Consider minimal violation  $\alpha \cdot \mathbf{b} \cdot \beta \cdot \mathbf{a} \cdot \gamma$  where  $\mathbf{b}$  has overtaken  $\mathbf{a}$ 

Then b and a have happens before path through  $\beta$ 

Subword  $b_1 \dots b_k$  with

$$b_i \rightarrow_{src/st/cf} b_{i+1}$$
 or  $b_i \rightarrow_{p}^{+} b_{i+1}$ 

$$b_i \rightarrow_p^+ b_{i+}$$

Consider minimal violation  $\alpha \cdot \mathbf{b} \cdot \beta \cdot \mathbf{a} \cdot \gamma$  where  $\mathbf{b}$  has overtaken  $\mathbf{a}$ 

Then b and a have happens before path through  $\beta$ 

Subword  $b_1 \dots b_k$  with

$$b_i \rightarrow_{src/st/cf} b_{i+1}$$
 or  $b_i \rightarrow_p^+ b_{i+1}$ 

$$r(y,0)$$
 $w(y,1)$ 
 $r(x,0)$ 

$$\underbrace{r(y,0)\cdot w(y,1)\cdot r(x,0)\cdot w(x,1)}_{\rightarrow_{hb}}$$

### Theorem (Locality) [BMM 2011]

In a minimal violation, only a single thread uses its buffer.

#### Theorem (Locality) [BMM 2011]

In a minimal violation, only a single thread uses its buffer.

#### Proof sketch

Pick last writes that are overtaken in two threads  $t_i$  and  $t_j$ :

### Theorem (Locality) [BMM 2011]

In a minimal violation, only a single thread uses its buffer.

#### Proof sketch

Pick last writes that are overtaken in two threads  $t_i$  and  $t_j$ :

Case 1: No interference



### Theorem (Locality) [BMM 2011]

In a minimal violation, only a single thread uses its buffer.

#### Proof sketch

Pick last writes that are overtaken in two threads  $t_i$  and  $t_j$ :

Case 1: No interference

$$r_j \stackrel{\checkmark}{\longleftarrow} w_j - r_i \stackrel{\checkmark}{\longleftarrow} w_i - r_i \stackrel{}{\longleftarrow} w_i - r_i - r_i \stackrel{}{\longleftarrow} w_i - r_i - r_i$$

Lemma: happens before cycle  $r_j \rightarrow_{hb}^+ w_j \rightarrow_p^+ r_j$ 

### Theorem (Locality) [BMM 2011]

In a minimal violation, only a single thread uses its buffer.

#### Proof sketch

Pick last writes that are overtaken in two threads  $t_i$  and  $t_j$ :

Case 1: No interference

$$r_j \sim w_j \sim v_i \sim w_i$$

Lemma: happens before cycle  $r_j \rightarrow_{hb}^+ w_j \rightarrow_p^+ r_j$ Read  $r_i$  not involved, delete everything from  $r_i$  on

### Theorem (Locality) [BMM 2011]

In a minimal violation, only a single thread uses its buffer.

#### Proof sketch

Pick last writes that are overtaken in two threads  $t_i$  and  $t_j$ :

Case 1: No interference

Lemma: happens before cycle  $r_j \rightarrow_{hb}^+ w_j \rightarrow_p^+ r_j$ Read  $r_i$  not involved, delete everything from  $r_i$  on Saves a reordering, contradiction to minimality

#### Theorem (Locality) [BMM 2011]

In a minimal violation, only a single thread uses its buffer.

#### Proof sketch

Pick last writes that are overtaken in two threads  $t_i$  and  $t_j$ :

Case 2: Overlap



### Theorem (Locality) [BMM 2011]

In a minimal violation, only a single thread uses its buffer.

#### Proof sketch

Pick last writes that are overtaken in two threads  $t_i$  and  $t_j$ : Case 2: Overlap



Argumentation similar, delete again ri

### Theorem (Locality) [BMM 2011]

In a minimal violation, only a single thread uses its buffer.

#### Proof sketch

Pick last writes that are overtaken in two threads  $t_i$  and  $t_j$ :

Case 3: Interference



#### Theorem (Locality) [BMM 2011]

In a minimal violation, only a single thread uses its buffer.

#### Proof sketch

Pick last writes that are overtaken in two threads  $t_i$  and  $t_j$ :

Case 3: Interference



Lemma: happens before cycle  $r_j \rightarrow_{hb}^+ w_j \rightarrow_p^+ r_j$ 

#### Theorem (Locality) [BMM 2011]

In a minimal violation, only a single thread uses its buffer.

#### Proof sketch

Pick last writes that are overtaken in two threads  $t_i$  and  $t_j$ :

Case 3: Interference



Lemma: happens before cycle  $r_j \rightarrow^+_{hb} w_j \rightarrow^+_p r_j$ Only thread  $t_i$  may contribute, delete rest

#### Theorem (Locality) [BMM 2011]

In a minimal violation, only a single thread uses its buffer.

#### Proof sketch

Pick last writes that are overtaken in two threads  $t_i$  and  $t_j$ : Case 3: Interference



Lemma: happens before cycle  $r_j \rightarrow_{hb}^+ w_j \rightarrow_p^+ r_j$ Only thread  $t_i$  may contribute, delete rest Lemma: happens before cycle  $r_i \rightarrow_{hb}^+ w_i \rightarrow_p^+ r_i$ 

#### Theorem (Locality) [BMM 2011]

In a minimal violation, only a single thread uses its buffer.

#### Proof sketch

Pick last writes that are overtaken in two threads  $t_i$  and  $t_j$ : Case 3: Interference



Lemma: happens before cycle  $r_j \rightarrow_{hb}^+ w_j \rightarrow_p^+ r_j$ Only thread  $t_i$  may contribute, delete rest Lemma: happens before cycle  $r_i \rightarrow_{hb}^+ w_i \rightarrow_p^+ r_i$ Read  $r_i$  not on this cycle, delete it, contradiction

Reformulate Robustness

absence of feasible attacks

Reformulate Robustness

absence of feasible attacks

If P is not robust, there are these violation:



Reformulate Robustness

absence of feasible attacks

If P is not robust, there are these violation:



Attacker The thread that uses its buffer: only one by locality

May 2014

Reformulate Robustness

absence of feasible attacks

If P is not robust, there are these violation:



Attacker The thread that uses its buffer: only one by locality

Helpers Remaining threads close cycle:  $\mathbf{r} \rightarrow^+_{hb} \mathbf{w} \mathbf{w} \rightarrow^+_{p} \mathbf{r}$ 

Reformulate Robustness

absence of feasible attacks

If P is not robust, there are these violation:



Attacker The thread that uses its buffer: only one by locality

Helpers Remaining threads close cycle:  $\mathbf{r} \rightarrow^+_{hb} \mathbf{w} \mathbf{w} \rightarrow^+_{p} \mathbf{r}$ 

$$\underbrace{r(y,0)\cdot w(y,1)\cdot r(x,0)\cdot w(x,1)}_{\rightarrow_{hb}}$$

- Fix thread, write instruction, read instruction
- Given these parameters, find a violation as above

- Fix thread, write instruction, read instruction
- Given these parameters, find a violation as above

#### Attack

- An attack is a triple A = (thread, write, read)
- A TSO witness for attack A is a computation as above:



- Fix thread, write instruction, read instruction
- Given these parameters, find a violation as above

#### Attack

- An attack is a triple A = (thread, write, read)
- A TSO witness for attack A is a computation as above:



## Theorem [BDM'13]

Program P is robust if and only if no attack has a TSO witness.

- Fix thread, write instruction, read instruction
- Given these parameters, find a violation as above

#### Attack

- An attack is a triple A = (thread, write, read)
- A TSO witness for attack A is a computation as above:



## Theorem [BDM'13]

Program P is robust if and only if no attack has a TSO witness.

The number of attacks is quadratic in the size of P.

# Robustness

```
[Bouajjani, M., Möhlmann, ICALP'11]
[Bouajjani, Derevenetc, M., ESOP'13]
```

#### Upper Bound:

Combinatorics

From Robustness to SC Reachability

TSO witnesses for attack A considerably restrict TSO behavior,

TSO witnesses for attack A considerably restrict TSO behavior, enough to find TSO witnesses with SC reachability

TSO witnesses for attack A considerably restrict TSO behavior, enough to find TSO witnesses with SC reachability



Let attacker execute under SC

TSO witnesses for attack A considerably restrict TSO behavior, enough to find TSO witnesses with SC reachability



Let attacker execute under SC

$$\frac{}{\alpha}$$
  $\mathbf{w} \cdot \mathbf{r} \frac{}{\rho \sqcup \omega} \mathbf{r} \frac{}{\beta}$ 

TSO witnesses for attack A considerably restrict TSO behavior, enough to find TSO witnesses with SC reachability



Let attacker execute under SC

Problem Writes may conflict with helper reads



TSO witnesses for attack A considerably restrict TSO behavior, enough to find TSO witnesses with SC reachability



Let attacker execute under SC

Problem Writes may conflict with helper reads

Solution Hide them from other threads

$$\frac{}{\alpha}$$
  $w_{loc} \cdot r \frac{}{\rho \sqcup \omega_{loc}} r \frac{}{\beta}$ 

May 2014

TSO witnesses for attack A considerably restrict TSO behavior, enough to find TSO witnesses with SC reachability



Let attacker execute under SC

Problem Writes may conflict with helper reads

Solution Hide them from other threads

$$\frac{}{\alpha}$$
  $w_{loc} \cdot r \frac{}{\rho \sqcup \omega_{loc}} r \frac{}{\beta}$ 

#### Theorem [BDM'13]

Attack A has a TSO witness iff  $P_A$  reaches goal state under SC.

May 2014

#### Trace Robustness: Conclusion

- Decidable for TSO (and beyond)
- Is an easy problem PSPACE-complete
- Locality: only one thread uses the buffer
- Analysis parallelizable
- Monitoring techniques:

```
e.g., [Burckhardt, Musuvathi CAV'08, Sen et al. TACAS'11]
```

Static analysis:

```
[Shasha Snir TOPLAS'88, Alglave, Maranget CAV'11]
```

Semantics:

[Owens ECOOP'10]

# Synchronization Inference

[Bouajjani, Derevenetc, M., ESOP'13]

Synchronization instructions enforce visibility of commands

Synchronization instructions enforce visibility of commands Called fences (TSO), syncs, or barriers depending on architecture

Synchronization instructions enforce visibility of commands Called fences (TSO), syncs, or barriers depending on architecture

Consider a weak memory model MM

Synchronization Inference Problem for MM

**Input**: Program P and cost function  $C : \mathsf{LAB} \to \mathbb{R}_{>0}$ .

**Problem**: Find an optimal set of synchronization instructions F so that P + F is robust.

Synchronization instructions enforce visibility of commands Called fences (TSO), syncs, or barriers depending on architecture

Consider a weak memory model MM

Synchronization Inference Problem for MM

**Input**: Program P and cost function  $C : LAB \to \mathbb{R}_{>0}$ .

**Problem**: Find an optimal set of synchronization instructions F so that P + F is robust.

Focus on TSO (fences)

#### Efficiency

Fencing every write yields a robust program:

Ruins all performance benefits brought by the memory model

#### Efficiency

Fencing every write yields a robust program:

Ruins all performance benefits brought by the memory model

Solve the problem wrt. a cost function

Minimize program size or maximize program performance

## Efficiency

Fencing every write yields a robust program:

Ruins all performance benefits brought by the memory model

Solve the problem wrt. a cost function

Minimize program size or maximize program performance

Decidability / Complexity / Computation

PSPACE-complete by enumeration

## Efficiency

Fencing every write yields a robust program:

Ruins all performance benefits brought by the memory model

Solve the problem wrt. a cost function

Minimize program size or maximize program performance

## Decidability / Complexity / Computation

PSPACE-complete by enumeration

Compute an optimal fence set

**Phase 1:** Compute candidate fence sets.

Phase 2: Select fence sets via integer linear programming (ILP).

**Insight 1:** Optimal fence sets are irreducible.

**Insight 1:** Optimal fence sets are irreducible.

**Insight 2:** Every irreducible fence set is a union of irreducible fence sets for all feasible attacks.

**Insight 1:** Optimal fence sets are irreducible.

**Insight 2:** Every irreducible fence set is a union of irreducible fence sets for all feasible attacks.

**Consequence:** For each attack, compute the irreducible fence sets that eliminate it.

**Insight 1:** Optimal fence sets are irreducible.

**Insight 2:** Every irreducible fence set is a union of irreducible fence sets for all feasible attacks.

**Consequence:** For each attack, compute the irreducible fence sets that eliminate it.

This computation relies on robustness.

Assume attack A has candidate fence sets  $\mathcal{F}_1, \ldots \mathcal{F}_n$ . Select one:

$$\sum_{1 \leq i \leq n} x_{\mathcal{F}_i} \geq 1.$$

Assume attack A has candidate fence sets  $\mathcal{F}_1, \dots \mathcal{F}_n$ . Select one:

$$\sum_{1 \leq i \leq n} x_{\mathcal{F}_i} \geq 1.$$

Make sure every location in  $\mathcal{F}$  is fenced:

$$\sum_{I\in\mathcal{F}}x_I\geq |\mathcal{F}|x_{\mathcal{F}}.$$

Assume attack A has candidate fence sets  $\mathcal{F}_1, \dots \mathcal{F}_n$ . Select one:

$$\sum_{1 \leq i \leq n} x_{\mathcal{F}_i} \geq 1.$$

Make sure every location in  $\mathcal{F}$  is fenced:

$$\sum_{I\in\mathcal{F}}x_I\geq |\mathcal{F}|x_{\mathcal{F}}.$$

Optimize the selected locations:

$$\sum_{I\in\mathsf{LAB}}\mathcal{C}(x_I) o\mathsf{min}\,.$$

Assume attack A has candidate fence sets  $\mathcal{F}_1, \dots \mathcal{F}_n$ . Select one:

$$\sum_{1 \leq i \leq n} x_{\mathcal{F}_i} \geq 1.$$

Make sure every location in  $\mathcal{F}$  is fenced:

$$\sum_{I\in\mathcal{F}}x_I\geq |\mathcal{F}|x_{\mathcal{F}}.$$

Optimize the selected locations:

$$\sum_{I\in\mathsf{LAB}}\mathcal{C}(\mathsf{x}_I) o\mathsf{min}\,.$$

Let  $x^*$  be an optimal solution to the ILP problem.

#### **Theorem**

Fence set  $\mathcal{F}(x^*)$  solves the synchronization inference problem.