| TSO Reachability 1 | 2  |
|--------------------|----|
| TSO Reachability 2 | 12 |

8. TSO Readability Goal: Solve readesility for posulled programs (assembly) running on x86 Total Store Ordering (TSO) architectures Exproved: Recluce to LCS readability 8.1 Syntax of Puellel Programs We conside posdlet programs with should memory Definition: as defined by Sollowing gramma: (prog) := program (pid) (Mrd) " Wame & Smith number of threads (three) :: Mread (tid) 111dent 10 reys (rey) 1/ Lat of local registers init clasely 11 mitial instruction begin Llins 1> and 1/Lit of. estelled instructions clinity :== (label); Linst); goto clasely; 11 Load Linsty := <reg> = mem [<rey>] 11 Store 1 mom [(rey)] = (rey) 11 Henory Jena 1 monce 11 Local assignment 1 Lrey > = Lexpr> 1 asset Lexpr> lexpr > := fun ( krey > ") · Ul assume programs come with a finite data domain DOM and a Junction domain FUN that we defined on DOM. · all compresse. · We assume . O E DOM

TSO Reachability 1. All values in Doll can be used as addresses.

8.2 150 - Semanties of Paullel Programs: Idea of TIO: . Each Bread has a store befor to reduce lakency of memory accesses · Stores are suffered locally and law propugated to the memory in butch processing style. · To provide the Musion that a sequental program runs on a cohecul memory, beford stores are forwarded to later boards => early reads Example: Dehher's mukx li: monty ] = 1; goto le ly: mem [x] = 1; goto le le: r' men [x]; goto la list = mem [y]; goto ls l3: anut (1=0); goto less l3: assent (r=0); golo less lesz: critical section less: critical section  $|X=1|_{Y=0} |_{Y=1} |_{X=0} |_{X=0} |_{X=1} |_{X=0} |_{X=0} |_{X=1} |_{X=0} |_{X=1} |_{X=0} |_{X=1} |_{X=0} |_{X=1} |_{X=0} |_{X=1} |_{X=0} |_{X=0} |_{X=1} |_{X=0} |_{X=1} |_{X=0} |_{X=0} |_{X=1} |_{X=0} |_{X=0}$ Fonces instructions (mores) stop a thread until all preceding writes have wrived in memory. For the definition, fix a program Pwik Bread Ita..., tas. Assume to has identifice in inital lasel lo,i, and declares registers Ti. De use TIO := 5% ..., no for the thread relentifics. Ju the locations in all threads, and LAB VITIR:= DOM U OF For for the addresses and registers.

The ITO simanhes is operational, in toms of hansitions between configurations. The sel of 130-configurations is CF:2 LAB TID & DOM WAR (DOM X DOM) . We wish a configuration as of = (pc, val, buf), where pe(1) is the program could of threal +. · val(r) is the valuation of register, and · buf(1) is the buffer content for thread to a requerer of address-volve pairs. The initial configuration is cfo:= (peo, volo, bufo) with pro(t)= lost for all teTID valo(x)=0 for all x & VAR bufo(t)= E for all t & TID. The ISO-honsilin relation ->150 = CFxCF is the smollest relation that substres the Jollowing rules, whore we assume pc(t)=l, an instruction li Linst); golo l' and whoe we set pe': = pelt = lJ: (inst) = r = mon [r'], a = val(r') by(1)1(a=\*) = (a=v).B (pc, vd, buf) -> TIO (pc', val[r:= v], buf) lins/>= re mem [r'], a = vol(r'), v = vol(a) buf(1) L(a=+) = E TSO Reachability pc, vol, buf) -> TSO (pc', vol [:= v], buf)

(STORE)  $\frac{2\pi s/2 = mon E \cdot J \leftarrow r', \ \alpha = val(r), \ v = val(r')}{(pc, val, buf) \longrightarrow_{TSO} (pc', val, buf[d:=(a=v).buf[d])}$   $(DPONTE) \qquad buf(d) = \beta. (a=v)$   $(pc, val, buf) \longrightarrow_{TSO} (pc, val[a:=v], buf[d:=\beta])$   $(FENCE) \qquad (pc, val, buf) \longrightarrow_{TSO} (pc', val, buf)$  + 2 rules for assignment and asserts.

8.3 From 150 to LCS Goal: Represent T30 semanties of program P by LCS Lp. Proproad: Defice LCS Lp, Lp Lp, Lp will more features. that fix problems in ewliv vosions. Toward Lp: Idea: · Shoved memory communication is like message loss: es a slace may get over ithen before it is received by cnother thread. · Underland TSO buffers as lossy clamels Problem: . Higman's ordoing is no smulaha relation for 150. 4 Consider lo: ld(x, 1); || ló: sl(y, 1); ln: ld(y, 0); || li: st(x, 1); l2 where ld(x,1) is a shahul for ramen [x]; assert(r=1)4) Then configuration (lo, l', x=0=y, E, x=1.y=1) Cannol read le, la VI whereas (la, l2', x=0=y, E, x=1) can. Lossiness gives inconsider memory configurations. Fix: Let threads send entire memory snapshots La Each message in a buffer defines values of all variables.

$$(l_0, l_2', x=0=y, \mathcal{E}, (x=1)(x=0))$$
  
and  
 $(l_0, l_2', x=0=y, \mathcal{E}, (x=1))$  we incompassle.

Toward Lp:

Proslem: · Some behaviour unde Lp is not possible undo 150.

Example:

lo: st(y,0); lo': st(x,1)
la: la': ld(x,0)

Lo Then (la, la') is not To-read Ste. TSO can only load I from X after store has been performed.

Li The casignatia is readelle in Lp.

(lo, lo', x=0=y, E, E)

 $-5(l_1, l_1', x=0=y, (x=0), (x=1))$ 

-> (ln, l', x=1, (x=0), E) 4=0 (y=0), E)

-> (la, l', x=0=y, E, E)

-> (ln, l2', x=0=y, E, E).

Thread do not synchronize on memory woodals (thread may use value that art no longer in memory).

Fix: Let all Kread shore the same befor.

 $\begin{pmatrix} x=1\\ y=0 \end{pmatrix}$ .  $\begin{pmatrix} x=1\\ y=0 \end{pmatrix}$  Store y=0 teches x=1 into account.

TSO Reachability 1

Toward Lp:

Problem: Some To belavious is not possible in Lp?.

Example:

St(x,1);

St(x,2);

St(x,2);

Coloring reason:

St(y,2) has be sent to memory before st(y,2);

to This belowing reason:

St(y,2) has be sent to memory before st(y,2);

to Thoware value of x will be (and stay) 2.

Then ld(x,1) is blocked.

Then ld(x,1) is blocked.

· In Lp, openins in he buffer will be delivered to the memory in the order in which key entered the buffer.

4) Then throught 2

13 not oble to load y=2 before y=1.

Le forces memory updates to occur
in the same order as the order of the corresponding stores.

issue issue mon str sta str

In 180, He memory updates can be presounced in opposite order, as the stores stem from different buffers.

Fix: Hold to each Bread a point to to a position minde the before to position minde the before to post memory but = Inture memory management states for to memory to that of the best of t

W.K Kis, updates ar smulated by moving he pointer to the left.  $\begin{pmatrix} x=2\\ y=2 \end{pmatrix} \begin{pmatrix} x=2\\ y=2 \end{pmatrix} \begin{pmatrix} x=2\\ y=0 \end{pmatrix} \begin{pmatrix} x=1\\ y=0 \end{pmatrix} \begin{pmatrix} x=0\\ y=0 \end{pmatrix}$ ty to the Why & Kis correct? · TSO can be undestood as The automata synchronize when updates leave the clamels. T2 \_\_\_\_\_/M2 Li I'm update is puformed on the and Mr. · Since the channels are FIFO, Ris & equivalet will Hoe, the channel have the same ontal (one channel) Tz \_\_\_\_\_\_[1/2] bed move to the memory at different speed. · Sice the chand contact & idulated, this is equished with This is Lp. Toward Lp,

Proslem: In Lp, early read ove still missing Remembe the last with of a thread to an address.

Construction: Given a program P. we define the LCS with strong symbols Lp = (Q, go, C, M, S, ->) strong symbols = connel be lost, Q := LOCTID & DOM VITR but bounded. C = 2 1 buff M:= DOM DOM 5:2 (DOM DOM x (TID x DOM J {E}) x P(TID)) (DOM x 3ES xt&f) In a configuration (pc, val) pc = current program countres vad = valuation of local registus, copy of last memory valuation that was sent to the before To define the hansitions, consider (pe, ral) = Q with pe(1) = l. If l: mem [r] = r'; goto l' and val(,) = a, val(r') = v, then we have the Jollowing LCS transition: (pc, val) buf { (mval, E, tidset) / (mval, (s,a), tidset) } buf ? (val[a:=v], (a+), 0), (pc[1:=1'], val[a:=v]).

TSO Reachability 1

restricted to DOM

1 l: r = man [r']; goto l' and val(r') = a, there are two hansitions: (pc, val) assert (mrd, (v, a), tidsed) & buf , (pc[t:=l'], val[r:=modlu)], 11 early read (pc, vol) assert (mval', (+, a), tridset') & but > [pe[+:= l'], vol[r:= mvol(a)]
and (mvd, \*, 516 v tridset) e but l: mfence; goto l', Kon (pc, vel) arsert (mvel, +, the hidset) = head(by), (pc[1:=1'], vel). (pc, val) assert but = w1. m1. m2. we meetily but = w1. mi mi we pe, vd. Update: m= (mvalz, losts, holselz) where m2 = (mvolz, losts, hidsels u (+() mi = (muds, lusts) { (+, +) }, trobels v {+() mi = (mudz, lostz, kidsetz). Theorem ( Phig, Bonajsani, Buchwell, Musuvaki 2010'12 114): If control-slate pc is reachable when Pis executed under TSO Considu a program P. iff pe is reachaste in Lp. Since the latte problem is decidable (Abdulla 1996), control-clah reachesility is decideble for TSO.

Inherition to the construction from lost time:

The construction makes beary use

of the following times in automate theory:

to shuffle two languages L(M1) and L(M2)

ove christian alphabets En n Ez = Ø,

extend M1 by loops Ez on each state,

and Tz by loops Q.

The resulting automate M1' and The subject

L(M1) n L(M2') = L(M1) W L(M2).

Example:

{ a 5 { w | xy2 }

= 2( 2 a 2 b 2 ) n 2( 2 x 2 y 2 z 8).

1.) The memory sees a shaffle of the stores from both threads:

For example, the memory may see S= sl(a, 1, tn). sl(b, 2, t2). sl(a, 2, tn).



Here, one really defines alphabels ( $\Sigma_i$  consists of sloves stla,  $v_i t_i$ ), and applies the loop back.

3.) Since the channels are FIFO,

the stores leave the buffer in the order

in which they we put into the buffer.

So instead of equening the buffered stores of 12

at the memory,

thread of suenes the stores of the

already when insuling commands into the buffer:



4.) Now the content of both before is identical, up to the fact that the and the process the buffer at different speed:

and (c/0/n/+ ///)

to Icloinidiein My

can be understood as one buffer with two pointes: to

Remark: One may think that it should be sufficient to guen the sequence of namony updates S= Sl(a, 1, 4, ). sl(b, 2, 4, ). sd(a, 2, 4, ) and write this should sequence into both befor: This, however, yield a shiet unde-approximation of 150. 130 requires that sequence & results from store actions leaving the buffer. The Seve model additionally enforces this order

when the stores are sond to the buffer.

4) This is more than TSO ashs for.

Is That the two memories can process the should before at different speech Joses the problem.