# **PipeProof:**

Automated Memory Consistency Proofs for Microarchitectural Specifications

## Yatin A. Manerkar, Daniel Lustig\*, Margaret Martonosi, and Aarti Gupta

Princeton University

\*NVIDIA

MICRO-51



http://check.cs.princeton.edu/

- Memory consistency models (MCMs) specify rules governing values returned by loads in parallel programs
- MCM must be correctly implemented for <u>all possible programs</u>


















[Images: HeeWann Kim, tzblacktd, audino]

Forest goes on forever (infinite number of possible programs)
















Can check known hideouts (verify design for test programs)






Are Pokemon lurking in unexplored areas? (Do tested programs provide complete coverage?)






Have we caught all the Pokemon? (Are there any MCM bugs left in the design?)


## **PipeProof Overview**

### First automated all-program microarchitectural MCM verification!

- Covers all possible addresses, values, numbers of cores
- Proof methodology based on automatic abstraction refinement
- Early-stage: Can be conducted before RTL is written!





## Outline

### Background

- ISA-level MCM specs
- Microarchitectural ordering specs
- Microarchitectural Correctness Proof
  - Transitive Chain (TC) Abstraction
- Overall PipeProof Operation
  - TC Abstraction Support Proof
  - Chain Invariants
- Results



- ISA-level MCM specs are defined in terms of relational patterns [Alglave et al. TOPLAS 2014]
- ISA-level executions are graphs
  - Nodes: instructions, edges: ISA-level relations between instrs
- Correctness based on acyclicity, irreflexivity, etc. of relational patterns
  - E.g., SC is $acyclic(po \cup co \cup rf \cup fr)$

| Core 0                      | Core 1                   |
|-----------------------------|--------------------------|
| (i1) $[x] \leftarrow 1$     | (i3) $r1 \leftarrow [y]$ |
| (i2) $[y] \leftarrow 1$     | (i4) $r2 \leftarrow [x]$ |
| Under SC: Forbid r1=1, r2=0 |                          |
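As a rough illustration of the acyclicity condition, the sketch below encodes the litmus test above as a graph and checks whether $po \cup co \cup rf \cup fr$ contains a cycle. The edge lists and the `has_cycle` helper are assumptions made for this example; they are not taken from PipeProof or any existing tool.

```python
from collections import defaultdict

def has_cycle(edges):
    """Return True if the directed graph given as (src, dst) pairs has a cycle."""
    graph = defaultdict(list)
    for src, dst in edges:
        graph[src].append(dst)
    WHITE, GREY, BLACK = 0, 1, 2
    color = defaultdict(int)  # every node starts WHITE (unvisited)

    def visit(node):
        color[node] = GREY  # node is on the current DFS path
        for nxt in graph[node]:
            if color[nxt] == GREY:
                return True  # back edge: cycle found
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in list(graph))

# Program order is the same for every outcome of the test.
po = [("i1", "i2"), ("i3", "i4")]
co = []  # one write per address, so no coherence-order edges are needed here

# Outcome r1=1, r2=0: i3 reads from i2; i4 reads the initial x, so i4 -fr-> i1.
forbidden_outcome = po + co + [("i2", "i3")] + [("i4", "i1")]

# Outcome r1=1, r2=1: i3 reads from i2 and i4 reads from i1; no fr edges.
allowed_outcome = po + co + [("i2", "i3"), ("i1", "i4")]

sc_forbids_first = has_cycle(forbidden_outcome)
sc_forbids_second = has_cycle(allowed_outcome)
```

Under these assumptions the first outcome yields the cycle i1, i2, i3, i4, i1 and is therefore forbidden by SC, while the second outcome is acyclic and allowed.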










Microarchitectural ordering specs are used to generate microarchitectural executions as µhb graphs

- Nodes: instr. sub-events, edges: happens-before relations between instrs
- Observability based on cyclicity of graphs
  - Cyclic graph  $\rightarrow$  Unobservable
  - Acyclic graph  $\rightarrow$  Observable

Message passing (mp) litmus test

| Core 0                      | Core 1                   |
|-----------------------------|--------------------------|
| $(i1) [x] \leftarrow 1$     | (i3) r1 $\leftarrow$ [y] |
| (i2) [y] ← 1                | (i4) r2 $\leftarrow$ [x] |
| Under SC: Forbid r1=1, r2=0 |                          |






Microarchitecture in µspec DSL

```
Axiom "Decode_is_FIFO":
... EdgeExists ((i1, Decode), (i2, Decode))
=> AddEdge ((i1, Execute), (i2, Execute)).
...
Axiom "PO_Fetch":
... SameCore i1 i2 /\ ProgramOrder i1 i2 =>
AddEdge ((i1, Fetch), (i2, Fetch)).
```
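To make the effect of these two axioms more concrete, here is a minimal Python sketch that mimics them as edge-adding functions over (instruction, stage) nodes. It is only an approximation: the parts elided with `...` in the µspec above are simply left out, and the instruction records (`name`, `core`, `index`) and function names are hypothetical.

```python
def po_fetch(instrs):
    """Mimics "PO_Fetch": same core and program order => edge between Fetch nodes."""
    edges = set()
    for a in instrs:
        for b in instrs:
            # index is the (assumed) position of the instruction within its core
            if a["core"] == b["core"] and a["index"] < b["index"]:
                edges.add(((a["name"], "Fetch"), (b["name"], "Fetch")))
    return edges

def decode_is_fifo(edges):
    """Mimics "Decode_is_FIFO": a Decode-stage edge implies an Execute-stage edge."""
    new_edges = set()
    for (i1, stage1), (i2, stage2) in edges:
        if stage1 == "Decode" and stage2 == "Decode":
            new_edges.add(((i1, "Execute"), (i2, "Execute")))
    return new_edges

# The four instructions of the mp litmus test, two per core.
mp_instrs = [
    {"name": "i1", "core": 0, "index": 0},
    {"name": "i2", "core": 0, "index": 1},
    {"name": "i3", "core": 1, "index": 0},
    {"name": "i4", "core": 1, "index": 1},
]

fetch_edges = po_fetch(mp_instrs)
# Assume some other axiom (not shown in the source) already ordered the Decode stage.
execute_edges = decode_is_fifo({(("i1", "Decode"), ("i2", "Decode"))})
```

Each function returns µhb-style edges as pairs of (instruction, stage) nodes, so their results can be unioned into one graph and checked for cycles.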






Microarchitectural happens-before (µhb) graphs


| ISA-Level<br>Outcome | Observable<br>(≥ 1 Graph Acyclic) | Not Observable<br>(All Graphs Cyclic) |
|----------------------|-----------------------------------|---------------------------------------|
| Allowed              | OK                                | OK (stricter<br>than necessary)       |
| Forbidden            | **Consistency violation!**        | OK                                    |
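The table can be read as a small decision procedure. The sketch below is one way to write it down, assuming each candidate µhb graph for an outcome has already been checked for cycles; the `classify` function and its arguments are hypothetical names for this example.

```python
def classify(isa_allowed, graphs_acyclic):
    """Classify one litmus-test outcome.

    isa_allowed:    True if the ISA-level MCM allows the outcome.
    graphs_acyclic: one boolean per candidate µhb graph, True if that graph is acyclic.
    """
    # An outcome is observable if at least one µhb graph is acyclic.
    observable = any(graphs_acyclic)
    if observable and not isa_allowed:
        return "Consistency violation!"
    if not observable and isa_allowed:
        return "OK (stricter than necessary)"
    return "OK"

# Forbidden outcome whose graphs are all cyclic: the design is correct for it.
result_correct = classify(isa_allowed=False, graphs_acyclic=[False, False])
# Forbidden outcome with one acyclic graph: the microarchitecture can produce it.
result_bug = classify(isa_allowed=False, graphs_acyclic=[False, True])
```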






## **Perennial Question:**

## "Do your litmus tests cover all possible MCM bugs?"

## How to automatically prove correctness for all programs?


[Lustig et al. MICRO-47, ...]

#### All non-unary cycles containing fr



[Figure: ISA-level cycle containing fr, with $i_1 \xrightarrow{co} i_2 \xrightarrow{po} i_3$]





Transitive chain (sequence) of ISA-level edges










Using TC Abstraction














## Microarchitectural Correctness Proof

















### Concretization

- All concretizations must be unobservable
- Observable concretizations are counterexamples








- Additional instruction and ISA-level edge modelled => extra constraints
  - May be enough to make execution unobservable



### **Decomposition:**



























execution that is observable) is often returned





## Transitive Chain (TC) Abstraction Support Proof

Ensure that the ISA-level pattern and µarch. support the TC Abstraction

Base case: Do initial ISA-level edges guarantee connection?



Inductive case: Extend transitive chain => extend transitive connection?



- Abstractly represent repeated ISA-level patterns
- Sometimes needed for refinement loop to terminate
- Inductively proven by PipeProof before their use in proof algorithms
- Example: checking for edge from i1 to i5 (TC abstraction support proof)



#### Abstract Counterexample




#### **Repeating ISA-Level Pattern**

Can continue decomposing in this way forever!






- po\_plus = arbitrary number of repetitions of po
- Next edge peeled off will be something other than po
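The idea of representing a repeated pattern abstractly can be illustrated on plain lists of edge names. The sketch below collapses every run of `po` edges into a single `po_plus` entry, so the edge following a `po_plus` is never `po`. This is a toy model on concrete chains, not PipeProof's inductive proof of the invariant, and `abstract_repeats` is a hypothetical helper.

```python
from itertools import groupby

def abstract_repeats(chain, relation="po"):
    """Collapse each run of `relation` edges in the chain into one `<relation>_plus`."""
    abstracted = []
    for rel, run in groupby(chain):
        if rel == relation:
            # One or more consecutive po edges become a single abstract edge.
            abstracted.append(relation + "_plus")
        else:
            # Other relations are kept edge by edge.
            abstracted.extend(run)
    return abstracted

short_chain = abstract_repeats(["po", "fr", "po"])
long_chain = abstract_repeats(["po", "po", "po", "po", "fr", "po", "po"])
```

Both chains abstract to the same sequence, which is why a single abstract case can stand in for arbitrarily many concrete decompositions.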

### In the paper...

- Optimizations
  - Covering Sets: Eliminate redundant transitive connections
  - Memoization: Eliminate redundant ISA-level cycles
- Inductive ISA edge generation
- Adequate Model Over-Approximation
  - Needed to ensure soundness of PipeProof's abstraction-based approach
- ...and more!



- Ran PipeProof on simpleSC (SC) and simpleTSO (TSO) µarches
  - 3-stage in-order pipelines
- Proved correctness of both microarchitectures for all programs
  - With optimizations, runtimes < 1 hour!

|            | simpleSC  | simpleSC<br>(w/ Covering Sets + Memoization) |
|------------|-----------|----------------------------------------------|
| Total Time | 225.9 sec | 19.1 sec                                     |

|            | simpleTSO | simpleTSO<br>(w/ Covering Sets + Memoization) |
|------------|-----------|-----------------------------------------------|
| Total Time | Timeout   | 2449.7 sec<br>(≈ 41 mins)                     |
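For a quick sanity check of the numbers in the tables, the following snippet derives the simpleSC speedup from the two reported runtimes and converts the simpleTSO runtime to minutes. Only the three runtimes come from the tables; the variable names are made up for this example.

```python
# Runtimes in seconds, copied from the tables above.
simple_sc_base = 225.9
simple_sc_optimized = 19.1
simple_tso_optimized = 2449.7

# Speedup from Covering Sets + Memoization on simpleSC.
sc_speedup = simple_sc_base / simple_sc_optimized

# simpleTSO runtime in minutes, to compare against the 1-hour bound.
tso_minutes = simple_tso_optimized / 60

print(f"simpleSC speedup: {sc_speedup:.1f}x")
print(f"simpleTSO runtime: {tso_minutes:.1f} min")
```

The speedup works out to roughly 11.8x for simpleSC, and the simpleTSO runtime is about 40.8 minutes, consistent with the "≈ 41 mins" in the table.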






### Conclusions

- PipeProof: Automated All-Program Microarchitectural MCM Verification
  - Designers no longer need to choose between completeness and automation
- Transitive Chain Abstraction allows inductive modelling and verification of the infinite set of all possible executions
  - Abstraction is automatically refined as necessary to prove correctness
- Verified simple microarchitectures implementing SC and TSO in < 1 hour!

Code available at https://github.com/ymanerka/pipeproof

# We caught 'em all!




- Must verify across all possible transitive connections
- Each decomposition creates a new set of transitive connections
  - Can quickly lead to a case explosion
- The Covering Sets Optimization eliminates redundant transitive connections





Graph A has an edge from x→z (transitive connection)

Graph B has edges from y→z (transitive connection) and x→z (by transitivity)

Correctness of A => Correctness of B (since B contains A's transitive connection). Checking B explicitly is redundant!
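The redundancy argument above can be sketched as a subset test on transitively closed edge sets: if every edge of one case also appears in another, the second case need not be checked separately. The code below is a simplified model of that idea, not PipeProof's implementation. The helper names are hypothetical, and the x→y edge in graph B is an assumption made so that x→z follows by transitivity.

```python
def transitive_closure(edges):
    """Return the transitive closure of a set of (src, dst) edges."""
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

def prune_covered(cases):
    """Return the names of cases that are not covered by another case.

    A case is covered if some other case's closed edge set is a subset of its own;
    ties between identical edge sets are broken by name so one copy is kept.
    """
    closures = {name: transitive_closure(edges) for name, edges in cases.items()}
    kept = []
    for name, closure in closures.items():
        covered = any(
            other != name
            and closures[other] <= closure
            and (closures[other] != closure or other < name)
            for other in closures
        )
        if not covered:
            kept.append(name)
    return kept

cases = {
    "A": {("x", "z")},              # transitive connection x -> z
    "B": {("x", "y"), ("y", "z")},  # y -> z, and x -> z by transitivity
}
remaining = prune_covered(cases)
```

With these inputs only graph A remains to be checked, since B's closed edge set contains A's.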



- Base PipeProof algorithm examines some cycles multiple times
- Memoization eliminates redundant checks of cycles that have already been verified










Same cycle is checked 3 times!

<u>Procedure:</u> If all ISA-level cycles containing edge r<sub>i</sub> have been checked, do not peel off r<sub>i</sub> edges when checking subsequent cycles

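A bounded counting model gives a feel for what the procedure saves. In the sketch below, ISA-level cycles are approximated by sequences of relation names up to a fixed length (an assumption; real cycles are unbounded and not every sequence is a valid cycle). Once all sequences containing a relation have been checked, later iterations skip any sequence that still contains it.

```python
from itertools import product

RELATIONS = ("po", "co", "rf", "fr")

def count_checks(max_len, memoize):
    """Count how many sequence checks are performed up to length max_len."""
    checks = 0
    done = set()  # relations whose cycles have all been checked already
    for rel in RELATIONS:
        for length in range(2, max_len + 1):  # non-unary: at least two edges
            for cycle in product(RELATIONS, repeat=length):
                if rel not in cycle:
                    continue
                # Memoization: skip cycles containing an already-finished relation.
                if memoize and done.intersection(cycle):
                    continue
                checks += 1
        done.add(rel)
    return checks

checks_without = count_checks(2, memoize=False)
checks_with = count_checks(2, memoize=True)
```

For sequences of length 2 this model performs 28 checks without memoization and 16 with it, i.e. each of the 16 sequences is checked exactly once.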


### The Adequate Model Over-Approximation

- Addition of an instruction can make unobservable execution observable!
- Need to work with over-approximation of microarchitectural constraints
- PipeProof sets all exists clauses to true as its over-approximation
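As a very rough picture of the last bullet, the sketch below represents a constraint as a nested tuple and replaces every `exists` clause with `True`. The tuple encoding, the operator names, and `over_approximate` are all assumptions for illustration; PipeProof's actual treatment of µspec axioms is described in the paper.

```python
def over_approximate(formula):
    """Replace every ("exists", ...) clause in a nested-tuple formula with True."""
    if isinstance(formula, tuple):
        op, *args = formula
        if op == "exists":
            # The clause could be satisfied by an instruction not yet modelled,
            # so it is assumed to hold.
            return True
        return (op, *[over_approximate(arg) for arg in args])
    return formula  # atoms (edge names, booleans) are left unchanged

# Hypothetical constraint: edge_a holds and some instruction w provides edge_b.
constraint = ("and", "edge_a", ("exists", "w", "edge_b"))
approximated = over_approximate(constraint)
```

Here the result is `("and", "edge_a", True)`: the part of the constraint that depends on an unmodelled instruction no longer restricts the execution.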



