



# Beyond the Surface: Validation Challenges and Opportunities for Confidential Computing

Jo Van Bulck

☆ DistriNet, KU Leuven, Belgium 
☐ jo.vanbulck@cs.kuleuven.be 
☐ @jovanbulck 
☐ vanbulck.net

September 9, 2024, PAVeTrust @ FM'24











#### **Enclaved Execution: Reducing Attack Surface**







"Platonic" ideal: Hardware-level isolation and attestation

#### **Enclaved Execution: Privileged Side-Channel Attacks**







**Reality #1:** Microarchitectural side channels

#### **Confidential Computing Spectrum**









Reality #2: Heterogeneous CPU spectrum



#### Case Study: Hardware-Software Co-Design for Secure IRQs



**Interrupts** == Universal attack primitive



• Small CPU, open source



Large CPU, proprietary

#### Case Study: Hardware-Software Co-Design for Secure IRQs



**Interrupts** == Universal attack primitive

→ explore synergy low-end <> high-end TEEs







- Small CPU, open source
- Formal operational semantics

- Large CPU, proprietary
- Pragmatic mitigation primitives



## Reality #1: IRQ Side channels?

#### Wait a Cycle: Interrupt Latency as a Side Channel





#### Sancus: Open-Source Trusted Computing for the IoT

#### Embedded enclaved execution:

- Isolation & attestation
- Save + clear CPU state on interrupt

#### Small CPU (openMSP430):

- Area: ≤ 2 kLUTs
- Deterministic execution: no pipeline/cache/MMU/...
- Research vehicle for rapid prototyping of attacks & mitigations





## Example: Extracting Keystrokes with Interrupt Latency







## **Example: Extracting Keystrokes with Interrupt Latency**







## Mitigation Strategy #1: Hardware-Level Padding





## Mitigation Strategy #1: Hardware-Level Padding





#### **Deductive Validation: Proving Contextual Equivalence**





- Operational semantics: Sancus<sub>H/L</sub> w/ and w/o interrupts (~35 pages)
- **Pen-and-paper proof:** Any attack in Sancus<sub>L</sub> can also be expressed in Sancus<sub>H</sub> w/o interrupts...



#### **Deductive Validation: Proving Contextual Equivalence**



```
(CPU-NOT)
                                               \mathcal{B} \neq \langle \bot, \bot, t_{pad} \rangle i, \mathcal{R}, pc_{old}, \mathcal{B} \vdash_{mac} \mathsf{OK} \mathcal{R}' = \mathcal{R}[\mathsf{pc} \mapsto \mathcal{R}[\mathsf{pc}] + 2][\mathsf{r} \mapsto \neg \mathcal{R}[\mathsf{r}]]
                   \mathcal{D} \vdash \delta, t, t_a \curvearrowright_{\mathcal{D}}^{cycles(i)} \delta', t', t'_a \quad \mathcal{D} \vdash \langle \delta', t', t'_a, \mathcal{M}, \mathcal{R}', \mathcal{R}[\mathsf{pc}], \mathcal{B} \rangle \hookrightarrow_{\mathsf{I}} \langle \delta'', t'', t''_a, \mathcal{M}', \mathcal{R}'', \mathcal{R}[\mathsf{pc}], \mathcal{B}' \rangle
                                                                                                                                                                                                                                                                                                  i = decode(\mathcal{M}, \mathcal{R}[pc]) = NOT r
                                                                    \mathcal{D} \vdash (\delta, t, t_0, \mathcal{M}, \mathcal{R}, pc_{old}, \mathcal{B}) \rightarrow (\delta'', t'', t'', \mathcal{M}', \mathcal{R}'', \mathcal{R}[pc], \mathcal{B}')
          (CPU-AND)
                               \mathcal{B} \neq (\bot, \bot, t_{red}) i, \mathcal{R}, pc_{old}, \mathcal{B} \vdash_{mac} \mathsf{OK} \mathcal{R}' = \mathcal{R}[\mathsf{pc} \mapsto \mathcal{R}[\mathsf{pc}] + 2][\mathsf{r}_2 \mapsto \mathcal{R}[\mathsf{r}_1] \& \mathcal{R}[\mathsf{r}_2]]
                             \mathcal{R}'' = \mathcal{R}'[\mathbf{sr.N} \mapsto \mathcal{R}'[\mathbf{r}_2] \& 0x8000, \mathbf{sr.Z} \mapsto (\mathcal{R}'[\mathbf{r}_2] == 0), \mathbf{sr.C} \mapsto (\mathcal{R}'[\mathbf{r}_2] \neq 0), \mathbf{sr.V} \mapsto 0]
            \mathcal{D} \vdash \delta, t, t_a \curvearrowright_{\mathcal{D}}^{cycles(i)} \delta', t', t'_a \quad \mathcal{D} \vdash \langle \delta', t', t'_a, \mathcal{M}, \mathcal{R}'', \mathcal{R}[pc], \mathcal{B} \rangle \hookrightarrow_{\mathsf{I}} \langle \delta'', t'', t''_a, \mathcal{M}', \mathcal{R}''', \mathcal{R}[pc], \mathcal{B}' \rangle
                                                                                                                                                                                                                                                                                               i = decode(\mathcal{M}, \mathcal{R}[pc]) = AND r_1 r_2
                                                             \mathcal{D} \vdash \langle \delta, t, t_a, \mathcal{M}, \mathcal{R}, pc_{old}, \mathcal{B} \rangle \rightarrow \langle \delta'', t'', t'', \mathcal{M}', \mathcal{R}''', \mathcal{R}[pc], \mathcal{B}' \rangle
(CPU-CMP)
                               \mathcal{B} \neq \langle \bot, \bot, t_{pad} \rangle i, \mathcal{R}, pc_{old}, \mathcal{B} \vdash_{mac} OK \mathcal{R}' = \mathcal{R}[pc \mapsto \mathcal{R}[pc] + 2][r_2 \mapsto \mathcal{R}[r_1] - \mathcal{R}[r_2]]
 \mathcal{R}'' = \mathcal{R}'[\mathsf{sr.N} \mapsto (\mathcal{R}'[\mathsf{r}_2] < 0), \mathsf{sr.Z} \mapsto (\mathcal{R}'[\mathsf{r}_2] == 0), \mathsf{sr.C} \mapsto (\mathcal{R}'[\mathsf{r}_2] \neq 0), \mathsf{sr.V} \mapsto overflow(\mathcal{R}[\mathsf{r}_1] - \mathcal{R}[\mathsf{r}_2])]
            \mathcal{D} \vdash \delta, t, t_a \curvearrowright_{\mathcal{D}}^{cycles(i)} \delta', t', t'_a \quad \mathcal{D} \vdash \langle \delta', t', t'_a, \mathcal{M}, \mathcal{R}'', \mathcal{R}[pc], \mathcal{B} \rangle \hookrightarrow_{\mathsf{I}} \langle \delta'', t'', t''_a, \mathcal{M}', \mathcal{R}''', \mathcal{R}[pc], \mathcal{B}' \rangle
                                                                                                                                                                                                                                                                                                          i = decode(\mathcal{M}, \mathcal{R}[pc]) = CMP r_1 r_2
                                                              \mathcal{D} \vdash \langle \delta, t, t_a, \mathcal{M}, \mathcal{R}, pc_{old}, \mathcal{B} \rangle \rightarrow \langle \delta'', t'', t_a'', \mathcal{M}', \mathcal{R}''', \mathcal{R}[pc], \mathcal{B}' \rangle
(CPU-ADD)
                              \mathcal{B} \neq \langle \bot, \bot, t_{nod} \rangle i, \mathcal{R}, pc_{old}, \mathcal{B} \vdash_{mac} OK \mathcal{R}' = \mathcal{R}[pc \mapsto \mathcal{R}[pc] + 2][r_2 \mapsto \mathcal{R}[r_1] + \mathcal{R}[r_2]]
 \mathcal{R}'' = \mathcal{R}'[\mathsf{sr.N} \mapsto (\mathcal{R}'[\mathsf{r}_2] < 0), \mathsf{sr.Z} \mapsto (\mathcal{R}'[\mathsf{r}_2] == 0), \mathsf{sr.C} \mapsto (\mathcal{R}'[\mathsf{r}_2] \neq 0), \mathsf{sr.V} \mapsto overflow(\mathcal{R}[\mathsf{r}_1] + \mathcal{R}[\mathsf{r}_2])]
            \mathcal{D} \vdash \delta, t, t_{\alpha} \curvearrowright_{\mathcal{D}}^{cycles(i)} \delta', t', t'_{\alpha} \quad \mathcal{D} \vdash (\delta', t', t'_{\alpha}, \mathcal{M}, \mathcal{R}'', \mathcal{R}[pc], \mathcal{B}) \hookrightarrow_{\mathbf{I}} (\delta'', t'', t''_{\alpha}, \mathcal{M}', \mathcal{R}''', \mathcal{R}[pc], \mathcal{B}')
                                                                                                                                                                                                                                                                                                        -i = decode(\mathcal{M}, \mathcal{R}[pc]) = ADD r_1 r_2
                                                              \mathcal{D} \vdash (\delta, t, t_0, \mathcal{M}, \mathcal{R}, pc_{old}, \mathcal{B}) \rightarrow (\delta'', t'', t'', \mathcal{M}', \mathcal{R}''', \mathcal{R}[pc], \mathcal{B}')
(CPU-SUB)
                              \mathcal{B} \neq \langle \bot, \bot, t_{pad} \rangle i, \mathcal{R}, pc_{old}, \mathcal{B} \vdash_{mac} \mathsf{OK} \mathcal{R}' = \mathcal{R}[\mathsf{pc} \mapsto \mathcal{R}[\mathsf{pc}] + 2][\mathsf{r}_2 \mapsto \mathcal{R}[\mathsf{r}_1] - \mathcal{R}[\mathsf{r}_2]]
 \mathcal{R}'' = \mathcal{R}'[\text{sr.N} \mapsto (\mathcal{R}'[\mathbf{r}_2] < 0), \text{sr.Z} \mapsto (\mathcal{R}'[\mathbf{r}_2] == 0), \text{sr.C} \mapsto (\mathcal{R}'[\mathbf{r}_2] \neq 0), \text{sr.V} \mapsto overflow(\mathcal{R}[\mathbf{r}_1] - \mathcal{R}[\mathbf{r}_2])]
            \mathcal{D} \vdash \delta, t, t_a \curvearrowright_{\mathcal{D}}^{cycles(i)} \delta', t', t'_a \quad \mathcal{D} \vdash \langle \delta', t', t'_a, \mathcal{M}, \mathcal{R}'', \mathcal{R}[pc], \mathcal{B} \rangle \hookrightarrow_{\mathsf{I}} \langle \delta'', t'', t''_a, \mathcal{M}', \mathcal{R}''', \mathcal{R}[pc], \mathcal{B}' \rangle
                                                                                                                                                                                                                                                                                                        -i = decode(\mathcal{M}, \mathcal{R}[pc]) = SUB r_1 r_2
                                                              \mathcal{D} \vdash (\delta, t, t_a, \mathcal{M}, \mathcal{R}, pc_{old}, \mathcal{B}) \rightarrow (\delta'', t'', t'', t'', \mathcal{M}', \mathcal{R}''', \mathcal{R}[pc], \mathcal{B}')
```

#### Mind the Gap: Studying Model Mismatches



- Inductively study 2 "provably secure" systems (Sancus + VRASED)
- >16 model mismatches/incompleteness



TABLE I. List of falsified and exploitable assumptions found in Sancus<sub>V</sub>. IM = Implementation-model mismatch; MA = Missing attacker capability.

| IM | V-B1<br>V-B2<br>V-B3<br>V-B4<br>V-B5<br>V-B6<br>V-B7 | Instruction execution time does not depend on the context. The maximum instruction execution time is $T=6$ . Interrupted enclaves can only be resumed once with reti. Interrupted enclaves cannot be restarted from the ISR. The system only supports a single enclave. Enclave software cannot access unprotected memory. Enclave software cannot manipulate interrupt functionality. |
|----|------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| MA | V-C1<br>V-C2                                         | Untrusted DMA peripherals are not modeled.  Interrupts from the watchdog timer are not modeled.                                                                                                                                                                                                                                                                                        |

#### Mitigation Strategy #2: Compile-Time Branch Balancing



#### **Inductive Validation: Microarchitectural Profiling**



#### "Principled" ISA augmentation:

- 1) Exhaustively generate all instructions
- 2) Extract leakage model
- 3) Feed to compiler + binary validator

#### ~ Hardware-software contract

 $\rightarrow$  incl.  $\mu$ -arch timing



#### **Inductive Validation: Systematic ISA Augmentation**





## **Excurse: Subverting and Securing TI IPE "Enclaves"**



|               | Attack primitive                    | CX           | ΙΧ         | Section |
|---------------|-------------------------------------|--------------|------------|---------|
| Architectural | Controlled call corruption (new)    | •            | •          | §3.1    |
|               | Code gadget reuse [35]              |              | •          | §3.2    |
|               | Interrupt register state [73]       | •            | •          | §3.3    |
|               | Interface sanitization [69]         | $lackbox{0}$ | •          | §6.1    |
| Side channels | Cache timing side channel [23, 39]  | •            | 0          | §3.4.1  |
|               | Interrupt latency side channel [71] | lacksquare   | $\bigcirc$ | §3.4.2  |
|               | Controlled channel [25, 77]         | •            | $\bigcirc$ | §3.4.3  |
|               | Voltage fault injection [31, 40]    | $\bigcirc$   | $\bigcirc$ | §A.1    |
|               | DMA contention side channel [7,8]   | $\bigcirc$   | $\bigcirc$ | §A.2    |

#### Excurse: Subverting and Securing TI IPE "Enclaves"





#### **Excurse: Subverting and Securing TI IPE "Enclaves"**



**PSIRT Notification** 

## MSP430FR5xxx and MSP430FR6xxx IP Encapsulation Write Vulnerability



#### **Summary**

The IP Encapsulation feature of the Memory Protection Unit may not properly prevent writes to an IPE protected region under certain conditions. This vulnerability assumes an attacker has control of the device outside of the IPE protected region (access to non-protect memory, RAM, and CPU registers).

#### **Vulnerability**

Bognar et al. "Intellectual Property Exposure: Subverting and Securing Intellectual Property Encapsulation in Texas Instruments Microcontrollers", USENIX Sec'24.



## Reality #2: Larger CPUs?

## **Challenge: Side-Channel Sampling Rate**



#### SGX-Step: Executing Enclaves one Instruction at a Time





### SGX-Step: Executing Enclaves one Instruction at a Time





#### **SGX-Step: A Versatile Open-Source Attack Toolkit**





#### **Root-Causing SGX-Step: Aiming the Timer Interrupt**





#### **Root-Causing SGX-Step: CPU Microcode Assists**



|                    | PTE A-bit | Mean (cycles) | Stddev (cycles) | • • •                                                 |                                       |
|--------------------|-----------|---------------|-----------------|-------------------------------------------------------|---------------------------------------|
|                    | A=1       | 27            | 30              |                                                       |                                       |
|                    | A=0       | 666           | 55              |                                                       |                                       |
|                    |           |               | ~               | 3. Assisted PT walk                                   |                                       |
|                    |           |               |                 |                                                       |                                       |
| <u> </u>           |           |               |                 | page walk (\$RIP)                                     | ехес                                  |
| 1. Clear PTE A-bit |           | 2. TLB flush  | ******          | ******                                                | # # # # # # # # # # # # # # # # # # # |
| Arm timer          |           | ERESUME       |                 | NOP <sub>1</sub>                                      |                                       |
|                    |           | -{[·]         |                 | $ \longrightarrow                                   $ | 2                                     |

## **Root-Causing SGX-Step: CPU Microcode Assists**





#### **Ideas That Were Rejected (1)**





#### **Ideas That Were Rejected (2)**







# CHAPTER 8 ASYNCHRONOUS ENCLAVE EXIT NOTIFY AND THE EDECCSSA USER LEAF FUNCTION

#### 8.1 INTRODUCTION

Asynchronous Enclave Exit Notify (AEX-Notify) is an extension to Intel® SGX that allows Intel SGX enclaves to be notified after an asynchronous enclave exit (AEX) has occurred. EDECCSSA is a new Intel SGX user leaf function (ENCLUEDECCSSA) that can facilitate AEX notification handless well as some and the second sec

The following list summarizes the a details are provided in Section 8.3)

- SECS.ATTRIBUTES.AEXNOTIFY
- TCS.FLAGS.AEXNOTIFY: This e



→ shipped in millions of devices ≥ 4th Gen Xeon CPU

 SSA.GPRSGX.AEXNOTIFY: Enclave-writable byte that allows enclave software to dynamically enable/disable AEX notifications.

An AEX notification is delivered by ENCLU[ERESUME] when the following conditions are met:

#### **AEX-Notify: Idea Overview**





#### **AEX-Notify: Software Implementation**





#### **Conclusions and Take-Away**



Value of **deductive formal models** 



... guided and refined by inductive validation!



**Synergy** attacks ↔ defenses; open-source research prototypes



Thank you! Questions?