PowerPoint Presentation
Side Channel Attacks
‹#›
Key
plaintext
ciphertext
‹#›
Key
plaintext
ciphertext
Side channel
‹#›
AES Implementation
‹#›
AES Implementation
‹#›
AES T-table access
Assume we know the plaintext and the index (s0>>24)
We can recover the most significant byte of the key
s0 = plaintext ^ key
t0 = Te0[s0>>24]
‹#›
AES Implementation
If we know the plaintext and all of the indices in the first round we can recover the key.
‹#›
Microarchitecture
The Microarchitecture
ISA
Data
Cache
Instruction
Cache
Pipeline
BPU
TLB
MMU
DRAM
Interconnect
LLC
‹#›
CPU vs. Memory
Processor
Speed
1 MHz
8*2600 MHz
Memory
Latency
500 ns
63 ns
‹#›
Bridging the gap
Cache utilises locality to bridge the gap
Divides memory into lines
Stores recently used lines
In a cache hit, data is retrieved from the cache
In a cache miss, data is retrieved from memory and inserted to the cache
Processor
Memory
Cache
‹#›
Another thing that changed since my Apple // days is the processor speed. My apple 2 ran at 1 MHz. My mac runs at 2GHz. Memory speed, however, has not fared so well. The memory on my apple needed half a clock cycle to produce data, whereas modern computers require about 200. When the processor reads from memory it needs to wait.
The cache is the part of the processor that bridges this gap. It is a small memory that is included within the processor. Being small, being closer to the execution core and using more expensive technologies than the main memory mean that the cache is much faster than the memory. On intel the spead ranges from 4 to 50 cycles, depending on the level of cache.
Yada yada
10
Set Associative Caches
Memory lines map to cache sets. Multiple lines map to the same set.
Sets consist of ways. A memory line can be stored in any of the ways of the set it maps to.
When a cache miss occurs, one of the lines in the set is evicted.
Memory
Ways
Sets
‹#›
11
The Prime+Probe Attack
Allocate a cache-sized memory buffer
Prime: fills the cache with the contents of the buffer
Probe: measure the time to access each cache set
Slow access indicates victim access to the set
The probe phase primes the cache for the next round
Memory
‹#›
12
Sample Victim: Data Rattle
‹#›
Cache Fingerprint of the Rattle Program
‹#›
AES T-tables and cache lines
Cache Line 0
Cache Line 1
Cache Line 2
Cache Line 3
Cache Line 4
Cache Line 5
‹#›
AES T-tables and cache lines
If 0≤plaintext[0]^key[0]<16, Cache Line 0 is accessed.
What if plaintext[0]^key[0]≥16?
Cache Line 0
Cache Line 1
Cache Line 2
‹#›
Analysing the AES Implementation
‹#›
Analysing the AES Implementation
Each round, Te0 is accessed 4 times
AES has 10 rounds
Te0 is accessed 40 times in an AES encryption
The first access misses Cache Line 0
Each following access misses Cache Line 0 with a probability of 15/16
The probability that all accesses miss Cache Line 0 is about 8%
‹#›
Prime+Probe Attack on AES
Repeat 1000000 times:
Generate a random plaintext
Prime the cache
Encrypt the plaintext
Probe the cache and record results
For each plaintext byte
Partition results based on the most significant half of a plaintext byte
Find the cache set with the slowest average access time for each partition
Identify the most significant half of the corresponding key byte
‹#›
PP Attack on AES - Results
‹#›
PP Attack on AES – More Results
‹#›
What's now?
Recover the second half of the key
Second round attack – similar but with ugly maths
How to perform the attack
Easy: use Mastik:
http://cs.adelaide.edu.au/~yval/Mastik/
How to defend?
Later…
‹#›
The Flush+Reload Technique
Leaks information on victim access to shared memory.
Spy monitors victim’s access to shared code
Spy can determine what victim does
Spy can infer the data the victim operates on
‹#›
23
Data (copied)
Data (copied)
Code (shared)
Code (shared)
Code
Data
Code Sharing
Recall that programs that run the same executable can share the code
Executable file:
Process A
Process B
‹#›
and data
Some other code
Data (copied)
Code (shared)
Code mapped
as data
Code
Data
Code is Data
In Von Neumann architectures code is a type of data
Program file:
Process A
Process B
‹#›
Cache Consistency
Memory and cache can be in inconsistent states
Rare, but possible
Solution: Flushing the cache contents
Ensures that the next load is served from the memory
Processor
Memory
Cache
‹#›
Flush+Reload
Flush memory line
Wait a bit
Measure time to Reload line
slow-> no access
fast-> access
Repeat
Processor
Memory
Cache
‹#›
Reminder: Fast 50 cycles. Slow 200. Measurement adds about 30 cycles.
Wrap-up – the weakness is that the spy can monitor access to shared memory lines and consequently infer information about the data the victim processes
27
The RSA Encryption System
The RSA encryption is a public key cryptographic scheme
C = Me mod N
M
C
M = Cd mod N
Key Generation:
Select random primes p and q
Calculate N = pq
Select a public exponent e(=65537)
Compute d=e-1 mod φ(N)
(N, e) is the public key
(p, q, d) is the private key
‹#›
Schnorr Signatures
(A, α) = keypair()
A
(R, r) = keypair()
s
e
s=r-eα
M
e=Hash(R,M)
R=gs·Ae (mod p)
e=?Hash(R,M)
R=gr mod p
‹#›
Operation x i di
1 2 101
Square 1 2 101
reduce 1 2 101
Multiply 11 2 101
reduce 11 2 101
Square 121 1 101
reduce 21 1 101
Square 441 0 101
reduce 41 0 101
Multiply 451 0 101
reduce 51 0 101
GnuPG 1.4.13 Exponentiation
Example:
115 mod 100 =
161,051 mod 100 = 51
x ⟵1
for i ⟵|d|-1 downto 0 do
x ⟵x2 mod n
if (di = 1) then
x = xC mod n
endif
done
return x
The private key is encoded in the sequence of operations
!!!
‹#›
Old implementation – not in use
30
Flush+Reload on GnuPG 1.4.13
‹#›
FR vs. PP
Flush+Reload tends to be more accurate
Prime+Probe has less prerequisites
‹#›
Variants
Prime+Probe
Instruction cache
Last-level cache
TLB, BPU
Prime+Abort
Flush+Reload
Flush-Flush
Evict+Reload
Evict+Time
CacheBleed
Open DRAM rows
Prefetch Channel
‹#›
Countermeasures– System Level
Avoid sharing hardware
Goes against modern software deployment trends
Safe hardware implementations
Limited applicability
Hardware partitioning
Partial support (if any)
State sanitisation
Partial support (if any)
Hardware randomisation
Not currently supported
Clock randomisation
Ineffective
‹#›
Software Countermeasures
Preloading
Read all of the AES tables prior to decryption
Ineffective against asynchronous adversaries
AES S-table implementation
A single table of size 256 bytes
Reduces chance of missing a cache line
‹#›
GnuPG 1.4.14 Square and Multiply Always
x ⟵1
for i ⟵|d|-1 downto 0 do
x ⟵x2 mod n
if (di = 1) then
x = xC mod n
endif
done
return x
x ⟵1
for i ⟵|d|-1 downto 0 do
x ⟵x2 mod n
t = xC mod n
if (di = 1) then
x = t
endif
done
return x
‹#›
Constant-Time Programming
A programming style that avoids:
Instructions whose timing depends on secret data
Conditional execution based on secret data
Memory access to addresses that depend on secret data
‹#›
Eliminating Conditional Statements
if (condition)
t = f1()
else
t = f2()
t1 = f1()
t2 = f2()
t = select(t1, t2, condition)
‹#›
Implementing select
Case 1: condition evaluates to 0 or 1
mask = condition – 1
return (t1 & ~mask) | (t2 & mask)
Case 2: condition (c) evaluates to 0 or non-0
mask = ((c ^ (c-1)) & ~c)>>32
return (t1 & ~mask) | (t2 & mask)
‹#›
Caveats
The result of select depends on secret data. Anything that depends on it also depends on secret data.
In particular, swapping pointers using select does not produce constant-time code
The choices of processor, languages and compiler matter
In most processors, division is not constant-time
In some processors multiplication is not constant-time
Compiler optimisations may kill constant-time code
These issues have been exploited
‹#›
/docProps/thumbnail.jpeg