程序代写代做代考 scheme compiler cache PowerPoint Presentation

PowerPoint Presentation

Side Channel Attacks

‹#›

Key
plaintext
ciphertext

‹#›

Key
plaintext

ciphertext

Side channel

‹#›

AES Implementation

‹#›

AES Implementation

‹#›

AES T-table access
Assume we know the plaintext and the index (s0>>24)
We can recover the most significant byte of the key

s0 = plaintext ^ key
t0 = Te0[s0>>24]

‹#›

AES Implementation

If we know the plaintext and all of the indices in the first round we can recover the key.

‹#›

Microarchitecture
The Microarchitecture

ISA
Data
Cache
Instruction
Cache
Pipeline
BPU
TLB
MMU
DRAM
Interconnect
LLC

‹#›

CPU vs. Memory

Processor
Speed
1 MHz
8*2600 MHz
Memory
Latency
500 ns
63 ns

‹#›

Bridging the gap
Cache utilises locality to bridge the gap
Divides memory into lines
Stores recently used lines

In a cache hit, data is retrieved from the cache
In a cache miss, data is retrieved from memory and inserted to the cache

Processor

Memory

Cache

‹#›

Another thing that changed since my Apple // days is the processor speed. My apple 2 ran at 1 MHz. My mac runs at 2GHz. Memory speed, however, has not fared so well. The memory on my apple needed half a clock cycle to produce data, whereas modern computers require about 200. When the processor reads from memory it needs to wait. and if it needs to read from the same location again it needs to wait again. . So, we have a speed gap between the processor and the memory.

The cache is the part of the processor that bridges this gap. It is a small memory that is included within the processor. Being small, being closer to the execution core and using more expensive technologies than the main memory mean that the cache is much faster than the memory. On intel the spead ranges from 4 to 50 cycles, depending on the level of cache.

Yada yada
10

Set Associative Caches
Memory lines map to cache sets. Multiple lines map to the same set.
Sets consist of ways. A memory line can be stored in any of the ways of the set it maps to.
When a cache miss occurs, one of the lines in the set is evicted.

Memory

Ways

Sets

‹#›

11

The Prime+Probe Attack
Allocate a cache-sized memory buffer
Prime: fills the cache with the contents of the buffer
Probe: measure the time to access each cache set
Slow access indicates victim access to the set
The probe phase primes the cache for the next round

Memory

‹#›

12

Sample Victim: Data Rattle

‹#›

Cache Fingerprint of the Rattle Program

‹#›

AES T-tables and cache lines

Cache Line 0
Cache Line 1
Cache Line 2
Cache Line 3
Cache Line 4
Cache Line 5

‹#›

AES T-tables and cache lines
If 0≤plaintext[0]^key[0]<16, Cache Line 0 is accessed. What if plaintext[0]^key[0]≥16? Cache Line 0 Cache Line 1 Cache Line 2 ‹#› Analysing the AES Implementation ‹#› Analysing the AES Implementation Each round, Te0 is accessed 4 times AES has 10 rounds Te0 is accessed 40 times in an AES encryption The first access misses Cache Line 0 Each following access misses Cache Line 0 with a probability of 15/16 The probability that all accesses miss Cache Line 0 is about 8% ‹#› Prime+Probe Attack on AES Repeat 1000000 times: Generate a random plaintext Prime the cache Encrypt the plaintext Probe the cache and record results For each plaintext byte Partition results based on the most significant half of a plaintext byte Find the cache set with the slowest average access time for each partition Identify the most significant half of the corresponding key byte ‹#› PP Attack on AES - Results ‹#› PP Attack on AES – More Results ‹#› What's now? Recover the second half of the key Second round attack – similar but with ugly maths How to perform the attack Easy: use Mastik: http://cs.adelaide.edu.au/~yval/Mastik/ How to defend? Later… ‹#› The Flush+Reload Technique Leaks information on victim access to shared memory. Spy monitors victim’s access to shared code Spy can determine what victim does Spy can infer the data the victim operates on ‹#› 23 Data (copied) Data (copied) Code (shared) Code (shared) Code Data Code Sharing Recall that programs that run the same executable can share the code Executable file: Process A Process B ‹#› and data Some other code Data (copied) Code (shared) Code mapped as data Code Data Code is Data In Von Neumann architectures code is a type of data Program file: Process A Process B ‹#› Cache Consistency Memory and cache can be in inconsistent states Rare, but possible Solution: Flushing the cache contents Ensures that the next load is served from the memory Processor Memory Cache ‹#› Flush+Reload Flush memory line Wait a bit Measure time to Reload line slow-> no access
fast-> access
Repeat

Processor
Memory

Cache

‹#›

Reminder: Fast 50 cycles. Slow 200. Measurement adds about 30 cycles.

Wrap-up – the weakness is that the spy can monitor access to shared memory lines and consequently infer information about the data the victim processes
27

The RSA Encryption System
The RSA encryption is a public key cryptographic scheme

C = Me mod N
M
C
M = Cd mod N
Key Generation:
Select random primes p and q
Calculate N = pq
Select a public exponent e(=65537)
Compute d=e-1 mod φ(N)
(N, e) is the public key
(p, q, d) is the private key

‹#›

Schnorr Signatures

(A, α) = keypair()
A
(R, r) = keypair()
s
e
s=r-eα

M
e=Hash(R,M)
R=gs·Ae (mod p)
e=?Hash(R,M)
R=gr mod p

‹#›

Operation x i di
1 2 101
Square 1 2 101
reduce 1 2 101
Multiply 11 2 101
reduce 11 2 101
Square 121 1 101
reduce 21 1 101
Square 441 0 101
reduce 41 0 101
Multiply 451 0 101
reduce 51 0 101

GnuPG 1.4.13 Exponentiation
Example:
115 mod 100 =
161,051 mod 100 = 51
x ⟵1
for i ⟵|d|-1 downto 0 do
x ⟵x2 mod n
if (di = 1) then
x = xC mod n
endif
done
return x
The private key is encoded in the sequence of operations
!!!

‹#›

Old implementation – not in use
30

Flush+Reload on GnuPG 1.4.13

‹#›

FR vs. PP
Flush+Reload tends to be more accurate
Prime+Probe has less prerequisites

‹#›

Variants
Prime+Probe
Instruction cache
Last-level cache
TLB, BPU
Prime+Abort
Flush+Reload
Flush-Flush
Evict+Reload
Evict+Time
CacheBleed
Open DRAM rows
Prefetch Channel

‹#›

Countermeasures– System Level
Avoid sharing hardware
Goes against modern software deployment trends
Safe hardware implementations
Limited applicability
Hardware partitioning
Partial support (if any)
State sanitisation
Partial support (if any)
Hardware randomisation
Not currently supported
Clock randomisation
Ineffective

‹#›

Software Countermeasures
Preloading
Read all of the AES tables prior to decryption
Ineffective against asynchronous adversaries
AES S-table implementation
A single table of size 256 bytes
Reduces chance of missing a cache line

‹#›

GnuPG 1.4.14 Square and Multiply Always
x ⟵1
for i ⟵|d|-1 downto 0 do
x ⟵x2 mod n
if (di = 1) then
x = xC mod n
endif
done
return x
x ⟵1
for i ⟵|d|-1 downto 0 do
x ⟵x2 mod n
t = xC mod n
if (di = 1) then
x = t
endif
done
return x

‹#›

Constant-Time Programming
A programming style that avoids:
Instructions whose timing depends on secret data
Conditional execution based on secret data
Memory access to addresses that depend on secret data

‹#›

Eliminating Conditional Statements
if (condition)
t = f1()
else
t = f2()
t1 = f1()
t2 = f2()
t = select(t1, t2, condition)

‹#›

Implementing select
Case 1: condition evaluates to 0 or 1
mask = condition – 1
return (t1 & ~mask) | (t2 & mask)
Case 2: condition (c) evaluates to 0 or non-0
mask = ((c ^ (c-1)) & ~c)>>32
return (t1 & ~mask) | (t2 & mask)

‹#›

Caveats
The result of select depends on secret data. Anything that depends on it also depends on secret data.
In particular, swapping pointers using select does not produce constant-time code
The choices of processor, languages and compiler matter
In most processors, division is not constant-time
In some processors multiplication is not constant-time
Compiler optimisations may kill constant-time code
These issues have been exploited

‹#›

/docProps/thumbnail.jpeg