CS306: Introduction to IT Security Fall 2020
Labs 5 & 6: More on Hashing
Instructor: Nikos Triandopoulos October 8 & 15, 2020
2
6.5 Hash functions
Cryptographic hash functions
Basic cryptographic primitive
u maps “objects” to a fixed-length binary strings
u core security property: mapping avoids collisions
input
arbitrarily
output
long string
u collision: distinct objects (x ≠ y) are mapped to the same hash value (H(x) = H(y))
short digest, fingerprint, “secure”
H
description
u although collisions necessarily exist, they are infeasible to find
Important role in modern cryptography
u lie between symmetric- and asymmetric-key cryptography
u capture different security properties of “idealized random functions” u qualitative stronger assumption than PRF
77
Hash & compression functions
Map messages to short digests
u a general hash function H() maps
u a message of an arbitrary length to a l(n)-bit string
input
arbitrarily long string
H
output
l(n)-bit string
u a compression (hash) function h() maps
u a long binary string to a shorter binary string
u an l’(n)-bit string to a l(n)-bit string, with l’(n) > l(n)
input
l’(n)-bit string
h
output
l(n)-bit string
4
Collision resistance (CR)
Attacker wins the game if x ≠ x’ & H(x) = H(x’)
description of H
x, x’
H is collision-resistant if any PPT A wins the game only negligibly often. 5
A
Hash function H
T
Weaker security notions
Given a hash function H: X ® Y, then we say that H is
u preimage resistant (or one-way)
u if given y Î Y, finding a value x Î X s.t. H(x) = y happens negligibly often
u 2-nd preimage resistant (or weak collision resistant)
u ifgivenauniformxÎX,findingavaluex’ÎX,s.t.x’1xandH(x’)=H(x)
happens negligibly often
u cf. collision resistant (or strong collision resistant)
u if finding two distinct values x’, x Î X, s.t. H(x’) = H(x) happens negligibly often 6
7
6.6 Design framework
Domain extension via the Merkle-Damgård transform
General design pattern for cryptographic hash functions
u reduces CR of general hash functions to CR of compression functions
input
H
u thus, in practice, it suffices to realize a collision-resistant compression function h
input
output
arbitrarily long string
output
≤ l’(n)-bit string
l(n)-bit string
l(n)-bit string
h
u compressing by 1 single bit is a least as hard as compressing by any number of bits!
8
Merkle-Damgård transform: Design
Suppose that h: {0,1}2n→ {0,1}n is a collision-resistant compression function Consider the general hash function H: M = {x : |x|<2n} → {0,1}n, defined as
Merkle-Damgård design
u H(x) is computed by applying h() in a “chained” manner over n-bit message blocks
u pad x to define a number, say B, message blocks x1, ..., xB, with |xi| = n
u set extra, final, message block xB+1 as an n-bit encoding L of |x|
u starting by initial digest z0 = IV = 0n, output H(x) = zB+1, where zi = hs(zi-1||xi)
9
Merkle-Damgård transform: Security
If the compression function h is CR,
then the derived hash function H is also CR!
10
Compression function design: The Davies-Meyer scheme
Employs PRF w/ key length m & block length n u define h: {0,1}n+m → {0,1}n as
h(x||k) = Fk(x) XOR x
Security
u hisCR,ifFisanidealcipher
11
Well known hash functions
u MD5 (designed in 1991)
u output 128 bits, collision resistance completely broken by researchers in 2004 u today (controlled) collisions can be found in less than a minute on a desktop PC
u SHA1 – the Secure Hash Algorithm (series of algorithms standardized by NIST) u output 160 bits, considered insecure for collision resistance
u broken in 2017 by researchers at CWI
u SHA2 (SHA-224, SHA-256, SHA-384, SHA-512)
u outputs 224, 256, 384, and 512 bits, respectively, no real security concerns yet u based on Merkle-Damgård + Davies-Meyer generic transforms
u SHA3(Kessac)
u completely new philosophy (sponge construction + unkeyed permutations)
12
SHA-2-512 overview
13
Current hash standards
14
15
6.7 Generic attacks
Generic attacks against cryptographic hashing
Assume a CR compression function h : {0,1}l’(n) → {0,1}l(n) u brute-force attack
u for each string x in the domain
u compute and record hash value h(x)
u if h(x) equals a previously recorded hash h(y) (i.e., x ≠ y but h(x)=h(y)), halt and output collision on x ≠ y
u birthday attack
u surprisingly, a more efficient generic attack exists!
16
Birthday paradox
“In any group of 23 people (or more), it is more likely (than not) that at least two individuals have their birthday on the same day”
u based on probabilistic analysis of a random “balls-into-bins” experiment:
“k balls are each, independently and randomly, thrown into one out of m bins”
u captures likelihood that event E = “two balls land into the same bin” occurs
u analysis shows: Pr[E] » 1 - e-k(k-1)/2m (1)
u ifPr[E]=1/2,Eq.(1)givesk»1.17m1⁄2 u thus,form=365,kisaround23(!)
u assuming a uniform birth distribution 17
Pr[E]
m=365
k
Birthday attack
Applies “birthday paradox” against cryptographic hashing u exploits the likelihood of finding collisions for hash function h
using a randomized search, rather than an exhausting search
uanalogy
u k balls: distinct messages chosen to hash u m bins: number of possible hash values u independent & random throwing
u how is this achieved?
u message selection, hash mapping
Mk ...M4 M 3
hash function h
M2
M1 bin1 bin2
...
18
binm
Probabilistic analysis
Experiment
u k balls are each, independently and randomly, thrown into one out of m bins Analysis
u the probability that the i-th ball lands in an empty bin is: 1 - (i - 1)/m
u the probability Fk that after k throws, no balls land in the same bin is:
Fk =(1-1/m)(1-2/m)(1-3/m)...(1-(k-1)/m)
u by the standard approximation 1 - x » e-x: Fk » e-(1/m + 2/m + 3/m + ... + (k-1)/m) = e-k(k-1)/2m
u thus, two balls land in same bin with probability Pr[E] = 1 - Fk = 1 - e-k(k-1)/2m u lower bound – Pr[E] increases if the bin-selection distribution is not uniform
19
What birthday attacks mean in practice...
u # hash evaluations for finding collisions on n-bit digests with probability p nm
u for large m = 2n, average # hash evaluations before finding the first collision is 1.25(m)1/2
20
Overall
Assume a CR function h producing hash values of size n
u brute-force attack
u evaluate h on 2n + 1 distinct inputs
u by the “pigeon hole” principle, at least 1 collision will be found
u birthday attack
u evaluate h on (much) fewer distinct inputs that hash to random values
u by “balls-into-bins” probabilistic analysis, at least 1 collision will likely be found u when hashing only half distinct inputs, it’s more likely to find a collision!
u thus, in order to get n-bit security, we (at least) need hash values of length 2n
21
22
6.8 Applications of hashing to cryptography
Hash functions enable efficient MAC design!
Back to problem of designing secure MAC for messages of arbitrary lengths
u so far, we have seen two solutions
u block-based “tagging” u based on PRFs
u inefficient
u CBC-MAC
u also based on PRFs u more efficient
r||1||δ||m1 r||2||δ||m1 r||d||δ||m1 Fk Fk ... Fk
t=r,t1, t2, td
23
[1] Hash-and-MAC: Design
Generic method for designing secure MAC for messages of arbitrary lengths u based on CR hashing and any fix-length secure MAC
m
m H H(m) Mac m t
u new MAC (Gen’, Mac’, Vrf’) as the name suggests u Gen’: instantiate H and Mack with key k
u Mac’: hash message m into h = H(m), output Mack-tag t on h
u Vrf’: canonical verification
Mac’ hash
h = H(m)
Mac
24
t
[1] Hash-and-MAC: Security
The Hash-and-MAC construction is a secure as long as u Hiscollisionresistant;and
u theunderlyingMACissecure
Intuition
u since H is CR:
authenticating digest H(m) is a good as authenticating m itself!
m
Mac’ hash
h = H(m)
Mac
25
t
[2] Hash-based MAC
u so far, MACs are based on block ciphers
u can we construct a MAC based on CR hashing?
26
[2] A naïve, insecure, approach
Set tag t as:
Mack(m) = H(k||m)
u intuition: given H(k||m) it should be infeasible to compute H(k||m’), m’ ≠ m
Insecure construction
u practical CR hash functions
employ the Merkle-Damgård design
u length-extension attack
u knowledge of H(m1) makes it feasible to compute H(m1||m2)
u by knowing the length of m1, one can learn internal state zB even without knowing m1!
27
[2] HMAC: Secure design
Set tag t as:
HMACk[m] = H[ (kÅopad) || H[ (kÅipad) || m ] ]
u intuition: instantiation of hash & sign paradigm u two layers of hashing H
u upper layer
u y = H( (k Å ipad) || m ) u y = H’(m), i.e., “hash”
u lower layer ut=H((kÅopad) ||y’)
u t = Mac’(kout, y’), i.e., “sign”
28
[2] HMAC: Security
If used with a secure hash function and according to specs, HMAC is secure u no practical attacks are known against HMAC
29
30
6.9 Applications to security
Recall: Weaker security notions
Given a hash function H: X ® Y, then we say that H is
u preimage resistant (or one-way)
u if given y Î Y, finding a value x Î X s.t. H(x) = y happens negligibly often
u 2-nd preimage resistant (or weak collision resistant)
u ifgivenauniformxÎX,findingavaluex’ÎX,s.t.x’1xandH(x’)=H(x)
happens negligibly often
u cf. collision resistant (or strong collision resistant)
u if finding two distinct values x’, x Î X, s.t. H(x’) = H(x) happens negligibly often 31
Generally: Message digests
Short secure description of data primarily used to detect changes
M
Hash function
Encrypted for
authe
o
n
e
way
“
n
ti
c
ity
compression”
Message digest
32
Application 1: Secure cloud storage
u Bob has files f1, f2,...,fn
u Bob sends to a cloud storage provider
u the hashes h(f1||r), h(f2||r),..., h(fn||r) u files f1, f2,...,fn
u Bob stores locally randomness r and keeps it secret
u Every time Bob reads a file fi, he also reads h(fi||r) and verifies the integrity of fi u Any problems with writes?
33
Application 2: Fairness (I)
Suppose Alice, Bob, Charlie are bidders in an online auction u Alice plans to bid A, Bob B and Charlie C
u they do not trust that bids will be secret
u nobody is willing to submit their bid
u solution
u Alice, Bob, Charlie submit hashes h(A), h(B), h(C) of their bids u all received hashes are posted online
u then parties’ bids A, B and C revealed
u analysis
u “hiding:” hashes do not reveal bids
u “binding:” cannot change bid after hash sent 34
(which property?) (which property?)
Application 2: Fairness (II)
A general issue with concealing private data via hashing
u due to the small search space, this protocol is not secure! u a forward search attack is possible
u e.g., Bob computes h(A) for the most likely bids A u how to prevent this?
u increase search space
u e.g., Alice computes h(A||R), where R is randomly chosen
u at the end, Alice must reveal A and R
u but before he chooses B, Bob cannot try all A and R combination
35
Application 2: Digital envelops
Commitment schemes u two operations
u commit(x, r) = C
u i.e., put message x into an envelop (using randomness r)
u e.g., commit(x, r) = h(x || r)
u hiding property: you cannot see through an (opaque) envelop
u open(C, m, r) = ACCEPT or REJECT
u i.e., open envelop (using r) to check that it has not been tampered with u e.g., open(C, m, r): check if h(x || r) =? C
u binding property: you cannot change the contents of a sealed envelop
36
Application 2: Security properties
Hiding: perfect opaqueness
u similar to indistinguishability; commitment reveals nothing about message
u adversary selects two messages x1, x2 which he gives to challenger
u challenger randomly selects bit b, computes (randomness and) commitment Ci of xi u challenger gives Cb to adversary, who wins if he can find bit b (better than guessing)
Binding: perfect sealing
u similar to unforgeability; cannot find a commitment “collision”
u adversary selects two distinct messages x1, x2 and two corresponding values r1, r2 u adversary wins if commit(x1, r1) = commit(x2, r2)
37
Example 2: Fair decision via coin flipping
Alice is to “call” the coin flip and Bob is to flip the coin
u to decide who will do the dishes...
u problem: Alice may change her mind, Bob may skew the result u protocol
u Alice "calls" the coin flip but only tells Bob a commitment to her call u Bob flips the coin and reports the result
u Alice reveals what she committed to
u Bob verifies that Alice's call matches her commitment
u If Alice's revelation matches the coin result Bob reported, Alice wins
u hiding: Bob does not get any advantage by seeing Alice commitment u binding: Alice cannot change her mind after the coin is flipped
38
Application 3: Forward-secure key rotation
Alice and Bob secretly communicate using symmetric encryption
u Eve intercepts their messages and later breaks into Bob’s machine to steal the shared key
Alice key k
s1 = k
Bob
key k
h s2
h s3
h ...sk
✗
h sk+1 leakage
39
key
Application 4: Hash values as file identifiers
Consider a cryptographic hash function H applied on a file F u the hash (or digest) H(M) of F serves as a unique identifier for F
u “uniqueness”
u if another file F’ has the same identifier, this contradicts the security of H
u thus
u the hash H(F) of F is like a fingerprint
u one can check whether two files are equal by comparing their digests
Many real-life applications employ this simple idea!
40
Examples
4.1 Virus fingerprinting
u When you perform a virus scan over your computer, the virus scanner application tries to identify and block or quarantine programs or files that contain viruses
u This search is primarily based on comparing the digest of your files against a database of the digests of already known viruses
u The same technique is used for confirming that is safe to download an application or open an email attachment
4.2 Peer-to-peer file sharing
u In distributed file-sharing applications (e.g., systems allowing users to contribute contents that are shared amongst each other), both shared files and participating peer nodes (e.g., their IP addresses) are uniquely mapped into identifiers in a hash range
u When a given file is added in the system it is consistently stored at peer nodes that are responsible to store files those digests fall in a certain sub-range
u When a user looks up a file, routing tables (storing values in the hash range) are used to eventually locate one of the machines storing the searched file
41
Example 4.3: Data deduplication
Goal: Elimination of duplicate data
u Consider a cloud provider, e.g., Gmail or Dropbox, storing data from numerous users.
u A vast majority of stored data are duplicates; e.g., think of how many users store the same email attachments, or a popular video...
u Huge cost savings result from deduplication: u a provider stores identical contents
possessed by different users once!
u this is completely transparent to end users!
Idea: Check redundancy via hashing
u Files can be reliably checked whether they are duplicates by comparing their digests.
u When a user is ready to upload a new file to the cloud, the file’s digest is first uploaded.
u The provider checks to find a possible duplicate, in which case a pointer to this file is added.
u Otherwise, the file is being uploaded literally
u This approach saves both storage and bandwidth!
42
Example 4.4: Password hashing
Goal: User authentication Problem: How to protect password files
u Today, passwords are the dominant means for u user authentication, i.e., the process of
verifying the identity of a user (requesting
access to some computing resource).
If password are stored at the server in the clear, an attacker can steal the password file after breaking into the authentication server – this type of attack happens routinely nowadays...
Password hashing involved having the server storing the hashes of the users passwords.
Thus, even if a password file leaks to an attacker, the onewayness of the used hash function can guarantee some protections against user- impersonation simply by providing the stolen password for a victim user.
u This is a “something you know” type of user authentication, assuming that only the legitimate user knows the correct password.
u When you provide your password to a computer system (e.g., to a server through a web interface), the system checks if your submitted password matches the password that was initially stored in the system at setup.
u u
43
Example 4.4: Password storage
Plaintext
Concealed via hashing
44
Application 5: Hash-and-digitally-sign (looking ahead)
Very often digital signatures are used with hash functions
u the hash of a message is signed, instead of the message itself
Signing message M
u let h be a cryptographic hash function, assume RSA setting (n, d, e) u compute signature σ = h(M)d mod n
u send σ, M
Verifying signature σ
u use public key (e,n)
u computeΗ=σe modn
u if H = h(M) output ACCEPT, else output REJECΤ
45
46
6.10 Database-as-a- service authentication model
Securing your cloud storage optimally – setup
server
user
files
2. upload files
3. send h & hashing scheme
d files
1. pre-process files using CR hash function h
files = F = (F1, F2, ..., F7, F8)
digest d is computed over all files |d| << |F|
47
Securing your cloud storage optimally – data authentication
1. give me file F1 2. here it is, F1’
goals
security: Step 4 is reliable test efficiency:
u disassmallaspossible
u proofisassmallaspossible
u verificationisasfastaspossible
server
files +
3. “proof”
d
4. verification
F = (F1, F2, ..., F7, F8)(or helper information)
user has
u authenticdigestd(locallystored) u fileF1’(tobechecked)
u proof(tohelpcheckingintegrity)
“is F1’ intact?” verification involves
user
48
u combinefileF1’withtheproofto re-compute candidate digest d’
u checkifd’=d
u ifyes,thenF1isintact;otherwise tampering is detected!
Hashing the files: Individually or as a whole?
server
user
files
F = (F1, F2, ..., Fn)
1. give me file F1
2. here it is, F1’
+
3. “proof”
(or helper information)
d
4. verification
goals
security: Step 4 is reliable test efficiency:
u disassmallaspossible
u proof is as small as possible
“isF1’intact?”u verificationisasfastaspossible lethi =h(Fi),1≤i≤n
d = (h1, h2,..., hn) Vs. d = h(h1||h2||...||hn) Vs. d = h(F1||F2||...||Fn)
49
Hashing the files: In a chain or in a partition?
1. give me file F1
2. here it is, F1’
goals
security: Step 4 is reliable test efficiency:
u disassmallaspossible
u proof is as small as possible
server
user
files
+
3. “proof”
(or helper information)
d
4. verification
“isF1’intact?”u verificationisasfastaspossible lethi =h(Fi),1≤i≤n
F = (F1, F2, ..., Fn)
d=hn,hi =h(Fi||hi−1),12:
u hashing k files in a hash chain is preferable to hashing files as a whole
u hash chains over k objects (hash values/files) result in unbalanced verification costs
u e.g., proof size/verification time are sensitive to a file’s position in the hash chain u hashing file subsets individually across a balanced partition of the files:
u offers trade-offs between space and verification costs u allows for hierarchical hashing schemes
51
Hashing via the Merkle tree
server
user
files
2. upload files
3. send h & hashing scheme
d
files = F = (F1, F2, …, F7, F8)
digest d is computed over all files |d| << |F|
h: collision resistant iterative hash function (e.g., SHA-2)
1. pre-process files using CR hash function h
F1 F2
dd
... F7 F8 F1 F2 ... F7 F8
52
The Merkle tree
files
1. give me file F1
2. here it is, F1’
server
user
+
3. “proof”
(or helper information)
d
4. verification
F = (F1, F2, ..., F8)
h(h12 ||h34 )= h14 h h58 =h(h56 ||h78 )
h(h ||h )= h h56 h
1 2 12 34 78
d=h(h14 ||h48 )
h1 h2 ...
h7 h8 lethi =h(Fi),1≤i≤n
53
Generalization: DB-as-a-service authentication model
source
server
query answer
user “is answer correct?”
DB DB
cloud-based DB portals, hubs
54
Generalization: DB-as-a-service authentication model
source
server
query answer
user “is answer correct?”
DB DB
Internet protocols (DNS, OCSP)
55
Generalization: DB-as-a-service authentication model
source
digest
server
user
query
answer DB
file hosting
56
DB-as-a-service authentication model
source server DB DB
query answer
user “is answer correct?”
Integrity checks offer authenticated queries
guarantees correct answer, as if coming directly from the source!
57
DB-as-a-service authentication model
malicious
source
DB DB
query
answer
+ proof
user “is answer correct?”
verification
server
Integrity checks offer authenticated queries
u crypto-based: harden data/computations to provide verifiable answers
u reliable: allow no false positives or negatives
crypto does its work!
58
DB-as-a-service authentication model
malicious
source
DB DB
query
answer
+ proof
user “is answer correct?”
verification
server
Integrity checks offer authenticated queries
u crypto-based: harden data/computations to provide verifiable answers
u reliable: allow no false positives or negatives
u utility-preserving: practical & easy to adopt minimize all overheads
u fast times for query, verification
u near total storage, low bandwidth usage for proof 59
DB-as-a-service authentication model
source DB
update
malicious
server DB
query
user “is answer correct?”
answer
+ proof
verification
Integrity checks offer authenticated queries
u crypto-based: harden data/computations to provide verifiable answers u reliable: allow no false positives or negatives
u utility-preserving: practical & easy to adopt
u fast times for query, verification , update
u near total storage, low bandwidth usage for proof , update 60
Intro to authenticated querying (AQ101)
δ
server
DB
query answer
user
source
DB
compute compact & secure digest δ of DB
61
Intro to authenticated querying (AQ101)
δ
query
user
answer
+ proof
source
DB
compute compact & secure digest δ of DB
DB
compute proof that links answer to digest
server
62
Intro to authenticated querying (AQ101)
Assumption: user possesses authentic δ
verification
use proof to securely link answer to authentic digest
δ
query answer
DB + compute proof that proof
links answer to digest
source
server
user
DB
compute compact & secure digest δ of DB
63
Intro to authenticated querying (AQ101)
Assumption: user possesses authentic δ easily removed using
PKI, digital signatures
verification
use proof to securely link answer to authentic digest
δ
query answer
DB + compute proof that proof
links answer to digest
source
server
user
DB
compute compact & secure digest δ of DB
64
Intro to authenticated querying (AQ101)
δ
query answer
DB + compute proof that proof
links answer to digest
source
server
DB
compute compact & secure digest δ of DB
(given δ)
verification
use proof to securely link answer to authentic digest
Using digital signatures to provide authentic digest δ
65
user
Example: Set membership – digest computation
δ
server
is 3 in DB?
yes
source
{ 2, 3, 6, 7} =
DB
DB δ=h(h12 ||h34 )
h34=h(6||7)
compute compact & secure digest δ of DB
Merkletreehash
h(2||3)= h12
h: collision resistant terative hash function (e.g., SHA-2)
user
2367
66
Example: Set membership – proof computation
δ
is 3 in DB?
yes
+ proof
source
server
user
{2,3,6,7}= DB
compute compact & secure digest δ of DB
Merkle tree hash
h12
DB
δ
h34
compute proof that links answer to digest
2367
67
Example: Set membership – proof verification
δ
is 3 in DB? yes
DB +
compute proof that links answer to digest
h34
verification
(given δ)
use proof to securely link answer to authentic digest
i.e.:
recompute h12 recompute δ
compare against given δ
source
server
user
{2,3,6,7}= DB
compute compact & secure digest δ of DB
Merkle tree hash
h12
proof
δ
2367
68