
0368.4162: Introduction to Cryptography Ran Canetti
17 November 2008
Fall 2008 Scribes: Meir- &
1. Existence of weak one-way functions implies existence of strong ones


2. Stream Ciphers
3. Pseudo-Random Generators (PRGs) – preliminaries
Preliminaries
Definition 1: (Uniform Binary Distribution)
Let U_n denote the uniform distribution over the domain {0,1}^n.
Definition 2: (Negligible Function)
afunction:NRissaidtobenegligibleif cNthereexistsan n0 suchthatwe have  (n)  nc for all n  n0
Definition 3: (Efficient Algorithm)
A non-uniform algorithm running in polynomial time in its input length (restricted to language recognition, these algorithms correspond to the P/poly complexity class, which is known to contain BPP).
Also recall from the previous lecture:
Definition 4: (One-Way Function)
f: {0,1}* → {0,1}* is a (strong) one-way function if:
f is efficiently computable.
f is not invertible by any efficient algorithm on "almost" all of its range. Formally, for every non-uniform polynomial-time algorithm A there exists a negligible function ε such that for every input length n:
Pr_{x←U_n}[ f(A(1^n, f(x))) = f(x) ] ≤ ε(n)
That is, the probability of inverting f is negligible.
Definition 5: (Weak One-Way Function)
f: {0,1}* → {0,1}* is a weak one-way function if:
f is computable in polynomial time.
f is not invertible by efficient algorithms on at least a non-negligible part of its range. Formally, there exists a constant c > 0 such that for every efficient algorithm A:
Pr_{x←U_n}[ f(A(1^n, f(x))) = f(x) ] ≤ 1 − n^{-c}
That is, f may be easy to invert on most of its range, but at least a polynomial (non-negligible) fraction of it remains hard to invert.
Image 1: (Strong) one-way function vs. weak one-way function. The clear part represents images that are easy to invert, while the filled part represents hard ones.
Hardness Amplification
In general it is not known whether one-way functions exist at all; however, we will now show how to construct a strong one-way function from a given weak one-way function. This result is called hardness amplification: intuitively, given such a weak one-way function f, we apply it to enough independent inputs so that at least one of them falls in the hard-to-invert part.
Theorem 1: (Existence of weak one-way functions implies existence of strong ones)
Let f: {0,1}^n → {0,1}^n be a weak one-way function. Then there exists t, polynomial in the input length n, such that f': ({0,1}^n)^t → ({0,1}^n)^t, as defined below, is a strong one-way function:
f'(x_1, x_2, ..., x_t) = (f(x_1), f(x_2), ..., f(x_t))
where each x_i ∈ {0,1}^n.
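To make the construction concrete, here is a tiny Python sketch (an illustration added for concreteness, not part of the original notes; the weak one-way function f is an assumed black-box callable, and n is taken to be a multiple of 8 for simplicity):

```python
import os

def f_prime(f, blocks):
    """f'(x_1, ..., x_t) = (f(x_1), ..., f(x_t)): apply the assumed weak
    one-way function f independently to each n-bit block."""
    return [f(x) for x in blocks]

def sample_input(n, t):
    """Sample t independent n-bit blocks (e.g. t = n**(c+1) as in the proof),
    assuming n is a multiple of 8."""
    return [os.urandom(n // 8) for _ in range(t)]
```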
If we can show that for almost all x = (x_1, x_2, ..., x_t), at least one x_j belongs to the "hard" part of f, then we can conclude that f' is strongly one-way.
Notice that if we take t = n^c we can expect to have one non-invertible x_j (for some j); in the proof, however, we use t = n^{c+1} in order to increase the probability of this event.
The above discussion is good for intuition, but does not provide a proof, since a hypothetical inverting algorithm need not try to invert each f(x_j) independently on its own; it may try to exploit the dependencies between the different values of the f(x_j)'s. (Recall the discussion of this "wrong proof" from the previous lecture, and note that in general we should not assume anything about the manner in which an efficient algorithm works, unless we know nothing is lost by such an assumption.) Instead, we will prove the theorem by reducing the hardness of f' to that of f (a typical proof structure in cryptography).
Proof (Theorem 1)
We prove by contradiction, assuming f' is not a strong one-way function.
Contradicting the strong "one-wayness" of f', let A' be a poly-time algorithm that inverts f' with probability at least n^{-ĉ} for some constant ĉ > 0.
Given any c > 0, we will use A' to construct an efficient algorithm A that inverts f with probability greater than 1 − n^{-c}, thereby contradicting f being a weak one-way function (where c is the constant taken from the hardness definition of a weak one-way function).
Define the following procedure I_0 as a single trial for inverting y:
I_0
Input: 1^n, y (an output of f we wish to invert)
Algorithm:
For j = 1..t:
1. Choose x_1, ..., x_t ← U_n
2. Compute (z_1, ..., z_t) = A'( f(x_1), f(x_2), ..., f(x_{j−1}), y, f(x_{j+1}), ..., f(x_t) )
3. If y = f(z_j), output z_j and halt
To improve our chances of inverting f, we will run I_0 multiple times. That is, define the algorithm A to run I_0 n^d times on its input y, for some d (whose value depends on c and ĉ and will be fixed later), outputting the first result that satisfies the criterion in step 3. If after n^d runs no appropriate z_j is found, output FAIL.
Remark: we could have defined I_0 to pick j uniformly at random instead of iterating over all values of j; however, the current definition simplifies the analysis.
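Here is a minimal Python sketch of I_0 and the amplified inverter A (an illustration added to these notes; f, A_prime and the block sampling are assumed/hypothetical placeholders, with n a multiple of 8):

```python
import os

def random_block(n):
    """Sample an n-bit block from U_n (represented here as n//8 random bytes)."""
    return os.urandom(n // 8)

def I0(y, f, A_prime, n, t):
    """Single trial for inverting y: plant y in each position j among fresh
    random blocks and run the assumed inverter A' for f'."""
    for j in range(t):
        xs = [random_block(n) for _ in range(t)]   # step 1: x_1..x_t <- U_n
        query = [f(x) for x in xs]
        query[j] = y                               # (f(x_1),..,f(x_{j-1}), y, f(x_{j+1}),..,f(x_t))
        zs = A_prime(query)                        # step 2: candidate preimages (z_1,..,z_t)
        if zs is not None and f(zs[j]) == y:       # step 3: check the j-th coordinate
            return zs[j]
    return None

def A(y, f, A_prime, n, t, d):
    """Amplified inverter: repeat I_0 n^d times and output the first success."""
    for _ in range(n ** d):
        z = I0(y, f, A_prime, n, t)
        if z is not None:
            return z
    return "FAIL"
```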
We now wish to show that A succeeds almost always, even though A' may succeed only rarely. Given this, define G ⊆ {0,1}^n – informally, the set of "good" inputs x which are easy to invert by I_0 – as follows:
G = { x | Pr_{I_0}[I_0 inverts f(x)] ≥ n / n^d }
We would like to show that G satisfies two properties:
1. For each x ∈ G, A inverts f(x) with very high probability, at least 1 − e^{-n}:
Claim 1: ∀x ∈ G: Pr_A[A inverts f(x)] ≥ 1 − e^{-n}
Note: here the subscript A refers to the random coins A uses.
2. Almost all inputs in the domain {0,1}^n are in G:
Claim 2: |G| ≥ (1 − 1/(2n^c)) · 2^n
Recall: (Marginalization of Conditional Probabilities)
For any events A and B:
P(A) = P(A,B) + P(A,B̄) = P(A|B)·P(B) + P(A|B̄)·P(B̄)   (3.1)
The proof of Theorem 1 will follow from these two claims:
Pr_{A,x}[A fails to invert f(x)]
= Pr[A fails to invert f(x) | x ∈ G]·Pr[x ∈ G] + Pr[A fails to invert f(x) | x ∉ G]·Pr[x ∉ G]
≤ e^{-n}·1 + 1·(1/(2n^c)) ≤ n^{-c}   (for large enough n)
where the first equality is due to (3.1), and the inequality is due to Claim 1 applied to Pr[A fails | x ∈ G] and Claim 2 applied to Pr[x ∉ G], upper-bounding the other terms by 1.
This contradicts f being a weak one-way function: A inverts f on all but at most an n^{-c} fraction of its range. (Note that the existence of a polynomial fraction of hard-to-invert points is necessary for the proof to go through. For example, a function that is hard to invert only on a 2^{-n} fraction of its range clearly does not meet the weak one-way function definition, and we do not know of a hardness amplification that can start with a negligible fraction of hard points.)
We are now left with proving Claim 1 and Claim 2.
Proof of Claim 1: (∀x ∈ G: Pr_A[A inverts f(x)] ≥ 1 − e^{-n})
The adversary/algorithm A repeats I_0 n^d times. The attempts are independent (since x is fixed, each iteration depends only on the fresh random coins of A), each having success probability at least n/n^d, which is exactly how G is defined for a single iteration of I_0. Therefore:
Pr_A[A fails to invert f(x) on all n^d iterations | x ∈ G] ≤ (1 − n/n^d)^{n^d} ≤ 1/e^n
and therefore,
Pr_A[A inverts f(x) | x ∈ G] ≥ 1 − 1/e^n
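To see the last inequality explicitly, here is the standard worked step (added for completeness), using the bound 1 − p ≤ e^{−p} with p = n/n^d:

```latex
\left(1 - \frac{n}{n^{d}}\right)^{n^{d}}
  \;\le\; \left(e^{-n/n^{d}}\right)^{n^{d}}
  \;=\; e^{-\frac{n}{n^{d}}\cdot n^{d}}
  \;=\; e^{-n}.
```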
Proof of Claim 2: (G is large, i.e., |G| ≥ 2^n − 2^n/(2n^c))
Let S be the event that A' inverts f'(x_1, ..., x_t), and partition S into two disjoint events:
1. S_1 = S ∧ (∃j: x_j ∉ G) – that is, A' inverted even though one of the f(x_j)'s is hard to invert.
2. S_2 = S ∧ (∀j: x_j ∈ G) – all of the f(x_j)'s are (relatively) easy to invert.
Pr[S] = Pr[S_1] + Pr[S_2]   (3.2)
Recalling our hypothesis that f' is not a strong one-way function, and that A' in particular is the algorithm that successfully demonstrates this, we know that:
Pr[S] ≥ (tn)^{-ĉ} = n^{-cĉ−2ĉ}   (using t = n^{c+1})
Intuitively, S_1 is unlikely, as inverting an x ∉ G is unlikely, so the event S_2 must be likely; but for that to happen, G must be large. More precisely we will show:
Lemma: Pr[S_1] ≤ n^{−(d−c−2)}
which will imply that Pr[S_2] is large.
Deferring the proof of the lemma, we obtain from it and (3.2) that:
Pr[S_2] ≥ n^{-cĉ−2ĉ} − n^{−(d−c−2)}
We now finally choose d = cĉ + 2ĉ + c + 3 (recall that d controls the number of iterations) to obtain:
Pr[S_2] ≥ n^{-cĉ−2ĉ} − n^{-cĉ−2ĉ−1} ≥ (1/2)·n^{-cĉ−2ĉ}   (3.3)
However, if we assume Claim 2 is wrong (formally: Pr_x[x ∈ G] < 1 − 1/(2n^c)), we obtain:
Pr[S_2] = Pr[S ∧ (∀j: x_j ∈ G)] ≤ Pr[∀j: x_j ∈ G]
< (1 − 1/(2n^c))^{n^{c+1}} ≤ e^{-n/2}
which is smaller than any 1/poly(n) for large enough n, contradicting (3.3).
We now move on to prove the lemma, which will complete the proof of claim 2.
Proof of Lemma: (Pr[S_1] ≤ n^{−(d−c−2)})
First notice that for all a ∉ G and all i ∈ [t]:
Pr_{A',x}[A' inverts f'(x) | x_i = a] ≤ Pr_{I_0}[I_0 inverts f(a)] < n / n^d   (3.4)
where the first inequality holds since I_0 iterates over all j ∈ [t] (in particular j = i), and the second inequality follows from the definition of G (since a ∉ G).
Pr_x[S_1] = Pr[S ∧ (∃j: x_j ∉ G)]
≤ Σ_{j=1}^{t} Σ_{a∉G} Pr[S ∧ x_j = a]   (1)
= Σ_{j=1}^{t} Σ_{a∉G} Pr[S | x_j = a]·Pr[x_j = a]   (2)
≤ Σ_{j=1}^{t} max_{a∉G} Pr[S | x_j = a]   (3)
≤ t · n^{1−d} = n^{c+1} · n^{1−d} = n^{−(d−c−2)}   (4)
(1) is due to the union bound, (2) is due to conditional probability, (3) is justified by noticing that we substitute a weighted average by a maximum, and finally (4) is by (3.4).
Thus we have completed the proof of the lemma and Claim 2 in turn.
The theorem shows that the existence of weak one-way functions is equivalent to that of strong one-way functions.
Although we have shown how to construct a strong one-way function f' from a weak one, we comment that this construction is not very good, in the sense that the different inputs do not interact at all in the computation, whereas one might expect a better construction to mix them.
Moreover, the result we have obtained seems weak if we compare the computational strength of adversaries that can invert f to those that can invert f': our construction yields a strong one-way function which is (sadly) only guaranteed secure against far weaker adversaries than the weak one-way function we started from. More formally, if f is secure against adversaries that run in time q(n), then f' is guaranteed secure only against adversaries running in time noticeably smaller than q(n)/n^d.
To get an intuition about this, consider the fact that we have assumed the existence of a "black-box" algorithm A' which efficiently inverts the strong one-way function with some running time (say, p(·)), only to obtain a new algorithm A whose running time, as a function of the input length n, is far worse than that of A'. Explicitly, if A' runs in time p(·) in its input length, then I_0 runs in time about t·p(tn), and therefore A runs in time about n^d·t·p(tn).
This loss may stem from treating A' as a black box rather than actually inspecting it closely and using the insight that allows it to invert f' efficiently.
One Time Pad and Stream Ciphers
We now move on to define, and later construct, another cryptographic primitive: Pseudo-Random Generators. We start with a motivating discussion.
Consider the following problem: Alice (denoted A) wishes to send a message to Bob (denoted B) over an insecure channel. For the purpose of secrecy, A wishes to encrypt the message before sending it to B, and we assume that A and B can agree on a mutual secret key beforehand.
Assuming they don’t care about the secret key’s length, there is a perfect solution for this
problem called the One-Time-Pad (OTP) encryption:
Suppose A and B share a long secret s of random bits that are not available to eavesdroppers; then secure communication is easy. Let A's message m be composed of n bits, m = (m_1, ..., m_n). A obtains n fresh bits r = (r_1, ..., r_n) from the "one-time pad" s and proceeds to compute the ciphertext c as follows (where ⊕ denotes the bitwise-XOR operator):
c = (c_1, ..., c_n) = (m_1 ⊕ r_1, m_2 ⊕ r_2, ..., m_n ⊕ r_n)
To decrypt the message, B simply uses the agreed-upon r to obtain m = c ⊕ r. If s is indeed statistically random then so is c (independently of m!), and an eavesdropper E will be unable to obtain even a single bit of m regardless of her computational power. Thus it is easy to see that the OTP is "perfectly secret" – a term we haven't formalized yet – but it is nevertheless clear that as long as the pad bits are random and unknown, the produced ciphertext c is completely independent of the plaintext m.
In other words, for any two distinct messages m, m' ∈ {0,1}^n, the random variables c = m ⊕ r and c' = m' ⊕ r are identically distributed.
Notice that the used bits of r should be discarded (never reused), since if E obtains both m ⊕ r and m' ⊕ r she can compute m ⊕ m', which is not uniform and therefore reveals information about m and m'.
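A minimal Python sketch of the one-time pad (illustrative only; the pad is assumed to come from a truly random source shared in advance and used only once):

```python
import secrets

def otp_keygen(n_bytes):
    """Generate a fresh, truly random pad; it must be at least as long as the
    message and must never be reused."""
    return secrets.token_bytes(n_bytes)

def otp_xor(message, pad):
    """Encryption and decryption are the same operation: bitwise XOR with the pad."""
    assert len(pad) >= len(message), "pad must be at least as long as the message"
    return bytes(m ^ r for m, r in zip(message, pad))

# Example usage:
msg = b"attack at dawn"
pad = otp_keygen(len(msg))
ct = otp_xor(msg, pad)          # ciphertext, uniformly distributed regardless of msg
assert otp_xor(ct, pad) == msg  # decryption recovers the message
```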
Although the OTP provides perfect secrecy, it is not used in practice, partly because generating truly random bits is a hard task, but mostly because its users need to predict the length of all of their future communication and agree in advance on a pad of at least the same length.
To overcome this issue, A and B will need to use a short secret key, which, as mentioned in the first lesson, cannot provide them with perfect secrecy (Shannon). Cryptography suggests using Pseudo-Random Generators (PRGs) – functions that take a seed (key) of length k and produce a "random-looking" string of length n > k.
Image 2: The Pseudo-Random Generator G receives a seed x = (x_1, x_2, ..., x_k) of length k and produces a "random-looking" output string r = G(x) of length n > k.
Using this output we can maintain the same overall structure of the encryption scheme – generating a long "random-looking" pad to be XORed with the message. The term "stream cipher" refers to this structure, and a PRG is an abstraction which specifies the requirements of a stream cipher.
Example: LFSR (Linear Feedback Shift Register)
Consider the following deterministic algorithm G which takes an input seed of size k and outputs a long string of "random-looking" bits.
Given an initial seed of k bits, the LFSR generates a stream of bits that appear to be random. Of course, it is impossible to create an endless stream of truly random data from a limited amount of information.
The construction of an LFSR is as follows:
Let S = (s_k, s_{k−1}, ..., s_1) be our initial key of k bits.
Let B = (b_k, b_{k−1}, ..., b_1) be our feedback register of k bits.
Let I ⊆ [k] be a set of indices, I = {i_1, i_2, ..., i_t}.
Image 3: an LFSR circuit for generating a "random-looking" string, with parameters k = 7, I = {3, 7}.
Algorithm (Sketch):
Initially assign B the value of the secret S: B ← S.
To produce a new "random-looking" bit, XOR the register bits at the indices indicated in I to produce a new bit b:
b = ⊕_{i∈I} B_i
On each iteration, the entire register shifts one position to the right and the bit at position 1 is extracted as the output ("random") bit. The new bit b is inserted as the left-most bit.
In our example above (Image 3), the feedback bit b depends on the values of b_3 and b_7 (while b_1 is the bit that gets output); different LFSR implementations differ in the seed/register length k and in the contents of the index set I.
Clearly an LFSR does not produce truly random bits: the output will start to repeat itself within fewer than 2^k steps (the maximum number of possible configurations of B). It is also not clear whether all 2^k values will be exhausted, and it might be the case that the LFSR gets "stuck" in a shorter period. Research has shown that it is possible to ensure a long period by having the index set I encode a primitive (in particular irreducible) polynomial of degree k over GF(2).
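A small Python sketch of the LFSR described above (my own illustrative code, not from the notes; the register is a list of bits b_k ... b_1, and the taps are the index set I, e.g. k = 7 and I = {3, 7} as in Image 3):

```python
def lfsr(seed_bits, taps, num_output_bits):
    """Generate num_output_bits pseudo-random bits from an LFSR.

    seed_bits: list of k bits (b_k, ..., b_1), the initial register contents B = S.
    taps:      the index set I (1-based positions, counted from the right end).
    """
    B = list(seed_bits)                 # B[0] is b_k (left-most), B[-1] is b_1
    k = len(B)
    out = []
    for _ in range(num_output_bits):
        feedback = 0
        for i in taps:                  # b = XOR of B_i over i in I
            feedback ^= B[k - i]        # position i counted from the right
        out.append(B[-1])               # the bit at position 1 is the output bit
        B = [feedback] + B[:-1]         # shift right, insert b as the left-most bit
    return out

# Example with the parameters of Image 3: k = 7, I = {3, 7}
stream = lfsr([1, 0, 1, 1, 0, 0, 1], taps={3, 7}, num_output_bits=20)
```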
By itself this scheme is not secure, since given k consecutive bits of its output it is possible to recover the internal state of the LFSR and thus predict the following bits (in particular, the first k output bits are simply the key). To circumvent this problem, researchers have suggested adding an additional (usually non-linear) operation at the final stage, before outputting.
One such approach for strengthening LFSR for usage as a PRG is the Shrinking Generator [Coppersmith, Krawczyk, Mansour 93]:
Image 4: a Shrinking Generator; at each step, the first (output) bit of the upper register is used as a random bit only when the lower register's first bit is 1.
Given a seed (key) S, split its bits between the two LFSRs L1 and L2. Run both L1 and L2 in lockstep, and output L1's bit b_1 as the next stream-cipher bit if and only if L2's output bit b_2 is 1; otherwise discard the bit. This breaks the linearity of the generator, making it less predictable. In fact, there are no known serious weaknesses in this cipher.
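A rough Python sketch of the shrinking generator built from two LFSRs (illustrative only; it reuses the LFSR step from the sketch above and assumes the seed has already been split into two non-zero register states):

```python
def shrinking_generator(seed1, taps1, seed2, taps2, num_output_bits):
    """Run two LFSRs (L1, L2) in lockstep; output L1's bit only when L2's bit is 1.

    Assumes L2's register is not all zeros (otherwise no bits are ever selected).
    """
    def step(B, taps):
        k = len(B)
        fb = 0
        for i in taps:
            fb ^= B[k - i]              # feedback bit from the tap positions
        return B[-1], [fb] + B[:-1]     # (output bit, shifted register)

    B1, B2 = list(seed1), list(seed2)
    out = []
    while len(out) < num_output_bits:
        b1, B1 = step(B1, taps1)        # candidate output bit from L1
        b2, B2 = step(B2, taps2)        # selection bit from L2
        if b2 == 1:
            out.append(b1)              # keep L1's bit only when L2 outputs 1
        # otherwise discard b1 and continue
    return out
```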
Another well-known stream cipher is RC4, designed by Ron Rivest and widely used in practice (e.g., in WEP – an algorithm for securing wireless networks).
Computational Indistinguishability
The discussion so far was strictly intuitive, as we haven't defined what a "random-looking" string is, or in what sense we obtain secrecy. We will now formalize the discussion and introduce several tools and definitions that will help us define these notions of randomness and secrecy.
Definition: (Statistical Distance I)
Let D_1, D_2 be two distributions over the same domain D, and let f: D → {0,1}. The statistical distance δ_f between D_1 and D_2 with respect to the function f is defined as:
δ_f(D_1, D_2) = | Pr_{d←D_1}[f(d) = 1] − Pr_{d←D_2}[f(d) = 1] |
D_1, D_2 are called ε-statistically-close if max_f[δ_f(D_1, D_2)] ≤ ε.
Alternatively, we write SD(D_1, D_2) ≤ ε.
An alternative, but equivalent, definition of statistical distance is the following:
Definition: (Statistical Distance II)
Let D_1, D_2 be two distributions over the same domain D, and let Ω be their common support. The statistical distance δ between D_1 and D_2 is defined as:
δ(D_1, D_2) = (1/2) · Σ_{d∈Ω} | Pr_{D_1}[d] − Pr_{D_2}[d] |
Formulation II is the common definition of statistical distance between two distributions, while formulation I is used to motivate the notion of computational indistinguishability. Indeed, using formulation I it is straightforward to define the computational analogue, given our experience with one-way functions:
Definition: (Computational Distance)
Given the above definitions and an algorithm A, the computational distance δ_A between D_1 and D_2 with respect to A is defined as:
δ_A(D_1, D_2) = | Pr_{d←D_1}[A(d) = 1] − Pr_{d←D_2}[A(d) = 1] |
D_1, D_2 are called ε(t)-computationally-close if for every A that runs in time t:
δ_A(D_1, D_2) ≤ ε(t)
Moving on to deal with asymptotic "closeness" of distributions (as usual, for simplicity, flexibility and generality), we shall start with the notion of probability ensembles:
Definition: (Probability Ensemble)
A probability ensemble is an infinite sequence of distributions D̂ = {D_n}_{n∈ℕ}, where each D_n is a probability distribution.
We will usually think of D_n as describing an experiment whose security parameter depends on n.
Now we can define the statistical and computational distance in asymptotic terms:
Definition: (Statistical Distance for Distribution Ensemble)
Two distribution ensembles D̂^1, D̂^2 are statistically close, denoted D̂^1 ≈_s D̂^2, if
SD(D^1_n, D^2_n) = max_f[δ_f(D^1_n, D^2_n)] ≤ ε(n)
for some negligible function ε.
Definition: (Computational Distance for Distribution Ensembles)
Similarly to the above, two distribution ensembles D̂^1, D̂^2 are computationally close, denoted D̂^1 ≈_c D̂^2, if for every non-uniform poly-time algorithm A (also known as a distinguisher), the distance
δ_A(D^1_n, D^2_n) ≤ ε(n)
for some negligible function ε.
Technical remarks:
1. This is weaker than saying that for every n (or even for every large enough n) the distributions D^1_n, D^2_n are ε(t)-close, where ε is a negligible function.
2. A should be given 1^n as input, in addition to the samples from D^1_n and D^2_n.
Finally, let us define what it means for a distribution ensemble to be pseudo-random:
Definition: (Pseudo-Random)
An ensemble D̂ is pseudo-random if D̂ and Û are computationally close, where Û = {U_n}_{n∈ℕ} and for every n, U_n is the uniform distribution over {0,1}^n.
The concept of computational closeness is a "deep" one; it is the basis for a new notion of statistics – whatever cannot be done by an efficient computation effectively cannot be done at all. It trades off two seemingly unrelated concepts, computational hardness and randomness, and this mixing of the two underlies most of cryptography; in fact, it has also trickled into many other areas of computer science.
References
1. Computational Complexity: A Conceptual Perspective, by Oded Goldreich; Chapter 7, Section 7.1.2 – Amplification of Weak One-Way Functions.
2. Computational Complexity: A Modern Approach, by Sanjeev Arora and Boaz Barak; Chapter 10 – Cryptography. (Draft: http://www.cs.princeton.edu/theory/complexity/)