A primer on cryptography for cryptocurrencies
By Joseph

Challenge-response protocols, One-way functions, and PRFs
In-depth technical reading:
● GCAC 18.6 (challenge-response protocols)
● GCAC 4.4 (PRFs)
Consider one of the most essential security mechanisms: a door lock. Mechanical locks are a fascinating area with a long history, but we’ll consider a contactless electronic lock for now, as widely installed at NYU and many other organizations.
Your NYU ID card contains an RFID transponder that receives a signal (and electric power) from the reader (the door) and automatically responds. The door should then open.
The simplest design is for the card to simply transmit its ID to the door (we’ll use my NYU ID: jb6395).
ID: jb6395
Card ——————————————-> Door
This is obviously not secure though, because anybody who knows my ID (which is on my website) can make a fake card that transmits it and can open any door my card can.
A better design is to have my card transmit a secret value. Call it k (for key):
ID: jb6395, key: k
Card ——————————————-> Door
The key k should be unique for each user, otherwise any card can fairly easily impersonate any other card. This may seem obvious but it’s been screwed up in practice: The same key was used by Volkswagen for every car manufactured for nearly 20 years from the mid-90s to mid-2010s.
The door will have to store a large table mapping each user to their key k. Later we’ll see how (using asymmetric cryptography) we could build such a protocol with only constant storage at the door regardless of the number of users.
This protocol is slightly better because my key is not on my website, but it is still subject to a replay attack. Any attacker who records the data my card sends to the door can replay it exactly to open the door themselves. Since the data is transmitted via radio waves, it is not hard to capture.
Preventing replay attacks with challenge-response protocols
The simple protocols above are static protocols in that the same information is always transmitted by the card. Any static protocol will be vulnerable to replay attacks.
To prevent replay attacks, we’ll need a dynamic protocol where the card doesn’t always send the same information to the door. The most common approach is a challenge-response protocol. The door will now send the card a one-time challenge value. This number is called a nonce (number used once), which we’ll call n:
ID: jb6395
Card ——————————————-> Door
challenge: n
<-------------------------------------------
response: F(k, n)
------------------------------------------->
The card will respond to this challenge by computing some function F of the nonce and its secret key. We should still assume that an eavesdropper has recorded the whole protocol transcript (potentially many times). What properties does F need to have to prevent this eavesdropper from later opening the door?
1. Given observations of F(k, n) and n for many values of n, it should not be feasible to compute k.
2. Given observations of F(k, n) for many values of n, it should not be feasible to compute F(k, n*) for an unobserved nonce n*.
If our function F satisfies these two properties, then the protocol will be secure as long as the door never sends the same challenge n. This can be achieved either by choosing n randomly from a sufficiently large space (say, 128-bit values) that it will never repeat, or using a counter value for n that is always increasing. Either choice has a trade-off: the random approach requires larger challenges and a secure source of randomness,
while the counter approach requires permanent storage at the door. Both approaches are common in practice.
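The full exchange can be sketched in code. This is a toy simulation, not a real RFID stack: HMAC-SHA256 stands in for F, and the class and variable names are illustrative.

```python
import hmac
import hashlib
import secrets

def F(k: bytes, n: bytes) -> bytes:
    # HMAC-SHA256 as a stand-in for the function F(k, n)
    return hmac.new(k, n, hashlib.sha256).digest()

class Door:
    def __init__(self, key_table):
        self.key_table = key_table          # maps user ID -> secret key k

    def challenge(self) -> bytes:
        # 128-bit random nonce: will never repeat in practice
        return secrets.token_bytes(16)

    def verify(self, user_id, n, response) -> bool:
        k = self.key_table[user_id]
        return hmac.compare_digest(response, F(k, n))

class Card:
    def __init__(self, user_id, k):
        self.user_id, self.k = user_id, k

    def respond(self, n: bytes) -> bytes:
        return F(self.k, n)

k = secrets.token_bytes(16)
door = Door({"jb6395": k})
card = Card("jb6395", k)

n = door.challenge()
assert door.verify("jb6395", n, card.respond(n))

# A recorded transcript is useless against a fresh challenge:
old_response = card.respond(n)
assert not door.verify("jb6395", door.challenge(), old_response)
```

Note that the door compares responses with `hmac.compare_digest` rather than `==`, a standard precaution against timing side channels.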
An alternate design: time-varying codes
Notice that our challenge-response protocol above has three steps instead of one. This might be reduced to two, but at least two are needed as the random challenge must be sent from the door to the card before the card’s response can be sent. It’s possible to design a one-step dynamic protocol if the card uses the current time t as the challenge:
ID: jb6395, code: F(k, t)
Card ——————————————-> Door
This approach is sometimes called a rolling code protocol. It’s not common for use with RFID cards like those used at NYU since they don’t have their own source of power and hence cannot maintain a clock. Even if they could, you have to worry about synchronization, ensuring that both sides have the same clock value (or at least don’t drift too far apart). But you probably have used this approach before. It’s common in 2-factor authentication apps like Google Authenticator or Duo.
This approach makes more sense in this setting: mobile devices have an accurate clock and it’s simpler for users to only type one code (rather than having to enter a challenge as well). Of course, synchronization is not perfect. These apps typically round the current time to the nearest 30 seconds and allow for some delay for users to type. The exact details commonly used for second-factor authentication apps are standardized in the TOTP protocol (Time-based One-time Password protocol).
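As a rough sketch, a TOTP-style code can be computed from a shared key and the current time. The constants below (30-second step, 6 digits, SHA-1) follow the common defaults of the standard, but treat this as an illustration rather than a certified TOTP implementation.

```python
import hmac
import hashlib
import struct

def totp(key: bytes, t: float, step: int = 30, digits: int = 6) -> str:
    """Time-based one-time code in the style of TOTP (RFC 6238):
    F(k, t) with t rounded down to a 30-second window."""
    counter = int(t) // step                       # 30-second time window
    msg = struct.pack(">Q", counter)               # 8-byte big-endian counter
    mac = hmac.new(key, msg, hashlib.sha1).digest()
    offset = mac[-1] & 0x0F                        # dynamic truncation
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

key = b"example-shared-secret"                     # illustrative key
# Both sides compute the same code anywhere within one 30-second window:
assert totp(key, 999_999_990) == totp(key, 1_000_000_019)
# Codes are always 6 decimal digits:
code = totp(key, 999_999_990)
assert len(code) == 6 and code.isdigit()
```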
It’s also possible to build a similar protocol without relying on a synchronized clock, but storing a counter which is incremented after every successful authentication. This requires non-volatile storage, but no running clock. Duo’s app implements this protocol as well (which is the version NYU currently uses). The downside is, again, synchronization: what happens if the card attempts to authenticate, causing the counter to be incremented, but the door never hears it due to radio interference? To provide
some resiliency, usually the authentication will allow a range of values (a sliding window) to be used in case some were lost.
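The counter-plus-sliding-window idea can be sketched as follows; the window size and the use of HMAC-SHA256 for F are illustrative choices.

```python
import hmac
import hashlib
import struct

def F(k: bytes, counter: int) -> bytes:
    return hmac.new(k, struct.pack(">Q", counter), hashlib.sha256).digest()

class Verifier:
    def __init__(self, k: bytes, window: int = 5):
        self.k, self.counter, self.window = k, 0, window

    def verify(self, response: bytes) -> bool:
        # Accept any counter in [counter, counter + window) in case the
        # prover incremented but this side never heard the attempt.
        for c in range(self.counter, self.counter + self.window):
            if hmac.compare_digest(response, F(self.k, c)):
                self.counter = c + 1   # never accept this value again
                return True
        return False

k = b"shared-secret"
door = Verifier(k)

# Card authenticates at counter 0; both sides advance to 1:
assert door.verify(F(k, 0))
# The card's counter-1 attempt is lost to radio interference, but its
# next attempt at counter 2 still succeeds via the sliding window:
assert door.verify(F(k, 2))
# Replaying the old counter-0 response now fails:
assert not door.verify(F(k, 0))
```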
One-way functions
If you think about what properties the function F needs for a rolling code protocol to be secure, you should be able to convince yourself it’s exactly the same set of properties needed for a challenge-response protocol! This is a good situation in cryptography (and computer science in general): multiple protocols can be built from a common shared function. Cryptographers usually start with desired properties for a function like those outlined above and then try to define a mathematical function that meets them. This is quite like starting with a security policy and then designing a mechanism to realize it. These functions are often called primitives in cryptography: the basic building blocks needed for a variety of secure protocols.
Cryptographic primitives are usually defined by an interactive security game between a challenger and an adversary. For example, let’s start with the first property above, that an adversary cannot learn the key after seeing many outputs of the function computed using that key. This is the basic requirement for a cryptographic one-way function, which can be formalized with the following game:
1. The challenger randomly chooses k.
2. The adversary repeatedly chooses a nonce n and sends it to the challenger, who
responds with F(k, n).
3. The adversary outputs a guess k’ at the key.
4. If k’=k, the adversary wins.
There are a lot of subtleties here. First, how many queries can the adversary make? How much computation can they perform? Typically we assume an adversary that is computationally bounded. More specifically, we usually require the adversary to run in probabilistic polynomial time (PPT). Polynomial in what exactly? The length of k. This works quite naturally: if we want higher security, choose a larger key. But the adversary does not get to run in exponential time. If they did, they could check every possible value of k. This is called a brute-force attack or an exhaustive search. If k is big enough (say, a 128-bit random number) this will be impossible in practice.
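A toy demonstration of why key length matters: with a deliberately tiny 16-bit key, exhaustive search recovers k instantly. HMAC-SHA256 stands in for F here; with a 128-bit key the same loop would need around 2^128 iterations, which is infeasible.

```python
import hmac
import hashlib

def F(k: bytes, n: bytes) -> bytes:
    return hmac.new(k, n, hashlib.sha256).digest()

secret_key = (12345).to_bytes(2, "big")        # 16-bit key: far too small
nonce = b"observed-nonce"
observed = F(secret_key, nonce)                # transcript the attacker saw

recovered = None
for guess in range(2 ** 16):                   # try every possible key
    k = guess.to_bytes(2, "big")
    if F(k, nonce) == observed:
        recovered = guess
        break

assert recovered == 12345                      # key found by brute force
```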
Second, is the adversary adaptive? That is, can they choose nonces to query based on the answers to earlier queries? Typically we allow an adaptive adversary.
Third, won’t the adversary always have some chance of guessing the correct key by random chance? Yes, they will. Since we can’t design a function F such that the attacker can never win the game, we instead will require that the adversary has a negligible chance of winning. That is, the probability of winning is a negligible function of the length of the key. Technically speaking, a negligible function is smaller than the reciprocal of any polynomial function. It’s easy to see that the chances of guessing correctly are exponentially small in the length of the key, which is a negligible function. Practically speaking, this means that if we choose a suitably large key it will be impossible in practice to guess correctly at random.
Let’s stop here to note that these properties might seem like overkill for a door-opening protocol. Cryptographers like to build very strong cryptographic primitives so that they can be used as building blocks in higher-level protocols of various types without worry that the crypto will be vulnerable.
Back to the door, though: a one-way function is not enough! Even though it will prevent an adversary from guessing the key, the adversary doesn’t need to guess the key to open the door. They only need to guess a correct response to the door’s challenge.
Pseudorandom functions (PRFs)
Let’s go back to the second property: the adversary shouldn’t be able to guess the output of the function. This requires a stronger property than the above. We’ll define a new game for it:
1. The challenger generates a random bit, b. If b is 0, they generate a completely random function FR. If b is 1, they generate a random key k.
2. The adversary repeatedly chooses a nonce n and sends it to the challenger. The challenger responds with FR(n) if b=0 and F(k, n) if b = 1.
3. The adversary outputs a guess b’.
4. If b is equal to b’, the adversary wins.
In this game the challenger picks a random bit b to decide whether the adversary sees real or random output. We’ll still assume the adversary is adaptive and PPT. But now they can win the game half the time by random guessing, so we’ll need to change our security definition to say that F is a PRF if no PPT adversary can win the game with probability greater than 0.5 + neg(k) for a negligible function neg.
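To make the game concrete, here is a toy version in which F(k, n) = k XOR n. This F is clearly not a PRF: the adversary below wins essentially every time, far above the allowed 1/2 + negligible.

```python
import secrets

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def play_game() -> bool:
    """One run of the PRF game against the (bad) F(k, n) = k XOR n.
    Returns True if the adversary guesses b correctly."""
    b = secrets.randbits(1)
    k = secrets.token_bytes(16)
    table = {}                               # lazily sampled random function

    def oracle(n: bytes) -> bytes:
        if b == 1:
            return xor(k, n)                 # the keyed function
        if n not in table:
            table[n] = secrets.token_bytes(16)
        return table[n]                      # a truly random function

    # Adversary strategy: query two nonces. For F(k, n) = k XOR n, the
    # outputs always satisfy out1 XOR out2 == n1 XOR n2; for a random
    # function this holds only with probability 2^-128.
    n1, n2 = b"\x00" * 16, b"\x01" * 16
    out1, out2 = oracle(n1), oracle(n2)
    guess = 1 if xor(out1, out2) == xor(n1, n2) else 0
    return guess == b

# The adversary wins every run (except with probability ~2^-128 per run):
assert all(play_game() for _ in range(100))
```

The lazily sampled `table` in the oracle foreshadows the random-oracle idea discussed below: rather than generating the whole function table, outputs are sampled on demand.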
Random oracles and ideal functionalities
We’ll talk more about how to actually build a PRF later. For now, just know that we can build PRFs which are suitable for a secure challenge-response protocol.
The name “pseudorandom function” is telling. We want a function that is “almost” a random function. Why not just use a random function? We could design an explicitly random function by generating the function table (the mapping from every input to every output) completely at random. But, storing this function table would require a huge amount of memory. For example, if a random function took 128-bit inputs and produced 128-bit outputs, it would require 2^135 bits to store the function table (2^128 rows of 2^7 bits each).
It’s impossible in practice to actually construct a random function in that manner. As a tool for constructing proofs though, it is convenient to assume a magic random oracle exists which is capable of generating and storing the entire random function table and responding to queries for us. It is an oracle because we can only query it, not read out its internal memory. Random oracles don’t exist, but they can be convenient tools for modeling the ideal way cryptographic functions should behave. For example, it is relatively straightforward to prove that if the function F in our challenge-response protocol above is a random oracle, then the protocol is secure.
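Although the full table can never be stored, a random oracle can be simulated by lazy sampling: generate each output only when its input is first queried, and remember it for consistency. A minimal sketch:

```python
import secrets

class RandomOracle:
    """A random oracle via lazy sampling: only the rows of the function
    table that are actually queried ever exist in memory."""

    def __init__(self, out_bytes: int = 16):
        self.out_bytes = out_bytes
        self.table = {}

    def query(self, x: bytes) -> bytes:
        if x not in self.table:
            # First query for this input: sample its output now
            self.table[x] = secrets.token_bytes(self.out_bytes)
        return self.table[x]

oracle = RandomOracle()
# Consistent: the same input always yields the same output...
assert oracle.query(b"hello") == oracle.query(b"hello")
# ...and only the queried rows exist so far.
oracle.query(b"world")
assert len(oracle.table) == 2
```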
A related idea, in some sense a generalization, is the ideal functionality model, in which we write down whatever properties we want for a cryptographic function, then simply assume it exists in oracle form and prove that some protocol using it is secure.
Authenticated storage, Hash functions and MACs
In-depth technical reading
● GCAC 8 (Hash functions)
One more essential primitive to introduce is the cryptographic hash function. It might be the most useful primitive in all of cryptography. We’ll start with a simple application to motivate the development of a hash function.
Consider a mobile device that is uploading a large file to a cloud server:
file f
Client ———————————-> Server
....
downloaded file f’
<----------------------------------
The client doesn’t care about keeping the file secret, but wants to be sure that later when it downloads the file, it receives the same file f’=f. Obviously the client could just store the entire original file, but then why use a cloud server?
Instead, we’ll have the client store the hash of the file H(f), and then later check if H(f)=H(f’). Hash functions are similar to PRFs in that they behave somewhat randomly, but they take only one input, not two, and that input can be arbitrarily long. That’s right: hash functions are defined as a function H: {0,1}* → {0,1}^t; they take any number of bits as input and produce a t-bit output, sometimes called a digest.
Our file-storage protocol will be secure as long as the server can’t find any other file with the same hash. This property is called second-preimage-resistance and is defined with the following game:
1. The challenger randomly chooses x and sends it to the adversary.
2. The adversary generates a value x’.
3. The adversary wins if x’≠x and H(x)=H(x’).
You should be able to convince yourself that if this property is true, the above file storage protocol is secure. This approach is widely used in practice. For example, official releases of software packages like Ubuntu Linux often publish the hash of the authentic version of the software. If you obtain the hash of the desired file securely, you
can then download the (often quite large) file from any untrusted service like BitTorrent and be sure you’ve obtained the correct version by checking its hash.
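The verification step is a one-liner with a standard library. The file contents and expected digest below are made up for illustration; in practice the expected digest comes from the project’s website over a secure channel.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

# Stand-in for a large file downloaded from an untrusted mirror:
downloaded = b"pretend this is a large .iso image"
# Stand-in for the digest published on the project's official site:
expected = hashlib.sha256(b"pretend this is a large .iso image").hexdigest()

assert sha256_hex(downloaded) == expected       # download verified

# A single changed byte produces a completely different digest:
tampered = b"Pretend this is a large .iso image"
assert sha256_hex(tampered) != expected
```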
Cryptographic hash functions are also designed to be one-way, but the definition is quite similar to that for one-way functions as above so we’ll skip it here.
Hash functions and collision-resistance
Cryptographic hash functions are actually designed to ensure an even stronger property, collision-resistance, defined in the simplest of all security games:
1. The adversary outputs values x,x’.
2. The adversary wins if x≠x’ and H(x)=H(x’).
To break collision-resistance, the adversary simply has to find any two inputs that have the same hash (they “collide” under this function).
Note that collisions must exist by the pigeon-hole principle, since the input space is infinite and the output space is finite. If we just start hashing random values and storing the results in a table, eventually we will find a collision.1
In fact, for a hash function with t-bit outputs, we expect to find a collision after about 2^(t/2) trials due to the Birthday Paradox. This type of exhaustive search is called a birthday attack. But as long as t is big enough, we’ll be okay in practice. The rule of thumb is that your hash function output should be twice as long as symmetric keys, because of the birthday paradox.
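The birthday attack is easy to demonstrate on a weakened hash. Truncating SHA-256 to t = 32 bits, a collision typically appears after tens of thousands of trials, on the order of 2^16:

```python
import hashlib
import secrets

def H(x: bytes) -> bytes:
    # Toy hash: SHA-256 truncated to 32 bits, so collisions are findable
    return hashlib.sha256(x).digest()[:4]

seen = {}          # digest -> input that produced it
trials = 0
while True:
    x = secrets.token_bytes(8)
    trials += 1
    d = H(x)
    if d in seen and seen[d] != x:
        x_prime = seen[d]                 # collision found
        break
    seen[d] = x

assert x != x_prime and H(x) == H(x_prime)
print(f"collision after {trials} trials")
```

This is exactly the table-based search described above; as footnote 1 notes, cycle-finding techniques can eliminate the large table at the cost of more hash evaluations.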
Relation between properties
Collision-resistance is a stronger property than second-preimage-resistance: any collision-resistant hash function is guaranteed to be second-preimage-resistant, but not necessarily the other way around. One way to see this is that a collision-resistant hash function must be second-preimage-resistant, since a second preimage is a (special type of) collision, so if collisions are infeasible to find then second preimages must also be infeasible to find.
1 This collision-finding approach requires a lot of storage in addition to requiring many evaluations of the hash function. It’s possible to find collisions with only a small amount of storage by adapting cycle-finding techniques like Floyd’s “tortoise and hare” algorithm.
Another way to show this relation is to show that, given black-box access (equivalently called oracle access) to an algorithm A which wins the second-preimage finding game, it’s trivial to design an algorithm A’ which wins the collision-finding game. A’ can simply choose a random value x and ask A to find a second preimage for x. If A succeeds in finding some other value x’≠x such that H(x)=H(x’), then A’ outputs x’, x as a collision and stops. Since A will find such an x’ with non-negligible probability (by definition, since it can win the second preimage game), A will therefore win the collision-finding game with non-negligible probability.
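The reduction can be written out directly. Here A is a stand-in second-preimage finder that only “wins” because the hash is deliberately weakened to 8 bits; the point is the shape of A’, which works given black-box access to any successful A.

```python
import hashlib
import secrets

def H(x: bytes) -> bytes:
    return hashlib.sha256(x).digest()[:1]     # 8-bit toy hash

def A(x: bytes) -> bytes:
    """Stand-in second-preimage finder: returns x' != x with
    H(x') == H(x). Succeeds here only because H is so weak."""
    target = H(x)
    while True:
        x_prime = secrets.token_bytes(8)
        if x_prime != x and H(x_prime) == target:
            return x_prime

def A_prime() -> tuple:
    """Collision finder built from black-box access to A: pick a random
    x, ask A for a second preimage, output the pair as a collision."""
    x = secrets.token_bytes(8)
    return x, A(x)

x, x_prime = A_prime()
assert x != x_prime and H(x) == H(x_prime)    # a valid collision
```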
These types of relations between assumptions are very important in cryptography and we’ll see several examples.
Because collision resistance implies second-preimage resistance, cryptographic hash functions are almost always built with collision resistance as the goal (with second preimage-resistance coming “for free”). As soon as any collisions are demonstrated in a hash function, cryptographers will call the hash function broken and consider it deprecated for further use. Even though it might still be second-preimage-resistant, conservatively it is best to move on once collisions are found.
An even stronger model for hash functions is the random oracle model. This is somewhat similar to the concept for PRFs. A true random oracle function would choose a completely random (but consistent) output for any given input. This is impossible to actually construct, since the function table would be infinitely large, but it’s a useful conceptual model. You should be able to see that a random oracle is collision-resistant by definition: if the function table is truly random, there can be no strategy for finding collisions better than brute force searching. Brute force is always a possibility of course, and will not succeed with greater than negligible probability as the size of the hash function output grows.
The pseudorandom nature of a hash function also explains why it is called a “hash” function. The randomness and one-wayness is similar to cooking hash browns: after hashing the input (potatoes) you can’t recover the input from the output (hash browns).
Building a hash function
Designing a hash function (or any cryptographic primitive) is extremely challenging and error-prone. It takes many years of careful design work and review, and even after that many algorithms are found to be insecure.
Many designs share a common strategy, though: take a simpler collision-resistant compression function that takes fixed-size inputs and build an arbitrary length hash function out of that. The most common design template is called Merkle-Damgård and it works as follows. Assume F is a collision-resistant compression function that takes 2t bits of input and produces t bits of output. Divide the input up into t-bit blocks x0, x1, ... and compute the output as follows:
y0 = F(IV, x0)
y1 = F(y0, x1)
...
yn-1 = F(yn-2, xn-1)
yn = F(yn-1, xn)
y = finalize(yn)
The value IV is a fixed initialization vector. We’ll ignore details like how IV is chosen,
how to pad the input to be a multiple of t bits, and what the finalize function does.
The important takeaway is that a hash function is built from a simpler component. There are also formal security proofs stating that if F is collision-resistant, then the resulting hash construction is collision-resistant. We’ll talk a bit more about how a compression function like F is designed later.
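A toy Merkle-Damgård construction, with SHA-256 standing in for the compression function F and deliberately simplified padding and finalization (real designs, including proper length padding, differ in important details):

```python
import hashlib

T_BYTES = 32                                   # t = 256 bits
IV = bytes(T_BYTES)                            # fixed all-zero IV (illustrative)

def F(chain: bytes, block: bytes) -> bytes:
    # Compression function: 2t-bit input (chain || block) -> t-bit output.
    # SHA-256 is a stand-in; real constructions use a purpose-built F.
    return hashlib.sha256(chain + block).digest()

def pad(msg: bytes) -> bytes:
    # Simplified padding: append 0x80 then zeros up to a multiple of t
    # bits. (Real Merkle-Damgård padding also encodes the message length.)
    msg += b"\x80"
    return msg + bytes(-len(msg) % T_BYTES)

def md_hash(msg: bytes) -> bytes:
    y = IV
    padded = pad(msg)
    for i in range(0, len(padded), T_BYTES):
        y = F(y, padded[i:i + T_BYTES])        # chain through each block
    return y                                   # finalize = identity here

d = md_hash(b"hello world")
assert d == md_hash(b"hello world")            # deterministic
assert d != md_hash(b"hello worlds")           # different input, new digest
```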
Practical hash functions
In the early days of the internet, the most common hash function was called MD5. MD5 only outputs 128 bits, which is too small, as today’s computers are powerful enough to try 2^64 values to find a collision by brute force. Even worse, there are many algorithmic weaknesses in MD5. Collisions can be found in a matter of minutes. It has been deprecated since the mid-2000s.
SHA-1 (Secure Hash Algorithm) was a standard hash function for most of the 2000s and