Cryptography and Network Security: Principles and Practice
Eighth Edition
Chapter 11
Cryptographic Hash Functions
Copyright © 2020 Pearson Education, Inc. All Rights Reserved.
Lecture slides prepared for “Cryptography and Network Security”, 8/e, by William Stallings. Chapter 11, “Cryptographic Hash Functions”.
This chapter begins with a discussion of the wide variety of applications for
cryptographic hash functions. Next, we look at the security requirements for such
functions. Then we look at the use of cipher block chaining to implement a cryptographic hash function. The remainder of the chapter is devoted to the most important and widely used family of cryptographic hash functions, the Secure Hash Algorithm (SHA) family.
Learning Objectives
Summarize the applications of cryptographic hash functions.
Explain why a hash function used for message authentication needs to be secured.
Understand the differences among preimage resistant, second preimage resistant, and collision resistant properties.
Present an overview of the basic structure of cryptographic hash functions.
Describe how cipher block chaining can be used to construct a hash function.
Understand the operation of SHA-512.
Hash Functions
A hash function H accepts a variable-length block of data M as input and produces a fixed-size hash value
h = H(M)
Principal object is data integrity
Cryptographic hash function
An algorithm for which it is computationally infeasible to find either:
(a) a data object that maps to a pre-specified hash result (the one-way property)
(b) two data objects that map to the same hash result (the collision-free property)
A hash function H accepts a variable-length block of data M as input and produces a fixed-size result h = H(M), referred to as a hash value or a hash code. A “good” hash function has the property that the results of applying the function to a large set of inputs will produce outputs that are evenly distributed and apparently random. In general terms, the principal object of a hash function is data integrity. A change to any bit or bits in M results, with high probability, in a change to the hash value.
The kind of hash function needed for security applications is referred to as a cryptographic hash function. A cryptographic hash function is an algorithm for which it is computationally infeasible (because no attack is significantly more efficient than brute force) to find either (a) a data object that maps to a pre-specified hash result (the one-way property) or (b) two data objects that map to the same hash result (the collision-free property). Because of these characteristics, hash functions are often used to determine whether or not data has changed.
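The definition h = H(M) can be tried out with a standard library. The following is a minimal sketch, assuming SHA-256 from Python's `hashlib` as the function H; the helper names `hash_hex` and `bit_difference` are illustrative and not part of the chapter. It shows that inputs of very different lengths give hash values of the same size, and that changing a single input bit changes the hash value.

```python
import hashlib

def hash_hex(message: bytes) -> str:
    """Return h = H(M) as a hex string, with SHA-256 standing in for H."""
    return hashlib.sha256(message).hexdigest()

def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

# Variable-length input, fixed-size output (256 bits = 32 bytes).
short_digest = hashlib.sha256(b"abc").digest()
long_digest = hashlib.sha256(b"abc" * 10_000).digest()
print(len(short_digest) * 8, len(long_digest) * 8)

# b"abb" and b"abc" differ in exactly one bit ('b' = 0x62, 'c' = 0x63).
changed_digest = hashlib.sha256(b"abb").digest()
print("output bits changed:", bit_difference(short_digest, changed_digest))
```

For a well-behaved hash function, roughly half of the output bits would be expected to change, which is what "evenly distributed and apparently random" suggests.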
Figure 11.1 Cryptographic Hash Function; h = H(M)
Figure 11.1 depicts the general operation of a cryptographic hash function.
Typically, the input is padded out to an integer multiple of some fixed length
(e.g., 1024 bits), and the padding includes the value of the length of the original message in bits. The length field is a security measure to increase the difficulty for an attacker to produce an alternative message with the same hash value.
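The padding step can be sketched as follows. The notes only say that the input is padded to a multiple of a fixed length and that the padding carries the original length in bits; the exact layout used here (a single 1 bit, then zeros, then a 128-bit big-endian length, as in SHA-512-style padding) is an assumption for illustration, and `pad_message` is a hypothetical helper name.

```python
def pad_message(message: bytes, block_bytes: int = 128, length_bytes: int = 16) -> bytes:
    """Pad message to a multiple of block_bytes (128 bytes = 1024 bits).

    Assumed layout: 0x80 (a 1 bit followed by seven 0 bits), zero bytes,
    then the original message length in bits as a big-endian integer.
    """
    bit_length = len(message) * 8
    padded = message + b"\x80"
    # Zero bytes needed so that padded + length field fills whole blocks.
    zeros = (-(len(padded) + length_bytes)) % block_bytes
    padded += b"\x00" * zeros
    return padded + bit_length.to_bytes(length_bytes, "big")

padded = pad_message(b"abc")
print(len(padded) * 8)                        # total padded length in bits
print(int.from_bytes(padded[-16:], "big"))    # original length in bits
```

Because the length field is part of the hashed data, two messages of different lengths never produce the same padded input.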
Figure 11.2 Attack Against Hash Function
Message authentication is a mechanism or service used to verify the integrity of a message. Message authentication assures that data received are exactly as sent (i.e., there is no modification, insertion, deletion, or replay). In many cases, there is a requirement that the authentication mechanism assures that the purported identity of the sender is valid. When a hash function is used to provide message authentication, the hash function value is often referred to as a message digest.
The essence of the use of a hash function for message authentication is as
follows. The sender computes a hash value as a function of the bits in the message and transmits both the hash value and the message. The receiver performs the same hash calculation on the message bits and compares this value with the incoming hash value. If there is a mismatch, the receiver knows that the message (or possibly the hash value) has been altered (Figure 11.2a).
The hash value must be transmitted in a secure fashion. That is, the hash
value must be protected so that if an adversary alters or replaces the message,
it is not feasible for the adversary to also alter the hash value to fool the receiver. This
type of attack is shown in Figure 11.2b. In this example, Alice transmits a data block and attaches a hash value. Darth intercepts the message, alters or replaces the data block, and calculates and attaches a new hash value. Bob receives the altered data with the new hash value and does not detect the change. To prevent this attack, the hash value generated by Alice must be protected.
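The two cases in Figure 11.2 can be sketched in a few lines. This is a minimal illustration, assuming SHA-256 as the hash function; the function names `send` and `receiver_accepts` and the message contents are made up for the example.

```python
import hashlib

def send(message: bytes):
    """Sender side: transmit the message together with its unprotected hash value."""
    return message, hashlib.sha256(message).digest()

def receiver_accepts(message: bytes, hash_value: bytes) -> bool:
    """Receiver side: recompute the hash and compare it with the received value."""
    return hashlib.sha256(message).digest() == hash_value

# Case (a): the message is altered but the original hash value is kept.
message, hash_value = send(b"Transfer 10 to Bob")
print(receiver_accepts(b"Transfer 99 to Bob", hash_value))   # mismatch is detected

# Case (b): Darth replaces the message AND recomputes the hash value.
forged_message, forged_hash = send(b"Transfer 99 to Darth")
print(receiver_accepts(forged_message, forged_hash))         # change goes undetected
```

The second case succeeds because anyone can compute H; nothing ties the hash value to Alice unless it is protected.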
Figure 11.3 Simplified Examples of the Use of a Hash Function for Message Authentication
Figure 11.3 illustrates a variety of ways in which a hash code can be used to
provide message authentication, as follows:
a. The message plus concatenated hash code is encrypted using symmetric
encryption. Because only A and B share the secret key, the message must
have come from A and has not been altered. The hash code provides the structure or redundancy required to achieve authentication. Because encryption is
applied to the entire message plus hash code, confidentiality is also provided.
b. Only the hash code is encrypted, using symmetric encryption. This reduces the
processing burden for those applications that do not require confidentiality.
c. It is possible to use a hash function but no encryption for message authentication. The technique assumes that the two communicating parties share a common secret value S. A computes the hash value over the concatenation of M and S and appends the resulting hash value to M. Because B possesses S, it can recompute the hash value to verify. Because the secret value itself is not sent, an opponent cannot modify an intercepted message and cannot generate a false message.
d. Confidentiality can be added to the approach of method (c) by encrypting the
entire message plus the hash code.
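Method (c) can be sketched as follows: both parties hold a secret value S, and the hash is taken over the concatenation of M and S. This is a minimal illustration, assuming SHA-256 as the hash function; `hash_with_secret` and `verify_with_secret` are hypothetical names, and the secret shown is a placeholder.

```python
import hashlib

def hash_with_secret(message: bytes, secret: bytes) -> bytes:
    """Compute the hash value over the concatenation of M and S."""
    return hashlib.sha256(message + secret).digest()

def verify_with_secret(message: bytes, hash_value: bytes, secret: bytes) -> bool:
    """B recomputes the hash value with its own copy of S and compares."""
    return hash_with_secret(message, secret) == hash_value

shared_secret = b"placeholder shared secret S"   # known only to A and B

# A sends M together with the hash value; S itself is never transmitted.
message = b"Meet at noon"
hash_value = hash_with_secret(message, shared_secret)
print(verify_with_secret(message, hash_value, shared_secret))

# An opponent without S recomputes a plain hash for an altered message.
altered = b"Meet at midnight"
guess = hashlib.sha256(altered).digest()
print(verify_with_secret(altered, guess, shared_secret))
```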
When confidentiality is not required, method (b) has an advantage over
methods (a) and (d), which encrypt the entire message, in that less computation
is required. Nevertheless, there has been growing interest in techniques that
avoid encryption (Figure 11.3c). Several reasons for this interest are pointed out in
[TSUD92].
• Encryption software is relatively slow. Even though the amount of data to be
encrypted per message is small, there may be a steady stream of messages into
and out of a system.
• Encryption hardware costs are not negligible. Low-cost chip implementations
of DES are available, but the cost adds up if all nodes in a network must have
this capability.
• Encryption hardware is optimized toward large data sizes. For small blocks
of data, a high proportion of the time is spent in initialization/invocation
overhead.
• Encryption algorithms may be covered by patents, and there is a cost associated with licensing their use.
Message Authentication Code (MAC)
Also known as a keyed hash function
Typically used between two parties that share a secret key to authenticate information exchanged between those parties
Takes as input a secret key and a data block and produces a hash value (MAC) which is associated with the protected message
If the integrity of the message needs to be checked, the MAC function can be applied to the message and the result compared with the associated MAC value
An attacker who alters the message will be unable to alter the associated MAC value without knowledge of the secret key
More commonly, message authentication is achieved using a message authentication code (MAC), also known as a keyed hash function. Typically, MACs are used between two parties that share a secret key to authenticate information exchanged between those parties. A MAC function takes as input a secret key and a data block and produces a hash value, referred to as the MAC, which is associated with the protected message. If the integrity of the message needs to be checked, the MAC function can be applied to the message and the result compared with the associated MAC value. An attacker who alters the message will be unable to alter the associated MAC value without knowledge of the secret key. Note that the verifying party also knows who the sending party is because no one else knows the secret key.
Note that the combination of hashing and encryption results in an overall function that is, in fact, a MAC (Figure 11.3b). That is, E(K, H(M)) is a function of a variable-length message M and a secret key K, and it produces a fixed-size output that is secure against an opponent who does not know the secret key. In practice, specific MAC algorithms are designed that are generally more efficient than an encryption algorithm.
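As a concrete illustration of a keyed hash function, the sketch below uses HMAC with SHA-256 from Python's standard `hmac` module. The choice of HMAC is an assumption made for the example (the notes do not name a specific MAC algorithm at this point), and the key and messages are placeholders.

```python
import hashlib
import hmac

def make_mac(key: bytes, message: bytes) -> bytes:
    """MAC function: takes a secret key and a data block, returns the MAC."""
    return hmac.new(key, message, hashlib.sha256).digest()

def check_mac(key: bytes, message: bytes, mac: bytes) -> bool:
    """Recompute the MAC and compare it with the received one in constant time."""
    return hmac.compare_digest(make_mac(key, message), mac)

key = b"placeholder key shared by both parties"
message = b"Ship 100 units"
mac = make_mac(key, message)

print(check_mac(key, message, mac))                 # unaltered message is accepted
print(check_mac(key, b"Ship 900 units", mac))       # altered message is rejected
print(check_mac(b"attacker's guess", message, mac)) # wrong key is rejected
```

`hmac.compare_digest` is used for the comparison so that the check does not leak timing information about where the two values first differ.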
Digital Signature
Operation is similar to that of the MAC
The hash value of a message is encrypted with a user’s private key
Anyone who knows the user’s public key can verify the integrity of the message
An attacker who wishes to alter the message would need to know the user’s private key
Implications of digital signatures go beyond just message authentication
Another important application, which is similar to the message authentication
application, is the digital signature. The operation of the digital signature is similar
to that of the MAC. In the case of the digital signature, the hash value of a message
is encrypted with a user’s private key. Anyone who knows the user’s public key
can verify the integrity of the message that is associated with the digital signature.
In this case, an attacker who wishes to alter the message would need to know the
user’s private key. As we shall see in Chapter 14, the implications of digital signatures go beyond just message authentication.
Figure 11.4 Simplified Examples of Digital Signatures
Figure 11.4 illustrates, in a simplified fashion, how a hash code is used to provide
a digital signature.
a. The hash code is encrypted, using public-key encryption with the sender’s private key. As with Figure 11.3b, this provides authentication. It also provides a
digital signature, because only the sender could have produced the encrypted
hash code. In fact, this is the essence of the digital signature technique.
b. If confidentiality as well as a digital signature is desired, then the message
plus the private-key-encrypted hash code can be encrypted using a symmetric
secret key. This is a common technique.
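The hash-then-sign idea of Figure 11.4a can be sketched with textbook RSA. Everything about the parameters here is an assumption for illustration: the primes are tiny, the hash is reduced modulo n so that it fits, and no padding scheme is used, so this shows only the flow of operations and is not a usable signature scheme.

```python
import hashlib

# Toy textbook RSA parameters (assumption: far too small for real use).
P, Q = 61, 53
N = P * Q          # modulus, 3233
E = 17             # public exponent
D = 2753           # private exponent; E * D = 1 mod (P - 1)(Q - 1)

def hash_to_int(message: bytes) -> int:
    """Hash the message and reduce it modulo N so the toy key can process it."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N

def sign(message: bytes) -> int:
    """Sender: transform the hash code with the private key."""
    return pow(hash_to_int(message), D, N)

def verify(message: bytes, signature: int) -> bool:
    """Anyone: recover the hash code with the public key and compare."""
    return pow(signature, E, N) == hash_to_int(message)

message = b"I owe Bob 10 dollars"
signature = sign(message)
print(verify(message, signature))
```

Only the holder of the private exponent can produce a value that the public exponent maps back to the message's hash code, which is why the construction serves as a signature.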
Other Hash Function Uses
Commonly used to create a one-way password file
When a user enters a password, the hash of that password is compared to the stored hash value for verification
This approach to password protection is used by most operating systems
Can be used for intrusion and virus detection
Store H(F) for each file on a system and secure the hash values
One can later determine if a file has been modified by recomputing H(F)
An intruder would need to change F without changing H(F)
Can be used to construct a pseudorandom function (PRF) or a pseudorandom number generator (PRNG)
A common application for a hash-based PRF is for the generation of symmetric keys
Hash functions are commonly used to create a one-way password file. Chapter 24 explains a scheme in which a hash of a password is stored by an operating system rather than the password itself. Thus, the actual password is not retrievable by a hacker who gains access to the password file. In simple terms, when a user enters a password, the hash of that password is compared to the stored hash value for verification. This approach to password protection is used by most operating systems.
Hash functions can be used for intrusion detection and virus detection. Store H(F) for each file on a system and secure the hash values (e.g., on a CD-R that is kept secure). One can later determine if a file has been modified by recomputing H(F). An intruder would need to change F without changing H(F).
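The store-and-recompute procedure can be sketched as below. This is a minimal illustration, assuming SHA-256 as H; the helper names and the temporary file are made up for the demo, and keeping the baseline itself secure (the role of the CD-R in the example above) is outside the sketch.

```python
import hashlib
import os
import tempfile

def file_hash(path: str) -> str:
    """Return H(F) for the file at path, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def snapshot(paths):
    """Store H(F) for each file; this baseline must itself be kept secure."""
    return {p: file_hash(p) for p in paths}

def modified_files(baseline):
    """Recompute H(F) and report files whose hash no longer matches."""
    return [p for p, stored in baseline.items() if file_hash(p) != stored]

def demo():
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "example.txt")
        with open(path, "wb") as f:
            f.write(b"original contents")
        baseline = snapshot([path])
        before = modified_files(baseline)      # nothing changed yet
        with open(path, "ab") as f:
            f.write(b" plus an intruder's change")
        after = modified_files(baseline)       # the change is reported
        return before, after, path

before, after, path = demo()
print(before, after == [path])
```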
A cryptographic hash function can be used to construct a pseudorandom function (PRF) or a pseudorandom number generator (PRNG). A common application for a hash-based PRF is for the generation of symmetric keys. We discuss this application in Chapter 12.
Two Simple Hash Functions
Consider two simple insecure hash functions that operate using the following general principles:
The input is viewed as a sequence of n-bit blocks
The input is processed one block at a time in an iterative fashion to produce an n-bit hash value
Bit-by-bit exclusive-OR (XOR) of every block
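$C_i = b_{i1} \oplus b_{i2} \oplus \cdots \oplus b_{im}$, where $C_i$ is the ith bit of the hash code, m is the number of n-bit blocks in the input, and $b_{ij}$ is the ith bit in the jth block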
Produces a simple parity for each bit position and is known as a longitudinal redundancy check
Reasonably effective for random data as a data integrity check
Perform a one-bit circular shift on the hash value after each block is processed
Has the effect of randomizing the input more completely and overcoming any regularities that appear in the input
To get some feel for the security considerations involved in cryptographic hash functions, we present two simple, insecure hash functions in this section. All hash functions operate using the following general principles. The input (message, file, etc.) is viewed as a sequence of n-bit blocks. The input is processed one block at a time in an iterative fashion to produce an n-bit hash value.
One of the simplest hash functions is the bit-by-bit exclusive-OR (XOR) of every block, which can be expressed as shown.
This operation produces a simple parity for each bit position and is known as a longitudinal redundancy check. It is reasonably effective for random data as a data integrity check. Each n-bit hash value is equally likely. Thus, the probability that a data error will result in an unchanged hash value is $2^{-n}$. With more predictably formatted data, the function is less effective. For example, in most normal text files, the high-order bit of each octet is always zero. So if a 128-bit hash value is used, instead of an effectiveness of $2^{-128}$, the hash function on this type of data has an effectiveness of $2^{-112}$.
A simple way to improve matters is to perform a one-bit circular shift, or rotation, on the hash value after each block is processed.
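The two simple hash functions can be sketched for 16-bit blocks as follows. The input is given as a list of integers, one per block. The rotation direction (left) and the order of operations (rotate the running hash, then XOR in the next block) are assumptions for the sketch, as are the function names.

```python
N_BITS = 16
MASK = (1 << N_BITS) - 1

def xor_hash(blocks):
    """Bit-by-bit XOR of every n-bit block (longitudinal redundancy check)."""
    h = 0
    for block in blocks:
        h ^= block & MASK
    return h

def rotl1(value):
    """One-bit circular left shift of an n-bit value."""
    return ((value << 1) | (value >> (N_BITS - 1))) & MASK

def rxor_hash(blocks):
    """Rotated XOR: rotate the running hash by one bit, then XOR in the block."""
    h = 0
    for block in blocks:
        h = rotl1(h) ^ (block & MASK)
    return h

blocks = [0x0F0F, 0x00FF]
print(hex(xor_hash(blocks)))          # 0xff0
# Plain XOR ignores block order; the rotated version does not.
print(xor_hash([1, 2]) == xor_hash([2, 1]))
print(rxor_hash([1, 2]), rxor_hash([2, 1]))
```

The order sensitivity of the rotated version is what "overcoming regularities in the input" refers to: identical bit positions in successive blocks no longer land on the same hash bit.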
Figure 11.5 Two Simple Hash Functions
Figure 11.5 illustrates these two types of hash functions for 16-bit hash values.
Although the second procedure provides a good measure of data integrity,
it is virtually useless for data security when an encrypted hash code is used with a
plaintext message, as in Figures 11.3b and 11.4a. Given a message, it is an easy matter to produce a new message that yields that hash code: Simply prepare the desired alternate message and then append an n-bit block that forces the new message plus block to yield the desired hash code.
Although a simple XOR or rotated XOR (RXOR) is insufficient if only the
hash code is encrypted, you may still feel that such a simple function could be useful when the message together with the hash code is encrypted (Figure 11.3a). But you must be careful.
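The forgery described above, appending one block that forces a chosen hash code, can be sketched for the rotated XOR function with 16-bit blocks. The rotate-left-then-XOR ordering and all names are assumptions for the sketch; the block values are arbitrary.

```python
N_BITS = 16
MASK = (1 << N_BITS) - 1

def rotl1(value):
    """One-bit circular left shift of an n-bit value."""
    return ((value << 1) | (value >> (N_BITS - 1))) & MASK

def rxor_hash(blocks):
    """Rotated XOR hash: rotate the running hash, then XOR in each block."""
    h = 0
    for block in blocks:
        h = rotl1(h) ^ (block & MASK)
    return h

def forge(alternate_blocks, target_hash):
    """Append one n-bit block so the alternate message hashes to target_hash."""
    # After the alternate blocks, the next step computes rotl1(h) ^ block,
    # so choosing block = rotl1(h) ^ target_hash yields exactly target_hash.
    fix = rotl1(rxor_hash(alternate_blocks)) ^ target_hash
    return alternate_blocks + [fix]

original = [0x4865, 0x6C6C, 0x6F21]     # message whose hash code was protected
alternate = [0x4279, 0x6521]            # attacker's desired alternate message
forged = forge(alternate, rxor_hash(original))
print(rxor_hash(forged) == rxor_hash(original))
```

The attacker never needs the key: the encrypted hash code from the original message is simply reused with the forged message.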
Requirements and Security
Preimage
x is the preimage of h for a hash value h = H(x)
Is a data block whose hash value, using the function H, is h
Because H is a many-to-one mapping, for any given hash value h, there will in general be multiple preimages
Collision
Occurs if we have x ≠ y and H(x) = H(y)
Because we are using hash functions for data integrity, collisions are clearly undesirable
Before proceeding, we need to define two terms. For a hash value h = H(x), we say that x is the preimage of h. That is, x is a data block whose hash value, using the function H, is h. Because H is a many-to-one mapping, for any given hash value h, there will in general be multiple preimages. A collision occurs if we have x ≠ y and H(x) = H(y). Because we are using hash functions for data integrity, collisions are clearly undesirable.
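The many-to-one nature of H is easy to see with a deliberately tiny hash. The sketch below assumes a toy 8-bit hash (the first byte of SHA-256, chosen only for illustration); with more than 256 distinct inputs, at least two must share a hash value.

```python
import hashlib

def toy_hash(x: bytes) -> int:
    """Toy 8-bit hash: the first byte of SHA-256 (an assumption for illustration)."""
    return hashlib.sha256(x).digest()[0]

def preimages(h: int, candidates):
    """Return every candidate x with toy_hash(x) == h."""
    return [x for x in candidates if toy_hash(x) == h]

def first_collision(candidates):
    """Return the first pair x != y found with toy_hash(x) == toy_hash(y)."""
    seen = {}
    for x in candidates:
        h = toy_hash(x)
        if h in seen:
            return seen[h], x
        seen[h] = x
    return None

candidates = [i.to_bytes(2, "big") for i in range(1000)]
h = toy_hash(candidates[0])
print(len(preimages(h, candidates)), "preimages of", h, "among 1000 inputs")
x, y = first_collision(candidates)
print(x != y, toy_hash(x) == toy_hash(y))
```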
Table 11.1 Requirements for a Cryptographic Hash Function H
| Requirement | Description |
| Variable input size | H can be applied to a block of data of any size. |
| Fixed output size | H produces a fixed-length output. |
| Efficiency | H(x) is relatively easy to compute for any given x, making both hardware and software implementations practical. |
| Preimage resistant (one-way property) | For any given hash value h, it is computationally infeasible to find y such that H(y) = h. |
| Second preimage resistant (weak collision resistant) | For any given block x, it is computationally infeasible to find y ≠ x with H(y) = H(x). |
| Collision resistant (strong collision resistant) | It is computationally infeasible to find any pair (x, y) with x ≠ y, such that H(x) = H(y). |
| Pseudorandomness | Output of H meets standard tests for pseudorandomness. |
Table 11.1 lists the generally accepted requirements for a cryptographic hash function. The first three properties are requirements for the practical application of a hash function.
The fourth property, preimage resistant, is the one-way property: it is easy to generate a code given a message, but virtually impossible to generate a message given a code. This property is important if the authentication technique involves the use of a secret value (Figure 11.3c). The secret value itself is not sent. However, if the hash function is not one way, an attacker can easily discover the secret value: If the attacker can observe or intercept a transmission, the attacker obtains the message M and the hash code $h = H(S \| M)$. The attacker then inverts the hash function to obtain $S \| M = H^{-1}(h)$. Because the attacker now has both M and $S \| M$, it is a trivial matter to recover S.
The fifth property, second preimage resistant, guarantees that it is infeasible to find an alternative message with the same hash value as a given message. This prevents forgery when an encrypted hash code is used (Figures 11.3b and 11.4a). If this property were not true, an attacker would be capable of the following sequence: First, observe or intercept a message plus its encrypted hash code; second, generate an unencrypted hash code from the message; third, generate an alternate message with the same hash code.
A hash function that satisfies the first five properties in Table 11.1 is referred to as a weak hash function. If the sixth property, collision resistant, is also satisfied, then it is referred to as a strong hash function. A strong hash function protects against an attack in which one party generates a message for another party to sign. For example, suppose Bob writes an IOU message, sends it to Alice, and she signs it. Bob finds two messages with the same hash, one of which requires Alice to pay a small amount and one that requires a large payment. Alice signs the first message, and Bob is then able to claim that the second message is authentic.
The final requirement in Table 11.1, pseudorandomness, has not traditionally been listed as a requirement of cryptographic hash functions but is more or less implied. [JOHN05] points out that cryptographic hash functions are commonly used for key derivation and pseudorandom number generation, and that in message integrity applications, the three resistant properties depend on the output of the hash function appearing to be random. Thus, it makes sense to verify that in fact a given hash function produces pseudorandom output.
Figure 11.6 Relationship Among Hash Function Properties
Figure 11.6 shows the relationships among the three resistant properties.
A function that is collision resistant is also second preimage resistant, but the
reverse is not necessarily true. A function can be collision resistant but not preimage
resistant and vice versa. A function can be preimage resistant but not second
preimage resistant and vice versa. See [MENE97] for a discussion.
Table 11.2 Hash Function Resistance Properties Required for Various Data Integrity Applications
| Application | Preimage Resistant | Second Preimage Resistant | Collision Resistant |
| Hash + digital signature | yes | yes | yes* |
| Intrusion detection and virus detection | | | |
| Hash + symmetric encryption | | | |
| One-way password file | yes | | |
| MAC | yes | yes | yes* |
*Resistance required if attacker is able to mount a chosen message attack
Table 11.2 shows the resistant properties required for various hash function
applications.
Attacks on Hash Functions
Brute-Force Attacks
Does not depend on the specific algorithm, only depends on bit length
In the case of a hash function, attack depends only on the bit length of the hash value
Method is to pick values at random and try each one until a collision occurs
Cryptanalysis
An attack based on weaknesses in a particular cryptographic algorithm
Seek to exploit some property of the algorithm to perform some attack other than an exhaustive search
As with encryption algorithms, there are two categories of attacks on hash
functions: brute-force attacks and cryptanalysis. A brute-force attack does not depend on the specific algorithm but depends only on bit length. In the case of a hash function, a brute-force attack depends only on the bit length of the hash value. Cryptanalysis, in contrast, is an attack based on weaknesses in a particular cryptographic algorithm.
Collision Resistant Attacks (1 of 2)
For a collision resistant attack, an adversary wishes to find two messages or data blocks that yield the same hash value
The effort required is explained by a mathematical result referred to as the birthday paradox
Yuval proposed the following strategy to exploit the birthday paradox in a collision resistant attack:
The source (A) is prepared to sign a legitimate message x by appending the appropriate m-bit hash code and encrypting that hash code with A’s private key
Opponent generates $2^{m/2}$ variations x’ of x, all with essentially the same meaning, and stores the messages and their hash values
Opponent prepares a fraudulent message y for which A’s signature is desired
For a collision resistant attack, an adversary wishes
to find two messages or data blocks, x and y, that yield the same hash value:
H(x) = H(y). This turns out to require considerably less effort than a preimage or
second preimage attack. The effort required is explained by a mathematical result
referred to as the birthday paradox. In essence, if we choose random variables from a uniform distribution in the range 0 through N - 1, then the probability that a
repeated element is encountered exceeds 0.5 after √N choices have been made.
Thus, for an m-bit hash value, if we pick data blocks at random, we can expect to
find two data blocks with the same hash value within $\sqrt{2^m} = 2^{m/2}$ attempts. The
mathematical derivation of this result is found in Appendix E.
Yuval proposed the following strategy to exploit the birthday paradox in a
collision resistant attack [YUVA79].
1. The source, A, is prepared to sign a legitimate message x by appending the appropriate m-bit hash code and encrypting that hash code with A’s private key (Figure 11.4a).
2. The opponent generates $2^{m/2}$ variations x’ of x, all of which convey essentially
the same meaning, and stores the messages and their hash values.
3. The opponent prepares a fraudulent message y for which A’s signature is
desired.
4. The opponent generates minor variations y’ of y, all of which convey essentially
the same meaning. For each y’, the opponent computes H(y’), checks
for matches with any of the H(x’) values, and continues until a match is found.
That is, the process continues until a y’ is generated with a hash value equal to
the hash value of one of the x’ values.
5. The opponent offers the valid variation to A for signature. This signature can
then be attached to the fraudulent variation for transmission to the intended
recipient. Because the two variations have the same hash code, they will produce
the same signature; the opponent is assured of success even though the
encryption key is not known.
Thus, if a 64-bit hash code is used, the level of effort required is only on the
order of $2^{32}$ [see Appendix E, Equation (E.7)].
The generation of many variations that convey the same meaning is not difficult.
For example, the opponent could insert a number of “space-space-backspace”
character pairs between words throughout the document. Variations could then
be generated by substituting “space-backspace-space” in selected instances.
Alternatively, the opponent could simply reword the message but retain the
meaning. Figure 11.7 provides an example.
To summarize, for a hash code of length m, the level of effort required, as we
have seen, is proportional to the following.
Preimage resistant $2^m$
Second preimage resistant $2^m$
Collision resistant $2^{m/2}$
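The $2^{m/2}$ effort for collisions can be observed directly on a shortened hash. The sketch below assumes a hash truncated to the top m bits of SHA-256 (a stand-in so that the search finishes quickly) and tries successive counter values as the data blocks; the function names are illustrative.

```python
import hashlib

def truncated_hash(data: bytes, m_bits: int) -> int:
    """Top m_bits of SHA-256, used here as a small m-bit hash for illustration."""
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest, "big") >> (256 - m_bits)

def find_collision(m_bits: int):
    """Try data blocks until two of them share a hash value.

    Returns (x, y, attempts) with x != y and equal truncated hashes.
    """
    seen = {}
    attempts = 0
    while True:
        x = attempts.to_bytes(8, "big")
        attempts += 1
        h = truncated_hash(x, m_bits)
        if h in seen:
            return seen[h], x, attempts
        seen[h] = x

m = 24
x, y, attempts = find_collision(m)
print("collision after", attempts, "attempts; 2^(m/2) =", 2 ** (m // 2))
```

The attempt count is typically within a small factor of $2^{m/2}$, far below the $2^m$ order of effort for a preimage or second preimage search on the same hash length.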
Collision Resistant Attacks (2 of 2)
Opponent generates minor variations y’ of y, all of which convey essentially the same meaning. For each y’, the opponent computes H (y’), checks for matches with any of the H (x’) values, and continues until a match is found. That is, the process continues until a y’ is generated with a hash value equal to the hash value of one of the x’ values
The opponent offers the valid variation to A for signature which can then be attached to the fraudulent variation for transmission to the intended recipient
Because the two variations have the same hash code, they will produce the same signature and the opponent is assured of success even though the encryption key is not known
A Letter in $2^{38}$ Variations
Figure 11.7 A Letter in $2^{38}$ Variations
If collision resistance is required (and this is desirable for a general-purpose
secure hash code), then the value $2^{m/2}$ determines the strength of the hash code
against brute-force attacks. Van Oorschot and Wiener [VANO94] presented
a design for a $10 million collision search machine for MD5, which has a 128-bit hash length, that could find a collision in 24 days. Thus, a 128-bit code may be viewed as inadequate. The next step up, if a hash code is treated as a sequence of 32 bits, is a 160-bit hash length. With a hash length of 160 bits, the same search machine would require over four thousand years to find a collision. With today’s technology, the time would be much shorter, so that 160 bits now appears suspect.
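The jump from 24 days to over four thousand years follows from the $2^{m/2}$ scaling. The short calculation below is a sketch that assumes the machine's search rate stays fixed and only the hash length changes.

```python
DAYS_FOR_128_BIT = 24                      # figure quoted for the 128-bit search machine

work_128 = 2 ** (128 // 2)                 # collision effort for a 128-bit hash
work_160 = 2 ** (160 // 2)                 # collision effort for a 160-bit hash
factor = work_160 // work_128              # 2^80 / 2^64 = 2^16

days_for_160_bit = DAYS_FOR_128_BIT * factor
years_for_160_bit = days_for_160_bit / 365.25

print(factor)                              # 65536
print(round(years_for_160_bit))            # about 4306 years
```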
Figure 11.8 General Structure of Secure Hash Code
In recent years, there has been considerable effort, and some successes,
in developing cryptanalytic attacks on hash functions. To understand these, we
need to look at the overall structure of a typical secure hash function, indicated in
Figure 11.8. This structure, referred to as an iterated hash function, was proposed
by Merkle [MERK79, MERK89] and is the structure of most hash functions in use
today, including SHA, which is discussed later in this chapter. The hash function
takes an input message and partitions it into L fixed-sized blocks of b bits each.
If necessary, the final block is padded to b bits. The final block also includes the
value of the total length of the input to the hash function. The inclusion of the
length makes the job of the opponent more difficult. Either the opponent must
find two messages of equal length that hash to the same value or two messages of differing lengths that, together with their length values, hash to the same value.
The hash algorithm involves repeated use of a compression function, f, that
takes two inputs (an n-bit input from the previous step, called the chaining variable, and a b-bit block) and produces an n-bit output. At the start of hashing, the chaining variable has an initial value that is specified as part of the algorithm. The final value of the chaining variable is the hash value.
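The iterated structure can be sketched in a few lines. The compression function f below is a stand-in built from SHA-256 purely so the example runs; the block size b, chaining variable size n, all-zero initial value, and padding layout are likewise assumptions for illustration and do not describe any standardized algorithm.

```python
import hashlib

B_BYTES = 64            # block size b (assumed: 512 bits)
N_BYTES = 32            # chaining variable size n (assumed: 256 bits)
IV = bytes(N_BYTES)     # initial value of the chaining variable (assumed all zeros)

def f(chaining_variable: bytes, block: bytes) -> bytes:
    """Stand-in compression function: n-bit CV and b-bit block in, n bits out."""
    assert len(chaining_variable) == N_BYTES and len(block) == B_BYTES
    return hashlib.sha256(chaining_variable + block).digest()

def pad(message: bytes) -> bytes:
    """Pad to a multiple of b bits; the final block carries the total length."""
    length_field = (len(message) * 8).to_bytes(8, "big")
    padded = message + b"\x80"
    padded += b"\x00" * ((-(len(padded) + len(length_field))) % B_BYTES)
    return padded + length_field

def iterated_hash(message: bytes) -> bytes:
    """Feed the L blocks through f; the final chaining variable is the hash value."""
    cv = IV
    data = pad(message)
    for i in range(0, len(data), B_BYTES):
        cv = f(cv, data[i:i + B_BYTES])
    return cv

print(iterated_hash(b"abc").hex())
print(len(pad(b"x" * 100)) // B_BYTES, "blocks for a 100-byte message")
```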
Secure Hash Algorithm (SHA)
SHA was originally designed by the National Institute of Standards and Technology (NIST) and published as a federal information processing standard (FIPS 180) in 1993
Was revised in 1995 as SHA-1
Based on the hash function MD4 and its design closely models MD4
Produces 160-bit hash values
In 2002 NIST produced a revised version of the standard that defined three new versions of SHA with hash value lengths of 256, 384, and 512 bits
Collectively known as SHA-2
In recent years, the most widely used hash function has been the Secure Hash
Algorithm (SHA). Indeed, because virtually every other widely used hash function
had been found to have substantial cryptanalytic weaknesses, SHA was more or
less the last remaining standardized hash algorithm by 2005. SHA was developed
by the National Institute of Standards and Technology (NIST) and published as a
federal information processing standard (FIPS 180) in 1993. When weaknesses were discovered in SHA, now known as SHA-0, a revised version was issued as FIPS 180-1 in 1995 and is referred to as SHA-1. The actual standards document is entitled “Secure Hash Standard.” SHA is based on the hash function MD4, and its design closely models MD4.
SHA-1 produces a hash value of 160 bits. In 2002, NIST produced a revised
version of the standard, FIPS 180-2, that defined three new versions of SHA, with
hash value lengths of 256, 384, and 512 bits, known as SHA-256, SHA-384, and
SHA-512, respectively. Collectively, these hash algorithms are known as SHA-2.
These new versions have the same underlying structure and use the same types of modular arithmetic and logical binary operations as SHA-1.
Table 11.3 Comparison of SHA Parameters
| Algorithm | Message Size | Block Size | Word Size | Message Digest Size |
| SHA-1 | < 2^64 | 512 | 32 | 160 |
| SHA-224 | < 2^64 | 512 | 32 | 224 |
| SHA-256 | < 2^64 | 512 | 32 | 256 |
| SHA-384 | < 2^128 | 1024 | 64 | 384 |
| SHA-512 | < 2^128 | 1024 | 64 | 512 |
| SHA-512/224 | < 2^128 | 1024 | 64 | 224 |
| SHA-512/256 | < 2^128 | 1024 | 64 | 256 |
Note: All sizes are measured in bits.
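The block and digest sizes in Table 11.3 can be cross-checked against Python's standard hashlib module. This is a small sketch; it covers only the five algorithms that hashlib always provides, and the `sizes` dictionary is just a convenience for the example.

```python
import hashlib

sizes = {}
for name in ("sha1", "sha224", "sha256", "sha384", "sha512"):
    h = hashlib.new(name)
    # hashlib reports sizes in bytes; the table uses bits.
    sizes[name] = (h.block_size * 8, h.digest_size * 8)

print(f"{'Algorithm':<10}{'Block':>7}{'Digest':>8}")
for name, (block_bits, digest_bits) in sizes.items():
    print(f"{name:<10}{block_bits:>7}{digest_bits:>8}")
```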
A revised document was issued as FIPS PUB 180-3 in 2008, which added a 224-bit version (Table 11.3). SHA-1 and SHA-2 are also specified in RFC 6234, which essentially duplicates the material in FIPS 180-3 but adds a C code implementation.
In 2005, NIST announced the intention to phase out approval of SHA-1 and move to a reliance on SHA-2 by 2010. Despite this, SHA-1 continued to be used for digital signatures and other purposes by numerous applications, such as web browsers. The reluctance to go through the expense and effort of transitioning to SHA-2 was overcome by a breakthrough announced by a research team in 2017 [STEV17, CONS17]. The team demonstrated that SHA-1 collision attacks have finally become practical by providing the first known instance of a collision. In total, the computational effort spent is equivalent to 2^63.1 SHA-1 compressions and took approximately 6500 CPU years and 100 GPU years. As a result, Microsoft, Google, Apple, and Mozilla all announced that their respective browsers would stop accepting SHA-1 SSL certificates in 2017.
Figure 11.9 Message Digest Generation Using SHA-512
The algorithm takes as input a message with a maximum length of less than 2^128 bits and produces as output a 512-bit message digest. The input is processed in 1024-bit blocks. Figure 11.9 depicts the overall processing of a message to produce a digest. This follows the general structure depicted in Figure 11.8. The processing consists of the following steps.
Step 1 Append padding bits. The message is padded so that its length is congruent to 896 modulo 1024 [length ≡ 896 (mod 1024)]. Padding is always added, even if the message is already of the desired length. Thus, the number of
padding bits is in the range of 1 to 1024. The padding consists of a single 1
bit followed by the necessary number of 0 bits.
Step 2 Append length. A block of 128 bits is appended to the message. This block is treated as an unsigned 128-bit integer (most significant byte first) and
contains the length of the original message (before the padding).
The outcome of the first two steps yields a message that is an integer
multiple of 1024 bits in length. In Figure 11.9, the expanded message is
represented as the sequence of 1024-bit blocks M_1, M_2, ..., M_N, so that the
total length of the expanded message is N × 1024 bits.
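Steps 1 and 2 can be sketched in a few lines of Python. This is a minimal illustration that assumes byte-aligned messages (the standard is defined on bit strings); the function name `sha512_pad` is illustrative and not part of any library.

```python
def sha512_pad(message: bytes) -> bytes:
    """Append the padding bits and the 128-bit length field."""
    bit_len = len(message) * 8
    # Step 1: a single 1 bit (0x80 for byte-aligned input), then 0 bits
    # until the length is congruent to 896 mod 1024 bits (112 mod 128 bytes).
    padded = message + b"\x80"
    padded += bytes((112 - len(padded)) % 128)
    # Step 2: original length as an unsigned 128-bit big-endian integer.
    padded += bit_len.to_bytes(16, "big")
    return padded


for n in (0, 111, 112, 200):
    print(n, "bytes ->", len(sha512_pad(bytes(n))) // 128, "block(s)")
```

A 111-byte message still fits in one block, while a 112-byte message already has the target length and therefore receives a full 1024 bits of padding, spilling into a second block.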
Step 3 Initialize hash buffer. A 512-bit buffer is used to hold intermediate
and final results of the hash function. The buffer can be represented as eight 64-bit registers (a, b, c, d, e, f, g, h). These registers are initialized to the following
64-bit integers (hexadecimal values):
a = 6A09E667F3BCC908 e = 510E527FADE682D1
b = BB67AE8584CAA73B f = 9B05688C2B3E6C1F
c = 3C6EF372FE94F82B g = 1F83D9ABFB41BD6B
d = A54FF53A5F1D36F1 h = 5BE0CD19137E2179
These words were obtained by taking the first sixty-four bits of the fractional parts of the square roots of the first eight prime numbers. The values are stored in big-endian format, in which the most significant byte of a word is stored in the low-address (leftmost) byte position. In contrast, in little-endian format, the least significant byte is stored in the lowest address.
Step 4 Process message in 1024-bit (128-byte) blocks. The heart of the algorithm is a module that consists of 80 rounds; this module is labeled F in Figure 11.9. The logic is illustrated in Figure 11.10.
Each round takes as input the 512-bit buffer value, abcdefgh, and
updates the contents of the buffer. At input to the first round, the buffer
has the value of the intermediate hash value, H_{i-1}. Each round t makes
use of a 64-bit value W_t, derived from the current 1024-bit block being processed
(M_i). These values are derived using a message schedule described
subsequently. Each round also makes use of an additive constant K_t, where
0 ≤ t ≤ 79 indicates one of the 80 rounds. These words represent the first
64 bits of the fractional parts of the cube roots of the first 80 prime numbers.
Step 5 Output. After all N 1024-bit blocks have been processed, the output from
the Nth stage is the 512-bit message digest.
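The origin of the initial buffer values (Step 3) and the round constants (Step 4) can be reproduced with exact integer arithmetic. The sketch below is one way to do it, not the method used by any particular implementation; the helper names `first_primes` and `icbrt` are illustrative.

```python
from math import isqrt

MASK64 = (1 << 64) - 1


def first_primes(count: int) -> list:
    """Return the first `count` primes by trial division."""
    primes, n = [], 2
    while len(primes) < count:
        if all(n % p for p in primes):
            primes.append(n)
        n += 1
    return primes


def icbrt(n: int) -> int:
    """Integer cube root: the largest r with r**3 <= n (binary search)."""
    lo, hi = 0, 1 << (n.bit_length() // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo


primes = first_primes(80)
# floor(sqrt(p) * 2**64) mod 2**64 keeps the first 64 fractional bits of sqrt(p).
H0 = [isqrt(p << 128) & MASK64 for p in primes[:8]]
# floor(cbrt(p) * 2**64) mod 2**64 keeps the first 64 fractional bits of cbrt(p).
K = [icbrt(p << 192) & MASK64 for p in primes]

print("a    =", f"{H0[0]:016X}")
print("K_0  =", f"{K[0]:016x}")
print("K_79 =", f"{K[79]:016x}")
```

Shifting the prime left before taking the root avoids floating-point rounding, which would not give 64 correct fractional bits.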
Figure 11.10 SHA-512 Processing of a Single 1024-Bit Block
The heart of the algorithm is a module that consists of 80 rounds; this module is labeled F in Figure 11.9. The logic is illustrated in Figure 11.10.
Table 11.4 SHA-512 Constants
| 428a2f98d728ae22 | 7137449123ef65cd | b5c0fbcfec4d3b2f | e9b5dba58189dbbc |
| 3956c25bf348b538 | 59f111f1b605d019 | 923f82a4af194f9b | ab1c5ed5da6d8118 |
| d807aa98a3030242 | 12835b0145706fbe | 243185be4ee4b28c | 550c7dc3d5ffb4e2 |
| 72be5d74f27b896f | 80deb1fe3b1696b1 | 9bdc06a725c71235 | c19bf174cf692694 |
| e49b69c19ef14ad2 | efbe4786384f25e3 | 0fc19dc68b8cd5b5 | 240ca1cc77ac9c65 |
| 2de92c6f592b0275 | 4a7484aa6ea6e483 | 5cb0a9dcbd41fbd4 | 76f988da831153b5 |
| 983e5152ee66dfab | a831c66d2db43210 | b00327c898fb213f | bf597fc7beef0ee4 |
| c6e00bf33da88fc2 | d5a79147930aa725 | 06ca6351e003826f | 142929670a0e6e70 |
| 27b70a8546d22ffc | 2e1b21385c26c926 | 4d2c6dfc5ac42aed | 53380d139d95b3df |
| 650a73548baf63de | 766a0abb3c77b2a8 | 81c2c92e47edaee6 | 92722c851482353b |
| a2bfe8a14cf10364 | a81a664bbc423001 | c24b8b70d0f89791 | c76c51a30654be30 |
| d192e819d6ef5218 | d69906245565a910 | f40e35855771202a | 106aa07032bbd1b8 |
| 19a4c116b8d2d0c8 | 1e376c085141ab53 | 2748774cdf8eeb99 | 34b0bcb5e19b48a8 |
| 391c0cb3c5c95a63 | 4ed8aa4ae3418acb | 5b9cca4f7763e373 | 682e6ff3d6b2b8a3 |
| 748f82ee5defb2fc | 78a5636f43172f60 | 84c87814a1f0ab72 | 8cc702081a6439ec |
| 90befffa23631e28 | a4506cebde82bde9 | bef9a3f7b2c67915 | c67178f2e372532b |
| ca273eceea26619c | d186b8c721c0c207 | eada7dd6cde0eb1e | f57d4f7fee6ed178 |
| 06f067aa72176fba | 0a637dc5a2c898a6 | 113f9804bef90dae | 1b710b35131c471b |
| 28db77f523047d84 | 32caab7b40c72493 | 3c9ebe0a15c9bebc | 431d67c49c100d4c |
| 4cc5d4becb3e42b6 | 597f299cfc657e2a | 5fcb6fab3ad6faec | 6c44198c4a475817 |
The constants provide a “randomized” set of 64-bit patterns, which should
eliminate any regularities in the input data. Table 11.4 shows these constants
in hexadecimal format (from left to right).
Figure 11.11 Elementary SHA-512 Operation (single round)
Let us look in more detail at the logic in each of the 80 steps of the processing
of one 1024-bit block (Figure 11.11).
Figure 11.12 Creation of 80-word Input Sequence for SHA-512 Processing of Single Block
Figure 11.13 SHA-512 Logic
Figure 11.13 summarizes the SHA-512 logic.
The SHA-512 algorithm has the property that every bit of the hash code is a
function of every bit of the input. The complex repetition of the basic function F
produces results that are well mixed; that is, it is unlikely that two messages chosen
at random, even if they exhibit similar regularities, will have the same hash code.
Unless there is some hidden weakness in SHA-512, which has not so far been published, the difficulty of coming up with two messages having the same message digest is on the order of 2^256 operations, while the difficulty of finding a message with a given digest is on the order of 2^512 operations.
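The pieces described in this section (padding, message schedule, 80 rounds, and the final addition of H_{i-1}) can be put together in a short, unoptimized Python sketch and compared against hashlib. The rotation and shift amounts and the Ch/Maj functions are taken from the SHA-512 specification rather than from the text above, which shows them only in the figures; the byte-aligned padding and all helper names are assumptions made for this example.

```python
import hashlib
from math import isqrt

MASK = (1 << 64) - 1


def _primes(count):
    found, n = [], 2
    while len(found) < count:
        if all(n % p for p in found):
            found.append(n)
        n += 1
    return found


def _icbrt(n):
    lo, hi = 0, 1 << (n.bit_length() // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        lo, hi = (mid, hi) if mid ** 3 <= n else (lo, mid - 1)
    return lo


_P = _primes(80)
H0 = [isqrt(p << 128) & MASK for p in _P[:8]]   # initial buffer a..h
K = [_icbrt(p << 192) & MASK for p in _P]       # round constants K_0..K_79


def rotr(x, n):
    return ((x >> n) | (x << (64 - n))) & MASK


def schedule(block):
    """Expand one 1024-bit block into the 80 words W_0..W_79."""
    w = [int.from_bytes(block[i:i + 8], "big") for i in range(0, 128, 8)]
    for t in range(16, 80):
        s0 = rotr(w[t - 15], 1) ^ rotr(w[t - 15], 8) ^ (w[t - 15] >> 7)
        s1 = rotr(w[t - 2], 19) ^ rotr(w[t - 2], 61) ^ (w[t - 2] >> 6)
        w.append((s1 + w[t - 7] + s0 + w[t - 16]) & MASK)
    return w


def compress(h, block):
    """Module F: 80 rounds on the buffer abcdefgh, then add H_{i-1}."""
    w = schedule(block)
    a, b, c, d, e, f, g, hh = h
    for t in range(80):
        ch = (e & f) ^ (~e & g)
        maj = (a & b) ^ (a & c) ^ (b & c)
        sum1 = rotr(e, 14) ^ rotr(e, 18) ^ rotr(e, 41)
        sum0 = rotr(a, 28) ^ rotr(a, 34) ^ rotr(a, 39)
        t1 = (hh + ch + sum1 + w[t] + K[t]) & MASK
        t2 = (sum0 + maj) & MASK
        a, b, c, d, e, f, g, hh = (t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g
    return [(x + y) & MASK for x, y in zip(h, (a, b, c, d, e, f, g, hh))]


def sha512(message):
    data = message + b"\x80"
    data += bytes((112 - len(data)) % 128) + (len(message) * 8).to_bytes(16, "big")
    h = H0
    for i in range(0, len(data), 128):
        h = compress(h, data[i:i + 128])
    return b"".join(x.to_bytes(8, "big") for x in h)


print(sha512(b"abc") == hashlib.sha512(b"abc").digest())
```

The sketch favors readability over speed; production code should use a vetted library such as hashlib.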
SHA-3
SHA-1 is now considered broken
A research team demonstrated the first known SHA-1 collision in 2017
Considered to be insecure and has been phased out in favor of SHA-2
SHA-2 shares the same structure and mathematical operations as its predecessors so this is a cause for concern
Because it will take years to find a suitable replacement for SHA-2 should it become vulnerable, NIST decided to begin the process of developing a new hash standard
NIST announced in 2007 a competition for the SHA-3 next generation NIST hash function
Winning design was announced by NIST in October 2012
SHA-3 is a cryptographic hash function that is intended to complement SHA-2 as the approved standard for a wide range of applications
When NIST began planning a successor, the Secure Hash Algorithm (SHA-1) had not yet been “broken.”
That is, no one had demonstrated a technique for producing collisions in a practical amount of time; the first known collision was announced only in 2017. However, because SHA-1 is very similar, in structure and in the basic mathematical operations used, to MD5 and SHA-0, both of which had been broken, SHA-1 was already considered insecure and was being phased out in favor of SHA-2.
SHA-2, particularly the 512-bit version, would appear to provide unassailable
security. However, SHA-2 shares the same structure and mathematical operations
as its predecessors, and this is a cause for concern. Because it will take years to find a suitable replacement for SHA-2, should it become vulnerable, NIST decided to begin the process of developing a new hash standard.
Accordingly, NIST announced in 2007 a competition to produce the next generation NIST hash function, to be called SHA-3. The winning design for SHA-3
was announced by NIST in October 2012. SHA-3 is a cryptographic hash function
that is intended to complement SHA-2 as the approved standard for a wide range
of applications.
NISTIR 7896 (Third-Round Report of the SHA-3 Cryptographic Hash Algorithm Competition) summarizes the evaluation criteria used by NIST to select from among the candidates for SHA-3, plus the rationale for picking Keccak, which was the winning candidate. This material is useful in understanding not just the SHA-3 design but also the criteria by which to judge any cryptographic hash algorithm.
The Sponge Construction
Underlying structure of SHA-3 is a scheme referred to by its designers as a sponge construction
Takes an input message and partitions it into fixed-size blocks
Each block is processed in turn with the output of each iteration fed into the next iteration, finally producing an output block
The sponge function is defined by three parameters:
f = the internal function used to process each input block
r = the size in bits of the input blocks, called the bitrate
pad = the padding algorithm
The underlying structure of SHA-3 is a scheme referred to by its designers as a
sponge construction [BERT07, BERT11]. The sponge construction has the same general structure as other iterated hash functions (Figure 11.8). The sponge function takes an input message and partitions it into fixed-size blocks. Each block is processed in turn with the output of each iteration fed into the next iteration, finally producing an output block.
The sponge function is defined by three parameters:
f = the internal function used to process each input block
r = the size in bits of the input blocks, called the bitrate
pad = the padding algorithm
Figure 11.14 Sponge Function Input and Output
A sponge function allows both variable length input and output, making it a
flexible structure that can be used for a hash function (fixed-length output), a pseudorandom number generator (fixed-length input), and other cryptographic functions. Figure 11.14 illustrates this point.
The sponge specification [BERT11] proposes
two padding schemes:
• Simple padding: Denoted by pad10*, appends a single bit 1 followed by the
minimum number of bits 0 such that the length of the result is a multiple of the
block length.
• Multirate padding: Denoted by pad10*1, appends a single bit 1 followed by
the minimum number of bits 0 followed by a single bit 1 such that the length of
the result is a multiple of the block length. This is the simplest padding scheme
that allows secure use of the same f with different rates r. FIPS 202
uses multirate padding.
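The two padding rules can be written down directly. The sketch below works on bit counts and returns the padding as a list of bits; the function names mirror the pad10* and pad10*1 notation but are otherwise illustrative, and m denotes the message length in bits.

```python
def pad10star(r: int, m: int) -> list:
    """Simple padding: a 1 bit, then the fewest 0 bits reaching a multiple of r."""
    return [1] + [0] * ((-m - 1) % r)


def pad10star1(r: int, m: int) -> list:
    """Multirate padding: a 1 bit, the fewest 0 bits, then a closing 1 bit."""
    return [1] + [0] * ((-m - 2) % r) + [1]


r = 1088  # bitrate used with a 256-bit digest
for m in (0, r - 2, r - 1, r):
    print(m, len(pad10star(r, m)), len(pad10star1(r, m)))
```

Multirate padding needs at least two bits, so a message that is one bit short of a block boundary is extended by r + 1 bits.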
Figure 11.15 Sponge Construction
Figure 11.15 shows the iterated structure of the sponge function. The sponge construction operates on a state variable s of b = r + c bits, which is initialized to all zeros and modified at each iteration. The value r is called the bitrate. This value is the block size used to partition the input message. The term bitrate reflects the fact that r is the number of bits processed at each iteration: the larger the value of r, the greater the rate at which message bits are processed by the sponge construction.
The value c is referred to as the capacity. A discussion of the security implications of the capacity is beyond our scope. In essence, the capacity is a measure of the achievable complexity of the sponge construction and therefore the achievable level of security. A given implementation can increase claimed security and reduce speed by increasing the capacity c and decreasing the bitrate r accordingly, or vice versa. The default values for Keccak are c = 1024 bits, r = 576 bits, and therefore b = 1600 bits.
Table 11.5 SHA-3 Parameters
| Message Digest Size | 224 | 256 | 384 | 512 |
| Message Size | no maximum | no maximum | no maximum | no maximum |
| Block Size (bitrate r) | 1152 | 1088 | 832 | 576 |
| Word Size | 64 | 64 | 64 | 64 |
| Number of Rounds | 24 | 24 | 24 | 24 |
| Capacity c | 448 | 512 | 768 | 1024 |
| Collision Resistance | 2^112 | 2^128 | 2^192 | 2^256 |
| Second Preimage Resistance | 2^224 | 2^256 | 2^384 | 2^512 |
Note: All sizes and security levels are measured in bits.
Table 11.5 shows the supported values of r and c. As Table 11.5 shows, the hash function security associated with the sponge construction is a function of the capacity c.
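The rows of Table 11.5 follow a simple pattern that can be reproduced in code. The sketch below assumes, as each column of the table suggests, that the capacity is twice the digest size and that b = r + c = 1600; the function and key names are illustrative.

```python
B = 1600  # state width b = r + c, in bits


def sha3_parameters(digest_bits: int) -> dict:
    c = 2 * digest_bits  # capacity, matching each column of the table
    return {
        "bitrate_r": B - c,
        "capacity_c": c,
        "collision_bits": digest_bits // 2,      # exponent of the 2^x work factor
        "second_preimage_bits": digest_bits,     # equals c / 2
    }


for d in (224, 256, 384, 512):
    print(d, sha3_parameters(d))
```

Raising the capacity lowers the bitrate, so the longer digests process fewer message bits per call to f.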
Figure 11.16 SHA-3 State Matrix
We now examine the iteration function Keccak-f used to process each successive block of the input message. Recall that f takes as input a 1600-bit variable s consisting of r bits, corresponding to the message block size, followed by c bits, referred to as the capacity. For internal processing within f, the input state variable s is organized as a 5 × 5 × 64 array a. The 64-bit units are referred to as lanes. For our purposes, we generally use the notation a[x, y, z] to refer to an individual bit within the state array. When we are more concerned with operations that affect entire lanes, we designate the 5 × 5 matrix as L[x, y], where each entry in L is a 64-bit lane. The use of indices within this matrix is shown in Figure 11.16. Thus, the columns are labeled x = 0 through x = 4, the rows are labeled y = 0 through y = 4, and the individual bits within a lane are labeled z = 0 through z = 63.
The mapping between the bits of s and those of a is
s[64(5y + x) + z] = a[x, y, z]
We can visualize this with respect to the matrix in Figure 11.16. When treating the state as a matrix of lanes, the first lane in the lower left corner, L[0, 0], corresponds to the first 64 bits of s. The lane in the second column, lowest row, L[1, 0], corresponds to the next 64 bits of s. Thus, the array a is filled with the bits of s starting with row y = 0 and proceeding row by row.
SHA-3 Iteration Function f
Figure 11.17 SHA-3 Iteration Function f
The function f is executed once for each input block of the message
to be hashed. The function takes as input the 1600-bit state variable and converts
it into a 5 × 5 matrix of 64-bit lanes. This matrix then passes through 24 rounds of
processing. Each round consists of five steps, and each step updates the state matrix by permutation or substitution operations. As shown in Figure 11.17, the rounds are identical with the exception of the final step in each round, which is modified by a round constant that differs for each round.
Table 11.6 Step Functions in SHA-3
| Function | Type | Description |
| θ | Substitution | New value of each bit in each word depends on its current value and on one bit in each word of preceding column and one bit of each word in succeeding column. |
| ρ | Permutation | The bits of each word are permuted using a circular bit shift. L[0, 0] is not affected. |
| π | Permutation | Words are permuted in the 5 × 5 matrix. L[0, 0] is not affected. |
| χ | Substitution | New value of each bit in each word depends on its current value and on one bit in next word in the same row and one bit in the second next word in the same row. |
| ι | Substitution | L[0, 0] is updated by XOR with a round constant. |
Table 11.6 summarizes the operation of the five steps. The steps have a simple
description leading to a specification that is compact and in which no trapdoor
can be hidden. The operations on lanes in the specification are limited to bitwise
Boolean operations (XOR, AND, NOT) and rotations. There is no need for table
lookups, arithmetic operations, or data-dependent rotations. Thus, SHA-3 is easily
and efficiently implemented in either hardware or software.
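To make the five steps concrete, the sketch below implements the iteration function on a 5 × 5 matrix of lanes and wraps it in a sponge with r = 1088 to reproduce the 256-bit variant, checking the result against hashlib. It is an unoptimized illustration under a few assumptions not spelled out above: messages are byte-aligned, lanes are loaded in little-endian byte order, and the two domain-separation bits that FIPS 202 places before the multirate padding are folded into the first padding byte (0x06). All function names are illustrative.

```python
import hashlib

MASK = (1 << 64) - 1

# Round constants, copied from Table 11.8.
RC = [
    0x0000000000000001, 0x0000000000008082, 0x800000000000808A, 0x8000000080008000,
    0x000000000000808B, 0x0000000080000001, 0x8000000080008081, 0x8000000000008009,
    0x000000000000008A, 0x0000000000000088, 0x0000000080008009, 0x000000008000000A,
    0x000000008000808B, 0x800000000000008B, 0x8000000000008089, 0x8000000000008003,
    0x8000000000008002, 0x8000000000000080, 0x000000000000800A, 0x800000008000000A,
    0x8000000080008081, 0x8000000000008080, 0x0000000080000001, 0x8000000080008008,
]

# Rotation amounts for rho, computed as in Table 11.7; ROT[0][0] stays 0.
ROT = [[0] * 5 for _ in range(5)]
x, y = 1, 0
for t in range(24):
    ROT[x][y] = ((t + 1) * (t + 2) // 2) % 64
    x, y = y, (2 * x + 3 * y) % 5


def rotl(v, n):
    return ((v << n) | (v >> (64 - n))) & MASK


def keccak_f(L):
    """Apply 24 rounds of theta, rho, pi, chi, iota to the lanes L[x][y]."""
    for rc in RC:
        # theta: mix each lane with the parities of two neighboring columns
        C = [L[x][0] ^ L[x][1] ^ L[x][2] ^ L[x][3] ^ L[x][4] for x in range(5)]
        D = [C[(x - 1) % 5] ^ rotl(C[(x + 1) % 5], 1) for x in range(5)]
        L = [[L[x][y] ^ D[x] for y in range(5)] for x in range(5)]
        # rho and pi: rotate each lane, then move it to a new position
        B = [[0] * 5 for _ in range(5)]
        for x in range(5):
            for y in range(5):
                B[y][(2 * x + 3 * y) % 5] = rotl(L[x][y], ROT[x][y])
        # chi: combine each lane with the next two lanes in its row
        L = [[B[x][y] ^ (~B[(x + 1) % 5][y] & B[(x + 2) % 5][y]) for y in range(5)]
             for x in range(5)]
        # iota: XOR the round constant into L[0, 0]
        L[0][0] ^= rc
    return L


def sha3_256(message):
    rate = 136  # r = 1088 bits
    data = bytearray(message) + b"\x06"   # domain bits 01, then first pad bit
    data += bytes(-len(data) % rate)
    data[-1] |= 0x80                      # closing 1 bit of pad10*1
    L = [[0] * 5 for _ in range(5)]
    for off in range(0, len(data), rate):
        for i in range(rate // 8):        # lane i sits at x = i % 5, y = i // 5
            lane = int.from_bytes(data[off + 8 * i:off + 8 * i + 8], "little")
            L[i % 5][i // 5] ^= lane
        L = keccak_f(L)
    return b"".join(L[i % 5][i // 5].to_bytes(8, "little") for i in range(4))


print(sha3_256(b"abc") == hashlib.sha3_256(b"abc").digest())
```

Only XOR, AND, NOT, and rotations appear in `keccak_f`, which matches the observation that no table lookups or arithmetic operations are needed.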
Figure 11.18 Theta and Chi Step Functions
Figure 11.18a illustrates the theta operation on L[3, 2]. The same operation is performed on all of the other lanes in the matrix.
Table 11.7 Rotation Values Used in SHA-3 (1 of 2)
(a) Calculation of values and positions
| t | g(t) | g(t) mod 64 | x, y |
| 0 | 1 | 1 | 1, 0 |
| 1 | 3 | 3 | 0, 2 |
| 2 | 6 | 6 | 2, 1 |
| 3 | 10 | 10 | 1, 2 |
| 4 | 15 | 15 | 2, 3 |
| 5 | 21 | 21 | 3, 3 |
| 6 | 28 | 28 | 3, 0 |
| 7 | 36 | 36 | 0, 1 |
| 8 | 45 | 45 | 1, 3 |
| 9 | 55 | 55 | 3, 1 |
| 10 | 66 | 2 | 1, 4 |
| 11 | 78 | 14 | 4, 4 |
| 12 | 91 | 27 | 4, 0 |
| 13 | 105 | 41 | 0, 3 |
| 14 | 120 | 56 | 3, 4 |
| 15 | 136 | 8 | 4, 3 |
| 16 | 153 | 25 | 3, 2 |
| 17 | 171 | 43 | 2, 2 |
| 18 | 190 | 62 | 2, 0 |
| 19 | 210 | 18 | 0, 4 |
| 20 | 231 | 39 | 4, 2 |
| 21 | 253 | 61 | 2, 4 |
| 22 | 276 | 20 | 4, 1 |
| 23 | 300 | 44 | 1, 1 |
Table 11.7 shows the calculations that are performed to determine the amount of the bit shift and the location of each bit shift value. Note that all of the rotation amounts are different.
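The calculation in Table 11.7 is short enough to reproduce directly. The sketch below assumes the rule given in the table's note: the rotation for step t is g(t) mod 64 with g(t) = (t + 1)(t + 2)/2, and the lane position starts at (x, y) = (1, 0) and advances to (y, (2x + 3y) mod 5). The function name is illustrative.

```python
def rho_offsets() -> dict:
    """Return {(x, y): rotation} for the 24 lanes other than (0, 0)."""
    offsets = {}
    x, y = 1, 0
    for t in range(24):
        offsets[(x, y)] = ((t + 1) * (t + 2) // 2) % 64   # g(t) mod 64
        x, y = y, (2 * x + 3 * y) % 5                     # next lane position
    return offsets


offsets = rho_offsets()
offsets[(0, 0)] = 0  # L[0, 0] is not rotated

# Print by matrix position, with y = 4 on top.
for y in range(4, -1, -1):
    print(f"y = {y}: " + " ".join(f"{offsets[(x, y)]:2d}" for x in range(5)))
```

Checking `len(set(offsets.values()))` confirms that no two lanes share a rotation amount.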
Table 11.7 Rotation Values Used in SHA-3 (2 of 2)
(b) Rotation values by word position in matrix
|  | x = 0 | x = 1 | x = 2 | x = 3 | x = 4 |
| y = 4 | 18 | 2 | 61 | 56 | 14 |
| y = 3 | 41 | 45 | 15 | 21 | 8 |
| y = 2 | 3 | 10 | 43 | 25 | 39 |
| y = 1 | 36 | 44 | 6 | 55 | 20 |
| y = 0 | 0 | 1 | 62 | 28 | 27 |
Figure 11.19 Pi Step Function
Table 11.8 Round Constants in SHA-3
| Round | Constant (hexadecimal) | Number of 1 bits |
| 0 | 0000000000000001 | 1 |
| 1 | 0000000000008082 | 3 |
| 2 | 800000000000808A | 5 |
| 3 | 8000000080008000 | 3 |
| 4 | 000000000000808B | 5 |
| 5 | 0000000080000001 | 2 |
| 6 | 8000000080008081 | 5 |
| 7 | 8000000000008009 | 4 |
| 8 | 000000000000008A | 3 |
| 9 | 0000000000000088 | 2 |
| 10 | 0000000080008009 | 4 |
| 11 | 000000008000000A | 3 |
| 12 | 000000008000808B | 6 |
| 13 | 800000000000008B | 5 |
| 14 | 8000000000008089 | 5 |
| 15 | 8000000000008003 | 4 |
| 16 | 8000000000008002 | 3 |
| 17 | 8000000000000080 | 2 |
| 18 | 000000000000800A | 3 |
| 19 | 800000008000000A | 4 |
| 20 | 8000000080008081 | 5 |
| 21 | 8000000000008080 | 3 |
| 22 | 0000000080000001 | 2 |
| 23 | 8000000080008008 | 4 |
Table 11.8 lists the 24 64-bit round constants. Note that the Hamming weight, or number of 1 bits, in the round constants ranges from 1 to 6. Most of the bit positions are zero and thus do not change the corresponding bits in L[0, 0].
Summary
Summarize the applications of cryptographic hash functions
Explain why a hash function used for message authentication needs to be secured
Understand the operation of SHA-512
Understand the differences among preimage resistant, second preimage resistant, and collision resistant properties
Present an overview of the basic structure of cryptographic hash functions
Describe how cipher block chaining can be used to construct a hash function
Chapter 11 summary.
Copyright
This work is protected by United States copyright laws and is provided solely for the use of instructors in teaching their courses and assessing student learning. Dissemination or sale of any part of this work (including on the World Wide Web) will destroy the integrity of the work and is not permitted. The work and materials from it should never be made available to students except by instructors using the accompanying text in their classes. All recipients of this work are expected to abide by these restrictions and to honor the intended pedagogical purposes and the needs of other instructors who rely on these materials.
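Note (Table 11.7): $g(t) = (t + 1)(t + 2)/2$ and $\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 2 & 3 \end{pmatrix}^{t} \begin{pmatrix} 1 \\ 0 \end{pmatrix} \bmod 5$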