H.W
The Data Link Layer Chapter 3
CN5E by Tanenbaum & Wetherall, © Pearson Education-Prentice Hall and D. Wetherall, 2011
• Data Link Layer Design Issues (in PART-1)
• Error Detection and Correction (in PART-1)
• Elementary Data Link Protocols (in PART-2)
• Sliding Window Protocols (in PART-2)
• Example Data Link Protocols (in PART-2)
Revised: August 2011
The Data Link Layer
CN5E by Tanenbaum & Wetherall, © Pearson Education-Prentice Hall and D. Wetherall, 2011
Responsible for delivering frames of
information over a single link
• Handles transmission errors and
regulates the flow of data Physical
Link
Network
Transport
Application
Data Link Layer Design Issues
CN5E by Tanenbaum & Wetherall, © Pearson Education-Prentice Hall and D. Wetherall, 2011
• Frames »
• Possible services »
• Framing methods »
• Error control »
• Flow control »
Frames
CN5E by Tanenbaum & Wetherall, © Pearson Education-Prentice Hall and D. Wetherall, 2011
Link layer accepts packets from the network layer, and
encapsulates them into frames that it sends using the
physical layer; reception is the opposite process
Actual data path
Virtual data path
Network
Link
Physical
Possible Services
CN5E by Tanenbaum & Wetherall, © Pearson Education-Prentice Hall and D. Wetherall, 2011
Acknowledged connection-oriented service
• Dedicated connection (data channel) is set up (there are specific
protocols for connection set-up and disconnection) before data is
transmitted from transmitter (tx) to receiver (rx); rare.
• Each frame sent by tx is numbered and the rx replies with an ack
when it receives a frame.
Unacknowledged connectionless service (e.g., Ethernet)
• No connection established beforehand.
• No ack sent by rx.
• Frame is sent with no connection / error recovery
Acknowledged connectionless service
• Frame is sent with retransmissions if needed.
• Example is 802.11
Framing Methods
CN5E by Tanenbaum & Wetherall, © Pearson Education-Prentice Hall and D. Wetherall, 2011
Key Question: How does the rx know where is the
start of a certain frame and where does it end?
• Byte count »
• Flag bytes with byte stuffing »
• Flag bits with bit stuffing »
• Physical layer coding violations
− Use non-data symbol to indicate frame
Framing – Byte count
CN5E by Tanenbaum & Wetherall, © Pearson Education-Prentice Hall and D. Wetherall, 2011
Frame begins with a count of the number of bytes in it
(length field). Note that the number of bytes in a frame
includes the length field.
• Simple, but difficult to resynchronize after an error
Error
case
Expected
case
Framing – Byte stuffing
CN5E by Tanenbaum & Wetherall, © Pearson Education-Prentice Hall and D. Wetherall, 2011
Special flag bytes delimit frames; occurrences of flags in
the data must be stuffed (escaped).
• Two consecutive FLAG bytes demarcate the end of a frame
and start of the next frame.
• Longer, but easy to resynchronize after error (look for next
occurrence of two successive FLAG bytes.
Stuffing
examples
Frame
format
Escape byte
stuffed.
Framing – Bit stuffing
CN5E by Tanenbaum & Wetherall, © Pearson Education-Prentice Hall and D. Wetherall, 2011
Stuffing done at the bit level:
• Each frame begins and ends with a special bit pattern 0 1 1 1 1 1
1 0 (or 0x7E in hex, the key is 6 consecutive1‟s).
• What if six consecutive 1‟s appear in the main data
stream?
− At tx, after five 1‟s in the data, a 0 is added (stuffed)
− At rx, a 0 after five 1‟s is deleted
Transmitted bits
with stuffing
Data bits
Framing – Practice
CN5E by Tanenbaum & Wetherall, © Pearson Education-Prentice Hall and D. Wetherall, 2011
In practice, combinations of the previously discussed
approaches for used.
For example, in Ethernet and 802.11, a frame begins
with a well defined pattern known as a preamble. The
preamble is followed by a „length (count) field‟ which is
used to locate the end of the frame.
• In 802.11, the preamble can be as long as 72 bits.
Error Control
CN5E by Tanenbaum & Wetherall, © Pearson Education-Prentice Hall and D. Wetherall, 2011
Error control repairs frames that are received in error
• Requires errors to be detected at the receiver
• Typical strategy: retransmit a frame which has not
been acknowledged.
• Timer protects against lost acknowledgements (this
is the case when the rx does send an ack, but the
ack is lost).
Detecting errors and retransmissions are next topics.
Flow Control
CN5E by Tanenbaum & Wetherall, © Pearson Education-Prentice Hall and D. Wetherall, 2011
Prevents a fast sender from out-pacing a slow receiver
(buffer overflow at the rx)
• Receiver gives feedback on the data it can accept
• Rare in the link layer as NICs (network interface
card) run at “wire speed”
− Receiver can take data as fast as it can be sent
Flow control is a topic in the Link and Transport layers.
Error Detection and Correction
CN5E by Tanenbaum & Wetherall, © Pearson Education-Prentice Hall and D. Wetherall, 2011
Error codes add structured redundancy to data so errors can be detected, and possibly also corrected.
Error correction codes:
• Hamming codes (we will only cover this) »
• Binary convolutional codes »
• Reed-Solomon and Low-Density Parity Check codes − Mathematically complex, widely used in real systems
Error detection codes:
• Parity »
• Checksums »
• Cyclic redundancy codes (CRC check) »
Error Models
CN5E by Tanenbaum & Wetherall, © Pearson Education-Prentice Hall and D. Wetherall, 2011
• Isolated single bit errors – the probability that any bit is in error is independent of the probability that other bits in the frame are in
error; e.g, if the frame is [1 0 0 1 1 0 1], and if the bit error
probability is 0.001, the probability that the 7th bit is in error is still
0.001, even if the first 6 bits are in error. (you can generalize it to
isolated 2 bit errors, 3 bit errors, etc.)
• Burst errors (more common) – the probability that any bit is in error is dependent on the location of the first bit which is in error
and the burst length, B; e.g, if the frame is [1 0 0 1 1 0 1], an
error burst of length B = 7 means that if the first bit is in error, all
seven bits in the frame are in error.
• Error detection codes are characterized by how well they can detect isolated and bursty bit errors.
Error Bounds – Hamming distance
CN5E by Tanenbaum & Wetherall, © Pearson Education-Prentice Hall and D. Wetherall, 2011
Code turns data of n bits into codewords of n+k bits (i.e., no. of
redundant bits added = k)
The number of 1‟s in a codeword (bit string) is called the Hamming
Weight of the codeword.
Hamming Distance (d) between two codewords (bit strings) is the
number of bit flips required to turn one codeword into the other one
(or, in other words, the number of positions in which the two
codewords differ).
− For example, the Hamming Distance between [01100] and [10101] is 3. You
can find this by performing a bitwise XOR (exclusive OR) and then adding up
the result.
− 0 (XOR) 0 = 0, 1 (XOR) 1 = 0, 0 (XOR) 1 = 1, 1 (XOR) 0 = 1
0 1 1 0 0
1 0 1 0 1
XOR 1 1 0 0 1 sum = 3
Error Bounds – Hamming distance
CN5E by Tanenbaum & Wetherall, © Pearson Education-Prentice Hall and D. Wetherall, 2011
A set of valid codewords is called a dictionary. The Hamming
Distance of a dictionary is the minimum of the Hamming Distances
between any two codewords in the dictionary.
− For example, suppose the dictionary has four codewords {01100, 10101,
11111, 00001}. Verify that the Hamming Distance of this dictionary is 2 (you
have to compute 6 Hamming Distances - between codewords (1,2), (1,3),
(1,4), (2,3), (2,4), and (3,4) - and then find their minimum).
• Example with a dictionary of 4 codewords of 10 bits
(n=2, k=8) each, called a (10,2) code (in general,
(n+k,n) code). Recall that the original number of data
bits = k, which turns into n+k after coding. − Dictionary: {0000000000, 0000011111, 1111100000, 1111111111}
− Hamming Distance is d=5 (take any pair of codewords, perform a
bitwise XOR, and add up – the answer will be at least 5).
Error Bounds – Hamming distance
CN5E by Tanenbaum & Wetherall, © Pearson Education-Prentice Hall and D. Wetherall, 2011
Bounds for a code with distance:
• 2d+1 – can correct d errors
− If you want to correct up to „d‟ (say d = 5) number of errors,
the Hamming Distance between any two codewords in the
dictionary must be at least 11.
• d+1 – can detect d errors
− If you want to detect up to „d‟ (say d = 5) number of errors,
the Hamming Distance between any two codewords in the
dictionary must be at least 6. This is a less stringent
requirement than maintaining a Hamming distance of 11 for
error correction.
• Error correction is a heavier task than error
detection. Detection is the first step, then you can
correct.
Error Bounds – Hamming distance
CN5E by Tanenbaum & Wetherall, © Pearson Education-Prentice Hall and D. Wetherall, 2011
An alternate way of stating the contents in the previous
slide is as follows:
• If the minimum Hamming Distance between any two
codewords in a dictionary = d, then:
− you can correct up to number of errors. For our
example, d = 5, therefore you can correct up to 2 errors.
− you can detect up to „d-1‟ number of errors. For our
example, d = 5, therefore you can detect up to 4 errors.
2
1d
Hamming Distance and Error Detection
CN5E by Tanenbaum & Wetherall, © Pearson Education-Prentice Hall and D. Wetherall, 2011
Why can a code with distance d+1 detect up to d errors?
• Both tx and rx know the dictionary of valid codewords. In our
previous example, the dictionary is {0000000000, 0000011111,
1111100000, 1111111111}. The Hamming Distance for this
dictionary is d = 5.
• Rx detects an error when it receives an invalid codeword (e.g.
1100000000).
• If no. of errors 5, then one valid codeword may be turned into
another valid codeword and there is no way to detect that an error
has occurred.
− For example, with 5 strategically placed bit flips, codeword 1 (0000000000 )
can be converted into codeword 2 (0000011111). And then, you don‟t know
whether codeword 2 was sent and received with no errors, or codeword 1 was
sent which was received with 5 errors.
Hamming Distance and Error Correction
CN5E by Tanenbaum & Wetherall, © Pearson Education-Prentice Hall and D. Wetherall, 2011
Why can a code with distance 2d+1 correct up to d
errors?
• Because errors are corrected by mapping a received invalid codeword to
the nearest valid codeword, i.e., the one that can be reached with the
fewest bit flips (errors). − In other words, if the invalid codeword can be reached from valid codeword 1 with „m‟ bit
flips or from valid codeword 2 with „n‟ bit flips (m < n), we map the invalid codeword back
to valid codeword 1. That is, the probability that the channel made „m‟ (say 4) errors is
higher than it made „n‟ (say 5) errors. The error is therefore „corrected‟ in a probabilistic
sense.
− If the channel flips each bit independently (meaning, what it does to the current bit has no
bearing to what it has done with the previous bits) with a probability equal to 0.001, then
the probability that it has flipped specific 4 bits out of 10 bits (say) = (0.001)4 (1-0.001)6
10-12 and the probability that it has flipped specific 5 bits out of 10 = (0.001)5 (1-0.001)5
10-15. Which one is more likely to have happened?
• If there are more than bit flips, the received codeword may be
closer to another valid codeword than the codeword that was sent. − For example, sending 0000000000 with 2 flips might give 1100000000 which is closest (in
terms of Hamming Distance) to 0000000000, correcting the error. But with 3 flips,
1110000000 might be received, which is closest to 1111100000, which is an error.
2
1d
Hamming Distance and Error Correction
CN5E by Tanenbaum & Wetherall, © Pearson Education-Prentice Hall and D. Wetherall, 2011
The strength of a dictionary (i.e., its error correction
capability) is dictated by the two closest codewords in the
dictionary. Suppose, those two codewords are i and j.
Codeword i Codeword j
with „d‟ bit flips, you end here with „d‟ bit flips, you end here
For error correction, the magenta and yellow (representing
invalid codewords) must still be separated by at least one
Hamming Distance. If they coincide, you won‟t know where
the invalid codeword came from. Therefore, we need a
Hamming Distance of 2d+1 to correct up to d number of bit
errors.
d d1
Odd and Even Parity
CN5E by Tanenbaum & Wetherall, © Pearson Education-Prentice Hall and D. Wetherall, 2011
A bit string has odd parity if the number of 1‟s in the
string is odd. Conversely, the bit string has even parity if
the number of 1‟s in the string is even (0 is an even
number).
− 01100, 000, 11001001 are all even parity strings.
− 1000011, 1, 00010 are all odd parity strings.
− ASCII representations of characters use a parity bit. For
example, the ASCII representation of the character „a‟ is
01100001 (a byte), the rightmost bit being the parity bit. Also,
the ASCII representation of the character „A‟ is 01000001, the
rightmost bit being the parity bit. Similarly, the representations
for „m‟ and „M‟ are 01101101 and 01001101 respectively. So,
what parity (even or odd?) scheme does ASCII implement for
characters?
− The complete ASCII table can be found here:
http://www.ascii-code.com/
Parity Check and Parity Bit (1)
CN5E by Tanenbaum & Wetherall, © Pearson Education-Prentice Hall and D. Wetherall, 2011
Suppose we are transmitting blocks of n bits and for every block of n
bits, we append a parity bit at the end, such that the encoded
codeword has even parity (could also be odd parity).
If the rx knows that an even parity encoding scheme is employed by
the tx, it can detect any odd number (1,3,5,7, …) of errors (by parity
mismatch) and request retransmission. But any even number of
errors will go undetected (no parity mismatch in this case).
− Example: before encoding [10110] (here n=5), after encoding
[101101], the last bit is the parity bit which ensures even parity of the
codeword. You should experiment with arbitrarily placed 1, 3, or 5
errors (could also affect the parity bit positions) and verify even
parity mismatch. Note that the parity bit would have been 0 if an odd
parity scheme were adopted.
− Example: before encoding [10111], after encoding [101110] for even
parity or [101111] for odd parity.
− You cannot correct for any error, not even one, because you cannot
localize the error.
Parity Check and Parity Bit (2)
CN5E by Tanenbaum & Wetherall, © Pearson Education-Prentice Hall and D. Wetherall, 2011
Suppose the length of all codewords after adding a parity bit at the end
is „n‟. Then, the probability that an error phenomenon goes undetected
= probability that there are 2 errors + probability that there are 4 errors
+ probability that there are 6 errors + …
Assuming that the channel flips each bit independently of the past (or
future), the probability that there are two (any two) bit flips = ,
where p = probability that the channel flips any bit (bit error probability)
and (n choose 2) is the number of ways you can choose 2
bits out of n. This probability form is known as binomial probability.
Similarly, the probability that there are four bit flips = .
Suppose n = 7 and p = 0.001. Then, the probability that an error
phenomenon (2 errors, or 4 errors, or 6 errors) goes undetected is
The probability that an error phenomenon is detected is:
22 )1(
2
n pp
n
)!2(!2
!
2
n
nn
44 )1(
4
n pp
n
5062442 101.2)001.01(001.0
6
6 )001.01(001.0
4
6 )001.01(001.0
2
6
0.99998)001.01(001.0 6
6 )001.01(001.0
4
6 )001.01(001.0
2
6 1
062442
Multiple Parity Check Bits (1)
CN5E by Tanenbaum & Wetherall, © Pearson Education-Prentice Hall and D. Wetherall, 2011
Of course, we could add multiple parity check bits to improve error
detection/correction performance. Suppose our data is 4 bits [d1, d2, d3,
d4] = [1 1 1 0], and we choose to add three check bits at the end [c1,c2,c3].
The encoded codeword is [d1,d2,d3,d4,c1,c2,c3] = [1 1 1 0 0 0 1], where the
check bits are computed as follows ( denotes modulo 2 addition or XOR
operation). This is not the only way, but the check bits must „cover‟ all
data bits. Note that the rx knows how the check bits are computed at tx.
− c1 = d1 d3 d4 = 1 1 0 = 0 0 = 0 (since 0 0 = 0, 1 1 = 0 and 0 1 = 1)
− c2 = d1 d2 d4 = 1 1 0 = 0
− c3 = d1 d2 d3 = 1 1 1 = 0 1 = 1
With this scheme, you can now correct one error. Suppose the rx receives [1 0 1 0
0 0 1] - notice that bit 2 has been flipped. When the rx verifies the parities, c1 will
work out fine, c2 will show an error ( either d1 or d2 or d4 is in error, which further
reduces to either d2 or d4 is in error since c1 worked out fine), and c3 will show an
error ( either d1 or d2 or d3 is in error, which further reduces to d2 must be in
error). Can you argue similarly when bit 1 is flipped?
Multiple Parity Check Bits (2)
CN5E by Tanenbaum & Wetherall, © Pearson Education-Prentice Hall and D. Wetherall, 2011
A formal prescription for localizing the errored bit involves computing the
syndrome from the received codeword. It will help if we look at the tx
operation in terms of a matrix multiplication (except additions are modulo 2):
321421431,4,3,2,14,3,2,1
,,
0111000
1010100
1100010
1110001
ddddddddddddddddd
4 x 4
identity
matrix 1‟s in 1st,
3rd and 4th
rows
1‟s in 1st,
2nd and 4th
rows
1‟s in 1st,
2nd and 3rd
rows
GENERATOR
MATRIX
G = [I44, P43]
CODEWORD, AFTER ENCODINGDATA VECTOR
Multiple Parity Check Bits (3)
CN5E by Tanenbaum & Wetherall, © Pearson Education-Prentice Hall and D. Wetherall, 2011
Here‟s another example: data vectors are of length 5 [d1, d2, d3, d4, d5] and
there are two check bits [c1,c2] computed as follows:
− c1 = d1 d3 d5
− c2 = d2 d4
425315,4,3,2,154,3,2,1
,,
0110000
1001000
0100100
1000010
0100001
, ddddddddddddddd
5 x 5
identity
matrix
1‟s in 1st,
3rd and 5th
rows
1‟s in 2nd and
4th rows
GENERATOR
MATRIX
G = [I55, P52]
Multiple Parity Check Bits (4)
CN5E by Tanenbaum & Wetherall, © Pearson Education-Prentice Hall and D. Wetherall, 2011
Observe that the general rule for computing the generator matrix for data
vectors of length „n‟ is:
− Write down an n n identity matrix, I
− Write down the n k matrix P, where k = no. of check bits. This matrix dictates
which data bits are „covered‟ by which check bits.
− Then, G = [Inn, Pnk] i.e., P is concatenated to I horizontally. The size of matrix G is
therefore n (n+k).
− Since the data vector d is of size 1 n, the encoded data vector is given by d G
(with modulo 2 addition), and is of size 1 (n+k).
The receiver first computes the parity check matrix C from the generator
matrix G (which it knows) as follows: C = [PT, Ikk], where P T denotes the
transpose of P and is of size k n. Note that C is of size k (n+k).
Next, it computes the product s = C rT, where r is the 1 (n+k) row
vector received at the receiver and rT denotes the transpose of r. It is
clear that the size of s is k 1. This vector s is known as the syndrome
vector. If there are no errors, we will have s = 0 (zero vector). If there‟s an
error (one bit), we can figure out its location from the syndrome vector.
Multiple Parity Check Bits (5)
CN5E by Tanenbaum & Wetherall, © Pearson Education-Prentice Hall and D. Wetherall, 2011
Let‟s do an example. The generator matrix in slide 26 is (recall that n =
length of data vector = 4 and k = no. of check bits = 3):
The parity check matrix is therefore:
0111
1011
1101
011
101
110
111
0111000
1010100
1100010
1110001
T PPG
1000111
0101011
0011101
C
PT Identity
matrix
Identity
matrix
Multiple Parity Check Bits (6)
CN5E by Tanenbaum & Wetherall, © Pearson Education-Prentice Hall and D. Wetherall, 2011
Let‟s revisit slide 25. The encoded codeword is d = [1 1 1 0 0 0 1].
Case 1 (no error): In this case, the receiver receives r = [1 1 1 0 0 0 1].
The syndrome vector is all zero:
0
0
0
1000111
0000011
0000101
1
0
0
0
1
1
1
1000111
0101011
0011101
s
Instead of performing 1 0 1 0 0 0 0 = 0, you could perform a
regular addition, giving you 2, divide that by 2, and output the remainder
(which is 0) as the final answer.
Similarly, instead of performing 1 1 1 0 1 1 0 = 1 (say), you
could perform a regular addition, giving you 5, divide that by 2, and output the
remainder (which is 1) as the final answer.
Dividing by 2 and outputting the remainder is „modulo 2 addition‟.
Multiple Parity Check Bits (7)
CN5E by Tanenbaum & Wetherall, © Pearson Education-Prentice Hall and D. Wetherall, 2011
The encoded codeword is d = [1 1 1 0 0 0 1].
Case 2 (second bit in error): In this case, the receiver receives r = [1 0 1 0
0 0 1]. The syndrome vector is:
Next, the rx looks for that column in the parity check matrix C which is
identical to the syndrome vector. In this case, the second column of C is
identical to the syndrome vector, and therefore, it concludes that the error
is in the second bit of r and simply flips it back to what was sent. That this
logic is always valid can be proven mathematically, but we will not do that.
1
1
0
1000101
0000001
0000101
1
0
0
0
1
0
1
1000111
0101011
0011101
s
Multiple Parity Check Bits (8)
CN5E by Tanenbaum & Wetherall, © Pearson Education-Prentice Hall and D. Wetherall, 2011
The encoded codeword is d = [1 1 1 0 0 0 1].
Case 3 (seventh bit in error): In this case, the receiver receives r = [1 1 1
0 0 0 0]. The syndrome vector is:
This syndrome vector appears in the seventh column of the parity check
matrix, and therefore, the error is in the seventh bit. Again, error
localization successfully done.
1
0
0
0000111
0000011
0000101
0
0
0
0
1
1
1
1000111
0101011
0011101
s
2-D Parity Check Bits (1)
CN5E by Tanenbaum & Wetherall, © Pearson Education-Prentice Hall and D. Wetherall, 2011
Data need not be one dimensional (vector). Images are 2-D, and 2-D
parity check schemes can be used. Alternately, you can even reshape a
data vector (say of length 16) into a 44 matrix and use a 2-D parity
check scheme. The recipe is as follows:
• Add single parity bit to each column.
• Add a final “parity” column
1 0 0 1 0 0
0 1 0 0 0 1
1 0 0 1 0 0
1 1 0 1 1 0
1 0 0 1 1 1
Bottom row consists of
(even) parity check bits for
each column
Last column consists
of (even) parity check
bits for each row
(even) parity bit for row
and column parity bits
2-D Parity Check Bits (2)
CN5E by Tanenbaum & Wetherall, © Pearson Education-Prentice Hall and D. Wetherall, 2011
1 0 0 1 0 0
0 0 0 1 0 1
1 0 0 1 0 0
1 0 0 0 1 0
1 0 0 1 1 1
1 0 0 1 0 0
0 0 0 0 0 1
1 0 0 1 0 0
1 0 0 1 1 0
1 0 0 1 1 1
1 0 0 1 0 0
0 0 0 1 0 1
1 0 0 1 0 0
1 0 0 1 1 0
1 0 0 1 1 1
1 0 0 1 0 0
0 0 0 0 0 1
1 0 0 1 0 0
1 1 0 1 1 0
1 0 0 1 1 1
Arrows indicate failed check bits
Two errorsOne error
Three
errors Four errors (undetectable)
1, 2, or 3 errors can
always be detected; Not
all patterns involving > 4
errors can be detected
2-D Parity Check Bits (3)
CN5E by Tanenbaum & Wetherall, © Pearson Education-Prentice Hall and D. Wetherall, 2011
Can 2-D codes correct any error? Consider an 1- bit error pattern.
1 0 0 1 0
0 1 0 0 1
0 0 1 1 0
1 1 0 1 1
0 0 1 1 0
0 0 0 1 0
0 1 0 0 1
0 0 1 1 0
1 1 0 1 1
0 0 1 1 0
parity mismatch
parity mismatchmust be in error
Error Correction – Hamming Code (1)
CN5E by Tanenbaum & Wetherall, © Pearson Education-Prentice Hall and D. Wetherall, 2011
A Hamming code prescribes a certain way of inserting
multiple parity check bits (the locations and how to
compute them). The check bits are generally interleaved
with the data bits (as opposed to appending at the end).
Interleaving with the data bits affords better protection
against burst errors.
Several popular Hamming Codes are (7,4), (15,11),
(31,26) and (63,57). The notation (15,11) means that 4
check bits are provided for data vectors of length 11 so
that the encoded codeword is of length 15.
Example – (7,4) Hamming Code
CN5E by Tanenbaum & Wetherall, © Pearson Education-Prentice Hall and D. Wetherall, 2011
Error Detection – Interleaved Parity Bits
CN5E by Tanenbaum & Wetherall, © Pearson Education-Prentice Hall and D. Wetherall, 2011
As mentioned before, interleaving of N parity bits can
detect burst errors up to length N. Column 1 is
transmitted first, then the second, the third, etc.
Error Detection – Checksums (1)
CN5E by Tanenbaum & Wetherall, © Pearson Education-Prentice Hall and D. Wetherall, 2011
Several Internet protocols use check bits to detect
errors in the header (or in the header and data).
A checksum is calculated for header contents and
included in a special field.
Checksum recalculated at every router, so algorithm
selected for ease of implementation in software
Let header consist of L number of 16-bit words,
[b0, b1, b2, ..., bL-1]
The algorithm appends a 16-bit checksum bL
[b0, b1, b2, ..., bL-1, bL]
Error Detection – Checksums (2)
CN5E by Tanenbaum & Wetherall, © Pearson Education-Prentice Hall and D. Wetherall, 2011
The checksum bL is calculated as follows:
Treating each 16-bit word as an integer, find
x = (b0 + b1 + b2+ ...+ bL-1) modulo 2 16-1
The checksum is then given by:
bL = - x modulo 2 16 -1
Thus, the headers must satisfy the following pattern:
0 = (b0 + b1 + b2+ ...+ bL-1 + bL ) modulo 2 16 -1
The checksum calculation is carried out in software using
one‟s complement arithmetic. N-bit burst errors can be
detected.
Error Detection – Checksums (3)
CN5E by Tanenbaum & Wetherall, © Pearson Education-Prentice Hall and D. Wetherall, 2011
Using Modulo Arithmetic (can also be done using
binary arithmetic and 1‟s complement) − Assume 4-bit words
− Use modulo 24-1 arithmetic
− Suppose the field to protect has 2 words b0 and b1
− We will add the checksum b2
− Data word b0 = 1100 = 12 (in decimal)
− Data word b1 = 1010 = 10 (in decimal)
− b0+b1 = (12+10) modulo (2 4-1) = 22 modulo15 = 7
− The checksum b2 = -7 modulo 15 = 8 = 1000 (in binary)
− If there‟s no errors, we will always have b0+b1+b2 modulo 15 = 0
Error Detection – CRCs (1)
CRC stands for „Cyclic Redundancy Check‟. Also known as
polynomial codes.
Most data communications standards use polynomial
codes for error detection.
Polynomial codes also basis for powerful error-correction
methods.
Polynomial arithmetic used (modulo 2) : subtraction is the
same as addition, i.e, 0-1 = 0+1 = 1 and 1-1 = 1+1 = 0
Implemented using shift-register circuits.
CN5E by Tanenbaum & Wetherall, © Pearson Education-Prentice Hall and D. Wetherall, 2011
Error Detection – CRCs (2)
Binary vectors can be mapped to polynomials.
CN5E by Tanenbaum & Wetherall, © Pearson Education-Prentice Hall and D. Wetherall, 2011
[dk-1 , dk-2 ,…, d2 , d1 , d0] dk-1x k-1 + dk-2x
k-2 + … + d2x 2 + d1x + d0
e.g., [1 1 1] x2 + x + 1 and [1 1 0 1] x3 + x2 + 1 .
Note that the degree of polynomial = length of vector – 1.
(x + 1) (x2 + x + 1) = x(x2 + x + 1) + 1(x2 + x + 1)
= (x3 + x2 + x) + (x2 + x + 1)
= x3 + 1
Addition (or, subtraction):
Multiplication:
(x7 + x6 + 1) + (x6 + x5) = x7 + x6 + x6 + x5 + 1
= x7 + (1+1)x6 + x5 + 1
= x7 + x5 + 1 since 1+1=0 (using modulo 2)
Error Detection – CRCs (3)
CN5E by Tanenbaum & Wetherall, © Pearson Education-Prentice Hall and D. Wetherall, 2011
Decimal Division:
Polynomial Division:
Error Detection Example – CRCs (4)
CN5E by Tanenbaum & Wetherall, © Pearson Education-Prentice Hall and D. Wetherall, 2011
Generator polynomial: g(x)= x3 + x + 1, degree = = 3.
Data (k = 4): [1,1,0,0] d(x) = x3 + x2
Step 1: Perform x d(x), which for our example, is x3 d(x) = x6 + x5
(binary = 1 1 0 0 0 0 0). This step is equivalent to appending „‟ (in
this case, 3) number of zeros to the data vector.
Step 2: Divide the polynomial obtained in Step 1 by g(x), such that
the degree of the remainder polynomial is less than the degree of
g(x). Let the remainder be called r(x). In our case, r(x) = x.
Step 3: Add (or subtract) r(x) to the polynomial obtained in Step 1,
to obtain the polynomial b(x) = x6 + x5 + x. b(x) is the polynomial
representation of the encoded data vector (which is transmitted),
the binary representation of which is [1 1 0 0 0 1 0]. This step
ensures that the transmitted codeword is divisible by g(x). Verify!!
Error Detection Example – CRCs (5)
Your book has another example on how the same sequence
of steps can be performed without using polynomials. I don‟t
like this approach.
CN5E by Tanenbaum & Wetherall, © Pearson Education-Prentice Hall and D. Wetherall, 2011
Start by adding four
0‟s to the data vector.
Why 4? Because
generator polynomial
is g(x) = x4 + x + 1, so
degree = 4
Add/subtract the
remainder to/from the
dividend
Error Detection – CRCs (6)
So what is the strategy at the rx? By adding/subtracting the
remainder to/from the dividend, we have ensured that the
transmitted codeword/polynomial b(x) will be divisible by
g(x). Alternately, we can view b(x) as: b(x) = q(x)g(x), where
q(x) = quotient and g(x) = divisor.
The rx does just that. It takes the received vector/polynomial
and divides it by g(x). If the remainder turns out to be zero, it
determines that no error has occurred. But will this strategy
always work?
CN5E by Tanenbaum & Wetherall, © Pearson Education-Prentice Hall and D. Wetherall, 2011
Error Detection – CRCs (7)
Suppose the received polynomial is z(x) = b(x) + e(x),
where e(x) is the error polynomial, which unfortunately turns
out to be a multiple of g(x), i.e., e(x) = q(x)g(x). Then:
z(x) = b(x) + e(x) = q(x)g(x) + q(x)g(x) = [q(x) + q(x)] g(x),
which is clearly divisible by g(x). Therefore, the receiver will
miss errors which are multiples of g(x).
We will work out an example, but first, note that “channel
flipping bits” “channel adding an error vector”.
CN5E by Tanenbaum & Wetherall, © Pearson Education-Prentice Hall and D. Wetherall, 2011
[1 0 1] Channel
[0 1 1] [1 0 1] [0 1 1] +
[1 1 0] = error vector First two bits flipped by channel
Error Detection – CRCs (8)
Suppose g(x)= x3 + x + 1 and d(x) = x3 + x2 as in slide 45. Then, the
transmitted polynomial is b(x) = x6 + x5 + x (binary is b = [1 1 0 0 0 1 0])
Suppose q(x) = x + 1 e(x) = q(x)g(x) = x4 + x3 + x2 + 1 (binary is e = [0
0 1 1 1 0 1]). We have chosen the error vector/polynomial such that it is
divisible by g(x).
In this case, the rx receives the vector:
b + e = [1 1 0 0 0 1 0] + [0 0 1 1 1 0 1] = [1 1 1 1 1 1]
or alternately, the polynomial b(x) + e(x): x6 + x5 + x4 + x3 + x2 + x + 1
You should verify that the above polynomial is divisible by g(x), the
quotient turns out to be x3 + x2 + 1 Rx fooled, error not detected.
Now repeat the above exercise with e(x) = x + 1 (not divisible by g(x)). Is
this error detected?
CN5E by Tanenbaum & Wetherall, © Pearson Education-Prentice Hall and D. Wetherall, 2011
Error Detection – CRCs (9)
CN5E by Tanenbaum & Wetherall, © Pearson Education-Prentice Hall and D. Wetherall, 2011
Here are some standard generator polynomials:
• CRC-8:
• CRC-16:
• CCITT-16 (Bluetooth):
• CRC-32 (Ethernet, 802.11):
x8 + x2 + x + 1
x16 + x15 + x2 + 1
= (x + 1)(x15 + x + 1)
x16 + x12 + x5 + 1
x32 + x26 + x23 + x22 + x16 + x12 + x11 + x10 + x8 + x7 + x5 + x4 + x2 + x + 1
Reliable Data Transfer
Basics
Suppose you want to transmit 1000 bits over a 100Mbits/sec Ethernet link.
This implies that the transmission time/delay is given by 1000/(100 106 ) = 10-5 sec.
Also, suppose the transmitter and receiver are separated by a distance of 200 m and the transmission medium is copper (the rate at which waves can travel in copper is roughly 2 108 m/s)
This implies that the propagation time/delay is given by 200/(2 108 ) = 10-6 sec
Basics
Purpose of reliable data transfer is to ensure that data is delivered accurately to the upper layer despite errors that might occur during transmission.
Error correction codes are not strong enough to correct all errors. Plus, it won‟t work if a frame is lost completely.
Need mechanisms to recover from errors that are not correctable and from lost frames.
Collectively called ARQ (Automatic Repeat Request) protocols.
Basics
Essential components of any ARQ protocol:
- Receiver acknowledgements : a small control frame sent by the receiver to let the transmitter know that it has received an earlier frame. An ACK packet can be sent individually or piggybacked on a data frame that the receiver might be sending to the transmitter.
- Timeouts : if the transmitter does not receive an ACK after a reasonable amount of time after sending a frame, it retransmits the frame.
Basics
Three ARQ protocols:
- Stop-and-Wait
- Go Back N
- Selective repeat
We will study only the Stop-and-Wait and Go Back N ARQ protocols in class.
Acknowledgements can be either positive (“OK, got that”) or negative (“Please repeat that”). Can also be positive but implicit (“Send me the next one”)
Positive acknowledgements – ACK
Negative acknowledgements - NAK
Stop-and-Wait
We will use timelines to illustrate the operation of the protocol.
frame
ACK
time sender receiver
timeout
• Everything works correctly: the ACK is received before the timeout period.
• Even if other frames are ready at the transmitter, it will not send those before hearing about the fate of the last frame.
Stop-and-Wait
“The case of the lost data frame”.
If a data frame is lost, the transmitter waits for the first timeout period (no ACK can come since the receiver isn‟t even aware of the frame) and retransmits the frame.
frame
ACK
time sender receiver
timeout x
frame
timeout
Stop-and-Wait
“The case of the lost ACK frame”.
If the ACK frame is lost, the transmitter waits for the first timeout period and retransmits the frame.
frame
ACK
time sender receiver
timeout x
frame
timeout
ACK
Stop-and-Wait
“The case of the inadequate timer”.
The timer times out before the ACK comes in. The transmitter retransmits the frame, and the receiver again sends an ACK.
frame
ACK
time sender receiver
timeout
frame
timeout
ACK
Stop-and-Wait
Consider the last two cases where the receiver gets two copies of the data frame (due to lost ACK or delayed ACK).
How does the receiver know that the second frame is a copy of a previously transmitted frame or a new frame? If it cannot make that decision, it will pass on both the frames to the IP layer.
Soln: Add a 1-bit sequence number (0 or 1) in a sequence number field, for both the data frame and the ACK frame.
Sequence numbers included by the transmitter and receiver are incremented (when incremented) modulo 2.
Stop-and-Wait SNT = seq. number at the transmitter
SNR = seq. number at the receiver (implicit ACK, sends the seq. number of the frame it is expecting)
SNT / SNR state transition diagram
Request for frame 1 arrives at transmitter
[SNT,SNR] = [0,0] [SNT,SNR] = [0,1]
[SNT,SNR] = [1,0] [SNT,SNR] = [1,1]
Frame 0 arrives error free at receiver
Frame 1 arrives error free at receiver
Request for frame 0 arrives at transmitter
Stop-and-Wait Ex 1: Using a 1-bit sequence number (perfect
scenario)
Frame 0
Send 0
time sender receiver
timeout
Frame 1
timeout
Send 1
Frame 0
Send 1
Frame 1
Send 0
these two frame 0‟s are different
these two frame 1‟s are different
timeout
timeout
Stop-and-Wait
Ex 2: Using a 1-bit sequence number
Frame 0
Send 1
time sender receiver
timeout
Frame 0
timeout
Send 1 x
Frame 1
Send 0
copy of previous frame
new frame
first frame
Stop-and-Wait
Ex 3: Using a 1-bit sequence number
Frame 0
Send 1
time sender receiver
timeout
Frame 0
timeout
Send 1 x
Frame 0
Send 1
copy of original frame
second copy of original frame
original frame
x
Frame 1
Send 0
new frame
Stop-and-Wait
Ex 4: Using a 1-bit sequence number
What action will the transmitter take if the channel changes “Send 1” to “Send 0” ?
receiver
Frame 0
Send 0
time sender
timeout
Frame 1 timeout
Send 1
Send 1
Frame 0
Send 0 timeout
Stop-and-Wait
How much time is spent from the instant the first bit enters the channel to the instant the last ACK bit is received by the transmitter?
(1) AB = prop. delay (2) BC = data frame trans. delay (3) CD = proc. delay
(4) DE = ACK frame trans. delay (5) EF = prop. delay (6) FG = proc. delay
transmitter
receiver
1st bit in last bit in last ACK bit in
A B C D E F G
Transmitter/Forward channel idle (think (in)efficiency!!)
Stop-and-Wait
Expression for Total Delay (TD)
TD = (2 proc. delay) + (2 prop. delay) + LD/R + LA/R
where R = link data rate, LD = no. of bits in data frame, LA = no. of bits in ACK frame, prop. delay = propagation delay, proc. delay = processing delay.
Expression for Effective Information Transmission Rate (EITR)
EITR = number of information bits delivered / TD
= (LD – LO) / TD
where LO = number of overhead bits in data frame
Stop-and-Wait
Expression for Transmission Efficiency (TE)
- LO term in numerator loss in efficiency due to addition of overhead bits
LA term in denominator loss in efficiency due to necessity for ACK frame
2 (proc. delay + prop. delay) R reflects delay bandwidth product
Processing delays usually negligible.
Stop-and-Wait extremely inefficient if propagation delays are large (why??).
RLL LL
AD
OD
delayproc.delayprop.
2 TE = EITR / R
Stop-and-Wait
Expression for Transmission Efficiency (TE) in the presence of errors, TEerror - Let Pf be the probability that a frame has errors.
- On average, a frame needs to be transmitted 1/(1-Pf) number of times.
- On average, effective total delay to transmit 1 frame:
Eff. TD = TD / (1-Pf)
- EITR in the presence of errors is given by:
EITRerror = (LD – LO) / Eff. TD = (LD – LO) (1-Pf) / TD
- TEerror = EITRerror / R = TE (1-Pf)
where TE is the transmission efficiency without any errors, as derived in slide 17.
Basics of Go-Back-N protocol
Falls under the category of „sliding window‟ protocols.
The receiving DLC (data link control) operates in essentially the same way as Stop-and-wait ARQ. It accepts packets only in the correct order and sends request numbers back to the transmitting DLC.
The ACK sent is an implicit ACK (SNR = 2 send me packet with sequence no. 2, implicitly acknowledging received all packets up to sequence no. 1)
In contrast to Stop-and-Wait, the transmitting DLC (data link control) can send several successive packets without waiting for the next packet to be requested. The packets are numbered sequentially.
Basics of Go-Back-N protocol
The go-back number, N, is a parameter that determines how many successive packets can be sent in the absence of a request for a new packet.
Specifically, the transmitting DLC is not allowed to
send packet (i + N - 1) before packet (i - 1) has been
acknowledged, or, packet i has been requested.
The transmitter therefore maintains a window of the frames it can transmit. The window is of length N.
- when the transmitter receives a packet with
SNR = i, its window is [i, i + N - 1].
- when the transmitter receives a packet with
SNR = i + 1, its window slides upward to [i + 1, i + N].
Example of Sliding Window Opn. at Transmitter (no error) Generally, data frames are sent in both directions; so, the
receiver ACK frames are piggybacked. For simplicity, we assume that the receiver has no data frames and therefore sends only ACK frames.
We assume that SNT and SNR (shown in blue) can grow without bound.
76543210
[0, 3] [1, 4] [2, 5] [3, 6]
1 2 3
sender
receiver
………
window
4Go Back 4
Ex. (1) of Sliding Window Opn. at Transmitter (forward error) Frame 0 is received in error.
The receiver keeps on requesting frame 0. Frames 1, 2 and 3 are discarded, even though they have been received correctly.
The transmitter exhausts its first window, times out and goes back to frame 0 since it is yet to receive a request for frame 1.
3210
[0, 3] [1, 4]
0 0 0
sender
receiver
………
window
0
3210
[0, 3]
1
4
Go Back 4
T.O
Ex. (2) of Sliding Window Opn. at Transmitter (forward error) Frame 2 is received in error. The receiver keeps on requesting
frame 2. Frames 3, 4 and 5 are discarded, even though they have been received correctly.
The transmitter exhausts its window, times out and goes back to frame 2 since it is yet to receive a request for frame 3.
Go Back 4
3210
[0, 3] [2, 5]
1 2 2
sender
………
window
2
32
[2, 5]
3
44
2
[1, 4]
5
2
5 6
[3, 6]
receiver
T.O
Ex. (3) of Sliding Window Opn. at Transmitter (lost ACK) ACK for frame 0 is lost. However, if the sender receives an ACK
for frame 1 (requesting for frame 2), it can deduce that both frame 0 and frame 1 must have been received properly, as otherwise, it would not be asking for frame 2. Consequently, when the request for frame 2 arrives, the sender can slide its window upward to [2, 5]. Note that the lower end of the window moves from 0 to 2.
Go Back 4
3210
[0, 3]
1 2 3
sender
………
window
4
4
[2, 5]
5
[3, 6]
receiver
x
6
Do we have an idea for an improvement?
In the previous example (slide 24), the receiver has to discard (by definition of a Go Back N protocol) frames 3, 4 and 5, even though they have been received correctly.
Wouldn‟t it be better if the receiver could buffer those frames so that the transmitter could only selectively repeat frame 0 (in general, those frames that are not received correctly)?
Questions
What is the maximum number of frames the transmitter needs to store in its buffers in a go back N protocol?
How should you choose N?
- very high N (a) excessive retransmissions may result even if a single frame within a window is received in error (b) need very high buffer capacity.
- very low N link underutilization (N = 1 is effectively stop-and-wait)
- N should be chosen high enough so that the link is properly utilized.
Are retransmissions triggered by errors only?
We consider data transfer in both directions.
What if some data frames from B A are really long?
The sequence numbers sent by B are piggybacked on the data frames.
06543210
6300
SNA
SNB
why is B still asking for frame 0?
A
B
Go Back 7
A Lower Bound on N
(N-1) frame transmission time round trip time (RTT)
Ideally, you would like the ACK for frame 1 come in right when the sender is just finishing transmission of frame N
…… sender
receiver
Transmission time for (N-1)
packets
RTT
1 2 N3
time when last bit of frame 1 leaves tx
time when last bit of frame 1 arrives at rx
time when ACK for frame 1 is received at tx
Questions
If the transmitter wants to reuse sequence numbers - i.e., the sequence numbers are computed modulo M (e.g, for M = 4, the frames are numbered 0, 1, 2, 3, 0, 1, 2, 3, etc.):
what should M be w.r.t N ?
Key fact: M N + 1
The proof is involved, we will not go over it.
Instead, we will provide an example to illustrate the contradictions faced by the receiver if M N
Typically, M = 8 (3 bit sequence numbers) is used in the standard DLC protocols. An alternative M = 128 (7 bit sequence numbers) is also provided.
Problems with M N
Assume that data frames are sent in one direction. The receiver responds with only ACK frames. Receiver sequence numbers shown in “blue”. Also, let M=N=4
3210
[0, 3]
1 0
sender
receiver
………
window [0, 3]
x x2 x x3
3210
can this frame be properly identified by the receiver?
T.O
Effect of very high M (and N) Suppose N = M-1 and M >> 1
If one forward frame is in error, it could potentially involve retransmission of N frames (if the sending DLC waits to exhaust its window of N frames before retransmitting)
Waiting to go back and start retransmission when a forward frame error occurs is not a good idea (in terms of link utilization and delay).
However, waiting to start retransmission is good if
(a) the propagation delay is large, or
(b) the receiver has particularly long data frame(s), or
(c) if there is an error in the ACK frame or receiver data frame with the ACK piggybacked
Effect of very high M (and N)
A possible solution: Use a timer with each packet. If a packet is not acknowledged within a fixed time-out period after transmission, it is retransmitted. How do we calculate this time-out period? (next slide)
Another possible solution: The receiver sends back a short supervisory frame upon receiving a frame in error (assuming that it typically sends ACK‟s piggybacked on data frames). This allows the sending DLC to go back much sooner in the event of a forward error, thereby avoiding excessive retransmissions.
Calculation of time-out period for one data frame We assume that data frames are transmitted in both directions.
We also assume that Y‟s frame transmission starts just before the first bit from X‟s data frame comes in.
(1) AB = propagation delay (2) BC & CD = transmission delay for B‟s data frames (where‟s X‟s transmission delay?) (3) DE = propagation delay (4) EF = processing delay (where‟s Y‟s processing delay?)
X
Y
A B C D E F
the ACK for X‟s data frame goes in this frame sent by Y
time-out period
Performance of Go Back N
We will analyze the performance of the Go back N using a simplistic assumption.
We assume that the window size and ACK delivery times are such that the channel is full all the time.
Let any data frame (say the green frame) be received successfully on the kth attempt we will have (k-1) blocks of N frames, followed by a successful green frame.
(k-1) = 2 blocks of N = 3 frames
3rd time successful (k = 3)
Performance of Go Back N
Let LO be the number of overhead bits in a data frame of length LD bits.
Let Pf be the probability that a frame has errors and tf be the transmission time (= LD/R) for each frame, where R is the link transmission rate in bps.
You expect to transmit a frame 1/(1 - Pf) number of times. Of these, the last one is successful, and the previous attempts, which is one less than 1/(1 - Pf), resulted in N frames (see previous slide).
Expected Total Delay (ETD) for delivering a frame is:
f
fD
f
DD
P
P N
R
L
P N
R
L
R
L
1 11
1
1 ETD
Performance of Go Back N
Expression for Effective Information Transmission Rate in the presence of frame error (EITRerror) :
EITRerror = number of information bits delivered / ETD
= (LD – LO) / ETD
Expression for Transmission Efficiency (TE) in the presence of errors, TEerror
f
OD
D
f
PN
LL
L
P
R )1(1
-1EITR TE errorerror
Go Back N ARQ / Flow Control
Till now, we have discussed how Go Back N can be used for reliable data transfer. Now, we will discuss how the same protocol can provide some flow control.
In a Go Back N protocol, at most N frames are outstanding in the network at any given time, ignoring errors and retransmissions. As congestion builds in the network and delay increases, acknowledgements are delayed and the source is effectively slowed down.
Also, if the destination wishes to receive packets less rapidly, it can simply delay sending its ACK‟s (see next slide). But is it always a good idea?
Go Back N ARQ / Flow Control
Example: Receiver sends ACK‟s only after the transmitter‟s window (length „2‟ in our example) is exhausted.
Let tf be the transmission time for each frame.
2 frames are sent over tcycle seconds the effective
frame transmission rate = 2/tcycle frames/sec.
What if the ACK is unduly delayed due to congestion (no errors) or lost (no congestion)
tcycle
Go Back N ARQ / Flow Control
In practice, the receiving DLC sends a special supervisory frame, called the Receive-not-Ready (RNR) frame.
The RNR frame provides some flow control by informing the transmitter that the receiver is unable to accept any more frames. There‟s also a Receive-Ready (RR) frame to indicate the willingness of the receiver to accept more frames.
The RNR frame is sent in addition to the ACK frames. Essentially, therefore, flow control and acknowledgement functions are decoupled.