DEPARTMENT OF COMPUTER SCIENCE
CS 11C8S Computer Communications Error Recovery for PointtoPoint Links
Fall
This is the third of probably lectures on the Data Link Layer In the last two lectures we studied the two b ottom sublayers of any Data Link framing and error detection Framing was used to transform a stream of bits into units called frames and error detection added checksums to these frames to detect whether any bits in a frame are corrupted In this lecture we will study
error
each hop
recovery which go es b eyond error detection by correcting for lost and corrupted packets on
Why Error Recovery
In our sublayer mo del of a Data Link we said that error recovery was an optional sublayer of some older p ointtop oint links Thus in this lecture we will restrict ourselves to p ointtop oint Data Links However if hopbyhop error recovery is largely historical why b other to study it Here are some reasons
required to understand error recovery will not b e wasted transp ort proto cols use
The eort
essentially
will only talk ab out the dierences in endtoend error recovery
the same ideas for endtoend recovery When we come
to transp ort proto cols we
Many older proto cols like HDLC use these techniques and are Error recovery furnishes us with the rst nontrivial example
still b eing used
of a proto col and its p ossible loss and no de crashes
Endtoend recovery is no longer p opular b ecause of the endtoend argument and the fact that mo dern b er links have very low error rate This may change on certain lowbandwidth links b etween mobile computers b ecause the error rate may b e quite high on such links When the wheel of time moves around again you should b e ready In fact in this years SIGCOMM there were several prop osals for doing hopbyhop error recovery on mobile links
What must Error Recovery Guarantee
problems due to varying message
delay and errors eg frame
As in the gure error recovery must guarantee that the packets given for transmission to the sending data link must b e delivered to the receiver without duplication loss or misordering Duplication means including extra copies ie delivering a a b c Loss means missing a data item ie
delivering b c Misordering means p ermuting
at the transp ort layer means much the same
and receiver we will talk ab out no de crashes next lecture
UCLA WASHINGTON UNIVERSITY
the sending order ie delivering b c a d Recovery thing So far we are ignoring crashes at the sender
a,b,c,d
a,b,c,d
SENDER
RECEIVER
Figure Reliable inorder transmission
Avoiding loss is of course useful on very high errorrate physical links Avoiding reordering and duplication can avoid extra work for the endtoend transp ort proto col to reject duplicates and to reorder the stream of packets Some sp ecial applications like video and audio may prefer not to
run
ab ove a transp ort proto col for eciency and denitely cannot tolerate misordering
Assumptions
We
connected by two p ointtop oint physical links in each direction as shown in the gure We assume
assume that error recovery is b eing done b etween a single sender and a receiver that are
that error recovery is
The undetected is small enough
layered ab ove the error detection sublayer We assume that
error rate probability of a corrupted frame b eing passed by error detection
Whole frames can bits in a frame are
to
b e ignored
b e lost in a way that may not b e detected by error detection ie if all lost
However the physical layer is FIFO If bit is sent b efore b then if b oth bits are received bit
b is received b efore b
The delay on the physical links is arbitrary and can vary from packettopacket
The last assumption needs to b e explained The delay on a single brelink is pretty constant However if the physical link is going through another level of switching there is a way of switching bre links known as SONET then the delay can change Also if there is multiplexing going on at the physical level the delay may vary Second the timers on most op erating systems are fairly inaccurate and the result is that one cannot measure time very precisely Third its nice if you can have a proto col that do es not dep end on constant delays to work correctly
Proto col Play
Supp ose we try the following sequence of proto cols
First the sending Data Link just sends the sequence of received packets as frames Clearly if a frame is lost we can have omissions
Second we have the receiver send back a sp ecial ACK frame with no other information for every data message it receives If the sender do es not get an ACK within a timeout p erio d the sender retransmits This can cause duplication If a is sent and is received but the ACK is lost then the sender will resend a and the receiver will output aa
Third we can try and detect duplicates by rejecting any frames whose data is the same as that of the previous frame But the data sent may contain the same value in successive frames
and so this would
cause deadlo ck
sequence numb er that is attached to each frame The sender starts with
Fourth we add a
sequence numb er
resent by the sender carries the current sender
frames that have the same sequence numb er as the receiver numb er On successful
and the receiver starts with sequence numb er Each frame
sequence numb er The receiver only accepts receipt numb er and sends an ACK On receipt of an ACK the sender increments its numb er and sends the next packet This proto col can
the receiver accepts the frame
increments its
dead lock if the
accept the new packet and send an ACK
rst ACK is lost the sender will keep retransmitting but the receiver will not
The obvious way to x the previous problem is to have the receiver send back an ACK frame even if it gets a duplicate However now the proto col may lose a packet Supp ose the sender wants to send a b The sender sends a and the receiver sends an ACK The sender times out b efore the ACK is received and resends a The receiver gets the duplicate and do es not accept it but sends a second ACK Assume now that the rst ACK is received by the sender and the sender sends b If b is lost and the second ACK arrives the sender will assume b was successfully sent and move on to c Thus we have an omission
The last problem is b ecause an ACK for an older frame is b eing mistaken for an ACK for a recent frame An obvious way to x this last problem is to add a sequence numb er to ACKs as well an ACK should carry the receiver numb er The sender should only accept ACKs with
further
transmission of frames and wait for an ack The co de for
the proto col
is given b elow
numb er greater than
The proto col plays are
its own sequence numb er
shown schematically in Figure Figure and Figure
Thus we conclude that we need retransmission for loss sequence numb ers to detect duplicates and misordering we need to send acks back even on receipt of duplicate data and the acks must
also b e
around The resulting proto col is called the
numb ered Assume the sequence numb ers are large integers say bits that never wrap stopandwait proto col b ecause the sender has to stop
Stop and Wait Co de Here is the sender co de
Sender
Sender
has a variable SN for sender repeats the following loop
number
initially
sent or
Node S
a, b, c
Time
Ret
Figure No retransmission leads to loss retransmission without checks leads to duplications
Accept a new packet from the higher layer if available and store it in Buffer B
Transmit a frame SN B
If an errorfree ACKR frame is received and R is not equal to SN then
SN R
Go to Step
Otherwise if the previous condition does not occur with an arbitrary timeout period go to Step after the timeout period
a b
Loss
a, b, c
a
b
Node R
Node S
Node R
a
Here is
the receiver co de
Receiver
Receiver
When an errorfree data frame S D is
If S RN then
Pass D to higher layer RN RN
Send Ack RN
a
Duplication
has a variable RN for receiver number initially does the following code
a ack
received
Node S
Node R
a, a,
a
Ret
ack
a
a
Reject
Reject Reject
Time
Ret
ack a,0
a,0
Livelock
a
Node S
a, b 0
a,0
a
Livelock
Node R
0 a,0 1a
Figure contains
Stop and Wait Example
Trying
the same data
duplicates by rememb ering previous data item leads
to livelo ck
leads to livelo ck
to
b e sent
to suppress
item twice Not sending an ack for a message already received
when data
Figure shows a timespace diagram that depicts an example of stop and wait b ehavior increases downwards in the gure as in all such gures in this course unless explicitly said so Thus a message sent from the sender S to the receiver R will b e a line from left to right and vice versa In the gure notice that the second data item is lost and then retransmitted The convention for lost frames is that the line do es not quite reach the other end sometimes I add a cross to indicate the message was lost The third data item is retransmitted to o early causing an extra ack However the rst ack is lost but the second one is received
Reject
Reject Reject
We notice some interesting prop erties from the gure
gets to numb er n there are no frames with numb er n or acks with
and the receiver is at numb er n ie exp ecting
that frame n will correctly b e received This is
a copy of the data corresp onding to numb er n is
send an ack for n Since the receiver keeps sending acks for every sender retransmission the sender will eventually get an ack and go onto numb er n
Another thing to
receives frame n the
must have ushed out the reverse link of all lower numb ered acks
the right numb er b ecause the sender
Notice that by the time the sender rst numb er n in the channel
If this is so it is easy to see will keep retransmitting until received until that happ ens the receiver will not
notice is that b ecause the physical channels are FIFO when the receiver
receiver can b e sure that the sender has received
a n numb ered ack which Similarly the data frame
Time
Time
a,0
a,0 1a
Node S
a, b, c 0
Node R
0
Ret
ack
b,1
ack
c,2
Loss
Reject
Figure
Not
numb ering acks
with a
sequence
Time
2
A,3
3
numb er
leads to loss as
old acks can
b e
mistaken
for more
recent
acks
Node S Node R
0 D,0 0
A,1
1
D,1 D,1
A,2
1
2
D,2
D,2 3
Figure An
example
of Stop
and Wait
Behavior
numb ered n the receiver lower numb ers values that are
Thus there
the sender and receiver
or unequal But if the two numb ers b eing compared are always consecutive
check their Least Signicant Bit LSB But in that case we can replace the large sequence numb er by a single bit that is incremented mo d This is indeed true the resulting proto col is called the alternating bit proto col
Our argument of the correctness of stopandwait was based on the statement that when the sender rst reaches numb er n there are no data frames numb ered n in the forward link and no acks numb ered n in the reverse link and the receiver is at n But how do we prove such statements esp ecially across the innite numb er of p ossible executions recall that retransmission and message delivery times and losses are arbitrary leading to an innite numb er of p ossible combinations Similarly we argued informally that there are only two sequence numb ers present in any state and
ushed the forward link of all lower numb ered frames Thus
at the instant numb er n all like n all
must have
receives a data frame with numb er n the entire system only contains
must have disapp eared Thus when the
receiver creates a new value system
n or
are only p ossible consecutive sequence
less have b een ushed from the
proto cols when we compare two numb ers we only
notice that in they are equal numb ers it suces to
numb ers in any state Now
check if
so we can get by with a single bit Can we prove such a statement with more condence than the casual errorprone way we argued in the last two paragraphs There is a b etter way known as invariants which we study next
Invariants
In order to understand what an invariant is we need to understand what a global state of a proto col is To do so we consider a simple proto col b etween a sender S and a receiver R The sender has only one variable i and the receiver has only one variable j Initially b oth variables are This is
not a Data Link Proto col The proto col starts by S
arrives at R R increments
this simple proto col example that no frames ever get lost frame M and the cycle continues
as in
usual time ows downwards
However there is another way to lo ok at executions of a proto cols
terms of
transitions b etween
Node S
global
states This
Node R j=0
shown on the right in the same gure
00 M
its
just a trivial example
sending a frame M to R Frames and acks are never lost When M
j
Then R sends back an ack to
S When the ack arrives we assume in at S S increments i and sends another
A sample execution of this proto col is shown on the left in Figure This is a timespace gure
to help you understand invariants
i=0
Time
i=1
Figure
M
Ack M
TIME VIEW
j=1
j=2
0
0
A
0
1
M 11
GLOBAL STATE VIEW
to of
the state of R
TimeSpace and State
Transition Views of a Simple Proto col
The state the proto col
j The state of a link is the sequence of frames assuming the link is
of a
For example the state of no de S is the value of
no de at any instant
variables that are
is simply the values of all no de i and
relevant is the value FIFO that have b een sent
on the link but not received Thus b etween the time a frame M is sent by S and the time frame M is received by R M is stored on the link During this time the state of the link is the frame M If you like you can think of a link as a queue variable a send inserts at the rear of the queue a receive removes the head of the queue The state of the queue is just the sequence of items ie frames stored in the queue Finally the global state of a proto col at any instant is the state of all no des and all links in the network On the right of Figure we show the global states corresp onding
to the time space picture on the left Notice that during large actions o ccur the global state remains unchanged
p ortions of time when no proto col
Now let us ask ourselves a question Is it ever p ossible to have a global state in which i and j when the proto col is working correctly After some thinking you will probably answer no Thus some global states are possible while some are not This is where invariants come in The
invariants you really
In the
of a proto col describ e the p ossible states of a proto col when it is working correctly If
understand a proto col you should b e able to write down its example ab ove a p ossible invariant is
invariants
If there is a frame M in the channel from S to R OR if both channels are empty
ij
Else if there is an ack in the channel from R to S THEN
ji
could use For a frame M on the link from S to R and an Ack on the link from R to S This is clearly imp ossible and if p ossible
would cause a contradiction in the invariant stated ab ove So we need one more invariant
The channel from S to R is either empty or contains a frame M The channel from R to S is either empty or contains a frame A If a channel contains a frame then the other channel is empty
This is correct as far as it go es but it is not the strongest p ossible invariant one
example the previous invariant allows us to have a state in which there is b oth
Now we have stated a
complete a job as we do here we just settle for describing some imp ortant invariants Similarly when you do the problem in HW just ask yourself whether certain global states are p ossible These and similar questions should help you write down some useful invariants
Once you have written down invariants you can prove them by induction You assume the
complete description of the proto cols invariants Often we cannot do as
invariant
is true in a state s and show it is
transition must b e the result of a
is where the p ower of invariants
actions we argue ab out the eect of a nite set of actions
is initially true This is what proto col initialization must guarantee Then assume it true in any p ossible state that can follow s But any such state proto col action so we need only consider proto col actions This come in Instead of arguing ab out a p otentially innite set of
For example in our example the only interesting actions are that of initially sending a frame receiving a frame and receiving an ack For example consider receiving an ack by the rst invariant we know that j i in the previous state and by the second invariant we know that there is
no frame in the link from S to
R But after receiving an ack we remove the ack from the channel It is easy to see that this action leaves the invariant true b ecause
from S to R and the reverse channel is empty
increment i and
after the action i j
Band Invariant
for
can b e used to show that the proto col guarantees reliable FIFO
StopandWait
send
a frame
there is a frame
Finally what use are invariants
more imp ortantly they can external users which is all
the band invariant shown next delivery
They do help us understand how a proto col b ehaves Far prove prop erties ab out the services the proto col provides to that users care ab out For instance in the stopandwait bit proto col
b e used to
First you should notice that the
quite complex When drawing the global states consider only
acks If you lo ok at Figure you see that from a single global state where a frame is still in transit we can go to three dierent global states We can lose a frame retransmit a frame recall that the retransmit timers are arbitrary or deliver the rst frame Each takes us to a dierent global state
of copies
of a
frame and
arbitrary numb ers of acks
global states of even a simple
proto col like stopand wait can b e the numb ers in transit frames and
With arbitrary retransmission timers we can get to states where there are an
unb ounded numb er
Retransmission
STOP AND WAIT
11
D(1) 11
Error
Reception
global see Figure
D(1) D(1) 111112
Figure
states are p ossible After some thought
as two bands of equal numb ers The rst band
the p ossible the receiver and go es clo ckwise assuming the lower link is the reverse link on which acks are sent until it either go es all way around or meets a second band of numb ers The numb ers in the rst band are always equal to or greater than the numb er in the second band The second band grows larger until it swallows up the entire ring and then starts again as a second band Its a little like the song Band
on the Run
However global states
not all
we can
describ e starts at
We can prove this invariant by induction We assume it is true initially b oth numb ers equal to and the links empty We need to show that all proto col actions preserve this invariant There are ve cases sending or resending a frame receiving a frame sending an ack receiving an ack and losing a frame or ack The most interesting cases are receiving a frame or ack Consider the
Getting the strongest invariant needed
One can often state to o weak an invariant that cannot b e proved
For example
S While this is true its hard
the band invariant we
to prove by induction
the receiver is at R
S k which will cause S RN
Now in the actual proto col this case
k cannot o ccur acks are only old receiver can only use the inductive
previous state
where an ack with numb er k is on the link while the
Common
numb ers But we cannot use common sense in an inductive proof
receiver is at
sense tells us that
the receiver numb er always increases
and We the
hyp othesis which is just
the assumption that all invariants hold in
1
D(1) D(0) A(1)
1 Example Global State
y
yyy yyxxx
x
Two Bands of equal values
y =x or x = y+1
Figure Band Invariant
case when the receiver gets a frame If the numb er in the frame s is not equal to receiver numb er R N then the frame is rejected This only corresp onds to removing a numb er from either band which preserves the invariant The more interesting case is when s RN In that case if the invariant is true in the previous state the entire band must b e equal to s Thus when the receiver increments to r and sends an ack we now create bands A smaller one starting at the receiver
with values equal to s and the rest of the circle equal to s We can
other cases Try to do so yourself for practice You can easily disp ose of
by observing that removing a numb er or duplicating a numb er do es not aect the invariant thus frame loss and frame sending do es not aect the invariant
stated only the weaker invariant that RN
Consider the case when the sender receives
k If this case is p ossible then the sender will receive this ack and change
Thus in the ab ove example we would at the very least have to add another invariant ie that any acks on the reverse link have numb ers that are no more than the receiver numb er which rules out the crazy case which was causing us problems The band invariant of course is a more succinct invariant that includes these two invariants
In general please note two things a In doing a pro of you can only use the fact that all your listed invariants hold in the previous state b If you nd that there is some case in which your pro of fails you may need to add another invariant to rule out the case that fails
argue many
similarly ab out all of the simple cases
supp ose instead of
an ack with numb er k while
So what go o d is the invariant
The result of the inductive pro of is as follows If the invariant is true initially and every action preserves the invariant assuming it was true b efore it is true in all states Notice that the invariant immediately implies the two prop erties we had argued informally ab out earlier
When the sender rst go es to numb er n on receiving an ack with numb er n the invariant
implies that the receiver is at n any acks in transit in transit have numb er n But this is what we want to
have numb er n and any ensure that frame n will
data frames b e received numb ers in
by the receiver Similarly the invariant guarantees any state which justies the replacement of a large
Alternating Bit Proto col
This proto col is the same as stopandwait except
that there at most two sequence numb er by a
consecutive single bit
Here
is the sender co de
Sender has a bit SN for sender number initially
Sender
Accept a new packet from the higher layer if available and store it
repeats the following loop
in Buffer B
Transmit a frame SN B
If an errorfree ACKR frame is received and R is not equal to SN then
SN R
Go to Step
Otherwise if the previous condition does not occur with an timeout period go to Step after the timeout period
arbitrary
Here is
the receiver co de
Receiver
Receiver
When an errorfree data frame S D is
If S RN then
Pass D to higher layer RN RN mod
Send Ack RN
has a bit RN for receiver number initially does the following code
it uses a bit instead of an integer
received
Rememb er that the use of a single bit works only on FIFO links
other hand have to work over nonFIFO networks and hence cannot get away with only a single bit
Transp ort
proto cols
on the
Retransmission Timers
We can prove that stopandwait works correctly regardless of retransmission timer values That is a nice prop erty b ecause timer settings are often incorrect and if you set it one way and the link changes the proto col will still work However retransmission timer values aect performance Long timers will mean unnecessary delays in recovering from errors short delays will mean extra unnecessary retransmissions
What Next
It is easy to see that stopandwait is inecient on links with large delays like satellite links as you cannot send a second frame until one roundtrip delay elapses You can do b etter by pip elining several frame transmissions b efore a single ack arrives We will study this in the next lecture
Second we assumed that no des and links did not crash and the link was initially empty a bad assumption In the next lecture we will show to initialize an error recovery proto col in the face of
no de
Here
and link crashes All of this discussion applies equally well to transp ort proto cols
Some Proto col Design Lessons
are some ways of thinking that we used in this lecture
Reduction We started with a simpler proto col stopandwait and reduced it to alternating bit by showing some relations among the state variables Often its a go o d lesson to design a simple proto col rst and then optimize later after understanding the simple proto col
Invariants Rather than reason in time and say That must have o ccurred b ecause that must have o ccurred and keep trying to track multiple cases backwards in time we use invariants Invariants are facts that are frozen in time Rather than deal with an innite numb er of p ossible executions we deal with a small xed numb er of p ossible proto col actions that could aect the invariant