3rd Edition: Chapter 3
Advanced Network Technologies
Multimedia 1/2
School of Computer Science
Dr. | Lecturer
Multimedia
Multimedia
Streaming stored video
Voice-over-IP
RTP/SIP
Multimedia
20min
Multimedia networking: 3 application types
streaming, stored audio, video
streaming: can begin playout before downloading entire file
stored (at server): can transmit faster than audio/video will be rendered (implies storing/buffering at client)
e.g., YouTube, Netflix, Hulu
conversational voice/video over IP
interactive nature of human-to-human conversation limits delay tolerance
e.g., Skype
streaming live audio, video
e.g., live sporting event
“Stored” is used in contrast to “live”.
Hulu: on-demand streaming video service
Multimedia audio
analog audio signal sampled at constant rate
telephone: 8,000 samples/sec
CD music: 44,100 samples/sec
each sample quantized, i.e., rounded
e.g., 2^8 = 256 possible quantized values
each quantized value represented by bits, e.g., 8 bits for 256 values
[Figure: audio signal amplitude vs. time, showing the analog signal, the quantized value of each analog value, the quantization error, and the sampling rate (N samples/sec)]
Rate=44100 samples/sec * 8bit/sample = 352800 bps
Video: sequence of images displayed at constant rate
e.g. 24 images/sec
Each image: array of pixels: Resolution: e.g. 480*640
each pixel: 3 colors
Red, Green, Blue (RGB)
Each color has 2^8 = 256 possible quantized values (8 bit)
Data rate: 8*3*480*640*24 = 177 Mbps. Too large!
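The raw data rates above follow from multiplying the sampling parameters. A minimal sketch that reproduces the arithmetic (the function names are illustrative, not from the slides):

```python
def audio_bit_rate(samples_per_sec, bits_per_sample):
    """Uncompressed audio rate in bits per second."""
    return samples_per_sec * bits_per_sample


def raw_video_bit_rate(width, height, colors, bits_per_color, frames_per_sec):
    """Uncompressed video rate in bits per second."""
    return width * height * colors * bits_per_color * frames_per_sec


telephone = audio_bit_rate(8_000, 8)              # 64,000 bps
cd_at_8_bits = audio_bit_rate(44_100, 8)          # 352,800 bps, as on the slide
video = raw_video_bit_rate(640, 480, 3, 8, 24)    # 176,947,200 bps, about 177 Mbps

print(telephone, cd_at_8_bits, video)
```

The video figure is what motivates compression: even a small 480*640 picture at 24 images/sec needs roughly 177 Mbps uncompressed.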
Video
coding: use redundancy within and between images to decrease # bits used to encode image
spatial (within image)
temporal (from one image to next)
examples:
MPEG 1 (CD-ROM) 1.5 Mbps
MPEG2 (DVD) 3-6 Mbps
MPEG4 (often used in Internet, < 1 Mbps)
MPEG: Moving Picture Experts Group
spatial coding example: instead of sending N values of same color (all purple), send only two values: color value (purple) and number of repeated values (N)
temporal coding example: instead of sending complete frame at i+1, send only differences from frame i
[Figure: frame i and frame i+1]
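The two coding ideas can be sketched in a few lines. This is a toy illustration only: real MPEG coders are far more elaborate, and the helper names below are hypothetical.

```python
from itertools import groupby


def run_length_encode(pixels):
    """Spatial coding: collapse each run of equal values into (value, count)."""
    return [(value, len(list(run))) for value, run in groupby(pixels)]


def run_length_decode(pairs):
    """Expand (value, count) pairs back into the original sequence."""
    return [value for value, count in pairs for _ in range(count)]


def frame_delta(previous, current):
    """Temporal coding: keep only the positions whose value changed."""
    return {i: new for i, (old, new) in enumerate(zip(previous, current)) if old != new}


def apply_delta(previous, delta):
    """Rebuild frame i+1 from frame i plus the differences."""
    return [delta.get(i, value) for i, value in enumerate(previous)]


row = ["purple"] * 6 + ["white"] * 2
encoded = run_length_encode(row)          # two pairs instead of eight values

frame_i = [10, 10, 10, 10]
frame_next = [10, 99, 10, 10]
delta = frame_delta(frame_i, frame_next)  # only one changed pixel is sent
print(encoded, delta)
```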
Streaming Stored Video
20min
Streaming stored video
[Figure: cumulative data vs. time. 1. video recorded (e.g., 30 frames/sec); 2. video sent; 3. video received after network delay (fixed in this example); 4. video played out (Frame 1, Frame 2, Frame 3, Frame 4, …)]
streaming: at this time, client playing out early part of video, while server still sending later part of video
Streaming stored video: challenges
continuous playout constraint: once client playout begins, playback must match original timing
… but network delays are variable (jitter), so will need client-side buffer to match playout requirements
other challenges:
client interactivity: pause, fast-forward, jump through video
video packets may be lost, retransmitted
Streaming stored video: revisited
client-side buffering and playout delay: compensate for network-added delay, delay jitter
[Figure: cumulative data vs. time. Curves: constant bit rate video transmission; client video reception after variable network delay; constant bit rate video playout at client after the client playout delay. The gap between reception and playout is the buffered video.]
Streaming stored video: revisited
[Figure: cumulative data vs. time. Curves: constant bit rate video transmission; client video reception after variable network delay; constant bit rate video playout at client after the client playout delay, with buffered video in between.]
Buffer underflow! Cannot be played on time
Streaming stored video: revisited
[Figure: cumulative data vs. time. Curves: constant bit rate (CBR) video transmission; client video reception after variable network delay; constant bit rate video playout at client after a larger client playout delay.]
Increase playout delay: fewer buffer underflows
initial playout delay tradeoff
Client-side buffering, playout
[Figure: video server sends to client at variable fill rate, x(t); client application buffer, size B; buffer fill level, Q(t); playout rate, e.g., CBR r]
Client-side buffering, playout
Initial fill of buffer until playout begins at $t_p$
playout begins at $t_p$; buffer fill level $Q(t)$ varies over time as fill rate $x(t)$ varies and playout rate $r$ is constant
$Q(t+1) = Q(t) + x(t)$ for $t \le t_p$; $Q(t+1) = \max[Q(t) + x(t) - r,\ 0]$ for $t > t_p$
$Q(t) + x(t) - r < 0$: buffer underflow
playout buffering: average fill rate E(x), playout rate r
E(x) < r: buffer eventually empties (causing freezing of video playout until buffer fills again)
E(x) ≥ r: buffer will not empty, provided initial playout delay is large enough to absorb variability in x(t)
initial playout delay tradeoff: buffer starvation less likely with larger delay, but larger delay until user begins watching
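The buffer recurrence and the initial playout delay tradeoff can be tried out with a small simulation. This is a sketch under simplifying assumptions: time is slotted, the buffer size B is ignored, and the fill trace is made up for illustration.

```python
def simulate_playout(fill, r, t_p):
    """Apply Q(t+1) = Q(t) + x(t) for t <= t_p, else max(Q(t) + x(t) - r, 0).

    fill: list of x(t) values, one per time slot
    r:    constant playout rate per slot
    t_p:  slot at which playout begins
    Returns (list of Q(t+1) values, number of underflow slots).
    """
    q = 0
    levels = []
    underflows = 0
    for t, x in enumerate(fill):
        if t <= t_p:
            q = q + x                    # initial fill, nothing drained yet
        else:
            if q + x - r < 0:            # not enough data to play this slot
                underflows += 1
            q = max(q + x - r, 0)
        levels.append(q)
    return levels, underflows


trace = [1, 0, 0, 3, 1, 1]               # hypothetical bursty fill rate x(t)
short_delay = simulate_playout(trace, r=1, t_p=0)
longer_delay = simulate_playout(trace, r=1, t_p=1)
print(short_delay, longer_delay)
```

With this trace, starting playout one slot later removes the single underflow, at the cost of the user waiting one slot longer before the video starts.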
Streaming multimedia: UDP
server sends at rate appropriate for client
often: send rate = encoding rate = constant rate
transmission rate can be oblivious to congestion levels
short playout delay (2-5 seconds) to remove network jitter
error recovery: application-level, time-permitting
RTP [RFC 3550]: multimedia payload types
UDP may not go through firewalls
Streaming multimedia: HTTP
multimedia file retrieved via HTTP GET
send at maximum possible rate under TCP
fill rate fluctuates due to TCP congestion control, retransmissions (in-order delivery)
larger playout delay: smooth TCP delivery rate
HTTP/TCP passes more easily through firewalls
[Figure: server reads the video file into the TCP send buffer; data reaches the client at variable rate, x(t), passing through the TCP receive buffer into the application playout buffer]
Streaming multimedia: DASH
DASH: Dynamic, Adaptive Streaming over HTTP
server:
divides video file into multiple chunks
each chunk stored, encoded at different rates
manifest file: provides URLs for different chunks
client:
periodically measures server-to-client bandwidth
consulting manifest, requests one chunk at a time
chooses maximum coding rate sustainable given current bandwidth
can choose different coding rates at different points in time (depending on current available bandwidth)
When these slides were written, YouTube did not employ adaptive streaming (DASH) and required the user to select a version manually; it has since adopted adaptive streaming.
Streaming multimedia: DASH
[Figure: the same video stored as Chunk1, Chunk2, Chunk3, …, ChunkN at several encodings ranging from low quality to high quality; as bandwidth changes over time, the client requests Chunk1, Chunk2, Chunk3, … from the encoding that fits]
Streaming multimedia: DASH
DASH: Dynamic, Adaptive Streaming over HTTP
“intelligence” at client: client determines
when to request chunk (so that buffer starvation does not occur)
what encoding rate to request (higher quality when more bandwidth available)
where to request chunk (can request from URL server that is “close” to client or has high available bandwidth)
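The client-side decision of what encoding rate to request can be sketched as follows. The manifest layout, the URLs and the fallback to the lowest rate are assumptions made for illustration; they are not taken from the DASH standard.

```python
def choose_encoding_rate(available_rates_bps, measured_bandwidth_bps):
    """Pick the highest coding rate that the measured bandwidth can sustain.

    Falls back to the lowest rate when none is sustainable (an assumption).
    """
    sustainable = [rate for rate in available_rates_bps if rate <= measured_bandwidth_bps]
    return max(sustainable) if sustainable else min(available_rates_bps)


def plan_requests(manifest, bandwidth_samples_bps):
    """Choose one chunk URL per bandwidth sample, one chunk at a time."""
    urls = []
    for chunk_index, bandwidth in enumerate(bandwidth_samples_bps):
        rate = choose_encoding_rate(list(manifest), bandwidth)
        urls.append(manifest[rate][chunk_index])
    return urls


# Hypothetical manifest: coding rate in bps -> URLs of the chunks at that rate.
MANIFEST = {
    rate: [f"http://video.example/{name}/chunk{i}" for i in range(1, 4)]
    for rate, name in [(300_000, "low"), (1_000_000, "mid"), (3_000_000, "high")]
}

plan = plan_requests(MANIFEST, [3_500_000, 800_000, 1_200_000])
print(plan)
```

A real client would also use the buffer fill level when deciding when to request the next chunk; the sketch only covers the rate choice.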
Content distribution network
challenge: how to stream content (selected from millions of videos) to hundreds of thousands of simultaneous users?
option 1: single, large “mega-server”
single point of failure
point of network congestion
long path to distant clients
multiple copies of video sent over outgoing link
….quite simply: this solution doesn’t scale
Content distribution network
challenge: how to stream content (selected from millions of videos) to hundreds of thousands of simultaneous users?
option 2: store/serve multiple copies of videos at multiple geographically distributed sites (CDN)
PoP: point of presence
CDN: “simple” content access scenario
Bob (client) requests video http://netcinema.com/6Y7B23V
video stored in CDN at http://KingCDN.com/NetC6y&B23V
[Figure: Bob, Bob’s local DNS, netcinema.com and its authoritative DNS, KingCDN.com and its authoritative DNS]
0. Store the video in CDN
1. Bob gets URL for video http://netcinema.com/6Y7B23V from netcinema.com web page
2. resolve http://netcinema.com/6Y7B23V via Bob’s local DNS
3. netcinema’s authoritative DNS returns URL http://KingCDN.com/NetC6y&B23V
4&5. resolve http://KingCDN.com/NetC6y&B23V via KingCDN’s authoritative DNS, which returns IP address of best KingCDN server with video
6. request video from KingCDN server, streamed via HTTP
CDN cluster selection strategy
challenge: how does CDN DNS select “good” CDN node to stream to client
pick CDN node geographically closest to client
pick CDN node with shortest delay (or min # hops) to client (CDN nodes periodically ping access ISPs, reporting results to CDN DNS)
alternative: let client decide - give client a list of several CDN servers
client pings servers, picks “best”
Netflix approach
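The shortest-delay strategy, whether applied by the CDN DNS or by a client that pings a list of servers, reduces to picking the minimum over measured delays. A minimal sketch with hypothetical server names and delay values:

```python
def pick_cdn_node(delays_ms):
    """Return the CDN node with the smallest measured delay.

    delays_ms: mapping from node name to measured delay in milliseconds.
    """
    if not delays_ms:
        raise ValueError("no CDN nodes to choose from")
    return min(delays_ms, key=delays_ms.get)


# Hypothetical ping results; the names and values are made up for illustration.
measurements = {"cdn-sydney": 12.5, "cdn-singapore": 95.0, "cdn-frankfurt": 280.0}
best = pick_cdn_node(measurements)
print(best)
```

Picking by delay alone ignores server load and available bandwidth, which a production selection policy would likely take into account as well.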
IP anycast: multiple machines share the same IP address, which is useful for directing clients to the closest datacenter (e.g., Cloudflare uses 12 datacenters with the same IP address).
http://blog.cloudflare.com/a-brief-anycast-primer
Case study: Netflix
30% downstream US traffic in 2011
Owns very little infrastructure, uses 3rd party services:
own registration, payment servers
Amazon (3rd party) cloud services:
Create multiple versions of movie (different encodings) in Amazon cloud
Upload versions from cloud to CDNs
Cloud hosts Netflix web pages for user browsing
three 3rd party CDNs host/stream Netflix content: Akamai, Limelight, Level-3
Unlike YouTube, which runs on Google’s infrastructure, Netflix relies on third-party infrastructure.
Kankan (China) uses BitTorrent-like delivery, but its protocol is proprietary.
Case study: Netflix
[Figure labels: Bob; Netflix registration, accounting servers; Amazon cloud; Akamai CDN; Limelight CDN; Level-3 CDN; homepage; DNS]
upload copies of multiple versions of video to CDNs (master version -> different formats)
1. Bob manages Netflix account
2. Bob browses Netflix video
3. Manifest file returned for requested video
4. DASH streaming
Four major components: the registration/payment servers, the Amazon cloud, the CDN providers, and the clients.
Netflix runs its virtual machines in the Amazon cloud, where it uploads and compresses video before sending it to the CDNs.
Voice over IP
20min
Voice-over-IP (VoIP)
VoIP end-end-delay requirement: needed to maintain “conversational” aspect
higher delays noticeable, impair interactivity
< 150 msec: good
> 400 msec: bad
includes application-level (playout), network delays
session initialization: how does callee advertise IP address, port number, encoding algorithms?
value-added services: call forwarding, screening, recording
emergency services: 911/000
Screening: service to see who is calling before answering
VoIP characteristics
speaker’s audio: alternating talk spurts, silent periods.
64 kbps during talk spurt
chunks generated only during talk spurts
20 msec chunks at 8 Kbytes/sec: 160 bytes of data
application-layer header added to each chunk
chunk+header encapsulated into UDP or TCP segment
application sends segment into socket every 20 msec during talkspurt
During a talk spurt the speaker generates 8 Kbytes/sec, sent as one chunk every 20 msec
Spurt: short period of time
8 Kbytes/sec * 20 msec = 160 bytes per chunk
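The chunk arithmetic can be checked with a short sketch. The header sizes are not from the slides: they are the minimum sizes of the RTP fixed header (12 bytes), the UDP header (8 bytes) and an IPv4 header without options (20 bytes), used here to estimate the on-the-wire rate during a talk spurt.

```python
ENCODING_RATE_BPS = 64_000   # 64 kbps during a talk spurt
CHUNK_MS = 20                # one chunk every 20 msec

# Assumed minimum header sizes in bytes (not given on the slides).
RTP_HEADER = 12
UDP_HEADER = 8
IPV4_HEADER = 20


def chunk_bytes(rate_bps, chunk_ms):
    """Payload bytes produced in one chunk interval."""
    return rate_bps * chunk_ms // 1000 // 8


def wire_rate_bps(rate_bps, chunk_ms):
    """Bits per second on the wire once RTP/UDP/IPv4 headers are added."""
    packets_per_sec = 1000 // chunk_ms
    packet_size = chunk_bytes(rate_bps, chunk_ms) + RTP_HEADER + UDP_HEADER + IPV4_HEADER
    return packet_size * 8 * packets_per_sec


payload = chunk_bytes(ENCODING_RATE_BPS, CHUNK_MS)      # 160 bytes per chunk
on_wire = wire_rate_bps(ENCODING_RATE_BPS, CHUNK_MS)    # 80,000 bps
print(payload, on_wire)
```

Under these assumptions, headers add 25% on top of the 64 kbps payload, which is one reason not to make chunks much shorter than 20 msec.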
VoIP: packet loss, delay
network loss: IP datagram lost due to network congestion (router buffer overflow)
delay loss: IP datagram arrives too late for playout at receiver
delays: processing, queueing in network, transmission, propagation
typical maximum tolerable delay: 400 ms
loss tolerance: depending on voice encoding, loss concealment, packet loss rates between 1% and 10% can be tolerated
Delay jitter
[Figure: cumulative data vs. time; client reception follows transmission after a variable network delay (jitter); playout follows after the client playout delay; labels: sum delay, buffered data]
Jitter: the time it takes for each packet to be delivered varies because of queuing delays at network routers.
VoIP: fixed playout delay
receiver attempts to play out each chunk exactly q msecs after chunk was generated.
chunk has time stamp t: play out chunk at t+q
chunk arrives after t+q: data arrives too late for playout: data “lost”
tradeoff in choosing q:
large q: less packet loss
small q: better interactive experience
VoIP: fixed playout delay
sender generates packets every 20 msec during talk spurt.
first packet received at time r
first playout schedule: begins at p
second playout schedule: begins at p’
The first playout schedule has a fixed playout delay of p-r: the 4th packet does not arrive by its scheduled playout time
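The fixed playout rule (a chunk stamped t is played at t+q and counts as lost if it arrives later) can be sketched as below. The timestamps and arrival times are invented for illustration; they are not read off the figure.

```python
def late_packets(timestamps_ms, arrivals_ms, q_ms):
    """Return the indices of packets that miss their playout time t + q."""
    return [
        i
        for i, (t, r) in enumerate(zip(timestamps_ms, arrivals_ms))
        if r > t + q_ms
    ]


# Hypothetical talk spurt: packets generated every 20 msec, jittery arrivals.
generated = [0, 20, 40, 60]
received = [30, 55, 65, 130]      # network delays of 30, 35, 25 and 70 msec

lost_with_small_q = late_packets(generated, received, q_ms=50)
lost_with_large_q = late_packets(generated, received, q_ms=80)
print(lost_with_small_q, lost_with_large_q)
```

With q = 50 msec the fourth packet (index 3) misses its deadline; with q = 80 msec nothing is lost, but every chunk is played 30 msec later, which is the tradeoff in choosing q.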
Adaptive playout delay
goal: low playout delay, low late loss rate
[Figure: cumulative data vs. time; client reception after variable network delay (jitter), with three candidate playout curves labelled: too many losses, unnecessary delay, best]
d_i is a smoothed average of the observed network delays r_1 - t_1, …, r_i - t_i.
Recently observed network delays carry more weight than delays observed in the distant past.
Adaptive playout delay
goal: low playout delay, low late loss rate
approach: adaptive playout delay adjustment:
estimate network delay, adjust playout delay at beginning of each talk spurt
silent periods compressed and elongated
adaptively estimate packet delay (EWMA: exponentially weighted moving average, recall TCP RTT estimate):
$d_i = (1-a)\,d_{i-1} + a\,(r_i - t_i)$
where $d_i$ is the delay estimate after the ith packet, $a$ is a small constant (e.g. 0.01), and $r_i - t_i$ (time received minus time sent, i.e. the timestamp) is the measured delay of the ith packet
Adaptive playout delay (cont’d)
also useful to estimate average deviation of delay, $v_i$:
$v_i = (1-b)\,v_{i-1} + b\,|r_i - t_i - d_i|$
estimates $d_i$, $v_i$ calculated for every received packet, but used only at start of talk spurt
for first packet in talk spurt, playout time is:
$\text{playout-time}_i = t_i + d_i + K v_i$
K is a positive constant, for example K=4.
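The two EWMA updates and the playout-time rule translate directly into code. This is a sketch: the default value for b is an assumption (the slides only give an example value for a), and the starting estimates are arbitrary.

```python
def update_estimates(d_prev, v_prev, t_i, r_i, a=0.01, b=0.01):
    """Update the delay estimate d_i and the delay deviation v_i for one packet.

    t_i: timestamp (time sent), r_i: time received, both in msec.
    """
    d_i = (1 - a) * d_prev + a * (r_i - t_i)
    v_i = (1 - b) * v_prev + b * abs(r_i - t_i - d_i)
    return d_i, v_i


def playout_time(t_i, d_i, v_i, K=4):
    """Playout time for the first packet of a talk spurt."""
    return t_i + d_i + K * v_i


# Large weights are used here only to make the numbers easy to follow.
d, v = update_estimates(d_prev=0.0, v_prev=0.0, t_i=0, r_i=100, a=0.5, b=0.5)
first_playout = playout_time(t_i=0, d_i=d, v_i=v)
print(d, v, first_playout)
```

The estimates are updated for every received packet, but `playout_time` would be called only for the first packet of each talk spurt; later packets in the same spurt keep the same offset from their timestamps.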
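Delay jitter
[Figure: cumulative data vs. time for talk spurt 1 and talk spurt 2; the measured delays r_i - t_i are used to determine d_i + K v_i, and the playout delay is adjusted to d_i + K v_i between the two talk spurts]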
Adaptive playout delay (cont’d)
Q: How does receiver determine whether packet is first in a talkspurt?
if no loss, receiver looks at successive timestamps
difference of successive stamps > 20 msec ⇒ talk spurt begins.
with loss possible, receiver must look at both time stamps and sequence numbers
difference of successive stamps > 20 msec and sequence numbers without gaps ⇒ talk spurt begins.
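The talk spurt test with loss possible can be sketched as a check on consecutive packets. Timestamps are expressed in msec here for readability, and sequence-number wraparound is ignored; both are simplifying assumptions.

```python
def starts_talk_spurt(previous, current, chunk_ms=20):
    """Return True if `current` is the first packet of a new talk spurt.

    previous, current: (sequence_number, timestamp_ms) of successive received packets.
    A timestamp gap larger than one chunk signals a silent period only when
    no packet was lost in between, i.e. the sequence numbers are consecutive.
    """
    prev_seq, prev_ts = previous
    cur_seq, cur_ts = current
    gap_in_time = cur_ts - prev_ts > chunk_ms
    no_loss = cur_seq == prev_seq + 1
    return gap_in_time and no_loss


# Hypothetical received stream: packet 2 was lost, silence between packets 3 and 4.
received = [(1, 0), (3, 40), (4, 100), (5, 120)]
flags = [starts_talk_spurt(p, c) for p, c in zip(received, received[1:])]
print(flags)
```

The 40 msec gap between packets 1 and 3 is explained by the lost packet, so only the jump from packet 3 to packet 4 is treated as the start of a new talk spurt.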
Adaptive playout delay (cont’d)
[Figure: two timelines of packets spaced 20 ms apart within Spurt 1 and Spurt 2. Without loss: timestamps 0, 20, 40, 100, 120. With the second packet lost (sequence numbers 1, 3, 4, 5): timestamps 0, 40, 100, 120.]
VoIP: recovery from packet loss
Challenge: recover from packet loss given small tolerable delay between original transmission and playout
each ACK/NAK takes ~ one RTT
alternative: Forward Error Correction (FEC)
send enough bits to allow recovery without retransmission
simple FEC
for every group of n chunks, create redundant chunk by exclusive OR-ing n original chunks
send n+1 chunks, increasing bandwidth by factor 1/n
can reconstruct original n chunks if at most one lost chunk from n+1 chunks
Send x1, x2, x3, … xn, and y=x1 xor x2 xor x3,…, xor xn,
If x3 is lost, can re-compute x3 from x1, x2, x4, … xn, and y
Example (n = 3): x1 = 1, x2 = 0, x3 = 1, so y = 0. If x3 is lost, the receiver solves 1 XOR 0 XOR x3 = 0, giving x3 = 1.
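The simple XOR scheme can be sketched on byte strings. The sketch assumes all chunks in a group have the same length and that at most one of the n+1 chunks is lost; the function names are illustrative.

```python
from functools import reduce


def xor_chunks(chunks):
    """Byte-wise XOR of equal-length chunks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


def recover_missing(received, parity):
    """Rebuild the single missing chunk (marked None) from the others and the parity."""
    if received.count(None) != 1:
        raise ValueError("exactly one chunk must be missing")
    present = [chunk for chunk in received if chunk is not None]
    return xor_chunks(present + [parity])


# The one-bit example: x1 = 1, x2 = 0, x3 = 1.
group = [b"\x01", b"\x00", b"\x01"]
parity = xor_chunks(group)                                   # y = 0
recovered = recover_missing([b"\x01", b"\x00", None], parity)
print(parity, recovered)
```

The receiver has to wait for the whole group of n+1 chunks before it can repair a loss, so a larger n lowers the bandwidth overhead but adds playout delay.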
FEC: recall two-dimensional parity in Ch. 5
VoIP: recovery from packet loss (cont’d)
another FEC scheme: “piggyback lower quality stream”
send lower resolution audio stream as redundant information
e.g., nominal stream at 64 kbps and redundant stream at 13 kbps
Interleaving copes with the loss of consecutive data by spreading it across packets: most of the content of a lost packet can still be reconstructed.
Limited application to VoIP as it increases latency
VoIP: recovery from packet loss (cont’d)
interleaving to conceal loss:
audio chunks divided into smaller units, e.g. four 5 msec units per 20 msec audio chunk
packet contains small units from different chunks
if packet lost, still have most of every original chunk
no redundancy overhead, but worse delay performance
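Interleaving can be sketched with list slicing. The sketch assumes four 20 msec chunks of four 5 msec units each, numbered 1 to 16, and builds each packet from one unit of every chunk.

```python
def interleave(units, units_per_chunk=4):
    """Build packets that each carry one unit from every original chunk."""
    return [units[offset::units_per_chunk] for offset in range(units_per_chunk)]


def surviving_units(packets, lost_index):
    """Units still available at the receiver when one packet is lost."""
    return sorted(
        unit
        for index, packet in enumerate(packets)
        if index != lost_index
        for unit in packet
    )


units = list(range(1, 17))        # units 1..16, chunks are 1-4, 5-8, 9-12, 13-16
packets = interleave(units)       # first packet carries units 1, 5, 9, 13
remaining = surviving_units(packets, lost_index=2)
print(packets, remaining)
```

Losing the third packet removes units 3, 7, 11 and 15, so every chunk keeps three of its four units. The sender must collect all four chunks before it can send the first packet, which is the delay penalty noted above.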
[Figure: original stream of sixteen 5 msec units, numbered 1 to 16 (four units per 20 msec chunk)]
VoIP: recovery from packet loss (cont’d)
[Figure: sixteen units forming Word 1 to Word 4. Without interleaving, one lost packet removes units 9 to 12 (e.g., a whole word missing). With interleaving, one lost packet removes units 3, 7, 11 and 15 (e.g., a syllable missing from each word, which is acceptable).]
Perceived quality is improved
Real-time Conversational
Applications
20min
Real-time Transport Protocol (RTP)
RTP specifies packet structure for packets carrying audio, video data
RFC 3550
RTP packet provides
payload type identification
packet sequence numbering
time stamping
RTP runs in end systems
RTP packets encapsulated in UDP segments
interoperability: if two VoIP applications run RTP, they may be able to work together
RTP: Real-time Transport Protocol
RTP libraries provide transport-layer interface that extends UDP:
port numbers, IP addresses (already existing)
payload type identification
packet sequence numbering
time-stamping
RTP runs on top of UDP
RTP: Real-time Transport Protocol
RTP example
example: sending 64 kbps PCM µ-law encoded voice over RTP
PCM: Pulse-code modulation: a method used to digitally represent sampled analog signals
µ-law: a non-uniform (logarithmic) quantization scheme
Sample rate: 8,000 samples/sec
Quantization: 8 bits/sample
application collects encoded data in chunks, e.g., every 20 msec = 160 bytes in a chunk
audio chunk + RTP header form RTP packet, which is encapsulated in UDP segment
RTP header indicates type of audio/video encoding in each packet
sender can change encoding during conference
RTP header also contains sequence numbers, timestamps
RTP and QoS
RTP does not provide any mechanism to ensure timely data delivery or other QoS guarantees
RTP encapsulation only seen at end systems (not by intermediate routers)
routers provide best-effort service, making no special effort to ensure that RTP packets arrive at destination in timely manner
payload type (7 bits): indicates type of encoding currently being used. If sender changes encoding during call, sender informs receiver via payload type field
Payload type 0: PCM µ-law, 64 kbps
Payload type 3: GSM, 13 kbps
Payload type 7: LPC, 2.4 kbps
Payload type 26: Motion JPEG
Payload type 31: H.261
Payload type 33: MPEG2 video
sequence # (16 bits): increment by one for each RTP packet sent
detect packet loss, restore packet sequence
[Figure: RTP header fields, in order: payload type, sequence number, time stamp, Synchronization Source ID (SSRC), miscellaneous fields]
Adaptive encoding
RTP header
timestamp field (32 bits long): sampling instant of first byte in this RTP data packet
for audio, timestamp clock increments by one for each sampling period (e.g., each 125 µsec for 8 kHz sampling clock)
if application generates chunks of 160 encoded samples (20ms), 20ms/125us=160
timestamp increases by 160 for each RTP packet when source is active. Timestamp clock continues to increase at constant rate when source is inactive.
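[Figure: RTP header fields (payload type, sequence number, time stamp, SSRC, miscellaneous fields) with consecutive time stamps X and X+160 (160 samples, 20 ms apart); a later time stamp X+480 corresponds to 480 * 125 µs = 60 ms]
Sampling period: 1/8000 s = 125 * 10^-6 s = 125 µs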
RTP header
sequence number + timestamp: together they let the receiver detect the start of a new talk spurt
SSRC field (32 bits long): identifies source of RTP stream. Each stream in RTP session has distinct SSRC
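The header fields described above can be packed into the 12-byte fixed RTP header with the standard `struct` module. The bit layout (2-bit version, padding, extension, CSRC count and marker bits around the 7-bit payload type) follows RFC 3550; the SSRC value and starting timestamp are arbitrary example values.

```python
import struct

RTP_VERSION = 2
SAMPLES_PER_CHUNK = 160   # 20 msec of audio at 8,000 samples/sec


def pack_rtp_header(payload_type, sequence, timestamp, ssrc, marker=0):
    """Build the 12-byte fixed RTP header (no padding, extension or CSRC entries)."""
    byte0 = RTP_VERSION << 6
    byte1 = ((marker & 0x1) << 7) | (payload_type & 0x7F)
    return struct.pack(
        "!BBHII", byte0, byte1, sequence & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF
    )


def unpack_rtp_header(data):
    """Parse the fixed RTP header back into its main fields."""
    byte0, byte1, sequence, timestamp, ssrc = struct.unpack("!BBHII", data[:12])
    return {
        "version": byte0 >> 6,
        "payload_type": byte1 & 0x7F,
        "sequence": sequence,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


# Two consecutive PCM mu-law packets (payload type 0) from the same source.
first = pack_rtp_header(payload_type=0, sequence=1, timestamp=1000, ssrc=0x1234)
second = pack_rtp_header(payload_type=0, sequence=2, timestamp=1000 + SAMPLES_PER_CHUNK, ssrc=0x1234)
print(unpack_rtp_header(first), unpack_rtp_header(second))
```

The sequence number advances by one per packet while the timestamp advances by 160, one tick per sample in the chunk.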
SIP: Session Initiation Protocol [RFC 3261]
long-term vision:
all telephone calls, video conference calls take place over Internet
people identified by names or e-mail addresses, rather than by phone numbers
can reach callee (if callee so desires), no matter where callee roams, no matter what IP device callee is currently using
SIP services
SIP provides mechanisms for call setup:
for caller to let callee know she wants to establish a call
so caller, callee can agree on media type, encoding
to end call
determine current IP address of callee:
maps mnemonic identifier to current IP address
call management:
add new media streams during call
change encoding during call
invite others
transfer, hold calls
SIP addresses the dynamic IP address problem by looking up the callee’s current IP address from the user’s identifier.
Example: setting up call to known IP address
Alice’s SIP INVITE message indicates her port number, IP address, encoding she prefers to receive (PCM µ-law)
Bob’s 200 OK message indicates his port number, IP address, preferred encoding (GSM)
SIP messages can be sent over TCP or UDP; here they are sent over UDP, while the media is sent over RTP/UDP
Default SIP port # is 5060
In practice, Bob and Alice talk simultaneously
SIP is out-of-band
AVP 0 indicates the encoding Alice would like to receive (PCM µ-law).
38060 is the port on which Alice would like to receive.
SIP port 5060 is used for the handshake (Bob requests a different format: GSM) before they can talk.
SIP is out-of-band (like FTP): different sockets are used for SIP messages and for sending/receiving media data.
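The session description lines carried in the INVITE and 200 OK messages, such as `c=IN IP4 167.180.112.24` and `m=audio 38060 RTP/AVP 0`, can be parsed with plain string handling. The sketch handles only these two line types and maps payload type numbers using the values listed on the RTP header slide; it is not a full SDP parser.

```python
# Payload type numbers as listed for the RTP header.
PAYLOAD_TYPES = {0: "PCM mu-law, 64 kbps", 3: "GSM, 13 kbps", 7: "LPC, 2.4 kbps"}


def parse_connection_line(line):
    """Parse 'c=IN IP4 <address>' and return the address."""
    network_type, address_type, address = line[len("c="):].split()
    return address


def parse_media_line(line):
    """Parse 'm=<media> <port> <proto> <payload types...>'."""
    media, port, proto, *formats = line[len("m="):].split()
    return {
        "media": media,
        "port": int(port),
        "proto": proto,
        "payload_types": [int(f) for f in formats],
    }


invite_address = parse_connection_line("c=IN IP4 167.180.112.24")
invite_media = parse_media_line("m=audio 38060 RTP/AVP 0")
ok_media = parse_media_line("m=audio 48753 RTP/AVP 3")

print(invite_address, invite_media["port"], PAYLOAD_TYPES[invite_media["payload_types"][0]])
print(ok_media["port"], PAYLOAD_TYPES[ok_media["payload_types"][0]])
```

Each side advertises where and in what encoding it wants to receive, so Bob sends PCM µ-law to Alice’s port 38060 and Alice sends GSM to Bob’s port 48753.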
Setting up a call (cont’d)
codec negotiation:
suppose Bob doesn’t have PCM µ-law encoder
Bob will instead reply with 606 Not Acceptable Reply, listing his encoders. Alice can then send new INVITE message, advertising different encoder
rejecting a call
Bob can reject with replies “busy,” “gone,” “payment required,” “forbidden”
media can be sent over RTP or some other protocol
Name translation, user location
caller wants to call callee, but only has callee’s name or e-mail address.
need to get IP address of callee’s current host:
user moves around
DHCP protocol (dynamically assign IP address)
user has different IP devices (PC, smartphone, car device)
result can be based on:
time of day (work, home)
caller (don’t want boss to call you at home)
status of callee (calls sent to voicemail when callee is already talking to someone)
SIP registrar
one function of SIP server: registrar
when Bob starts SIP client, client sends SIP REGISTER message to Bob’s registrar server
register message:
REGISTER sip:domain.com SIP/2.0
Via: SIP/2.0/UDP 193.64.210.89
From:
To:
Expires: 3600
Every 3600 seconds (1 hour) the registration should be renewed.
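The registrar’s role can be sketched as a table of bindings that expire. This toy model is not a SIP implementation: the class, the identifier `sip:bob@domain.com` and the use of plain integer seconds for time are all assumptions made for illustration.

```python
class Registrar:
    """Toy registrar: maps a user identifier to its current IP address."""

    def __init__(self):
        self._bindings = {}   # identifier -> (ip_address, expiry_time_in_seconds)

    def register(self, identifier, ip_address, now, expires=3600):
        """Record or refresh a binding, as a REGISTER message would."""
        self._bindings[identifier] = (ip_address, now + expires)

    def lookup(self, identifier, now):
        """Return the current IP address, or None if unknown or expired."""
        binding = self._bindings.get(identifier)
        if binding is None:
            return None
        ip_address, expiry = binding
        return ip_address if now < expiry else None


registrar = Registrar()
registrar.register("sip:bob@domain.com", "193.64.210.89", now=0)   # hypothetical identifier

while_valid = registrar.lookup("sip:bob@domain.com", now=1800)
after_expiry = registrar.lookup("sip:bob@domain.com", now=3600)
print(while_valid, after_expiry)
```

If the client does not re-register before the `Expires` interval runs out, the lookup stops returning an address, which is why the registration has to be renewed periodically.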
SIP proxy
another function of SIP server: proxy
Alice sends invite message to her proxy server
contains address
proxy responsible for routing SIP messages to callee, possibly through multiple proxies
Bob sends response back through same set of SIP proxies
proxy returns Bob’s SIP response message to Alice
contains Bob’s IP address
SIP proxy analogous to local DNS server
SIP example: calls
Participants: Alice (128.119.40.186), UMass SIP proxy, Poly SIP registrar, Eurecom SIP registrar, Bob (197.87.54.21)
1. Alice sends INVITE message to UMass SIP proxy.
2. UMass proxy forwards request to Poly registrar server.
3. Poly server returns redirect response, indicating that it should try another address.
4. UMass proxy forwards request to Eurecom registrar server.
5. Eurecom registrar forwards INVITE to 197.87.54.21, which is running Bob’s SIP client.
6-8. SIP response returned to Alice.
9. Data flows between clients.
[Figure: VoIP fixed playout delay. Packets generated and packets received vs. time; first packet received at r; two playout schedules, one with delay p - r (which incurs a loss) and one with delay p’ - r]
[Figure: SIP call setup between Alice (167.180.112.24) and Bob (193.64.210.89), with time running downward on both sides]
INVITE (to Bob’s port 5060): c=IN IP4 167.180.112.24, m=audio 38060 RTP/AVP 0
Bob’s terminal rings
200 OK (to Alice’s port 5060): c=IN IP4 193.64.210.89, m=audio 48753 RTP/AVP 3
ACK (to Bob’s port 5060)
µ-law audio sent to Alice’s port 38060; GSM audio sent to Bob’s port 48753