Week 7-audio-video
Advanced Network Technologies
Multimedia 1/2
School of Computer Science
Dr. Wei Bao | Lecturer
Multimedia
› Multimedia
› Streaming stored video
› Voice-over-IP
› RTP/SIP
Multimedia
Multimedia networking: 3 application types
› streaming, stored audio, video
– streaming: can begin playout before downloading entire file
– stored (at server): can transmit faster than audio/video will
be rendered (implies storing/buffering at client)
– e.g., YouTube, Netflix, Hulu
› conversational voice/video over IP
– interactive nature of human-to-human conversation limits
delay tolerance
– e.g., Skype
› streaming live audio, video
– e.g., live sporting event
Multimedia audio
› analog audio signal sampled at
constant rate
– telephone: 8,000 samples/sec
– CD music: 44,100 samples/sec
› each sample quantized, i.e.,
rounded
– e.g., 2^8 = 256 possible quantized
values
– each quantized value
represented by bits, e.g., 8 bits
for 256 values
[Figure: audio signal amplitude vs. time, showing the analog signal, the quantized value of each analog value, the quantization error, and the sampling rate (N samples/sec)]
Rate=44100 samples/sec * 8bit/sample = 352800 bps
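The rate calculation above is just samples per second times bits per sample. A minimal sketch (the function name `pcm_bit_rate` is illustrative, not from the slides):

```python
def pcm_bit_rate(samples_per_sec: int, bits_per_sample: int) -> int:
    """Uncompressed PCM bit rate in bits per second."""
    return samples_per_sec * bits_per_sample


telephone_bps = pcm_bit_rate(8_000, 8)   # telephone sampling, 8-bit samples: 64,000 bps
cd_8bit_bps = pcm_bit_rate(44_100, 8)    # the slide's example: 352,800 bps
print(telephone_bps, cd_8bit_bps)
```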
› Video: sequence of images displayed at constant rate
– e.g., 24 images/sec
› Each image: array of pixels; resolution, e.g., 480*640
– each pixel: 3 colors: Red, Green, Blue (RGB)
– each color has 2^8 = 256 possible quantized values (8 bits)
– data rate: 8*3*480*640*24 ≈ 177 Mbps. Too large!
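To check the uncompressed video data rate, multiply pixels per image by bits per pixel and images per second. A small sketch, with a hypothetical helper name:

```python
def raw_video_bps(width: int, height: int, colors: int, bits_per_color: int, fps: int) -> int:
    """Bit rate of uncompressed video: every pixel of every image is sent in full."""
    return width * height * colors * bits_per_color * fps


rate = raw_video_bps(640, 480, 3, 8, 24)
print(f"{rate / 1e6:.1f} Mbps")  # 176.9 Mbps, i.e. roughly 177 Mbps
```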
Video
› coding: use redundancy within and between images to decrease # bits used to encode image
– spatial (within image)
– temporal (from one image to next)
› examples:
– MPEG 1 (CD-ROM): 1.5 Mbps
– MPEG 2 (DVD): 3-6 Mbps
– MPEG 4 (often used in Internet): < 1 Mbps
– MPEG: Moving Picture Experts Group
› spatial coding example: instead of sending N values of same color (all purple), send only two values: color value (purple) and number of repeated values (N)
› temporal coding example: instead of sending complete frame at i+1, send only differences from frame i
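The two coding examples can be sketched as run-length encoding (spatial) and frame differencing (temporal). This is only a toy illustration of the idea; real MPEG coders use far more elaborate transforms, and all function names here are hypothetical.

```python
from itertools import groupby


def run_length_encode(pixels):
    """Spatial coding sketch: collapse each run of equal values into (value, count)."""
    return [(value, len(list(run))) for value, run in groupby(pixels)]


def frame_delta(prev_frame, next_frame):
    """Temporal coding sketch: keep only (index, new_value) where the frames differ."""
    return [(i, new) for i, (old, new) in enumerate(zip(prev_frame, next_frame)) if old != new]


def apply_delta(prev_frame, delta):
    """Rebuild frame i+1 from frame i plus the transmitted differences."""
    frame = list(prev_frame)
    for i, value in delta:
        frame[i] = value
    return frame


row = ["purple"] * 5 + ["white"]
print(run_length_encode(row))             # [('purple', 5), ('white', 1)]
print(frame_delta([1, 2, 3], [1, 9, 3]))  # [(1, 9)]
```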
Streaming Stored Video
Streaming stored video
[Figure: cumulative data vs. time (Frame 1, Frame 2, Frame 3, Frame 4, …): 1. video recorded (e.g., 30 frames/sec), 2. video sent, 3. video received after network delay (fixed in this example), 4. video played out]
› streaming: at this time, client playing out early part of video, while server still sending later part of video
Streaming stored video: challenges
› continuous playout constraint: once client playout
begins, playback must match original timing
- … but network delays are variable (jitter), so will need
client-side buffer to match playout requirements
› other challenges:
- client interactivity: pause, fast-forward, jump through
video
- video packets may be lost, retransmitted
[Figure: cumulative data vs. time: constant bit rate video transmission, variable network delay, client video reception, buffered video, and constant bit rate video playout at client after the client playout delay]
Streaming stored video: revisited
› client-side buffering and playout delay: compensate for network-
added delay, delay jitter
Streaming stored video: revisited
Buffer underflow!
Cannot be played on time
[Figure: cumulative data vs. time: constant bit rate (CBR) video transmission, variable network delay, client video reception, and constant bit rate video playout at client with a larger client playout delay]
Streaming stored video: revisited
› Increase playout delay: fewer buffer underflows
› initial playout delay tradeoff
[Figure: video server sends to client at variable fill rate x(t); client application buffer of size B with buffer fill level Q(t); playout rate, e.g., CBR r]
Client-side buffering, playout
1. Initial fill of buffer until playout begins at tp
2. playout begins at tp,
3. buffer fill level Q(t) varies over time as fill rate x(t) varies and playout
rate r is constant
4. Q(t+1)=Q(t)+x(t), t ≤ tp; Q(t+1)=max[Q(t)+x(t)-r, 0], t> tp
5. Q(t)+x(t)-r<0: buffer underflow
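The recursion in steps 4 and 5 can be traced with a few lines of code. This is a minimal sketch in discrete time steps, ignoring the buffer size limit B (as the recursion itself does); the function name and the example trace are assumptions for illustration.

```python
def simulate_buffer(fill, r, tp):
    """Trace the buffer fill level using the recursion from the slide.

    fill[t] is x(t); playout at constant rate r starts after step tp.
    Returns (levels, underflows): levels[t] holds Q(t+1), and underflows
    lists every step t at which Q(t) + x(t) - r < 0.
    """
    q = 0.0
    levels, underflows = [], []
    for t, x in enumerate(fill):
        if t <= tp:
            q = q + x                   # initial fill: nothing is played out yet
        else:
            if q + x - r < 0:
                underflows.append(t)    # not enough data to play at rate r
            q = max(q + x - r, 0.0)
        levels.append(q)
    return levels, underflows


levels, underflows = simulate_buffer([1, 0, 0, 2], r=1, tp=0)
print(levels, underflows)
```

With this trace the buffer drains to zero at step 1 and underflows at step 2, before the burst at step 3 refills it.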
Client-side buffering, playout
playout buffering: average fill rate E(x), playout rate r
›E(x) < r: buffer eventually empties (causing freezing of
video playout until buffer fills again)
›E(x) ≥ r: buffer will not empty, provided initial playout
delay is large enough to absorb variability in x(t)
- initial playout delay tradeoff: buffer starvation less likely with larger delay,
but larger delay until user begins watching
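The initial playout delay tradeoff can be seen by replaying one arrival trace with different start times. The sketch below uses a hypothetical bursty trace whose average fill rate equals r, and counts the steps at which the buffer cannot supply r units.

```python
def count_underflows(fill, r, tp):
    """Count steps after tp at which the buffer cannot supply r units for playout."""
    q, stalls = 0, 0
    for t, x in enumerate(fill):
        if t <= tp:
            q += x                      # still pre-buffering
        else:
            if q + x < r:
                stalls += 1
            q = max(q + x - r, 0)
    return stalls


# Hypothetical bursty arrivals: 12 units over 12 steps, so E(x) = r = 1.
fill = [1, 0, 0, 3] * 3
for tp in (0, 1, 3):
    print(tp, count_underflows(fill, r=1, tp=tp))
```

For this trace, starting playout right after the first step gives one underflow, while waiting one more step avoids it, at the price of a later start.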
Streaming multimedia: UDP
› server sends at rate appropriate for client
- often: send rate = encoding rate = constant rate
- transmission rate can be oblivious to congestion levels
› short playout delay (2-5 seconds) to remove network jitter
› error recovery: application-level, time-permitting
› RTP [RFC 3550]: multimedia payload types
› UDP may not go through firewalls
Streaming multimedia: HTTP
› multimedia file retrieved via HTTP GET
› send at maximum possible rate under TCP
› fill rate fluctuates due to TCP congestion control, retransmissions
(in-order delivery)
› larger playout delay: smooth TCP delivery rate
› HTTP/TCP passes more easily through firewalls
[Figure: server reads video file into TCP send buffer; data reaches the client at variable rate x(t), passing through the TCP receive buffer into the application playout buffer]
Streaming multimedia: DASH
› DASH: Dynamic, Adaptive Streaming over HTTP
› server:
- divides video file into multiple chunks
- each chunk stored, encoded at different rates
- manifest file: provides URLs for different chunks
› client:
- periodically measures server-to-client bandwidth
- consulting manifest, requests one chunk at a time
- chooses maximum coding rate sustainable given current
bandwidth
- can choose different coding rates at different points in time
(depending on current available bandwidth)
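The client's rate choice can be sketched as "pick the highest encoding rate that the measured bandwidth can sustain". The rate list and the fallback to the lowest rate are assumptions for illustration; real DASH players also take buffer level and other signals into account.

```python
def choose_encoding_rate(available_kbps, measured_bandwidth_kbps):
    """Highest encoding rate not exceeding the measured bandwidth.

    Falls back to the lowest available rate when even that one is too high
    (an assumption of this sketch, not something the slides specify).
    """
    sustainable = [rate for rate in available_kbps if rate <= measured_bandwidth_kbps]
    return max(sustainable) if sustainable else min(available_kbps)


rates = [300, 750, 1500, 3000]  # hypothetical encoding rates listed in a manifest
for bandwidth in (100, 1000, 5000):
    print(bandwidth, choose_encoding_rate(rates, bandwidth))
```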
Streaming multimedia: DASH
[Figure: the same chunks (Chunk1, Chunk2, Chunk3, …, ChunkN) stored at several quality levels, from low quality to high quality; the sequence of chunks the client requests follows the available bandwidth over time]
Streaming multimedia: DASH
› DASH: Dynamic, Adaptive Streaming over HTTP
› “intelligence” at client: client determines
- when to request chunk (so that buffer starvation does not occur)
- what encoding rate to request (higher quality when more bandwidth
available)
- where to request chunk (can request from URL server that is “close” to
client or has high available bandwidth)
Content distribution network
› challenge: how to stream content (selected from millions of videos)
to hundreds of thousands of simultaneous users?
› option 1: single, large “mega-server”
- single point of failure
- point of network congestion
- long path to distant clients
- multiple copies of video sent over outgoing link
… quite simply: this solution doesn’t scale
Content distribution network
› challenge: how to stream content (selected from millions of videos)
to hundreds of thousands of simultaneous users?
› option 2: store/serve multiple copies of videos at multiple
geographically distributed sites (CDN)
CDN: “simple” content access scenario
Bob (client) requests video http://netcinema.com/6Y7B23V
– video stored in CDN at http://KingCDN.com/NetC6y&B23V
[Figure: Bob, Bob’s local DNS, netcinema.com, netcinema’s authoritative DNS, KingCDN.com, KingCDN authoritative DNS]
0. Store the video in CDN
1. Bob gets URL for video http://netcinema.com/6Y7B23V from netcinema.com web page
2. resolve http://netcinema.com/6Y7B23V via Bob’s local DNS
3. netcinema’s authoritative DNS returns URL http://KingCDN.com/NetC6y&B23V
4&5. resolve http://KingCDN.com/NetC6y&B23V via KingCDN’s authoritative DNS, which returns IP address of best KingCDN server with video
6. request video from KingCDN server, streamed via HTTP
CDN cluster selection strategy
› challenge: how does CDN DNS select “good” CDN node to stream
to client?
- pick CDN node geographically closest to client
- pick CDN node with shortest delay (or min # hops) to client (CDN nodes
periodically ping access ISPs, reporting results to CDN DNS)
› alternative: let client decide - give client a list of several CDN
servers
- client pings servers, picks “best”
- Netflix approach
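Either strategy reduces to choosing the node with the best measurement. A minimal sketch for the shortest-delay rule, with hypothetical node names and delay values:

```python
def pick_cdn_node(delay_ms_by_node):
    """Return the CDN node with the smallest measured delay to the client."""
    return min(delay_ms_by_node, key=delay_ms_by_node.get)


# Hypothetical ping results, e.g. gathered by the CDN or by the client itself.
measured = {"cdn-syd": 12.0, "cdn-sin": 95.0, "cdn-lax": 160.0}
print(pick_cdn_node(measured))  # cdn-syd
```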
Case study: Netflix
› 30% downstream US traffic in 2011
› Owns very little infrastructure, uses 3rd party services:
- own registration, payment servers
- Amazon (3rd party) cloud services:
- Create multiple versions of movie (different encodings) in Amazon cloud
- Upload versions from cloud to CDNs
- Cloud hosts Netflix web pages for user browsing
- three 3rd party CDNs host/stream Netflix content: Akamai, Limelight,
Level-3
[Figure: Netflix architecture: Netflix registration, accounting servers; Amazon cloud (homepage; master version -> different formats); DNS; Akamai CDN, Limelight CDN, Level-3 CDN]
– Amazon cloud uploads copies of multiple versions of video to CDNs
1. Bob manages Netflix account
2. Bob browses Netflix video
3. Manifest file returned for requested video
4. DASH streaming
Voice over IP
Voice-over-IP (VoIP)
› VoIP end-end-delay requirement: needed to maintain
“conversational” aspect
– higher delays noticeable, impair interactivity
– < 150 msec: good
– > 400 msec: bad
– includes application-level (playout), network delays
› session initialization: how does callee advertise IP address, port
number, encoding algorithms?
› value-added services: call forwarding, screening, recording
› emergency services: 911/000
VoIP characteristics
› speaker’s audio: alternating talk spurts, silent periods.
– 64 kbps during talk spurt
– chunks generated only during talk spurts
– 20 msec chunks at 8 Kbytes/sec: 160 bytes of data
› application-layer header added to each chunk
› chunk+header encapsulated into UDP or TCP segment
› application sends segment into socket every 20 msec during
talkspurt
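The chunk size follows directly from the encoding rate and the chunk duration. A short check of the arithmetic (the helper name is illustrative):

```python
def chunk_bytes(bit_rate_bps: int, chunk_ms: int) -> int:
    """Bytes of audio produced in one chunk interval at the given encoding rate."""
    return bit_rate_bps // 8 * chunk_ms // 1000


print(chunk_bytes(64_000, 20))  # 160 bytes per 20 msec chunk
print(1000 // 20)               # 50 chunks per second during a talk spurt
```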
VoIP: packet loss, delay
› network loss: IP datagram lost due to network congestion (router
buffer overflow)
› delay loss: IP datagram arrives too late for playout at receiver
– delays: processing, queueing in network, transmission, propagation.
– typical maximum tolerable delay: 400 ms
› loss tolerance: depending on voice encoding, loss concealment,
packet loss rates between 1% and 10% can be tolerated
Delay jitter
[Figure: cumulative data vs. time: variable network delay (jitter), client reception, buffered data, client playout delay, sum delay]
VoIP: fixed playout delay
› receiver attempts to play out each chunk exactly q
msecs after chunk was generated.
– chunk has time stamp t: play out chunk at t+q
– chunk arrives after t+q: data arrives too late for
playout: data “lost”
› tradeoff in choosing q:
– large q: less packet loss
– small q: better interactive experience
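The tradeoff in choosing q can be made concrete by counting how many chunks would miss their playout time for a given q. The delay values below are hypothetical measurements, used only to illustrate the rule "chunk arrives after t+q: lost".

```python
def late_loss_fraction(delays_ms, q_ms):
    """Fraction of chunks whose end-to-end delay exceeds the fixed playout delay q."""
    late = sum(1 for delay in delays_ms if delay > q_ms)
    return late / len(delays_ms)


delays = [40, 55, 120, 60, 210, 45, 80, 150]  # hypothetical per-chunk delays in msec
for q in (100, 200, 250):
    print(q, late_loss_fraction(delays, q))
```

A larger q loses fewer chunks to lateness but adds that much delay to the conversation.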
[Figure: packets vs. time: packets generated, packets received (first packet at time r), and two playout schedules with delays p – r and p’ – r; one packet is marked as a loss]
VoIP: fixed playout delay
› sender generates packets every 20 msec during talk spurt.
› first packet received at time r
› first playout schedule: begins at p
› second playout schedule: begins at p’
Adaptive playout delay
› goal: low playout delay, low late loss rate
[Figure: cumulative data vs. time with variable network delay (jitter) and client reception; three candidate playout delays: too small (too many losses), too large (unnecessary delay), and in between (best)]
› approach: adaptive playout delay adjustment:
– estimate network delay, adjust playout delay at beginning of each talk
spurt
– silent periods compressed and elongated
› adaptively estimate packet delay (EWMA: exponentially weighted moving average; recall TCP RTT estimate):
$d_i = (1-a)\,d_{i-1} + a\,(r_i - t_i)$
– $d_i$: delay estimate after ith packet
– $a$: small constant, e.g. 0.01
– $r_i - t_i$: measured delay of ith packet, i.e., time received – time sent (timestamp)
› also useful to estimate average deviation of delay, $v_i$:
$v_i = (1-b)\,v_{i-1} + b\,|r_i - t_i - d_i|$
Adaptive playout delay (cont’d)
› estimates $d_i$, $v_i$ calculated for every received packet, but used only at start of talk spurt
› for first packet in talk spurt, playout time is:
$\text{playout-time}_i = t_i + d_i + K v_i$
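The estimator can be sketched as follows. Assumptions not given on the slides: the values of b and K, seeding d with the first measured delay and v with 0, and reusing the offset chosen at the start of a talk spurt for the remaining packets of that spurt.

```python
def playout_times(packets, a=0.01, b=0.01, K=4):
    """Compute playout times with an adaptive playout delay.

    packets: list of (t_i, r_i, first_in_spurt) with send timestamp t_i,
    receive time r_i, and a flag marking the first packet of a talk spurt.
    """
    d = v = None
    offset = 0.0
    out = []
    for t, r, first in packets:
        delay = r - t
        if d is None:
            d, v = delay, 0.0                      # assumed initialisation
        else:
            d = (1 - a) * d + a * delay            # EWMA of the delay
            v = (1 - b) * v + b * abs(delay - d)   # EWMA of the deviation
        if first:
            offset = d + K * v                     # estimates used only at spurt start
        out.append(t + offset)
    return out


trace = [(0, 100, True), (20, 140, False), (100, 220, True)]
print(playout_times(trace, a=0.5, b=0.5, K=2))  # [100.0, 120.0, 225.0]
```

The large a and b in the usage line are chosen only to make the arithmetic easy to follow by hand.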
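[Figure: delay jitter. Cumulative data vs. time over talk spurt 1, …, talk spurt 2: measured delays r_i – t_i during talk spurt 1 determine d_i + K v_i, and the playout delay is adjusted to d_i + K v_i for talk spurt 2]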
Adaptive playout delay (cont’d)
Q: How does receiver determine whether packet is first in a talkspurt?
› if no loss, receiver looks at successive timestamps
– difference of successive stamps > 20 msec ⇒ talk spurt begins.
› with loss possible, receiver must look at both time stamps and
sequence numbers
– difference of successive stamps > 20 msec and sequence numbers
without gaps ⇒ talk spurt begins.
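The rule above can be written as a small predicate over two successive received packets. This sketch ignores sequence number wrap-around and, like the rule on the slide, does not flag a spurt start when a loss falls exactly on the spurt boundary.

```python
def is_first_in_spurt(prev, cur, chunk_ms=20):
    """prev, cur: (sequence_number, timestamp_ms) of successive received packets."""
    prev_seq, prev_ts = prev
    cur_seq, cur_ts = cur
    no_gap = cur_seq == prev_seq + 1            # no packet lost in between
    return no_gap and (cur_ts - prev_ts) > chunk_ms


print(is_first_in_spurt((1, 0), (3, 40)))    # False: the 40 msec gap is explained by a lost packet
print(is_first_in_spurt((3, 40), (4, 100)))  # True: gap in time but not in sequence numbers
```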
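Adaptive playout delay (cont’d)
[Figure: talk spurt detection example, packets 20 ms apart within a spurt. Without loss: timestamps 0, 20, 40, 100, 120; the jump from 40 to 100 separates spurt 1 from spurt 2. With loss: sequence numbers 1, 3, 4, 5 carry timestamps 0, 40, 100, 120; the jump from 0 to 40 comes with a sequence number gap (lost packet), while the jump from 40 to 100 has no gap, so spurt 2 begins there]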
VoIP: recovery from packet loss
Challenge: recover from packet loss given small tolerable
delay between original transmission and playout
› each ACK/NAK takes ~ one RTT
› alternative: Forward Error Correction (FEC)
– send enough bits to allow recovery without retransmission
simple FEC
› for every group of n chunks, create redundant chunk by exclusive
OR-ing n original chunks
› send n+1 chunks, increasing bandwidth by factor 1/n
› can reconstruct original n chunks if at most one lost chunk from n+1
chunks
› send x1, x2, x3, …, xn, and y = x1 XOR x2 XOR x3 XOR … XOR xn
› if x3 is lost, can re-compute x3 from x1, x2, x4, …, xn, and y
– e.g., sent: x1 = 1, x2 = 0, x3 = 1, y = 0; received: 1, 0, ?, 0; since 1 XOR 0 XOR x3 = 0, x3 = 1
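The same recovery works byte by byte on whole chunks. A minimal sketch, assuming equal-length chunks and at most one loss per group of n+1 (function names are illustrative):

```python
from functools import reduce


def xor_chunks(chunks):
    """Byte-wise XOR of equal-length chunks; this is the redundant chunk y."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*chunks))


def recover_missing(received, parity):
    """Rebuild the single lost chunk from the n-1 received chunks and the parity chunk."""
    return xor_chunks(received + [parity])


chunks = [b"\x01\x02", b"\x00\xff", b"\x01\x10"]
y = xor_chunks(chunks)
print(recover_missing([chunks[0], chunks[1]], y) == chunks[2])  # True
```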
VoIP: recovery from packet loss (cont’d)
another FEC scheme:
• “piggyback lower quality stream”
• send lower resolution audio stream as redundant information
• e.g., nominal stream at 64 kbps and redundant stream at 13 kbps
VoIP: recovery from packet loss (cont’d)
interleaving to conceal loss:
› audio chunks divided into smaller
units, e.g. four 5 msec units per
20 msec audio chunk
› packet contains small units from
different chunks
› if packet lost, still have most of
every original chunk
› no redundancy overhead, but
worse delay performance
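Interleaving can be sketched as a transpose: packet k carries unit k of every chunk. The sketch assumes n chunks of n units each, matching the example of four 5 msec units per 20 msec chunk; the function name is hypothetical.

```python
def interleave(units, n):
    """Spread n consecutive chunks of n units each across n packets.

    units is a flat list of n*n unit labels in original order;
    the k-th packet carries the k-th unit of every chunk.
    """
    chunks = [units[i * n:(i + 1) * n] for i in range(n)]
    return [[chunk[k] for chunk in chunks] for k in range(n)]


packets = interleave(list(range(1, 17)), 4)
print(packets[2])  # [3, 7, 11, 15]: losing this packet costs each chunk only one unit
```

The receiver has to wait for all n packets before it can reassemble the first chunk, which is where the extra delay comes from.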
[Figure: interleaving example. Units 1-16 form four chunks (Word 1 to Word 4) of four units each. Without interleaving, a lost packet removes a whole chunk (e.g., word missing). With interleaving, a lost packet removes one unit from every chunk (e.g., syllable missing), which is acceptable. Subjective feeling is improved]
Real-time Conversational
Applications
Real-time Transport Protocol (RTP)
› RTP specifies packet
structure for packets
carrying audio, video
data
› RFC 3550
› RTP packet provides
– payload type identification
– packet sequence
numbering
– time stamping
› RTP runs in end systems
› RTP packets
encapsulated in UDP
segments
› interoperability: if two
VoIP applications run
RTP, they may be able to
work together
RTP libraries provide transport-layer interface
that extends UDP:
• port numbers, IP addresses (already existing)
• payload type identification
• packet sequence numbering
• time-stamping
RTP runs on top of UDP
RTP example
example: sending 64 kbps PCM µ-law encoded voice over RTP
– PCM (pulse-code modulation): a method used to digitally represent sampled analog signals
– µ-law: a special (logarithmic) quantization
– sample rate: 8000 samples/second
– quantization: 8 bits/sample
› application collects encoded data in chunks, e.g., every 20 msec = 160 bytes in a chunk
› audio chunk + RTP header form
RTP packet, which is
encapsulated in UDP segment
› RTP header indicates type of
audio/video encoding in each
packet
– sender can change encoding
during conference
› RTP header also contains
sequence numbers,
timestamps
RTP and QoS
› RTP does not provide any mechanism to ensure timely data
delivery or other QoS guarantees
› RTP encapsulation only seen at end systems (not by intermediate
routers)
– routers provide best-effort service, making no special effort to
ensure that RTP packets arrive at destination in timely manner
• payload type (7 bits): indicates type of encoding currently
being used. If sender changes encoding during call,
sender informs receiver via payload type field
• Payload type 0: PCM µ-law, 64 kbps
• Payload type 3: GSM, 13 kbps
• Payload type 7: LPC, 2.4 kbps
• Payload type 26: Motion JPEG
• Payload type 31: H.261
• Payload type 33: MPEG2 video
• sequence # (16 bits): increment by one for each RTP
packet sent
• detect packet loss, restore packet sequence
[Figure: RTP header fields: payload type, sequence number, time stamp, Synchronization Source ID (SSRC), miscellaneous fields]
RTP header
› timestamp field (32 bits long): sampling instant of first byte in this
RTP data packet
– for audio, timestamp clock increments by one for each sampling period (e.g., each 125 µsec for 8 kHz sampling clock)
– if application generates chunks of 160 encoded samples (20 msec / 125 µsec = 160), timestamp increases by 160 for each RTP packet when source is active
– timestamp clock continues to increase at constant rate when source is inactive
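Because the timestamp clock ticks once per sampling period whether or not the source is active, a packet's timestamp depends only on elapsed time. A minimal sketch (the starting value and helper name are hypothetical):

```python
def rtp_timestamp(start, elapsed_ms, clock_hz=8000):
    """Timestamp after elapsed_ms, for a clock ticking once per sampling period."""
    return (start + elapsed_ms * clock_hz // 1000) % 2**32   # field is 32 bits wide


X = 1000  # hypothetical timestamp of the first packet
print(rtp_timestamp(X, 20) - X)  # 160: next packet in the same talk spurt
print(rtp_timestamp(X, 60) - X)  # 480: first packet after a silent gap, 60 msec after X
```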
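[Figure: RTP header (payload type, sequence number, time stamp, Synchronization Source ID (SSRC), miscellaneous fields) with a timestamp example: a packet stamped X is followed 20 ms (160 samples) later by X+160; a later packet is stamped X+480, since 480 * 125 µsec = 60 ms]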
RTP header
› sequence # + timestamp: together let receiver detect start of new talk spurt
› SSRC field (32 bits long): identifies source of RTP stream.
Each stream in RTP session has distinct SSRC
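The header fields named on these slides fit into a 12-byte fixed header. The sketch below packs them following the RFC 3550 layout; the version, padding, extension, CSRC count, and marker bits are what the slides group under "miscellaneous fields", and are set to simple defaults here as an assumption.

```python
import struct


def pack_rtp_header(payload_type, seq, timestamp, ssrc, marker=0):
    """Build the 12-byte fixed RTP header (no CSRC list, no extension)."""
    byte0 = 2 << 6                                  # version 2, P = 0, X = 0, CC = 0
    byte1 = (marker << 7) | (payload_type & 0x7F)   # marker bit + 7-bit payload type
    return struct.pack("!BBHII", byte0, byte1,
                       seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def unpack_rtp_header(header):
    """Decode the fields discussed on the slides from a packed header."""
    byte0, byte1, seq, timestamp, ssrc = struct.unpack("!BBHII", header[:12])
    return {"version": byte0 >> 6, "payload_type": byte1 & 0x7F,
            "seq": seq, "timestamp": timestamp, "ssrc": ssrc}


header = pack_rtp_header(payload_type=0, seq=7, timestamp=160, ssrc=0x1234)
print(len(header), unpack_rtp_header(header))
```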
SIP: Session Initiation Protocol [RFC 3261]
long-term vision:
› all telephone calls, video conference calls take place over Internet
› people identified by names or e-mail addresses, rather than by
phone numbers
› can reach callee (if callee so desires), no matter where callee
roams, no matter what IP device callee is currently using
SIP services
› SIP provides
mechanisms for call
setup:
– for caller to let callee
know she wants to
establish a call
– so caller, callee can
agree on media type,
encoding
– to end call
› determine current IP
address of callee:
– maps mnemonic identifier
to current IP address
› call management:
– add new media streams
during call
– change encoding during
call
– invite others
– transfer, hold calls
› Alice’s SIP invite message
indicates her port number, IP
address, encoding she prefers
to receive (PCM µlaw)
› Bob’s 200 OK message
indicates his port number, IP
address, preferred encoding
(GSM)
› SIP messages can be sent over TCP or UDP; here sent over UDP, with media carried over RTP/UDP
› Default SIP port # is 5060
› Actually, Bob and Alice talk simultaneously
› SIP is out-of-band
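Example: setting up call to known IP address
[Figure: message timeline between Alice (167.180.112.24) and Bob (193.64.210.89), time running downward]
– Alice to Bob, port 5060: INVITE .210.89
c=IN IP4 167.180.112.24
m=audio 38060 RTP/AVP 0
– Bob’s terminal rings
– Bob to Alice, port 5060: 200 OK
c=IN IP4 193.64.210.89
m=audio 48753 RTP/AVP 3
– Alice to Bob, port 5060: ACK
– media: µ-law audio sent to Alice’s port 38060; GSM audio sent to Bob’s port 48753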
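The `c=` and `m=` lines in the INVITE and 200 OK bodies tell each side where, and in which encoding, the other wants to receive media. A minimal sketch that extracts these values; it handles only the two line types shown in this example, not the full session description format.

```python
def parse_sdp(body):
    """Extract connection address, media port and RTP payload type from c= / m= lines."""
    info = {}
    for line in body.splitlines():
        line = line.strip()
        if line.startswith("c="):
            info["address"] = line.split()[2]        # c=IN IP4 <address>
        elif line.startswith("m="):
            media, port, proto, fmt = line[2:].split()[:4]
            info.update(media=media, port=int(port), proto=proto, payload_type=int(fmt))
    return info


alice = parse_sdp("c=IN IP4 167.180.112.24\nm=audio 38060 RTP/AVP 0")
print(alice)  # payload type 0 is PCM µ-law, received on port 38060
```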
Setting up a call (cont’d)
› codec negotiation:
– suppose Bob doesn’t have
PCM µlaw encoder
– Bob will instead respond with
606 Not Acceptable,
listing his encoders. Alice
can then send new
INVITE message,
advertising different
encoder
› rejecting a call
– Bob can reject with
replies “busy,” “gone,”
“payment required,”
“forbidden”
› media can be sent
over RTP or some
other protocol
Name translation, user location
› caller wants to call callee,
but only has callee’s
name or e-mail address.
› need to get IP address of
callee’s current host:
– user moves around
– DHCP protocol (dynamically
assign IP address)
– user has different IP devices
(PC, smartphone, car device)
› result can be based on:
– time of day (work, home)
– caller (don’t want boss to
call you at home)
– status of callee (calls sent to
voicemail when callee is
already talking to someone)
SIP registrar
› one function of SIP server: registrar
› when Bob starts SIP client, client sends SIP REGISTER message to Bob’s registrar server
register message:
REGISTER sip:domain.com SIP/2.0
Via: SIP/2.0/UDP 193.64.210.89
From: sip:
To: sip:
Expires: 3600
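Conceptually, a registrar keeps a table of bindings from user identifier to current IP address, each valid for the period given in the Expires header. The class below is a toy sketch of that idea only, not an implementation of the protocol; the class name, method names, and the key "bob" are hypothetical.

```python
import time


class Registrar:
    """Toy binding table: user identifier -> (IP address, expiry time)."""

    def __init__(self):
        self._bindings = {}

    def register(self, user, ip, expires_s=3600, now=None):
        """Record that user is reachable at ip for the next expires_s seconds."""
        now = time.time() if now is None else now
        self._bindings[user] = (ip, now + expires_s)

    def lookup(self, user, now=None):
        """Return the current IP address of user, or None if unknown or expired."""
        now = time.time() if now is None else now
        binding = self._bindings.get(user)
        if binding is None or binding[1] <= now:
            return None
        return binding[0]


registrar = Registrar()
registrar.register("bob", "193.64.210.89", expires_s=3600)
print(registrar.lookup("bob"))
```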
SIP proxy
› another function of SIP server: proxy
› Alice sends invite message to her proxy server
– contains address sip:
– proxy responsible for routing SIP messages to callee, possibly through
multiple proxies
› Bob sends response back through same set of SIP proxies
› proxy returns Bob’s SIP response message to Alice
– contains Bob’s IP address
› SIP proxy analogous to local DNS server
SIP example: calls
[Figure: Alice (128.119.40.186), UMass SIP proxy, Poly SIP registrar, Eurecom SIP registrar, Bob (197.87.54.21), with numbered message arrows 1-9]
1. Alice sends INVITE message to UMass SIP proxy.
2. UMass proxy forwards request to Poly registrar server
3. Poly server returns redirect response, indicating that it should try
4. UMass proxy forwards request to Eurecom registrar server
5. Eurecom registrar forwards INVITE to 197.87.54.21, which is running Bob’s SIP client
6-8. SIP response returned to Alice
9. Data flows between clients