Advanced Network Technologies Multimedia 1/2
Dr. Wei Bao | Lecturer, School of Computer Science
Multimedia
› Multimedia
› Streaming stored video › Voice-over-IP
› RTP/SIP
Multimedia
Multimedia networking: 3 application types
› streaming, stored audio, video
– streaming: can begin playout before downloading entire file
– stored (at server): can transmit faster than audio/video will be rendered (implies storing/buffering at client)
– e.g., YouTube, Netflix, Hulu
› conversational voice/video over IP
– interactive nature of human-to-human conversation limits delay tolerance
– e.g., Skype
› streaming live audio, video
– e.g., live sporting event
Multimedia audio
› analog audio signal sampled at constant rate
– telephone: 8,000 samples/sec
– CD music: 44,100 samples/sec
› each sample quantized, i.e., rounded
– e.g., 2^8 = 256 possible quantized values
– each quantized value represented by bits, e.g., 8 bits for 256 values
[figure: analog audio signal sampled at rate N samples/sec; each sample is rounded to the nearest quantized value, introducing quantization error]
Rate = 44,100 samples/sec × 8 bits/sample = 352,800 bps
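The bit-rate arithmetic above can be sketched in a few lines; the telephone and CD parameters are the ones given on the slide.

```python
# Digitized audio bit rate = sampling rate x bits per sample.

def audio_bit_rate(samples_per_sec: int, bits_per_sample: int) -> int:
    """Bit rate in bits per second for uncompressed PCM audio."""
    return samples_per_sec * bits_per_sample

print(audio_bit_rate(8_000, 8))    # telephone: 64,000 bps
print(audio_bit_rate(44_100, 8))   # CD at 8 bits/sample: 352,800 bps
```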
Video
Video: sequence of images displayed at constant rate, e.g., 24 images/sec
Each image: array of pixels; resolution e.g., 480×640; each pixel has 3 colors: red, green, blue (RGB)
Each color has 2^8 = 256 possible quantized values (8 bits). Data rate: 8 × 3 × 480 × 640 × 24 = 177 Mbps. Too large!
coding: use redundancy within and between images to decrease # bits used to encode image
spatial (within image)
temporal (from one image to next)
• examples:
• MPEG-1 (CD-ROM): 1.5 Mbps
• MPEG-2 (DVD): 3-6 Mbps
• MPEG-4 (often used in Internet): < 1 Mbps
MPEG: Moving Picture Experts Group
spatial coding example: instead of sending N values of the same color (all purple) within frame i, send only two values: the color value (purple) and the number of repeated values (N)
temporal coding example: instead of sending complete frame at i+1, send only the differences from frame i
Streaming Stored Video
Streaming stored video
[figure: frames 1, 2, 3, 4, ... recorded, sent, received, and played out over time]
1. video recorded (e.g., 30 frames/sec)
2. video sent
3. video received, after network delay (fixed in this example)
4. video played out
streaming: at this time, client is playing out an early part of the video, while the server is still sending a later part
Streaming stored video: challenges
› continuous playout constraint: once client playout begins, playback must match original timing
- ... but network delays are variable (jitter), so will need client-side buffer to match playout requirements
› other challenges:
- client interactivity: pause, fast-forward, jump through video
- video packets may be lost, retransmitted
Streaming stored video: revisited
[figure: constant bit rate video transmission; after variable network delay, client video reception; after a client playout delay, constant bit rate video playout at the client; the gap between reception and playout curves is buffered video]
› client-side buffering and playout delay: compensate for network-added delay, delay jitter
Streaming stored video: revisited
[figure: same timing diagram, but here a portion of the video arrives after its scheduled playout time and cannot be played on time: buffer underflow!]
Streaming stored video: revisited
[figure: same timing diagram with a larger client playout delay]
› increase playout delay: fewer buffer underflows
› initial playout delay tradeoff
Client-side buffering, playout
[figure: the video server fills the client application buffer (size B) at variable rate x(t); the client drains it at playout rate r (e.g., CBR); Q(t) is the buffer fill level]
Client-side buffering, playout
1. Initial fill of buffer until playout begins at tp
2. playout begins at tp
3. buffer fill level Q(t) varies over time as fill rate x(t) varies and playout rate r is constant
4. Q(t+1) = Q(t) + x(t), t ≤ tp; Q(t+1) = max[Q(t) + x(t) - r, 0], t > tp
5. Q(t) + x(t) - r < 0: buffer underflow
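The recurrence in step 4 can be simulated directly; the arrival pattern below is a made-up example, chosen so that a larger initial delay tp avoids the underflow.

```python
# Simulate the buffer recurrence:
#   Q(t+1) = Q(t) + x(t)              for t <= tp (filling, no playout yet)
#   Q(t+1) = max(Q(t) + x(t) - r, 0)  for t >  tp (filling and draining)
# An underflow occurs whenever Q(t) + x(t) - r < 0 after playout starts.

def simulate_buffer(x, r, tp):
    """x: per-slot fill amounts; r: playout rate; tp: slot playout begins."""
    q, underflows = 0.0, 0
    for t, fill in enumerate(x):
        if t <= tp:
            q = q + fill
        else:
            if q + fill - r < 0:
                underflows += 1
            q = max(q + fill - r, 0.0)
    return q, underflows

bursty = [1, 0, 0, 2, 2]                   # made-up per-slot arrivals
print(simulate_buffer(bursty, r=1, tp=0))  # immediate playout: one underflow
print(simulate_buffer(bursty, r=1, tp=2))  # longer initial delay: none
```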
Client-side buffering, playout
playout buffering: average fill rate E(x), playout rate r
› E(x) < r: buffer eventually empties (causing freezing of video playout until buffer fills again)
› E(x) ≥ r: buffer will not empty, provided initial playout delay is large enough to absorb variability in x(t)
- initial playout delay tradeoff: buffer starvation less likely with larger delay, but larger delay until user begins watching
Streaming multimedia: UDP
› server sends at rate appropriate for client
- often: send rate = encoding rate = constant rate
- transmission rate can be oblivious to congestion levels
› short playout delay (2-5 seconds) to remove network jitter
› error recovery: application-level, time-permitting
› RTP [RFC 3550]: multimedia payload types
› UDP may not go through firewalls
Streaming multimedia: HTTP
› multimedia file retrieved via HTTP GET
› send at maximum possible rate under TCP
[figure: video file at the server passes through the TCP send buffer, the network, the TCP receive buffer, and the client application playout buffer; the client-side fill rate x(t) is variable]
› fill rate fluctuates due to TCP congestion control, retransmissions (in-order delivery)
› larger playout delay: smooth TCP delivery rate
› HTTP/TCP passes more easily through firewalls
Streaming multimedia: DASH
› DASH: Dynamic, Adaptive Streaming over HTTP
› server:
- divides video file into multiple chunks
- each chunk stored, encoded at different rates
- manifest file: provides URLs for different chunks
› client:
- periodically measures server-to-client bandwidth
- consulting manifest, requests one chunk at a time
- chooses maximum coding rate sustainable given current bandwidth
- can choose different coding rates at different points in time (depending on current available bandwidth)
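The client's rate decision above can be sketched as follows; the manifest rates and measured bandwidths are made-up example numbers.

```python
# DASH rate selection sketch: from the rates listed in the manifest,
# pick the highest coding rate no greater than the measured bandwidth.

def choose_rate(available_rates_kbps, measured_bw_kbps):
    """Return the max sustainable rate, or the lowest one as a fallback."""
    sustainable = [r for r in sorted(available_rates_kbps)
                   if r <= measured_bw_kbps]
    return sustainable[-1] if sustainable else min(available_rates_kbps)

manifest_rates = [300, 750, 1500, 3000]        # kbps per encoding
assert choose_rate(manifest_rates, 2000) == 1500
assert choose_rate(manifest_rates, 100) == 300  # below all rates: pick lowest
```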
Streaming multimedia: DASH
[figure: the video is encoded at several quality levels (e.g., high and low), each divided into chunks 1..N; as the available bandwidth varies over time, the client requests successive chunks from different quality levels]
Streaming multimedia: DASH
› DASH: Dynamic, Adaptive Streaming over HTTP
› “intelligence” at client: client determines
- when to request chunk (so that buffer starvation does not occur)
- what encoding rate to request (higher quality when more bandwidth
available)
- where to request chunk (can request from URL server that is “close” to client or has high available bandwidth)
Content distribution network
› challenge: how to stream content (selected from millions of videos) to hundreds of thousands of simultaneous users?
› option 1: single, large “mega-server”
- single point of failure
- point of network congestion
- long path to distant clients
- multiple copies of video sent over outgoing link
....quite simply: this solution doesn’t scale
Content distribution network
› challenge: how to stream content (selected from millions of videos) to hundreds of thousands of simultaneous users?
› option 2: store/serve multiple copies of videos at multiple geographically distributed sites (CDN)
CDN: “simple” content access scenario
Bob (client) requests video http://netcinema.com/6Y7B23V; the video is stored in the CDN at http://KingCDN.com/NetC6y&B23V
0. the video is stored in the CDN
1. Bob gets the URL http://netcinema.com/6Y7B23V from the netcinema.com web page
2. Bob resolves http://netcinema.com/6Y7B23V via his local DNS
3. netcinema’s authoritative DNS returns the URL http://KingCDN.com/NetC6y&B23V
4&5. the local DNS resolves http://KingCDN.com/NetC6y&B23V via KingCDN’s authoritative DNS, which returns the IP address of the best KingCDN server holding the video
6. Bob requests the video from that KingCDN server, streamed via HTTP
CDN cluster selection strategy
› challenge: how does CDN DNS select “good” CDN node to stream to client
- pick CDN node geographically closest to client
- pick CDN node with shortest delay (or min # hops) to client (CDN nodes periodically ping access ISPs, reporting results to CDN DNS)
› alternative: let client decide
- give client a list of several CDN servers
- client pings servers, picks “best” - Netflix approach
Case study: Netflix
› 30% downstream US traffic in 2011
› owns very little infrastructure, uses 3rd party services:
- own registration, payment servers
- Amazon (3rd party) cloud services:
- create multiple versions of movie (different encodings) in Amazon cloud
- upload versions from cloud to CDNs
- Cloud hosts Netflix web pages for user browsing
- three 3rd party CDNs host/stream Netflix content: Akamai, Limelight, Level-3
Case study: Netflix
[figure: the Amazon cloud converts the master version into different formats and hosts the Netflix homepage; Netflix runs its own registration and accounting servers; copies of the multiple versions of each video are uploaded to the Akamai, Limelight, and Level-3 CDNs]
1. Bob manages his Netflix account (registration, accounting servers)
2. Bob browses Netflix video (Amazon cloud)
3. manifest file returned for requested video
4. DASH streaming from a CDN, selected via DNS
Voice over IP
Voice-over-IP (VoIP)
› VoIP end-end-delay requirement: needed to maintain “conversational” aspect
– higher delays noticeable, impair interactivity
– < 150 msec: good
– > 400 msec: bad
– includes application-level (playout), network delays
› session initialization: how does callee advertise IP address, port number, encoding algorithms?
› value-added services: call forwarding, screening, recording
› emergency services: 911/000
VoIP characteristics
› speaker’s audio: alternating talk spurts, silent periods
– 64 kbps during talk spurt
– chunks generated only during talk spurts
– 20 msec chunks at 8 Kbytes/sec: 160 bytes of data
› application-layer header added to each chunk
› chunk+header encapsulated into UDP or TCP segment
› application sends segment into socket every 20 msec during talkspurt
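The chunk-size arithmetic from the bullets above (64 kbps, 20 msec chunks) works out as:

```python
# 64 kbps voice generates 8,000 bytes/sec, so a 20 msec chunk is 160 bytes.

def chunk_bytes(bit_rate_bps: int, chunk_msec: int) -> int:
    bytes_per_sec = bit_rate_bps // 8       # bits -> bytes
    return bytes_per_sec * chunk_msec // 1000

assert chunk_bytes(64_000, 20) == 160       # one VoIP chunk per 20 msec
```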
VoIP: packet loss, delay
› network loss: IP datagram lost due to network congestion (router buffer overflow)
› delay loss: IP datagram arrives too late for playout at receiver
– delays: processing, queueing in network, transmission, propagation
– typical maximum tolerable delay: 400 ms
› loss tolerance: depending on voice encoding, loss concealment, packet loss rates between 1% and 10% can be tolerated
Delay jitter
[figure: packets sent at a constant rate arrive at the client after variable network delay (jitter); the client buffers data and adds a playout delay so the total delay is constant]
VoIP: fixed playout delay
› receiver attempts to playout each chunk exactly q msecs after chunk was generated.
– chunk has time stamp t: play out chunk at t+q
– chunk arrives after t+q: data arrives too late for playout: data “lost”
› tradeoff in choosing q:
– large q: less packet loss
– small q: better interactive experience
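The fixed-playout rule above reduces to a single comparison per chunk; timestamps and delays below are in milliseconds.

```python
# Fixed playout delay: a chunk stamped t is played at t + q; if it
# arrives later than that, it is discarded as lost.

def playout_decision(timestamp_ms, arrival_ms, q_ms):
    """Return ('play', playout_time) or ('lost', None)."""
    deadline = timestamp_ms + q_ms
    if arrival_ms <= deadline:
        return ("play", deadline)
    return ("lost", None)

assert playout_decision(0, 80, q_ms=100) == ("play", 100)   # on time
assert playout_decision(0, 120, q_ms=100) == ("lost", None) # too late
```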
VoIP: fixed playout delay
› sender generates packets every 20 msec during talk spurt
› first packet received at time r
› first playout schedule: begins at p
› second playout schedule: begins at p’
[figure: packets generated every 20 msec; received after variable delay; with playout schedule p - r one packet arrives too late and is lost; the later schedule p’ - r avoids the loss]
Adaptive playout delay
› goal: low playout delay, low late loss rate
[figure: client reception after variable network delay (jitter); too small a playout delay causes too many losses, too large adds unnecessary delay; the best delay lies in between]
Adaptive playout delay
› goal: low playout delay, low late loss rate
› approach: adaptive playout delay adjustment:
– estimate network delay, adjust playout delay at beginning of each talk spurt
– silent periods compressed and elongated
› adaptively estimate packet delay (EWMA, exponentially weighted moving average; recall TCP RTT estimate):
di = (1−α)di-1 + α (ri – ti)
– di: delay estimate after ith packet
– α: small constant, e.g., 0.01
– ri – ti: measured delay of ith packet (time received – timestamp)
Adaptive playout delay (cont’d)
› also useful to estimate average deviation of delay, vi:
vi = (1−β)vi-1 + β |ri – ti – di|
› estimates di, vi calculated for every received packet, but used only at start of talk spurt
› for first packet in talk spurt, playout time is: playout-timei = ti + di + Kvi
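The two EWMA updates and the playout-time formula can be sketched directly; the (sent, received) timestamp pairs below are made-up example values.

```python
# Adaptive playout estimates from the slides:
#   d_i = (1 - alpha) * d_{i-1} + alpha * (r_i - t_i)
#   v_i = (1 - beta)  * v_{i-1} + beta  * |r_i - t_i - d_i|
# playout time of the first chunk of a talk spurt: t_i + d_i + K * v_i

def update_estimates(d, v, t_i, r_i, alpha=0.01, beta=0.01):
    d = (1 - alpha) * d + alpha * (r_i - t_i)       # delay estimate
    v = (1 - beta) * v + beta * abs(r_i - t_i - d)  # deviation estimate
    return d, v

d, v = 0.0, 0.0
for t_i, r_i in [(0, 95), (20, 130), (40, 135)]:    # made-up (sent, recv) ms
    d, v = update_estimates(d, v, t_i, r_i)

K = 4
playout_time = 40 + d + K * v   # first packet of next spurt stamped t = 40
```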
Delay jitter
[figure: per-packet delays ri – ti measured during talk spurt 1 determine di + Kvi; the playout delay is adjusted to di + Kvi at the start of talk spurt 2]
Adaptive playout delay (cont’d)
Q: How does receiver determine whether packet is first in a talk spurt?
› if no loss, receiver looks at successive timestamps
– difference of successive stamps > 20 msec ⇒ talk spurt begins
› with loss possible, receiver must look at both timestamps and sequence numbers
– difference of successive stamps > 20 msec and sequence numbers without gaps ⇒ talk spurt begins
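The loss-aware rule above is one boolean check per packet; the 20 msec period matches the chunk interval from the slides.

```python
# New talk spurt: timestamp gap exceeds 20 msec while sequence numbers
# are consecutive (so the gap cannot be explained by packet loss).

def starts_new_spurt(prev_seq, prev_ts, seq, ts, period_ms=20):
    no_seq_gap = (seq == prev_seq + 1)
    big_ts_gap = (ts - prev_ts) > period_ms
    return no_seq_gap and big_ts_gap

assert starts_new_spurt(3, 60, 4, 200)        # silence, no loss: new spurt
assert not starts_new_spurt(3, 60, 5, 200)    # seq gap: could be loss
assert not starts_new_spurt(3, 60, 4, 80)     # next chunk in same spurt
```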
Adaptive playout delay (cont’d)
20ms
20ms
20ms
20ms
20ms
0
1
20ms
0
Spurt 1
20 40
Spurt 1 3
20ms
40
Spurt 2
100 120
4 Spurt2 5
100 120
20ms
20ms
VoIP: recovery from packet loss
Challenge: recover from packet loss given small tolerable delay between original transmission and playout
› each ACK/NAK takes ~ one RTT
› alternative: Forward Error Correction (FEC)
– send enough bits to allow recovery without retransmission
simple FEC:
› for every group of n chunks, create redundant chunk by exclusive OR-ing n original chunks
› send n+1 chunks, increasing bandwidth by factor 1/n
› can reconstruct original n chunks if at most one chunk lost from the n+1 chunks
› send x1, x2, x3, …, xn, and y = x1 XOR x2 XOR x3 … XOR xn
› if x3 is lost, can re-compute x3 from x1, x2, x4, …, xn, and y
– e.g., with bits 1, 0, 1, 0: y = 0; if the third bit is lost, y XOR 1 XOR 0 XOR 0 = 1 recovers it
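The XOR parity scheme can be sketched on toy one-byte chunks (the chunk contents below are made-up):

```python
# XOR-based FEC: send n chunks plus parity y = x1 ^ x2 ^ ... ^ xn;
# any single lost chunk equals the XOR of y with the surviving chunks.

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(p ^ q for p, q in zip(a, b))

def make_parity(chunks):
    y = chunks[0]
    for c in chunks[1:]:
        y = xor_bytes(y, c)
    return y

chunks = [b"\x01", b"\x00", b"\x01", b"\x00"]   # x1..x4, toy 1-byte chunks
y = make_parity(chunks)

# Suppose x3 is lost: recover it from the survivors and y.
survivors = [chunks[0], chunks[1], chunks[3]]
recovered = make_parity(survivors + [y])
assert recovered == chunks[2]
```

The same `make_parity` function serves both encoding and recovery, because XOR is its own inverse.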
VoIP: recovery from packet loss (cont’d)
another FEC scheme:
• “piggyback lower quality stream”: send lower-resolution audio stream as redundant information
• e.g., nominal stream at 64 kbps and redundant stream at 13 kbps
VoIP: recovery from packet loss (cont’d)
[figure: a sequence of 16 audio units, four 5 msec units per 20 msec chunk]
interleaving to conceal loss:
› audio chunks divided into smaller units, e.g. four 5 msec units per 20 msec audio chunk
› packet contains small units from different chunks
› if packet lost, still have most of every original chunk
› no redundancy overhead, but worse delay performance
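Interleaving as described above can be sketched with 16 units from four chunks (unit numbering follows the slide's example):

```python
# Interleaving: packet k carries the k-th unit of every chunk, so one
# lost packet removes only a single unit from each original chunk.

def interleave(units, chunk_size=4):
    chunks = [units[i:i + chunk_size]
              for i in range(0, len(units), chunk_size)]
    return [[c[k] for c in chunks] for k in range(chunk_size)]

units = list(range(1, 17))          # units 1..16, four per 20 msec chunk
packets = interleave(units)
assert packets[0] == [1, 5, 9, 13]  # one unit from each chunk
assert packets[3] == [4, 8, 12, 16]
# Losing packets[0] costs only one 5 msec unit from each 20 msec chunk.
```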
VoIP: recovery from packet loss (cont’d)
[figure: without interleaving, each packet carries consecutive units (1-4, 5-8, 9-12, 13-16 = words 1-4), so one lost packet removes a whole word; with interleaving, each packet carries one unit from each word (e.g., 1, 5, 9, 13), so one lost packet removes only a single 5 msec unit from each word: e.g., a syllable missing, acceptable; subjective quality is improved]
Real-time Conversational Applications
Real-Time Protocol (RTP)
› RTP specifies packet structure for packets carrying audio, video data
› RFC 3550
› RTP packet provides
– payload type identification
– packet sequence numbering
– time stamping
› RTP runs in end systems
› RTP packets encapsulated in UDP segments
› interoperability: if two VoIP applications run RTP, they may be able to work together
RTP runs on top of UDP
RTP libraries provide transport-layer interface that extends UDP:
• port numbers, IP addresses (already existing)
• payload type identification
• packet sequence numbering
• time-stamping
RTP example
example: sending 64 kbps PCM μ-law encoded voice over RTP
› PCM (pulse-code modulation): a method used to digitally represent sampled analog signals; μ-law is a special quantization; sample rate 8,000 samples/second, quantization 8 bits/sample
› RTP header indicates type of audio/video encoding in each packet
– sender can change encoding during conference
› application collects encoded data in chunks, e.g., every 20 msec = 160 bytes in a chunk
› audio chunk + RTP header form RTP packet, which is encapsulated in UDP segment
› RTP header also contains sequence numbers, timestamps
RTP and QoS
› RTP does not provide any mechanism to ensure timely data delivery or other QoS guarantees
› RTP encapsulation only seen at end systems (not by intermediate routers)
– routers provide best-effort service, making no special effort to ensure that RTP packets arrive at destination in timely manner
RTP header
[RTP header fields: payload type | sequence number | time stamp | Synchronization Source ID (SSRC) | miscellaneous fields]
• payload type (7 bits): indicates type of encoding currently being used. If sender changes encoding during call, sender informs receiver via payload type field
– payload type 0: PCM μ-law, 64 kbps
– payload type 3: GSM, 13 kbps
– payload type 7: LPC, 2.4 kbps
– payload type 26: Motion JPEG
– payload type 31: H.261
– payload type 33: MPEG-2 video
• sequence # (16 bits): increments by one for each RTP packet sent
– used to detect packet loss and restore packet sequence
RTP header
› timestamp field (32 bits long): sampling instant of first byte in this RTP data packet
– for audio, timestamp clock increments by one for each sampling period (e.g., each 125 usecs for 8 KHz sampling clock)
– if application generates chunks of 160 encoded samples (20ms), 20ms/125us=160
– timestamp increases by 160 for each RTP packet when source is active. Timestamp clock continues to increase at constant rate when source is inactive.
[figure: successive RTP packets stamped X, X+160, X+480; 160 samples correspond to 20 ms of audio, and the span from X to X+480 is 480 × 125 μs = 60 ms, covering an inactive (silent) period]
RTP header
› sequence # + timestamp together let the receiver detect new talk spurts
› SSRC field (32 bits long): identifies source of RTP stream. Each stream in RTP session has distinct SSRC
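The fields described above make up the 12-byte RTP fixed header (RFC 3550); a minimal sketch of packing it, ignoring padding, extension, CSRC, and marker bits:

```python
import struct

# Pack a minimal RTP fixed header: version 2, then payload type,
# 16-bit sequence number, 32-bit timestamp, and 32-bit SSRC.

def pack_rtp_header(payload_type, seq, timestamp, ssrc):
    byte0 = 2 << 6                  # version=2, padding=0, extension=0, CC=0
    byte1 = payload_type & 0x7F     # marker=0, 7-bit payload type
    return struct.pack("!BBHII", byte0, byte1, seq, timestamp, ssrc)

hdr = pack_rtp_header(payload_type=0, seq=1, timestamp=160, ssrc=0x1234)
assert len(hdr) == 12               # RTP fixed header is 12 bytes
assert hdr[1] & 0x7F == 0           # payload type 0 = PCM mu-law
```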
SIP: Session Initiation Protocol [RFC 3261]
long-term vision:
› all telephone calls, video conference calls take place over Internet
› people identified by names or e-mail addresses, rather than by phone numbers
› can reach callee (if callee so desires), no matter where callee roams, no matter what IP device callee is currently using
SIP services
› SIP provides mechanisms for call setup:
– for caller to let callee know she wants to establish a call
– so caller, callee can agree on media type, encoding
– to end call
› determine current IP address of callee:
– maps mnemonic identifier to current IP address
› call management:
– add new media streams during call
– change encoding during call
– invite others
– transfer, hold calls
Example: setting up call to known IP address
› Alice’s SIP invite message indicates her port number, IP address, and the encoding she prefers to receive (PCM μ-law)
› Bob’s 200 OK message indicates his port number, IP address, and preferred encoding (GSM)
› SIP messages can be sent over TCP or UDP; here sent over RTP/UDP
› default SIP port # is 5060
› note that Bob and Alice talk simultaneously
› SIP is out-of-band
Setting up a call (cont’d)
› codec negotiation:
– suppose Bob doesn’t have PCM μ-law encoder
– Bob will instead reply with 606 Not Acceptable Reply, listing his encoders. Alice can then send new INVITE message, advertising different encoder
› rejecting a call
– Bob can reject with replies “busy,” “gone,” “payment required,” “forbidden”
› media can be sent over RTP or some other protocol
Name translation, user location
› caller wants to call callee, but only has callee’s name or e-mail address
› need to get IP address of callee’s current host:
– user moves around
– DHCP protocol (dynamically assigns IP address)
– user has different IP devices (PC, smartphone, car device)
› result can be based on:
– time of day (work, home)
– caller (don’t want boss to call you at home)
– status of callee (calls sent to voicemail when callee is already talking to someone)
SIP registrar
one function of SIP server: registrar
when Bob starts SIP client, client sends SIP REGISTER message to Bob’s registrar server
register message:
REGISTER sip:domain.com SIP/2.0
Via: SIP/2.0/UDP 193.64.210.89
From: sip:bob@domain.com
To: sip:bob@domain.com
Expires: 3600
SIP proxy
› another function of SIP server: proxy
› Alice sends invite message to her proxy server
– contains address sip:bob@domain.com
– proxy responsible for routing SIP messages to callee, possibly through multiple proxies
› Bob sends response back through same set of SIP proxies
› proxy returns Bob’s SIP response message to Alice
– contains Bob’s IP address
› SIP proxy analogous to local DNS server
SIP example: alice@umass.edu calls bob@poly.edu
1. Alice (128.119.40.186) sends INVITE message to UMass SIP proxy
2. UMass proxy forwards request to Poly SIP registrar server
3. Poly server returns redirect response, indicating that it should try bob@eurecom.fr
4. UMass proxy forwards request to Eurecom registrar server
5. Eurecom registrar forwards INVITE to 197.87.54.21, which is running Bob’s SIP client
6-8. SIP response returned to Alice through the same proxies
9. data flows between clients (Alice at 128.119.40.186, Bob at 197.87.54.21)