
Design and Implementation of Next Generation Video Coding Systems
(H.265/HEVC Tutorial)

Vivienne Sze (sze@mit.edu)
Madhukar Budagavi (m.budagavi@samsung.com)

ISCAS Tutorial 2014
 

Instructors

• Vivienne Sze (Assistant Professor at MIT)
– Involved with video implementation research and standards for 7+ years
– Contributed over 70 technical documents to HEVC
– Within the JCT-VC Committee, primary coordinator of the core experiments on coefficient scanning and coding; chairman of ad hoc groups on topics related to entropy coding and parallel processing
– Published over 25 journal and conference papers

• Madhukar Budagavi (Research Director at Samsung Research America)
– Involved with video standards and product development for 15+ years
– Contributed over 100 technical documents to HEVC
– Within the JCT-VC Committee, chaired and co-chaired sub-group activities on spatial transforms, quantization, entropy coding, in-loop filtering, intra prediction, screen content coding and scalable HEVC (SHVC)
– Published over 40 journal and conference papers and book chapters
 

Outline of Tutorial

• Part I: Overview of current video coding technology and systems
• Part II: High Efficiency Video Coding (HEVC)
• Part III: Video Codec Implementations
• Part IV: Emerging Applications and HEVC Extensions
 

Part I: Overview of current video coding technology and systems
 
 

Growing Demand for Video

• Video exceeds half of internet traffic and will grow to 86 percent by 2016. Increases in applications, content, fidelity, etc. → Need higher coding efficiency!
• Ultra-HD 4K broadcast expected for Japan in 2014. London Olympics Opening and Closing Ceremonies shot in Ultra-HD 8K. → Need higher throughput!
• 25x increase in mobile data traffic over the next five years. Video is a “must have” on portable devices. → Need lower power!

Sources: Cisco Visual Networking Index; Cisco Visual Networking Index: Global Mobile Data Traffic Forecast Update
 

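Digital Video

[Figure: a video is a sequence of frames (0, 1, 2, 3); each frame = Y + Cb + Cr. With 4:2:0 sampling, the Y plane is H × W and each of the Cb and Cr planes is H/2 × W/2.]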
 

Video Compression

• Uncompressed 1080p high definition (HD) video at 24 frames/second
– Pixels per frame: 1920×1080
– Bits per pixel: 8 bits × 3 (RGB)
– 1.5 hours: 806 GB
– Bit-rate: 1.2 Gbits/s
• Blu-ray Disc
– Capacity: 25 GB (single layer)
– Read rate: 36 Mbits/s
• Video Streaming or TV Broadcast
– 1 Mbits/s to 20 Mbits/s
• Requires 30x to 1200x compression
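The numbers on this slide follow from simple arithmetic. The minimal sketch below assumes 24 bits per pixel (8-bit RGB) and decimal units (1 GB = 10^9 bytes). The function names are illustrative only.

```python
def raw_bitrate_bps(width, height, bits_per_pixel, fps):
    """Uncompressed bit-rate in bits per second."""
    return width * height * bits_per_pixel * fps


def storage_gb(bitrate_bps, seconds):
    """Storage needed in gigabytes, using decimal units (1 GB = 10**9 bytes)."""
    return bitrate_bps * seconds / 8 / 1e9


if __name__ == "__main__":
    rate = raw_bitrate_bps(1920, 1080, 8 * 3, 24)  # 8 bits x 3 (RGB) per pixel, 24 frames/s
    print(f"bit-rate: {rate / 1e9:.2f} Gbit/s")
    print(f"1.5 hours: {storage_gb(rate, 1.5 * 3600):.0f} GB")
    # Compression ratio needed to fit each channel rate quoted on the slide
    for name, channel_bps in [("Blu-ray read rate", 36e6),
                              ("broadcast, high end", 20e6),
                              ("streaming, low end", 1e6)]:
        print(f"{name}: {rate / channel_bps:.0f}x")
```

The output matches the slide's figures of about 1.2 Gbit/s and 806 GB. The required compression ranges from roughly 33x (Blu-ray) to roughly 1200x (1 Mbit/s streaming).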
 

Video Compression Basics

• Compression is achieved by removing redundant information from the video sequence
• Types of redundancies in video sequences
– Spatial redundancy
– Perceptual redundancy
– Statistical redundancy
– Temporal redundancy
 

Spatial Redundancy Removal (1)

• Intra prediction

[Figure: in frame 0, the current block to be coded is horizontally predicted from the previous (neighboring) block; the difference between the current block and the intra-predicted block is encoded]
 

Spatial Redundancy Removal (2)

• Block Transforms
– Typically matrix operations
– Used for correlation reduction and energy compaction in the block

8×8 2D Discrete Cosine Transform (DCT)

Pixel block:
151  149  145  140  136  133  128  120
150  147  144  140  136  132  127  118
149  145  142  138  135  129  122  116
147  143  139  136  131  126  120  113
141  139  137  132  127  124  116  109
138  135  133  130  125  120  113  106
135  131  130  128  123  117  111  105
132  130  129  126  120  115  109  105

DCT coefficients:
1037   80    0    9    0    4    0    0
  49    1    3    3    0    0    0    1
   0    0    1    0    0    0    0    0
   0    0    0    1    0    0    0    0
   0    0    1    0    0    0    0    0
   1    1    1    1    2    0    0    0
   0    1    0    0    0    0    0    0
   0    0    0    0    0    0    1    0
 

Perceptual Redundancy Removal (1)

• Not all video data are equally significant from a perceptual point of view
• Make use of the properties of the Human Visual System (HVS)
– HVS is more sensitive to low frequency information

[Figure: low frequency vs. high frequency content]
 

Perceptual Redundancy Removal (2)

• Quantization is a good tool for perceptual redundancy removal
– Most significant bits (MSBs) are perceptually more important than least significant bits (LSBs)
– Coefficient dropping (quantization with zero bits) example:

[Figure: original frame vs. image obtained by retaining 36 DCT coefficients for each 8×8 block]
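The slide does not say which 36 coefficients were kept. One natural reading is the upper-left triangle u + v < 8, which holds exactly 1 + 2 + … + 8 = 36 low-frequency coefficients; treat that mask as an assumption. The sketch also shows a plain uniform scalar quantizer with an arbitrary step size. This is only an illustration, not HEVC's QP-based scaling.

```python
import math


def quantize(coeff: float, step: float) -> int:
    """Uniform scalar quantizer: round |coeff| / step half away from zero, keep the sign."""
    return int(math.copysign(math.floor(abs(coeff) / step + 0.5), coeff))


def dequantize(level: int, step: float) -> float:
    """Reconstruction: the decoder only knows the level index and the step size."""
    return level * step


def low_frequency_mask(n: int = 8, keep_diagonals: int = 8):
    """True where u + v < keep_diagonals (assumed choice of retained coefficients)."""
    return [[u + v < keep_diagonals for v in range(n)] for u in range(n)]


def drop_coefficients(coeffs, mask):
    """Coefficient dropping: zero every coefficient outside the mask."""
    return [[c if keep else 0 for c, keep in zip(row, mrow)]
            for row, mrow in zip(coeffs, mask)]


if __name__ == "__main__":
    first_row = [1037, 80, 0, 9, 0, 4, 0, 0]  # first row of the DCT table above
    levels = [quantize(c, 16) for c in first_row]
    print(levels, [dequantize(q, 16) for q in levels])
    print(sum(sum(r) for r in low_frequency_mask()), "coefficients kept per 8x8 block")
```

Small coefficients such as 4 quantize to zero. This is where most of the bit savings come from.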
 

Statistical Redundancy Removal (1)

• Not all pixel values in an image (or in the transformed image) occur with equal probability
• Use entropy coding (e.g. variable length coding)
– Shorter codewords used to represent more frequent values
– Longer codewords used to represent less frequent values
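Huffman coding is a classic way to assign shorter codewords to more frequent values. It is used here only to illustrate variable length coding; HEVC itself uses CABAC, described later. This sketch builds an optimal prefix code with Python's `heapq` and returns each symbol's codeword length.

```python
import heapq
from itertools import count


def huffman_code_lengths(freqs: dict) -> dict:
    """Codeword length per symbol for a Huffman (optimal prefix) code."""
    if len(freqs) == 1:
        return {s: 1 for s in freqs}
    tie = count()  # tie-breaker so heapq never has to compare dicts
    heap = [(f, next(tie), {s: 0}) for s, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, a = heapq.heappop(heap)  # two least frequent subtrees
        f2, _, b = heapq.heappop(heap)
        # Merging adds one bit to the codeword of every symbol in both subtrees
        merged = {s: depth + 1 for s, depth in {**a, **b}.items()}
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


def average_length(freqs: dict, lengths: dict) -> float:
    total = sum(freqs.values())
    return sum(freqs[s] * lengths[s] for s in freqs) / total


if __name__ == "__main__":
    freqs = {"a": 50, "b": 25, "c": 12, "d": 13}
    lengths = huffman_code_lengths(freqs)
    print(lengths, average_length(freqs, lengths))
```

With these example frequencies, the frequent symbol "a" gets 1 bit and the rare symbols get 3 bits. The result is 1.75 bits per symbol instead of the 2 bits of a fixed-length code.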
 

Statistical Redundancy Removal (2)

• Original image: 8 bits/pixel; entropy coding: 7.14 bits/pixel
• Results are more dramatic when entropy coding is applied to the transformed and quantized image: 1.82 bits/pixel

[Figure: histogram of the original image (values 0 to 255) and histogram of the transformed and quantized image (values about −500 to 2000)]
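One way to estimate bits-per-pixel figures like those above is the zeroth-order empirical entropy of the value histogram. This entropy is a lower bound for any code that maps each value independently to a codeword. The sketch below assumes a memoryless source, and the sample data is made up for illustration.

```python
import math
from collections import Counter


def entropy_bits(samples):
    """Zeroth-order empirical entropy (bits/symbol) of a sequence of values."""
    counts = Counter(samples)
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


if __name__ == "__main__":
    flat = list(range(256)) * 4                  # every 8-bit value equally likely
    peaked = [0] * 900 + [1] * 50 + [-1] * 50    # mostly zeros, like quantized coefficients
    print(f"flat histogram:   {entropy_bits(flat):.2f} bits/pixel")
    print(f"peaked histogram: {entropy_bits(peaked):.2f} bits/pixel")
```

A flat histogram needs the full 8 bits. A histogram concentrated around zero, as after transform and quantization, needs well under 1 bit per value.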
 

Temporal Redundancy Removal (1)

• Inter prediction
• Frame difference coding
– Difference can be encoded using DCT + Quantization + Entropy Coding

[Figure: Frame 3, Frame 4, and the difference Frame 4 − Frame 3]
 

Temporal Redundancy Removal (2)

• Inter prediction using motion compensated prediction
– Divide the frame into blocks and apply block motion estimation/compensation
– For each block, find the relative motion between the current block and a matching block of the same size in the previous frame
– Transmit the motion vector(s) for each block

[Figure: a block in frame t matched to a block in frame t−1; the displacement between them is the motion vector (mv)]
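The block matching described above can be sketched as an exhaustive (full) search that minimizes the sum of absolute differences (SAD). The search range, the SAD cost and the test frames are illustrative assumptions. Real encoders use faster searches and rate-aware costs, and the standard does not prescribe any search method.

```python
import random


def sad(cur, ref, bx, by, dx, dy, n):
    """SAD between the n x n block at (bx, by) in cur and the block displaced by (dx, dy) in ref."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(n)
        for x in range(n)
    )


def full_search(cur, ref, bx, by, n, search_range):
    """Exhaustive block matching; returns (motion_vector, sad) with mv = (dx, dy)."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            top, left = by + dy, bx + dx
            if top < 0 or left < 0 or top + n > height or left + n > width:
                continue  # candidate block falls outside the reference frame
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost


if __name__ == "__main__":
    rng = random.Random(0)
    size = 24
    prev = [[rng.randint(0, 255) for _ in range(size)] for _ in range(size)]
    # Current frame: the content of prev moved 2 pixels left and 1 pixel down
    cur = [[prev[y - 1][x + 2] if y >= 1 and x + 2 < size else 0
            for x in range(size)] for y in range(size)]
    print(full_search(cur, prev, bx=8, by=8, n=4, search_range=4))
```

In this sketch the motion vector points from the current block to its match in the previous frame. Only the vector and the (here zero) residual would need to be transmitted.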
 

Temporal Prediction and Picture Coding Types

• Intra picture (I)
– Picture is coded without reference to other pictures
• Inter picture (P, B, b)
– Uni-directionally predicted (P) picture
• Picture is predicted from one previously coded picture
– Bi-directionally predicted (B, b) picture
• Picture is predicted from two previously coded pictures

[Figure: example sequence of I, B, b and P pictures]
 

Summary of Key Steps in Video Coding

• Intra Prediction and Inter Prediction
[Figure: inter prediction (motion compensation) uses a motion vector between the previous and current frames; intra prediction uses a prediction mode]
• Transform and Quantization of residual (prediction error)
[Figure: transform and quantization turn many pixels* into few coefficients]
• Entropy coding on syntax elements
e.g. prediction modes, motion vectors, coefficients
• In-loop filtering to reduce coding artifacts

* Residual figure from J. Apostolopoulos, “Video Compression,” MIT 6.344 Lecture, Spring 2004
 

Video Compression Standards

• Ensures inter-operability between encoder and decoder
• Support multiple use cases and applications
– Levels and Profiles
• Video coding standard specifies decoder: mapping of bits to pixels
• ~2x improvement in compression every decade

[Figure: Source → Pre-Processing → Encoding → Decoding → Post-Processing → Destination, with the scope of the standard marked]
[Figure: bit-rate of MPEG-2 (1994), H.264/AVC (2003) and HEVC (2013)]
 

History of Video Coding Standards

• MPEG: Moving Picture Experts Group (ISO/IEC)
• VCEG: Video Coding Experts Group (ITU-T)
• Other standards: VC-1, VP8/VP9, China AVS, RealVideo

[Figure: timeline 1984 to 2004. VCEG: H.261, H.263, H.263+, H.263++. MPEG: MPEG-1, MPEG-4. Joint MPEG/VCEG: MPEG-2/H.262, H.264/MPEG-4 Part 10-AVC]
 

Video Coding Progress

Source: T. Wiegand, JVT-W132, 2007
 

H.264/MPEG-4 AVC

• Completed (version 1) in May 2003
• H.264/AVC is the most popular video standard in the market
– 80% of video on the internet is encoded with H.264/AVC
• Applications include
– HDTV broadcast: satellite, cable, and terrestrial
– Video content acquisition and editing
– Camcorders, security applications, Internet and mobile network video, Blu-ray Discs
– Real-time video chat, video conferencing, and telepresence
• ~50% higher coding efficiency than MPEG-2 (used in DVD, US terrestrial broadcast)
 

Improvements of H.264/MPEG-4 AVC over previous standards

• Prediction
– Intra prediction using neighboring samples
– Temporal prediction using multiple frames
– Motion compensation on variable block size, quarter-pel
• Transform
– 4×4/8×8 integer transform, 2×2/4×4 secondary Hadamard
• Quantization
– Finer quantization supported
• Entropy coding
– Context adaptive variable length coding (CAVLC) and context adaptive binary arithmetic coding (CABAC)
• In-loop deblocking filter
 

Part II: High Efficiency Video Coding (HEVC)
 
 

High Efficiency Video Coding (HEVC)

• Achieves 2x higher compression compared to H.264/AVC
• High throughput (Ultra-HD 8K @ 120fps) & low power
– Implementation friendly features (e.g. built-in parallelism)
• Benefits include
– Reduce the burden on global networks
– Easier streaming of HD video to mobile devices
– Account for advancing screen resolutions (e.g. Ultra-HD)

“HEVC will provide a flexible, reliable and robust solution, future-proofed to support the next decade of video”
– ITU-T Press Release (2013)

[Figure: Samsung Galaxy S4; live delivery of the French Open; Netflix Ultra-HD 4K; Samsung TV Ultra-HD 4K]
 

Activity in JCT-VC Committee

• Chairs
– G. J. Sullivan (Microsoft)
– J. R. Ohm (Aachen University)
• Meet quarterly
– 1st meeting (A) [April 2010]
…
– 12th meeting (L) [January 2013]
• ~250 attendees per meeting representing ~70 companies
• Several hundred contributions per meeting
• Each meeting is around 9–10 days (14+ hours/day)
• Multiple parallel tracks

[Figure: bar chart of attendees and contributions for meetings A to J (y-axis 0 to 1200)]
 

HEVC Reference Documents

• Meeting Contributions
– http://phenix.int-evry.fr/jct/
• Specification
– http://www.itu.int/ITU-T/recommendations/rec.aspx?rec=11885
• Reference Software (HM)
– https://hevc.hhi.fraunhofer.de/svn/svn_HEVCSoftware/
 

• References
– G. J. Sullivan et al., “Overview of the High Efficiency Video Coding (HEVC) standard,” IEEE Transactions on Circuits and Systems for Video Technology, 2012
– V. Sze, M. Budagavi, G. J. Sullivan (Editors), “High Efficiency Video Coding (HEVC): Algorithms and Architectures,” Springer, 2014
http://www.springer.com/engineering/signals/book/978-3-319-06894-7
 

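Coding Efficiency of HEVC (Objective)

J. R. Ohm et al., “Comparison of the Coding Efficiency of Video Coding Standards—Including High Efficiency Video Coding (HEVC),” IEEE Transactions on Circuits and Systems for Video Technology, 2012

$$\mathrm{PSNR} = 10\log_{10}\frac{(2^{\mathrm{bitdepth}}-1)^2 \cdot W \cdot H}{\sum_i (O_i - D_i)^2}$$

where O_i and D_i are the i-th original and decoded samples, W × H is the picture size, and the sum runs over all W·H samples.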
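The PSNR formula above translates directly into code. This sketch takes two equally sized 2-D sample arrays (lists of rows) for a single colour component. It returns infinity when the pictures are identical, a convention chosen here because the formula divides by zero in that case.

```python
import math


def psnr(original, decoded, bit_depth=8):
    """PSNR in dB: 10*log10((2**bitdepth - 1)**2 * W * H / sum_i (O_i - D_i)**2)."""
    height, width = len(original), len(original[0])
    sse = sum((o - d) ** 2
              for row_o, row_d in zip(original, decoded)
              for o, d in zip(row_o, row_d))
    if sse == 0:
        return math.inf  # identical pictures
    peak = (2 ** bit_depth - 1) ** 2
    return 10 * math.log10(peak * width * height / sse)


if __name__ == "__main__":
    orig = [[100, 100], [100, 100]]
    dec = [[101, 99], [99, 101]]  # every sample off by one
    print(f"{psnr(orig, dec):.2f} dB")
```

Writing W·H / Σ as the reciprocal of the mean squared error shows that this is the usual peak-to-MSE ratio in decibels.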

Coding Efficiency of HEVC (Subjective)

Subjective Tests for Entertainment Applications (Random Access)

Sequence            Bit-rate Savings
BQ Terrace          63.1%
Basketball Drive    66.6%
Kimono1             55.2%
Park Scene          49.7%
Cactus              50.2%
BQ Mall             41.6%
Basketball Drill    44.9%
Party Scene         29.8%
Race Horse          42.7%
Average             49.3%

J. Ohm et al., “Comparison of the Coding Efficiency of Video Coding Standards—Including High Efficiency Video Coding (HEVC),” IEEE Transactions on Circuits and Systems for Video Technology, 2012
 

H.265/HEVC vs. H.264/AVC Decoder

[Figure: decoder block diagram. Encoded bitstream → Entropy Decoder → Q⁻¹ + T⁻¹ → + (prediction from Intra Prediction or Motion Comp., which reads the Picture Buffer) → In-loop Filter (Deblocking Filter, Sample Adaptive Offset) → Decoded pixels. HEVC changes highlighted: High Throughput CABAC; Advanced Motion Vector Prediction; Larger Transforms and More Sizes; More Prediction Modes; Larger Interpolation Filter; Fewer Edges (deblocking); Sample Adaptive Offset; Larger and Flexible Coding Block Size (64×64)]
 

Key Features In HEVC

• High Coding Efficiency
– Larger and Flexible Coding Block Size
– More Sophisticated Intra Prediction
– Larger Interpolation Filter for Motion Compensation
– Larger Transform Size
– Sample Adaptive Offset
– High Throughput CABAC
• High Throughput / Low Power
– Parallel Deblocking Filter
– High Throughput CABAC
– High Level Parallel Tools
– Parallel Merge/Skip

M. Zhou, V. Sze, M. Budagavi, “Parallel Tools in HEVC for High-Throughput Processing,” SPIE Optical Engineering + Applications, Applications of Image Processing XXXV, 2012.
 

Larger Coding Blocks

• Each frame is broken up into blocks
• Large block sizes reduce signaling overhead
• In H.264/AVC, a macroblock is always 16×16 pixels
– Each macroblock is either inter or intra coded
• In HEVC, a Coding Tree Unit (CTU) can have up to 64×64 pixels
– A CTU can have a combination of inter and intra coded blocks

[Figure: N×N CTU, N = 16, 32, or 64]
 

Flexible Coding Block Structure

• Better adaptation to different video content
• CTU divided into Coding Units (CU) with a quad tree
• Coding units divided into prediction units (PU)
• PUs have different motion data or prediction modes

[Figure: a Coding Tree Unit (CTU) and its coding tree composed of Coding Units (CU), including skipped CUs; Prediction Units (PU), including an asymmetric motion partition]
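The CTU-to-CU quad tree can be sketched as a recursion. Each block either becomes a CU or splits into four equal quadrants, visited in z-scan order. The `should_split` callback is a hypothetical stand-in for the decision that an encoder makes (typically by rate-distortion search) and signals with `split_cu_flag`. The 8×8 minimum CU size is assumed here.

```python
def split_ctu(x, y, size, should_split, min_size=8):
    """Recursively partition a square block; returns CUs as (x, y, size) in z-scan order."""
    if size > min_size and should_split(x, y, size):
        half = size // 2
        cus = []
        for dy in (0, half):        # top pair of quadrants first,
            for dx in (0, half):    # left before right (z-scan order)
                cus.extend(split_ctu(x + dx, y + dy, half, should_split, min_size))
        return cus
    return [(x, y, size)]


def example_decision(x, y, size):
    """Hypothetical rule: split the 64x64 CTU, then split its top-left 32x32 once more."""
    return size > 32 or (x < 32 and y < 32 and size > 16)


if __name__ == "__main__":
    for cu in split_ctu(0, 0, 64, example_decision):
        print(cu)
```

This example produces four 16×16 CUs in the top-left quadrant and three 32×32 CUs elsewhere. Together they tile the 64×64 CTU exactly.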
 

Prediction Units

• Intra-coded CU can only be divided into square partition units
– For a CU, decide whether to use a single PU or to split into four PUs (8×8 CUs only)
• Inter-coded CU can be divided into square and non-square PUs, as long as each side is at least 4 pixels (note: no 4×4 inter PU)

[Figure: two methods of partitioning for an intra-coded N×N CU (one N×N PU, or four N/2×N/2 PUs); eight methods of partitioning for an inter-coded N×N CU (one N×N; two N×N/2; two N/2×N; four N/2×N/2; and four asymmetric splits into N/4 and 3N/4, horizontally or vertically)]
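The eight inter partitionings can be listed as rectangles. The labels borrow the HEVC specification's PartMode names, in which "2N" denotes the CU width (the slide's N). The validity filter is a deliberate simplification. It keeps only modes whose PUs are at least 4 pixels on each side and not 4×4. The real rules also depend on the minimum CU size and on whether asymmetric partitions are enabled (`amp_enabled_flag`), and this sketch ignores both.

```python
def inter_pu_partitions(n):
    """PU rectangles (x, y, w, h) for each inter partition mode of an n x n CU."""
    h, q = n // 2, n // 4
    return {
        "PART_2Nx2N": [(0, 0, n, n)],
        "PART_2NxN":  [(0, 0, n, h), (0, h, n, h)],
        "PART_Nx2N":  [(0, 0, h, n), (h, 0, h, n)],
        "PART_NxN":   [(0, 0, h, h), (h, 0, h, h), (0, h, h, h), (h, h, h, h)],
        "PART_2NxnU": [(0, 0, n, q), (0, q, n, n - q)],      # asymmetric: N/4 on top
        "PART_2NxnD": [(0, 0, n, n - q), (0, n - q, n, q)],  # asymmetric: N/4 at bottom
        "PART_nLx2N": [(0, 0, q, n), (q, 0, n - q, n)],      # asymmetric: N/4 on the left
        "PART_nRx2N": [(0, 0, n - q, n), (n - q, 0, q, n)],  # asymmetric: N/4 on the right
    }


def valid_partitions(n):
    """Simplified rule: every PU side >= 4 pixels and no 4x4 PU."""
    return {name: rects for name, rects in inter_pu_partitions(n).items()
            if all(w >= 4 and h >= 4 and (w, h) != (4, 4) for _, _, w, h in rects)}


if __name__ == "__main__":
    for size in (8, 16, 32):
        print(size, sorted(valid_partitions(size)))
```

For an 8×8 CU, only 2N×2N, 2N×N and N×2N remain (8×8, 8×4 and 4×8 PUs), which is consistent with the "no 4×4 inter PU" note.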
 

Large Transforms

• HEVC supports 4×4, 8×8, 16×16, 32×32 integer transforms
– Two types of 4×4 transforms (IDST-based for intra luma, IDCT-based otherwise); IDCT-based transform for 8×8, 16×16, 32×32 block sizes
– Integer transform avoids encoder-decoder mismatch and drift caused by slightly different floating point representations
– Parallel friendly matrix multiplication/partial butterfly implementation
– Transform size signaled using Residual Quad Tree
• Achieves 5 to 10% increase in coding efficiency
• Increased complexity compared to H.264/AVC
– 8x more computations per coefficient
– 16x larger transpose memory

[Figure: transform and quantization turn many pixels into few coefficients; the residual of a CU is represented with a TU quad tree]

M. Budagavi et al., “Core Transform Design in the High Efficiency Video Coding (HEVC) Standard,” IEEE JSTSP, 2013
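The "matrix multiplication/partial butterfly" trade-off can be seen on the 4-point case. The matrix below is HEVC's scaled integer 4-point DCT basis. The butterfly exploits the even/odd symmetry of the basis and needs 6 multiplications instead of 16, giving the identical unscaled result. The rounding shifts that the standard applies between stages are left out to keep the comparison exact.

```python
# Scaled integer 4-point DCT basis used by HEVC (each row is one basis function)
C4 = [
    [64, 64, 64, 64],
    [83, 36, -36, -83],
    [64, -64, -64, 64],
    [36, -83, 83, -36],
]


def inverse_by_matrix(c):
    """Direct inverse: multiply by the transposed basis (16 multiplications)."""
    return [sum(C4[k][n] * c[k] for k in range(4)) for n in range(4)]


def inverse_by_butterfly(c):
    """Partial butterfly: even part from c[0], c[2]; odd part from c[1], c[3] (6 multiplications)."""
    even0 = 64 * (c[0] + c[2])
    even1 = 64 * (c[0] - c[2])
    odd0 = 83 * c[1] + 36 * c[3]
    odd1 = 36 * c[1] - 83 * c[3]
    return [even0 + odd0, even1 + odd1, even1 - odd1, even0 - odd0]


if __name__ == "__main__":
    coeffs = [100, -50, 25, -12]
    print(inverse_by_matrix(coeffs), inverse_by_butterfly(coeffs))
```

The same even/odd decomposition applies recursively to the 8-, 16- and 32-point transforms. The matrix form, by contrast, maps naturally onto parallel multiply-accumulate hardware.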
 
 

Intra Prediction

• H.264/AVC has 10 modes
– angular (8 modes), DC, planar
• HEVC has 35 modes
– angular (33 modes), DC, planar
• Angular prediction
– Interpolate from reference pixels at locations based on angle
• DC
– Constant value which is an average of neighboring pixels (reference samples)
• Planar
– Average of horizontal and vertical prediction

[Figure: HEVC intra mode numbering. 0: Planar, 1: DC, 2..34: Angular (horizontal mode 10, vertical mode 26)]
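The DC and planar descriptions above can be made concrete. This sketch follows the DC and planar equations as given in the HEVC specification, which should be checked against the standard before use. It assumes the reference samples are already available. Substitution of missing samples, reference smoothing and DC boundary filtering are omitted.

```python
def intra_dc(top, left):
    """DC prediction for an n x n block from n above and n left reference samples."""
    n = len(top)
    shift = n.bit_length()  # equals log2(n) + 1 for power-of-two n
    dc = (sum(top) + sum(left) + n) >> shift
    return [[dc] * n for _ in range(n)]


def intra_planar(top, left, top_right, bottom_left):
    """Planar prediction: average of a horizontal and a vertical linear interpolation.
    top[x] / left[y] are the reference row and column; top_right and bottom_left are
    the single samples just beyond them."""
    n = len(top)
    shift = n.bit_length()
    pred = [[0] * n for _ in range(n)]
    for y in range(n):
        for x in range(n):
            horizontal = (n - 1 - x) * left[y] + (x + 1) * top_right
            vertical = (n - 1 - y) * top[x] + (y + 1) * bottom_left
            pred[y][x] = (horizontal + vertical + n) >> shift
    return pred


if __name__ == "__main__":
    print(intra_dc([100] * 4, [50] * 4)[0])
    for row in intra_planar([0] * 4, [0] * 4, 64, 64):
        print(row)
```

Planar produces a smooth ramp toward the top-right and bottom-left samples. DC fills the block with a single average value.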
 

Intra Prediction Modes

J. Lainema, W.-J. Han, “Intra Prediction in HEVC,” High Efficiency Video Coding (HEVC): Algorithms and Architectures, Springer, 2014.
 

Removing Intra Artifacts (Pre-Processing)

• Reference Sample Smoothing
– Smooth out neighboring pixels (i.e., reference samples) before using them for prediction
– Reduce contouring artifacts caused by edges in the reference sample arrays
– Two modes
• Three-tap smoothing filter
• Strong intra smoothing with corner reference pixels
– Application of smoothing depends on PU size and prediction mode

[Figure: prediction without and with the pre-filter]
Image source: M. Wien, TCSVT, July 2003
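The three-tap smoothing mode can be sketched as a [1, 2, 1]/4 filter with rounding. The sketch assumes the reference samples are laid out as one 1-D array, running from the bottom-left sample up through the corner and along to the top-right. The two end samples are left unfiltered. Strong intra smoothing and the size/mode conditions that decide whether to filter at all are not modelled.

```python
def smooth_reference(ref):
    """Three-tap [1, 2, 1]/4 smoothing of a 1-D reference-sample array; end samples unchanged."""
    out = list(ref)
    for i in range(1, len(ref) - 1):
        out[i] = (ref[i - 1] + 2 * ref[i] + ref[i + 1] + 2) >> 2
    return out


if __name__ == "__main__":
    print(smooth_reference([0, 0, 100, 100]))      # a sharp step is softened
    print(smooth_reference([10, 20, 30, 40]))      # a linear ramp passes through unchanged
```

Softening such steps in the reference array is what reduces the contouring artifacts mentioned above.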
 

Removing Intra Artifacts (Post-Processing)

• Boundary Smoothing
– Intra prediction may introduce discontinuities along block boundaries
– Filter the first prediction row and column with a three-tap filter for DC prediction, and a two-tap filter for horizontal and vertical prediction

Image source: JCTVC-F172, July 2011
J. Lainema, W.-J. Han, “Intra Prediction in HEVC,” High Efficiency Video Coding (HEVC): Algorithms and Architectures, Springer, 2014.
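For DC prediction, the boundary smoothing can be sketched as below. The weights follow the HEVC DC filtering equations as commonly stated. The corner sample combines both neighbors with the DC value, and the other first-row and first-column samples blend one reference sample with the DC value. As far as we know, the standard applies this only to luma blocks smaller than 32×32, a condition this sketch does not check.

```python
def dc_boundary_filter(pred, top, left, dc):
    """Smooth the first row and column of a DC-predicted block toward the reference samples."""
    n = len(pred)
    out = [row[:] for row in pred]  # leave the input block untouched
    out[0][0] = (left[0] + 2 * dc + top[0] + 2) >> 2          # corner: both neighbors
    for x in range(1, n):
        out[0][x] = (top[x] + 3 * dc + 2) >> 2                # first row: above sample
    for y in range(1, n):
        out[y][0] = (left[y] + 3 * dc + 2) >> 2               # first column: left sample
    return out


if __name__ == "__main__":
    top, left = [100] * 4, [50] * 4
    dc = (sum(top) + sum(left) + 4) >> 3  # 75
    for row in dc_boundary_filter([[dc] * 4 for _ in range(4)], top, left, dc):
        print(row)
```

The first row moves toward the brighter samples above and the first column toward the darker samples on the left. The interior keeps the DC value.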
 

Inter Prediction

• Motion vectors can have up to ¼ pixel accuracy (interpolation required)
• In H.264/AVC, luma uses a 6-tap filter and chroma uses a bilinear filter
• In HEVC, luma uses an 8/7-tap filter and chroma uses a 4-tap filter
– Different coefficients for ¼ and ½ positions
• Restricted prediction on small PU sizes

[Figure: a 4×4 block in the current frame and its reference blocks in the previous frame for vector (1, −1) and vector (0.5, −0.5)]
 

Interpolation Filter

[Figure: integer pixels (highlighted in red) are required to interpolate fractional pixels (highlighted in blue)]

• Interpolating N×N pixels requires up to (N+7)×(N+7) reference pixels
• Use 1-D filters (order matters for greater than 8-bit video)
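The filter below is a one-dimensional sketch of HEVC luma interpolation. The tap values are reproduced from the HEVC luma filter and are worth checking against the specification. The half-sample filter has 8 taps and the quarter-sample filters have 7 non-zero taps, which is where "8/7-tap" comes from. The single-pass rounding and 8-bit clipping are simplifications for uni-directional 8-bit video. The second helper reproduces the (N+7)×(N+7) read count above.

```python
LUMA_TAPS = {
    1: [-1, 4, -10, 58, 17, -5, 1, 0],    # quarter-sample position
    2: [-1, 4, -11, 40, 40, -11, 4, -1],  # half-sample position
    3: [0, 1, -5, 17, 58, -10, 4, -1],    # three-quarter-sample position
}


def interpolate_row(row, frac):
    """Horizontal interpolation of 8-bit samples; output i lies between row[i + 3] and row[i + 4]."""
    taps = LUMA_TAPS[frac]
    out = []
    for i in range(len(row) - 7):
        acc = sum(t * row[i + k] for k, t in enumerate(taps))
        out.append(min(255, max(0, (acc + 32) >> 6)))  # taps sum to 64, so divide by 64 with rounding
    return out


def reference_window(n, taps=8):
    """Integer samples needed to interpolate an n x n block in both directions."""
    return (n + taps - 1) ** 2


if __name__ == "__main__":
    print(interpolate_row([0, 10, 20, 30, 40, 50, 60, 70], 2))
    for n in (4, 8, 64):
        print(n, reference_window(n), f"{reference_window(n) / n ** 2:.2f} reads per pixel")
```

The read overhead is about 7.6 samples per predicted pixel for a 4×4 block but only about 1.2 for a 64×64 block. This is why small inter PUs are restricted.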
 

Mode Coding

• Predict modes from neighbors to reduce syntax element bits
– Intra Prediction Mode: 3 candidates, derived from the left (A) and above (B) neighbors of the current PU
– Advanced Motion Vector Prediction (AMVP), Merge/Skip Mode: 2 to 5 candidates, from the spatial neighbors A0, A1, B0, B1, B2 of the current PU and from the co-located PU (positions CR and H)
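The three intra mode candidates (most probable modes, MPMs) can be derived with a short rule. This sketch follows the HEVC MPM derivation as commonly described, so treat it as a sketch to be checked against the standard. In the standard, an unavailable or non-intra neighbor is treated as DC before this step. Mode numbers follow the figure above: 0 planar, 1 DC, 2..34 angular, 26 vertical.

```python
PLANAR, DC, VERTICAL = 0, 1, 26


def mpm_candidates(cand_a, cand_b):
    """Three most probable intra modes from the left (A) and above (B) neighbor modes."""
    if cand_a == cand_b:
        if cand_a < 2:  # both planar or both DC
            return [PLANAR, DC, VERTICAL]
        # Same angular mode: add its two adjacent angular directions (wrapping within 2..34)
        return [cand_a, 2 + ((cand_a + 29) % 32), 2 + ((cand_a - 2 + 1) % 32)]
    if PLANAR not in (cand_a, cand_b):
        third = PLANAR
    elif DC not in (cand_a, cand_b):
        third = DC
    else:
        third = VERTICAL
    return [cand_a, cand_b, third]


if __name__ == "__main__":
    print(mpm_candidates(10, 10), mpm_candidates(0, 1), mpm_candidates(5, 1))
```

When the current mode is one of these three candidates, only a short index is coded. Otherwise, the mode is coded among the remaining 32 modes.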
 

Merge Mode

[Figure: a moving object coded without merge (many extra motion parameters) and with merge]

B. Bross et al., “Inter Prediction in HEVC,” High Efficiency Video Coding (HEVC): Algorithms and Architectures, Springer, 2014.
 

AMVP, Merge, Skip Mode

• AMVP
– Syntax elements: mvp_l0_flag, mvp_l1_flag
– Use of neighbor candidates: predict motion vector
– Number of candidates: up to 2
– Spatial: up to 2 of 5 (scaling if reference index different)
– Temporal: up to 1 of 2 (if < 2 spatial candidates)
– Additional: zero motion vector (if < 2 spatial or temporal candidates)
• Merge
– Syntax elements: merge_flag, merge_idx
– Use of neighbor candidates: copy motion data (motion vector, reference index, direction)
– Number of candidates: up to 5 (signaled in slice header)
– Spatial: up to 4 of 5 (no scaling, only redundancy check)
– Temporal: up to 1 of 2 (always added to list if available)
– Additional: bi-predictive candidates and zero motion vector
• Skip
– Syntax elements: cu_skip_flag, merge_idx
– Use of neighbor candidates: copy motion data (motion vector, reference index, direction); no residual
– Number of candidates and spatial, temporal and additional candidates: same as Merge

In-loop Filtering: Deblocking Filter

• Removes blocking artifacts due to block based processing
– Computationally intensive in H.264/AVC
• In H.264/AVC, performed on every 4×4 block edge
– Each macroblock has 128 pixel edges, 32 edge calculations
– Each 4×4 depends on neighboring 4×4
• In HEVC, performed on every 8×8 block edge
– Each 16×16 CTU has 64 pixel edges, 8 edge calculations
– All 8×8 are independent (can be processed in parallel)

[Figure: 16×16 block without and with deblocking]

In-loop Filtering: Sample Adaptive Offset (SAO)

• Filter to address local discontinuities
– Edge Offset and Band Offset
• Check neighbors in one of 4 directions (0, 90, 135, 45 degrees)
• Based on the values of the neighbors, apply one of 4 offsets

[Figure: edge offset categories 1 to 4, comparing the level of the current pixel c (index x) with its neighbors at x−1 and x+1; category 1 is a local minimum and category 4 a local maximum]

In-loop Filtering: Sample Adaptive Offset (SAO)

[Figure: decoded picture with SAO and without SAO]

C.-M. Fu et al., “Sample Adaptive Offset in the HEVC Standard,” IEEE Transactions on Circuits and Systems for Video Technology, 2012

Entropy Coding

• Lossless compression of syntax elements
• HEVC uses Context Adaptive Binary Arithmetic Coding (CABAC)
– 10 to 15% higher coding efficiency compared to CAVLC

V. Sze, D. Marpe, “Entropy Coding in HEVC,” High Efficiency Video Coding (HEVC): Algorithms and Architectures, Springer, 2014.

CABAC Throughput Improvements

• Reduce total number of bins
• Reduce context coded bins
• Reduce context dependencies
• Grouping bypass bins
• Reduce parsing dependencies
• Reduce memory requirements

[Figure: CABAC decoder. Bits → Arithmetic Decoder (AD) → bins → De-Binarizer (DB) → syntax elements; Context Modeling (CM), made up of Context Selection (CS) and Context Memory, supplies the probability for context coded bins; bypass bins skip context modeling]

Reduction in worst case bins for 16×16 pixels:
             Total bins   Context bins   Bypass bins
H.264/AVC    20861        7805           13056
HEVC         14301        884            13417
Ratio        1.5x         9x             1x

• 3x reduction in context memory
• 20x reduction in line buffer for context selection

[Figure: grouping bypass bins; an example bin sequence takes 15 cycles when decoded one bin per cycle and 9 cycles when bypass bins are grouped]

V. Sze, M. Budagavi, “High Throughput CABAC Entropy Coding in HEVC,” IEEE TCSVT, 2012

High Level Parallel Tools (Multi-Core)

[Figure: three ways to partition a picture. Slices (also in H.264/AVC): slice 0 to slice 3. Tiles: tile 0 to tile 3. Wavefront Parallel Processing (Interleaved Entropy Slices*): substream 0 to substream 3]

*D. Finchelstein, V. Sze, A. P. Chandrakasan, “Multi-core Processing and Efficient On-chip Caching for H.264 and Future Video Decoders,” IEEE Trans. CSVT, 2009

Additional Modes

• For wireless display and cloud computing, screen content coding should be considered
• Screen content typically has more edges
• Lossless
– Bypass transform, quantization and in-loop filters
• Transform Skip
– Bypass transform, but continue to perform quantization and in-loop filters
• I_PCM
– Signal raw pixels

Image source: www.techprollc.com

Profiles, Levels, Tiers

• Profile defines set of tools for different applications
– Main, Main 10, Main Still Picture
– 8 bits/sample → 16.78 million colors
– 10 bits/sample → 1.07 billion colors
• Level defines the maximum supported resolution and frame rate
– e.g. Level 4.0: 1920×1080 @ 32 fps
– Level 5.0: 4096×2160 @ 30 fps
• Bit-rates defined by level and tier
– Main and High (professional)

Main Still Picture (Intra Coding Only)

• HEVC also provides improved compression for still images

BD-rate reduction of HEVC relative to:
H.264/AVC (intra only)   15.8%
JPEG 2000                22.6%
JPEG XR                  30.0%
WebP                     31.0%
JPEG                     43.0%

T. Nguyen, D. Marpe, “Performance Comparison of HM 6.0 with Existing Still Image Compression Schemes Using a Test Set of Popular Still Images,” JCTVC-I0595, 2012

Part III: Video Codec Implementations

Decoder Design Considerations

• Function
– Mapping of bitstream to pixels fixed by the standard
• Implementation Requirements
– Conformance: Support all tools for a given profile in the standard
– Throughput: Real-time processing for video playback; level specifies pixel-rate and bit-rate

[Figure: bitstream at specified bit-rate → Decoder → pixels at specified pixel-rate]

Encoder Design Considerations (1)

• Function
– Mapping of pixels to a standard compliant bitstream
– Flexibility of selecting which set of encoding tools to use and how to use them (e.g. how to search for the best compression mode)

[Figure: pixels at specified pixel-rate for real-time applications → Encoder → bitstream at specified bit-rate or compression ratio]

Encoder Design Considerations (2)

• Implementation Requirements
– Conformance: Must generate a bitstream that is decodable by a standard compliant decoder (for a given profile)
– Throughput: For real-time applications, need to meet pixel-rate requirements; can be done off-line for storage applications
– Bit-rate/Compression Ratio: For a given application, must meet minimum compression requirements
– Compression Ratio vs. Complexity: Find a compression mode that meets compression requirements under a complexity constraint

Decoder design requires architecture innovations, while encoder design requires both algorithm and architecture innovations

Multimedia Platforms

                     Desktop CPU [1]  Mobile CPU [1]  GPU+CPU [2]  DSP [3]  FPGA [4]  ASIC [5,6]
Flexibility          High             High            Med/High     Med      Med       Low
Development Cost     Low              Low             Low/Med      Med      Med       High
Speed/Throughput     Low/Med          Low             Med          Med      Med       High
Power Consumption    High             Med             High         Med      Med       Low

Examples of HEVC implementations
[1] F. Bossen et al., “HEVC Complexity and Implementation Analysis,” IEEE TCSVT, 2012
[2] Ittiam Systems, “Compute accelerated HEVC decoder on ARM® Mali™-T600 GPUs”
[3] F. Pescador et al., “On an implementation of HEVC video decoders with DSP technology,” IEEE ICCE, 2013
[4] S. Cho, H. Kim, “Implementation of a HEVC Hardware Decoder,” JCTVC-L0098, 2013
[5] C.-T. Huang et al., “A 249Mpixel/s HEVC video-decoder chip for Quad Full HD applications,” IEEE ISSCC, 2013
[6] S.-F. Tsai et al., “A 1062Mpixels/s 8192×4320p High Efficiency Video Coding (H.265) encoder chip,” IEEE VLSIC, 2013
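The throughput requirement in these design considerations comes down to the pixel rate implied by resolution and frame rate. Dividing a clock frequency by that rate gives the cycle budget per pixel. In the sketch below, the 200 MHz clock is an assumed example value, not a requirement from the standard.

```python
def pixel_rate(width, height, fps):
    """Luma pixels per second that a real-time decoder or encoder must sustain."""
    return width * height * fps


def cycles_per_pixel(clock_hz, width, height, fps):
    """Average clock cycles available per pixel at a given clock frequency."""
    return clock_hz / pixel_rate(width, height, fps)


if __name__ == "__main__":
    for w, h, fps in [(1920, 1080, 60), (3840, 2160, 30)]:
        rate = pixel_rate(w, h, fps)
        budget = cycles_per_pixel(200e6, w, h, fps)  # assumed 200 MHz clock
        print(f"{w}x{h} @ {fps} fps: {rate / 1e6:.1f} Mpixel/s, {budget:.2f} cycles/pixel at 200 MHz")
```

A 3840×2160 @ 30 fps stream needs about 249 Mpixel/s. At 200 MHz that leaves less than one cycle per pixel, so hardware must process several pixels per cycle in parallel.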
58   •  Throughput     –  Achieve  target  pixel-­‐rate  and  bit-­‐rate  for  real-­‐Bme  applicaBons   –  Reduce  latency  of  bits  to  pixels  and  pixels  to  bits  for  interacBve  applicaBons   –  Techniques:  parallelism,  pipelining,  eliminate  stalls   •  Energy  and  Power  ConsumpBon   –  Minimize  energy  consumpBon  to  extend  baNery  life  for  portable  devices   –  Minimize  power  consumpBon  to  reduce  heat  dissipaBon   –  Techniques:  voltage  scaling,  frequency  scaling,  power  gaBng,  number  of  ops   •  Plauorm  Cost   –  Reduce  amount  of  data  to  be  stored  in  memory  and  amount  of  logic  (e.g.   gates  in  ASIC,  number  of  cores  for  processors)  to  reduce  size  of  chip   –  Reduce  bandwidth  requirements  such  as  reads/writes  from  memory  to   reduce  demands  on  off-­‐chip  components   –  Techniques:  shared  computaBons,  on-­‐the-­‐fly  processing,  caching   Implementa/on  Requirements   59   •  ARMv7  1.3GHz  (mobile  processor)  [Bossen,  JCTVC-­‐K0327,  2012]   –  Dual  core,  but  decoding  on  single  thread  (other  thread  for  display)   –  1080p  @  24  fps  at  2Mbps  (16  picture  buffer  to  average  workload)   •  Intel  i7  Core  2.6  GHz  (desktop  processor)  [Bossen  et  al.,  TCSVT,  2012]   –  Single  core,  single  thread   –  1080p  @  60  fps  at  7Mbps     •  MulB-­‐thread  Intel  Core  i7  2.7  GHz  [Suzuki  et  al.,  JCTVC-­‐L0098,  2013]     –  4  cores  /  4  threads  (parallel  GOPs)   –  3840x2160  @  76  fps  at  12Mbps  [cropped  8K  content]   •  MulB-­‐thread  Intel  X5680  3.3  GHz  [Chi  et  al.,  TCSVT,  2012]   –  2x6  cores/12  threads  (parallel  Tiles,  WPP)   –  3840x2160  @  24  fps  at  ~12Mbps  (QP=37)   –  3840x2160  @  14  fps  at  ~170Mbps  (QP=22)     Solware  HEVC  Decoder   60   Solware  HEVC  Decoder   Workload  for  different  modules   F.  
Bossen et al., “HEVC Complexity and Implementation Analysis,” IEEE Transactions on Circuits and Systems for Video Technology, 2012

Hardware HEVC Decoder Architecture
[Figure: decoder block diagram with two pipeline groups (Group I and Group II) showing the entropy decoder, MV dispatch, ColMV DMA, inverse transform, prediction, in-loop filters, MC cache, Rec DMA, line buffers (for the entropy decoder and for prediction/in-loop filters), memory interface arbiter and top control; the legend distinguishes pixel flow, info flow, DMA flow, SRAM and processing engines]
M. Tikekar et al., “Decoder Hardware Architecture for HEVC,” High Efficiency Video Coding (HEVC): Algorithms and Architectures, Springer, 2014.

Pipelining HEVC Decoder
• Variable-size pipelining to support a diverse set of CTU, CU and PU sizes (the pipeline block size is selected to balance memory cost vs. data reuse)
[Figure: variable-size pipeline blocks (VPB) of 64x64, 64x32 and 64x16 for CTU sizes of 64x64, 32x32 and 16x16, divided into prediction pipeline blocks (PPB) for Y and U/V; a system-level pipeline (between inverse transform, prediction and in-loop filters) and a prediction-level 16x16 pipeline (within the prediction module)]
Source: C.-T. Huang et al., “A 249Mpixels/s HEVC Video Decoder Chip for Quad Full HD Applications,” IEEE ISSCC, 2013.

Decoupling Entropy Coding
• The workload of entropy decoding depends on the bit-rate (bin-rate), while the rest of the decoder depends on the pixel-rate
• Use a FIFO to absorb variations in workload
  – A deeper FIFO results in fewer stalls due to averaging, but longer latency and higher memory cost
[Figure: coefficients buffered in a TU FIFO between the entropy decoder and the downstream stages (MC dispatch, inverse transform, prediction, deblocking, Rec DMA), split into pipeline groups I and II]
Source: C.-T. Huang et al., “A 249Mpixels/s HEVC Video Decoder Chip for Quad Full HD Applications,” IEEE ISSCC, 2013.

Intra Prediction
• Reference sample processing
  – A reference pixel buffer stores the neighboring pixels (padded when not available)
  – A smoothing filter is applied to the pixels depending on the mode
• Feedback loop at TU granularity
  – The reference pixel buffer is updated accordingly
[Figure: prediction plus the inverse-transform residue is fed back to the intra reference pixels at TU granularity]
M. Tikekar et al., “Decoder Hardware Architecture for HEVC,” High Efficiency Video Coding (HEVC): Algorithms and Architectures, Springer, 2014.

Inter Prediction
• Read samples from the reference picture (typically stored in an off-chip picture buffer)
  – Use a cache to reduce off-chip memory bandwidth
• Interpolation uses a 2-D separable filter for fractional motion vectors
  – Multiple pixels can be interpolated in parallel (they share input pixels)
• Smaller blocks have a larger read overhead (for fractional MVs)
  – An NxN block requires (N+7)x(N+7) pixel reads → 4x4 inter PUs are not supported in HEVC
[Figure: motion vectors from the entropy decoder → dispatch → MC cache → fetch → 2-D filter → inter-predicted pixels; the MC cache reads from the reference picture buffer (on-chip SRAM/external DRAM)]

MC Cache and Picture Buffer
• Minimize redundant reads from off-chip memory (DRAM)
• MC cache design considerations
  – Sufficient throughput to support the worst-case PU
  – Detect redundant reads and handle the latency of DRAM
• Store pixels in DRAM so as to minimize row changes (cycle overhead)
  – Avoid reading two rows from the same bank for a given reference region
[Figure: mapping of picture blocks to DRAM banks 0-7 (# = bank in DRAM)]
→ 20% reduction in overhead cycles
M. Tikekar et al., “Decoder Hardware Architecture for HEVC,” High Efficiency Video Coding (HEVC): Algorithms and Architectures, Springer, 2014.

Inverse Transform
• Larger transform → more computation
  – Share coefficients across transform sizes and within a transform to reduce area cost
[Figure: inverse transform datapath with an even-odd index sort and add-sub butterflies shared by IDCT4/IDST4, IDCT8, IDCT16 and IDCT32 (partial 2x2, 4x4, 8x8 and 16x16 stages); odd-part constants 89, 75, 50 and 18 implemented with a LUT and MAC]
→ 30% reduction in area cost
M. Tikekar et al., “Decoder Hardware Architecture for HEVC,” High Efficiency Video Coding (HEVC): Algorithms and Architectures, Springer, 2014.

Inverse Transform
• Larger transform → larger transpose memory
  – Use SRAM rather than registers to reduce area cost
  – SRAM has limited read/write ports (requires careful mapping)
M. Tikekar et al., “Decoder Hardware Architecture for HEVC,” High Efficiency Video Coding (HEVC): Algorithms and Architectures, Springer, 2014.
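The coefficient sharing described on the inverse-transform slides comes from the even-odd ("partial butterfly") structure of the HEVC core transform. The even rows of the 8-point matrix equal the 4-point matrix, and the odd part of the 8-point transform uses the constants 89, 75, 50 and 18 shown in the figure. The sketch below shows the idea at the smallest size: one 1-D pass of the 4-point inverse DCT, computed once as a plain matrix product and once with the even-odd split. It is an illustration rather than a decoder implementation. Rounding, intermediate shifts and clipping are omitted, and the function names are hypothetical.

```python
from typing import List

# Rows are the HEVC 4-point DCT basis functions (k = 0..3).
DCT4 = [
    [64, 64, 64, 64],
    [83, 36, -36, -83],
    [64, -64, -64, 64],
    [36, -83, 83, -36],
]


def idct4_direct(coeffs: List[int]) -> List[int]:
    """Reference 1-D inverse: y[n] = sum_k DCT4[k][n] * coeffs[k] (16 multiplies)."""
    return [sum(DCT4[k][n] * coeffs[k] for k in range(4)) for n in range(4)]


def idct4_butterfly(coeffs: List[int]) -> List[int]:
    """Same result using the even-odd decomposition."""
    c0, c1, c2, c3 = coeffs
    # Odd part: only coefficients 1 and 3, with constants 83 and 36.
    o0 = 83 * c1 + 36 * c3
    o1 = 36 * c1 - 83 * c3
    # Even part: only coefficients 0 and 2; a multiply by 64 is a shift in hardware.
    e0 = 64 * (c0 + c2)
    e1 = 64 * (c0 - c2)
    # Add-sub stage: outputs come in mirrored pairs (0, 3) and (1, 2).
    return [e0 + o0, e1 + o1, e1 - o1, e0 - o0]


block = [64, -12, 5, 3]
print(idct4_butterfly(block), idct4_direct(block))
```

The odd part needs only four constant multiplications and the even part needs only shifts. The same add-sub network can serve as the even half of the next larger transform, which is the kind of reuse across sizes that the slide credits with the area saving.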
[Figure: transpose memory for a 32x32 transform built from four SRAM banks (Bank 0-3) holding 4x4 blocks; dequantized coefficients pass through the first 1-D transform, the transpose memory (row/column select) and the second 1-D transform to produce the residue, at 4 pixels/cycle throughput per 1-D transform]

Hardware HEVC Decoder
Video Coding Standard | HEVC (HM4)
Technology | TSMC 40-nm
Core Area | 1.33 x 1.33 mm
Gate Count | 715k
On-Chip Memory (SRAM) | 124 kB
Resolution / Frame Rate | 4kx2k @ 30fps (3840x2160)
Frequency | 200 MHz
Core Voltage | 0.9 V
Power | 76 mW
[Die photo, 2.18 mm x 2.18 mm: dispatch/MC cache, entropy decoder, prediction, inverse transform, deblocking, SRAM]
C.-T. Huang et al., “A 249Mpixels/s HEVC Video Decoder Chip for Quad Full HD Applications,” IEEE ISSCC, 2013

Area Breakdown
Logic [kgates]: MC cache 126; Deblock 49.9; Entropy decoder 94.5; Inverse transform 121.1; Memory interface arbiter 13.7; Prediction 191.9; RegFiles 75.5; Others 42
Memory (SRAM) [kbits]: Pipeline buffers 447.3; MC-related SRAM 200.4; Line buffers 337; Others 32.8
M. Tikekar et al., “Decoder Hardware Architecture for HEVC,” High Efficiency Video Coding (HEVC): Algorithms and Architectures, Springer, 2014.

Power Breakdown
Prediction 23%; Deblocking 3%; MC cache 26%; Inverse transform 17%; Memory interface arbiter 2%; Entropy decoder 3%; Line buffers 2%; Pipeline buffers 10%; Others 13%
M. Tikekar et al., “Decoder Hardware Architecture for HEVC,” High Efficiency Video Coding (HEVC): Algorithms and Architectures, Springer, 2014.

Hardware vs. Software
[Figure: the hardware power breakdown above shown next to the software cycle breakdown]

ASIC Decoder Comparison
 | This Work | ISSCC'12 [2] | ISSCC'10 [3] | ISSCC'06 [4]
Standard | HEVC (“H.265”) WD4 | H.264/AVC HP/MVC | H.264/AVC HP/SVC/MVC | H.264/AVC MP
Max Specification | 3840x2160 @30fps | 7680x4320 @60fps | 4096x2160 @24fps | 1920x1080 @30fps
Gate Count | 715K | 1338K | 414K | 160K
On-Chip SRAM | 124KB | 80KB | 9KB | 5KB
Technology | 40nm/0.9V | 65nm/1.2V | 90nm/1.0V | 0.18µm/1.8V
Normalized Core Power* | 0.31nJ/pixel | 0.21nJ/pixel | 0.28nJ/pixel | 5.11nJ/pixel
Normalized DRAM Power* | 0.88nJ/pixel | 1.27nJ/pixel | N/A | N/A
Normalized System Power*** | 1.19nJ/pixel | 1.48nJ/pixel | N/A | N/A
DRAM Configuration | 32b DDR3 | 64b DDR2 | N/A | 32b DDR + 32b SDR **
* Power for max specification; ** Modeled by [5]; *** System Power = Core Power + DRAM Power
Slide Source: C.-T. Huang et al., “A 249Mpixels/s HEVC Video Decoder Chip for Quad Full HD Applications,” IEEE ISSCC, 2013.

Decoder Power Comparison
[Figure: energy per pixel (nJ) vs. year (2006-2014) for H.264/AVC and H.265/HEVC decoder chips, with die photos]
• H.265/HEVC [WD4] decoder (76 mW): C.T. Huang et al. (MIT), ISSCC 2013; TSMC 40nm, 0.9V, Ultra-HD 4K @ 30 fps
• H.264/AVC decoder (51 mW): P.K. Tsung et al. (NTU), ISSCC 2011
• H.264/AVC decoder (2 mW): Sze et al. (MIT), JSSC 2009; 0.7-V, 720p HD @ 30 fps

Low Power Approaches
• Operate at a voltage near the minimum energy point
• Utilize parallelism and pipelining to achieve performance
• Adaptive/dynamic voltage frequency scaling
• Optimize access patterns to reduce memory power
Reduce cycles → reduce frequency → reduce voltage → reduce power
[Figure: energy per operation and delay (T, 2T) vs. supply voltage]
V. Sze et al., “A 0.7-V 1.8-mW H.264/AVC 720p Video Decoder,” IEEE Journal of Solid-State Circuits, 2009.

Encoder Decisions
• The encoder must search for the mode that gives the “best” compression. Some of the key decisions include
  – CU and PU size
  – Inter or intra CU
  – Motion vector
  – Intra prediction mode
• “Best” compression is defined using a rate-distortion cost
$$J = D + \lambda \cdot R$$
  where
  – D is the distortion between the original and the compressed image (a measure of the visual quality of the compression)
  – R is a measure of the number of bits required to signal the compressed image
  – λ is the Lagrangian multiplier that weights the distortion and rate costs

Full vs. Fast RDO
Perform rate-distortion optimization (RDO)
• Full RDO
  – Distortion based on the sum of squared differences (SSD), including quantization
  – Rate based on the entropy-coded bits of the prediction info and quantized coefficients
• Fast RDO
  – Distortion approximated by the sum of absolute differences (SAD) or the sum of absolute transformed differences (SATD)
  – Rate approximated by the prediction info bits (intra mode or motion vector); can include the number of non-zero coefficients to predict the coefficient bits
[Figure: RDO flow in HM. Intra prediction and motion estimation feed a fast RDO stage (30+ modes); the remaining candidates go through a full RDO pass (T/Q and CABAC for rate, IT/IQ and SSD for distortion) before the final mode decision. T/Q: transform/quantization; IT/IQ: inverse transform/quantization]
S.-F. Tsai et al., “Encoder Hardware Architecture for HEVC,” High Efficiency Video Coding (HEVC): Algorithms and Architectures, Springer, 2014.

CU and PU decisions
• The encoder must decide how best to divide a CTU into CUs, and how to divide the CUs into PUs (based on full RDO in HM)
• For a CTU of 64x64
  – CU options: 64x64, 32x32, 16x16, 8x8
• For an inter-coded CU of size NxN
  – PU options: NxN, two NxN/2, two N/2xN, and asymmetric splits into N/4 and 3N/4 (horizontal or vertical)
• For an intra-coded CU of size NxN
  – PU options: NxN, or four N/2xN/2 (the split is only allowed at the smallest CU size)

Motion Estimation
• Search for the block in the reference frame(s) that predicts the current block with the least rate-distortion cost
  – Signal the block in the previous frame using a motion vector
• Typically the most computationally intensive function in the encoder
Search algorithm considerations
1. Number of candidates
   – Number of computations
   – Number of memory accesses
2. Off-chip bandwidth
3. On-chip bandwidth

• Integer pixel motion estimation
  – Rate is the bits required to transmit the motion data (including the impact of the motion predictor)
  – Distortion is calculated from the SAD of the original and the motion-compensated prediction (subsampled when block size > 8)
 

$$\arg\min_{MV,\,REF} \; \sum_{i,j} \mathrm{Diff}(i,j) + \lambda \cdot R(MV, REF)$$
  where
  – MV = motion vector (including the impact of the advanced MV predictor)
  – REF = reference index
Motion Estimation in HM
[Figure: spatial candidate positions A0, A1, B0, B1, B2 around the current PU and temporal candidate positions CR and H in the co-located PU]
K. McCann et al., “High Efficiency Video Coding (HEVC) Test Model 14 (HM 14) Encoder Description,” JCTVC-P1002, 2014
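The cost minimized above can be sketched for a single reference picture (REF fixed) as follows. Diff(i, j) is taken as the absolute difference, so the first term is the SAD. The rate term R(MV) is an assumption: it is modelled as the signed Exp-Golomb code length of the motion vector difference with respect to the predictor. HM uses its own bit-cost tables, and the SAD subsampling for blocks larger than 8 is omitted. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Block = List[List[int]]
MV = Tuple[int, int]


def sad(orig: Block, pred: Block) -> int:
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(o - p) for ro, rp in zip(orig, pred) for o, p in zip(ro, rp))


def se_golomb_bits(v: int) -> int:
    """Length in bits of the signed Exp-Golomb (k=0) codeword for v."""
    code_num = 2 * v - 1 if v > 0 else -2 * v
    return 2 * (code_num + 1).bit_length() - 1


def mv_rate_bits(mv: MV, mvp: MV) -> int:
    """ASSUMED rate model: bits for the MV difference relative to the predictor."""
    return se_golomb_bits(mv[0] - mvp[0]) + se_golomb_bits(mv[1] - mvp[1])


def me_cost(orig: Block, ref: Block, x: int, y: int, mv: MV, mvp: MV, lam: float) -> float:
    """J = SAD(original, displaced reference block) + lambda * R(MV)."""
    h, w = len(orig), len(orig[0])
    rx, ry = x + mv[0], y + mv[1]  # the displaced block must lie inside ref
    pred = [row[rx:rx + w] for row in ref[ry:ry + h]]
    return sad(orig, pred) + lam * mv_rate_bits(mv, mvp)


def best_mv(orig: Block, ref: Block, x: int, y: int,
            candidates: Sequence[MV], mvp: MV, lam: float) -> MV:
    """argmin of the cost over the candidate motion vectors."""
    return min(candidates, key=lambda mv: me_cost(orig, ref, x, y, mv, mvp, lam))


# Toy example: the true displacement of the 2x2 block at (2, 2) is (1, 0).
ref = [[8 * r + c for c in range(8)] for r in range(8)]
orig = [row[3:5] for row in ref[2:4]]
cands = [(dx, dy) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
print(best_mv(orig, ref, 2, 2, cands, (0, 0), lam=1))    # (1, 0): SAD dominates
print(best_mv(orig, ref, 2, 2, cands, (0, 0), lam=100))  # (0, 0): rate keeps the predictor
```

The second call shows the role of λ. With a large multiplier, the extra bits for a non-zero motion vector difference outweigh the small SAD gain, so the search stays at the predictor.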
 

Motion Estimation in HM
• Integer pixel motion estimation
  – Search strategy
    1. The search center is the motion vector predictor.
    2. Diamond search around the center (search range = 64 → 7 steps [1, 2, 4, ..., 64]); early termination if the best candidate does not change in 3 steps.
    3. If the best candidate is more than 5 pixels away from the search center, do a raster scan search (5-pixel steps).
    4. Perform a diamond search around the best candidate from step 2 or 3. If a new best candidate is found, repeat step 4.
Reference
• K. McCann et al., “High Efficiency Video Coding (HEVC) Test Model 14 (HM 14) Encoder Description,” JCTVC-P1002, 2014
• M. Sinangil, PhD Thesis, MIT, 2012
Image Source: N. Purnachand et al., IEEE ICCE-Berlin, 2012
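The four-step search strategy above can be sketched as a search over an arbitrary cost function, for example the SAD + λ·R cost of the previous slide. This is a simplified sketch in the spirit of HM's TZ search, not HM's code. The 4-point diamond at distance 1 and the 8-point diamonds beyond it are an assumed pattern. "More than 5 pixels away" is measured here as max(|dx|, |dy|), which is also an assumption. The parameter names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

MV = Tuple[int, int]


def diamond(center: MV, dist: int) -> List[MV]:
    """4-point diamond at distance 1, 8-point diamond at larger distances."""
    cx, cy = center
    if dist == 1:
        return [(cx, cy - 1), (cx - 1, cy), (cx + 1, cy), (cx, cy + 1)]
    h = dist // 2
    return [(cx, cy - dist), (cx - h, cy - h), (cx + h, cy - h), (cx - dist, cy),
            (cx + dist, cy), (cx - h, cy + h), (cx + h, cy + h), (cx, cy + dist)]


def tz_like_search(cost: Callable[[MV], float], mvp: MV, search_range: int = 64,
                   raster_step: int = 5, raster_threshold: int = 5,
                   early_exit_rounds: int = 3) -> MV:
    cache: Dict[MV, float] = {}

    def evaluate(mv: MV) -> float:
        if mv not in cache:
            cache[mv] = cost(mv)
        return cache[mv]

    def in_range(mv: MV) -> bool:
        return max(abs(mv[0] - mvp[0]), abs(mv[1] - mvp[1])) <= search_range

    best, best_cost = mvp, evaluate(mvp)

    def try_points(points: List[MV]) -> bool:
        """Evaluate candidates; return True if any of them became the new best."""
        nonlocal best, best_cost
        improved = False
        for mv in points:
            if in_range(mv) and evaluate(mv) < best_cost:
                best, best_cost, improved = mv, evaluate(mv), True
        return improved

    def expanding_diamond(center: MV, early_exit: bool) -> bool:
        """Diamonds at distance 1, 2, 4, ... up to search_range around center."""
        improved_any, misses, dist = False, 0, 1
        while dist <= search_range:
            if try_points(diamond(center, dist)):
                improved_any, misses = True, 0
            else:
                misses += 1
                if early_exit and misses >= early_exit_rounds:
                    break
            dist *= 2
        return improved_any

    # Steps 1-2: diamond search around the predictor with early termination.
    expanding_diamond(mvp, early_exit=True)
    # Step 3: if the best candidate is far from the center, do a coarse raster scan.
    if max(abs(best[0] - mvp[0]), abs(best[1] - mvp[1])) > raster_threshold:
        offsets = range(-search_range, search_range + 1, raster_step)
        try_points([(mvp[0] + dx, mvp[1] + dy) for dy in offsets for dx in offsets])
    # Step 4: repeat the diamond search around the current best until it stops moving.
    while expanding_diamond(best, early_exit=False):
        pass
    return best


print(tz_like_search(lambda mv: abs(mv[0] - 13) + abs(mv[1] + 7), (0, 0)))
```

Each candidate is evaluated at most once because of the cache. Even with the raster pass, the number of evaluated points stays far below the 129 x 129 points of a full search over a ±64 range.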
 

Motion Estimation in HM
• Half pixel motion estimation
  – Rate is the bits required to transmit the motion data (including the impact of the motion predictor)
  – Distortion is calculated from the SATD
    • Block-wise 4×4 or 8×8 Hadamard transform of the difference between the original and the motion-compensated prediction, then the sum of the absolute coefficients
  – Search the 8 points surrounding the best integer motion vector
• Quarter pixel motion estimation
  – Same rate and distortion calculation as half pixel
  – Search the 8 points surrounding the best half pixel motion vector
• Also search the merge/skip candidates
K. McCann et al., “High Efficiency Video Coding (HEVC) Test Model 14 (HM 14) Encoder Description,” JCTVC-P1002, 2014
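The SATD used for fractional-pel search can be sketched with 4×4 blocks only. The residual is transformed with the 4×4 Hadamard matrix on both sides, and the absolute coefficients are summed. This is an unnormalized sketch: real encoders such as HM apply an additional scaling, so absolute values differ, and the function names are hypothetical.

```python
from typing import List

Block = List[List[int]]

# 4x4 Hadamard matrix (entries +-1).
H4 = [
    [1, 1, 1, 1],
    [1, -1, 1, -1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
]


def _matmul(a: Block, b: Block) -> Block:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def satd4x4(orig: Block, pred: Block) -> int:
    """Sum of absolute Hadamard coefficients of the 4x4 residual (H * D * H^T)."""
    diff = [[o - p for o, p in zip(ro, rp)] for ro, rp in zip(orig, pred)]
    h4_t = [list(col) for col in zip(*H4)]
    coeffs = _matmul(_matmul(H4, diff), h4_t)
    return sum(abs(c) for row in coeffs for c in row)


def satd(orig: Block, pred: Block) -> int:
    """Block-wise SATD: sum of 4x4 SATDs (block dimensions must be multiples of 4)."""
    total = 0
    for y in range(0, len(orig), 4):
        for x in range(0, len(orig[0]), 4):
            o = [row[x:x + 4] for row in orig[y:y + 4]]
            p = [row[x:x + 4] for row in pred[y:y + 4]]
            total += satd4x4(o, p)
    return total
```

SATD tracks transform-domain cost better than SAD. A flat residual of 1 has SAD 16 and compacts into the DC coefficient (SATD 16). A single isolated error of 1 has SAD 1, but its energy spreads over all 16 Hadamard coefficients, so its SATD is also 16.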
 

Multiple Searches in Parallel
M. E. Sinangil et al., “Cost and Coding Efficient Motion Estimation Design Considerations for High Efficiency Video Coding (HEVC) Standard,” IEEE Journal of Selected Topics in Signal Processing, 2013.
Compared to HM:
• 2x fewer candidates
• 1%-3% coding loss
 

Parallel Motion Estimation
• Perform motion estimation for each PU in an inter-coded CU
• Process CUs in parallel to increase throughput
  – Share search pixels across engines to reduce memory bandwidth by 8x
M. E. Sinangil et al., “Cost and Coding Efficient Motion Estimation Design Considerations for High Efficiency Video Coding (HEVC) Standard,” IEEE Journal of Selected Topics in Signal Processing, 2013.
 

Reduce Number of PUs Processed
M. E. Sinangil et al., “Cost and Coding Efficient Motion Estimation Design Considerations for High Efficiency Video Coding (HEVC) Standard,” IEEE Journal of Selected Topics in Signal Processing, 2013.
 

[Figure: coding loss (BD-rate) vs. area savings (Mgates) for configurations with different numbers of partition units, labeled #1 to #11; one configuration uses only square PUs]
Smallest slope provides the best trade-off: #3
Trade-off between coding efficiency (BD-rate) and complexity (area cost) for different numbers of inter-predicted partition units
M. E. Sinangil et al., “Cost and Coding Efficient Motion Estimation Design Considerations for High Efficiency Video Coding (HEVC) Standard,” IEEE Journal of Selected Topics in Signal Processing, 2013.

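Motion Estimation with CU
• In HM, motion estimation is done serially for the PUs within a CU in order to get the AMVP predictor for an accurate rate estimate
[Figure: a CU split into PU1 and PU2, with the spatial (A0, A1, B0, B1, B2) and temporal (co-located PU, CR, H) candidate positions of the current PU]
Can't process PU1 and PU2 in parallel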
 

Parallel Motion Estimation
• HEVC has a “Parallel Motion Estimation” feature to turn off the dependency within a Motion Estimation Region (MER)
  – A PU within the region cannot use data from other PUs in the same region
  – All PUs in the region can be processed in parallel at the encoder
[Figure: PU1 and PU2 inside one MER can be processed in parallel; multiple MERs per CTU (MER0-MER3), with candidate positions inside the same MER marked X]
M. Zhou, “Parallelized merge/skip mode for HEVC,” JCTVC-F069, 2011
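In the final HEVC standard, the MER size is signaled in the PPS as log2_parallel_merge_level_minus2 (Log2ParMrgLevel is that value plus 2). It governs merge/skip candidate derivation: a spatial candidate that falls in the same MER as the current PU is treated as unavailable. The sketch below applies only this MER rule to the five spatial candidate positions. It deliberately ignores the other availability rules (picture boundaries, decoding order, intra neighbors), and the function names are hypothetical.

```python
from typing import Dict, List, Tuple


def same_mer(xp: int, yp: int, xn: int, yn: int, log2_par_mrg_level: int) -> bool:
    """True if the neighbor sample (xn, yn) lies in the same MER as the PU at (xp, yp)."""
    s = log2_par_mrg_level
    return (xp >> s) == (xn >> s) and (yp >> s) == (yn >> s)


def spatial_merge_candidates(xp: int, yp: int, w: int, h: int,
                             log2_par_mrg_level: int) -> List[str]:
    """Spatial candidates (A1, B1, B0, A0, B2) that survive the MER check."""
    positions: Dict[str, Tuple[int, int]] = {
        "A1": (xp - 1, yp + h - 1),
        "B1": (xp + w - 1, yp - 1),
        "B0": (xp + w, yp - 1),
        "A0": (xp - 1, yp + h),
        "B2": (xp - 1, yp - 1),
    }
    return [name for name, (xn, yn) in positions.items()
            if not same_mer(xp, yp, xn, yn, log2_par_mrg_level)]


# PU2 of a 16x16 CU at (16, 16) split vertically into two 8x16 PUs.
print(spatial_merge_candidates(24, 16, 8, 16, 2))  # 4x4 MER: A1 (inside PU1) still used
print(spatial_merge_candidates(24, 16, 8, 16, 4))  # 16x16 MER: A1 dropped, PU1 and PU2 independent
```

With a 16×16 MER, candidate A1 of PU2 lies inside PU1 and is excluded, so PU2's merge list no longer depends on PU1's result. This is the dependency that prevents parallel processing on the previous slide.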
 

CTU Processing Order
• In HM, CTUs are processed in raster scan order
• Change the CTU processing order to reduce reads from the picture buffer (off-chip memory bandwidth) through increased data locality
• Requires frame-level decoupling from the entropy encoder (the entropy encoder must generate the bitstream in raster scan order to be standard compliant)
[Figure: raster scan vs. alternative scan of CTUs (n=4, m=2)]
S.-F. Tsai et al., “Encoder Hardware Architecture for HEVC,” High Efficiency Video Coding (HEVC): Algorithms and Architectures, Springer, 2014.
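The locality argument can be illustrated with a deliberately simple, hypothetical model. The scan groups CTU rows into bands of m rows and visits each band column by column (m = 1 is raster scan); whether this matches the n and m of the figure is an assumption. Each CTU is assumed to need the reference CTUs within ±1 CTU, and the on-chip buffer is assumed to hold only the search window of the previous step. The code then counts CTU-sized reference reads.

```python
from typing import List, Set, Tuple

Ctu = Tuple[int, int]  # (row, column) in CTU units


def band_order(width: int, height: int, m: int) -> List[List[Ctu]]:
    """Hypothetical alternative scan: bands of m CTU rows, visited column by column.
    Each step is the column of (up to) m CTUs processed together."""
    steps = []
    for top in range(0, height, m):
        for col in range(width):
            steps.append([(r, col) for r in range(top, min(top + m, height))])
    return steps


def search_window(step: List[Ctu], width: int, height: int, radius: int = 1) -> Set[Ctu]:
    """Reference CTUs needed by a step, assuming a search range of +-radius CTUs."""
    win = set()
    for r, c in step:
        for rr in range(r - radius, r + radius + 1):
            for cc in range(c - radius, c + radius + 1):
                if 0 <= rr < height and 0 <= cc < width:
                    win.add((rr, cc))
    return win


def reference_fetches(width: int, height: int, m: int, radius: int = 1) -> int:
    """CTU-sized reference reads when the buffer keeps only the previous window."""
    fetched, buffered = 0, set()
    for step in band_order(width, height, m):
        window = search_window(step, width, height, radius)
        fetched += len(window - buffered)
        buffered = window
    return fetched


# 1920x1080 with 64x64 CTUs is 30 x 17 CTUs.
for m in (1, 2, 4):
    print(m, reference_fetches(30, 17, m) / (30 * 17))
```

In the picture interior this model reads about (m + 2)/m reference CTUs per CTU: 3 for raster scan, 2 for m = 2. The buffer, however, grows to (m + 2) x 3 CTUs, which is the on-chip memory vs. off-chip bandwidth trade-off.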
 

Additional Complexity Reductions
• Bottom-up approach
  – Derive the distortion cost of a PU from its sub-PUs (e.g. compute the distortion of a 16×16 PU from four 8×8 PUs)
  – Requires storage of the sub-PU SADs
• Reduce the bit-width for distortion calculation
• Use bilinear interpolation for fractional motion estimation
$$\mathrm{SAD}_{16}(X) = \mathrm{SAD}_{8}(A) + \mathrm{SAD}_{8}(B) + \mathrm{SAD}_{8}(C) + \mathrm{SAD}_{8}(D)$$
[Figure: 16×16 block X split into four 8×8 blocks, A and B on top, C and D below]
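For one candidate motion vector, the SAD of a PU is exactly the sum of the SADs of its sub-PUs, as in the equation above. An engine can therefore compute the smallest blocks once and build every larger PU by addition. A minimal sketch follows; the returned dictionary plays the role of the "storage of sub-PU SADs", and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

Block = List[List[int]]


def sad_block(orig: Block, pred: Block, x: int, y: int, size: int) -> int:
    """Direct SAD of the size x size block whose top-left corner is (x, y)."""
    return sum(abs(orig[y + j][x + i] - pred[y + j][x + i])
               for j in range(size) for i in range(size))


def sad_bottom_up(orig: Block, pred: Block, size: int = 16,
                  leaf: int = 8) -> Dict[int, Dict[Tuple[int, int], int]]:
    """Compute leaf SADs once, then each parent SAD as the sum of its four children."""
    sads = {leaf: {(x, y): sad_block(orig, pred, x, y, leaf)
                   for y in range(0, size, leaf) for x in range(0, size, leaf)}}
    s = leaf
    while s < size:
        child, parent = sads[s], 2 * s
        sads[parent] = {(x, y): child[(x, y)] + child[(x + s, y)]
                        + child[(x, y + s)] + child[(x + s, y + s)]
                        for y in range(0, size, parent) for x in range(0, size, parent)}
        s = parent
    return sads


orig = [[(7 * x + 3 * y) % 17 for x in range(32)] for y in range(32)]
pred = [[(5 * x + 11 * y) % 13 for x in range(32)] for y in range(32)]
tree = sad_bottom_up(orig, pred, size=32, leaf=8)
print(tree[32][(0, 0)], sad_block(orig, pred, 0, 0, 32))
```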

Intra Prediction Search in HM
• Rough mode decision: select the N best modes out of 35
  – N equals 8 for 4×4 and 8×8
  – N equals 4 for 16×16, 32×32 and 64×64
  – Hadamard cost ranking (SATD distortion and mode bits for rate)
• Determine three Most Probable Modes (MPM)
  – Spatial neighbors to the left (A) and above (B)
  – If the neighbors are not available or redundant (A=B), use DC, Planar, Vertical or adjacent angles (+/-1)
• Decide between the rough modes + MPM candidates
  – Full RDO (SSD for distortion and mode + coefficient bits for rate)
[Figure: current PU with its left neighbor A and above neighbor B]
Y. Piao et al., “Encoder Improvement of Unified Intra Prediction,” JCTVC-C207, Oct. 2010.
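The MPM rule in the final HEVC standard can be written as below. It uses mode numbers 0 = Planar, 1 = DC, 2-34 = angular, and 26 = Vertical. The caller passes None for a neighbor that is unavailable or not intra coded, and for an above neighbor that lies in the CTU row above; the standard treats all of these as DC. The function name is hypothetical.

```python
from typing import List, Optional

PLANAR, DC, VERTICAL = 0, 1, 26  # HEVC intra mode numbers (2..34 are angular)


def derive_mpm(cand_a: Optional[int], cand_b: Optional[int]) -> List[int]:
    """Three most probable modes from the left (A) and above (B) neighbor modes."""
    a = DC if cand_a is None else cand_a
    b = DC if cand_b is None else cand_b
    if a == b:
        if a < 2:  # both Planar or both DC
            return [PLANAR, DC, VERTICAL]
        # Angular: the mode itself and its two adjacent angles, wrapping within 2..34.
        return [a, 2 + ((a + 29) % 32), 2 + ((a - 2 + 1) % 32)]
    # Different modes: keep both, then add Planar, else DC, else Vertical.
    if PLANAR not in (a, b):
        third = PLANAR
    elif DC not in (a, b):
        third = DC
    else:
        third = VERTICAL
    return [a, b, third]


print(derive_mpm(10, 10))  # horizontal neighbors: [10, 9, 11]
print(derive_mpm(None, 26))
```

The encoder-side benefit is that MPM candidates are cheap to signal, which is why HM adds them to the rough-mode list before full RDO.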
 

Additional Complexity Reduction
• To reduce the search space, use a coarse search over the angular predictions, then refine around the best coarse angles
• Skip the 64×64 PU size
  – Since the maximum TU is 32×32, prediction is done at 32×32; thus the only benefit of a 64×64 intra PU is signaling
• To increase throughput, use original pixels for intra prediction (rather than reconstructed pixels) to avoid dependence on the reconstruction feedback loop
The above techniques have a cumulative coding loss of 1%
S.-F. Tsai et al., “Encoder Hardware Architecture for HEVC,” High Efficiency Video Coding (HEVC): Algorithms and Architectures, Springer, 2014.
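The coarse-then-refine angular search in the first bullet can be sketched as follows. The coarse step of 4 and the step-halving refinement are hypothetical parameters, not values taken from the cited work. The cost callable stands in for an SATD-based intra cost, and the function name is hypothetical.

```python
from typing import Callable, Dict, Tuple

PLANAR, DC = 0, 1
FIRST_ANGULAR, LAST_ANGULAR = 2, 34


def coarse_to_fine_intra(cost: Callable[[int], float],
                         coarse_step: int = 4) -> Tuple[int, int]:
    """Return (best mode, number of distinct modes evaluated)."""
    cache: Dict[int, float] = {}

    def c(mode: int) -> float:
        if mode not in cache:
            cache[mode] = cost(mode)
        return cache[mode]

    # Coarse pass: Planar, DC and every coarse_step-th angular mode.
    coarse = [PLANAR, DC] + list(range(FIRST_ANGULAR, LAST_ANGULAR + 1, coarse_step))
    best = min(coarse, key=c)
    # Refinement: halve the step around the best angular mode, down to 1.
    step = coarse_step // 2
    while step >= 1 and best >= FIRST_ANGULAR:
        center = best
        for mode in (center - step, center + step):
            if FIRST_ANGULAR <= mode <= LAST_ANGULAR and c(mode) < c(best):
                best = mode
        step //= 2
    return best, len(cache)


print(coarse_to_fine_intra(lambda m: abs(m - 17)))  # toy cost with its minimum at mode 17
```

With the toy cost, the search reaches mode 17 after 15 evaluations instead of 35. With a real cost surface, the refinement can miss the global best, which is one source of the coding loss the slide quotes.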
 

Hardware-Friendly RDO Pipeline
S.-F. Tsai et al., “Encoder Hardware Architecture for HEVC,” High Efficiency Video Coding (HEVC): Algorithms and Architectures, Springer, 2014.
Only do full RDO on the best inter and intra mode for each CU depth (6% coding loss)
[Figure: a PU-mode pre-decision stage (fast RDO over intra prediction directions and inter PU sizes and MVs) for the 64x64 (CU0), 32x32 (CU1) and 16x16 (CU2) CU layers, followed by a CU-layer high complexity mode decision (HCMD) that applies full RDO at each depth, and the final mode decision]
 

Hardware HEVC Encoder
S.-F. Tsai et al., “A 1062Mpixels/s 8192×4320p High Efficiency Video Coding (H.265) encoder chip,” IEEE VLSIC, 2013
Video Coding Standard | HEVC (WD4)
Technology | TSMC 28-nm HPM
Core Area | 5 x 5 mm
Gate Count | 8350k
On-Chip Memory (SRAM) | 7.14 MB
Resolution / Frame Rate | 8192×4320 @ 30fps
Frequency | 312 MHz
Power | 708 mW
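The headline numbers of the decoder and encoder chips can be cross-checked with simple arithmetic. The sketch takes resolution, frame rate, power and clock from the two chip tables and assumes that the listed power corresponds to the listed maximum throughput. It computes luma pixel rate, energy per pixel and clock cycles per pixel. The function names are hypothetical.

```python
def pixel_rate(width: int, height: int, fps: int) -> int:
    """Luma pixels per second."""
    return width * height * fps


def energy_per_pixel_nj(power_mw: float, pixels_per_s: int) -> float:
    """Energy per pixel in nJ: (P_mW * 1e-3 W) / (pixels/s) * 1e9 = P_mW * 1e6 / rate."""
    return power_mw * 1e6 / pixels_per_s


def cycles_per_pixel(freq_mhz: float, pixels_per_s: int) -> float:
    return freq_mhz * 1e6 / pixels_per_s


chips = {
    # name: (width, height, fps, power in mW, clock in MHz) from the chip tables
    "HEVC decoder (ISSCC 2013)": (3840, 2160, 30, 76, 200),
    "HEVC encoder (VLSIC 2013)": (8192, 4320, 30, 708, 312),
}

for name, (w, h, fps, p_mw, f_mhz) in chips.items():
    rate = pixel_rate(w, h, fps)
    print(f"{name}: {rate / 1e6:.1f} Mpixels/s, "
          f"{energy_per_pixel_nj(p_mw, rate):.2f} nJ/pixel, "
          f"{cycles_per_pixel(f_mhz, rate):.2f} cycles/pixel")
```

The pixel rates come out at about 248.8 and 1061.7 Mpixels/s, matching the 249 and 1062 Mpixels/s in the chip titles. The decoder's 0.31 nJ/pixel matches the normalized core power in the ASIC decoder comparison table.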
 

ASIC Encoder Comparison
S.-F. Tsai et al., “A 1062Mpixels/s 8192×4320p High Efficiency Video Coding (H.265) encoder chip,” 2013 Symposium on VLSI Circuits (VLSIC), 2013
 

Part IV: Emerging applications and HEVC extensions

What’s Next
• More compression efficiency
  – Yes, in 5-10 years, especially since video delivery is moving from the traditional broadcast model to IP delivery and one-to-one streaming
  – Analogy: public transport versus individual cars
• Other considerations have become important too:
  – Power consumption, complexity, throughput
  – Ability to support new functionalities, modalities etc.
[Image: Dallas High Five]
 

Changing Landscape of Video Coding Applications (1)
• Need to support diverse clients with varying capabilities (resolution, computational power etc.)
Image source: Samsung, Youtube

Changing Landscape of Video Coding Applications (2)
• Immersive experience
  – Multiple cameras and higher video resolutions (1080p → 4K → 8K)
  – Multiple displays, bigger displays (1080p → 4K → 8K)
  – Free-viewpoint video, 360-degree video, augmented reality, 3D movies
  – Demos
    • http://replay-technologies.com/
    • http://www.kolor.com/video
Image source: Cisco, Kolor

Changing Landscape of Video Coding Applications (3)
• Growing requirement to support mixed-format content consisting of natural video + graphics/text
 

Scalable Video Coding

Supporting Diverse Clients - Simulcasting
[Figure: the server encodes the same content separately at 640×480, 1280×960 and 2560×1920 and sends the resulting Bitstreams 1-3 to clients]
Can we do better?

Scalable Video Coding
[Figure: a single bitstream (… 0110111 …) offering quality (SNR) scalability, temporal scalability and spatial scalability]
 

Spatial Scalability
• Layered coding
  • Higher layers have higher spatial resolution than lower layers
  • Upper layers reuse data from lower layers
Layer N – e.g. 640×480 (base layer)
Layer N+1 – 1280×960 (enhancement layer)
Figure source: T. Wiegand, JVT-W132 [1].
 

Temporal Scalability
• IPPP coding: I P P P P P P P P
• IBBP coding: B-frames between I/P anchor frames
• Hierarchical P-frames: I p P p P p P p P
• Hierarchical B-frames: I b B b P b B b P
• p, b – non-reference frames
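The hierarchical B-frame row above (I b B b P b B b P) corresponds to a dyadic GOP of 4. I/P anchors sit in temporal layer 0, the B pictures at positions 2 and 6 in layer 1, and the non-reference b pictures at odd positions in layer 2. Because no picture references a higher layer, dropping the top layer halves the frame rate without breaking prediction. In HEVC, this layer number is carried as TemporalId in the NAL unit header. The sketch assumes a power-of-two GOP, and the function names are hypothetical.

```python
from typing import Iterable, List


def temporal_id(poc: int, gop_size: int = 4) -> int:
    """Temporal layer of a picture in a dyadic hierarchical GOP (gop_size = 2**k)."""
    levels = gop_size.bit_length() - 1  # log2(gop_size)
    offset = poc % gop_size
    if offset == 0:
        return 0  # I/P anchor pictures
    trailing_zeros = (offset & -offset).bit_length() - 1
    return levels - trailing_zeros


def extract(pocs: Iterable[int], max_tid: int, gop_size: int = 4) -> List[int]:
    """Sub-bitstream extraction: keep only pictures with temporal_id <= max_tid."""
    return [p for p in pocs if temporal_id(p, gop_size) <= max_tid]


print([temporal_id(p) for p in range(9)])  # layer per picture of I b B b P b B b P
print(extract(range(9), 1))                # drop the non-reference b pictures
```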
 

HEVC Scalable Extension (SHVC)
[Figure: the BL bitstream is decoded by the base layer decoder into BL decoded pictures held in the BL frame buffer; these are upsampled and made available to the enhancement layer decoder, which decodes the EL bitstream into EL decoded pictures using the EL frame buffer]
• SHVC: scalable extension, expected July 2014
• EL – enhancement layer, BL – base layer
 

SHVC Performance
D.-K. Kwon, M. Budagavi, “Combined scalable and multiview extension of High Efficiency Video Coding (HEVC),” IEEE Picture Coding Symposium, pp. 414-417, 2013.
• 2x spatial scalability (i.e. the base layer is half the size of the enhancement layer) compared to simulcast
  Coding configuration | BD-rate savings
  All Intra coding | 23%
  Random access (Hierarchical-B) | 16%
• Quality (SNR) scalability compared to simulcast
  Coding configuration | BD-rate savings
  All Intra coding | 28%
  Random access (Hierarchical-B) | 20%
 

Multiview Video Coding

Multiview Video Capture
[Images: stereo/3D video, 360-degree video, free-viewpoint video]
Image source: Fuji, Kolor

Stereoscopic Video Coding
[Figure: camera modules capture the left and right views; stereo video encoding produces a stereo video bitstream, which stereo video decoding turns back into left and right views for a 3D display]
Image source: Samsung

Redundancy in Stereo Video
[Images: left view and right view]
 

Multiview Video Coding – Picture Prediction Structures (1)
• Linear camera array (views S0-S7)
• Simulcast

Multiview Video Coding – Picture Prediction Structures (2)
• Linear camera array (views S0-S7)
• Inter-view prediction of anchor frames

Multiview Video Coding – Picture Prediction Structures (3)
• Linear camera array (views S0-S7)
• Both anchor and non-anchor frames predicted from other views
 

HEVC Multiview Extension (MV-HEVC)
[Figure: the View 0 and View 1 bitstreams are decoded by the View 0 and View 1 decoders, each with its own frame buffer; the decoded pictures of both views feed a 3D display]
• MV-HEVC: multiview extension, expected July 2014
• View 0: left view, View 1: right view
 

Combined Scalable and Multiview Extension of HEVC
D.-K. Kwon, M. Budagavi, “Combined Scalable and Multiview Extension of High Efficiency Video Coding (HEVC),” IEEE Picture Coding Symposium, 2013.
• Applications of combined scalable and multiview HEVC coding include:
  – Scalable stereoscopic video (e.g. 1080p stereo to the emerging 4K stereo)
  – Mixed-resolution multiview coding
• H.264/AVC does not support combined scalable and multiview coding
• HEVC allows for combined scalable and multiview coding
 

 

MV-HEVC + Depth (3D-HTM)
[Figure: left view + depth map → synthesized right view]
• Standardization is on-going
 

MV-HEVC + Depth Encoding
[Figure: N views + M depth maps → depth estimation → depth coding and view coding]
• Views that are transmitted will be coded using MV-HEVC
• Expect an additional 20% gain
 

MV-HEVC + Depth Decoding
[Figure: view decoding and depth decoding, followed by view synthesis to generate multiple views]
 

Screen Content Video Coding

Screen Content Coding
• Applications such as automotive infotainment, wireless displays, remote desktop, remote gaming, cloud computing etc. are becoming popular
• Video in these applications often has mixed content consisting of natural video, text, graphics etc.
  – In text and graphics regions, patterns (e.g. text characters, icons, lines etc.) can repeat within a picture
  – Also, blocks with a limited set of colors are possible
 

Intra Block Copy
[Figure: current CU and its search area relative to the 64×64 LCU, shown for two search-area configurations]
Bit-rate savings | Intra | Random access | Low delay
SC RGB 444 | 27.0% | 21.5% | 17.0%
SC YUV 444 | 23.5% | 20.2% | 15.9%
M. Budagavi, D.-K. Kwon, “Intra motion compensation and entropy coding improvements for HEVC screen content coding,” IEEE Picture Coding Symposium, 2013.
 

Palette Coding
• Input video:
  – 8 bits per pixel, per color component
  – 4×4 block: 8*3*16 = 384 bits
• Palette coding:
  – Color palette: 2 colors in our example: 2*24 = 48 bits
  – Color index: 1 bit per pixel in our example: 16 bits
  – Total bits: 64 bits
• Note: This slide shows a very simple example for explanatory purposes. Techniques currently being evaluated can use more colors in the palette and more bits for the color index.
[Figure: 4×4 block of color indices i0-i15, each pointing to Color 0 or Color 1]
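The bit count in the example generalizes to any number of colors if the same fixed-length model is used. The palette costs 24 bits per entry, and each pixel gets a ceil(log2(#colors))-bit index. This is a simplification: entropy coding of the palette and indices, which real palette proposals use, is ignored. The function names are hypothetical.

```python
from typing import Sequence, Tuple

Pixel = Tuple[int, int, int]


def raw_bits(num_pixels: int, bits_per_component: int = 8, components: int = 3) -> int:
    """Bits for sending every pixel's color components directly."""
    return num_pixels * bits_per_component * components


def palette_bits(block: Sequence[Pixel], bits_per_component: int = 8) -> int:
    """Fixed-length palette model: palette entries plus a per-pixel index."""
    colors = sorted(set(block))
    palette_cost = len(colors) * bits_per_component * len(block[0])
    index_bits = (len(colors) - 1).bit_length()  # ceil(log2(#colors)); 0 for one color
    return palette_cost + index_bits * len(block)


two_color = [(255, 255, 255) if i % 4 < 2 else (0, 0, 0) for i in range(16)]
print(raw_bits(16), palette_bits(two_color))  # the slide's 384 vs. 64 bits
```

A third color raises the index to 2 bits per pixel: 3 x 24 + 2 x 16 = 104 bits, still well below 384.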
 

HEVC Screen Content Coding
• HEVC screen content coding activity
  – Started in April 2014
  – Expected completion early-mid 2015
• Key tools being studied
  – Intra block copy with extended search area
  – Palette-based coding
 

Summary
• Video content continues to impose a severe burden on today’s global networks
  – Rapid growth in the usage and diversity of video applications and services
  – Increasing popularity of HD video and emergence of beyond-HD formats accompanied by stereo and multi-view content
• HEVC is the latest video coding standard. It gives a 50% improvement in coding efficiency (bit-rate reduction at the same subjective quality) over H.264/AVC and is expected to support video applications for the next decade.
• In addition to improving coding efficiency, implementation challenges were also considered to maximize processing speed and minimize hardware cost.
 

References
• V. Sze, M. Budagavi, G. J. Sullivan (Editors), “High Efficiency Video Coding (HEVC): Algorithms and Architectures,” Springer, 2014
• G. J. Sullivan et al., “Overview of the High Efficiency Video Coding (HEVC) Standard,” IEEE Transactions on Circuits and Systems for Video Technology, 2012
• J. Ohm et al., “Comparison of the Coding Efficiency of Video Coding Standards—Including High Efficiency Video Coding (HEVC),” IEEE Transactions on Circuits and Systems for Video Technology, 2012
 

HEVC Book
• Introduction
• High-Level Syntax in HEVC
• Block Structures and Parallelism Features in HEVC
• Intra-Picture Prediction in HEVC
• Inter-Picture Prediction in HEVC
• Transform and Quantization in HEVC
• In-Loop Filters in HEVC
• Entropy Coding in HEVC
• Compression Performance Analysis in HEVC
• Decoder Hardware Architecture for HEVC
• Encoder Hardware Architecture for HEVC
http://www.springer.com/engineering/signals/book/978-3-319-06894-7

HEVC Book
The book serves the video engineering community by:
• Providing video application developers with an invaluable reference to the latest video standard, High Efficiency Video Coding (HEVC);
• Serving as a companion reference that is complementary to the HEVC standards document produced by the JCT-VC, a joint team of ITU-T VCEG and ISO/IEC MPEG;
• Including in-depth discussion of algorithms and architectures for HEVC by some of the key video experts who have been directly involved in developing and deploying the standard;
• Giving insight into the reasoning behind the development of the HEVC feature set, which will aid in understanding the standard and how to use it.