FINAL REPORT ON
HEVC Intra Prediction
A PROJECT UNDER THE GUIDANCE OF
DR. K. R. RAO
COURSE: EE5359 – MULTIMEDIA PROCESSING, SPRING 2016
SUBMITTED BY:
Swaroop Krishna Rao (1001256012)
Nikita Thakur (1001102923)
Srirama Kartik Adavi (1001052277)
DEPARTMENT OF ELECTRICAL ENGINEERING
UNIVERSITY OF TEXAS AT ARLINGTON
Table of Contents:
1. Objective
2. Need for Video Compression
3. Fundamental Concepts in Video Coding
4. HEVC
   4.1 Encoder and Decoder in HEVC
   4.2 Features of HEVC
      4.2.1 Picture Partitioning
      4.2.2 Prediction
5. Intra Prediction
   5.1 PB Partitioning
   5.2 Reference Samples
      5.2.1 Reference Sample Generation
      5.2.2 Reference Sample Substitution
   5.3 Filtering Process of Reference Samples
   5.4 Angular Prediction
      5.4.1 Angle Definitions
      5.4.2 Sample Prediction for Angular Prediction Mode
   5.5 DC Prediction
   5.6 Planar Prediction
   5.7 Smoothing Filter
6. Intra Mode Coding
   6.1 Prediction of Luma Intra Modes
   6.2 Derived Mode for Chroma Intra Prediction
7. Encoding Algorithm
8. Intra Prediction in HM
9. Computational Complexity of Intra Prediction
10. Differences of Intra Prediction Techniques between H.264/AVC and HEVC
Acknowledgement
References
List of Acronyms and Abbreviations:
AHG: Ad Hoc Group.
AVC: Advanced Video Coding.
CABAC: Context Adaptive Binary Arithmetic Coding.
CTU: Coding Tree Unit.
CTB: Coding Tree Block.
CU: Coding Unit.
DCT: Discrete Cosine Transform.
DST: Discrete Sine Transform.
DVD: Digital Versatile Disk.
HD: High Definition.
HDR: High Dynamic Range.
HEVC: High Efficiency Video Coding.
HM: HEVC Test Model.
IEC: International Electrotechnical Commission.
ISO: International Organization for Standardization.
ITU: International Telecommunication Union.
JCT: Joint Collaborative Team.
JCT-VC: Joint Collaborative Team on Video Coding.
JPEG: Joint Photographic Experts Group.
KTA: Key Technical Areas.
MPEG: Moving Picture Experts Group.
MPM: Most Probable Mode.
MVC: Multiview Video Coding.
PB: Prediction Block.
PU: Prediction Unit.
RD: Rate Distortion.
RDO: Rate Distortion Optimization.
SATD: Sum of Absolute Transformed Differences.
SCC: Screen Content Coding.
TB: Transform Block.
TU: Transform Unit.
UHD: Ultra High Definition.
VCEG: Video Coding Experts Group.
WCG: Wide Color Gamut.
1. Objective:
The objective of this project is to review the intra prediction part of the recently developed
HEVC standard, produced jointly by the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC
Moving Picture Experts Group (MPEG) standardization organizations [1]. HEVC (H.265) [1][2][3][4]
is the latest video coding standard, designed to improve upon its predecessor H.264/MPEG-4 AVC
[5][2][3]. The main goal of the HEVC standardization effort is to enable significantly improved
compression performance relative to H.264: for similar video quality, HEVC bit-streams consume
only about half the bitrate. Every video frame contains redundant bits, and prediction is the
process used to remove this redundancy. Intra-picture prediction is a tool in HEVC that predicts
data spatially, from region to region within a single picture, with no dependence on other
pictures in the video sequence. HEVC achieves higher compression than H.264 because of new
features such as its quadtree partitioning structure and a larger set of directional intra
prediction modes. HEVC is suitable for resolutions up to Ultra High Definition (UHD) video coding.
2. Need for Video Compression:
• Video compression technologies are about reducing and removing redundant video data so
that a digital video file can be effectively sent over a network and stored on computer disks.
• With efficient compression techniques, a significant reduction in file size can be achieved
with little or no adverse effect on the visual quality.
• The video quality, however, can be affected if the file size is further lowered by raising the
compression level for a given compression technique.
3. Fundamental Concepts in Video Coding [1]:
Digital video data consists of a time-ordered sequence of natural or real-world visual scenes in
digital form, sampled spatially and temporally. As shown in Fig. 1, a scene is sampled at a point
in time to produce a complete frame or an interlaced field (one field consists of half of the
data in a frame, spatially sampled at odd- or even-numbered lines) [1].
Spatial sampling: taking samples of the signal at a single point in time on a rectangular grid in
the video image plane, producing a frame. A sample is taken at each intersection point of the
grid and represents a square picture element (pixel), as shown in Figure 1.
Temporal sampling: capturing a series of frames at periodic intervals in time, producing a moving
video signal. Cameras typically generate approximately 24, 25 or 30 frames per second. This
results in a large amount of information, which demands the use of compression.
Figure 1: Spatial and temporal sampling of a video sequence [1]
Color space [1]:
The visual information at each spatio-temporal sample (picture element, or pixel) must be
described in digital form by numbers that represent the physical appearance of the sample.
For a color scene or picture, at least three color component samples (three different numbers)
are required to represent each pixel position accurately. This is done by means of color spaces,
namely models that map physical colors to measurable numeric values.
Two well-known color spaces are described here:
RGB:
In the RGB color space, a color image sample is represented with three numbers that indicate the
relative proportions of Red (R), Green (G) and Blue (B) (the three additive primary colors of light).
Colors can be created by combining red, green and blue in varying proportions. These three colors
are equally important and so are usually all stored at the same resolution.
YCbCr [1]:
In the YCbCr color space (sometimes referred to as YUV), a color image sample is represented
with three numbers: one component indicates the brightness (luminance or luma) whereas the other
two components indicate the color. Because the human visual system is more sensitive to
luminance than to color, the resolution of the chroma components is often down-sampled with
respect to the resolution of the luma component to reduce the amount of information needed to
describe each pixel. This color space with chroma subsampling is an efficient way of representing
color images. The YCbCr color space can be converted to RGB by means of simple expressions.
Y is the luminance (luma) component; it indicates the brightness in an image and can be
calculated as a weighted average of R, G and B [1]:

Y = kr·R + kg·G + kb·B

where kr, kg and kb are weighting factors. The color information can be represented as color
difference (chrominance or chroma) components, where each chrominance component is the
difference between R, G or B and the luminance Y.
The complete description of a color image is given by Y (the luminance component) and three
color differences Cb, Cg and Cr that represent the difference between the color intensity and the
mean luminance of each image sample. Cg is not transmitted because it is possible to extract it
from the other components [1].
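As a worked illustration of the expressions above, luma and the unscaled color-difference components can be computed directly from the definitions. The weights kr = 0.299 and kb = 0.114 are the ITU-R BT.601 values, used here purely as one example choice of the weighting factors:

```python
# Luma as a weighted average of R, G and B. The BT.601 weights are an
# illustrative choice; kg is derived so that the three weights sum to 1.

def luma(r, g, b, kr=0.299, kb=0.114):
    kg = 1.0 - kr - kb
    return kr * r + kg * g + kb * b

def chroma_diffs(r, g, b):
    """Unscaled color-difference (chroma) components Cb and Cr."""
    y = luma(r, g, b)
    return b - y, r - y   # Cb = B - Y, Cr = R - Y

# A neutral gray has (essentially) zero chroma:
print(chroma_diffs(100, 100, 100))
```

Because the weights sum to one, a pure gray (R = G = B) maps to Y equal to that gray level, with both chroma differences vanishing, which is exactly why chroma can be subsampled aggressively.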
YCbCr Sampling Formats:
Chroma subsampling can be performed in different ways as shown in Fig.2. The most common
formats of sampling the images to obtain the three components are:
4:4:4 sampling: each chroma component (Cb, Cr) has the same resolution as the luma component
(Y), preserving the full fidelity of the chrominance components.
4:2:2 sampling: the chroma components have half the horizontal resolution of the luma
component. For every four luminance samples in the horizontal direction there are two Cb and two
Cr samples. 4:2:2 video is used for high-quality color reproduction.
4:2:0 sampling: each chroma component has one fourth of the number of samples of the luma
component (half the number of samples in both the horizontal and vertical dimensions).
4:2:0 sampling is widely used for consumer applications such as video conferencing, digital
television and digital versatile disk (DVD) storage [1].
Figure 2: A sample of the YCbCr Sampling Formats [1]
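The sample-count consequences of the three formats can be made concrete with a small helper (the function name is ours, for illustration only):

```python
# Number of samples each chroma component (Cb or Cr) carries per frame,
# for a frame of w x h luma samples, under the three sampling formats.

def chroma_samples(w, h, fmt):
    if fmt == "4:4:4":      # full chroma resolution
        return w * h
    if fmt == "4:2:2":      # half horizontal chroma resolution
        return (w // 2) * h
    if fmt == "4:2:0":      # half horizontal and half vertical resolution
        return (w // 2) * (h // 2)
    raise ValueError(fmt)

# For a 1920x1080 frame, each of Cb and Cr carries:
for fmt in ("4:4:4", "4:2:2", "4:2:0"):
    print(fmt, chroma_samples(1920, 1080, fmt))
```

For 1080p, 4:2:0 reduces each chroma plane from 2,073,600 to 518,400 samples, i.e. to one quarter, which is where much of the saving in consumer video comes from.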
4. HEVC:
High Efficiency Video Coding (HEVC) [1] is an international standard for video compression
developed jointly by working groups of ISO/IEC MPEG (Moving Picture Experts Group) and ITU-T
VCEG (Video Coding Experts Group). The main goal of the HEVC standard is to significantly
improve compression performance compared to existing standards (such as H.264/Advanced
Video Coding [5]), in the range of a 50% bit-rate reduction at similar visual quality [3].
HEVC is designed to address existing applications of H.264/MPEG-4 AVC and to focus on two
key issues: increased video resolution and increased use of parallel processing architectures [3]. It
primarily targets consumer applications as pixel formats are limited to 4:2:0 8-bit and 4:2:0 10-bit.
The next revision of the standard, finalized in 2014, enables new use-cases with the support of
additional pixel formats such as 4:2:2 and 4:4:4 and bit depth higher than 10-bit, embedded bit-
stream scalability, 3D video and multiview video [20].
4.1. Encoder and Decoder in HEVC [21]:
Source video, consisting of a sequence of video frames, is encoded or compressed by a video
encoder to create a compressed video bit stream. The compressed bit stream is stored or
transmitted. A video decoder decompresses the bit stream to create a sequence of decoded frames.
The video encoder performs the following steps as shown in Fig. 3:
• Partitioning each picture into multiple units
• Predicting each unit using inter or intra prediction, and subtracting the prediction from the
unit
• Transforming and quantizing the residual (the difference between the original picture unit
and the prediction)
• Entropy encoding the transform output, prediction information, mode information and headers
The video decoder performs the following steps as shown in Fig. 4:
• Entropy decoding and extracting the elements of the coded sequence
• Rescaling and inverting the transform stage
• Predicting each unit and adding the prediction to the output of the inverse transform
• Reconstructing a decoded video image
Figure 3: Block Diagram of HEVC Encoder [21]
Figure 4: Block diagram of HEVC Decoder [21]
4.2. Features of HEVC:
4.2.1 Picture Partitioning [3]:
Previous standards split pictures into block-shaped regions called macroblocks and blocks.
Today's video content is high resolution, so the use of larger blocks is advantageous for
encoding. To support this wide variety of block sizes efficiently, HEVC pictures are divided
into so-called Coding Tree Units (CTUs), as shown in Fig.5 [3]. Depending on the stream
parameters, the CTUs in a video sequence can have size 64×64, 32×32, or 16×16, as shown in
Fig.6 [3].
Figure 5: Frame split in CTUs [3] Figure 6: Possible sizes of the CTUs [3]
The Coding Tree Unit (CTU) is therefore a logical coding unit, which is in turn encoded into an
HEVC bit-stream. It consists of three blocks, namely a luma block (Y) that covers a square
picture area of L×L samples of the luma component, and two chroma blocks (Cb and Cr) that each
cover L/2×L/2 samples of the corresponding chroma component, plus associated syntax elements, as
shown in Fig.7. Each block is called a Coding Tree Block (CTB).
Syntax elements describe the properties of the different types of units of a coded block of
pixels and how the video sequence can be reconstructed at the decoder. This includes the method
of prediction (e.g., inter or intra prediction, the intra prediction mode, and motion vectors)
and other parameters [9].
Figure 7: A CTU consists of three CTBs and associated syntax elements [8]
Each CTB has the same size (L×L) as the CTU (64×64, 32×32, or 16×16). However, a CTB may be too
large for deciding whether inter-picture or intra-picture prediction should be performed. So
each CTB can be split recursively in a quadtree structure, from the CTB size down to as small as
8×8. Each block resulting from this partitioning is called a Coding Block (CB) and becomes the
decision-making point for the prediction type (inter or intra prediction) [12].
Fig.8 illustrates an example of a 64×64 CTB split into CBs.
Figure 8: Illustration of 64×64 CTBs split into CBs [12]
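The recursive quadtree partitioning can be sketched in a few lines. The split-decision predicate here is hypothetical; a real encoder decides whether to split based on rate-distortion cost:

```python
# Toy sketch of the recursive quadtree partitioning of a CTB into CBs.
# split_decision(x, y, size) is a caller-supplied predicate standing in
# for the encoder's rate-distortion based mode decision.

def partition(x, y, size, split_decision, min_size=8):
    """Return the list of (x, y, size) coding blocks covering a CTB."""
    if size > min_size and split_decision(x, y, size):
        half = size // 2
        blocks = []
        for dy in (0, half):          # visit the four quadrants
            for dx in (0, half):
                blocks += partition(x + dx, y + dy, half,
                                    split_decision, min_size)
        return blocks
    return [(x, y, size)]             # leaf: this block is a CB

# Example: always split 64x64 and 32x32 blocks, keep 16x16 blocks.
cbs = partition(0, 0, 64, lambda x, y, s: s > 16)
print(len(cbs))  # 16
```

Splitting every node down to 16×16 yields 16 leaf CBs, matching the 4-way branching of the quadtree at each of two levels.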
The prediction type, along with other parameters, is coded in the Coding Unit (CU). The CU is
thus the basic unit of prediction in HEVC, each of which is predicted from previously coded
data. A CU consists of three CBs (Y, Cb and Cr) and the associated syntax elements, as shown in
Fig.9 [12].
Figure 9: A CU consists of three CBs and associated syntax elements [12]
CBs could still be too large to hold a single motion vector (for inter-picture, i.e. temporal,
prediction) or a single intra-picture (spatial) prediction mode. Therefore the Prediction Block
(PB) was introduced. Each CB can be split into PBs differently depending on its temporal and/or
spatial predictability.
4.2.2 Prediction:
Frames of video are coded using intra or inter prediction:
Intra-frame prediction:
In the spatial domain, redundancy means that pixels (samples) that are close to each other in
the same frame or field are usually highly correlated: the appearance of samples in an image is
often similar to that of their adjacent neighbors. This is called spatial redundancy or
intra-frame correlation.
This redundant information in the spatial domain can be exploited to compress the image. Note
that with this kind of compression, each picture is compressed without referring to other
pictures in the video sequence. This technique is called intra-frame prediction and is designed
to minimize the duplication of data within each picture (spatial-domain redundancy) [7]. It
consists of forming a prediction of the current frame and subtracting that prediction from it.
Figure 10: Spatial (intra-frame) correlation in a video sequence [7]
Several methods can be used to remove this redundant information in the spatial domain.
Typically, the values of the prediction samples are constructed by combining adjacent
neighboring samples (reference samples) using one of several techniques. In many cases,
considerable prediction accuracy can be obtained by means of efficient intra prediction
techniques.
Inter frame prediction:
In the temporal domain, redundancy means that successive frames in time order are usually highly
correlated: parts of the scene are repeated in time with little or no change. This type of
redundancy is called temporal redundancy or inter-frame correlation [13].
It is clear, then, that the video can be represented more efficiently by coding only the changes
in the video content, rather than coding each entire picture repeatedly. This technique is
called inter-frame prediction; it is designed to minimize temporal-domain redundancy and at the
same time improve coding efficiency to achieve video compression [7].
Figure 11: Temporal (inter-frame) correlation in a video sequence [7]
To remove the redundant information in the temporal domain, motion-compensated prediction (inter
prediction) is typically used. Motion compensation (MC) consists of constructing a prediction of
the current video frame from one or more previously encoded past or future frames (reference
frames) by compensating for the differences between the current frame and the reference frame.
To achieve this, the motion or trajectory between successive blocks of the image is estimated.
The motion vectors (which describe how the motion was compensated) and the residuals with
respect to the reference frames are coded and sent to the decoder.
5. Intra Prediction:
Intra-picture prediction uses previously decoded boundary samples from spatially neighboring
TBs to predict a new prediction block (PB). Consequently, the first picture of a video sequence,
and the first picture at each clean random access point into a video sequence, are coded using
only intra-picture prediction [1].
Several improvements have been introduced in the intra prediction module of HEVC:
• Due to the larger picture sizes, the range of supported coding block sizes has been increased.
• A planar mode that guarantees continuity at block boundaries is desired.
• The number of directional orientations has been increased.
• For intra mode coding, efficient techniques to transmit the selected mode for each block are
needed because of the increased number of intra modes.
• HEVC supports a large variety of block sizes, so prediction needs to behave consistently
across all of them.
HEVC employs 35 different intra modes to predict a PB: 33 angular modes, planar mode, and DC
mode. Table 1 shows the mode names with their corresponding intra prediction mode indices,
following the convention used throughout the standard [11].
Table 1: Specification of Intra Prediction modes and associated index [11].
In the end, the video encoder chooses the intra prediction mode that provides the best
Rate-Distortion (RD) performance.
5.1. PB partitioning:
A CB can be split into PBs of size M×M or M/2×M/2, as shown in Fig.12. The first option means
that the CB is not split, so the PB has the same size as the CB; it can be used in all CUs. The
second option means that the CB is split into four equally-sized PBs; this is only allowed in
the smallest (8×8) CUs, and a flag is used to select which partitioning applies in the CU.
Each resulting PB has its own intra prediction mode [1]. Prediction block sizes range from
4×4 to 64×64.
Figure 12: Prediction Block for Intra Prediction [1]
5.2. Reference Samples:
5.2.1. Reference Sample Generation:
The intra sample prediction in HEVC is performed by extrapolating sample values from the
reconstructed reference samples, as defined by the selected intra prediction mode. Compared to
H.264/AVC, HEVC introduces a reference sample substitution process which allows HEVC to use
the complete set of intra prediction modes regardless of the availability of the neighboring
reference samples. In addition, there is an adaptive filtering process that can pre-filter the reference
samples according to the intra prediction mode, block size and directionality to increase the
diversity of the available predictors.
5.2.2. Reference Sample Substitution:
Some or all of the reference samples may not be available for prediction due to several reasons.
For example, samples outside of the picture, slice or tile are considered unavailable for prediction.
In addition, when constrained intra prediction is enabled, reference samples belonging to inter-
predicted PUs are omitted in order to avoid error propagation from potentially erroneously
received and reconstructed prior pictures. As opposed to H.264/AVC, which allows only DC
prediction to be used in these cases, HEVC allows the use of all its prediction modes after
substituting the unavailable reference samples.
For the extreme case with none of the reference samples available, all the reference samples are
substituted by a nominal average sample value for a given bit depth (e.g., 128 for 8-bit data). If
there is at least one reference sample marked as available for intra prediction, the unavailable
reference samples are substituted by using the available ones: the reference samples are scanned
in clockwise direction and the latest available sample value is used for the unavailable ones.
More specifically, the process is defined as follows:
1. When p[-1][2N−1] is not available, it is substituted by the first available reference sample
encountered when scanning the samples in the order p[-1][2N−2], …, p[-1][-1], followed by
p[0][-1], …, p[2N−1][-1].
2. All non-available reference samples p[-1][y] with y = 2N−2, …, −1 are substituted by the
reference sample below, p[-1][y+1].
3. All non-available reference samples p[x][-1] with x = 0, …, 2N−1 are substituted by the
reference sample to the left, p[x−1][-1].
Figure 13 shows an example of reference sample substitution.
Figure 13: An example of reference sample substitution process. Non-available reference samples are marked as
grey: (a) reference samples before the substitution process (b) reference samples after the substitution process [3]
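Under the simplifying assumption that the reference samples are kept in one flat list, ordered clockwise from p[-1][2N−1] up the left column, through the corner p[-1][-1], and along the top row to p[2N−1][-1] (with None marking unavailable samples), the substitution process can be sketched as:

```python
# Sketch of the clockwise reference sample substitution described above.

def substitute(refs, bit_depth=8):
    # Extreme case: no reference available at all -> nominal average value.
    if all(r is None for r in refs):
        return [1 << (bit_depth - 1)] * len(refs)   # e.g. 128 for 8-bit
    out = list(refs)
    # Step 1: the first sample takes the first available value in scan order.
    if out[0] is None:
        out[0] = next(r for r in out if r is not None)
    # Steps 2 and 3: scan clockwise, copying the latest available value.
    for i in range(1, len(out)):
        if out[i] is None:
            out[i] = out[i - 1]
    return out

print(substitute([None, None, 100, None, 90]))  # [100, 100, 100, 100, 90]
```

Because the scan is strictly one-directional, the whole substitution runs in a single linear pass over the 4N+1 reference samples.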
5.3. Filtering Process of Reference Samples:
The reference samples used by HEVC intra prediction are conditionally filtered by a smoothing
filter, similarly to what was done in the 8×8 intra prediction of H.264/AVC. The intention of
this processing is to improve the visual appearance of the prediction block by avoiding steps in
the values of the reference samples that could generate unwanted directional edges in the
prediction block. For optimal use of the smoothing filter, the decision to apply it is made
based on the selected intra prediction mode and the size of the prediction block.
There are two types of reference sample filtering:
• A three-tap filter using two neighboring reference samples, shown in Fig.14: reference sample
X is replaced by the value filtered from A, X and B, while reference sample Y is replaced by
the value filtered from C, Y and D.
• A strong intra smoothing process using corner reference samples, also shown in Fig.14:
reference sample X is replaced by a linearly filtered value using A and B, while Y is replaced
by a linearly filtered value using B and C [3].
Figure 14: Filtering of Reference sample [3]
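The three-tap filter is the familiar [1 2 1]/4 smoothing kernel; a minimal sketch follows, in which the two end samples are left unfiltered and integer rounding uses the usual (… + 2) >> 2 form:

```python
# Sketch of the [1 2 1]/4 three-tap smoothing of a reference sample line.
# Each interior sample is replaced by a weighted average of itself and its
# two neighbors; the first and last samples are kept as-is.

def smooth(refs):
    out = list(refs)
    for i in range(1, len(refs) - 1):
        out[i] = (refs[i - 1] + 2 * refs[i] + refs[i + 1] + 2) >> 2
    return out

print(smooth([100, 100, 140, 100, 100]))  # [100, 110, 120, 110, 100]
```

Note how the isolated step of 40 in the middle sample is spread over its neighbors, which is exactly the behavior that suppresses spurious directional edges in the predictors.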
5.4. Angular Prediction:
Angular intra prediction in HEVC is designed to efficiently model different directional structures
typically present in video and image content. The set of available prediction directions has been
selected to provide a good trade-off between encoding complexity and coding efficiency for typical
video material. The sample prediction process itself is designed to have low computational
requirements and to be consistent across different block sizes and prediction directions. This has
been found especially important as the number of block sizes and prediction directions supported
by HEVC intra coding far exceeds those of previous video codecs, such as H.264/AVC. In HEVC
there are four effective intra prediction block sizes ranging from 4×4 to 32×32 samples, each of
which supports 33 distinct prediction directions. A decoder must thus support 132 combinations
of block sizes and prediction directions.
5.4.1. Angle Definitions:
HEVC defines a set of 33 angular prediction directions at 1/32 sample accuracy as illustrated in
Fig. 15. In natural imagery, horizontal and vertical patterns typically occur more frequently than
patterns with other directionalities. Small differences for displacement parameters for modes close
to horizontal and vertical directions take advantage of that phenomenon and provide more accurate
prediction for nearly horizontal and vertical patterns. The displacement parameter differences
become larger closer to diagonal directions to reduce the density of prediction modes for less
frequently occurring patterns.
Table 2 provides the exact mapping from indicated intra prediction mode to angular parameter A
[3]. That parameter defines the angularity of the selected prediction mode (how many 1/32 sample
grid units each row of samples is displaced with respect to the previous row).
Table 2: Angular parameter A defines the directionality of each angular intra prediction mode [3]
Figure 15: Angle definitions of angular intra prediction in HEVC numbered from 2 to 34 and the associated
displacement parameters. H and V are used to indicate the horizontal and vertical directionalities, respectively, while
the numeric part of the identifier refers to the sample position displacements in 1/32 fractions of sample grid
positions [3]
For each octant, eight angles are defined with associated displacement parameters, as shown in
Fig.15. Closer to the diagonal directions, the displacement parameters become larger in order to
reduce the density of prediction modes for less frequently occurring directions. For modes close
to the horizontal and vertical directions, the displacement becomes smaller in order to provide
more accurate prediction for nearly horizontal and vertical patterns.
In order to calculate the value of each sample of the PB with low complexity, the angular modes
extrapolate from the reference samples according to the directional orientation. When the
selected direction is between 2 and 17, the samples located in the row above (the red samples,
and possibly the green samples, as shown in Fig.16(a)) are projected as additional samples in
the left column, extending the left reference column. When the selected direction is between 18
and 34, the samples located in the left column (the blue samples, and possibly the orange
samples, as shown in Fig.16(b)) are projected as samples in the row above, extending the top
reference row. In both cases the projected samples have negative indexes [3].
Figure 16: Example of diagonal orientation [3]
5.4.2. Sample Prediction for Angular Prediction Mode [3]:
Predicted sample values p[x][y] are obtained by projecting the location of the sample p[x][y]
onto the reference sample array according to the selected prediction direction, and
interpolating a value for the sample at 1/32 sample position accuracy [3].
Prediction for the horizontal modes (modes 2–17) is given by:

p[x][y] = ((32 − f) · ref[y + i + 1] + f · ref[y + i + 2] + 16) >> 5

and sample prediction for the vertical modes (modes 18–34) is given by:

p[x][y] = ((32 − f) · ref[x + i + 1] + f · ref[x + i + 2] + 16) >> 5

where i is the projected integer displacement on row y (for vertical modes) or column x (for
horizontal modes), calculated as a function of the angular parameter A as follows:

i = ((y + 1) · A) >> 5 (vertical modes), i = ((x + 1) · A) >> 5 (horizontal modes)

and f represents the fractional part of the projected displacement on the same row or column,
calculated as:

f = ((y + 1) · A) & 31 (vertical modes), f = ((x + 1) · A) & 31 (horizontal modes)
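As a concrete sketch, vertical-mode prediction can be written in a few lines. The displacement computation i = ((y+1)·A) >> 5 and fractional part f = ((y+1)·A) & 31 follow the standard, while the flat layout of ref (with ref[0] holding the corner sample p[-1][-1]) is a simplifying assumption:

```python
# Sketch of HEVC vertical-mode angular prediction for an NxN block.
# ref is the (already substituted and filtered) top reference row laid out
# flat, with ref[0] = p[-1][-1]; it must hold enough samples for the chosen
# angular parameter A (up to index N + i_max + 2).

def predict_vertical(ref, N, A):
    pred = [[0] * N for _ in range(N)]
    for y in range(N):
        i = ((y + 1) * A) >> 5          # integer displacement for this row
        f = ((y + 1) * A) & 31          # 1/32-sample fractional part
        for x in range(N):
            # Two-tap interpolation between neighboring reference samples.
            pred[y][x] = ((32 - f) * ref[x + i + 1]
                          + f * ref[x + i + 2] + 16) >> 5
    return pred

# Pure vertical mode (A = 0) simply copies the top reference row:
print(predict_vertical([128, 10, 20, 30, 40, 50, 60, 70, 80], 4, 0)[3])
```

With A = 0, every row reproduces the top reference samples exactly; nonzero A shifts and interpolates them, which is the whole mechanism behind the 33 angular modes.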
5.5. DC Prediction:
This mode is similar to the DC mode in H.264/MPEG-4 AVC. It is efficient for predicting flat
areas of smoothly-varying content in the image, but gives only a coarse prediction of
higher-frequency content and as such is not efficient for finely textured areas.
The value of each sample of the PB is the average of the reference samples. As explained before,
in this case the reference samples are the boundary samples of the top and left neighboring
TBs.
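A minimal sketch of DC prediction for an N×N block, assuming N top and N left reference samples (the DC boundary smoothing described in Sect. 5.7 is omitted):

```python
# Sketch of DC intra prediction: every sample of the NxN block takes the
# rounded average of the 2N boundary reference samples.

def predict_dc(top, left):
    n = len(top)                      # block size N; len(left) == n as well
    # Divide by 2N with rounding: for power-of-two N, 2N == 1 << n.bit_length().
    dc = (sum(top) + sum(left) + n) >> n.bit_length()
    return [[dc] * n for _ in range(n)]

print(predict_dc([10] * 4, [30] * 4)[0])  # [20, 20, 20, 20]
```

The single shift works because block sizes are powers of two, so dividing by 2N reduces to a right shift by log2(N) + 1.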
5.6. Planar Prediction:
This mode in HEVC, known as mode 0, is similar to the plane mode in H.264/MPEG-4 AVC. In
H.264/MPEG-4 AVC this method is a plane prediction mode for textured images and may introduce
discontinuities along the block boundaries. Conversely, in HEVC this mode was improved in order
to preserve continuity along the block edges.
Planar mode is essentially defined as the average of two linear predictions using four corner
reference samples. With reference to Fig.17, it is implemented as follows: sample X is predicted
first, as the average of samples D and E; the right column samples (blue) are then predicted
using bilinear interpolation between D and X; the bottom row samples (orange) are predicted
using bilinear interpolation between E and X; and the remaining samples are predicted as
averages of bilinear interpolations between the boundary samples and the previously predicted
samples [14].
Figure 17: Planar intra prediction mode [14]
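An equivalent way to state planar prediction is as the average of one horizontal and one vertical linear interpolation; a sketch under the assumption of an N×N block with top/left boundary arrays and the two corner references p[N][-1] (top-right) and p[-1][N] (bottom-left):

```python
import math

# Sketch of planar intra prediction as the rounded average of a horizontal
# and a vertical linear interpolation over the NxN block.

def predict_planar(top, left, top_right, bottom_left):
    n = len(top)                       # block size N (a power of two)
    shift = int(math.log2(n)) + 1      # divide by 2N via a right shift
    pred = [[0] * n for _ in range(n)]
    for y in range(n):
        for x in range(n):
            horz = (n - 1 - x) * left[y] + (x + 1) * top_right
            vert = (n - 1 - y) * top[x] + (y + 1) * bottom_left
            pred[y][x] = (horz + vert + n) >> shift
    return pred

# A constant boundary yields a constant block:
print(predict_planar([50] * 4, [50] * 4, 50, 50)[0])  # [50, 50, 50, 50]
```

Each of the two interpolations is continuous along its boundary, so their average cannot introduce the block-edge discontinuities the text mentions for the H.264/AVC plane mode.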
5.7. Smoothing Filter:
HEVC uses a boundary smoothing filter in order to reduce the discontinuities introduced by the
intra prediction modes. This is applied to the boundary samples of the prediction: the first
prediction row and column for DC mode, the first prediction row for purely horizontal prediction
(mode 10), and the first prediction column for purely vertical prediction (mode 26). The
smoothing filter consists of a two-tap finite impulse response filter for DC prediction, or a
gradient-based smoothing filter for horizontal and vertical prediction. Because the chroma
components tend to be smooth already, the filter is not used for them: prediction boundary
smoothing is applied only to the luma component.
The reference sample smoothing filter of Sect. 5.3 is applied depending on the size of the
blocks and the directionality of the prediction. By using that filter, contouring artefacts
caused by steps in the reference samples may be drastically reduced.
6. Intra Mode Coding:
While increasing the number of intra prediction modes provides better prediction, efficient
intra mode coding is required to ensure that the selected mode is signaled with minimal
overhead. For the luma component, three most probable modes are derived to predict the intra
mode, instead of the single most probable mode used in H.264/AVC [5]. Possible redundancies
among the three most probable modes are also considered, and redundant modes are substituted
with alternative ones to maximize signaling efficiency. For the chroma intra mode, HEVC
introduces a derived mode which allows efficient signaling of the likely scenario in which
chroma uses the same prediction mode as luma. The syntax elements for signaling the luma and
chroma intra modes are designed by exploiting the increased number of most probable modes for
the luma component and the statistical behavior of the chroma component.
6.1. Prediction of Luma Intra Mode [3]:
HEVC supports a total of 33 angular prediction modes as well as planar and DC prediction for
luma intra prediction, for all PU sizes. Because of this large number of intra prediction modes,
an H.264/AVC-like mode coding approach based on a single most probable mode would not be
effective in HEVC. Instead, HEVC defines three most probable modes for each PU based on the
modes of the neighboring PUs. The chosen number of most probable modes also makes it possible to
indicate one of the 32 remaining modes by a CABAC-bypassed fixed-length code, since the
distribution of mode probabilities outside the set of most probable modes has been found to be
relatively uniform. The selection of the set of three most probable modes is based on the modes
of two neighboring PUs, one to the left of and one above the current PU. Let the intra modes of
the left and above PUs be A and B, respectively. If a neighboring PU is not intra coded or is
coded in pulse code modulation (PCM) mode, it is considered to be DC predicted. In addition, B
is assumed to be DC mode when the above neighboring PU is outside the CTU, to avoid introducing
an additional line buffer for intra mode reconstruction.
If A is not equal to B, the first two most probable modes denoted as MPM[0] and MPM[1] are set
equal to A and B, respectively, and the third most probable mode denoted as MPM[2] is determined
as follows:
• If neither A nor B is planar mode, MPM[2] is set to planar mode.
• Otherwise, if neither A nor B is DC mode, MPM[2] is set to DC mode.
• Otherwise (one of the two most probable modes is planar and the other is DC), MPM[2] is
set equal to angular mode 26 (directly vertical).

If A is equal to B, the three most probable modes are determined as follows. If they are not angular
modes (A and B are less than 2), the three most probable modes are set equal to planar mode, DC
mode and angular mode 26, respectively. Otherwise (A and B are greater than or equal to 2), the
first most probable mode MPM[0] is set equal to A and the two remaining most probable modes
MPM[1] and MPM[2]
are set equal to the neighboring directions of A, calculated as [3]:

MPM[1] = 2 + ((A + 29) % 32)
MPM[2] = 2 + ((A - 2 + 1) % 32)

where % denotes the modulo operator (i.e., a % b denotes the remainder of a divided by b). Fig. 18
summarizes the derivation process for the three most probable modes MPM[0], MPM[1] and
MPM[2] from the neighboring intra modes A and B. The three most probable modes MPM[0],
MPM[1] and MPM[2] identified as described above are further sorted in ascending order of mode
number to form the ordered set of most probable modes. If the current intra prediction mode is
equal to one of the elements in this set, only the index within the set is transmitted to the decoder.
Otherwise, a 5-bit CABAC-bypassed code word is used to specify the selected mode, as the
number of modes outside the set is exactly 32.
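The derivation above can be sketched in C++ (a hypothetical helper written for illustration, not
the HM implementation; mode numbering follows the text: 0 = planar, 1 = DC, 2 to 34 angular,
with 26 directly vertical):

```cpp
#include <array>

// Illustrative sketch of the most-probable-mode derivation described in the
// text (function name and interface are ours, not from the HM software).
// A and B are the intra modes of the left and above neighboring PUs, already
// mapped to DC when a neighbor is unavailable, non-intra or PCM coded.
std::array<int, 3> deriveMPM(int A, int B)
{
    std::array<int, 3> mpm;
    if (A != B) {
        mpm[0] = A;
        mpm[1] = B;
        if (A != 0 && B != 0)      mpm[2] = 0;   // neither is planar
        else if (A != 1 && B != 1) mpm[2] = 1;   // neither is DC
        else                       mpm[2] = 26;  // one planar, one DC: vertical
    } else if (A < 2) {            // A == B and both non-angular
        mpm = {0, 1, 26};          // planar, DC, vertical
    } else {                       // A == B, angular: A plus its two neighbors
        mpm[0] = A;
        mpm[1] = 2 + ((A + 29) % 32);     // A - 1, wrapped into [2, 33]
        mpm[2] = 2 + ((A - 2 + 1) % 32);  // A + 1, wrapped into [2, 33]
    }
    return mpm;
}
```

Passing A = B = 2, for example, yields {2, 33, 3}: the mode itself plus its two neighboring
angular directions, with wrap-around inside the angular range.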
Figure 18: Derivation process for the three most probable modes MPM[0], MPM[1] and MPM[2]. A and B indicate
the neighboring intra modes of the left and the above PU, respectively [3]
6.2. Derived Mode for Chroma Intra Prediction:
To enable the use of the increased number of directionalities in the chroma intra prediction while
minimizing the signaling overhead, HEVC introduces the INTRA_DERIVED mode to indicate
the cases when a chroma PU uses the same prediction mode as the corresponding luma PU. More
specifically, for a chroma PU one of the five chroma intra prediction modes: planar, angular 26
(directly vertical), angular 10 (directly horizontal), DC or derived mode is signaled.
This design is based on the finding that structures in the chroma signal often follow those of the
luma signal. When the derived mode is indicated for a chroma PU, intra prediction is performed
using the corresponding luma PU mode. When the luma mode coincides with one of the four
chroma prediction modes that have individual identifiers, that mode is substituted with angular
mode 34 so that all five signaled options remain distinct. The substitution process is illustrated in
Table 3.
Table 3: Determination of chroma intra prediction mode according to luma intra prediction mode [3]
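The candidate construction and the mode-34 substitution can be sketched as follows (a
hypothetical helper written for illustration, not the HM code; mode numbering as in the text):

```cpp
#include <array>

// Illustrative sketch of the five signaled chroma intra candidates for a
// given luma mode (function name is ours). Index 4 is the derived mode,
// which reuses the luma mode; if the luma mode collides with one of the
// four fixed modes, that entry is replaced by angular mode 34 so that all
// five candidates stay distinct.
std::array<int, 5> chromaCandidates(int lumaMode)
{
    // planar, vertical (26), horizontal (10), DC, derived
    std::array<int, 5> cand = {0, 26, 10, 1, lumaMode};
    for (int i = 0; i < 4; ++i)
        if (cand[i] == lumaMode)
            cand[i] = 34;  // substitution per Table 3
    return cand;
}
```

For a luma mode of 26, for example, the vertical entry is replaced by mode 34 and the derived
entry still signals 26, keeping all five options usable.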
7. Encoding Algorithm:
Due to the large number of intra modes in HEVC, computing the rate-distortion costs of all intra
modes is impractical for most applications. In the HEVC reference software, the sum of absolute
transformed differences (SATD) between the prediction and the original samples is used to reduce
the number of luma intra mode candidates before applying rate-distortion optimization (RDO).
The number of luma intra mode candidates entering full RDO is determined by the PU size: eight
for 4×4 and 8×8 PUs, and three for the other PU sizes. For these luma intra mode candidates, as
well as the luma intra modes belonging to the most probable mode set, intra sample prediction and
transform coding are performed to obtain both the number of required bits and the resulting
distortion. Finally, the luma intra mode that minimizes the rate-distortion cost is selected. For
chroma intra encoding, all possible intra chroma modes are evaluated based on their rate-distortion
costs, as the number of intra chroma modes is much smaller than that of luma modes. It has been
reported that this technique incurs negligible coding efficiency loss (less than 1% increase in bit
rate at aligned objective quality) compared to a full rate-distortion search while reducing the
encoding complexity by a factor of three.
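As an illustration of the SATD measure used for this pre-selection, the following is a simplified
4×4 Hadamard-transform SATD in the spirit of calcHAD in the HM software (an assumed
structure for illustration, without HM's exact normalization):

```cpp
#include <cstdlib>

// Simplified 4x4 Hadamard SATD: transform the prediction residual with
// 4-point Hadamard butterflies horizontally and vertically, then sum the
// absolute transform coefficients. This is an illustrative analogue of the
// HM calcHAD routine, not its exact implementation.
int satd4x4(const int org[4][4], const int pred[4][4])
{
    int diff[4][4], tmp[4][4];
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            diff[i][j] = org[i][j] - pred[i][j];

    // Horizontal 4-point Hadamard butterflies
    for (int i = 0; i < 4; ++i) {
        int a = diff[i][0] + diff[i][2], b = diff[i][1] + diff[i][3];
        int c = diff[i][0] - diff[i][2], d = diff[i][1] - diff[i][3];
        tmp[i][0] = a + b; tmp[i][1] = a - b;
        tmp[i][2] = c + d; tmp[i][3] = c - d;
    }
    // Vertical butterflies and accumulation of |coefficients|
    int sum = 0;
    for (int j = 0; j < 4; ++j) {
        int a = tmp[0][j] + tmp[2][j], b = tmp[1][j] + tmp[3][j];
        int c = tmp[0][j] - tmp[2][j], d = tmp[1][j] - tmp[3][j];
        sum += std::abs(a + b) + std::abs(a - b)
             + std::abs(c + d) + std::abs(c - d);
    }
    return sum;
}
```

Because the Hadamard transform crudely approximates the codec's transform stage, ranking
candidate modes by SATD correlates with their eventual transform-domain cost far better than a
plain SAD would, which is why it is used for the candidate pre-selection described above.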
8. Intra Prediction in HM code [23]:
Figure 19: Flow chart showing important functions in Intra Prediction
• Library: TLibCommon
• Source files: TComPattern.cpp, TComPrediction.cpp and TComRdCost.cpp
• Functions: xCompressCU, estIntraPredQT, estIntraPredChromaQT,
xRecurIntraCodingQT, xRecurIntraChromaCodingQT, initAdiPattern,
predIntraLumaAng, calcHAD, xUpdateCandList,
fillReferenceSamples, xPredIntraPlanar, xPredIntraAng and predIntraGetPredValDC.
9. Computational Complexity of Intra prediction [26]:
HEVC encoders are expected to be several times more complex than H.264/AVC encoders, while
the complexity of HEVC decoders does not appear to be significantly different from that of
H.264/AVC decoders. The complexity of some key modules such as transforms, intra picture
prediction and motion compensation is likely higher in HEVC than in H.264/AVC, while
complexity was reduced in others, such as entropy coding and deblocking.
HEVC features many more mode combinations as a result of the added flexibility of the quadtree
structures and the increased number of intra picture prediction modes. An encoder fully exploiting the
capabilities of HEVC is thus expected to be several times more complex than an H.264/AVC
encoder. This added complexity does however have a substantial benefit in the expected
significant improvement in rate-distortion performance.
Table 4: Encoding time [26]
10. Differences of intra prediction techniques between H.264/AVC and HEVC
[3]:
Table 5 shows the comparison of intra prediction techniques between H.264 and HEVC.
Table 5: Differences of intra prediction techniques between H.264 and HEVC [3]
ACKNOWLEDGEMENT
We would sincerely like to thank Dr. K. R. Rao and Tuan Ho for their constant support and
guidance throughout the duration of our project.
References:
[1] G. J. Sullivan et al., “Overview of the high efficiency video coding (HEVC) standard”, IEEE
Transactions on Circuits and Systems for Video Technology, vol. 22, no. 12, pp. 1649-1668,
Dec. 2012.
[2] K. R. Rao, D. N. Kim and J. J. Hwang, “Video Coding Standards: AVS China, H.264/MPEG-
4 Part10, HEVC, VP6, DIRAC and VC-1”, Springer, 2014.
[3] V. Sze, M. Budagavi, and G. J. Sullivan, “High Efficiency Video Coding (HEVC): Algorithms
and Architectures”, Springer, 2014.
[4] M. Wien, “High Efficiency Video Coding: Coding Tools and Specification”, Springer, 2014.
[5] JVT Draft ITU-T recommendation and final draft international standard of joint video
specification (ITU-T Rec. H.264-ISO/IEC 14496-10 AVC), March 2003, JVT-G050 available on
http://ip.hhi.de/imagecom_G1/assets/pdfs/JVT-G050.pdf.
[6] I.E.G.Richardson, “H.264 and MPEG-4 Video Compression Video Coding for Next-
generation Multimedia”, New York, Wiley, 2003
[7] G.J. Sullivan and T.Wiegand, “Rate-Distortion Optimization for Video Compression”. IEEE
Signal Processing Magazine, vol. 15, no. 6, pp. 74-90, Nov. 1998.
[8] J.-R. Ohm et al, “Comparison of the coding efficiency of video coding standards – including
high efficiency video coding (HEVC)”, IEEE Transactions on circuits and systems for video
technology, vol. 22, pp.1669-1684, Dec. 2012. Software and data for reproducing selected results
can be found at ftp://ftp.hhi.de/ieee-tcsvt/2012.
[9] V. Sze and M. Budagavi, “High throughput CABAC entropy coding in HEVC”, IEEE
Transactions on circuits and systems for video technology, vol. 22, pp.1778-1791, Dec. 2012.
[10] I. E. G. Richardson, “The H.264 Advanced Video Compression Standard,” II Edition, Wiley,
2010.
[11] J. Lainema et al, “Intra coding of the HEVC standard”, IEEE Transactions on circuits and
systems for video technology, vol. 22, pp.1792-1801, Dec. 2012.
[12] P. Helle et al, “Block merging for quadtree-based partitioning in HEVC”, IEEE Transactions
on circuits and systems for video technology, vol. 22, pp.1720-1731, Dec. 2012.
[13] G. Hill, “The Cable and Telecommunications Professionals’ Reference: Transport Networks”.
Vol 2, Third Edition, 2008.
[14] N. Ling, “High efficiency video coding and its 3D extension: A research perspective,”
Keynote Speech, ICIEA, pp. 2150-2155, Singapore, July 2012
http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6361087.
[15] I. E. G. Richardson, “Video Codec Design: Developing Image and Video Compression
Systems”, Wiley, 2002.
[16] HEVC white paper- http://www.ateme.com/an-introduction-to-uhdtv-and-hevc
[17] G. J. Sullivan, et al, “Standardized Extensions of High Efficiency Video Coding (HEVC)”,
IEEE Journal of selected topics in Signal Processing, Vol. 7, No. 6, pp. 1001-1016, Dec. 2013.
[18] D. K. Kwon and M. Budagavi, “Combined scalable and multiview extension of High
Efficiency Video Coding (HEVC)”, IEEE Picture Coding Symposium, pp. 414-417, Dec. 2013.
[19] M. Budagavi, “HEVC/H.265 and recent developments in video coding standards”, EE
Department Seminar, UT Arlington, Arlington, 21 Nov. 2014.
[20] HEVC tutorial by I.E.G. Richardson: http://www.vcodex.com/h265.html
[21] C. Fogg, “Suggested figures for the HEVC specification”, ITU-T & ISO/IEC JCTVC-
J0292r1, July 2012.
[22] D. Marpe, et al, “Context-based adaptive binary arithmetic coding in the H.264/AVC video
compression standard,” IEEE Transactions on Circuits and Systems for Video Technology, vol.
13, pp. 620–636, July 2003.
[23] HM C++ Code: https://hevc.hhi.fraunhofer.de/svn/svn_HEVCSoftware/
[24] HM software manual:
https://hevc.hhi.fraunhofer.de/svn/svn_HEVCSoftware/trunk/doc/software-manual.pdf
[25] T.L. da Silva, L.V. Agostini and L.A. da Silva Cruz, “HEVC intra coding acceleration based
on tree intra-level mode correlation”, IEEE Conference 2013 Signal Processing: Algorithms,
Architectures, Arrangements, and Applications, Poznan, Poland, Sept. 2013.
[26] H. Zhang and Z. Ma, “Fast intra mode decision for high-efficiency video coding”, IEEE
Transactions on circuits and systems for video technology, Vol.24, pp.660-668, Apr. 2014.
[27] JCT-VC documents can be accessed. [online]. Available: http://phenix.int-
evry.fr/jct/doc_end_user/current_meeting.php?id_meeting=154&type_order=&sql_type=docume
nt_number
[28] VCEG & JCT documents available from http://wftp3.itu.int/av-arch in the video-site and
jvt-site folders.
[29] Test Sequences – download
TS.1 http://media.xiph.org/video/derf/
TS.2 http://trace.eas.asu.edu/yuv/
TS.3 http://media.xiph.org/
TS.4 http://www.cipr.rpi.edu/resource/sequences/
TS.5 http://basak0ztas.net