Multimedia Software Systems CS4551
Audio Compression
CSULA CS4551 Multimedia Software Systems by Eun-Young Kang
What is Sound?
• Physics Introduction
– Sound is a wave phenomenon, like light, but it is a mechanical wave that needs a medium.
– It involves molecules of air being compressed and expanded under the action of some physical device.
– Without a medium such as air, there is no sound.
• A speaker in an audio system
– Vibrates back and forth and produces a longitudinal pressure wave that we perceive as sound.
• Recording instruments convert the pressure wave to an electrical waveform signal, which is then sampled and quantized to get a digital signal.
What is Sound? (2)
• High-frequency sound corresponds to high pitch (the perceived highness or lowness of a sound).
• Tone (timbre) is a quality of a sound: different musical instruments produce different sounds even when they play the same frequency.
• The intensity of a sound is the amount of sound energy passing through a unit area per unit time, measured at a given distance (radius) from the source.
Sound – Sampling and Quantization
• From an analog signal (continuous measurement of pressure wave, e.g. continuous valued voltages produced by microphones) to a digital signal
• Sampling – digitization in time dimension (Nyquist Theorem)
• Quantization – digitization in the amplitude dimension
– Quantization introduces error! Listen to 16-, 12-, 8-, and 4-bit music and hear the difference.
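The effect of bit depth on quantization error can be estimated numerically. The following is a minimal sketch, assuming a uniform mid-rise quantizer on [−1, 1] and a hypothetical test signal (a 440 Hz sine sampled at 8 kHz); neither choice comes from the slides.

```python
import math

def quantize(x, bits):
    """Uniformly quantize x in [-1, 1] to 2**bits levels (mid-rise quantizer)."""
    levels = 2 ** bits
    step = 2.0 / levels
    # Index of the interval containing x, clamped so that x = 1.0 stays in range.
    index = min(levels - 1, max(0, math.floor((x + 1.0) / step)))
    return -1.0 + (index + 0.5) * step  # reconstruct at the interval midpoint

def snr_db(signal, bits):
    """Signal-to-quantization-noise ratio in dB."""
    noise = sum((s - quantize(s, bits)) ** 2 for s in signal)
    power = sum(s * s for s in signal)
    return 10 * math.log10(power / noise)

# Hypothetical test signal: one second of a 440 Hz sine sampled at 8 kHz.
signal = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(8000)]
for bits in (16, 12, 8, 4):
    print(bits, "bits:", round(snr_db(signal, bits), 1), "dB")
```

As a rule of thumb, each additional bit improves the SNR of a uniform quantizer by roughly 6 dB, which is why the 4-bit version sounds so much noisier than the 16-bit one.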
Sound – Sampling and Quantization (2)
• Digital audio is represented as a one-dimensional set of samples.
• Digital audio is represented in the form of channels.
– The number of channels describes whether the audio signal is mono, stereo, or even surround sound.
– Each channel consists of a sequence of samples and can be described by a sampling rate and quantization bits.
Sound – Sampling and Quantization (3)
• Telephone quality speech
– Sampling rate = 8 kHz
– Quantization = 16 bits/sample
– Bit rate = 8 k × 16 = 128 kbps
• CDs (stereo channels)
– Sampling rate = 44.1 kHz
– Quantization = 16 bits/sample
– Bit rate = 2 × 16 × 44,100 ≈ 1.4 Mbps!
– CD storage ≈ 10.5 megabytes/minute
– A CD can hold about 70 minutes of audio
• Surround sound systems with 5 channels
– AC-3, used in many cinemas, uses 5.1 channels. “.1” represents the subwoofer channel.
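The bit-rate figures above follow from multiplying sampling rate, bits per sample, and channel count. A small sketch of that arithmetic (the function name `pcm_bit_rate` is illustrative, not from the slides):

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels=1):
    """Uncompressed PCM bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels

telephone = pcm_bit_rate(8_000, 16)          # 128,000 bps = 128 kbps
cd = pcm_bit_rate(44_100, 16, channels=2)    # 1,411,200 bps, about 1.4 Mbps
cd_mb_per_minute = cd / 8 * 60 / 1e6         # about 10.58 MB per minute
cd_mb_70_min = cd_mb_per_minute * 70         # about 741 MB for 70 minutes
```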
Non-Linear Quantization
• Assume the speech signal is normalized to the range [−1, +1]. If we examine a typical speech signal and its histogram, we see that the extreme values near −1 and +1 are rarely used.
– In a linear quantization scheme, we assign as many reconstruction levels to the larger amplitudes as to the smaller amplitudes, even though the smaller amplitudes are far more probable.
Non-Linear Quantization
• A-law, μ-law
• Companding (COMPressing-expANDING) schemes
• Both are used in telephone networks.
• They expand small values and compress large values.
• When a signal goes through a compander, small amplitudes are mapped into a larger interval and larger amplitudes are mapped into a smaller interval.
Non-Linear Quantization
• More quantization levels are used for the values that originated from small amplitudes.
• This scheme is equivalent to applying non-uniform quantization to the original signal, where smaller quantization steps are used for smaller values and larger quantization steps are used for larger values.
μ-law/A-Law Coding
• Unlike linear quantization, the logarithmic step spacing represents low-amplitude audio samples with greater accuracy than higher-amplitude samples.
• A-law is used in the European telephone network.
• μ-law is used in North America and Japan.
• The International Telecommunication Union - Telecommunication Standardization Sector (ITU-T) Recommendation G.711 codifies the A-law and μ-law encoding schemes.
μ-law/A-Law Coding
• Logarithmic coding compensates for the fact that quantization error is much more audible near zero amplitude.
• It yields an SNR superior to that of linear encoding for a given number of bits.
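The companding idea can be illustrated with the continuous μ-law curve, F(x) = sgn(x) · ln(1 + μ|x|) / ln(1 + μ) with μ = 255. The sketch below is an assumption-laden illustration, not the G.711 codec itself: G.711 uses a piecewise-linear approximation of this curve, and the 8-bit mid-rise quantizer and the helper names here are chosen only for the demonstration.

```python
import math

MU = 255  # mu value used for mu-law in telephony

def mu_compress(x, mu=MU):
    """Map x in [-1, 1] to [-1, 1], expanding small amplitudes."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def mu_expand(y, mu=MU):
    """Inverse of mu_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)

def uniform_quantize(x, bits=8):
    """Mid-rise uniform quantizer on [-1, 1] (illustrative choice)."""
    levels = 2 ** bits
    step = 2.0 / levels
    index = min(levels - 1, max(0, math.floor((x + 1.0) / step)))
    return -1.0 + (index + 0.5) * step

def companded_quantize(x, bits=8):
    """Compress, quantize uniformly, then expand."""
    return mu_expand(uniform_quantize(mu_compress(x), bits))

x = 0.01  # a small-amplitude sample
print("uniform error:  ", abs(x - uniform_quantize(x)))
print("companded error:", abs(x - companded_quantize(x)))
```

For this small-amplitude sample the companded path has the smaller error, at the cost of coarser steps near ±1.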
A Comparison to the Visual Domain
• Sound is a 1D signal with amplitude at time 𝑡.
• Then, should it be simpler to compress sound than a 2D image signal or a 3D video signal?
• Consider human perception factors: the human auditory system is more sensitive to quality degradation than the visual system. As a result, humans notice errors in compressed audio more readily than errors in compressed images and video.
• Compression ratios attained for video and images are greater than those attained for audio.
Audio Compression
• We need to take advantage of redundancy/correlation in the signal by studying it statistically, but that alone is not enough!
• The amount of redundancy that can be removed is small, so audio coding methods generally give a lower compression ratio than image or video coding methods.
• Any other techniques?
Types of Audio Compression Techniques
• Audio compression techniques can be broadly classified into categories depending on how sound is understood.
• Sound is a waveform
– Use statistical distribution, etc.
– Not a good idea in general by itself
• Sound is perceived (perception based)
– Psychoacoustically motivated
– Need to understand the human auditory system
• Sound is produced
Sound as Waveform
• Waveform compression techniques use variants of PCM:
– PCM (Pulse Code Modulation)
– DPCM (Differential PCM)
– DM (Delta Modulation), Adaptive DM
– ADPCM (Adaptive DPCM)
PCM (Pulse Code Modulation)
• PCM is the formal term for sampling and quantization. It is characterized by a sampling rate and a quantizer (uniform or non-uniform).
– Speech (8 kHz, 16 bits/sample)
– CD music (stereo channels, 44.1 kHz, 16 bits/sample)
DPCM (Differential PCM)
• Predictive coding:
– Let the $(n-1)$th sample be $f(n-1)$.
– In general, use $f(n-1)$ as the predicted value for $f(n)$.
• Differential PCM encoding:
– Don’t quantize and transmit the sample $f(n)$ directly.
– Compute the residual $e(n) = f(n) - f(n-1)$.
– Quantize $e(n)$ and transmit $\tilde{e}(n)$, the quantized value of $e(n)$.
• We can show that $SNR_{DPCM} > SNR_{PCM}$.
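The residual computation can be sketched without the quantizer, which makes the round trip lossless and shows that the differences span a smaller range than the samples themselves. This is only the differencing step, not a full DPCM codec; the function names and the sample values are illustrative.

```python
def residuals(samples):
    """e(n) = f(n) - f(n-1); the first sample is kept as-is."""
    return samples[0], [cur - prev for prev, cur in zip(samples, samples[1:])]

def reconstruct(first, diffs):
    """Rebuild the samples by accumulating the differences."""
    out = [first]
    for e in diffs:
        out.append(out[-1] + e)
    return out

first, diffs = residuals([130, 150, 140, 200, 230])
restored = reconstruct(first, diffs)
```

Once a quantizer is inserted, the encoder has to predict from reconstructed values rather than original ones, otherwise quantization errors accumulate at the decoder.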
DPCM – Detail
– $f_n$: input signal
– $\hat{f}_n$: predicted signal, defined as a function of previous reconstructed values
– $\tilde{f}_n$: reconstructed signal
– $e_n = f_n - \hat{f}_n$
– $\tilde{e}_n = Q[e_n]$
– Transmit $\tilde{e}_n$
– The decoder reconstructs the signal using $\tilde{f}_n = \hat{f}_n + \tilde{e}_n$
Closed-Loop DPCM
• Consider the sequence 130, 150, 140, 200, 230.
• Assume that the first value is transmitted without loss. What are the encoded values using the following prediction and quantization scheme? What is the decoded result?
– Prediction: $\hat{f}_n = \mathrm{trunc}\left((\tilde{f}_{n-1} + \tilde{f}_{n-2})/2\right)$
– Quantization: $\tilde{e}_n = Q[e_n] = 16 \times \mathrm{trunc}\left((255 + e_n)/16\right) - 256 + 8$
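The exercise can be worked through in code. The sketch below assumes the predictor is the truncated average of the two previous reconstructed values and the quantizer is 16 × trunc((255 + e) / 16) − 256 + 8. It also assumes that the missing value before the first sample is taken to be a copy of the first sample, which the slide does not state.

```python
import math

def quantize(e):
    """Q[e] = 16 * trunc((255 + e) / 16) - 256 + 8."""
    return 16 * math.trunc((255 + e) / 16) - 256 + 8

def predict(prev1, prev2):
    """Truncated average of the two previous reconstructed values."""
    return math.trunc((prev1 + prev2) / 2)

def dpcm_encode(samples):
    """Return the first sample (sent losslessly) and the quantized residuals."""
    recon = [samples[0], samples[0]]  # assumption: duplicate the first value
    codes = []
    for f in samples[1:]:
        f_hat = predict(recon[-1], recon[-2])
        e_q = quantize(f - f_hat)
        codes.append(e_q)
        recon.append(f_hat + e_q)  # track exactly what the decoder will see
    return samples[0], codes

def dpcm_decode(first, codes):
    """Mirror the encoder's prediction loop using only transmitted data."""
    recon = [first, first]
    for e_q in codes:
        recon.append(predict(recon[-1], recon[-2]) + e_q)
    return recon[1:]

first, codes = dpcm_encode([130, 150, 140, 200, 230])
decoded = dpcm_decode(first, codes)
print(codes)    # [24, -8, 56, 56]
print(decoded)  # [130, 154, 134, 200, 223]
```

Because the encoder predicts from its own reconstruction, encoder and decoder stay in step and the error per sample stays bounded by the quantizer step.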
DM (Delta Modulation)
• Delta Modulation
– Like DPCM, but encodes each difference with a single bit indicating a delta increase or a delta decrease
– Good for signals that don’t change rapidly
DM – Detail
– $f_n$: input signal
– $\tilde{f}_n$: reconstructed signal
– $e_n = f_n - \tilde{f}_{n-1}$
– $\tilde{e}_n = +\mathit{delta}$ if $e_n > 0$, and $-\mathit{delta}$ otherwise, where $\mathit{delta}$ is a constant
– Transmit $\tilde{e}_n$
– The decoder reconstructs the signal using $\tilde{f}_n = \tilde{f}_{n-1} + \tilde{e}_n$
DM (Delta Modulation)
• Consider the samples 10, 11, 13, and 15. Suppose delta = 4.
– The delta encoder uses a 1-bit code for each difference, with a fixed delta value.
– Let’s use 1 for delta increase and 0 for delta decrease, then the encoder generates 10, 1, 0, 1.
– The decoder reconstructs 10, 14, 10, 14.
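The example can be reproduced with a few lines of code. This is a minimal sketch assuming the first sample is sent as-is and each later sample is compared against the previous reconstructed value; the function names are illustrative.

```python
def dm_encode(samples, delta):
    """Return the first sample and one bit per following sample (1 = up, 0 = down)."""
    recon = samples[0]
    bits = []
    for f in samples[1:]:
        bit = 1 if f - recon > 0 else 0
        recon += delta if bit else -delta  # follow the decoder's staircase
        bits.append(bit)
    return samples[0], bits

def dm_decode(first, bits, delta):
    """Rebuild the staircase approximation from the bit stream."""
    out = [first]
    for bit in bits:
        out.append(out[-1] + (delta if bit else -delta))
    return out

first, bits = dm_encode([10, 11, 13, 15], delta=4)
decoded = dm_decode(first, bits, delta=4)
print(first, bits)  # 10 [1, 0, 1]
print(decoded)      # [10, 14, 10, 14]
```

With delta = 4 the reconstruction oscillates around the slowly rising input, which shows how a poorly matched step size produces granular noise.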
Adaptive DM
• Adaptive DM
– If the slope of the actual curve is high, the staircase approximation cannot keep up. Adaptive DM changes the step size delta adaptively in response to the signal's current properties.
– One way to change the delta size adaptively:
• The encoder considers the previous $N$ bits of output ($N = 3$ or $N = 4$ are very common) to determine adjustments to the delta size.
• If the previous $N$ bits are all 1s or all 0s, the step size is doubled.
• Otherwise, the step size is halved.
• The step size is adjusted for every input sample processed.
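The doubling and halving rule can be sketched as follows. Several details are assumptions made for the illustration and are not specified on the slide: the step is left unchanged until N bits are available, it is clamped between `min_step` and `max_step`, and the first sample is sent as-is.

```python
def adapt_step(step, history, n=3, min_step=1.0, max_step=64.0):
    """Double the step if the last n bits are identical, otherwise halve it."""
    if len(history) < n:
        return step  # assumption: no adaptation until n bits are available
    last = history[-n:]
    if all(b == last[0] for b in last):
        return min(step * 2, max_step)
    return max(step / 2, min_step)

def adm_encode(samples, step=1.0, n=3):
    """Return the first sample and one bit per following sample."""
    recon = samples[0]
    bits = []
    for f in samples[1:]:
        step = adapt_step(step, bits, n)  # adjusted for every input sample
        bit = 1 if f - recon > 0 else 0
        recon += step if bit else -step
        bits.append(bit)
    return samples[0], bits

def adm_decode(first, bits, step=1.0, n=3):
    """Replay the same step adaptation from the received bits."""
    out = [first]
    history = []
    for bit in bits:
        step = adapt_step(step, history, n)
        out.append(out[-1] + (step if bit else -step))
        history.append(bit)
    return out

first, bits = adm_encode([0, 10, 20, 30, 40, 50, 60])
decoded = adm_decode(first, bits)
print(bits)     # [1, 1, 1, 1, 1, 1]
print(decoded)  # [0, 1.0, 2.0, 3.0, 5.0, 9.0, 17.0]
```

On this steep ramp a fixed step of 1 would only reach 6 after six samples, while the adaptive step reaches 17. The decoder needs no side information because it derives the step from the same bit history.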
ADPCM (Adaptive DPCM)
• Adaptive Differential Pulse Code Modulation
– Sophisticated version of DPCM.
– Adapts predictor to signal characteristics
– Also adapts width of quantization steps to signal characteristics
– Better quality than DPCM with same storage requirements
ADPCM (Adaptive DPCM)
• Like DPCM, it codes the differences between the quantized audio signals using only a small number of bits, but the meaning of those bits adapts to the signal.
• For example, G.726 ADPCM in its 32 kbps mode encodes each difference in 4 bits, but varies the mapping of bits to difference values dynamically.
ADPCM (Adaptive DPCM) – Encoder
ADPCM (Adaptive DPCM) – Decoder
ADPCM (Adaptive DPCM)
• Adaptive quantization
– For a rapidly changing signal, which produces differences with large fluctuations, use large quantization steps.
– For a slowly changing signal, which produces differences with small fluctuations, use small quantization steps.
• Adaptive prediction
– Generally, change the coefficients $a_i$ of the predictor.
ADPCM (Adaptive DPCM)
• Prediction is usually based on $M$ previous values (previously reconstructed, quantized values): $\hat{f}_n = \sum_{i=1}^{M} a_i \tilde{f}_{n-i}$
• ADPCM adaptively changes the $a_i$ values to minimize $\sum_{n=1}^{N} (f_n - \hat{f}_n)^2$
• Simply solve $\min \sum_{n=1}^{N} \left(f_n - \sum_{i=1}^{M} a_i f_{n-i}\right)^2$
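For the simplest case of a first-order predictor ($M = 1$), the least-squares problem has a closed form: setting the derivative of $\sum (f_n - a f_{n-1})^2$ with respect to $a$ to zero gives $a = \sum f_n f_{n-1} / \sum f_{n-1}^2$. The sketch below illustrates only this special case on a hypothetical frame of samples; a practical ADPCM coder uses higher orders and updates the coefficients as the signal evolves.

```python
def best_first_order_coefficient(samples):
    """Least-squares a that minimizes sum((f[n] - a * f[n-1]) ** 2)."""
    num = sum(cur * prev for prev, cur in zip(samples, samples[1:]))
    den = sum(prev * prev for prev in samples[:-1])
    return num / den

def prediction_error(samples, a):
    """Sum of squared prediction errors for coefficient a."""
    return sum((cur - a * prev) ** 2 for prev, cur in zip(samples, samples[1:]))

frame = [10, 12, 15, 19, 24]  # hypothetical frame of samples
a = best_first_order_coefficient(frame)
print(a, prediction_error(frame, a), prediction_error(frame, 1.0))
```

Comparing the error at the fitted `a` with the error at `a = 1.0` (plain DPCM) shows what adapting the predictor buys for this frame.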
Audio Coding: Main Standards
• ITU speech coding standards
– ITU G.711
– ITU G.722
– ITU G.726
– ITU G.728
– ITU G.729
ITU – G.7xx
• ITU G.711
– Designed for telephone-bandwidth speech signals (3 kHz)
– Performs direct sample-by-sample non-uniform quantization (PCM with the μ-law/A-law encoding scheme)
– Encoder creates a 64 kbps bitstream for a signal sampled at 8 kHz
– Provides the lowest delay possible (1 sample) and the lowest complexity
– High rate and no recovery mechanism; used as the default coder for ISDN video telephony
• ITU G.722
– Designed for 7 kHz bandwidth voice or music
– Operates at 48, 56, and 64 kbps on audio sampled at 16 kHz
– Divides the signal into two bands (high-pass and low-pass), which are then encoded with different modalities (two-sub-band ADPCM)
– G.722 is preferred over G.711 PCM for teleconference-type applications. Music quality is not perfectly transparent.
ITU – G.7xx
• ITU G.726
– ADPCM speech codec standard covering the transmission of voice at rates of 16, 24, 32, and 40 kbps for a signal sampled at 8 kHz
– cf. ADPCM implementation on TI DSP
• ITU G.728
– Speech coding operating at 16 kbps for low bit rate (64-128 Kb/s) ISDN video telephony
– Hybrid between the lower bit rate model based coders (G.729) and ADPCM coders
• ITU G.729
– Coding of speech at 8 kbps using model based coders that use special models of production (synthesis) of speech
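The bit rates quoted for these codecs translate directly into an average number of bits per input sample, which makes the codecs easier to compare. The sketch below covers only the codecs whose sampling rate is stated above; the helper name is illustrative.

```python
def bits_per_sample(bit_rate_bps, sample_rate_hz):
    """Average number of coded bits per input sample."""
    return bit_rate_bps / sample_rate_hz

# G.711: 64 kbps at 8 kHz
g711 = bits_per_sample(64_000, 8_000)
# G.726: 16, 24, 32, and 40 kbps at 8 kHz
g726 = [bits_per_sample(r, 8_000) for r in (16_000, 24_000, 32_000, 40_000)]
# G.722: 48, 56, and 64 kbps at 16 kHz (average over both sub-bands)
g722 = [bits_per_sample(r, 16_000) for r in (48_000, 56_000, 64_000)]

print(g711)  # 8.0
print(g726)  # [2.0, 3.0, 4.0, 5.0]
print(g722)  # [3.0, 3.5, 4.0]
```

Compared with 16-bit linear PCM, G.711 halves the bits per sample through companding, and the ADPCM-based codecs reduce it further by coding prediction residuals.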