Multimedia Software Systems CS4551
Psychoacoustics and MPEG Audio Compression
CSULA CS4551 Multimedia Software Systems by Eun-Young Kang
Sound as Perceived
• Compression attained by variations of PCM coding techniques alone is not sufficient to reach the data rates that modern applications need to transmit CD and surround-sound quality music.
• Studying the perception of sound can additionally help compression:
– What frequencies we hear
– When do we hear them
– When do we not hear them
• This branch of study, psychoacoustics, is the science of sound perception. “Auditory masking” is a perceptual weakness of the ear that compression can exploit without compromising quality.
Psychoacoustics – Human Hearing
• Human hearing
– Frequency range of human hearing: 20 Hz to 20 kHz
* Ultrasonic sounds are sounds at higher frequencies.
– Frequency range of the voice: 500 Hz to 4 kHz
– Humans can hear up to 120 dB (sound amplitude).
– The reference point for 0 dB is the quietest sound that a human can hear, measured at 1 kHz. Technically, it is the sound that creates a barely audible sound intensity of 10^-12 watts per square meter.
– The range of hearing depends on frequency.
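The decibel scale described above can be made concrete with a short sketch. It assumes the standard reference intensity of 10^-12 W/m^2 for 0 dB; the function names `intensity_to_db` and `db_to_intensity` are illustrative and do not come from the slides.

```python
import math

I_REF = 1e-12  # assumed reference intensity for 0 dB, in W/m^2


def intensity_to_db(intensity_w_m2):
    """Convert a sound intensity in W/m^2 to a level in dB relative to I_REF."""
    return 10.0 * math.log10(intensity_w_m2 / I_REF)


def db_to_intensity(level_db):
    """Convert a level in dB back to a sound intensity in W/m^2."""
    return I_REF * 10.0 ** (level_db / 10.0)
```

Under this assumption, the 120 dB upper limit corresponds to an intensity of about 1 W/m^2, which is 10^12 times the reference intensity.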
Psychoacoustics – Human Hearing
• Equal-Loudness
• Fletcher-Munson equal-loudness curve
– Frequencies (x-axis)
– Sound pressure level in dB (y-axis)
– Perceived loudness plotted
Psychoacoustics – Human Hearing
• Equal-Loudness Relations
– Of two tones with the same amplitude but different frequencies, one may sound louder than the other.
– At the 4 kHz dip, we perceive a sound as 10 dB when the actual sound pressure level is 2 dB. At 10 kHz, the same perceived 10 dB requires an actual sound pressure level of 20 dB.
– The ear is most sensitive to frequencies between 1 kHz and 5 kHz (because the ear canal amplifies frequencies from 2.5 kHz to 4 kHz). The ear is usually less sensitive to low frequencies.
– As the overall loudness increases, the curves flatten. => If the sound level is loud enough, we are about equally sensitive to low frequencies.
Psychoacoustics – Human Hearing
• Threshold of hearing
• Masking Threshold Curve
– A tone is audible only if its power is above the absolute threshold level
– Example: the threshold is about 2 dB for a 6 kHz tone
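A minimal sketch of the threshold of hearing, using one widely quoted closed-form approximation of the threshold in quiet. The formula is an assumption brought in for illustration and is not stated on the slides; it does reproduce the roughly 2 dB threshold for a 6 kHz tone quoted above. The function names are hypothetical.

```python
import math


def threshold_in_quiet_db(freq_hz):
    """Approximate absolute hearing threshold in dB for a tone at freq_hz.

    Assumed approximation (f in kHz):
    3.64 f^-0.8 - 6.5 exp(-0.6 (f - 3.3)^2) + 0.001 f^4
    """
    f = freq_hz / 1000.0
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)


def is_audible_in_quiet(freq_hz, level_db):
    """A tone is audible only if its level is above the absolute threshold."""
    return level_db > threshold_in_quiet_db(freq_hz)
```

With this approximation a 10 dB tone is audible at 6 kHz but not at 100 Hz, which matches the observation that the ear is less sensitive to low frequencies.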
Psychoacoustics – Human Hearing
• Frequency Masking
– A lower tone can effectively mask (make us unable to hear) a higher tone. The reverse is not true.
– The greater the power in the masking tone, the wider its influence – the broader the range of frequencies it can mask.
– If two tones are widely separated in frequency, little masking occurs.
– Experiments
• Play a tone (frequency) at a loud volume and check how the tone affects our ability to hear tones at nearby frequencies.
Psychoacoustics – Human Hearing
• Frequency Masking
– Generate a masking tone (1 kHz tone at a sound level of 60 dB), raise the level of a nearby tone until it is just audible, and plot that level.
Psychoacoustics – Human Hearing
• Frequency Masking
• The masking diagram changes according to the frequency of the masking tone.
• The higher the frequency of the masking tone, the broader its range of influence.
=> If a signal can be decomposed into frequencies, we can detect the masked frequencies and quantize only the audible part.
Psychoacoustics – Human Hearing
• Frequency masking and critical band
• Human auditory system can be roughly modeled as a filterbank, consisting of 25 bands.
• Each band is called a critical band
• The ear cannot distinguish sounds within the same band that occur simultaneously.
• Critical bandwidth is non-uniform.
Psychoacoustics – Human Hearing
• Frequency masking and critical band
• Critical bandwidth expressed in frequency units is non-uniform, so a new unit, the Bark, is introduced to define a perceptually more uniform unit for critical bands.
• Critical bands: the widths of the masking bands differ for different masking tones, increasing with the frequency of the masking tone.
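To illustrate the Bark scale, here is a short sketch that maps frequency in Hz to Bark. The slides do not give a conversion formula; the one below is a commonly used arctangent approximation and is an assumption here, as is the function name `hz_to_bark`.

```python
import math


def hz_to_bark(freq_hz):
    """Map a frequency in Hz to the Bark scale (assumed approximation)."""
    return (13.0 * math.atan(0.00076 * freq_hz)
            + 3.5 * math.atan((freq_hz / 7500.0) ** 2))


def critical_band_index(freq_hz):
    """Index of the critical band containing freq_hz (0-based, one band per Bark)."""
    return int(hz_to_bark(freq_hz))
```

Under this approximation, 20 Hz to 20 kHz spans a little under 25 Bark, in line with the 25-band filterbank model, and a 100 Hz step covers far more of the Bark scale at low frequencies than at high ones, which reflects the non-uniform critical bandwidth.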
Psychoacoustics – Human Hearing
• Temporal masking
– After a loud dance, it takes time for our hearing system to return to normal. How long does it take to come back to normal?
Perceptual Coding
• Perceptual coding is coding based on a psychoacoustic model of human hearing.
• Perceptual coding tries to minimize the perceptual distortion in a transform coding scheme by applying the psychoacoustic model of hearing.
• Basic concept: allocate more bits (more quantization levels, less error) to the frequency channels that are most audible, and fewer bits (more error) to the channels that are least audible.
• The encoder needs to continuously analyze the signal to determine the current audibility threshold curve using a perceptual model.
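The bit-allocation idea can be sketched as follows. This is an illustration under stated assumptions, not the allocation procedure of any particular standard: it assumes the per-band signal levels and masking thresholds are already known in dB, uses the rule of thumb that each quantizer bit buys about 6.02 dB of signal-to-noise ratio, and caps the allocation at a hypothetical `max_bits`.

```python
import math

DB_PER_BIT = 6.02  # rule of thumb: one extra bit lowers quantization noise by about 6 dB


def allocate_bits(signal_db, mask_db, max_bits=15):
    """Give each band just enough bits to push quantization noise below its mask.

    signal_db and mask_db are per-band levels in dB. Bands whose signal does
    not exceed the masking threshold get 0 bits (they are not coded).
    """
    bits = []
    for level, threshold in zip(signal_db, mask_db):
        smr = level - threshold  # signal-to-mask ratio in dB
        if smr <= 0:
            bits.append(0)
        else:
            bits.append(min(max_bits, math.ceil(smr / DB_PER_BIT)))
    return bits
```

For example, `allocate_bits([60, 35, 10], [20, 15, 12])` gives the most audible band the most bits and drops the band that lies below its threshold.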
Perceptual Coding Example
• Assume that the levels of 16 bands are:
• Assume that if the level of the 8th band is 60dB, it gives a masking of 12dB in the 7th band, 15dB in the 9th.
– Level in 7th band is 10 dB ( < 12 dB ), so ignore it.
– Level in 9th band is 35 dB ( > 15 dB ), so encode it.
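The decision in this example can be written as a small sketch. Only the three band levels stated in the text (bands 7, 8 and 9) are used; the remaining bands are left out because their levels are not given here. The function name `audible_bands` is illustrative.

```python
def audible_bands(levels_db, masking_db):
    """Return the bands whose level exceeds the masking threshold imposed on them.

    Bands with no entry in masking_db are treated as unmasked.
    """
    return sorted(band for band, level in levels_db.items()
                  if level > masking_db.get(band, float("-inf")))


# Levels stated in the example (band number -> level in dB).
levels_db = {7: 10, 8: 60, 9: 35}
# Masking that the 60 dB tone in band 8 imposes on its neighbours.
masking_from_band_8 = {7: 12, 9: 15}
```

Calling `audible_bands(levels_db, masking_from_band_8)` keeps bands 8 and 9 and drops band 7, matching the reasoning above.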
Audio Coding : Main Standards
• ISO MPEG (Moving Picture Experts Group) family
– MPEG-1: Layer 1, Layer 2, Layer 3 (MP3)
– MPEG-2: backward compatible with MPEG-1; AAC (not backward compatible)
– MPEG-4: CELP (Code Excited Linear Prediction) and AAC (Advanced Audio Coding)
Basic MPEG-1 Audio Encoder/Decoder
MPEG-1 Coding Steps
1. Divide the audio signal into frequency subbands.
2. Determine the amount of masking for each band based on its frequency and the energy of its neighboring bands in frequency and time (the psychoacoustic model).
3. If the energy in a band is below the masking threshold, don’t encode it. Otherwise, determine the number of bits needed to represent the coefficients in this band.
4. Format bitstream: insert proper headers, code the side information (e.g. quantization scale factors) and code the quantized coefficient indices, using variable length encoding (e.g. Huffman coding).
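Steps 1 to 3 above can be sketched as a toy pipeline. This is a simplified illustration, not the MPEG-1 algorithm: it swaps in a plain DFT for the real polyphase filterbank, uses a made-up masking rule (each band masks its immediate neighbours at `drop_db` below its own level, with `floor_db` standing in for the threshold in quiet), and omits step 4 (bitstream formatting) entirely. All function names and parameter values are hypothetical.

```python
import cmath
import math


def band_levels_db(frame, n_bands):
    """Step 1 (simplified): band energies in dB from a direct DFT of the frame."""
    n = len(frame)
    bins_per_band = (n // 2) // n_bands
    levels = []
    for b in range(n_bands):
        energy = 0.0
        for k in range(b * bins_per_band, (b + 1) * bins_per_band):
            coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                        for t, x in enumerate(frame))
            energy += abs(coeff) ** 2
        levels.append(10.0 * math.log10(energy + 1e-12))  # avoid log(0)
    return levels


def masking_thresholds_db(levels, drop_db=30.0, floor_db=0.0):
    """Step 2 (toy model): threshold from the two neighbouring bands and a floor."""
    thresholds = []
    for i in range(len(levels)):
        neighbours = [levels[j] - drop_db
                      for j in (i - 1, i + 1) if 0 <= j < len(levels)]
        thresholds.append(max([floor_db] + neighbours))
    return thresholds


def plan_bits(frame, n_bands=8):
    """Step 3: 0 bits for masked bands, otherwise about one bit per 6.02 dB."""
    levels = band_levels_db(frame, n_bands)
    thresholds = masking_thresholds_db(levels)
    plan = []
    for level, threshold in zip(levels, thresholds):
        smr = level - threshold
        plan.append(0 if smr <= 0 else math.ceil(smr / 6.02))
    return plan
```

For a 64-sample frame holding a loud tone in one band and a faint tone in the adjacent band, `plan_bits` assigns bits to the loud band and none to the faint neighbour, while an equally faint tone far away from the loud one still receives bits.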
MPEG-1 Audio Coding
• Layered audio compression scheme, each layer being backward compatible
• Layer 1
– Transparent (undetectable difference from original signal) at 384 Kbps/channel
– Digital Audio Tape typically uses Layer 1.
– Uses only frequency masking from the psychoacoustic model
– Subband coding with 32 channels (12 samples/band)
– No entropy coding after transform coding
– Decoder is much simpler than the encoder.
MPEG-1 Audio Coding
• Layer 2
– Transparent at 296 Kbps/channel
– Proposed for use in digital audio broadcasting
– Subband coding with 32 channels (36 samples/band)
– Uses some temporal masking as well
– Bitrate reduction and quality improvement at the price of an increase in complexity
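The subband figures for Layers 1 and 2 translate into frame sizes as sketched below. The 32 subbands and the 12 or 36 samples per band come from the slides; the 44.1 kHz sampling rate is an assumption (the CD rate) used only to turn frame sizes into durations, and the function names are illustrative.

```python
SUBBANDS = 32  # number of subbands (channels) in Layers 1 and 2


def frame_samples(samples_per_band):
    """Total audio samples covered by one frame."""
    return SUBBANDS * samples_per_band


def frame_duration_ms(samples_per_band, sample_rate_hz=44100):
    """Frame duration in milliseconds at the assumed sampling rate."""
    return 1000.0 * frame_samples(samples_per_band) / sample_rate_hz
```

With these numbers a Layer 1 frame covers 384 samples and a Layer 2 frame covers 1152 samples, so at 44.1 kHz a Layer 2 frame spans roughly three times as much audio, which gives the encoder more context at the cost of more complexity.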
MPEG-1 Audio Coding
• Layer 3 (MP3)
– Transparent at 96 Kbps/channel
– Aimed at audio transmission over ISDN lines; gained popularity via MP3 players
– Uses temporal masking as well
– Uses 32 channels that are closer to the true perceptual critical bands
– Has an entropy coder (Huffman)
– Much more complex than Layers 1 and 2
– At the time of MPEG-1 audio development (finalized 1992), Layer 3 was considered too complex to be practically useful. Today, Layer 3 is the most widely deployed audio coding method (known as MP3).
MP3 Performance Comparison