Compression of digital voice and video

Overview of Data Compression

The benefits of data compression in high-speed networks are obvious. The following
are especially important when data is transmitted in compressed form.

• Less transmission power is required.
• Less communication bandwidth is required.
• System efficiency is increased.

There are, however, certain trade-offs with data compression. For example, the encoding
and decoding processes of data compression increase the cost, complexity, and delay of
data transmission. Two categories of data compression are used in producing
multimedia networking information: compression with loss and compression
without loss.
In the first category of data compression, some less valuable or nearly identical data is
eliminated permanently. The most notable case of compression with loss is the process
of signal sampling. Voice sampling, for example, falls in this category.

The following figure shows the basic information process in high-speed communication
systems. Any type of "source" data is converted to digital form in a long information-
source process. The outcome is the generation of digital words. Words are encoded in the
source coding system to result in a compressed form of the data.

Digital Voice and Compression

Signal Sampling
In the process of digitizing a signal, analog signals first go through a sampling process,
as shown in the following figure. The sampling function is required in the process of
converting an analog signal to digital bits. However, acquiring samples from an analog
signal and eliminating the unsampled portions of the signal may result in some permanent
loss of information. In other words, the sampling resembles an information-compression
process with loss.

Sampling techniques are of several types:

• Pulse amplitude modulation (PAM), which translates sampled values to pulses
with corresponding amplitudes
• Pulse width modulation (PWM), which translates sampled values to pulses with
corresponding widths
• Pulse position modulation (PPM), which translates sampled values to identical
pulses but with corresponding positions to sampling points

Quantization and Distortion

Samples are real numbers (decimal-point values and integer values) and, thus, up to infinite
bits are required for transmission of a raw sample. The transmission of infinite bits
occupies infinite bandwidth and is not practical for implementation. In practice, sampled
values are rounded off to available quantized levels.
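
The rounding step can be sketched in a few lines of Python. This is a minimal illustration, assuming a uniform quantizer over the range -1.0 to 1.0; the function name `quantize`, the range, and the number of levels are hypothetical choices, not taken from the text.

```python
def quantize(sample, levels, lo=-1.0, hi=1.0):
    """Round a real-valued sample to the nearest of `levels` evenly spaced values.

    Returns (index, value): the level index that would be transmitted and
    the quantized value the receiver reconstructs from it.
    """
    step = (hi - lo) / (levels - 1)          # spacing between adjacent levels
    clipped = min(max(sample, lo), hi)       # keep the sample inside the range
    index = round((clipped - lo) / step)     # nearest level
    return index, lo + index * step


samples = [-1.0, -0.3, 0.1, 0.9]
quantized = [quantize(s, levels=5) for s in samples]
indices = [i for i, _ in quantized]
values = [v for _, v in quantized]
# The distortion of each sample is the gap between it and its quantized value.
errors = [abs(s - v) for s, v in zip(samples, values)]
```

With five levels, only the level index needs to be sent, and the error introduced for any in-range sample is at most half the step size.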

Still Images and JPEG Compression

This section investigates algorithms that prepare and compress still and moving
images. The compression of such data substantially affects the utilization of bandwidths
over the multimedia and IP networking infrastructures. We begin with a single visual
image, such as a photograph, and then look at video, a motion image. The Joint
Photographic Experts Group (JPEG) is the compression standard for still images. It is
used for gray-scale and quality-color images. Similar to voice compression, JPEG is a
lossy process. An image obtained after the decompression at a receiving end may not be
the same as the original.

The discrete cosine transform (DCT) process is complex and converts a snapshot of a real image into a
matrix of corresponding values. The quantization phase converts the values generated by
DCT to simple numbers in order to occupy less bandwidth. As usual, all quantizing
processes are lossy.
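
The two phases can be sketched as follows. This is an illustration only, assuming the orthonormal two-dimensional DCT-II on a small square block and a single uniform quantization step; practical JPEG coders use a table of per-coefficient steps, and the names `dct_2d` and `quantize_block` are hypothetical.

```python
import math


def dct_2d(block):
    """Orthonormal 2-D DCT-II of a square block given as a list of rows."""
    n = len(block)

    def scale(k):
        # Normalization factor that makes the transform orthonormal.
        return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)

    def basis(x, u):
        # Cosine basis function of frequency u evaluated at position x.
        return math.cos((2 * x + 1) * u * math.pi / (2 * n))

    return [
        [
            scale(u) * scale(v) * sum(
                block[x][y] * basis(x, u) * basis(y, v)
                for x in range(n) for y in range(n)
            )
            for v in range(n)
        ]
        for u in range(n)
    ]


def quantize_block(coeffs, step):
    """Divide every coefficient by `step` and round to an integer (lossy)."""
    return [[round(c / step) for c in row] for row in coeffs]


flat = [[10] * 4 for _ in range(4)]     # a block of identical pixel values
coeffs = dct_2d(flat)
levels = quantize_block(coeffs, step=10)
```

For a block of identical pixels, all of the energy ends up in the upper-left coefficient, and every other entry quantizes to 0.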

Raw-Image Sampling and DCT
As with a voice signal, we first need samples of a raw image: a picture. Pictures are of
two types: photographs, which contain no digital data, and images, which contain digital
data suitable for computer networks. An image is made up of m x n blocks of picture
units, or pixels, as shown in the following figure. For FAX transmissions, images are
made up of 0s and 1s to represent black and white pixels, respectively.

JPEG Files

Color images are based on the fact that any color can be represented to the human eye by
using a particular combination of the base colors red, green, and blue (RGB). Computer
monitor screens, digital camera images, or any other still color images are formed by
varying the intensity of the three primary colors at pixel level, resulting in the creation of
virtually any corresponding color from the real raw image. Each intensity created on any
of the three pixels is represented by 8 bits.
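
As a small worked example, three 8-bit intensities give 24 bits per pixel. The sketch below assumes a red-green-blue byte order within one integer; the order and the helper names are illustrative choices rather than part of any file format described here.

```python
BITS_PER_CHANNEL = 8
BITS_PER_PIXEL = 3 * BITS_PER_CHANNEL      # red + green + blue
NUM_COLORS = 2 ** BITS_PER_PIXEL           # distinct representable colors


def pack_rgb(r, g, b):
    """Pack three 8-bit intensities into a single 24-bit integer."""
    for v in (r, g, b):
        if not 0 <= v <= 255:
            raise ValueError("each intensity must fit in 8 bits")
    return (r << 16) | (g << 8) | b


def unpack_rgb(pixel):
    """Recover the (r, g, b) intensities from a packed 24-bit pixel."""
    return (pixel >> 16) & 0xFF, (pixel >> 8) & 0xFF, pixel & 0xFF
```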

GIF Files
JPEG is designed to work with full-color images with up to 2^24 colors. The graphics
interchange format (GIF) is an image file format that reduces the number of colors to
256. This reduction in the number of possible colors is a trade-off between the quality of
the image and the transmission bandwidth. GIF stores up to 2^8 = 256 colors in a table and
covers the range of colors in an image as closely as possible. Therefore, 8 bits are used to
represent a single pixel. GIF uses a variation of Lempel-Ziv encoding for compression of
an image.

Encoding
In the last phase of the JPEG process, encoding finally does the task of compression. In
the quantization phase, a matrix with numerous 0s is produced. The Q matrix in this
example has produced 57 zeros from the original raw image. A practical approach to
compressing this matrix is to use run-length coding. If run-length coding is used, scanning
matrix Q[i][j] row by row may result in several phrases.
This method is attractive because the larger values in the matrix tend to collect
in the upper-left corner of the matrix, and the elements representing larger values tend to
be gathered together in that area of the matrix. Thus, we can induce a better rule:
Scanning should always start from the upper-left corner element of the matrix. This way,
we get much longer runs for each phrase and a much lower number of phrases in the run-
length coding.
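
The effect of the scan order can be sketched in Python. The example assumes the diagonal zigzag order commonly used with JPEG, starting at the upper-left element; the 4 x 4 matrix `Q` is made-up data for illustration and is not the Q matrix of the example referred to in the text.

```python
def zigzag(matrix):
    """Return the elements of a square matrix in diagonal zigzag order."""
    n = len(matrix)
    order = sorted(
        ((i, j) for i in range(n) for j in range(n)),
        # Visit anti-diagonals in turn; alternate direction on each one.
        key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else p[1]),
    )
    return [matrix[i][j] for i, j in order]


def run_lengths(values):
    """Collapse consecutive equal values into (value, run length) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [tuple(r) for r in runs]


# Hypothetical quantized matrix with its larger values in the upper-left corner.
Q = [
    [9, 3, 1, 0],
    [4, 2, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

row_by_row = run_lengths([v for row in Q for v in row])
zigzagged = run_lengths(zigzag(Q))
```

On this matrix the row-by-row scan yields 9 runs, while the zigzag scan yields 7 and ends in a single run of ten 0s.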

Moving Images and MPEG Compression
A motion image, or video, is a rapid display of still images. Moving from one image to
another must be fast enough to fool the human eye. There are different standards on the
number of still images comprising a video clip.

The common standard that defines video compression is the Moving Picture Experts
Group (MPEG), which has several branch standards:

• MPEG-1, primarily for video on CD-ROM
• MPEG-2, for multimedia entertainment and high-definition television (HDTV)
and the satellite broadcasting industry
• MPEG-4, for object-oriented video compression and videoconferencing over low-
bandwidth channels
• MPEG-7, for a broad range of demands requiring large bandwidths providing
multimedia tools
• MPEG-21, for interaction among the various MPEG groups

Logically, using JPEG compression for each still picture does not provide sufficient
compression for video as it occupies a large bandwidth. MPEG deploys additional
compression. Normally, the difference between two consecutive frames is small. With
MPEG, a base frame is sent first, and successive frames are encoded by computing the
differences.
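
The idea of sending a base frame followed by differences can be sketched as below. This is a simplified illustration, assuming each frame is a flat list of pixel intensities and that differences are taken pixel by pixel; real MPEG coders add motion compensation and further coding of the differences, and the function names are hypothetical.

```python
def encode_sequence(frames):
    """Return the first frame plus pixel-wise differences between neighbors."""
    base = list(frames[0])
    diffs = [
        [cur - prev for cur, prev in zip(current, previous)]
        for previous, current in zip(frames, frames[1:])
    ]
    return base, diffs


def decode_sequence(base, diffs):
    """Rebuild every frame by adding each difference to the previous frame."""
    frames = [list(base)]
    for diff in diffs:
        frames.append([p + d for p, d in zip(frames[-1], diff)])
    return frames


# Three tiny "frames" in which only one pixel changes at a time.
frames = [
    [10, 10, 10, 10],
    [10, 12, 10, 10],
    [10, 12, 10, 11],
]
base, diffs = encode_sequence(frames)
```

Because consecutive frames are nearly identical, the difference lists consist mostly of 0s, which compress far better than the frames themselves.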

Depending on the relative position of a frame in a sequence, it can be compressed
through one of the following types of frames:

• Interimage (I) frames. An I frame is treated as a JPEG still image and compressed
using DCT.
• Predictive (P) frames. These frames are produced by computing differences
between a current and a previous I or P frame.
• Bidirectional (B) frames. A B frame is similar to a P frame, but the B frame
considers differences among previous, current, and future frames.

Snapshot of moving frames for MPEG compression

MP3 and Streaming Audio
The MPEG-1 layer 3 (MP3) technology compresses audio for networking and
producing CD-quality sound. The sampling part of PCM is performed at a rate of 44.1
kHz to cover the maximum of 20 kHz of audible signals. Using the commonly used
16-bit encoding for each sample, the maximum bit rate required for audio is 16 x 44.1 =
about 700 kilobits per second, or about 1.4 megabits per second for two channels if the
sound is processed in stereo. For example, a 60-minute CD (3,600 seconds) requires about
1.4 x 3,600 = 5,040 megabits, or 630 megabytes. This amount may be acceptable for recording on a CD but is
considered extremely large for networking, and thus a carefully designed compression
technique is needed.
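
The arithmetic can be checked with a short calculation. The sketch below uses the exact sampling rate of 44,100 samples per second rather than the rounded 1.4 megabits per second, so its totals come out slightly higher than the rounded figures; the constant names are illustrative.

```python
SAMPLE_RATE_HZ = 44_100      # samples per second per channel
BITS_PER_SAMPLE = 16
CHANNELS = 2                 # stereo
DURATION_S = 60 * 60         # a 60-minute recording

mono_bps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE       # bits per second, one channel
stereo_bps = mono_bps * CHANNELS                  # bits per second, two channels
total_bits = stereo_bps * DURATION_S
total_megabytes = total_bits / 8 / 1_000_000      # 1 megabyte taken as 10^6 bytes

print(f"mono:   {mono_bps / 1000:.1f} kilobits per second")
print(f"stereo: {stereo_bps / 1_000_000:.2f} megabits per second")
print(f"60 min: {total_megabytes:.2f} megabytes")
```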

MP3 combines the advantages of MPEG with "three" layers of audio
compressions. MP3 removes from a piece of sound all portions that an average ear may
not be able to hear, such as weak background sounds.

Limits of Compression with Loss
Hartley, Nyquist, and Shannon are the founders of information theory, which has
resulted in the mathematical modeling of information sources. Consider a communication
system in which a source signal is processed to produce sequences of n words

Basics of Information Theory
If a_i is the most likely output and a_j is the least likely output, clearly, a_j conveys the most
information and a_i conveys the least information. This observation can be rephrased as an
important conclusion: The measure of information for an output is a decreasing and
continuous function of the probability of source output. To formulate this statement, let
P_k1 and P_k2 be the probabilities of an information source's outputs a_k1 and a_k2, respectively.
Let I(P_k1) and I(P_k2) be the information content of a_k1 and a_k2, respectively. The following
five facts apply.

1. As discussed, I(P_k) depends on P_k.
2. I(P_k) is a continuous function of P_k.
3. I(P_k) is a decreasing function of P_k.
4. P_k = P_k1 · P_k2 (the probability that two independent outputs occur at the same time).
5. I(P_k) = I(P_k1) + I(P_k2) (the sum of the two pieces of information).
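
These facts can be checked numerically. The text above does not give a formula for the information measure; the sketch assumes the standard logarithmic measure of information theory, I(P) = -log2 P, measured in bits, which is continuous, decreasing, and additive over products of probabilities.

```python
import math


def information(p):
    """Information content, in bits, of an output with probability p."""
    if not 0 < p <= 1:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(p)


p1, p2 = 0.5, 0.25
joint = p1 * p2                      # probability of both outputs together

likely = information(0.9)            # a likely output carries little information
unlikely = information(0.1)          # an unlikely output carries more
additive_gap = abs(information(joint) - (information(p1) + information(p2)))
```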

Compression Methods Without Loss
Some types of data, including text, image, and video, might contain redundant or
repeated elements. If so, those elements can be eliminated and some sort of codes
substituted for future decoding. In this section, we focus on techniques that do not incur
any loss during compression:
• Arithmetic encoding
• Run-length encoding
• Huffman encoding
• Lempel-Ziv encoding

Run-Length Encoding

One of the simplest data-compression techniques is run-length encoding. This technique
is fairly effective for compression of plaintext and numbers, especially for facsimile
systems. With run-length code, repeated letters can be replaced by a run length,
beginning with C_c to express the compression letter count.
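
The basic idea can be sketched in Python. The exact marker format is not spelled out here, so this sketch is an assumption: it stores plain (letter, count) pairs instead of a marker-prefixed count, and the function names are hypothetical.

```python
from itertools import groupby


def rle_encode(text):
    """Replace each run of repeated letters with a (letter, run length) pair."""
    return [(letter, len(list(run))) for letter, run in groupby(text)]


def rle_decode(pairs):
    """Expand (letter, run length) pairs back into the original text."""
    return "".join(letter * count for letter, count in pairs)


encoded = rle_encode("AAAABBBCCD")
decoded = rle_decode(encoded)
```

The decoded text matches the original exactly, which is what makes the technique lossless. Runs of length 1 gain nothing from this representation, so practical schemes usually encode only the longer runs.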

Huffman Encoding

Huffman encoding is an efficient frequency-dependent coding technique. With this
algorithm, source values with smaller probabilities are encoded by longer
code words. The algorithm that implements such a technique is as follows.

Begin Huffman Encoding Algorithm

1. Sort outputs of the source in decreasing order of their probabilities. For example,
0.7, 0.6, 0.6, 0.59, ..., 0.02, 0.01.

2. Merge the two least probable outputs into a single output whose probability is
the sum of the corresponding probabilities, such as 0.02 + 0.01 = 0.03.
3. If the number of remaining outputs is 2, go to the next step; otherwise, go to step
1.
4. Assign 0 and 1 as codes on the diagram.
5. If a new output is the result of merging two outputs, append the code word with 0
and 1; otherwise, stop.
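
The steps above can be sketched with a priority queue. This is one possible realization, not necessarily the exact procedure of the numbered steps: it keeps merging until a single output remains and prepends 0 or 1 to the code words on each merge, which yields an equivalent code. The symbol names and probabilities are made-up examples.

```python
import heapq
from itertools import count


def huffman_codes(probabilities):
    """Build a prefix-free code from a {symbol: probability} mapping."""
    if len(probabilities) == 1:
        return {symbol: "0" for symbol in probabilities}
    tie = count()  # unique tie-breaker so the heap never compares dicts
    heap = [(p, next(tie), {symbol: ""}) for symbol, p in probabilities.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least probable outputs into one.
        p_low, _, low = heapq.heappop(heap)
        p_high, _, high = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in low.items()}
        merged.update({s: "1" + code for s, code in high.items()})
        heapq.heappush(heap, (p_low + p_high, next(tie), merged))
    return heap[0][2]


probabilities = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
codes = huffman_codes(probabilities)
average_length = sum(p * len(codes[s]) for s, p in probabilities.items())
```

The most probable symbol receives a 1-bit code word and the two least probable receive 3-bit code words, for an average of 1.75 bits per symbol compared with 2 bits for a fixed-length code.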

Lempel-Ziv Encoding
Lempel-Ziv codes are independent of the source statistics. This coding technique is
normally used for UNIX compressed files. The algorithm that converts a string of logical
bits into a Lempel-Ziv code is summarized as follows.
Begin Lempel-Ziv Encoding Algorithm
1. Any sequence of source output is parsed into phrases of varying length. At the first
step, identify phrases of the smallest length that have not appeared so far. Note
that all phrases are different, and lengths of words grow as the encoding process
proceeds.
2. Phrases are encoded using code words of equal length. If k1 is the number of bits
needed to describe the code word and k2 is the number of phrases, we must have

k1 = ⌈log2 k2⌉.
3. A code is the location of the prefix to the phrases.
4. A code is followed by the last bit of parser output to double-check the last bit.
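
The four steps can be sketched in Python. The sketch rests on several assumptions: location 0 stands for the empty prefix, phrases are numbered from 1 in order of appearance, and the location is written with a fixed width equal to the base-2 logarithm of the number of phrases, rounded up. The input bit string and function names are illustrative.

```python
import math


def lz_parse(bits):
    """Split a bit string into the shortest phrases not seen before."""
    phrases, seen, current = [], set(), ""
    for bit in bits:
        current += bit
        if current not in seen:
            seen.add(current)
            phrases.append(current)
            current = ""
    return phrases, current      # `current` holds any incomplete final phrase


def lz_encode(bits):
    """Encode each phrase as (location of its prefix) followed by its last bit."""
    phrases, leftover = lz_parse(bits)
    if not phrases:
        return [], [], leftover
    width = max(1, math.ceil(math.log2(len(phrases))))   # bits per location
    location = {"": 0}           # location 0 means "no prefix"
    codes = []
    for position, phrase in enumerate(phrases, start=1):
        prefix, last_bit = phrase[:-1], phrase[-1]
        codes.append(format(location[prefix], f"0{width}b") + last_bit)
        location[phrase] = position
    return phrases, codes, leftover


phrases, codes, leftover = lz_encode("1011010100010")
```

Short inputs such as this one come out longer after encoding; the method pays off on long sequences, where phrases grow much faster than the fixed-width code words.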
