Non-Causal Video Encoding Method of P-Frame

Short Paper
ACEEE Int. J. on Signal & Image Processing, Vol. 4, No. 1, Jan 2013

Cui Wang¹, Akira Kubota², Yoshinori Hatori¹
¹ Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology, Tokyo, Japan
Email: {wang.c.ad, hatori}@{m, ip}.titech.ac.jp
² Faculty of Science and Engineering, Chuo University, Tokyo, Japan
Email: kubota@elect.chuo-u.ac.jp

Abstract— In this paper, the feasibility and efficiency of non-causal prediction in a P-frame are examined, and based on the
findings, a new P-frame coding scheme is proposed. Motion-compensated inter-frame prediction, which has been widely used
in low-bit-rate television coding, is an efficient method
for reducing the temporal redundancy in a sequence of video
signals. The proposed scheme therefore combines motion
compensation with non-causal prediction based on an interpolative, rather than Markov, representation. However, energy
dispersion occurs in the scheme because the interpolative prediction transform matrix is non-orthogonal. To
solve this problem, we introduce a new conditional pel
replenishment method. In addition, rotation scanning is applied, because feedback quantization is used as the quantizer
in this paper. Simulation results show that the proposed coding scheme achieves an improvement of approximately 0.3–2 dB
at an entropy similar to that of the traditional hybrid coding
method.

and transmission bandwidth reductions by applying this
method.
Conversely, we have developed a hybrid I-frame encoding method based on non-causal interpolative prediction
and differential feedback quantization that utilizes the intra-frame spatial correlation [3]. To verify the efficiency of that
hybrid coding method, we compared it with H.264
for I-frames and found an improvement of approximately 0.5–5 dB [3].
In this paper, a new configuration for P-frame coding is
presented. In designing this hybrid coding scheme, we show
that orthogonal transforms do not need to be considered as
constraints.
II. PROPOSED SCHEME
The proposed coding scheme is shown in Figure 1. In
this model, MC predictive coding is first performed, and then
the residual signal is encoded by an interpolative prediction
(IP) method based on an 8 × 8 block [3]. We term this hybrid
coding method the “MC+IP synthesis configuration”.

Index Terms— non-causal prediction, inter-frame coding, conditional pel replenishment.

I. INTRODUCTION

A. Motion-compensated Predictive Coding
MC predictive coding in the proposed method is identical to that used for inter-frame MC prediction of P-frames in
the H.264 video coding standard. Here, the number of
reference frames is set equal to one.

Motion-compensated (MC) image coding, which takes
advantage of frame-to-frame redundancy to achieve a high
data compression rate, is one of the most popular inter-frame
coding techniques [1]-[2]. For the H.26x family of video coding standards, a motion estimation (ME)/MC coding tool
combined with an orthogonal transform (OT) [15]-[26], such
as the discrete cosine transform (DCT), has been introduced.
This tool now plays an important role in the inter-frame coding field [12]-[14], [27], [28]. By means of conditional pel replenishment and quantization control in the DCT
coefficient domain, the H.26x standards gain considerable
coding efficiency

B. Interpolative Prediction
The residual signal after motion compensation is coded
by an IP method based on 8 × 8 blocks. The encoding matrix
C used in this IP is similar to that presented in Ref. 3 except
for the elements that correspond to four corner pixels of block

Figure 1. Proposed coding scheme

© 2013 ACEEE
DOI: 01.IJSIP.4.1.1137

cess, which correspond to IP errors of the MC residuals, are
sequentially input to the feedback quantizer [4]. Accordingly,
coding errors resulting from the power expansion in the inter-block processing, due to having a non-orthogonal system, can be solved [4]-[5].
D. Conditional Pel Replenishment

Figure 2. MC predictive coding with interpolative prediction

in the inter-block processing. In Ref. 3, we showed
that non-causal interpolative prediction can be realized as a
“transform coding”, so the interpolation part can be
considered a “substitute” for an OT such as the DCT.
The configuration of MC predictive coding with
interpolative prediction is shown in Figure 2.
In our configuration, the prediction error Yn can be
expressed as (1).

Figure 3. Conditional pel replenishment

As already mentioned, because an OT is not employed in
our method, energy is not concentrated, but is distributed
throughout an entire block. Consequently, determining
whether pixels should be quantized is an issue. For this reason,
we introduced conditional pel replenishment to the scheme.
As shown in Figure 3, the data of a previously decoded
frame, which are used for motion compensation, can
also serve as the reference for conditional pel replenishment control before the current pixel data are quantized.
As a result of pel replenishment, the transmission bandwidth
is constrained.
Specifically, the decoded data of the previous frame are
also processed by IP based on 8 × 8 blocks to obtain a set of
reference values; and these values are compared with the
predefined pixel threshold (PTH). As the reference values are
obtained from the decoded data, this conditional pixel
replenishment can be realized without additional overhead
information.
In addition, the differences between the IP outputs of the
current frame and those of the previous frame (the reference values) are
compared with the 4 × 4 sub-block threshold (BTH), which is obtained
by preliminary experiments. In this way, we determine whether the
pixels in each 4 × 4 sub-block of the 8 × 8
block under consideration in the current frame should be quantized. Here,
the decision depends on the decoded data, on the motion vector
information (which the decoder has already received), and on the
current 4 × 4 sub-block data (which have not yet been transferred
to the decoder). It is therefore necessary
to signal, for each 4 × 4 sub-block,
whether it is quantized. As a result, this
conditional pel replenishment can be realized with 1 bit (ON/OFF)
of additional overhead information for every 16 pixels (one 4 × 4
sub-block).
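The sub-block decision described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the text does not state how the IP-output differences inside a sub-block are aggregated before being compared with BTH, so the mean absolute difference is used here as an assumption, and the names `subblock_flags` and `flag_overhead_bits` are hypothetical.

```python
import numpy as np

def subblock_flags(ip_current, ip_reference, bth):
    """Return a 2x2 array of ON/OFF flags, one per 4x4 sub-block of an 8x8 block."""
    diff = np.abs(np.asarray(ip_current, dtype=float)
                  - np.asarray(ip_reference, dtype=float))
    # Axes after reshape: (sub-block row, row inside, sub-block column, column inside).
    activity = diff.reshape(2, 4, 2, 4).mean(axis=(1, 3))
    # Assumption: a sub-block is quantized when its mean absolute difference exceeds BTH.
    return (activity > bth).astype(np.uint8)

def flag_overhead_bits(width, height):
    """Number of 1-bit flags per frame: one flag for every 4x4 sub-block."""
    return (width // 4) * (height // 4)

current = np.zeros((8, 8))
current[0:4, 4:8] = 10.0               # only the top-right sub-block changes
reference = np.zeros((8, 8))
flags = subblock_flags(current, reference, bth=5.0)

bits = flag_overhead_bits(352, 288)    # one CIF frame
bits_per_pixel = bits / (352 * 288)    # 1 bit per 16 pixels
```

For a CIF frame, the ON/OFF flags alone amount to 1/16 bit per pixel before any entropy coding of the flags.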
A threshold replenishment algorithm for adaptive vector
quantization was first proposed by Fowler [6] in 1998, and
has subsequently been used in various coding technology

$Y_n = C_I \left[ X_n - f(X'_{n-1}) \right]$    (1)
Here, Xn is the input 8×8 block signal plus four corner pixels,
arranged as a 68×1 vector after last order scanning; X’n-1 is
the reference vector at exactly the same position in the last reconstructed frame; f ( . ) denotes the MC function, so f(X’n-1)
is the vector X’n-1 after MC processing; and CI is the 68×68
predictive matrix, which can be expressed as follows:

(2)

(4)

(3)

(5)

(6)

A1, A2, and A3 are 8×8 matrices given by (3)–(5).
B is a 4×64 matrix whose transpose is shown in (6);
its entries equal -1 at positions (1,1), (2,8), (3,57), and (4,64)
and 0 at all other positions. O denotes
a zero matrix.
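The description of B can be made concrete with a short sketch. Only the four nonzero positions are taken from the text; the function name `corner_selector` is hypothetical, and the observation that columns 1, 8, 57, and 64 address the four corners of the block assumes that the 8×8 block is flattened row by row.

```python
import numpy as np

def corner_selector(n=8):
    """Build the 4 x (n*n) matrix with -1 at (1,1), (2,n), (3,n*(n-1)+1), (4,n*n)."""
    b = np.zeros((4, n * n))
    # Zero-based column indices of the four nonzero entries.
    columns = [0, n - 1, n * (n - 1), n * n - 1]
    for row, col in enumerate(columns):
        b[row, col] = -1.0
    return b

B = corner_selector()
# One-based (row, column) positions of the nonzero entries.
positions = [(int(r) + 1, int(c) + 1) for r, c in zip(*np.nonzero(B))]

# Under row-major flattening, B picks out (and negates) the four corner pixels.
block = np.arange(64, dtype=float).reshape(8, 8)
picked = B @ block.ravel()
```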
C. Optimal Quantization Scheme
The difference signals output by the interpolative pro
Therefore, once it has been decided whether each 4×4 sub-block is to be quantized, the input signal may need to be reordered. For example, if sub-block #4 does not
need to be quantized, then, because the (f i) of sub-block #4 is the largest
of all sub-blocks and given the definition of
distortion in the feedback quantization system shown in (7),
the input signal must be reordered so that the non-significant sub-block is processed as early as possible. Table I shows this processing.

[7]-[10]. The codebook-free conditional pel replenishment
used in our method is based on that
algorithm, but quantization is performed on each individual
pixel in an 8 × 8 block rather than on vectors. Furthermore,
the reference values are the outputs of IP, not the distortion
measure between a code-vector and the quantization input vector
[6]. A distortion measure therefore does not have to be computed
for each block, which means that the computational time of the proposed
method is less than that of Ref. 6. However, the threshold values
here are predefined and must be changed for each
frame of a video sequence. Improving the threshold selection
process is an area of future work.
E. Rotation Scanning
In this paper, in order to realize pel replenishment
in the spatial domain, we propose a new approach for improving
the image quality: the input 8 × 8 block signal is reordered to
suit the sub-block system.
In a feedback quantization system, the power of the coding
error can be expressed as (7).
(7)
Here, fi is the feedback coefficient for one 8×8 block, and the other
quantity in (7) is the power of the quantization error. When conditional
pel replenishment is performed on 4×4 sub-blocks, a non-significant
sub-block is not quantized; as a result, the power of the
quantization error in (7) is replaced by the power of the prediction error.
Generally, the power of the quantization error is smaller than that of the prediction
error. Therefore, if we can reduce the value of fi at the position of the non-significant sub-block, we can suppress the
increase in the coding error power given by (7). According to
Ref. 4, the value of fi is determined by three matrices: the predictive
matrix CI, the scanning order matrix P, and the transform matrix D.
Because no OT is applied in our system, matrix D is
fixed; the predictive matrix CI, which defines the prediction error, is
also fixed. The scanning order P is therefore the only remaining degree of freedom.
When the scanning order is 4×4×4, as shown in Figure 4,
the fi values of an 8×8 block are as shown in Figure 5.

Figure 5. 8×8 block fi value
TABLE I. ROTATION PROCESSING FOR ALL SITUATIONS

Table I shows how the rotation processing is assigned depending on the pattern of insignificant and significant 4×4 sub-blocks
(expressed as 0 and 1, respectively, at each sub-block position). L denotes a left
(counterclockwise) rotation, R denotes a right
(clockwise) rotation, R2 denotes two right rotations, and “none” means
that no rotation is performed. There are two special cases in this table, marked with a gray background:
0110 → 0111 and 1001 → 0111. In these cases, after the rotation
operation, the last sub-block, #4, is forcibly set to 1 on the basis of preliminary experimental results.
The improvement in coding efficiency obtained by this rotation
operation is shown in Table II.
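The L, R, and R2 operations can be sketched with array rotations. Since the body of Table I is not reproduced in this text, the selection rule below is only a hypothetical stand-in: it accepts the first rotation that leaves a significant sub-block at position #4, and it assumes that the four sub-blocks are laid out as [[#1, #2], [#3, #4]]. The function names are illustrative as well.

```python
import numpy as np

# np.rot90 turns counterclockwise for positive k, so L = 1, R = -1, R2 = 2.
ROTATIONS = (("none", 0), ("L", 1), ("R", -1), ("R2", 2))

def choose_rotation(flags):
    """flags: 2x2 array, 1 = significant; assumed layout [[#1, #2], [#3, #4]]."""
    flags = np.asarray(flags)
    for name, k in ROTATIONS:
        # Hypothetical rule: accept the first rotation that leaves a
        # significant sub-block at position #4 (bottom-right).
        if np.rot90(flags, k)[1, 1] == 1:
            return name, k
    return "none", 0  # no significant sub-block at all

def rotate_block(block, k):
    """Rotate an 8x8 block; its 4x4 sub-blocks move exactly like the 2x2 flags."""
    return np.rot90(block, k)

def unrotate_block(block, k):
    """Inverse rotation, as a decoder would apply it."""
    return np.rot90(block, -k)

flags = np.array([[1, 0], [0, 0]])   # only sub-block #1 is significant
name, k = choose_rotation(flags)
block = np.arange(64).reshape(8, 8)
restored = unrotate_block(rotate_block(block, k), k)
```

Because the rotation is derived from the significance pattern alone, a decoder that receives the ON/OFF flags could in principle repeat the same choice and undo the rotation without extra side information.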

Figure 4. 4×4×4 scanning order

Each sub-block’s

is shown as follow:

(8)

F. Features of Proposed Scheme
The proposed inter-frame coding scheme has three characteristic features:
• An OT is not used.
• Conditional pel replenishment is performed without
additional overhead information.
• A new hybrid coding framework, MC+IP combined with
feedback quantization, is employed.

errors will be expanded when decoded, exists in the proposed
method. As a result, feedback quantization is necessary.
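The power expansion problem can be illustrated numerically. The sketch below is not the paper's 68×68 matrix CI: it uses a simple 1-D interpolative predictor, y[i] = x[i] - 0.5 (x[i-1] + x[i+1]), as an assumed example of a non-orthogonal transform, and a random orthogonal matrix as a stand-in for an OT such as the DCT. For unit-variance white quantization noise q, the decoded error is the inverse transform applied to q, so its mean power per sample is trace(T⁻¹ T⁻ᵀ) / N.

```python
import numpy as np

def error_power_gain(transform):
    """Mean decoded error power per sample for unit-variance white quantization noise."""
    inverse = np.linalg.inv(transform)
    return np.trace(inverse @ inverse.T) / transform.shape[0]

n = 8
# Illustrative 1-D interpolative predictor: y[i] = x[i] - 0.5 * (x[i-1] + x[i+1]).
ip_matrix = np.eye(n) - 0.5 * (np.eye(n, k=1) + np.eye(n, k=-1))
# A random orthogonal matrix stands in for an OT such as the DCT.
orthogonal, _ = np.linalg.qr(np.random.default_rng(0).standard_normal((n, n)))

gain_ip = error_power_gain(ip_matrix)    # greater than 1: errors expand on decoding
gain_ot = error_power_gain(orthogonal)   # equal to 1: error power is preserved
```

In this toy setting the gain of the orthogonal transform is exactly one, whereas the interpolative matrix amplifies the error, which is the motivation for the feedback quantizer.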
III. SIMULATION
We now present simulation results obtained with the
proposed P-frame coding scheme and compare
our method with the H.264 baseline [29]. Because the decoded
I-frame is used as the first reference frame for motion
compensation, the first frame of each test sequence is coded
by the H.264 baseline I-frame coder with the same parameter
values in both methods, which eliminates the influence of the I-frame.
The first seven frames of two CIF (352 × 288) test video
sequences, foreman and bus, obtained from the YUV
Video Sequences website [11], were used. The MC coding parameters
(identical for both methods) are set as follows:

TABLE II. CODING EFFICIENCY IMPROVEMENT BY THE ROTATION OPERATION

• Search range: 32 pixels.
• Total number of reference frames: 1.
• ME scheme: fast full search.
• PSliceSearch8x8: 1 (used, all other types are not used).
• DisableIntraInInter: 1 (Disable Intra mode for inter slice).
• Rate-distortion-optimized mode decision: used.
Besides these parameters, the threshold values of the
proposed scheme are adapted according to the input frames.
A. Comparison of Prediction Errors
TABLE III. STATISTIC VALUES OF RESIDUAL SIGNALS

We first compare the prediction errors.
This comparison shows the
distribution of the errors obtained by the proposed scheme
and whether spatial correlation exists in the signal after motion
compensation. Table III lists several statistical values for both
methods that reflect the distribution of their residual signals.
In this table, “PM” stands for “proposed method”;
“Entropy” is calculated based on Shannon's theory, and “Average
Error” is the average value of two signal powers. “Number of
0’s” is the number of pixels that are accurately predicted.
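Statistics of this kind can be computed from a residual frame as sketched below. The entropy is the first-order Shannon entropy of the residual values in bits per pixel. The exact definition of “Average Error” is not fully specified in the text, so the mean squared residual is used here as an assumed signal power; the function name `residual_stats` is hypothetical.

```python
import numpy as np

def residual_stats(residual):
    """Return (entropy in bit/pixel, mean power, number of zero-valued pixels)."""
    r = np.asarray(residual).astype(np.int64).ravel()
    _, counts = np.unique(r, return_counts=True)
    p = counts / r.size
    entropy = float(-(p * np.log2(p)).sum())          # first-order Shannon entropy
    power = float(np.mean(r.astype(np.float64) ** 2)) # assumed "signal power"
    zeros = int(np.count_nonzero(r == 0))             # exactly predicted pixels
    return entropy, power, zeros

entropy, power, zeros = residual_stats([[0, 0], [1, -1]])
```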
Figure 6 shows the prediction errors of the first P-frame at
each pixel position when using the first frame of foreman.cif
as a test image. Here, the x-axis expresses the pixel position
and the y-axis expresses the value of prediction error.
The distribution of the prediction errors for the proposed
method is clearly more concentrated around zero than that
for H.264, and the residual signal power is about 37.6% lower
under the proposed method. Therefore, we consider that our
scheme provides improved coding efficiency if an appropriate
quantization method is employed.

An OT is not used in our coding scheme; instead an IP
method is adopted. In fact, an OT does not utilize the relations between pixels but merely transforms the signal from
the spatial to frequency domain. In contrast, an IP method
can compress the signal by eliminating the correlation between pixels within the frame. Generally, the MC prediction
error is independent of time; however, a spatial correlation
still exists. Accordingly, we have replaced the OT by an IP in
our method, because in Ref. 3 we showed that IP can be
achieved as a “transform coding”.
Conversely, the non-orthogonality of the IP transform matrix
means that the power expansion problem, in which coding
B. Comparison of Coding Efficiency
Next, we show the coding results of the proposed method
and H.264. As stated before, the threshold values in our
method must be changed for each frame, and these thresholds are obtained by a preliminary experiment. Since PTH
and BTH are typically within a certain range, the results presented here are also limited. For this reason, the conditional
pel replenishment used here has the potential for improvement.

0–10 frames for other test sequences) and the vertical axes
express the entropy (bit/pixel; upper plot) and peak signal-to-noise ratio (PSNR; dB; lower plot) of each test image. Although the number of bits required by the proposed method
(its entropy) is approximately equal to that of H.264, PSNR
for the proposed method is consistently higher (the average
improvement is about 0.3–2 dB).
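For reference, the PSNR values on the vertical axis follow the usual definition, 10·log10(peak² / MSE). The sketch below assumes 8-bit luminance samples (peak value 255), which the text does not state explicitly; the function name `psnr` is illustrative.

```python
import numpy as np

def psnr(original, decoded, peak=255.0):
    """Peak signal-to-noise ratio in dB between two frames of equal size."""
    o = np.asarray(original, dtype=np.float64)
    d = np.asarray(decoded, dtype=np.float64)
    mse = np.mean((o - d) ** 2)
    if mse == 0:
        return float("inf")   # identical frames
    return float(10.0 * np.log10(peak ** 2 / mse))

original = np.full((288, 352), 100, dtype=np.uint8)   # flat CIF-sized frame
decoded = original + 1                                # every pixel off by one
value = psnr(original, decoded)                       # MSE = 1
```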

Figure 8. Comparison of coding efficiency for bus.cif

Figure 6. Comparison of prediction errors

Figure 9. Comparison of coding efficiency for flower.cif

Figure 7. Comparison of coding efficiency for foreman.cif

Figures 7–10 compare the coding efficiency of the two methods
for the test sequences foreman, bus,
flower, and highway, respectively. ρh and ρv denote the correlation of the test image in the horizontal and vertical directions,
respectively.
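One common way to obtain such correlation values is the correlation coefficient between horizontally or vertically adjacent pixels. The text does not say how ρh and ρv were measured, so the sketch below is an assumption, and the function name `adjacent_correlation` is hypothetical.

```python
import numpy as np

def adjacent_correlation(image):
    """Return (rho_h, rho_v): correlation between horizontally / vertically adjacent pixels."""
    x = np.asarray(image, dtype=np.float64)
    rho_h = np.corrcoef(x[:, :-1].ravel(), x[:, 1:].ravel())[0, 1]
    rho_v = np.corrcoef(x[:-1, :].ravel(), x[1:, :].ravel())[0, 1]
    return float(rho_h), float(rho_v)

# Synthetic test image: a horizontal ramp whose sign alternates from row to row,
# so vertical neighbours are exact negatives of each other.
image = np.tile(np.arange(8.0), (8, 1))
image[::2] *= -1
rho_h, rho_v = adjacent_correlation(image)
```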
In these plots, the horizontal axis expresses the frame
number of the coded frames (0–20 frames for foreman.cif and

Figure 10. Comparison of coding efficiency for highway.cif
Figure 13. Coding efficiency for adaptive mode selection

Table IV shows the coding results when the proposed
method is applied to the bus CIF video sequence. Here, Q(.)
is the average entropy of all pixels in the frame; H(.) expresses
the overhead (motion vectors) entropy.
TABLE IV. CODING RESULTS OF PROPOSED METHOD

TABLE V. ENTROPY OF MOTION VECTORS

Figure 14. PinP configuration

to be coded in inter-frame mode; the amount of transmitted bits is almost the same as that of the proposed adaptive scheme.
If the coding mode is fixed to inter coding, as shown here,
the improvement in coding efficiency begins at the second P-frame, and after two frames have been coded, the efficiency eventually approaches that of the
adaptive system. The arrow in this figure indicates
how much improvement can be achieved by the proposed
adaptive scheme.
On the other hand, because an adaptive intra/inter mode
selection method with overhead information is applied
in the H.264 standard, no significant decrease in PSNR occurs
when the PinP test sequence shown in Figure 14 is used.
In addition, in the H.264 scheme, because the intra-frame
coding mode can be selected at any time throughout the
entire coding period, the overhead for every macro-block is
about 0.01 bit/pixel more than that of the proposed method. However,
a disadvantage of the proposed method is that its adaptive mode
selection is not flexible enough. Improving this point is left
for future work.

C. Comparison of Motion Vectors
Table V lists the entropy of the motion vectors in each frame
for the two methods. Moreover, Figure 11 shows the decoded
frames obtained by the two methods for four test sequences,
where lines denote the motion vectors. The proposed method
clearly requires a smaller amount of motion-vector information than H.264. The corresponding decoded frames
without motion vectors are shown in Figure 12.
D. Effect of Adaptive Coding Between Intra/Inter Modes
Figure 13 shows the simulation results for adaptive
coding between the inter and intra coding modes. Insertion of the
PinP (picture-in-picture configuration, as shown in Figure 14)
is set to start at the first frame and to be completed at the eighth
frame. In Figure 13, the PSNR of “inter mode only” is the
PSNR obtained when all frames from the second one onward are forced
CONCLUSIONS
In this paper, we proposed a new inter-frame coding
scheme in which the OT (e.g., a DCT) used in conventional
hybrid coding schemes is replaced by a non-causal IP. Application of this IP can potentially reduce the amount of signal

Figure 11. Comparison of motion vectors for the decoded frame
obtained by (left) H.264 and (right) the proposed scheme

Figure 12. Comparison of decoded frame obtained by (left) H.264
and (right) the proposed scheme

power a priori. Since IPs make use of the spatial correlations
between pixels, we consider them more effective than OTs
(which merely perform domain transforms) for residual data
that have been compressed by MC inter-frame prediction.
However, combining our method with an OT is also of great
research interest, as shown in Section II of this paper. We
have also introduced conditional pel replenishment to our scheme. Moreover, no additional overhead information is added by employing this method. Our model thus
has the three characteristic features listed in Section II-F. A
comparison between the simulation results of the proposed
method and H.264 in Section III showed that, for
four test sequences, the proposed scheme achieves an
improvement of approximately 0.3–2 dB at an entropy similar to
that of the H.264 baseline.

As future work, the conditional pel replenishment method
used in our scheme must be improved, and this should be
addressed first. Another question to be explored is
whether the proposed scheme can maintain high coding
efficiency when the test sequence becomes large.
In conclusion, we have introduced a different approach
for P-frame hybrid coding that utilizes the spatial correlation
of the MC residual signal. Since our hybrid video coding
method achieved high coding efficiency without employing
an OT, we have shown the feasibility of non-orthogonal
transforms for effective coding.


ACKNOWLEDGMENT
The authors wish to thank all researchers from Picture
Coding Symposium of Japan (PCSJ). They gave us a lot of
constructive suggestions. This work was supported in part
by a grant from KAKENHI (23560436).

[14] Ming Liou, “Overview of the px64 kbps video coding standard,”
Communications of ACM, vol. 34, no. 4, pp. 60-63, 1991.
[15] N. Ahmed, T. Natarajan, and K. R. Rao, “Discrete Cosine
Transform,” IEEE Transactions on Communications, vol.
COM-23, pp. 90-93, Jan. 1974.
[16] M. Antonini, M. Barlaud, P. Mathieu, and I. Daubechies,
“Image coding using wavelet transform,” IEEE Transactions
on Image Processing, vol. 1, no. 2, pp. 205-221, Apr. 1992.
[17] Ze-Nian Li and Mark S. Drew, Fundamentals of Multimedia,
Pearson Education, New Delhi, India, 2004.
[18] Shunsuke MORI, Akira KUBOTA, Yoshinori HATORI,
“Examination of Hybrid Coding Method by Interpolative
Prediction and DCT Quantization”, IEVC 2010, 2C-3, Nice,
France, (Mar, 2010)
[19] C. E. Shannon, “A Mathematical theory of Communication,”
Bell System Technical Journal, vol. 27, pp. 623-656, 1948.
[20] Keith Jack, Video Demystified, Mumbai: Penram International
Publishing Pvt. Ltd., 2001.
[21] W. K. Pratt, “Karhunen-Loeve transform coding of images,”
presented at the 1970 IEEE Int. Symp. Information Theory,
June 1970.
[22] W. K. Pratt, J. Kane and H. C. Andrews, “Hadamard transform
image coding,” Proc. IEEE, vol. 57, pp. 58-68, Jan. 1969.
[23] MACHIZAWA Akihiko, TANAKA Mamoru, “Image data
compression method based on interpolative DPCM by area
decomposition,” IEICE Transactions, J69-D, 3, pp. 375-382
(1986).
[24] MACHIZAWA Akihiko, “An Analysis and a Reduction
Method of Coding Error in Interpolative DPCM,” IEICE
Transactions, J75-D-II, 9, pp. 1565-1572 (1992).
[25] Takahiro FUJITA, Akira KUBOTA, Yoshinori HATORI,
“Adaptive Quantization Ordering in Feedback Quantization
for Interpolative Prediction Coding,” ITE Journal, 63, 11, pp.
1652–1658 (2009).
[26] Anshul Sehgal, Ashish Jagmohan and Narendra Ahuja, “Wyner-Ziv Coding of Video: An Error-Resilient Compression
Framework,” IEEE Transactions on Multimedia, vol. 6, no. 2,
April 2004.
[27] Jae-Young Pyun, “Adaptive Video Redundancy Coding for
Scene and Channel Adaptation over Error-Prone
Network,” IEEE Transactions on Consumer Electronics, vol.
51, no. 3, August 2005.
[28] Mihaela v.d. Schaar-Mitrea, Peter H.N. de With, “Hybrid
Compression of video with graphics in DTV communication
systems,” IEEE Transactions on Consumer Electronics, vol.
46, no. 4, November 2000.
[29] H.264/14496-10 AVC Reference Software Manual,
Jan. 2009.

REFERENCES
[1] A. N. Netravali and J. D. Robbins, “Motion-compensated
television coding: Part I,” Bell Syst. Tech. J., vol. 58, no. 3,
pp. 631-670, Mar. 1979.
[2] N.S.Jayant and P.Noll, Digital Coding of Waveforms. Englewood
Cliffs, NJ: Prentice Hall, 1984
[3] Cui Wang, Akira KUBOTA, Yoshinori Hatori “A NEW
HYBRID PARALLEL INTRA CODING METHOD BASED
ON INTERPOLATIVE PREDICTION,” Picture Coding
Symposium (PCS), 2010
[4] Yoshinori HATORI, “Optimal Quantizing Scheme in
Interpolative Prediction,” The Journal of the Institute of
Electronics, Information and Communication Engineers,
vol.J66-B No.5,Tokyo,1983.
[5] Cui Wang, Yoshinori Hatori “A parallel hybrid video coding
method based on noncausal prediction with multimode,”
ISVC’11 Proceedings of the 7th international conference on
Advances in visual computing – Volume Part II.
[6] J. E. Fowler, “Generalized threshold replenishment: An adaptive
vector quantization algorithm for the coding of nonstationary
sources,” IEEE Trans. Image Processing, vol. 7, pp. 1410–
1424, Oct. 1998.
[7] de Lima Filho, E.B., da Silva, E.A.B., de Carvalho, M.B., Pinage,
F.S., “Universal Image Compression Using Multiscale
Recurrent Patterns With Adaptive Probability Model”, Image
Processing, IEEE Transactions on, On page(s): 512 - 527,
Volume: 17 Issue: 4, April 2008
[8] Shaou-Gang Miaou, Heng-Lin Yen, “Multichannel ECG
compression using multichannel adaptive vector quantization”,
Biomedical Engineering, IEEE Transactions on, On page(s):
1203 - 1207, Volume: 48 Issue: 10, Oct. 2001
[9] Fowler, J.E., “Adaptive vector quantization for efficient
zerotree-based coding of video with nonstationary statistics”,
Circuits and Systems for Video Technology, IEEE Transactions
on, On page(s): 1478 - 1488, Volume: 10 Issue: 8, Dec 2000
[10] Guobin Shen, Bing Zeng, Liou, M.-L., “Adaptive vector
quantization with codebook updating based on locality and
history”, Image Processing, IEEE Transactions on, On page(s):
283 - 295, Volume: 12 Issue: 3, March 2003
[11] A. K. Jain, “Image Coding via a Nearest Neighbors Image
Model,” IEEE Transactions on Communications, vol. COM-23, pp. 318-331, Mar. 1975.
[12] Gregory K. Wallace, “The JPEG still picture compression
standard,” Communications of ACM, vol. 34, no. 4, pp. 31-44, 1991.
[13] Didier Le Gall, “MPEG: A video compression standard for
multimedia applications,” Communications of ACM, vol. 34,
no. 4, pp. 47-58, 1991.

This research was supported by KAKENHI (23560436).
Corresponding author: Cui Wang,
Interdisciplinary Graduate School of Science and Engineering, Tokyo
Institute of Technology, Tokyo, Japan
wang.c.ad@m.titech.ac.jp
