Perceptual Video Coding: Research Progress

Dr. Li Song
Associate Professor, SJTU
Visiting Associate Professor, SCU
2012.09

Outline

- Introduction
- Perceptual Cues in Video Coding
- Recent Research
  - JND-based RDO
  - SSIM-based RDO
  - Analysis-Completion Framework
- Summary & References

Perceptually Lossless Images

Figure: PIC-coded image at 0.914 bits/pixel beside the original.

[T. Pappas, Visual Signal Analysis and Compression, ICIP 2010]

Perceptual Video Coding Technique

Figure: (digital) video passes through the codec (encoder + decoder) to the human visual system (HVS), the end recipient; rate R and distortion D are the dimensions of coder performance.

Basic principle of perceptual coding:
- treat all data that humans cannot perceive as superfluous, and discard it.

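Rate-Distortion Theory

$x \rightarrow Q \rightarrow \hat{x}$

Quantization noise: $e = X - \hat{X}$

$$D = \sum_{i=1}^{N} p_i (x_i - \hat{x}_i)^2$$

where $p_i$ are the probabilities.

If X follows a Gaussian distribution $N(0, \sigma^2)$:

$$D(R) = \sigma^2 2^{-2R}$$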
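To make the Gaussian bound concrete, the sketch below (our illustration, not taken from the cited works) quantizes i.i.d. N(0,1) samples with a uniform scalar quantizer, takes the empirical entropy of the quantizer indices as the rate (an idealized entropy coder), and compares the measured MSE with $\sigma^2 2^{-2R}$. The function names, sample count, and step sizes are hypothetical choices.

```python
import math
import random
from collections import Counter


def gaussian_dr_bound(sigma2: float, rate_bits: float) -> float:
    """Shannon distortion-rate bound D(R) = sigma^2 * 2^(-2R) for an i.i.d. Gaussian source."""
    return sigma2 * 2.0 ** (-2.0 * rate_bits)


def uniform_quantizer_rd(samples, step):
    """Mid-tread uniform quantizer with the given step size.

    Returns (rate, mse): rate is the empirical entropy of the quantizer
    indices in bits/sample, mse is the mean squared quantization error.
    """
    n = len(samples)
    indices = [round(x / step) for x in samples]
    mse = sum((x - step * k) ** 2 for x, k in zip(samples, indices)) / n
    counts = Counter(indices)
    rate = -sum(c / n * math.log2(c / n) for c in counts.values())
    return rate, mse


if __name__ == "__main__":
    rng = random.Random(0)
    samples = [rng.gauss(0.0, 1.0) for _ in range(20000)]
    for step in (1.0, 0.5, 0.25):
        rate, mse = uniform_quantizer_rd(samples, step)
        bound = gaussian_dr_bound(1.0, rate)
        print(f"step={step:4}: R={rate:.3f} bits, MSE={mse:.5f}, bound={bound:.5f}")
```

The measured MSE stays above the bound; at high rates, entropy-coded uniform quantization sits roughly a factor of πe/6 (about 1.53 dB) above it.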

Gap between theory and real codecs

SPIHT can beat the "Shannon bound" because that bound is computed under a Gaussian prior, which is not valid for images!

Figure: rate-distortion curves achieved with the SPIHT coder (dashed line) and the Shannon RD theoretical bounds (solid line) for an i.i.d. zero-mean Gaussian model of each wavelet subband (Gaussian vector source).
[A. Ortega et al., IEEE Signal Processing Magazine, 1998]
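The solid-line bound is the RD function of a Gaussian vector source with one variance per subband. A standard way to compute such a bound is reverse water-filling: each component with variance above a water level θ gets distortion θ and rate ½·log2(σ²/θ), and components below θ get zero rate. The sketch below assumes equally weighted components and hypothetical subband variances; sweeping θ traces the curve.

```python
import math


def reverse_water_filling(variances, theta):
    """Rate-distortion point of an independent Gaussian vector source.

    variances: per-component (e.g. per-subband) variances.
    theta: water level; components with variance <= theta are not coded.
    Returns (rate, distortion), both averaged per component (bits, MSE).
    """
    n = len(variances)
    dist = [min(theta, v) for v in variances]
    rate = [0.5 * math.log2(v / theta) if v > theta else 0.0 for v in variances]
    return sum(rate) / n, sum(dist) / n


if __name__ == "__main__":
    # Hypothetical subband variances, from a coarse (high-energy) to a fine subband.
    subband_vars = [400.0, 60.0, 15.0, 4.0, 1.0]
    for theta in (0.5, 2.0, 8.0, 32.0):
        r, d = reverse_water_filling(subband_vars, theta)
        print(f"theta={theta:5}: R={r:.3f} bits/coef, D={d:.3f}")
```

With a single variance this reduces to the scalar formula D(R) = σ²2^(−2R) above.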

HEVC: MSE vs MOS

                     Random Access   Low Delay
Class A              −36.9%          n/a
Class B              −39.4%          −40.3%
Class C              −30.1%          −31.5%
Class D              −28.3%          −29.2%
Class E              n/a             −41.2%
Class F              −26.2%          −28.8%
Average              −32.5%          −34.2%
Average without F    −34.0%          −35.5%
[from: JCTVC-I0409, 2012] [from: JCT-VC Summary, 8th JCT-VC]

There is a gap of more than 20% between MSE and MOS!

Ideal perceptual metric

Half a century of effort, and still an open problem!
Many metrics have been proposed: SSIM/M-SSIM/CW-SSIM, VIF, VQM, …

[Figure from: N. Jayant, Proceedings of the IEEE, 1993]

What about the popular SSIM?

[JCTVC-H0063,2012]
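For reference, SSIM combines luminance, contrast, and structure comparisons. The sketch below evaluates it over a single window (the whole signal) with the commonly used constants C1 = (0.01L)² and C2 = (0.03L)² for dynamic range L; the standard index instead averages this quantity over local windows (typically an 11×11 Gaussian window). The function name and the test signals are ours.

```python
def ssim_single_window(x, y, data_range=255.0):
    """SSIM between two equal-length pixel sequences, computed over one window.

    SSIM = ((2*mx*my + C1) * (2*cov + C2)) / ((mx^2 + my^2 + C1) * (vx + vy + C2))
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mx = sum(x) / n
    my = sum(y) / n
    vx = sum((a - mx) * (a - mx) for a in x) / (n - 1)
    vy = sum((b - my) * (b - my) for b in y) / (n - 1)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx * mx + my * my + c1) * (vx + vy + c2))


if __name__ == "__main__":
    ref = [52, 55, 61, 66, 70, 61, 64, 73]
    brighter = [v + 20 for v in ref]       # same structure, shifted mean
    flipped = ref[::-1]                    # same histogram, different structure
    print(ssim_single_window(ref, ref))
    print(ssim_single_window(ref, brighter))
    print(ssim_single_window(ref, flipped))
```

The brightness shift is penalized only through the luminance term, while reordering the same pixel values destroys the structure term; MSE cannot make this distinction from the histogram alone.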

Where are perceptual models used today?

[Pourazad, IEEE Consumer Electronics Magazine, 2012]

Frequency Masking for JPEG
A DCT-based encoder incorporating human visual frequency weighting (L.W. Chang, 2001)

Figure: modulation transfer function (MTF) or quantization matrix (QM)

We can do better with a fine adjustment factor!
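As an illustration of frequency weighting (not L.W. Chang's method), the sketch below derives an 8×8 quantization matrix from the Mannos-Sakrison contrast sensitivity function H(f) = 2.6(0.0192 + 0.114f)·exp(−(0.114f)^1.1) [Mannos and Sakrison, 1974]. The viewing condition `pixels_per_degree`, the base step `q_min`, and treating all frequencies below the CSF peak as fully visible are hypothetical choices.

```python
import math


def mannos_sakrison_csf(f_cpd: float) -> float:
    """Mannos-Sakrison (1974) contrast sensitivity at spatial frequency f (cycles/degree)."""
    a = 0.114 * f_cpd
    return 2.6 * (0.0192 + a) * math.exp(-(a ** 1.1))


def csf_quant_matrix(n=8, pixels_per_degree=32.0, q_min=16.0):
    """Hypothetical CSF-weighted n x n quantization matrix (step = q_min / normalized CSF)."""
    # Locate the CSF peak numerically; frequencies below it are treated as fully visible.
    grid = [k * 0.01 for k in range(1, 6001)]  # 0.01 .. 60 cycles/degree
    f_peak = max(grid, key=mannos_sakrison_csf)
    s_peak = mannos_sakrison_csf(f_peak)
    qm = []
    for i in range(n):
        row = []
        for j in range(n):
            # DCT basis (i, j) has i/(2n) and j/(2n) cycles per pixel; convert to cycles/degree.
            f = math.hypot(i, j) / (2 * n) * pixels_per_degree
            s = 1.0 if f <= f_peak else mannos_sakrison_csf(f) / s_peak
            row.append(round(q_min / s))  # less visible frequencies get coarser steps
        qm.append(row)
    return qm


if __name__ == "__main__":
    for row in csf_quant_matrix():
        print(" ".join(f"{q:3d}" for q in row))
```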

HEVC QM Design
- HEVC default quantization matrices
  - Intra 8x8 QM: uses the same QM developed for JPEG in 1999.
  - Intra 4x4 QM: sub-sampled from the 8x8 intra QM
  - Intra 16x16 and 32x32 QMs: up-sampled from the 8x8 intra QM
  - Inter QMs: predicted from the intra QMs, using the linear relationship between the intra QMs and the corresponding inter QMs in AVC/H.264

[JCT-VC I012] & [L.W. Chang 2001]
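The resampling relations above can be sketched as follows. We assume the 4×4 matrix keeps every other row and column of the 8×8 matrix and that larger sizes replicate each entry into a k×k block; the exact HEVC rules (including the separately signaled DC entry for 16×16 and 32×32) are defined in the HEVC specification, and the matrix values here are hypothetical.

```python
def downsample_qm(qm, factor=2):
    """Sub-sample a square QM by keeping every `factor`-th row and column."""
    return [row[::factor] for row in qm[::factor]]


def upsample_qm(qm, factor):
    """Up-sample a square QM by replicating each entry into a factor x factor block."""
    size = len(qm) * factor
    return [[qm[i // factor][j // factor] for j in range(size)] for i in range(size)]


if __name__ == "__main__":
    # Hypothetical 8x8 intra QM: entries grow with the diagonal index i + j.
    qm8 = [[16 + 2 * (i + j) for j in range(8)] for i in range(8)]
    qm4 = downsample_qm(qm8)        # 4x4
    qm16 = upsample_qm(qm8, 2)      # 16x16
    qm32 = upsample_qm(qm8, 4)      # 32x32
    print(qm4[0], len(qm16), len(qm32))
```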

Local spatio-temporal contrast sensitivity of luminance perception

JND in the classic DCT domain, for block n and DCT coefficient position (i, j):

$$T_{JND}(n,i,j) = T_{basic}(n,i,j) \cdot F_{lum}(n) \cdot F_{contrast}(n,i,j) \cdot F_{temporal}(n,i,j)$$

- $T_{basic}$: the basic threshold (spatial frequency)
- $F_{lum}$: the luminance adaptation factor (luminance sensitivity)
- $F_{contrast}$: the contrast masking factor (plane, edge, texture, etc.)
- $F_{temporal}$: the temporal modulation factor (motion, frame rate, etc.)

[Zhenyu Wei et al., IEEE T-CSVT, 2009]
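The multiplicative model maps directly to code. In the sketch below all factor values are hypothetical inputs (the cited work defines how each is computed). The suppression step, which zeroes residual DCT coefficients whose magnitude falls below their JND threshold, is one simple way to use such a threshold and is not claimed to be the exact rule of any cited scheme.

```python
def jnd_thresholds(t_basic, f_lum, f_contrast, f_temporal):
    """T_JND(n,i,j) = T_basic(n,i,j) * F_lum(n) * F_contrast(n,i,j) * F_temporal(n,i,j) for one block n.

    t_basic, f_contrast, f_temporal: N x N lists indexed by DCT position (i, j).
    f_lum: scalar luminance adaptation factor of the block.
    """
    n = len(t_basic)
    return [[t_basic[i][j] * f_lum * f_contrast[i][j] * f_temporal[i][j]
             for j in range(n)] for i in range(n)]


def suppress_below_jnd(residual, t_jnd):
    """Zero every residual DCT coefficient whose magnitude is below its JND threshold."""
    return [[0.0 if abs(c) < t else c for c, t in zip(r_row, t_row)]
            for r_row, t_row in zip(residual, t_jnd)]


if __name__ == "__main__":
    # Hypothetical 2x2 block; real blocks are 4x4 or 8x8.
    t = jnd_thresholds(t_basic=[[2.0, 3.0], [3.0, 5.0]], f_lum=1.5,
                       f_contrast=[[1.0, 1.0], [1.0, 2.0]],
                       f_temporal=[[1.0, 1.25], [1.25, 1.5]])
    print(t)
    print(suppress_below_jnd([[4.0, -5.0], [6.0, -20.0]], t))
```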

Different Embedding Schemes

Figure: schemes for embedding the JND model in the codec, from:
[X. Yang, TCSVT, 2005]
[Ours, ISCAS 2010] & [TCSVT (accepted)]
[Z. Chen, TCSVT, 2010] & [M. Naccari, TCSVT, 2011]

The Proposed Coding Framework

Figure: block diagram. Input → T → Q → adaptive suppression → entropy coding → output, with the reconstruction loop inverse Q → inverse T → frame buffer → intra or inter prediction. Perceptual blocks: JND calculation and translation, adjustment threshold calculation, Lagrange multiplier adaptation with D = D1(Q) + D2(JND), and motion vector scaling.
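The Lagrangian mode decision in this framework picks, for each block, the candidate minimizing J = D + λR. The sketch below uses D = D1 + D2 as in the diagram, with both terms supplied by the caller because the exact form of D2(JND) is not given here; candidate names and numbers are hypothetical. λ follows the mode-decision multiplier commonly used in the H.264/AVC JM software, λ = 0.85·2^((QP−12)/3); the framework's own adaptation of this multiplier is not reproduced.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    d1: float    # quantization distortion D1(Q), e.g. SSE
    d2: float    # perceptual term D2(JND), computed elsewhere (form assumed)
    rate: float  # bits


def jm_lambda_mode(qp: int) -> float:
    """Mode-decision Lagrange multiplier commonly used in the H.264/AVC JM software."""
    return 0.85 * 2.0 ** ((qp - 12) / 3.0)


def choose_mode(candidates, lam):
    """Return the candidate minimizing J = (D1 + D2) + lambda * R."""
    return min(candidates, key=lambda c: (c.d1 + c.d2) + lam * c.rate)


if __name__ == "__main__":
    lam = jm_lambda_mode(28)
    modes = [Candidate("SKIP", 5200.0, 300.0, 1.0),
             Candidate("INTER16x16", 2100.0, 150.0, 38.0),
             Candidate("INTRA4x4", 900.0, 80.0, 150.0)]
    print(f"lambda={lam:.2f}, chosen={choose_mode(modes, lam).name}")
```

A larger λ (higher QP) shifts the choice toward cheaper modes; at λ = 0 the lowest-distortion mode always wins.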

Bit Saving

Sequence      QP   Bitrate (kbps)                      Bitrate reduction vs. JM 14.2 (%)
                   JM 14.2     Chen's      Proposed    Chen's    Proposed
Cyclists      20    7945.83     6889.50     5149.85    13.29     35.19
Cyclists      24    3165.17     2660.42     2436.40    15.95     23.02
Cyclists      28    1343.73     1103.82     1138.30    17.85     15.29
Cyclists      32     658.92      543.16      612.40    17.57      7.06
Harbour       20   25104.43    23734.86    15822.41     5.46     36.97
Harbour       24   13496.66    12290.08     8843.39     8.94     34.48
Harbour       28    6054.17     5336.50     4557.15    11.85     24.73
Harbour       32    2909.30     2607.64     2588.25    10.37     11.04
Night         20   20306.64    18749.84    11330.19     7.67     44.20
Night         24    9688.57     8714.15     6239.72    10.06     35.60
Night         28    4507.60     4036.23     3430.19    10.46     23.90
Night         32    2311.90     2088.36     2050.42     9.67     11.31

Bit Saving (continued)

Sequence      QP   Bitrate (kbps)                      Bitrate reduction vs. JM 14.2 (%)
                   JM 14.2     Chen's      Proposed    Chen's    Proposed
Raven         20    7135.21     6568.93     4147.18     7.94     41.88
Raven         24    3193.59     2850.05     2201.83    10.76     31.05
Raven         28    1537.32     1346.20     1189.10    12.43     22.65
Raven         32     803.07      705.19      710.89    12.19     11.48
Sheriff       20   13951.79    12986.99     7317.07     6.92     47.55
Sheriff       24    6472.74     5838.45     3739.43     9.80     42.23
Sheriff       28    2665.81     2361.96     1817.07    11.40     31.84
Sheriff       32    1159.36     1032.24      963.12    10.96     16.93
SpinCalendar  20   25071.25    21394.72    11108.62    14.66     55.69
SpinCalendar  24    7878.49     5930.58     4548.43    24.72     42.27
SpinCalendar  28    2653.01     2194.53     2046.35    17.28     22.87
SpinCalendar  32    1315.22     1129.24     1177.62    14.14     10.46
Average (all six sequences)                            12.18     28.32
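The reduction columns follow from the bitrate columns as (R_JM − R_X) / R_JM × 100, and the Average row is the mean over all 24 sequence/QP rows. A short check in Python on the Cyclists rows (the helper name is ours):

```python
def bitrate_reduction(anchor_kbps: float, test_kbps: float) -> float:
    """Percentage bitrate reduction of `test` relative to the anchor (JM 14.2)."""
    return (anchor_kbps - test_kbps) / anchor_kbps * 100.0


# Cyclists rows from the table: QP -> (JM 14.2, Chen's, Proposed) in kbps.
cyclists = {
    20: (7945.83, 6889.50, 5149.85),
    24: (3165.17, 2660.42, 2436.40),
    28: (1343.73, 1103.82, 1138.30),
    32: (658.92, 543.16, 612.40),
}

for qp, (jm, chen, proposed) in cyclists.items():
    print(f"QP {qp}: Chen's {bitrate_reduction(jm, chen):.2f}%, "
          f"Proposed {bitrate_reduction(jm, proposed):.2f}%")
```

Note that these are per-QP savings at equal QP, not savings at equal objective quality.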

Frame Differences

Figure: frame 88 at QP=20: JM 14.2, ours, and their difference.

Figure: frame 102 at QP=20: JM 14.2, ours, and their difference.
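Difference images like these are usually produced by taking the per-pixel absolute difference between two decoded frames and amplifying it for display. A minimal sketch on 8-bit luma arrays, with the display gain of 8 as a hypothetical choice, alongside the PSNR that MSE-based comparisons report:

```python
import math


def difference_map(frame_a, frame_b, gain=8):
    """Per-pixel |a - b|, amplified by `gain` and clipped to 8 bits for viewing."""
    return [[min(255, gain * abs(a - b)) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(frame_a, frame_b)]


def psnr(frame_a, frame_b, peak=255.0):
    """PSNR in dB between two equally sized frames; inf if they are identical."""
    errs = [(a - b) ** 2 for row_a, row_b in zip(frame_a, frame_b)
            for a, b in zip(row_a, row_b)]
    mse = sum(errs) / len(errs)
    return math.inf if mse == 0 else 10.0 * math.log10(peak * peak / mse)


if __name__ == "__main__":
    decoded_jm = [[10, 20], [30, 40]]     # hypothetical 2x2 luma patches
    decoded_ours = [[10, 22], [27, 40]]
    print(difference_map(decoded_jm, decoded_ours))
    print(f"{psnr(decoded_jm, decoded_ours):.2f} dB")
```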

SSIM-Motivated Perceptual Coding
- Yi-Hsin Huang et al., “Perceptual Rate-Distortion Optimization Using Structural Similarity Index as Quality Metric”, IEEE T-CSVT, vol. 20, no. 11, pp. 1614-1624, Nov. 2010.
  - Replaces PSNR with SSIM
  - Empirically estimates a Rate-SSIM model
  - Reuses the classical Lagrange multiplier method for mode selection and motion estimation
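One way to carry out the empirical step is to fit a parametric model to measured (R, dSSIM) pairs, where dSSIM = 1 − SSIM, and then take λ = −dD/dR at the operating rate. The exponential form D(R) = a·e^(−bR) used below is an assumption chosen for illustration, not necessarily the model of Huang et al.; the fit is ordinary least squares on ln D, and the measurements are hypothetical.

```python
import math


def fit_exponential_rd(rates, dssim):
    """Least-squares fit of ln(D) = ln(a) - b*R; returns (a, b)."""
    n = len(rates)
    ys = [math.log(d) for d in dssim]
    mr = sum(rates) / n
    my = sum(ys) / n
    sxy = sum((r - mr) * (y - my) for r, y in zip(rates, ys))
    sxx = sum((r - mr) ** 2 for r in rates)
    slope = sxy / sxx
    return math.exp(my - slope * mr), -slope


def ssim_lambda(a, b, rate):
    """lambda = -dD/dR = a*b*exp(-b*R) for the assumed exponential model."""
    return a * b * math.exp(-b * rate)


if __name__ == "__main__":
    # Hypothetical measurements: rate in kbit per frame, dSSIM = 1 - SSIM.
    rates = [20.0, 40.0, 80.0, 160.0]
    dssim = [0.080, 0.052, 0.024, 0.006]
    a, b = fit_exponential_rd(rates, dssim)
    print(f"a={a:.4f}, b={b:.5f}, lambda@60={ssim_lambda(a, b, 60.0):.6f}")
```

The resulting λ then plugs into the usual J = D + λR mode decision, with D measured as dSSIM instead of SSE.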

Improved SSIM Perceptual Coding
- Shiqi Wang et al., “SSIM-Motivated Rate-Distortion Optimization for Video Coding”, IEEE T-CSVT, vol. 22, no. 4, pp. 516-529, April 2012.
  - Aims to derive an analytical model of the Rate-SSIM relationship
- Chuohao Yeo et al., “On Rate Distortion Optimization Using SSIM”, ICASSP 2012.
- Abdul Rehman et al., “SSIM-Inspired Perceptual Video Coding for HEVC”, ICME 2012.
- Xi Wang et al., “Motion Based Perceptual Distortion and Rate Optimization for Video Coding”, ICME 2012.

Basic Analysis-Completion Structure

[P. Ndjiki-Nya, Signal Processing: Image Communication, 2012]

Abstract+Detail Framework
[Z. Yuan, H. Xiong and Li Song, ICASSP 2009]

Figure:
- Key frames: abstract + detail.
- Non-key frames: abstract only. Bilateral filtering is used to remove details, and motion estimation (ME) finds a matching block to recover the details.
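Bilateral filtering removes fine detail while keeping strong edges, because each neighbor is weighted by both a spatial Gaussian and a range (intensity-difference) Gaussian. A plain-Python sketch on a grayscale list-of-lists image; `radius`, `sigma_s`, and `sigma_r` are hypothetical defaults, not the parameters of the cited work.

```python
import math


def bilateral_filter(img, radius=2, sigma_s=2.0, sigma_r=20.0):
    """Bilateral filter for a grayscale image given as a list of rows of floats.

    Each output pixel is a normalized weighted mean of its neighbors, with
    weight = spatial Gaussian (distance) * range Gaussian (intensity difference),
    so small-amplitude detail is smoothed while large steps (edges) survive.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            center = img[y][x]
            num = den = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        v = img[yy][xx]
                        wgt = (math.exp(-(dx * dx + dy * dy) / (2.0 * sigma_s ** 2))
                               * math.exp(-((v - center) ** 2) / (2.0 * sigma_r ** 2)))
                        num += wgt * v
                        den += wgt
            out[y][x] = num / den  # den > 0: the center pixel always has weight 1
    return out


if __name__ == "__main__":
    # Fine texture (100/104 checkerboard) on the left, a strong edge to 255 on the right.
    img = [[100.0 + (4.0 if (x + y) % 2 else 0.0) if x < 4 else 255.0 for x in range(8)]
           for y in range(4)]
    for row in bilateral_filter(img):
        print(" ".join(f"{v:6.1f}" for v in row))
```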

Super-resolution Framework

Figure: encoder and decoder of the super-resolution framework.
- Symmetric coding complexity
- 5% to 10% bit saving at the same quality

[Q. Zhou and Li Song, IEEE PCM 2010]

Personal Perspective
- Can we do much better than HEVC?
- Yes: the next generation of video coding will probably need more perception-related techniques.
- Some preliminary works:
  - “On Just Noticeable Distortion Quantization in the HEVC Codec”, JCTVC-H0477, Feb. 2012
    - Claims 3% to 25% bitrate saving at the same quality.
  - “A joint JND model based on luminance and frequency masking for HEVC”, JCTVC-I0163, May 2012
    - Claims 3% to 30% bitrate saving at the same quality.

Personal Perspective
- Future research
  - Advanced computational HVS models
    - Suprathreshold vs. subthreshold
    - Other masking models, such as attention
  - Exploiting new distortion metrics
    - Image statistical properties
    - Learning from large-scale datasets
  - Generic R-D optimization
    - R-D relationship and RDO for video coding

References
- Important papers
  - J. L. Mannos and D. J. Sakrison, “The Effects of a Visual Fidelity Criterion on the Encoding of Images”, IEEE Trans. on Information Theory, vol. 20, no. 4, July 1974. (Cited by 776)
  - N. Jayant, J. Johnston and R. Safranek, “Signal Compression Based on Models of Human Perception”, Proceedings of the IEEE, vol. 81, no. 10, Oct. 1993. (Cited by 761)
  - A. Ortega and K. Ramchandran, “Rate-distortion methods for image and video compression”, IEEE Signal Processing Magazine, vol. 15, no. 6, pp. 23-50, 1998. (Cited by 597)
  - Z. Wang and A. C. Bovik, “Mean Squared Error: Love It or Leave It? A New Look at Signal Fidelity Measures”, IEEE Signal Processing Magazine, vol. 26, no. 1, pp. 98-117, Jan. 2009. (Cited by 353)
  - Ching Yang Wang, Shiuh Ming Lee and Long-Wen Chang, “Designing JPEG quantization tables based on human visual system”, Signal Processing: Image Communication, vol. 16, no. 5, pp. 501-506, 2001.
  - Wenjun Zeng, Scott Daly and Shawmin Lei, “An Overview of the Visual Optimization Tools in JPEG 2000”, Signal Processing: Image Communication, vol. 17, pp. 85-104, 2002.

References
- JND related
  - X. Yang, W. Lin, Z. Lu, E. Ong and S. Yao, “Motion-compensated Residue Pre-processing in Video Coding Based on Just-noticeable-distortion Profile”, IEEE Trans. Circuits and Systems for Video Technology, vol. 15, no. 6, pp. 742-750, June 2005.
  - Z. Chen and C. Guillemot, “Perceptually-friendly H.264/AVC video coding based on foveated Just-Noticeable-Distortion model”, IEEE Trans. Circuits and Systems for Video Technology, vol. 20, no. 6, pp. 806-819, June 2010.
  - M. Naccari and F. Pereira, “Advanced H.264/AVC based perceptual video coding: architecture, tools and assessment”, IEEE Trans. Circuits and Systems for Video Technology, vol. 21, no. 6, pp. 766-782, June 2011.
  - M. Naccari and M. Mrak, “On Just Noticeable Distortion Quantization in the HEVC codec”, JCTVC-H0477, JCT-VC 8th Meeting, San Jose, Feb. 2012.
  - Z. Luo, Li Song and S. Zheng, “Improving H.264/AVC Video Coding with Adaptive Coefficient Suppression”, IEEE International Symposium on Circuits and Systems (ISCAS 2010), May 30-June 2, 2010, France.

References
- SSIM or other metrics as distortion
  - Yi-Hsin Huang, Tao-Sheng Ou, Po-Yen Su and H. H. Chen, “Perceptual Rate-Distortion Optimization Using Structural Similarity Index as Quality Metric”, IEEE Trans. Circuits and Systems for Video Technology, vol. 20, no. 11, pp. 1614-1624, Nov. 2010.
  - Yi-Hsin Huang, Tao-Sheng Ou, Po-Yen Su and H. H. Chen, “SSIM-Based Perceptual Rate Control for Video Coding”, IEEE Trans. Circuits and Systems for Video Technology, vol. 21, no. 5, pp. 682-691, May 2011.
  - Shiqi Wang, Abdul Rehman, Zhou Wang, Siwei Ma and Wen Gao, “SSIM-Motivated Rate-Distortion Optimization for Video Coding”, IEEE Trans. Circuits and Systems for Video Technology, vol. 22, no. 4, pp. 516-529, April 2012.
  - Chuohao Yeo, Hui Li Tan and Yih Han Tan, “On Rate Distortion Optimization Using SSIM”, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), March 2012.
  - Abdul Rehman and Zhou Wang, “SSIM-Inspired Perceptual Video Coding for HEVC”, IEEE International Conference on Multimedia and Expo (ICME), June 2012.
  - Xi Wang, Li Su, Qingming Huang, Chunxi Liu and Ling-Yu Duan, “Motion Based Perceptual Distortion and Rate Optimization for Video Coding”, IEEE International Conference on Multimedia and Expo (ICME), 2012.

References
- Analysis-Completion Framework
  - Minmin Shen, Ping Xue and Ci Wang, “Down-Sampling Based Video Coding Using Super-Resolution Technique”, IEEE Trans. Circuits and Systems for Video Technology, vol. 21, no. 6, pp. 755-765, June 2011.
  - P. Ndjiki-Nya, D. Doshkov, H. Kaprykowsky, F. Zhang, D. Bull and T. Wiegand, “Perception-oriented video coding based on image analysis and completion: A review”, Signal Processing: Image Communication, vol. 27, pp. 579-594, 2012.
  - F. Zhang and D. R. Bull, “A parametric framework for video compression using region-based texture models”, IEEE Journal of Selected Topics in Signal Processing, vol. 5, no. 7, pp. 1378-1392, 2011.
  - Q. Zhou, Li Song and W. Zhang, “Video Coding With Key Frames Guided Super Resolution”, IEEE Pacific-Rim Conference on Multimedia (PCM 2010), September 21-24, Shanghai, China.
  - Z. Yuan, H. Xiong and Li Song, “Generic Video Coding With Abstraction And Detail Completion”, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2009), April 19-24, 2009, Taipei, Taiwan.
