Perceptual Video Coding
   Research Progress

          Dr. Li Song
   Associate Professor, SJTU
Visiting Associate Professor, SCU
            2012.09
Outline

- Introduction
  - Perceptual Cues in Video Coding
- Recent Research
  - JND based RDO
  - SSIM based RDO
  - Analysis-Completion Framework
- Summary & References
Perceptually Lossless Images

[Figure: two images side by side, labeled "PIC: 0.914 bits/pixel" and "Original"]

     [T. Pappas, Visual Signal Analysis and Compression, ICIP 2010]
Perceptual Video Coding Technique

(Digital) Video -> Codec (Encoder + Decoder) -> Human Visual System (HVS), the end recipient

[Figure: dimensions of coder performance, with axes D (distortion) and R (rate)]

Basic principle of perceptual coding
       - treat all data that humans cannot perceive as superfluous, and discard it.
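Rate-Distortion Theory

Quantizer: $x \rightarrow Q \rightarrow \hat{x}$

Quantization noise: $e = X - \hat{X}$

$$D = \sum_{i=1}^{N} p_i \, (x_i - \hat{x}_i)^2$$

where the $p_i$ are probabilities.

If X follows a Gaussian distribution $N(0, \sigma^2)$:

$$D = \sigma^2 \, 2^{-2R}$$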
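As a rough numerical illustration of the two formulas on this slide, the sketch below evaluates the Gaussian bound D(R) = sigma^2 * 2^(-2R) and measures the mean squared error of a simple quantizer. The mid-tread uniform quantizer and all function names are assumptions made for illustration; the slide does not specify a quantizer.

```python
import random


def gaussian_rd_bound(sigma2: float, rate_bits: float) -> float:
    """Distortion-rate bound of a memoryless Gaussian source: sigma^2 * 2^(-2R)."""
    return sigma2 * 2.0 ** (-2.0 * rate_bits)


def uniform_quantize(x: float, step: float) -> float:
    """Mid-tread uniform quantizer (an assumption): nearest multiple of `step`."""
    return step * round(x / step)


def empirical_mse(samples, step: float) -> float:
    """Average squared quantization error, i.e. D with equal weights p_i = 1/N."""
    return sum((x - uniform_quantize(x, step)) ** 2 for x in samples) / len(samples)


if __name__ == "__main__":
    random.seed(0)
    samples = [random.gauss(0.0, 1.0) for _ in range(20000)]
    for step in (1.0, 0.5, 0.25):
        print("step", step, "measured MSE", empirical_mse(samples, step))
    for rate in (1, 2, 3):
        print("R", rate, "Gaussian bound", gaussian_rd_bound(1.0, rate))
```

Each extra bit of rate divides the bound by four (about 6 dB), which is the behavior the closed-form expression encodes.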
Gap between theory and real codec
                SPIHT can beat the Shannon bound!

The Gaussian prior is not valid for images!

[Figure] Rate-distortion curves achieved with the SPIHT coder (dashed line) and with the
Shannon R-D theoretical bounds (solid line) corresponding to an i.i.d. zero-mean
Gaussian model for each wavelet subband (Gaussian vector source)
       [A. Ortega et al., IEEE Signal Processing Magazine, 1998]
HEVC: MSE vs MOS

                     Random Access   Low Delay
Class A              −36.9%
Class B              −39.4%          −40.3%
Class C              −30.1%          −31.5%
Class D              −28.3%          −29.2%
Class E                              −41.2%
Class F              −26.2%          −28.8%
Average              −32.5%          −34.2%
Average without F    −34.0%          −35.5%

   [from: JCTVC-I0409, 2012]      [from: JCT-VC Summary, 8th JCT-VC]

            There is a >20% gap between MSE and MOS!
Ideal perceptual metric




 Half a century of endeavor, and still an open problem!
Many metrics have been proposed: SSIM/M-SSIM/CW-SSIM, VIF, VQM, …

      [Figure from: N. Jayant, Proceedings of the IEEE, 1993]
What about Popular SSIM?




                 [JCTVC-H0063,2012]
Where do we use perceptual model currently?




     [Pourazad, IEEE Consumer Electronics Magazine, 2012]
Frequency Masking for JPEG
   The DCT-based encoder incorporating human
     visual frequency weighting (L.W. Chang, 2001)

[Figure: Modulation Transfer Function (MTF) or Quantization Matrix (QM)]

        We can do better with a fine
           adjustment factor!
HEVC QM Design
- HEVC default quantization matrix
- Intra 8x8 QM: uses the same QM developed for JPEG in 1999.
- Intra 4x4 QM: sub-sampled from the 8x8 Intra QM
- Intra 16x16 QM and Intra 32x32 QM: up-sampled from the 8x8 Intra QM
- Inter QMs: predicted from the Intra QMs, using the linear relationship between
  the Intra QMs and the corresponding Inter QMs in AVC/H.264

                                               [JCT-VC I012] & [L.W. Chang 2001]
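The sub-sampling and up-sampling steps can be sketched as follows. This is only an illustration under stated assumptions: the 8x8 matrix is a made-up placeholder ramp (not the actual default QM), sub-sampling keeps every second entry, and up-sampling replicates each entry; the slide does not give the exact sampling pattern.

```python
def subsample(qm, factor=2):
    """Keep every `factor`-th entry in each dimension (8x8 -> 4x4 for factor 2)."""
    return [row[::factor] for row in qm[::factor]]


def upsample(qm, factor):
    """Replicate each entry into a factor x factor block (8x8 -> 16x16 for factor 2)."""
    out = []
    for row in qm:
        wide = [value for value in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


# Placeholder 8x8 matrix: weights grow with frequency index (hypothetical values).
qm8 = [[16 + 2 * (i + j) for j in range(8)] for i in range(8)]

qm4 = subsample(qm8)      # 4x4
qm16 = upsample(qm8, 2)   # 16x16
qm32 = upsample(qm8, 4)   # 32x32
```

Deriving every size from one 8x8 matrix means only a single perceptual design has to be tuned, at the cost of coarse frequency resolution in the larger transforms.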
Local Spatial-temporal contrast sensitivity of
           luminance perception
JND in the classic DCT domain

$$T_{JND}(n,i,j) = T_{basic}(n,i,j) \cdot F_{lum}(n) \cdot F_{contrast}(n,i,j) \cdot F_{temporal}(n,i,j)$$

where n is the block index and (i, j) is the DCT coefficient position.

Term                               Depends on                   Symbol
The basic threshold                Spatial frequency            T_basic
The luminance adaptation factor    Luminance sensitivity        F_lum
The contrast masking factor        Plane, edge, texture, etc.   F_contrast
The temporal modulation factor     Motion, frame rate, etc.     F_temporal

                      [Zhenyu Wei et al., IEEE T-CSVT, 2009]
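A minimal sketch of how the four terms combine for one DCT block, assuming the combination is a plain per-coefficient product. The numeric values, the 2x2 block size and the function names are hypothetical; the actual factor models are defined in the cited paper and are not reproduced here.

```python
def block_jnd(t_basic, f_lum, f_contrast, f_temporal):
    """Per-coefficient JND threshold for one block.

    t_basic, f_contrast, f_temporal: 2-D lists indexed by (i, j).
    f_lum: one scalar for the whole block n.
    """
    size = len(t_basic)
    return [
        [t_basic[i][j] * f_lum * f_contrast[i][j] * f_temporal[i][j]
         for j in range(size)]
        for i in range(size)
    ]


def visible_mask(errors, thresholds):
    """True where a coefficient error exceeds its JND threshold."""
    return [
        [abs(e) > t for e, t in zip(err_row, thr_row)]
        for err_row, thr_row in zip(errors, thresholds)
    ]


# Hypothetical 2x2 example values, chosen only to make the arithmetic easy to follow.
t_jnd = block_jnd(
    t_basic=[[2.0, 4.0], [4.0, 8.0]],
    f_lum=1.5,
    f_contrast=[[1.0, 1.0], [1.0, 2.0]],
    f_temporal=[[1.0, 1.0], [1.0, 0.5]],
)
mask = visible_mask([[1.0, 7.0], [-6.5, 3.0]], t_jnd)
```

Errors below the threshold are treated as invisible, which is what lets an encoder spend fewer bits on them.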
Different Embedded Schemes

             [X. Yang, TCSVT, 2005]



             [Ours, ISCAS 2010] &
             [TCSVT (accepted)]


             [Z. Chen, TCSVT, 2010] &
             [M. Naccari, TCSVT, 2011]
The proposed Coding Framework

[Figure: encoder block diagram]
  Main path: Input -> T -> Q -> Adaptive Suppression -> Entropy Coding -> Output
  Reconstruction loop: Q^-1, T^-1, Frame Buffer, Intra or Inter Prediction
  Perceptual modules: JND Calculation and Translation, Adjustment Threshold Calculation,
                      Lagrange Multiplier Adaptation, Motion Vector Scaling
  Distortion model: D = D1(Q) + D2(JND)
Bit Saving

                        Bitrate (kbps)                       Bitrate reduction against JM 14.2 (%)
Sequence   Preset QP    JM 14.2     Chen's      Proposed     Chen's     Proposed
Cyclists   20            7945.83     6889.50     5149.85     13.29      35.19
Cyclists   24            3165.17     2660.42     2436.40     15.95      23.02
Cyclists   28            1343.73     1103.82     1138.30     17.85      15.29
Cyclists   32             658.92      543.16      612.40     17.57       7.06
Harbour    20           25104.43    23734.86    15822.41      5.46      36.97
Harbour    24           13496.66    12290.08     8843.39      8.94      34.48
Harbour    28            6054.17     5336.50     4557.15     11.85      24.73
Harbour    32            2909.30     2607.64     2588.25     10.37      11.04
Night      20           20306.64    18749.84    11330.19      7.67      44.20
Night      24            9688.57     8714.15     6239.72     10.06      35.60
Night      28            4507.60     4036.23     3430.19     10.46      23.90
Night      32            2311.90     2088.36     2050.42      9.67      11.31
Bit Saving (continued)

                            Bitrate (kbps)                       Bitrate reduction against JM 14.2 (%)
Sequence       Preset QP    JM 14.2     Chen's      Proposed     Chen's     Proposed
Raven          20            7135.21     6568.93     4147.18      7.94      41.88
Raven          24            3193.59     2850.05     2201.83     10.76      31.05
Raven          28            1537.32     1346.20     1189.10     12.43      22.65
Raven          32             803.07      705.19      710.89     12.19      11.48
Sheriff        20           13951.79    12986.99     7317.07      6.92      47.55
Sheriff        24            6472.74     5838.45     3739.43      9.80      42.23
Sheriff        28            2665.81     2361.96     1817.07     11.40      31.84
Sheriff        32            1159.36     1032.24      963.12     10.96      16.93
SpinCalendar   20           25071.25    21394.72    11108.62     14.66      55.69
SpinCalendar   24            7878.49     5930.58     4548.43     24.72      42.27
SpinCalendar   28            2653.01     2194.53     2046.35     17.28      22.87
SpinCalendar   32            1315.22     1129.24     1177.62     14.14      10.46
Average                                                          12.18      28.32
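The reduction columns appear to follow the usual relative-saving formula, 100 * (R_ref - R_test) / R_ref, with JM 14.2 as the reference. The short check below recomputes a few entries from the bitrates in the tables; the function name and the choice of rows are only for illustration.

```python
def bitrate_reduction(ref_kbps: float, test_kbps: float) -> float:
    """Percentage of bitrate saved relative to the reference encoder."""
    return 100.0 * (ref_kbps - test_kbps) / ref_kbps


# (sequence, QP): (JM 14.2, Chen's, Proposed) bitrates in kbps, copied from the tables.
rows = {
    ("Cyclists", 20): (7945.83, 6889.50, 5149.85),
    ("SpinCalendar", 20): (25071.25, 21394.72, 11108.62),
}

for (sequence, qp), (jm, chen, proposed) in rows.items():
    print(sequence, qp,
          round(bitrate_reduction(jm, chen), 2),
          round(bitrate_reduction(jm, proposed), 2))
```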
Frame Differences

[Figure panels, QP=20, 88th frame: JM 14.2 / Ours / Difference]
[Figure panels, QP=20, 102nd frame: JM 14.2 / Ours / Difference]
SSIM motivated Perceptual Coding
- Yi-Hsin Huang et al., "Perceptual Rate-Distortion
  Optimization Using Structural Similarity Index as
  Quality Metric", IEEE T-CSVT, vol. 20, no. 11,
  pp. 1614-1624, Nov. 2010.
    - Replace PSNR with SSIM
    - Empirically estimate a Rate-SSIM model
    - Reuse the classical Lagrange multiplier method for
      mode selection and motion estimation
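To make the idea concrete, here is a small sketch of Lagrangian mode selection with an SSIM-based distortion. It assumes the distortion is taken as 1 - SSIM, computes SSIM over a single window covering the whole block, and uses the common constants K1 = 0.01, K2 = 0.03 with an 8-bit range. The candidate modes, rates and lambda values are hypothetical; the cited paper's Rate-SSIM model is not reproduced here.

```python
def ssim(x, y, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    """Single-window SSIM between two equally sized lists of pixel values."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / (n - 1)
    vy = sum((b - my) ** 2 for b in y) / (n - 1)
    cxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    return ((2 * mx * my + c1) * (2 * cxy + c2)) / (
        (mx * mx + my * my + c1) * (vx + vy + c2)
    )


def rd_cost(orig, recon, rate_bits, lam):
    """Lagrangian cost J = D + lambda * R with D = 1 - SSIM (an assumption)."""
    return (1.0 - ssim(orig, recon)) + lam * rate_bits


def choose_mode(orig, candidates, lam):
    """candidates: list of (mode_name, reconstructed_block, rate_bits)."""
    return min(candidates, key=lambda c: rd_cost(orig, c[1], c[2], lam))[0]


block = [10, 20, 30, 40]
candidates = [
    ("detailed", [10, 20, 30, 40], 100),  # perfect reconstruction, expensive
    ("flat", [25, 25, 25, 25], 10),       # cheap, loses all structure
]
best_quality_first = choose_mode(block, candidates, lam=0.0)
best_rate_first = choose_mode(block, candidates, lam=1.0)
```

Because 1 - SSIM lives on a very different scale from squared error, lambda has to be re-derived for it, which is why a Rate-SSIM model is needed.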
Improved SSIM Perceptual Coding
- Shiqi Wang et al., "SSIM-Motivated Rate-Distortion
  Optimization for Video Coding", IEEE
  T-CSVT, vol. 22, no. 4, pp. 516-529, April 2012.
    - They try to obtain an analytical model for the
      Rate-SSIM relationship
- ChuoHao Yeo et al., "On Rate Distortion Optimization using
  SSIM", ICASSP 2012.
- Abdul Rehman et al., "SSIM-Inspired Perceptual Video
  Coding for HEVC", ICME 2012.
- Xi Wang et al., "Motion Based Perceptual Distortion and
  Rate Optimization for video Coding", ICME 2012
Basic Analysis-Completion Structure




  [P. Ndjiki-Nya, Signal Processing: Image Communication, 2012]
Abstract+Detail Framework
     [Z. Yuan, H. Xiong and Li Song, ICASSP 2009]

- Key frame: Abstract + Detail
- Non-key frame: Abstract only; bilateral filtering is used to remove details
- Motion estimation (ME) is used to find the matching block to recover details
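The abstract/detail split can be sketched with a bilateral filter, which smooths texture while keeping strong edges. The version below works on a single row of pixels for brevity (frames are 2-D in practice), and the radius and sigma values are arbitrary assumptions rather than the settings used in the cited work.

```python
import math


def bilateral_filter_1d(signal, radius=2, sigma_s=2.0, sigma_r=10.0):
    """Edge-preserving smoothing of a 1-D signal.

    Each output sample is a weighted mean of its neighbors, where the weight
    falls off with both spatial distance (sigma_s) and intensity difference (sigma_r).
    """
    n = len(signal)
    out = []
    for i in range(n):
        num = 0.0
        den = 0.0
        for k in range(max(0, i - radius), min(n, i + radius + 1)):
            w_space = math.exp(-((k - i) ** 2) / (2.0 * sigma_s ** 2))
            w_range = math.exp(-((signal[k] - signal[i]) ** 2) / (2.0 * sigma_r ** 2))
            num += w_space * w_range * signal[k]
            den += w_space * w_range
        out.append(num / den)
    return out


def split_abstract_detail(signal):
    """Abstract layer = filtered signal; detail layer = what the filter removed."""
    abstract = bilateral_filter_1d(signal)
    detail = [s - a for s, a in zip(signal, abstract)]
    return abstract, detail


row = [0.0, 2.0, 0.0, 2.0, 100.0, 102.0, 100.0, 102.0]
abstract, detail = split_abstract_detail(row)
```

In this toy row the small oscillations are smoothed into the abstract layer's two plateaus, while the large step between them is kept; the oscillations end up in the detail layer.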
Super-resolution Framework

[Figure: encoder and decoder block diagrams]

            Symmetric coding complexity
            5~10% bit saving at same quality

         [Q. Zhou and Li Song, IEEE PCM 2010]
Personal Perspective
- Can we do much better than HEVC?
  - Yes; the next generation of video coding will probably
    need more perception-related techniques.
- Some preliminary work
  - "On Just Noticeable Distortion Quantization in the HEVC
    Codec", JCTVC-H0477, Feb. 2012
    - Claims 3%~25% bitrate saving at the same quality.
  - "A joint JND model based on luminance and frequency
    masking for HEVC", JCTVC-I0163, May 2012
    - Claims 3%~30% bitrate saving at the same quality.
Personal Perspective (continued)
- Future research
  - Advanced computational HVS models
    - Suprathreshold vs. subthreshold
    - Other masking models, such as attention
  - Exploiting new distortion metrics
    - Image statistical properties
    - Learning from large-scale datasets
  - Generic R-D optimization
    - R-D relationship and RDO for video coding
References
- Important papers
  - J. L. Mannos and D. J. Sakrison, "The Effects of a Visual Fidelity Criterion
    on the Encoding of Images", IEEE Trans. on Information Theory, Vol. 20,
    No. 4, July 1974. (Cited by 776)
  - N. Jayant, J. Johnston and R. Safranek, "Signal Compression Based on
    Models of Human Perception", Proceedings of the IEEE, Vol. 81, No. 10, Oct.
    1993. (Cited by 761)
  - A. Ortega and K. Ramchandran, "Rate-distortion methods for image and video
    compression", IEEE Signal Processing Magazine, Vol. 15(6), pp. 23-50, 1998.
    (Cited by 597)
  - Z. Wang and A. C. Bovik, "Mean Squared Error: Love it or leave it? A new look at
    Signal Fidelity Measures", IEEE Signal Processing Magazine, Vol. 26(1), pp. 98-117,
    Jan. 2009. (Cited by 353)
  - Ching Yang Wang, Shiuh Ming Lee, Long-Wen Chang, "Designing JPEG
    quantization tables based on human visual system", Sig. Proc.: Image Comm.
    16(5): 501-506, 2001.
  - Wenjun Zeng, Scott Daly, Shawmin Lei, "An Overview of the Visual
    Optimization Tools in JPEG 2000", Sig. Proc.: Image Comm. 17: 85-104, 2002.
References
- JND related
  - X. Yang, W. Lin, Z. Lu, E. Ong and S. Yao, "Motion-compensated Residue
    Pre-processing in Video Coding Based on Just-noticeable-distortion
    Profile", IEEE Trans. Circuits and Systems for Video Technology,
    vol. 15(6), pp. 742-750, June 2005.
  - Z. Chen and C. Guillemot, "Perceptually-friendly H.264/AVC video coding
    based on foveated Just-Noticeable-Distortion model", IEEE Trans. Circuits
    Syst. Video Technol., vol. 20, no. 6, pp. 806-819, June 2010.
  - M. Naccari and F. Pereira, "Advanced H.264/AVC based perceptual video
    coding: architecture, tools and assessment", IEEE Transactions on
    Circuits and Systems for Video Technology, vol. 21, no. 6, pp. 766-782,
    June 2011.
  - M. Naccari and M. Mrak, "On Just Noticeable Distortion Quantization in
    the HEVC codec", JCTVC-H0477, JCT-VC 8th Meeting, San Jose, Feb.
    2012.
  - Z. Luo, Li Song, S. Zheng, "Improving H.264/AVC Video Coding with
    Adaptive Coefficient Suppression", IEEE International Symposium on
    Circuits and Systems (ISCAS 2010), May 30-June 2, 2010, France.
References
- SSIM or other metrics as distortion
  - Yi-Hsin Huang, Tao-Sheng Ou, Po-Yen Su, H.H. Chen, "Perceptual Rate-Distortion
    Optimization Using Structural Similarity Index as Quality Metric",
    IEEE Transactions on Circuits and Systems for Video Technology, vol. 20, no. 11,
    pp. 1614-1624, Nov. 2010.
  - Yi-Hsin Huang, Tao-Sheng Ou, Po-Yen Su, H.H. Chen, "SSIM-Based
    Perceptual Rate Control for Video Coding", IEEE Transactions on Circuits and
    Systems for Video Technology, vol. 21, no. 5, pp. 682-691, May 2011.
  - Shiqi Wang, A. Rehman, Zhou Wang, Siwei Ma and Wen Gao, "SSIM-Motivated
    Rate-Distortion Optimization for Video Coding", IEEE Transactions on Circuits
    and Systems for Video Technology, vol. 22, no. 4, pp. 516-529, April 2012.
  - Yeo ChuoHao, Tan Huili, Tan Yihhan, "On Rate Distortion Optimization using
    SSIM", 2012 IEEE International Conference on Acoustics, Speech and Signal
    Processing (ICASSP), March 2012.
  - Abdul Rehman and Zhou Wang, "SSIM-Inspired Perceptual Video Coding for
    HEVC", IEEE International Conference on Multimedia and Expo, June 2012.
  - Xi Wang, Li Su, Qingming Huang, Chunxi Liu, Ling-yu Duan, "Motion Based
    Perceptual Distortion and Rate Optimization for video Coding", IEEE
    International Conference on Multimedia and Expo, 2012.
References
- Analysis-Completion Framework
  - Minmin Shen, Ping Xue and Ci Wang, "Down-Sampling Based Video Coding
    Using Super-Resolution Technique", IEEE Transactions on Circuits and
    Systems for Video Technology, vol. 21, no. 6, pp. 755-765, June 2011.
  - P. Ndjiki-Nya, D. Doshkov, H. Kaprykowsky, F. Zhang, D. Bull, T. Wiegand,
    "Perception-oriented video coding based on image analysis and completion: A
    review", Signal Processing: Image Communication 27 (2012) 579–594.
  - F. Zhang, D. R. Bull, "A parametric framework for video compression using
    region-based texture models", IEEE Journal of Selected Topics in Signal Processing,
    Vol. 5(7): 1378–1392, 2011.
  - Q. Zhou, Li Song, W. Zhang, "Video Coding With Key Frames Guided Super
    Resolution", IEEE Pacific-Rim Conference on Multimedia (PCM 2010),
    September 21-24, Shanghai, China.
  - Z. Yuan, H. Xiong, Li Song, "Generic Video Coding With Abstraction And
    Detail Completion", IEEE International Conference on Acoustics, Speech and
    Signal Processing (ICASSP 2009), April 19-24, 2009, Taipei, Taiwan.
Thanks!

Perceptual Video Coding

  • 1. Perceptual Video Coding Research Progress Dr. Li Song Associate Professor, SJTU Visiting Associate Professor, SCU 2012.09
  • 2. Outline  Introduction  Perceptual Cues in Video Coding  Recent Research  JND based RDO  SSIM based RDO  Analysis-Completion Framework  Summary & References
  • 3. Perceptual Lossless Images PIC: 0.914 bits/pixel! Original! [T. Pappas, Visual Signal Analysis and Compression, ICIP 2010]
  • 4. Perceptual Video Coding Technique (Digital) Video D Codec(Encoder + Decoder) R Human Visual System (HVS) (end recipient) Dimensions of coder performance Basic Principle in Perceptual coding technique - consider all the data that humans cannot perceive as superfluous data, and discard them.
  • 5. Rate-Distortion Theory ^ x Q x Quantization noise: ˆ e X X N D   pi ( xi  xi ) 2 ˆ i 1 probabilities If X is Gaussian distribution N(0,σ2): D  2 2 2 R
  • 6. Gap between theory and real codec SPIHT can beat Shannon bound! Gaussian prior is not valid for image! Rate-distortion curves achieved with the SPIHT coder(dash line) and with the Shannon RD theoretical bounds(solid line) corresponding to an i.i.d. zero- mean Gaussian model for each wavelet sub bands (Gaussian vector source) [A. Ortega, etc, IEEE Signal Processing Magazine, 1998]
  • 7. HEVC: MSE vs MOS Random Low Delay Access Class A −36.9% Class B −39.4% −40.3% Class C −30.1% −31.5% Class D −28.3% −29.2% Class E −41.2% Class F −26.2% −28.8% Average −32.5% −34.2% Average −34.0% −35.5% without F [from:JCTVC-I0409, 2012] [from: JCT-VC Summary, 8th JCT-VC] There is >20% gap between MSE and MOS!
  • 8. Ideal perceptual metric Half century’s endeavor and still open problem! Many metrics proposed: SSIM/M-SSIM/CW-SSIM, VIF, VQM,… [Figure from :N. Jayant, Proceedings of the IEEE ,1993]
  • 9. What about Popular SSIM? [JCTVC-H0063,2012]
  • 10. Outline  Introduction  Perceptual Cues in Video Coding  Recent Research  JND based RDO  SSIM based RDO  Analysis-Completion Framework  Summary & References
  • 11. Where do we use perceptual model currently? [Pourazad, IEEE Consumer Electronics Magazine, 2012]
  • 12. Frequency Masking for JPEG The DCT-based encoder incorporated with human visual frequency weighting(L.W Chang,2001 ) Modulation Transfer Function(MTF) or Quantization Matrix(QM) we can do better with fine adjustment factor!
  • 13. HEVC QM Design  HEVC default quantization matrix  Intra 8x8 QM: Uses the same QM developed for JPEG in 1999.  Intra 4x4 QM: Sub-sampled from 8x8 Intra QM  Intra 16x16 QM and Intra 32x32 QM: Up-sampled from 8x8 Intra QM  Inter QM’s : Predicted from Intra QM’s, using the linear relationship between the Intra QM’s and the corresponding inter QM’s in AVC/H.264 [JCT-VC I012]&[L.W. Chang 2001]
  • 14. Local Spatial-temporal contrast sensitivity of luminance perception
  • 15. JND in the classic DCT domain TJND  n, i, j   Tbasic  n, i, j   Flum  n   Fcontrast  n, i, j   Ftemporal  n, i, j  The basic threshold Spatial frequency Tbasic The luminance adaptation factor Luminance sensitivity Flum The contrast masking factor Plane, edge, texture, etc Fcontrast The temporal modulation factor Motion, frame rate, etc Ftemporal [Zhenyu Wei,etc, IEEE T-CSVT, 2009]
  • 16. Different Embedded Schemes [X. Yang, TCSVT, 2005] [Our, ISCAS 2010]& [TCSVT (accept)] [Z. Chen, TCSVT ,2010] & [M. Naccari,TCSVT, 2011]
  • 17. The proposed Coding Framework Adjustment Threshold Calculation JND Calculation and Translation Adaptive Entropy Input T Q Output Suppression Coding Q-1 T-1 Intra or Inter Prediction Frame Buffer Lagrange Multiplier D= D1(Q)+D2(JND) Adaptation Motion Vector Scaling
  • 18. Bit Saving
                            Bitrate (kbps)                        Bitrate reduction against JM 14.2 (%)
    Sequence   Preset QP    JM 14.2      Chen’s       Proposed    Chen’s    Proposed
    Cyclists   20           7945.83      6889.50      5149.85     13.29     35.19
               24           3165.17      2660.42      2436.40     15.95     23.02
               28           1343.73      1103.82      1138.30     17.85     15.29
               32           658.92       543.16       612.40      17.57     7.06
    Harbour    20           25104.43     23734.86     15822.41    5.46      36.97
               24           13496.66     12290.08     8843.39     8.94      34.48
               28           6054.17      5336.50      4557.15     11.85     24.73
               32           2909.30      2607.64      2588.25     10.37     11.04
    Night      20           20306.64     18749.84     11330.19    7.67      44.20
               24           9688.57      8714.15      6239.72     10.06     35.60
               28           4507.60      4036.23      3430.19     10.46     23.90
               32           2311.90      2088.36      2050.42     9.67      11.31
  • 19. Bit Saving
                                Bitrate (kbps)                        Bitrate reduction against JM 14.2 (%)
    Sequence       Preset QP    JM 14.2      Chen’s       Proposed    Chen’s    Proposed
    Raven          20           7135.21      6568.93      4147.18     7.94      41.88
                   24           3193.59      2850.05      2201.83     10.76     31.05
                   28           1537.32      1346.20      1189.10     12.43     22.65
                   32           803.07       705.19       710.89      12.19     11.48
    Sheriff        20           13951.79     12986.99     7317.07     6.92      47.55
                   24           6472.74      5838.45      3739.43     9.80      42.23
                   28           2665.81      2361.96      1817.07     11.40     31.84
                   32           1159.36      1032.24      963.12      10.96     16.93
    SpinCalendar   20           25071.25     21394.72     11108.62    14.66     55.69
                   24           7878.49      5930.58      4548.43     24.72     42.27
                   28           2653.01      2194.53      2046.35     17.28     22.87
                   32           1315.22      1129.24      1177.62     14.14     10.46
    Average                                                           12.18     28.32
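The reduction percentages on the two Bit Saving slides appear to be the relative bitrate drop with respect to JM 14.2. The sketch below assumes that definition and checks it against the Cyclists, QP = 20 row; the function name is illustrative.

```python
def bitrate_reduction(ref_kbps, test_kbps):
    """Relative bitrate saving of `test_kbps` against the reference, in percent."""
    return 100.0 * (ref_kbps - test_kbps) / ref_kbps


# Cyclists, QP = 20: JM 14.2 = 7945.83, Chen's = 6889.50, Proposed = 5149.85 kbps
chen_saving = round(bitrate_reduction(7945.83, 6889.50), 2)       # 13.29
proposed_saving = round(bitrate_reduction(7945.83, 5149.85), 2)   # 35.19
```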
  • 20. Frame Differences JM 14.2: QP=20 88th Frame
  • 23. Frame Differences JM 14.2: QP=20 102nd Frame
  • 26. SSIM motivated Perceptual Coding  Yi-Hsin Huang, et al., “Perceptual Rate-Distortion Optimization Using Structural Similarity Index as Quality Metric”, IEEE T-CSVT, vol. 20, no. 11, pp. 1614-1624, Nov. 2010.  Replace PSNR with SSIM  Empirically estimate a Rate-SSIM model  Reuse the classical Lagrange multiplier method for mode selection and motion estimation
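A minimal sketch of the idea on slide 26: keep the Lagrangian cost J = D + λR but measure distortion with SSIM instead of squared error. It assumes D = 1 − SSIM and a single-window SSIM over the whole block with the usual constants (0.01·L)² and (0.03·L)²; the mode names, bit counts, and λ values are made up, and the cited paper's Rate-SSIM model is not reproduced here.

```python
def ssim(x, y, dynamic_range=255):
    """Single-window SSIM between two equally sized blocks given as flat lists."""
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2))


def best_mode(orig, candidates, lam):
    """Pick the mode minimizing J = (1 - SSIM) + lam * bits.

    candidates: dict mapping a mode name to (reconstructed block, bits).
    """
    def cost(item):
        recon, bits = item[1]
        return (1.0 - ssim(orig, recon)) + lam * bits
    return min(candidates.items(), key=cost)[0]


orig = [10, 20, 30, 40]
candidates = {
    "skip": ([25, 25, 25, 25], 1),    # cheap, but structure is lost
    "intra": ([10, 20, 30, 40], 40),  # perfect, but expensive
}
low_lambda_choice = best_mode(orig, candidates, lam=0.001)
high_lambda_choice = best_mode(orig, candidates, lam=0.05)
```

With a small λ the structurally faithful mode wins; with a large λ the cheap mode wins, which is the same trade-off as in MSE-based RDO but driven by a perceptual metric.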
  • 27. Improved SSIM Perceptual Coding  Shiqi Wang, et al., “SSIM-Motivated Rate-Distortion Optimization for Video Coding”, IEEE T-CSVT, vol. 22, no. 4, pp. 516-529, April 2012.  They try to derive an analytical model for the Rate-SSIM relationship  Chuohao Yeo, et al., “On Rate Distortion Optimization using SSIM”, ICASSP 2012.  Abdul Rehman, et al., “SSIM-Inspired Perceptual Video Coding for HEVC”, ICME 2012.  Xi Wang, et al., “Motion Based Perceptual Distortion and Rate Optimization for Video Coding”, ICME 2012.
  • 28. Basic Analysis-Completion Structure [P. Ndjiki-Nya, Signal Processing: Image Communication, 2012]
  • 29. Abstract+Detail Framework [Z. Yuan, H. Xiong and Li Song, ICASSP 2009]  Key Frame (Abstract+Detail)  Non-Key Frame (Abstract Only)  Use bilateral filtering to remove details  Use ME to find the matching block to recover details
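Slide 29 mentions bilateral filtering as the tool that removes details to produce the abstract layer. The one-dimensional sketch below shows why that filter suits the job: it smooths small variations while leaving strong edges nearly untouched. The parameter names and values are assumptions for illustration, not settings from the cited paper.

```python
import math


def bilateral_1d(signal, radius=2, sigma_s=2.0, sigma_r=10.0):
    """1-D bilateral filter: weights fall off with distance and with value difference."""
    out = []
    for i, center in enumerate(signal):
        num = den = 0.0
        for k in range(max(0, i - radius), min(len(signal), i + radius + 1)):
            w = math.exp(-((k - i) ** 2) / (2 * sigma_s ** 2)
                         - ((signal[k] - center) ** 2) / (2 * sigma_r ** 2))
            num += w * signal[k]
            den += w
        out.append(num / den)
    return out


texture = bilateral_1d([10, 12, 10, 12, 10])      # fine detail gets flattened
edge = bilateral_1d([0, 0, 0, 100, 100, 100])     # strong edge is preserved
```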
  • 30. Super-resolution Framework Encoder  Symmetric coding complexity  5~10% bit saving at same quality Decoder [Q. Zhou, and Li Song, IEEE PCM 2010]
  • 31. Outline  Introduction  Perceptual Cues in Video Coding  Recent Research  JND based RDO  SSIM based RDO  Analysis-Completion Framework  Summary & References
  • 32. Personal Perspective  Can we do much better than HEVC?  Yes, the next generation of video coding will probably need more perception-related techniques.  Some preliminary works  “On Just Noticeable Distortion Quantization in the HEVC Codec”, JCTVC-H0477, Feb. 2012  Claims 3%~25% bitrate saving at the same quality.  “A joint JND model based on luminance and frequency masking for HEVC”, JCTVC-I0163, May 2012  Claims 3%~30% bitrate saving at the same quality.
  • 33. Personal Perspective  Future research  Advanced computational HVS model – Suprathreshold vs. subthreshold – Other masking models, like attention  Exploiting new distortion metrics – Image statistical properties – Learning from large-scale datasets  Generic R-D optimization – R-D relationship and RDO for video coding.
  • 34. References  Important papers
    J. L. Mannos and D. J. Sakrison, “The Effects of a Visual Fidelity Criterion on the Encoding of Images”, IEEE Trans. on Information Theory, Vol. 20, No. 4, July 1974. (Cited by 776)
    N. Jayant, J. Johnston and R. Safranek, “Signal Compression Based on Models of Human Perception”, Proceedings of the IEEE, Vol. 81, No. 10, Oct. 1993. (Cited by 761)
    A. Ortega and K. Ramchandran, “Rate-Distortion Methods for Image and Video Compression”, IEEE Signal Processing Magazine, Vol. 15, No. 6, pp. 23-50, 1998. (Cited by 597)
    Z. Wang and A. C. Bovik, “Mean Squared Error: Love It or Leave It? A New Look at Signal Fidelity Measures”, IEEE Signal Processing Magazine, Vol. 26, No. 1, pp. 98-117, Jan. 2009. (Cited by 353)
    Ching Yang Wang, Shiuh Ming Lee, Long-Wen Chang, “Designing JPEG Quantization Tables Based on Human Visual System”, Signal Processing: Image Communication, Vol. 16, No. 5, pp. 501-506, 2001.
    Wenjun Zeng, Scott Daly, Shawmin Lei, “An Overview of the Visual Optimization Tools in JPEG 2000”, Signal Processing: Image Communication, Vol. 17, pp. 85-104, 2002.
  • 35. References  JND related
    X. Yang, W. Lin, Z. Lu, E. Ong and S. Yao, “Motion-compensated Residue Pre-processing in Video Coding Based on Just-noticeable-distortion Profile”, IEEE Trans. Circuits and Systems for Video Technology, vol. 15, no. 6, pp. 742-750, June 2005.
    Z. Chen and C. Guillemot, “Perceptually-friendly H.264/AVC Video Coding Based on Foveated Just-Noticeable-Distortion Model”, IEEE Trans. Circuits and Systems for Video Technology, vol. 20, no. 6, pp. 806-819, June 2010.
    M. Naccari and F. Pereira, “Advanced H.264/AVC Based Perceptual Video Coding: Architecture, Tools and Assessment”, IEEE Trans. Circuits and Systems for Video Technology, vol. 21, no. 6, pp. 766-782, June 2011.
    M. Naccari and M. Mrak, “On Just Noticeable Distortion Quantization in the HEVC Codec”, JCTVC-H0477, JCT-VC 8th Meeting, San Jose, Feb. 2012.
    Z. Luo, Li Song, S. Zheng, “Improving H.264/AVC Video Coding with Adaptive Coefficient Suppression”, IEEE International Symposium on Circuits and Systems (ISCAS 2010), May 30-June 2, 2010, France.
  • 36. References  SSIM or Other Metrics as Distortion:
    Yi-Hsin Huang, Tao-Sheng Ou, Po-Yen Su, H. H. Chen, “Perceptual Rate-Distortion Optimization Using Structural Similarity Index as Quality Metric”, IEEE Transactions on Circuits and Systems for Video Technology, vol. 20, no. 11, pp. 1614-1624, Nov. 2010.
    Yi-Hsin Huang, Tao-Sheng Ou, Po-Yen Su, H. H. Chen, “SSIM-Based Perceptual Rate Control for Video Coding”, IEEE Transactions on Circuits and Systems for Video Technology, vol. 21, no. 5, pp. 682-691, May 2011.
    Shiqi Wang, A. Rehman, Zhou Wang, Siwei Ma and Wen Gao, “SSIM-Motivated Rate-Distortion Optimization for Video Coding”, IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 4, pp. 516-529, April 2012.
    Yeo Chuohao, Tan Huili, Tan Yihhan, “On Rate Distortion Optimization using SSIM”, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), March 2012.
    Abdul Rehman and Zhou Wang, “SSIM-Inspired Perceptual Video Coding for HEVC”, IEEE International Conference on Multimedia and Expo (ICME), June 2012.
    Xi Wang, Li Su, Qingming Huang, Chunxi Liu, Ling-yu Duan, “Motion Based Perceptual Distortion and Rate Optimization for Video Coding”, IEEE International Conference on Multimedia and Expo (ICME), 2012.
  • 37. References  Analysis-Completion Framework:
    Minmin Shen, Ping Xue and Ci Wang, “Down-Sampling Based Video Coding Using Super-Resolution Technique”, IEEE Transactions on Circuits and Systems for Video Technology, vol. 21, no. 6, pp. 755-765, June 2011.
    P. Ndjiki-Nya, D. Doshkov, H. Kaprykowsky, F. Zhang, D. Bull, T. Wiegand, “Perception-oriented Video Coding Based on Image Analysis and Completion: A Review”, Signal Processing: Image Communication, vol. 27, pp. 579–594, 2012.
    F. Zhang, D. R. Bull, “A Parametric Framework for Video Compression Using Region-based Texture Models”, IEEE Journal of Selected Topics in Signal Processing, vol. 5, no. 7, pp. 1378–1392, 2011.
    Q. Zhou, Li Song, W. Zhang, “Video Coding With Key Frames Guided Super Resolution”, IEEE Pacific-Rim Conference on Multimedia (PCM 2010), September 21-24, Shanghai, China.
    Z. Yuan, H. Xiong, Li Song, “Generic Video Coding With Abstraction And Detail Completion”, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2009), April 19-24, 2009, Taipei, Taiwan.