Convex-hull Estimation using XPSNR for
Versatile Video Coding
Vignesh V Menon, Christian R Helmrich, Adam Więckowski, Benjamin Bross, Detlev Marpe
Video Communication and Applications Dept., Fraunhofer HHI, Germany
Introduction
MHV’24
Introduction
30.10.2024 © Fraunhofer
Slide 3
HTTP Adaptive Streaming (HAS)
• HTTP Adaptive Streaming (HAS) has become the standard for delivering video content over
various internet speeds and devices [1].
• Key Components:
○ Segmented Video Content: Video is encoded at multiple quality levels and split into
segments (e.g., 2-10 seconds each).
○ Manifest Files (MPD/HLS Manifest): Provide clients with information about available
video bitrates, resolutions, and segment locations.
○ Adaptive Bitrate (ABR) Streaming: Enables real-time switching between different video
qualities based on bandwidth, device capability, and buffer state.
• Video Coding Relevance:
○ Efficient Encoding: Advanced codecs (e.g., HEVC [2], VVC [3]) enable high
compression ratios without compromising quality, allowing adaptive streaming to meet
quality and bandwidth requirements effectively.
○ Per-Title/Per-Scene Encoding: Customizes encoding for each title or scene to achieve
optimal quality at each bitrate.
○ Quality Metrics (e.g., VMAF, PSNR): Used to select encoding parameters and maintain
consistent perceptual quality across bitrates.
[1] I. Sodagar, “The MPEG-DASH Standard for Multimedia Streaming Over the Internet,” IEEE MultiMedia, vol. 18, no. 4, pp. 62–67, 2011.
[2] G. J. Sullivan et al., “Overview of the High Efficiency Video Coding (HEVC) Standard,” in IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 12, 2012, pp. 1649–1668.
[3] B. Bross et al., “Overview of the Versatile Video Coding (VVC) Standard and its Applications,” in IEEE Transactions on Circuits and Systems for Video Technology, vol. 31, no. 10, 2021, pp. 3736–3764.
Coding complexity
● Modern video standards, such as Versatile Video Coding
(VVC), provide high compression efficiency, but at the cost of
significantly increased coding complexity.
● VVC Complexity: Up to 10x more complex than previous
standards (e.g., HEVC) [1].
● Challenge:
○ Computational Demands: Higher resolution and quality
requirements increase both encoding and decoding
time, impacting streaming latency and energy efficiency
[2].
● Complexity Metrics:
○ Encoding Time: Increased due to finer partitioning,
more prediction modes, and higher-level coding tools.
○ Decoding Time: High complexity impacts playback on
devices with limited processing power.
Figure: Rate-distortion (RD) and rate-decoding time curves of representative
sequences of Inter-4K dataset [3] encoded at 540p, 1080p, and 2160p resolutions
using VVenC at the faster preset, and decoded using VTM decoder.
[1] R. Kaafarani et al., “Evaluation Of Bitrate Ladders For Versatile Video Coder,” in 2021 International Conference on Visual Communications and Image Processing (VCIP), 2021, pp. 1–5.
[2] V. V. Menon et al., “Video Super-Resolution for Optimized Bitrate and Green Online Streaming,” in 2024 Picture Coding Symposium (PCS), 2024.
[3] A. Stergiou and R. Poppe, “AdaPool: Exponential Adaptive Pooling for Information-Retaining Downsampling,” in IEEE Transactions on Image Processing, vol. 32, 2023, pp. 251–266.
Limitations of existing metrics
● PSNR [1] measurements are based on the picture-wise MSE
○ Normalized to the bit depth of the assessed images
○ Usually, the per-picture PSNR values are averaged over the video
○ Problem: the sensitivity of the human visual system (HVS) to distortion is not fixed
● SSIM [2] is a multiplicative combination of luminance, contrast and structure
similarities
● VMAF [3] applies several VQA models
○ Model results fused using machine learning
○ High correlation between VMAF and subjective scores.
○ Problem: high computational complexity, not differentiable, and poor correlation with
subjective scores for VVC-coded UHD content [4, 5].
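As a point of reference for the PSNR bullets above, the following minimal Python sketch computes a picture-wise PSNR from the MSE, normalizes it to the bit depth, and averages the per-picture values over a video. The function names and the nested-list picture representation are illustrative assumptions and do not come from the slides.

```python
import math

def picture_psnr(ref, dec, bit_depth=8):
    """PSNR of one picture; ref and dec are nested lists of sample values."""
    sse = 0
    count = 0
    for ref_row, dec_row in zip(ref, dec):
        for r, d in zip(ref_row, dec_row):
            sse += (r - d) ** 2
            count += 1
    if sse == 0:
        return math.inf  # identical pictures
    peak = 2 ** bit_depth - 1  # normalization to the bit depth
    mse = sse / count
    return 10 * math.log10(peak ** 2 / mse)

def video_psnr(ref_frames, dec_frames, bit_depth=8):
    """Arithmetic mean of the per-picture PSNR values."""
    scores = [picture_psnr(r, d, bit_depth)
              for r, d in zip(ref_frames, dec_frames)]
    return sum(scores) / len(scores)
```

Every sample error carries the same weight here, regardless of the picture content around it, which is the limitation the "Problem" bullet points to.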
[1] Alliance For Telecommunications Industry Solutions, “Objective Video Quality Measurement Using A Peak-Signal-To-Noise-Ratio (PSNR) Full Reference Technique,” T1.TR.74-2001, 2001.
[2] Z. Wang et al., “Image quality assessment: from error visibility to structural similarity,” in IEEE Transactions on Image Processing, vol. 13, no. 4, 2004, pp. 600–612.
[3] Z. Li et al., “VMAF: The journey continues,” Netflix Technology Blog, vol. 25, 2018.
[4] M. Wien and V. Baroncini, “Report on VVC compression performance verification testing in the SDR UHD Random Access Category,” in WG 05 MPEG Joint Video Coding Team(s) with ITU-T
SG 16, document JVET-T0097, Oct. 2020.
[5] C. R. Helmrich et al., “Information on and analysis of the VVC encoders in the SDR UHD verification test,” in WG 05 MPEG Joint Video Coding Team(s) with ITU-T SG 16, document
JVET-T0103, Oct. 2020.
Figure: VMAF computation [3].
Objective
Aim:
● To improve the efficiency and quality of HTTP Adaptive Streaming with Versatile Video
Coding (VVC) by using a perceptual quality metric, XPSNR, as a reliable
alternative to established metrics like VMAF.
Core Goals:
● Perceptual Quality Optimization: Improve perceptual video quality by leveraging
XPSNR, which shows higher correlation with subjective quality scores for UHD
content.
● Efficient Bitrate-Resolution Selection: Develop a method for estimating the convex
hull online to determine optimal bitrate-resolution pairs, balancing quality and
computational efficiency.
● Resource Optimization: Reduce encoding and decoding times through effective
bitrate-resolution adaptation, allowing for lower energy consumption and enhanced
user experience.
XPSNR
Introduction
● XPSNR (extended perceptually weighted peak signal-to-noise ratio) [3] is a full-reference VQA metric.
● Extension of PSNR: While traditional PSNR lacks correlation with human perception, XPSNR incorporates
perceptual considerations, making it more aligned with subjective video quality judgments.
● Low complexity: Designed to maintain low computational complexity, XPSNR is efficient for real-time
encoding tasks, particularly for high-resolution content like UHD.
● Defined block-wise for block B (at index k) of input picture pic.
● Also depends on image width W, height H, and bit depth BD.
● Core: a visual activity measure derived from a spatio-temporal high-pass filter of the input.
● Only low-complexity operations are required, since squares and square roots cancel.
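The slide's equations are not preserved in this transcript. The following Python sketch illustrates only the general idea of a block-wise, activity-weighted PSNR, loosely following the structure described in [3]. It is a simplified illustration under stated assumptions, not the reference XPSNR implementation: the 4-neighbour high-pass kernel, the omission of the temporal term, the fixed block size, and the constants a_pic and a_min are assumptions of this sketch.

```python
import math

def xpsnr_sketch(ref, dec, bit_depth=8, block=4):
    """Block-weighted PSNR in the spirit of XPSNR (spatial activity only).

    ref, dec: nested lists of luma samples with identical dimensions.
    """
    height, width = len(ref), len(ref[0])
    # Assumed constants: picture-level normalizer and minimum activity.
    a_pic = (2 ** bit_depth) * math.sqrt(3840 * 2160 / (width * height))
    a_min = 2 ** (bit_depth - 6)
    wsse = 0.0
    for y0 in range(0, height, block):
        for x0 in range(0, width, block):
            ys = range(y0, min(y0 + block, height))
            xs = range(x0, min(x0 + block, width))
            sse = sum((ref[y][x] - dec[y][x]) ** 2 for y in ys for x in xs)
            # Assumed spatial high-pass: 4-neighbour Laplacian of the
            # reference, evaluated on interior samples only.
            hp = [abs(4 * ref[y][x] - ref[y - 1][x] - ref[y + 1][x]
                      - ref[y][x - 1] - ref[y][x + 1])
                  for y in ys for x in xs
                  if 0 < y < height - 1 and 0 < x < width - 1]
            mean_hp = sum(hp) / len(hp) if hp else 0.0
            a_k = max(a_min ** 2, mean_hp ** 2)  # visual activity of block k
            w_k = math.sqrt(a_pic / a_k)         # perceptual weight of block k
            wsse += w_k * sse
    if wsse == 0:
        return math.inf
    peak = (2 ** bit_depth - 1) ** 2
    return 10 * math.log10(width * height * peak / wsse)
```

In this sketch, errors in highly textured blocks receive a smaller weight than the same errors in flat blocks, so the score depends on where the distortion occurs as well as on its size.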
[1] C. R. Helmrich et al., “Information on and analysis of the VVC encoders in the SDR UHD verification test,” in WG 05 MPEG Joint Video Coding Team(s) with ITU-T SG 16, document JVET-T0103, Oct. 2020.
[2] M. Wien and V. Baroncini, “Report on VVC compression performance verification testing in the SDR UHD Random Access Category,” in WG 05 MPEG Joint Video Coding Team(s) with ITU-T SG 16, document JVET-T0097,
Oct. 2020.
[3] C. R. Helmrich et al., “A study of the extended perceptually weighted peak signal-to-noise ratio (XPSNR) for video compression with different resolutions and bit depths,” in ITU Journal: ICT Discoveries, vol. 3, May 2020.
[Online] http://handle.itu.int/11.1002/pub/8153d78b-en
Improvement over VMAF
Figure: Comparison of computation times of
quality metrics for UHD videos with 240 frames.
Table: Evaluation of Pearson linear correlation. Higher
values imply higher correlation with MOS [1].
Table: Evaluation of Spearman rank correlation [1].
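The table values themselves are not preserved in this transcript. For readers unfamiliar with the two correlation measures, the following Python sketch shows how a Pearson linear correlation and a Spearman rank correlation between metric scores and MOS values can be computed. The function names are illustrative assumptions, and any input values are to be supplied by the reader.

```python
import math

def pearson(x, y):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Pearson correlation rewards a linear relationship between metric and MOS, while Spearman correlation only checks whether the metric orders the stimuli the same way the viewers did.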
[1] C. R. Helmrich et al., “Information on and analysis of the VVC encoders in the SDR UHD verification test,” in WG 05 MPEG Joint Video Coding Team(s) with ITU-T SG 16, document JVET-T0103, Oct. 2020.
VEXUS Architecture
Convex-hull
Figure: Conceptual plot to depict the bitrate-quality relationship for
any video source encoded at different resolutions [2].
[1] R. Kaafarani et al., “Evaluation Of Bitrate Ladders For Versatile Video Coder,” in 2021 International Conference on Visual Communications and Image Processing (VCIP), 2021, pp. 1–5.
[2] A. Aaron et al., "Per-title encode optimization." The Netflix Techblog (2015).
[3] A. Stergiou and R. Poppe, “AdaPool: Exponential Adaptive Pooling for Information-Retaining Downsampling,” in IEEE Transactions on Image Processing, vol. 32, 2023, pp. 251–266.
[4] A. Wieckowski et al., “VVenC: An Open And Optimized VVC Encoder Implementation,” in 2021 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), Jul. 2021, pp. 1–2.
[5] A. Wieckowski et al., “Towards a live software decoder implementation for the upcoming Versatile Video Coding (VVC) codec,” in Proc. IEEE International Conference on Image Processing (ICIP), pp. 3124–3128.
● The convex hull is the set of encoding points that are Pareto-efficient: no other
bitrate-resolution pair achieves higher quality at the same or a lower bitrate [1, 2].
● Online convex-hull estimation methods provide a dynamic and adaptive
means to optimize bitrate and resolution selections [2].
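To make the convex-hull notion concrete, the following Python sketch computes the upper convex hull of measured rate-quality points pooled over several resolutions. It is a generic monotone-chain construction and not the VEXUS method itself, which estimates the hull from predictions. The tuple layout and function names are assumptions for illustration, and the hull is taken in the linear bitrate domain for simplicity.

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o) in the bitrate-quality plane."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def rd_convex_hull(points):
    """Upper convex hull of (bitrate, quality, resolution) tuples."""
    # Keep only the best quality seen at each bitrate.
    best = {}
    for p in points:
        if p[0] not in best or p[1] > best[p[0]][1]:
            best[p[0]] = p
    hull = []
    for p in sorted(best.values(), key=lambda q: q[0]):
        # Drop the last hull point while it lies on or below the
        # line from its predecessor to the new point.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    # Keep only the part where quality still increases with bitrate.
    result = []
    for p in hull:
        if result and p[1] <= result[-1][1]:
            break
        result.append(p)
    return result
```

Reading the resolution label of each hull point from left to right gives the bitrate ranges in which each resolution is the efficient choice.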
Figure: Rate-XPSNR curves and decoding times of example video sequences
from the employed dataset, Inter4K [3], encoded using VVenC [4] and decoded
using VVdeC [5].
Components:
● Input parameters:
○ Set of supported resolutions,
○ Set of bitrates,
○ Maximum supported resolution.
● Outputs:
○ Optimized encoding bitrate ladder.
Workflow
Figure: Convex-hull estimation and encoding architecture using VEXUS.
Spatiotemporal complexity feature extraction
We use seven DCT-energy-based features extracted using Video
Complexity Analyzer (VCA) [1, 2]:
● average luma texture energy (EY),
● average gradient of the luma texture energy (h),
● average luma brightness (LY),
● average chroma texture energy of the U and V channels (EU and EV),
● average chroma brightness of the U and V channels (LU and LV).
[1] V. V. Menon et al., “Green Video Complexity Analysis for Efficient Encoding in Adaptive Video Streaming,” in First International ACM Green Multimedia Systems Workshop (GMSys ’23), 2023.
[2] V. V. Menon et al., "JND-Aware Two-Pass Per-Title Encoding Scheme for Adaptive Live Streaming," in IEEE Transactions on Circuits and Systems for Video Technology, vol. 34, no. 2, pp. 1281-1294, Feb. 2024, doi:
10.1109/TCSVT.2023.3290725.
Figure: Heatmap of EY (left) and h (right).
XPSNR-optimized resolution prediction
● Process: Two-part approach involving modeling and optimization.
● Goal: Maximize perceptual quality.
The perceptual quality of the representation (rt, bt) depends on the extracted video complexity features, the encoding resolution, and the target
bitrate:
● A higher resolution and/or bitrate may improve the quality.
● The LY, LU, and LV features capture variations in luminance and chrominance within localized regions.
● The EY, EU, and EV features account for variations in texture across different frame regions, providing insights into how well the
compression or reconstruction method preserves textural details.
● h adds a temporal dimension, capturing changes in texture over time that influence the perceived quality.
Modeling:
Optimization:
VEXUS optimizes the perceptual quality (in terms of XPSNR) of encoded video segments.
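The modeling and optimization equations are not preserved in this transcript. As a hedged illustration of the optimization step, the following Python sketch selects, for each target bitrate, the supported resolution with the highest predicted XPSNR, limited to the maximum supported resolution. The predictor interface and the toy predictor are hypothetical stand-ins for the trained models and are not the authors' code.

```python
def build_ladder(features, bitrates, resolutions, r_max, predict_xpsnr):
    """Return (bitrate, resolution) pairs maximizing the predicted XPSNR.

    predict_xpsnr(features, resolution, bitrate) is a hypothetical callable
    standing in for a trained per-resolution quality model.
    """
    candidates = [r for r in resolutions if r <= r_max]
    ladder = []
    for b in bitrates:
        best = max(candidates, key=lambda r: predict_xpsnr(features, r, b))
        ladder.append((b, best))
    return ladder

def toy_predictor(features, resolution, bitrate):
    """Made-up model: higher resolutions saturate higher but need more bits."""
    ceiling = {540: 36.0, 1080: 40.0, 2160: 44.0}[resolution]
    cost = {540: 2.0, 1080: 6.0, 2160: 16.0}[resolution]
    return ceiling - cost / bitrate

# Example with hypothetical bitrates in Mbps; features are unused by the toy model.
ladder = build_ladder({}, [0.5, 2.0, 8.0], [540, 1080, 2160], 2160, toy_predictor)
```

Because only predictions are evaluated, no test encode is needed to pick the resolution, which is what allows the convex hull to be estimated online.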
Optimized QP prediction
Modeling:
The optimized QP is modeled as a function of spatiotemporal features, target bitrate b, and normalized resolution height r′ as:
● For applications such as streaming, avoiding exceeding the maximum bitrates specified in the HLS/DASH manifests [1, 2] during
the encoding process is essential.
● Failure to adhere to these limits can lead to buffer overflows or underflows in video players.
[1] I. Sodagar, “The MPEG-DASH Standard for Multimedia Streaming Over the Internet,” IEEE MultiMedia, vol. 18, no. 4, pp. 62–67, 2011.
[2] A. Bentaleb, B. Taani, A. C. Begen, C. Timmerer, and R. Zimmermann, “A Survey on Bitrate Adaptation Schemes for Streaming Media Over HTTP,” IEEE Communications Surveys &
Tutorials, vol. 21, no. 1, pp. 562–585, 2019.
QP optimization:
The optimization function aims to predict the QP, minimizing the discrepancy between the predicted and target bitrate for a given
resolution.
● The equation captures the non-linear relationship between bitrate and QP by employing a
logarithmic mapping of the bitrate values.
● VVenC has supported capped VBR rate control since its January 2024 release [1]. The QP is specified
using the qp option, while the maxrate (easy mode) or MaxBitrate (expert mode) option
specifies the upper bound of the bitrate variability.
● This method involves training distinct XGBoost regression models for the minimum and maximum QP values (qmin and qmax,
respectively).
● The optimized QP for a target bitrate b is determined using linear regression, as follows:
Figure: QP versus normalized bitrate (in log
scale) for a representative video segment.
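The regression equation is not preserved in this transcript. The following Python sketch shows one plausible reading of the bullets above: the QP is taken to be linear in the logarithm of the bitrate and is interpolated between the predicted qmax (at the lowest bitrate) and qmin (at the highest bitrate). The pairing of the bitrate range with the QP range, the clamping, and the rounding are assumptions of this sketch and may differ from the authors' formula.

```python
import math

def interpolate_qp(b, b_min, b_max, q_min, q_max):
    """QP for target bitrate b, assumed linear in log(bitrate).

    b_min maps to q_max (coarsest quantization), b_max maps to q_min.
    """
    b = min(max(b, b_min), b_max)  # clamp to the modeled bitrate range
    t = (math.log(b) - math.log(b_min)) / (math.log(b_max) - math.log(b_min))
    return round(q_max - t * (q_max - q_min))

# Example with hypothetical values: bitrates in Mbps, QP range 22..42.
qp_mid = interpolate_qp(2.0, 0.5, 8.0, 22, 42)
```

The selected QP would then be passed to the encoder together with the bitrate cap, so that the maximum bitrate advertised in the manifest is not exceeded.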
[1] C. Helmrich, V. George, V. V. Menon, A. Wieckowski, B. Bross, and D. Marpe, “Fast constant-quality video encoding using VVenC with rate capping based on pre-analysis statistics”, 2024 IEEE International
Conference on Image Processing (ICIP), Abu Dhabi, United Arab Emirates, 2024, pp. 1823-1828, doi: 10.1109/ICIP51287.2024.10647456.
Cascaded Approach
Experimental Design
Experimental setup
● We used 1000 videos of the Inter-4K dataset [1] to validate the performance of
the encoding methods.
● We encoded the sequences at UHD (2160p) 60fps using VVenC v1.10 [2]
with preset 0 (faster).
● We extracted the spatiotemporal features using VCA v2.0.
● We ran constant quality encoding by varying qp values from qmin to qmax for
each resolution.
● We computed full-reference PSNR and XPSNR quality metrics after the
compressed video was upscaled to the original resolution (2160p).
Table: Experimental parameters used to evaluate VEXUS.
[1] A. Stergiou and R. Poppe, “AdaPool: Exponential Adaptive Pooling for Information-Retaining Downsampling,” in IEEE Transactions on Image Processing, vol. 32, 2023, pp. 251–266.
[2] A. Więckowski, J. Brandenburg, T. Hinz, C. Bartnik, V. George, G. Hege, C. Helmrich, A. Henkel, C. Lehmann, C. Stoffers, I. Zupancic, B. Bross, and D. Marpe, “VVenC: An Open And Optimized
VVC Encoder Implementation,” in Proc. IEEE International Conference on Multimedia & Expo Workshops (ICMEW), Jul. 2021, pp. 1–2.
Figure: Calculation of the ground-truth PSNR, XPSNR, and bitrate to train
the prediction models. This example shows the ground-truth calculation for a
video encoded at 1080p with qp 30.
Dataset
[1] Apple Inc., “HLS Authoring Specification for Apple Devices.” [Online]. Available: https://developer.apple.com/documentation/http-live-streaming/hls-authoring-specification-for-apple-devices
[2] J. De Cock et al., “Complexity-based consistent-quality encoding in the cloud,” in 2016 IEEE International Conference on Image Processing (ICIP), 2016, pp. 1484–1488.
[3] C. Chen et al., “Optimized Transcoding for Large Scale Adaptive Streaming Using Playback Statistics,” in 2018 25th IEEE International Conference on Image Processing (ICIP), 2018, pp.
3269–3273.
Benchmarks
1. Default: This method employs fixed-resolution encoding, i.e., all
bitstreams are encoded at the same resolution as the input video.
2. FixedLadder: This method employs a fixed set of bitrate-resolution pairs.
We use the HLS bitrate ladder specified in the Apple authoring
specifications [1] as the fixed set of bitrate-resolution pairs.
3. Bruteforce: This method determines the optimized resolution, i.e., the one that yields the
highest XPSNR for a given target bitrate, after an exhaustive encoding
process at all supported resolutions and QPs [2, 3].
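The Bruteforce benchmark can be sketched in Python as an exhaustive search over already measured encodes. The record layout and the rule that an encode qualifies when its bitrate does not exceed the target are assumptions made for illustration. The example records are hypothetical and are not results from the experiments.

```python
def bruteforce_select(encodes, target_bitrates):
    """Pick, per target bitrate, the measured encode with the highest XPSNR.

    encodes: list of dicts with keys 'resolution', 'qp', 'bitrate', 'xpsnr'.
    Returns a dict mapping each target to the chosen record (or None).
    """
    selection = {}
    for target in target_bitrates:
        feasible = [e for e in encodes if e["bitrate"] <= target]
        selection[target] = (max(feasible, key=lambda e: e["xpsnr"])
                             if feasible else None)
    return selection

# Hypothetical measurements (bitrate in Mbps, XPSNR in dB).
encodes = [
    {"resolution": 1080, "qp": 32, "bitrate": 1.8, "xpsnr": 36.0},
    {"resolution": 2160, "qp": 40, "bitrate": 1.9, "xpsnr": 34.5},
    {"resolution": 1080, "qp": 22, "bitrate": 6.0, "xpsnr": 39.0},
    {"resolution": 2160, "qp": 30, "bitrate": 7.5, "xpsnr": 42.0},
]
choice = bruteforce_select(encodes, [1.0, 2.0, 8.0])
```

The search itself is trivial. The cost of this benchmark lies in producing the input, since every resolution and QP combination has to be encoded and measured first.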
Table: An example fixed bitrate-ladder, i.e., set of
bitrate-resolution pairs. Source: [1]
Results and Evaluation
Prediction analysis
● Accuracy:
○ XPSNR prediction
■ MAE: 0.17 dB, R²: 0.99
■ Std. dev: 0.22 dB
○ QP prediction
■ MAE: 1.32, R²: 0.97
■ Std. dev: 1.86
● Speed of feature extraction: 176 fps
● Model inference time: 4 ms
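For reference, the two accuracy measures reported above can be computed as in the following Python sketch. The function names are illustrative, and the definitions are the standard ones for mean absolute error and the coefficient of determination, assumed here to match those used for the reported figures.

```python
def mean_absolute_error(y_true, y_pred):
    """Average magnitude of the prediction error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot
```

An R² close to 1 means the model explains almost all of the variance of the ground truth, while the MAE expresses the typical error in the unit of the predicted quantity (dB for XPSNR, QP steps for QP).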
Figure: Prediction results of the XPSNR prediction model.
● XPSNR prediction model
○ An XGBoost regressor [1] is trained for each supported resolution.
○ Hyperparameters: max_depth=10 and n_estimators=400.
● QP prediction models
○ An XGBoost regressor is trained for qmin and for qmax for each supported resolution.
[1] T. Chen and C. Guestrin, “XGBoost: A Scalable Tree Boosting System,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Aug. 2016,
pp. 785–794.
Rate-distortion Analysis
The RD curve of VEXUS closely follows that of the Bruteforce method, indicating that its predictive modeling effectively approximates
the optimized resolutions and QPs.
Figure: RD curves of representative video sequences using Default (green line), FixedLadder (blue line), Bruteforce (black line), and VEXUS (red line) encodings.
Encoding and decoding times
Figure: Encoding and decoding times of representative video sequences using Default (green line), FixedLadder (blue line),
Bruteforce (black line), and VEXUS (red line) encodings.
● Encoding and decoding times are reduced at lower bitrates, because lower resolutions are selected there and the encoding and
decoding operations become less complex.
Result Summary
● Coding efficiency (in terms of Bjøntegaard delta rates [1]) as well as encoding and decoding times decrease as rmax decreases.
● The trade-off between quality and coding efficiency is based on the target audience, delivery platform, and available resources.
[1] HSTP-VID-WPOM, “Working practices using objective metrics for evaluation of video coding efficiency experiments,” International Telecommunication Union, 2020. [Online].
Available: http://handle.itu.int/11.1002/pub/8160e8da-en
Table: Average results of the encoding schemes compared to Default encoding.
Conclusions
● XPSNR demonstrates a higher correlation with subjective quality scores than VMAF for VVC-coded UHD content.
● Leveraging this insight, we introduced an approach where XPSNR is predicted for VVC-coded bitstreams using spatiotemporal
complexity features of the video and the target encoding configuration.
● We proposed VEXUS, where the convex-hull is estimated online using the predicted XPSNR.
● On average, VEXUS yields an improvement of 5.84 dB in PSNR and 0.62 dB in XPSNR at the same bitrates compared to
conventional UHD encoding with the VVenC encoder, together with a 44.43% reduction in overall encoding time and a 65.46% reduction
in overall decoding time using the VTM decoder.
Open-source tools:
1. VVC encoder: Fraunhofer Versatile Video Encoder (VVenC) v1.10
Available: https://github.com/fraunhoferhhi/vvenc
2. VVC decoder: VTM reference decoder v22.0
Available: https://vcgit.hhi.fraunhofer.de/jvet/VVCSoftware_VTM
3. Spatiotemporal feature extractor: Video Complexity Analyzer (VCA) v2.0
Available: https://github.com/cd-athena/VCA
4. Convex-hull estimation framework:
Available: https://github.com/PhoenixVideo/QADRA
Thank you for your attention
Vignesh V Menon (vignesh.menon@hhi.fraunhofer.de)

More Related Content

Similar to Convex-hull Estimation using XPSNR for Versatile Video Coding (20)

PDF
Efficient bitrate ladder construction for live video streaming
Minh Nguyen
 
PDF
Shahid presentation
Muhammad Shahid
 
PDF
Improving Per-title Encoding for HTTP Adaptive Streaming by Utilizing Video S...
Alpen-Adria-Universität
 
PDF
Versatile Video Coding: Compression Tools for UHD and 360° Video
Mathias Wien
 
PDF
Bitmovin AV1/VVC Presentation_Streaming Media East by Christian Feldmann
Bitmovin Inc
 
PDF
LiveVBR presentation at VQEG NORM.pdf
Vignesh V Menon
 
PDF
HTTP Adaptive Streaming – Where Is It Heading?
Alpen-Adria-Universität
 
PDF
Perceptually-aware Per-title Encoding for Adaptive Video Streaming
Alpen-Adria-Universität
 
PDF
Machine Learning Based Video Coding Enhancements for HTTP Adaptive Streaming
Alpen-Adria-Universität
 
PDF
HTTP Adaptive Streaming – Quo Vadis? (2023)
Alpen-Adria-Universität
 
PDF
Perceptually-aware Per-title Encoding for Adaptive Video Streaming.pdf
Vignesh V Menon
 
PDF
Energy-Efficient Multi-Codec Bitrate-Ladder Estimation for Adaptive Video Str...
Alpen-Adria-Universität
 
PDF
VCIP_MCBE_presentation.pdf
Vignesh V Menon
 
PDF
[IJET-V1I2P1] Authors :Imran Ullah Khan ,Mohd. Javed Khan ,S.Hasan Saeed ,Nup...
IJET - International Journal of Engineering and Techniques
 
PDF
Green Variable framerate encoding for Adaptive Live Streaming
Vignesh V Menon
 
PDF
VVC tutorial at ICME 2020 together with Benjamin Bross
Mathias Wien
 
PDF
Gain of Grain: A Film Grain Handling Toolchain for VVC-based Open Implementat...
Vignesh V Menon
 
PPTX
Objective Evaluation of Video Quality
Anton Venema
 
PDF
What’s new in MPEG?
Alpen-Adria-Universität
 
PDF
Learning Quality from Complexity and Structure: A Feature-Fused XGBoost Model...
Vignesh V Menon
 
Efficient bitrate ladder construction for live video streaming
Minh Nguyen
 
Shahid presentation
Muhammad Shahid
 
Improving Per-title Encoding for HTTP Adaptive Streaming by Utilizing Video S...
Alpen-Adria-Universität
 
Versatile Video Coding: Compression Tools for UHD and 360° Video
Mathias Wien
 
Bitmovin AV1/VVC Presentation_Streaming Media East by Christian Feldmann
Bitmovin Inc
 
LiveVBR presentation at VQEG NORM.pdf
Vignesh V Menon
 
HTTP Adaptive Streaming – Where Is It Heading?
Alpen-Adria-Universität
 
Perceptually-aware Per-title Encoding for Adaptive Video Streaming
Alpen-Adria-Universität
 
Machine Learning Based Video Coding Enhancements for HTTP Adaptive Streaming
Alpen-Adria-Universität
 
HTTP Adaptive Streaming – Quo Vadis? (2023)
Alpen-Adria-Universität
 
Perceptually-aware Per-title Encoding for Adaptive Video Streaming.pdf
Vignesh V Menon
 
Energy-Efficient Multi-Codec Bitrate-Ladder Estimation for Adaptive Video Str...
Alpen-Adria-Universität
 
VCIP_MCBE_presentation.pdf
Vignesh V Menon
 
[IJET-V1I2P1] Authors :Imran Ullah Khan ,Mohd. Javed Khan ,S.Hasan Saeed ,Nup...
IJET - International Journal of Engineering and Techniques
 
Green Variable framerate encoding for Adaptive Live Streaming
Vignesh V Menon
 
VVC tutorial at ICME 2020 together with Benjamin Bross
Mathias Wien
 
Gain of Grain: A Film Grain Handling Toolchain for VVC-based Open Implementat...
Vignesh V Menon
 
Objective Evaluation of Video Quality
Anton Venema
 
What’s new in MPEG?
Alpen-Adria-Universität
 
Learning Quality from Complexity and Structure: A Feature-Fused XGBoost Model...
Vignesh V Menon
 

More from Vignesh V Menon (19)

PDF
Film Grain Coding for Versatile Video Coding Systems: Techniques, Challenges,...
Vignesh V Menon
 
PDF
A Tutorial on Latency- and Energy-Aware Video Coding and Delivery Streaming S...
Vignesh V Menon
 
PDF
Enhancing Film Grain Coding in VVC: Improving Encoding Quality and Efficiency
Vignesh V Menon
 
PDF
Optimal Quality and Efficiency in Adaptive Live Streaming with JND-Aware Low ...
Vignesh V Menon
 
PDF
Content_adaptive_video_coding_for_HTTP_Adaptive_Streaming.pdf
Vignesh V Menon
 
PDF
JASLA_presentation.pdf
Vignesh V Menon
 
PDF
Green_VCA_presentation.pdf
Vignesh V Menon
 
PDF
TQPM.pdf
Vignesh V Menon
 
PDF
CAPS_Presentation.pdf
Vignesh V Menon
 
PDF
Doctoral Symposium presentation.pdf
Vignesh V Menon
 
PDF
ETPS_Efficient_Two_pass_Encoding_Scheme_for_Adaptive_Streaming.pdf
Vignesh V Menon
 
PDF
OPSE_Online Per-Scene Encoding for Adaptive HTTP Live Streaming.pdf
Vignesh V Menon
 
PDF
Research@Lunch_Presentation.pdf
Vignesh V Menon
 
PDF
OPTE: Online Per-title Encoding for Live Video Streaming.pdf
Vignesh V Menon
 
PDF
Video Complexity Dataset (VCD).pdf
Vignesh V Menon
 
PDF
Live-PSTR: Live Per-Title Encoding for Ultra HD Adaptive Streaming
Vignesh V Menon
 
PDF
IEEE MMSP'21: INCEPT: Intra CU Depth Prediction for HEVC
Vignesh V Menon
 
PDF
IEEE PCS'21: Efficient multi-encoding for large-scale HTTP Adaptive Streaming...
Vignesh V Menon
 
PDF
IEEE ICIP'22:Efficient Content-Adaptive Feature-based Shot Detection for HTTP...
Vignesh V Menon
 
Film Grain Coding for Versatile Video Coding Systems: Techniques, Challenges,...
Vignesh V Menon
 
A Tutorial on Latency- and Energy-Aware Video Coding and Delivery Streaming S...
Vignesh V Menon
 
Enhancing Film Grain Coding in VVC: Improving Encoding Quality and Efficiency
Vignesh V Menon
 
Optimal Quality and Efficiency in Adaptive Live Streaming with JND-Aware Low ...
Vignesh V Menon
 
Content_adaptive_video_coding_for_HTTP_Adaptive_Streaming.pdf
Vignesh V Menon
 
JASLA_presentation.pdf
Vignesh V Menon
 
Green_VCA_presentation.pdf
Vignesh V Menon
 
TQPM.pdf
Vignesh V Menon
 
CAPS_Presentation.pdf
Vignesh V Menon
 
Doctoral Symposium presentation.pdf
Vignesh V Menon
 
ETPS_Efficient_Two_pass_Encoding_Scheme_for_Adaptive_Streaming.pdf
Vignesh V Menon
 
OPSE_Online Per-Scene Encoding for Adaptive HTTP Live Streaming.pdf
Vignesh V Menon
 
Research@Lunch_Presentation.pdf
Vignesh V Menon
 
OPTE: Online Per-title Encoding for Live Video Streaming.pdf
Vignesh V Menon
 
Video Complexity Dataset (VCD).pdf
Vignesh V Menon
 
Live-PSTR: Live Per-Title Encoding for Ultra HD Adaptive Streaming
Vignesh V Menon
 
IEEE MMSP'21: INCEPT: Intra CU Depth Prediction for HEVC
Vignesh V Menon
 
IEEE PCS'21: Efficient multi-encoding for large-scale HTTP Adaptive Streaming...
Vignesh V Menon
 
IEEE ICIP'22:Efficient Content-Adaptive Feature-based Shot Detection for HTTP...
Vignesh V Menon
 
Ad

Recently uploaded (20)

PPTX
Capitol Doctoral Presentation -July 2025.pptx
CapitolTechU
 
PDF
ARAL-Orientation_Morning-Session_Day-11.pdf
JoelVilloso1
 
PPTX
A PPT on Alfred Lord Tennyson's Ulysses.
Beena E S
 
PPTX
Views on Education of Indian Thinkers Mahatma Gandhi.pptx
ShrutiMahanta1
 
PPTX
Accounting Skills Paper-I, Preparation of Vouchers
Dr. Sushil Bansode
 
PPTX
Views on Education of Indian Thinkers J.Krishnamurthy..pptx
ShrutiMahanta1
 
PPTX
How to Manage Access Rights & User Types in Odoo 18
Celine George
 
PPT
digestive system for Pharm d I year HAP
rekhapositivity
 
PPTX
ROLE OF ANTIOXIDANT IN EYE HEALTH MANAGEMENT.pptx
Subham Panja
 
PDF
community health nursing question paper 2.pdf
Prince kumar
 
PPTX
Growth and development and milestones, factors
BHUVANESHWARI BADIGER
 
PPTX
Nutri-QUIZ-Bee-Elementary.pptx...................
ferdinandsanbuenaven
 
PDF
1, 2, 3… E MAIS UM CICLO CHEGA AO FIM!.pdf
Colégio Santa Teresinha
 
PPTX
Pyhton with Mysql to perform CRUD operations.pptx
Ramakrishna Reddy Bijjam
 
PPTX
How to Configure Lost Reasons in Odoo 18 CRM
Celine George
 
PPTX
How to Define Translation to Custom Module And Add a new language in Odoo 18
Celine George
 
PPTX
Explorando Recursos do Summer '25: Dicas Essenciais - 02
Mauricio Alexandre Silva
 
PDF
IMP NAAC-Reforms-Stakeholder-Consultation-Presentation-on-Draft-Metrics-Unive...
BHARTIWADEKAR
 
PDF
Federal dollars withheld by district, charter, grant recipient
Mebane Rash
 
PDF
The-Beginnings-of-Indian-Civilisation.pdf/6th class new ncert social/by k san...
Sandeep Swamy
 
Capitol Doctoral Presentation -July 2025.pptx
CapitolTechU
 
ARAL-Orientation_Morning-Session_Day-11.pdf
JoelVilloso1
 
A PPT on Alfred Lord Tennyson's Ulysses.
Beena E S
 
Views on Education of Indian Thinkers Mahatma Gandhi.pptx
ShrutiMahanta1
 
Accounting Skills Paper-I, Preparation of Vouchers
Dr. Sushil Bansode
 
Views on Education of Indian Thinkers J.Krishnamurthy..pptx
ShrutiMahanta1
 
How to Manage Access Rights & User Types in Odoo 18
Celine George
 
digestive system for Pharm d I year HAP
rekhapositivity
 
ROLE OF ANTIOXIDANT IN EYE HEALTH MANAGEMENT.pptx
Subham Panja
 
community health nursing question paper 2.pdf
Prince kumar
 
Growth and development and milestones, factors
BHUVANESHWARI BADIGER
 
Nutri-QUIZ-Bee-Elementary.pptx...................
ferdinandsanbuenaven
 
1, 2, 3… E MAIS UM CICLO CHEGA AO FIM!.pdf
Colégio Santa Teresinha
 
Pyhton with Mysql to perform CRUD operations.pptx
Ramakrishna Reddy Bijjam
 
How to Configure Lost Reasons in Odoo 18 CRM
Celine George
 
How to Define Translation to Custom Module And Add a new language in Odoo 18
Celine George
 
Explorando Recursos do Summer '25: Dicas Essenciais - 02
Mauricio Alexandre Silva
 
IMP NAAC-Reforms-Stakeholder-Consultation-Presentation-on-Draft-Metrics-Unive...
BHARTIWADEKAR
 
Federal dollars withheld by district, charter, grant recipient
Mebane Rash
 
The-Beginnings-of-Indian-Civilisation.pdf/6th class new ncert social/by k san...
Sandeep Swamy
 
Ad

Convex-hull Estimation using XPSNR for Versatile Video Coding

  • 1. Convex-hull Estimation using XPSNR for Versatile Video Coding — Vignesh V Menon, Christian R Helmrich, Adam Więckowski, Benjamin Bross, Detlev Marpe Video Communication and Applications Dept., Fraunhofer HHI, Germany
  • 3. MHV’24 Introduction 30.10.2024 © Fraunhofer Slide 3 HTTP Adaptive Streaming (HAS) • HTTP Adaptive Streaming (HAS) has become the standard for delivering video content over various internet speeds and devices [1]. • Key Components: ○ Segmented Video Content: Video is encoded at multiple quality levels and split into segments (e.g., 2-10 seconds each). ○ Manifest Files (MPD/HLS Manifest): Provides clients with information about available video bitrates, resolutions, and segment locations. ○ Adaptive Bitrate (ABR) Streaming: Enables real-time switching between different video qualities based on bandwidth, device capability, and buffer state. • Video Coding Relevance: ○ Efficient Encoding: Advanced codecs (e.g., HEVC [2], VVC [3]) enable high compression ratios without compromising quality, allowing adaptive streaming to meet quality and bandwidth requirements effectively. ○ Per-Title/Per-Scene Encoding: Customizes encoding for each title or scene to achieve optimal quality at each bitrate. ○ Quality Metrics (e.g., VMAF, PSNR): Used to select encoding parameters and maintain consistent perceptual quality across bitrates. [1] I. Sodagar, “The MPEG-DASH Standard for Multimedia Streaming Over the Internet,” IEEE MultiMedia, vol. 18, no. 4, pp. 62–67, 2011. [2] G. J. Sullivan et al., “Overview of the High Efficiency Video Coding (HEVC) Standard,” in IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 12, 2012, pp. 1649–1668. [3] B. Bross et al., “Overview of the Versatile Video Coding (VVC) Standard and its Applications,” in IEEE Transactions on Circuits and Systems for Video Technology, vol. 31, no. 10, 2021, pp. 3736–3764.
  • 4. MHV’24 Introduction 30.10.2024 © Fraunhofer Slide 4 Coding complexity ● Modern video standards, such as Versatile Video Coding (VVC), provide high compression efficiency but at a cost of significantly increased coding complexity. ● VVC Complexity: Up to 10x more complex than previous standards (e.g., HEVC) [1]. ● Challenge: ○ Computational Demands: Higher resolution and quality requirements increase both encoding and decoding time, impacting streaming latency and energy efficiency [2]. ● Complexity Metrics: ○ Encoding Time: Increased due to finer partitioning, more prediction modes, and higher-level coding tools. ○ Decoding Time: High complexity impacts playback on devices with limited processing power. Figure: Rate-distortion (RD) and rate-decoding time curves of representative sequences of Inter-4K dataset [3] encoded at 540p, 1080p, and 2160p resolutions using VVenC at the faster preset, and decoded using VTM decoder. [1] R. Kaafarani et al., “Evaluation Of Bitrate Ladders For Versatile Video Coder,” in 2021 International Conference on Visual Communications and Image Processing (VCIP), 2021, pp. 1–5. [2] V. V. Menon et al., “Video Super-Resolution for Optimized Bitrate and Green Online Streaming,” in 2024 Picture Coding Symposium (PCS), 2024. [3] A. Stergiou and R. Poppe, “AdaPool: Exponential Adaptive Pooling for Information-Retaining Downsampling,” in IEEE Transactions on Image Processing, vol. 32, 2023, pp. 251–266.
  • 5. MHV’24 Introduction 30.10.2024 © Fraunhofer Slide 5 Limitations of existing metrics ● PSNR [1] measurements are based on picture-wise MSE ○ Normalized to the bit depth of the assessed images ○ Usually, per-picture PSNR values are averaged over the video ○ Problem: the sensitivity of the HVS to distortion is not fixed ● SSIM [2] is a multiplicative combination of luminance, contrast, and structure similarities ● VMAF [3] applies several VQA models ○ Model results are fused using machine learning ○ High correlation between VMAF and subjective scores ○ Problem: high computational complexity, not differentiable, fails for VVC-coded UHD content [4, 5]. Figure: VMAF computation [3]. [1] Alliance For Telecommunications Industry Solutions, “Objective Video Quality Measurement Using A Peak Signal-To-Noise-Ratio (PSNR) Full Reference Technique,” T1.TR.74-2001, 2001. [2] Z. Wang et al., “Image quality assessment: from error visibility to structural similarity,” in IEEE Transactions on Image Processing, vol. 13, no. 4, 2004, pp. 600–612. [3] Z. Li et al., “VMAF: The journey continues,” Netflix Technology Blog, vol. 25, 2018. [4] M. Wien and V. Baroncini, “Report on VVC compression performance verification testing in the SDR UHD Random Access Category,” in WG 05 MPEG Joint Video Coding Team(s) with ITU-T SG 16, document JVET-T0097, Oct. 2020. [5] C. R. Helmrich et al., “Information on and analysis of the VVC encoders in the SDR UHD verification test,” in WG 05 MPEG Joint Video Coding Team(s) with ITU-T SG 16, document JVET-T0103, Oct. 2020.
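As a reminder of what "picture-wise MSE normalized to bit depth" means in practice, the sketch below computes PSNR for one picture and averages it over a video. It is a plain-Python illustration, not the talk's implementation: pictures are assumed to be flat lists of sample values, and the function names are hypothetical.

```python
import math


def mse(ref, dist):
    """Mean squared error between two equally sized pictures (flat sample lists)."""
    if len(ref) != len(dist) or not ref:
        raise ValueError("pictures must be non-empty and of equal size")
    return sum((a - b) ** 2 for a, b in zip(ref, dist)) / len(ref)


def psnr(ref, dist, bit_depth=8):
    """PSNR in dB; the peak value (2^bit_depth - 1) normalizes for bit depth."""
    peak = (1 << bit_depth) - 1
    err = mse(ref, dist)
    if err == 0:
        return math.inf  # identical pictures
    return 10.0 * math.log10(peak * peak / err)


def video_psnr(ref_frames, dist_frames, bit_depth=8):
    """Arithmetic mean of per-picture PSNR values (infinite if any picture is lossless)."""
    values = [psnr(r, d, bit_depth) for r, d in zip(ref_frames, dist_frames)]
    return sum(values) / len(values)
```

Because every sample error is weighted equally, this measure cannot reflect that the visibility of distortion depends on the picture content, which is the limitation the slide points out.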
  • 6. MHV’24 Introduction 30.10.2024 © Fraunhofer Slide 6 Objective Aim: ● To enhance the efficiency and quality of HTTP Adaptive Streaming in Versatile Video Coding (VVC) using an improved perceptual quality metric, XPSNR, as a reliable alternative to traditional metrics like VMAF. Core Goals: ● Perceptual Quality Optimization: Improve perceptual video quality by leveraging XPSNR, which shows higher correlation with subjective quality scores for UHD content. ● Efficient Bitrate-Resolution Selection: Develop a method for estimating the convex hull online to determine optimal bitrate-resolution pairs, balancing quality and computational efficiency. ● Resource Optimization: Reduce encoding and decoding times through effective bitrate-resolution adaptation, allowing for lower energy consumption and enhanced user experience.
  • 8. MHV’24 XPSNR 30.10.2024 © Fraunhofer Slide 8 Introduction ● XPSNR (eXtended Peak Signal-to-Noise Ratio) is an advanced VQA metric. ● Extension of PSNR: While traditional PSNR lacks correlation with human perception, XPSNR incorporates perceptual considerations, making it more aligned with subjective video quality judgments. ● Low complexity: Designed to maintain low computational complexity, XPSNR is efficient for real-time encoding tasks, particularly for high-resolution content like UHD. ● Defined block-wise for block B (at index k) of input picture pic. ● Also depends on image width W, height H, and bit depth BD. ● Core: a visual activity measure obtained from spatio-temporal high-pass filtering. Only low-complexity operations are required, since squares and square roots cancel. [1] C. R. Helmrich et al., “Information on and analysis of the VVC encoders in the SDR UHD verification test,” in WG 05 MPEG Joint Video Coding Team(s) with ITU-T SG 16, document JVET-T0103, Oct. 2020. [2] M. Wien and V. Baroncini, “Report on VVC compression performance verification testing in the SDR UHD Random Access Category,” in WG 05 MPEG Joint Video Coding Team(s) with ITU-T SG 16, document JVET-T0097, Oct. 2020. [3] C. R. Helmrich et al., “A study of the extended perceptually weighted peak signal-to-noise ratio (XPSNR) for video compression with different resolutions and bit depths,” in ITU Journal: ICT Discoveries, vol. 3, May 2020. [Online] http://handle.itu.int/11.1002/pub/8153d78b-en
  • 9. MHV’24 XPSNR 30.10.2024 © Fraunhofer Slide 9 Improvement over VMAF Figure: Comparison of computation times of quality metrics for UHD videos with 240 frames. Table: Evaluation of Pearson linear correlation. Higher values imply a higher correlation with MOS [1]. Table: Evaluation of Spearman rank correlation [1]. [1] C. R. Helmrich et al., “Information on and analysis of the VVC encoders in the SDR UHD verification test,” in WG 05 MPEG Joint Video Coding Team(s) with ITU-T SG 16, document JVET-T0103, Oct. 2020.
  • 11. MHV’24 VEXUS Architecture 30.10.2024 © Fraunhofer Slide 11 Convex-hull ● The convex hull is the set of encoding points (bitrate-resolution pairs) that achieve “Pareto efficiency” [1, 2]: no other point delivers higher quality at the same or a lower bitrate. ● Online convex-hull estimation methods provide a dynamic and adaptive means to optimize bitrate and resolution selections [2]. Figure: Conceptual plot to depict the bitrate-quality relationship for any video source encoded at different resolutions [2]. Figure: Rate-XPSNR curves and decoding times of example video sequences from the employed dataset, Inter4K [3], encoded using VVenC [4] and decoded using VVdeC [5]. [1] R. Kaafarani et al., “Evaluation Of Bitrate Ladders For Versatile Video Coder,” in 2021 International Conference on Visual Communications and Image Processing (VCIP), 2021, pp. 1–5. [2] A. Aaron et al., “Per-title encode optimization,” The Netflix Techblog, 2015. [3] A. Stergiou and R. Poppe, “AdaPool: Exponential Adaptive Pooling for Information-Retaining Downsampling,” in IEEE Transactions on Image Processing, vol. 32, 2023, pp. 251–266. [4] A. Wieckowski et al., “VVenC: An Open And Optimized VVC Encoder Implementation,” in 2021 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), Jul. 2021, pp. 1–2. [5] A. Wieckowski et al., “Towards a live software decoder implementation for the upcoming versatile video coding (VVC) codec,” in Proc. IEEE International Conference on Image Processing (ICIP), pp. 3124–3128.
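To make the convex-hull notion concrete, the following sketch takes measured (bitrate, quality, resolution) points from several resolutions, keeps the Pareto-efficient ones, and then keeps only those on the upper convex hull. It is an offline illustration with made-up numbers, not the online estimation used in VEXUS; all names and data are hypothetical.

```python
def pareto_front(points):
    """Keep points that no other point beats in both bitrate and quality.
    Each point is a (bitrate, quality, resolution) tuple."""
    front, best_quality = [], float("-inf")
    # Sort by ascending bitrate; at equal bitrate, best quality first.
    for p in sorted(points, key=lambda p: (p[0], -p[1])):
        if p[1] > best_quality:
            front.append(p)
            best_quality = p[1]
    return front


def _cross(o, a, b):
    """2D cross product of vectors o->a and o->b in the (bitrate, quality) plane."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def upper_convex_hull(points):
    """Upper convex hull of the Pareto front (monotone-chain scan)."""
    hull = []
    for p in pareto_front(points):
        # Drop the previous point if it lies on or below the chord to p.
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull


# Hypothetical measurements: (bitrate in kbps, quality in dB, height).
POINTS = [
    (500, 30.0, 540), (1000, 32.0, 540), (2000, 33.0, 540),
    (1000, 31.0, 1080), (2000, 35.0, 1080), (4000, 37.0, 1080),
    (2000, 33.5, 2160), (4000, 38.0, 2160), (8000, 40.0, 2160),
]
```

For these numbers, the hull switches from 540p to 1080p at 2000 kbps and to 2160p at 4000 kbps, which mirrors the crossover behavior shown in the conceptual plot.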
  • 12. MHV’24 VEXUS Architecture 30.10.2024 © Fraunhofer Slide 12 Components: ● Input parameters: ○ Set of supported resolutions, ○ Set of bitrates, ○ Maximum supported resolution. ● Outputs: ○ Optimized encoding bitrate ladder. Workflow Figure: Convex-hull estimation and encoding architecture using VEXUS.
  • 13. MHV’24 Spatiotemporal complexity feature extraction 30.10.2024 © Fraunhofer Slide 13 We use seven DCT-energy-based features extracted using Video Complexity Analyzer (VCA) [1, 2]: ● average luma texture energy (EY), ● average gradient of the luma texture energy (h) ● average luma brightness (LY), ● average chroma texture energy of U and V channels (EU and EV) ● average chroma brightness of U and V channels (LU and LV). [1] V. V. Menon et al., “Green Video Complexity Analysis for Efficient Encoding in Adaptive Video Streaming,” in First International ACM Green Multimedia Systems Workshop (GMSys ’23), 2023. [2] V. V. Menon et al., "JND-Aware Two-Pass Per-Title Encoding Scheme for Adaptive Live Streaming," in IEEE Transactions on Circuits and Systems for Video Technology, vol. 34, no. 2, pp. 1281-1294, Feb. 2024, doi: 10.1109/TCSVT.2023.3290725. Figure: Heatmap of EY (left) and h (right).
  • 14. MHV’24 XPSNR-optimized resolution prediction 30.10.2024 © Fraunhofer Slide 14 ● Process: Two-part approach involving modeling and optimization. ● Goal: Maximize perceptual quality. Modeling: The perceptual quality of the representation (rt, bt) relies on the extracted video complexity features, the encoding resolution, and the target bitrate. ● A higher resolution and/or bitrate may improve the quality. ● LY, LU, and LV features consider variations in luminance and chrominance within localized regions. ● EY, EU, and EV features account for variations in texture across different frame regions, providing insights into how well the compression or reconstruction method preserves textural details. ● h introduces a temporal dimension, capturing dynamic changes in texture over time that influence perceived quality. Optimization: VEXUS optimizes the perceptual quality (in terms of XPSNR) of encoded video segments.
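The optimization step can be sketched as follows: for each target bitrate, predict XPSNR at every candidate resolution and keep the resolution with the highest prediction. This is a minimal sketch under assumptions. The predictor is passed in as a callable, and `toy_predictor` is a made-up stand-in for the trained models (it ignores the complexity features entirely); the function names are hypothetical.

```python
def select_resolution(features, bitrate_kbps, resolutions, predict_xpsnr):
    """Return the resolution with the highest predicted XPSNR at this bitrate."""
    return max(resolutions, key=lambda r: predict_xpsnr(features, r, bitrate_kbps))


def build_ladder(features, bitrates, resolutions, predict_xpsnr, max_resolution):
    """Pair every target bitrate with its predicted-best resolution,
    restricted to resolutions up to the maximum supported one."""
    candidates = [r for r in resolutions if r <= max_resolution]
    return [
        (b, select_resolution(features, b, candidates, predict_xpsnr))
        for b in sorted(bitrates)
    ]


def toy_predictor(features, resolution, bitrate_kbps):
    """Hypothetical stand-in for a trained model: each resolution saturates
    at a different quality and needs a different bitrate to get there."""
    ceiling = {540: 34.0, 1080: 38.0, 2160: 42.0}[resolution]
    cost = {540: 1000.0, 1080: 4000.0, 2160: 16000.0}[resolution]
    return ceiling - cost / bitrate_kbps
```

With the toy predictor, `build_ladder({}, [500, 2000, 8000], [540, 1080, 2160], toy_predictor, 2160)` pairs 500 kbps with 540p, 2000 kbps with 1080p, and 8000 kbps with 2160p.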
  • 15. MHV’24 Optimized QP prediction 30.10.2024 © Fraunhofer Slide 15 Modeling: The optimized QP is modeled as a function of the spatiotemporal features, target bitrate b, and normalized resolution height r′. ● For applications such as streaming, it is essential that the encoder does not exceed the maximum bitrates specified in the HLS/DASH manifests [1, 2]. ● Failure to adhere to these limits can lead to buffer overflows or underflows in video players. QP optimization: The optimization function aims to predict the QP that minimizes the discrepancy between the predicted and target bitrate for a given resolution. [1] I. Sodagar, “The MPEG-DASH Standard for Multimedia Streaming Over the Internet,” IEEE MultiMedia, vol. 18, no. 4, pp. 62–67, 2011. [2] A. Bentaleb, B. Taani, A. C. Begen, C. Timmerer, and R. Zimmermann, “A Survey on Bitrate Adaptation Schemes for Streaming Media Over HTTP,” IEEE Communications Surveys & Tutorials, vol. 21, no. 1, pp. 562–585, 2019.
  • 16. MHV’24 Optimized QP prediction 30.10.2024 © Fraunhofer Slide 16 ● The equation captures the non-linear relationship between bitrate and QP by employing a logarithmic mapping of the bitrate values. ● VVenC implemented capped VBR rate control in its Jan 2024 release [1]: the QP is specified using the qp option, while the maxrate (easy mode) or MaxBitrate (expert mode) option specifies the upper bound of bitrate variability. ● This method involves training distinct XGBoost regression models for the minimum and maximum QP values (qmin and qmax, respectively). ● The optimized QP for a target bitrate b is determined using linear regression. Figure: QP versus normalized bitrate (in log scale) for a representative video segment. [1] C. Helmrich, V. George, V. V. Menon, A. Wieckowski, B. Bross, and D. Marpe, “Fast constant-quality video encoding using VVenC with rate capping based on pre-analysis statistics”, 2024 IEEE International Conference on Image Processing (ICIP), Abu Dhabi, United Arab Emirates, 2024, pp. 1823-1828, doi: 10.1109/ICIP51287.2024.10647456. Cascaded Approach
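The slide's equation did not survive in this transcript, so the sketch below only illustrates the general idea of a linear QP model over log-bitrate. It assumes, hypothetically, that the predicted qmax corresponds to the lowest bitrate of the range and qmin to the highest, and that the QP falls linearly with the logarithm of the bitrate between those two anchors. The function name and clamping behavior are illustrative choices, not the authors' formulation.

```python
import math


def interpolate_qp(b, b_min, b_max, q_min, q_max):
    """Linear interpolation of QP over log-bitrate (assumed model).

    b_min/b_max: lowest and highest bitrate of the range (same unit as b).
    q_max is assumed to be reached at b_min, and q_min at b_max.
    """
    if not 0 < b_min < b_max:
        raise ValueError("expected 0 < b_min < b_max")
    if b <= 0:
        raise ValueError("bitrate must be positive")
    # Position of b inside the range on a logarithmic axis, clamped to [0, 1].
    t = (math.log(b) - math.log(b_min)) / (math.log(b_max) - math.log(b_min))
    t = min(1.0, max(0.0, t))
    return round(q_max - t * (q_max - q_min))
```

For example, with a range of 100 to 10000 kbps, qmin = 20 and qmax = 40, a target of 1000 kbps sits halfway on the log axis and maps to QP 30.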
  • 18. MHV’24 Experimental setup 30.10.2024 © Fraunhofer Slide 18 ● We used 1000 videos of the Inter-4K dataset [1] to validate the performance of the encoding methods. ● We encoded the sequences at UHD (2160p) 60fps using VVenC v1.10 [2] using preset 0 (faster). ● We extracted the spatiotemporal features using VCA v2.0. ● We ran constant quality encoding by varying qp values from qmin to qmax for each resolution. ● We computed full-reference PSNR and XPSNR quality metrics after the compressed video was upscaled to the original resolution (2160p). Table: Experimental parameters used to evaluate VEXUS. [1] A. Stergiou and R. Poppe, “AdaPool: Exponential Adaptive Pooling for Information-Retaining Downsampling,” in IEEE Transactions on Image Processing, vol. 32, 2023, pp. 251–266. [2] A. Więckowski, J. Brandenburg, T. Hinz, C. Bartnik, V. George, G. Hege, C. Helmrich, A. Henkel, C. Lehmann, C. Stoffers, I. Zupancic, B. Bross, and D. Marpe, “VVenC: An Open And Optimized VVC Encoder Implementation,” in Proc. IEEE International Conference on Multimedia Expo Workshops (ICMEW), pp. 1–2. Figure: Calculation of the groundtruth PSNR, XPSNR, and bitrate to train the prediction models. This example shows the ground truth calculation of a video encoded at 1080p with qp 30. Dataset
  • 19. MHV’24 Experimental setup 30.10.2024 © Fraunhofer Slide 19 Benchmarks 1. Default: This method employs fixed-resolution encoding, i.e., all bitstreams are encoded at the same resolution as the input video. 2. FixedLadder: This method employs a fixed set of bitrate-resolution pairs. We use the HLS bitrate ladder specified in the Apple authoring specifications [1] as the fixed set of bitrate-resolution pairs. 3. Bruteforce: This method determines the optimized resolution, i.e., the one that yields the highest XPSNR for a given target bitrate, after an exhaustive encoding process at all supported resolutions and QPs [2, 3]. Table: An example fixed bitrate ladder, i.e., set of bitrate-resolution pairs. Source: [1] [1] Apple Inc., “HLS Authoring Specification for Apple Devices.” [Online]. Available: https://developer.apple.com/documentation/http-live-streaming/hls-authoring-specification-for-apple-devices [2] J. De Cock et al., “Complexity-based consistent-quality encoding in the cloud,” in 2016 IEEE International Conference on Image Processing (ICIP), 2016, pp. 1484–1488. [3] C. Chen et al., “Optimized Transcoding for Large Scale Adaptive Streaming Using Playback Statistics,” in 2018 25th IEEE International Conference on Image Processing (ICIP), 2018, pp. 3269–3273.
  • 21. MHV’24 Prediction analysis 30.10.2024 © Fraunhofer Slide 21 ● Accuracy: ○ XPSNR prediction ■ MAE: 0.17 dB, R2: 0.99 ■ Std. dev: 0.22 dB ○ QP prediction ■ MAE: 1.32, R2: 0.97 ■ Std. dev: 1.86 ● Speed of feature extraction: 176 fps ● Model inference time: 4 ms Figure: Prediction results of the XPSNR prediction model. ● XPSNR prediction model ○ XGBoost regressor [1] is trained for each supported resolution. ○ Hyperparameters: max_depth=10, and n_estimators=400. ● QP prediction models ○ XGBoost regressor is trained for qmin and qmax for each supported resolution. [1] T. Chen and C. Guestrin, “XGBoost: A Scalable Tree Boosting System,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Aug. 2016, pp. 785–794.
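The accuracy figures on this slide (MAE, R2, standard deviation) can be reproduced for any set of predictions with a few lines of code. The sketch below shows the usual definitions in plain Python. It assumes the standard deviation refers to the prediction errors and uses the population form, which the slide does not specify; the function names are hypothetical.

```python
import statistics


def mae(y_true, y_pred):
    """Mean absolute error between ground truth and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined for constant ground truth")
    return 1.0 - ss_res / ss_tot


def error_std(y_true, y_pred):
    """Population standard deviation of the signed prediction errors (assumed definition)."""
    return statistics.pstdev(p - t for t, p in zip(y_true, y_pred))
```

As a quick check with made-up values, ground truth `[30, 32, 34, 36]` and predictions `[30.5, 31.5, 34.5, 35.5]` give an MAE of 0.5 and an R2 of 0.95.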
  • 22. MHV’24 Rate-distortion Analysis 30.10.2024 © Fraunhofer Slide 22 RD curve of VEXUS closely mirrors the Bruteforce method, indicating the effectiveness of its predictive modeling in approximating optimized resolutions and QPs. Figure: RD curves of representative video sequences using Default (green line), FixedLadder (blue line), Bruteforce (black line), and VEXUS (red line) encodings.
  • 23. MHV’24 Encoding and decoding times 30.10.2024 © Fraunhofer Slide 23 Figure: Encoding and decoding times of representative video sequences using Default (green line), FixedLadder (blue line), Bruteforce (black line), and VEXUS (red line) encodings. ● Encoding and decoding times are reduced for lower bitrates, as encoding and decoding operations become less complex due to lower resolutions.
  • 24. MHV’24 Result Summary 30.10.2024 © Fraunhofer Slide 24 ● Coding efficiency (in terms of Bjøntegaard Delta [1] rates), encoding and decoding times decrease as rmax decreases. ● The trade-off between quality and coding efficiency is based on the target audience, delivery platform, and available resources. Table: Average results of the encoding schemes compared to Default encoding. [1] HSTP-VID-WPOM, “Working practices using objective metrics for evaluation of video coding efficiency experiments,” International Telecommunication Union, 2020. [Online]. Available: http://handle.itu.int/11.1002/pub/8160e8da-en
  • 26. MHV’24 Conclusions 30.10.2024 © Fraunhofer Slide 26 ● XPSNR demonstrates a better correlation with subjective quality scores for VVC-coded UHD content. ● Leveraging this insight, we introduced an approach where XPSNR is predicted for VVC-coded bitstreams using spatiotemporal complexity features of the video and the target encoding configuration. ● We proposed VEXUS, where the convex hull is estimated online using the predicted XPSNR. ● On average, VEXUS yields a substantial improvement of 5.84 dB in PSNR and 0.62 dB in XPSNR for the same bitrates compared to conventional UHD encoding with the VVenC encoder, along with a 44.43% reduction in overall encoding time and a 65.46% reduction in overall decoding time using the VTM decoder. Open-source tools: 1. VVC encoder: Fraunhofer Versatile Video Encoder (VVenC) v1.10 Available: https://github.com/fraunhoferhhi/vvenc 2. VVC decoder: VTM reference decoder v22.0 Available: https://vcgit.hhi.fraunhofer.de/jvet/VVCSoftware_VTM 3. Spatiotemporal feature extractor: Video Complexity Analyzer (VCA) v2.0 Available: https://github.com/cd-athena/VCA 4. Convex-hull estimation framework: Available: https://github.com/PhoenixVideo/QADRA
  • 27. Thank you for your attention — ▪ Vignesh V Menon ([email protected])