Energy-efficient Adaptive Video Streaming with Latency-Aware Dynamic Resolution Encoding

Introduction to dynamic resolution encoding
[1] A. Aaron, Z. Li, M. Manohara, J. De Cock, and D. Ronca, “Per-title encode optimization,” The Netflix Techblog, 2015.
Motivation
● Traditional dynamic resolution encoding schemes rely on the
observation that, for a given bitrate range, one resolution
yields better quality than the others in a scene.
● The quality achieved at each resolution differs from
one codec to another.
● The more efficient the codec, the lower the bitrate it needs
to encode at a specific resolution with acceptable
quality.
Figure: Conceptual plot to depict the bitrate-quality
relationship for any video source encoded at different
resolutions. Source: [1]

[2] G. Sullivan, J. Ohm, W. Han, and T. Wiegand, “Overview of the High Efficiency Video Coding (HEVC) Standard,” in IEEE Transactions on Circuits and Systems for Video Technology, vol. 22,
no. 12, 2012, pp. 1649–1668.
[3] B. Bross, Y. Wang, Y. Ye, S. Liu, J. Chen, G. Sullivan, and J. Ohm, “Overview of the Versatile Video Coding (VVC) Standard and its Applications,” in IEEE Transactions on Circuits and
Systems for Video Technology, vol. 31, 2021, pp. 3736–3764, doi: 10.1109/TCSVT.2021.3101953.
[4] A. Stergiou and R. Poppe, “AdaPool: Exponential Adaptive Pooling for Information-Retaining Downsampling,” in IEEE Transactions on Image Processing, vol. 32, 2023, pp. 251–266.
In the context of VVC…
● Versatile Video Coding (VVC) is the successor of High Efficiency Video Coding (HEVC) [2], with better
compression efficiency [3].
● The best resolution for a given bitrate depends on the video complexity.
Figure: Rate-distortion (RD) curves of representative sequences (segments) of Inter-4K [4] dataset encoded at 540p, 1080p, and 2160p
resolutions using VVenC at faster preset. Here, XPSNR is used as the quality metric.

Encoding time
● Encoding time increases as the target encoding resolution increases owing to the increased
computational complexity.
Figure: Rate-encoding time curves of representative sequences (segments) of Inter-4K dataset encoded at 540p, 1080p, and 2160p resolutions
using VVenC at faster preset.

Encoding time constraint: green encoding?
Lower encoding latency aligns with eco-friendly streaming practices by optimizing resource
utilization and minimizing unnecessary energy expenditure.
● Reducing encoding energy consumption (in data centers) is critical in streaming applications
since it contributes to environmental sustainability [5].
● Some studies suggest a pseudo-linear relationship between encoding time and encoding
energy consumption [6].
[5] A. Stephens, C. Tremlett-Williams, L. Fitzpatrick, L. Acerini, M. Anderson, and N. Crabbendam, “The Carbon Impacts of Video Streaming,” Jun. 2021.
[6] V. V. Menon, S. Afzal, P. T. Rajendran, K. Schoeffmann, R. Prodan, and C. Timmerer, “Content-Adaptive Variable Framerate Encoding Scheme for Green Live
Streaming,” 2023. [Online]. Available: http://arxiv.org/abs/2311.08074

Related work
Table: Comparison of the state-of-the-art dynamic resolution per-title
encoding methods with LADRE.
[3] J. De Cock, Z. Li, M. Manohara, and A. Aaron, “Complexity-based consistent-quality encoding in the cloud,” in 2016 IEEE International Conference on Image Processing (ICIP), 2016, pp.
1484–1488.
[4] A. Katsenou, J. Sole, and D. R. Bull, “Content-gnostic Bitrate Ladder Prediction for Adaptive Video Streaming,” in 2019 Picture Coding Symposium (PCS), 2019, pp. 1–5.
[5] A. Zabrovskiy, P. Agrawal, C. Timmerer, and R. Prodan, “FAUST: Fast Per-Scene Encoding Using Entropy-Based Scene Detection and Machine Learning,” in 2021 30th Conference of
Open Innovations Association FRUCT, 2021, pp. 292–302.
[6] M. Bhat, J. Thiesse, and P. Le Callet, “Combining Video Quality Metrics To Select Perceptually Accurate Resolution In A Wide Quality Range: A Case Study,” in 2021 IEEE
International Conference on Image Processing (ICIP), 2021, pp. 2164–2168.
[7] V. V. Menon, P. T. Rajendran, C. Feldmann, K. Schoeffmann, M. Ghanbari and C. Timmerer, "JND-Aware Two-Pass Per-Title Encoding Scheme for Adaptive Live Streaming," in IEEE
Transactions on Circuits and Systems for Video Technology, vol. 34, no. 2, pp. 1281-1294, Feb. 2024, doi: 10.1109/TCSVT.2023.3290725.
• Existing work does not consider encoding latency
constraints when selecting the optimized
encoding resolution.
• Most state-of-the-art methods require pre-encodings,
which add significant latency and energy
consumption.
• For example, with the brute-force approach, the total
time is the sum of the times of all encodings plus the
time for the quality evaluation of those encodings.
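The brute-force cost described above can be illustrated with a minimal sketch. The function name and all timings are hypothetical and only show how the total grows with the number of candidate encodings; they are not measurements from the evaluation.

```python
def bruteforce_total_time(encode_times_s, eval_times_s):
    """Total time of a sequential brute-force search.

    Every candidate representation is encoded and then scored, so the
    total is the sum of all encoding times plus all evaluation times.
    """
    if len(encode_times_s) != len(eval_times_s):
        raise ValueError("one evaluation time is expected per encoding")
    return sum(encode_times_s) + sum(eval_times_s)


# Hypothetical timings (seconds) for three candidate resolutions at one bitrate.
encode_times_s = [30.0, 90.0, 400.0]
eval_times_s = [5.0, 5.0, 5.0]
total_s = bruteforce_total_time(encode_times_s, eval_times_s)
print(total_s)
```

A prediction-based scheme avoids these candidate encodings, which is where the latency and energy savings come from.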

Latency-Aware Dynamic Resolution Encoding (LADRE)
Proposed architecture
Figure: Encoding using LADRE envisioned for video streaming.
LADRE comprises four steps:
• Spatiotemporal complexity feature extraction,
• Optimized resolution prediction,
• Optimized rate factor prediction, and
• Constrained variable bitrate (cVBR) encoding using the selected bitrate-resolution-rate factor
combinations.

Spatiotemporal complexity feature extraction
We use seven DCT-energy-based features extracted using the Video Complexity Analyzer (VCA)
[8]:
● average texture energy (E_Y),
● average gradient of the luma texture energy (h),
● average luma brightness (L_Y),
● average chroma texture energy of the U and V channels (E_U and E_V), and
● average chroma brightness of the U and V channels (L_U and L_V).
[8] V. V. Menon, C. Feldmann, K. Schoeffmann, M. Ghanbari, and C. Timmerer, “Green Video Complexity Analysis for Efficient Encoding in Adaptive Video Streaming,” in First
International ACM Green Multimedia Systems Workshop (GMSys ’23), 2023.

Optimized resolution prediction
[9] V. V. Menon et al. Content-Adaptive Variable Framerate Encoding Scheme for Green Live Streaming. 2023. arXiv: 2311.08074 [cs.MM]. url: https://arxiv.org/pdf/2311.08074.
[10] V. V. Menon et al. “Energy-Efficient Multi-Codec Bitrate Ladder Estimation for Adaptive Video Streaming”. In: 2023 International Conference on Visual Communications and Image
Processing (VCIP). 2023.
Modeling
• The perceptual quality and encoding time of the representation (r_t, b_t) depend on the
extracted video complexity features, the encoding resolution r_t, and the target bitrate b_t [9, 10].
• A higher resolution and/or bitrate may improve the quality and increase the file size of the
encoded video segment.
• Similarly, a higher resolution and/or bitrate can increase the encoding duration.
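The dependency stated above can be written as a functional form. This is a sketch only: the predictor names f_V and f_T and the hat notation are assumptions, and the inputs are the seven VCA features together with the resolution and target bitrate.

```latex
\hat{v}_{(r_t, b_t)} = f_V\left(E_Y, h, L_Y, E_U, E_V, L_U, L_V, r_t, b_t\right), \qquad
\hat{\tau}_{(r_t, b_t)} = f_T\left(E_Y, h, L_Y, E_U, E_V, L_U, L_V, r_t, b_t\right)
```

Here \hat{v} denotes the predicted perceptual quality and \hat{\tau} the predicted encoding time of the representation (r_t, b_t).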

Optimized resolution prediction
[11] C. R. Helmrich et al., “Information on and analysis of the VVC encoders in the SDR UHD verification test,” in WG 05 MPEG Joint Video Coding Team(s) with ITU-T SG 16,
document JVET-T0103, Oct. 2020.
[12] M. Wien and V. Baroncini, “Report on VVC compression performance verification testing in the SDR UHD Random Access Category,” in WG 05 MPEG Joint Video Coding
Team(s) with ITU-T SG 16, document JVET-T0097, Oct. 2020.
[13] C. R. Helmrich et al., “A study of the extended perceptually weighted peak signal-to-noise ratio (XPSNR) for video compression with different resolutions and bit depths,” in ITU
Journal: ICT Discoveries, vol. 3, May 2020.
Optimization
• LADRE maximizes the perceptual quality (in terms of XPSNR) of the encoded video
segments while adhering to the processing (encoding time) constraints.
• XPSNR is used as the perceptual quality measure instead of the popular VMAF, because
the correlation of XPSNR with subjective quality scores is higher than that of VMAF for
VVC-coded UHD bitstreams [11, 12, 13].
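A minimal sketch of this constrained selection, assuming per-resolution predictors for XPSNR and encoding time are available. The function names, the toy lookup tables, and all numbers are hypothetical; in LADRE the predictions come from trained models rather than fixed tables.

```python
import math


def select_resolution(resolutions, bitrate_kbps, predict_xpsnr,
                      predict_time_s, max_time_s):
    """Return the resolution with the highest predicted XPSNR among those
    whose predicted encoding time fits the constraint, or None if no
    resolution is feasible."""
    best_resolution, best_xpsnr = None, -math.inf
    for resolution in resolutions:
        if predict_time_s(resolution, bitrate_kbps) > max_time_s:
            continue  # violates the encoding time constraint
        xpsnr = predict_xpsnr(resolution, bitrate_kbps)
        if xpsnr > best_xpsnr:
            best_resolution, best_xpsnr = resolution, xpsnr
    return best_resolution


# Hypothetical predictions for one segment at one target bitrate.
TOY_XPSNR_DB = {540: 36.0, 1080: 38.5, 2160: 40.0}
TOY_TIME_S = {540: 30.0, 1080: 90.0, 2160: 400.0}


def toy_xpsnr(resolution, bitrate_kbps):
    return TOY_XPSNR_DB[resolution]


def toy_time(resolution, bitrate_kbps):
    return TOY_TIME_S[resolution]


heights = [540, 1080, 2160]
constrained = select_resolution(heights, 6000, toy_xpsnr, toy_time, 200.0)
unconstrained = select_resolution(heights, 6000, toy_xpsnr, toy_time, math.inf)
infeasible = select_resolution(heights, 6000, toy_xpsnr, toy_time, 10.0)
print(constrained, unconstrained, infeasible)
```

With an unbounded time budget the selection reduces to picking the highest predicted XPSNR. A `None` result corresponds to a representation that cannot be encoded within the target time.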

Optimized rate factor prediction
[14] V. V. Menon, H. Amirpour, M. Ghanbari, and C. Timmerer, “ETPS: Efficient Two-Pass Encoding Scheme for Adaptive Live Streaming,” in Proc. IEEE Int. Conf. Image Process.
(ICIP), Bordeaux, Oct. 2022.
Modeling
• Predicting the rate factor helps ensure consistent video quality throughout the stream.
• It allows the encoder to allocate bits judiciously, preventing under-allocation (resulting
in poor quality) or over-allocation (wasting bandwidth) of bits for encoding [14].
Optimization
• The rate factor is optimized so that the encoding yields a bitrate as close to
the target bitrate as possible.
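The rate factor objective, a bitrate as close to the target as possible, can be sketched as a search over candidate rate factors. The candidate range and the toy bitrate model below are made up for illustration; LADRE predicts the rate factor with a trained model rather than searching this way.

```python
def select_rate_factor(candidates, target_kbps, predict_bitrate_kbps):
    """Pick the candidate rate factor whose predicted bitrate is closest
    to the target bitrate (minimum absolute difference)."""
    return min(candidates,
               key=lambda rf: abs(predict_bitrate_kbps(rf) - target_kbps))


def toy_bitrate_kbps(rate_factor):
    # Hypothetical model: 8000 kbps at rate factor 20, halving every 6 steps.
    return 8000.0 * 2.0 ** (-(rate_factor - 20) / 6)


best_rf = select_rate_factor(range(20, 41), 2000.0, toy_bitrate_kbps)
print(best_rf)
```

The selected rate factor is then used together with the target bitrate and resolution in the cVBR encoding step.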

Experimental design
Experimental parameters
Table: Experimental parameters used to evaluate LADRE.
• The Inter-4K dataset [15] is used to validate the performance.
• The sequences are encoded with VVenC v1.10 [16] at preset 0 (faster).
• The spatiotemporal features are extracted using VCA v2.0.
• LADRE uses random forest regression models [17], trained for each supported resolution,
to predict XPSNR, encoding time, and rate factor.
• A UHD viewing resolution is assumed.
[15] A. Stergiou and R. Poppe, “AdaPool: Exponential Adaptive Pooling for Information-Retaining Downsampling,” in IEEE Transactions on Image Processing, vol. 32, 2023, pp. 251–266.
[16] A. Wieckowski et al., “VVenC: An Open And Optimized VVC Encoder Implementation,” in Proc. IEEE International Conference on Multimedia Expo Workshops (ICMEW), pp. 1–2.
[17] L. Breiman, “Random Forests,” Machine Learning, vol. 45, 2001.

Experimental design
Benchmark schemes
Table: An example fixed bitrate ladder, i.e., set of
bitrate-resolution pairs. Source: [18].
1. Default: This method employs a fixed set of
bitrate-resolution pairs, namely the HLS bitrate ladder
specified in the Apple authoring specifications [18].
2. OPTE: This scheme predicts the optimized resolution,
i.e., the one that yields the highest XPSNR for a given
target bitrate.
[18] Apple Inc., “HLS Authoring Specification for Apple Devices.” [Online]. Available: https://developer.apple.com/documentation/http-live-streaming/hls-authoring-specification-for-apple-devices

Experimental results
Figure: RD curves and encoding times of representative video sequences (segments) using default encoding (blue line), OPTE (purple line), LADRE (red line).

Experimental results
Table: Average results of the encoding schemes compared to the Default encoding.
● Both coding efficiency (in terms of Bjøntegaard delta rates [19]) and encoding time decrease as the
encoding time constraint is tightened.
● OPTE yields the highest coding efficiency (it maximizes XPSNR with no encoding time constraint).
● LADRE reduces the storage required.
● Some high-bitrate representations are eliminated because they cannot be encoded within the target
maximum encoding time.
● The maximum encoding time for a video segment in a parallel encoding environment is shown in the
last column of the table. It is below the desired threshold.
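The last two points can be illustrated with a short sketch: representations whose encoding time exceeds the target are dropped, and when the remaining ones are encoded in parallel, the segment latency is the largest single encoding time rather than the sum. The ladder entries and timings are hypothetical, not results from the table.

```python
def filter_ladder(ladder, max_time_s):
    """Drop representations that exceed the target maximum encoding time and
    return the kept ones with the latency of encoding them in parallel."""
    kept = [rep for rep in ladder if rep["time_s"] <= max_time_s]
    # In parallel, the slowest kept encoding determines the latency.
    parallel_latency_s = max((rep["time_s"] for rep in kept), default=0.0)
    return kept, parallel_latency_s


# Hypothetical ladder with encoding times in seconds.
toy_ladder = [
    {"bitrate_kbps": 145, "resolution": 540, "time_s": 25.0},
    {"bitrate_kbps": 3000, "resolution": 1080, "time_s": 80.0},
    {"bitrate_kbps": 11600, "resolution": 2160, "time_s": 180.0},
    {"bitrate_kbps": 16800, "resolution": 2160, "time_s": 260.0},
]
kept, latency_s = filter_ladder(toy_ladder, max_time_s=200.0)
print(len(kept), latency_s)
```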
[19] HSTP-VID-WPOM, “Working practices using objective metrics for evaluation of video coding efficiency experiments,” International Telecommunication Union, 2020. [Online].
Available: http://handle.itu.int/11.1002/pub/8160e8da-en

Conclusions
● We proposed an encoding latency-aware dynamic resolution per-title encoding scheme
(LADRE) for adaptive streaming applications.
● LADRE includes an optimized resolution prediction, which uses random forest-based
models to estimate bitrate-resolution-rate factor triples for a given video segment based
on its spatial and temporal characteristics.
● On average, LADRE, with a target encoding time constraint of 200 s, yields bitrate
savings of 10.25% and 12.03% to maintain the same PSNR and XPSNR, respectively,
compared to the reference HLS bitrate ladder with a negligible additional latency in
streaming.
● Furthermore, an average decrease of 84.17% in encoding energy consumption is
observed.

Reproducibility
LADRE is implemented in a Python-based framework:
● Available: https://github.com/PhoenixVideo/QADRA
● Monotonicity of XPSNR and QP predictions: addressed in QADRA using a cascading
approach instead of a single regression model.
