Machine Learning Based Video Coding Enhancements for HTTP Adaptive Streaming

ACM MMSys’21 Doctoral Symposium
September 30, 2021
Ekrem Çetinkaya
Christian Doppler Laboratory ATHENA | Alpen-Adria-Universität Klagenfurt | Austria
ekrem.cetinkaya@aau.at | athena.itec.aau.at

Agenda
● Introduction
● Research Questions
● Methodology & Existing Results
● Ongoing & Future Work
● Q & A

Introduction

Video Streaming
Share of Internet traffic: 82%
Content Characteristics
Video streamed every second: 1 million minutes
As of 2021
* Cisco VNI Forecast Highlights (2021)

HTTP Adaptive Streaming (HAS)
[Figure: a client plays "Very Nice Video" from a HAS server that offers representations at 240, 480, 1200, 2500, 3500, and 7000 kbps]
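To make the role of the bitrate ladder concrete, here is a minimal sketch of how a client could pick a representation. It assumes a simple throughput-based rule (the highest bitrate that fits within a safety fraction of the measured throughput). The function name `select_bitrate` and the `safety` margin are illustrative assumptions and are not taken from the slides; real HAS players often also consider the buffer level.

```python
# Bitrate ladder from the slide, in kbps.
LADDER_KBPS = [240, 480, 1200, 2500, 3500, 7000]


def select_bitrate(throughput_kbps: float, safety: float = 0.8) -> int:
    """Return the highest ladder bitrate that fits within safety * throughput.

    Falls back to the lowest bitrate when nothing fits.
    `safety` is an assumed margin, not a value from the slides.
    """
    budget = throughput_kbps * safety
    candidates = [b for b in LADDER_KBPS if b <= budget]
    return max(candidates) if candidates else min(LADDER_KBPS)
```

For example, with a measured throughput of 4000 kbps the budget is 3200 kbps, so this rule selects the 2500 kbps representation.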

Video Encoding
[Figure: block diagram of a video codec with the blocks Block Partitioning, Motion Compensation, Transformation & Quantization, Entropy Coding, Entropy Decoding, Inverse Transformation & Inverse Quantization, Inter or Intra Prediction, Picture Buffer, and In-loop Filtering]

Video Codecs
● Advanced Video Coding (AVC), 2003
○ Block size 16x16
○ Quaternary tree
○ Supports up to 4K
● High Efficiency Video Coding (HEVC), 2013
○ Block size 64x64
○ Quaternary tree
○ Supports up to 8K
● Versatile Video Coding (VVC), 2020
○ Block size 128x128
○ Multi-type tree
○ Supports up to 16K, 360° videos
Percentages shown between successive codecs on the slide: 170 % and 954 %; 37 % and 35 %
C. Feldmann, “State of Compression Standards - VVC”, 2020, https://bitmovin.com/compression-standards-vvc-2020/
Vanne et al., “Comparative Rate-Distortion-Complexity Analysis of HEVC and AVC Video Codecs”, TCSVT, 2012

Video Encoding with Machine Learning
[Figure: video codec block diagram (Block Partitioning, Motion Compensation, Transformation & Quantization, Entropy Coding, Entropy Decoding, Inverse Transformation & Inverse Quantization, Inter or Intra Prediction, Picture Buffer, In-loop Filtering) annotated with ML opportunities: Block Partitioning Decision Prediction, Optical Flow Detection, Mode Prediction, Angular Direction Prediction, Deblocking with ML, Denoising with ML, and Super-resolution]

Research Questions

RQ-1 How to efficiently provide multi-rate representations over a wide range of resolutions for HAS?
Why?
🔋 High-resolution content is becoming more common, and the number of representations required for HAS is increasing.
Literature
🗂 Choose a reference representation and use its information to speed up the remaining encodings.
RQ-2 How to improve the performance of video codecs using machine learning?
Why?
🔋 Codecs are becoming more complex, there are many places to apply ML, and there is still much room for improvement.
Literature
🗂 ML-based approaches are used in video codecs to speed up encoder decisions.
🗂 Some attempts at end-to-end ML-based video codecs.
RQ-3 How to improve the visual quality of videos using machine learning?
Why?
🔋 ML-based image restoration methods are improving, but video is mostly ignored. QoE can be increased.
Literature
🗂 ML-based refinement techniques have been applied.
🗂 Post-processing of decoded frames to improve quality.
🗂 Super-resolution for images and videos.
RQ-4 How to use machine learning to improve perceptual quality assessment for videos?
Why?
🔋 Finding a reliable metric for perceptual quality is important because current objective metrics are problematic.
Literature
🗂 An ML model is used in VMAF.
🗂 Several more attempts at no-reference perceptual quality assessment.

Methodology & Existing Results

Design and Abstraction Methodology
● Design: propose a solution (algorithm, concept, protocol, etc.) for a given problem
● Implement: prototype software implementation using the proposed solution
● Analyze: qualitative and quantitative analysis of the solution
● Repeat the cycle to improve the solutions

Fast Multi-rate Encoding (DCC’20)
● State-of-the-art:
○ Encode the highest quality [1] or the lowest quality [2] as the reference first, then use its information
● Proposed Method [3]:
○ Encode the highest quality first
○ Use its information to encode the lowest quality
○ Use information from both representations to encode the remaining representations
○ Double bound for CTU search ranges
[Figure: representations QP1, QP2, QP3, ..., QPN-1, QPN]
[1] D. Schroeder et al., "Efficient multi-rate video encoding for HEVC-based adaptive HTTP streaming," IEEE Transactions on Circuits and Systems for Video Technology, vol. 28, no. 1, pp. 143-157, 2016.
[2] B. Guo, Y. Han, J. Wen, "Fast Block Structure Determination in AV1-based Multiple Resolutions Video Encoding," in 2018 IEEE International Conference on Multimedia and Expo (ICME), San Diego, CA, USA, July 2018.
[3] H. Amirpour, E. Çetinkaya, C. Timmerer and M. Ghanbari, "Fast Multi-rate Encoding for Adaptive HTTP Streaming," 2020 Data Compression Conference (DCC), Snowbird, UT, USA, 2020, pp. 358-358.
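The double bound on this slide can be illustrated with a short sketch. It assumes that the CU depths chosen for the co-located region in the highest-quality and lowest-quality encodings bracket the depths worth testing in the intermediate representations, and that depths run from 0 (64x64) to 3 (8x8) as in HEVC. The function names are hypothetical, and the actual search-range rule in the DCC’20 paper may differ in detail.

```python
MAX_DEPTH = 3  # HEVC CU depths: 0 (64x64) down to 3 (8x8)


def double_bound_depths(depth_highest_q: int, depth_lowest_q: int) -> range:
    """Depths an intermediate representation would test for one region.

    Assumption: the co-located depths of the two already-encoded
    representations bracket the useful depths for those in between.
    """
    lower = min(depth_highest_q, depth_lowest_q)
    upper = max(depth_highest_q, depth_lowest_q)
    return range(lower, upper + 1)


def skipped_depths(depth_highest_q: int, depth_lowest_q: int) -> list:
    """Depths the encoder would not need to evaluate under this bound."""
    tested = set(double_bound_depths(depth_highest_q, depth_lowest_q))
    return [d for d in range(MAX_DEPTH + 1) if d not in tested]
```

When both reference encodings agree on a depth, the range collapses to a single candidate, which is where the largest time saving would be expected under this assumption.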


Towards Optimal Multirate Encoding (MMM’21)
● State-of-the-art:
○ Encode the highest quality [1] or the lowest quality [2] as the reference first, then use its information
● Proposed Method [3]:
○ Try different quality levels as the reference representation to determine the best starting point for parallel encoding
○ Encode the middle quality first and use its information
○ Upper or lower bound depending on the quality level
[Figure: representations QP1, QP2, ..., QPN/2, ..., QPN]
[2] B. Guo, Y. Han, J. Wen, "Fast Block Structure Determination in AV1-based Multiple Resolutions Video Encoding," in 2018 IEEE International Conference on Multimedia and Expo (ICME), San Diego, CA, USA, July 2018.
[3] H. Amirpour, E. Çetinkaya, C. Timmerer and M. Ghanbari, "Towards Optimal Multirate Encoding for HTTP Adaptive Streaming," The International MultiMedia Modeling Conference (MMM), Prague, Czech Republic, 2021.
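The "upper or lower bound depending on the quality level" idea can be sketched as follows. The sketch assumes that a higher-quality encoding (lower QP) tends to split at least as deep as the reference, and a lower-quality encoding (higher QP) tends to split no deeper. The function name and the QP values in the usage note are hypothetical, and the paper's exact rule may differ.

```python
MAX_DEPTH = 3  # HEVC CU depths: 0 (64x64) down to 3 (8x8)


def single_bound_depths(ref_depth: int, ref_qp: int, target_qp: int) -> range:
    """Depths to test for a region, given the middle-quality reference.

    Assumption: lower QP (higher quality) splits at least as deep as the
    reference; higher QP (lower quality) splits no deeper.
    """
    if target_qp < ref_qp:
        # Higher quality than the reference: reference depth is a lower bound.
        return range(ref_depth, MAX_DEPTH + 1)
    if target_qp > ref_qp:
        # Lower quality than the reference: reference depth is an upper bound.
        return range(0, ref_depth + 1)
    # Same QP as the reference: reuse its decision.
    return range(ref_depth, ref_depth + 1)
```

With a reference at QP 30 and a reference depth of 2, a target at QP 22 would test depths 2 and 3, while a target at QP 38 would test depths 0 to 2.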


Fast Multi-rate Encoding with ML (VCIP’20)
● State-of-the-art:
○ Use a CNN to predict CTU depth decisions [1]
● Proposed Method [2]:
○ Train a CNN with encoding information obtained from the reference representation and use its decisions to encode the dependent representations
○ Focus on parallel encoding, so apply the CNN only in bottleneck situations
○ Train different CNNs for different QP targets
[Figure: HEVC encodings of QP1, QP2, ..., QPN-1, QPN with CNN blocks]
[1] K. Kim and W. W. Ro, "Fast CU depth decision for HEVC using neural networks," IEEE Transactions on Circuits and Systems for Video Technology, vol. 29, no. 5, pp. 1462-1473, 2018.
[2] E. Çetinkaya, H. Amirpour, C. Timmerer and M. Ghanbari, “FaME-ML: Fast Multirate Encoding for HTTP Adaptive Streaming Using Machine Learning,” 2020 IEEE International Conference on Visual Communications and Image Processing (VCIP), Macau, 2020, pp. 87-90.
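The "different CNNs for different QP targets" point amounts to selecting a model by target QP before predicting per-CTU depths. The sketch below shows only that dispatch step. The predictor is a stand-in function, not a trained CNN, and the feature vectors, QP values, and all names are hypothetical.

```python
from typing import Callable, Dict, List

# A depth predictor maps reference-encoding features of one CTU to a depth 0..3.
DepthPredictor = Callable[[List[float]], int]


def make_stub_predictor(bias: int) -> DepthPredictor:
    """Stand-in for a trained CNN; NOT a real model."""
    def predict(features: List[float]) -> int:
        # Hypothetical rule: rounded mean feature plus a per-QP bias,
        # clamped to the valid depth range 0..3.
        raw = round(sum(features) / len(features)) + bias
        return max(0, min(3, raw))
    return predict


# One predictor per target QP (QP values are hypothetical).
MODELS: Dict[int, DepthPredictor] = {
    26: make_stub_predictor(0),
    30: make_stub_predictor(-1),
}


def predict_ctu_depths(target_qp: int, ctu_features: List[List[float]]) -> List[int]:
    """Pick the model for the target QP and predict one depth per CTU."""
    model = MODELS[target_qp]
    return [model(features) for features in ctu_features]
```

In a real system each entry in `MODELS` would be a separately trained network, and the predicted depths would be handed to the encoder in place of its exhaustive depth search.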


Fast Multi-rate and Multi-resolution Encoding with ML (IEEE OJ-SP)
● State-of-the-art:
○ Use the highest quality representation as the reference [1]
● Proposed Method [2]:
○ Train a CNN with encoding information obtained from the reference representation (the highest quality from the lowest resolution) and use its decisions to encode the dependent representations
○ Improves parallel encoding as well as serial encoding
○ Train different CNNs for different QP and resolution targets
[Figure: HEVC encodings of QP1, QP2, ..., QPN at 540p, 1080p, and 2160p, with CNN blocks feeding the HEVC encoders of the dependent representations]
[2] E. Çetinkaya, H. Amirpour, C. Timmerer and M. Ghanbari, "Fast Multi-Resolution and Multi-Rate Encoding for HTTP Adaptive Streaming Using Machine Learning," in IEEE Open Journal of Signal Processing, vol. 2, pp. 484-495, 2021, doi: 10.1109/OJSP.2021.3078657.
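Reusing information from a lower-resolution reference requires locating the co-located area for each CTU of the higher-resolution frame. The sketch below assumes 64x64 CTUs and integer scale factors (2 from 540p to 1080p, 4 from 540p to 2160p). The function name and the returned tuple layout are hypothetical, and the paper may organize this mapping differently.

```python
CTU_SIZE = 64  # HEVC CTU size in luma samples


def colocated_reference_block(ctu_col: int, ctu_row: int, scale: int):
    """Map a target-resolution CTU to its co-located block in the reference.

    Returns (ref_ctu_col, ref_ctu_row, x_in_ctu, y_in_ctu, block_size):
    the reference CTU index, the block offset inside that CTU, and the
    size of the co-located block in reference samples.
    """
    block_size = CTU_SIZE // scale
    # Top-left corner of the target CTU, scaled down to reference coordinates.
    ref_x = ctu_col * CTU_SIZE // scale
    ref_y = ctu_row * CTU_SIZE // scale
    return (ref_x // CTU_SIZE, ref_y // CTU_SIZE,
            ref_x % CTU_SIZE, ref_y % CTU_SIZE, block_size)
```

For instance, the 1080p CTU at column 3, row 1 maps to a 32x32 block at offset (32, 32) inside reference CTU (1, 0) of the 540p frame, so four 1080p CTUs share one 540p CTU.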

Fast Multi-rate and Multi-resolution Encoding with ML (IEEE OJ-SP)
[Figure: normalized encoding time for HM 16.21, Lower Bound, and FaRes-ML]

Ongoing & Future Work

Work Plan
Timeline: 2019 Q4 to 2023, by quarter
Research questions:
1. How to efficiently provide multi-bitrate representations over a wide range of resolutions for HAS?
2. How to improve the performance of video codecs using machine learning?
3. How to improve the quality of videos using machine learning?
4. How to use machine learning to improve perceptual quality assessment for videos?
Work items:
● Literature review
● DCC’20 Paper: Fast Multi-rate Encoding for Adaptive HTTP Streaming
● MMM’21 Paper: Towards Optimal Multirate Encoding for HTTP Adaptive Streaming
● VCIP’20 Paper: FaME-ML: Fast Multirate Encoding for HTTP Adaptive Streaming Using Machine Learning
● Multi-rate and Multi-resolution Encoding
● IEEE OJSP Paper: Fast Multi-Resolution and Multi-Rate Encoding for HTTP Adaptive Streaming Using Machine Learning
● Super-resolution
● Literature review
● Perceptual Quality Assessment with ML
● Bitrate Ladder Prediction
● Literature review
● Improvement in In-loop Filtering with ML
● Mobile Player Optimization with SR
● Thesis

Thank you!
ekrem.cetinkaya@aau.at @ekremcetinkaya_ linkedin.com/in/ekrcet
