Efficient Content-Adaptive Feature-based Shot Detection for
HTTP Adaptive Streaming
Vignesh V Menon, Hadi Amirpour, Mohammad Ghanbari, Christian Timmerer
Christian Doppler Laboratory ATHENA, Institute of Information Technology (ITEC), University of Klagenfurt, Austria
19-22 September 2021
Vignesh V Menon Efficient Content-Adaptive Feature-based Shot Detection for HTTP Adaptive Streaming 1
Outline
1 Introduction
2 Shot detection
3 Proposed Algorithm
4 Evaluation
5 Conclusions and Future Directions
Introduction
Background of HTTP Adaptive Streaming (HAS) [1]
Source: https://bitmovin.com/adaptive-streaming/
Why Adaptive Streaming?
Adapt to a wide range of devices
Adapt to a broad range of Internet speeds
What does HAS do?
Each source video is split into segments
Each segment is encoded at multiple bitrates, resolutions, and codecs
Segments are delivered to the client based on device capability, network speed, etc.
[1] A. Bentaleb et al. “A Survey on Bitrate Adaptation Schemes for Streaming Media Over HTTP”. In: IEEE Communications Surveys & Tutorials 21.1 (2019), pp. 562–585. doi: 10.1109/COMST.2018.2862938.
Multi-shot encoding framework for VoD HAS applications [2]
Figure: block diagram of the framework, with the labels Input Video, Shot Detection, Shot Encodings, Video Quality Measure, Convex Hull Determination, Encoding Set Generation, Multi-shot Encoding, Encoded Shots, Bitrate Quality Pairs, Bitrate Resolution Pairs, and Target Encoding Set.
[2] Venkata Phani Kumar M, Christian Timmerer, and Hermann Hellwagner. “MiPSO: Multi-Period Per-Scene Optimization For HTTP Adaptive Streaming”. In: 2020 IEEE International Conference on Multimedia and Expo (ICME). 2020, pp. 1–6. doi: 10.1109/ICME46284.2020.9102775.
Shot detection
The boundaries between video shots are commonly known as shot transitions or shot-cuts.
The act of segmenting a video sequence into shots is called shot detection.
Objective:
Detect the first picture of each shot and encode it as an Instantaneous Decoder Refresh (IDR) frame.
Encode the subsequent frames of the new shot based on the first one via motion compensation and prediction. [3]
[3] J.-R. Ding and Jar-Ferr Yang. “Adaptive group-of-pictures and scene change detection methods based on existing H.264 advanced video coding information”. In: IET Image Processing 2 (May 2008), pp. 85–94. doi: 10.1049/iet-ipr:20070014.
Shot transitions occur in two forms:
hard shot-cuts
gradual shot transitions
Gradual transitions are much harder to detect because the change in visual information is difficult to quantify.
Note
1 The ratio of IDR frames to non-IDR frames is skewed, i.e., the two classes are unevenly distributed.
2 Missed shot-cut detections and wrong IDR placements lower compression efficiency, i.e., the cost of an error is large.
Proposed Algorithm
Phase 1: Feature Extraction
Compute texture energy per Coding Tree Unit (CTU)
A DCT-based energy function is used to determine the block-wise feature of each frame, defined as:
$$H_k = \sum_{i=1}^{w} \sum_{j=1}^{h} e^{\left|\left(\frac{ij}{wh}\right)^2 - 1\right|} \, \left|DCT(i-1, j-1)\right| \qquad (1)$$
where w and h are the width and height of the block, and DCT(i, j) is the (i, j)th DCT component when i + j > 2, and 0 otherwise.
The energy values of the CTUs in a frame are averaged to determine the energy per frame. [4]
[4] Michael King, Zinovi Tauber, and Ze-Nian Li. “A New Energy Function for Segmentation and Compression”. In: July 2007, pp. 1647–1650. doi: 10.1109/ICME.2007.4284983.
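As a rough illustration of Eq. (1), the sketch below computes the weighted DCT energy of one block and averages block energies into a frame energy. It is not the authors' implementation: the function names are hypothetical, an orthonormal DCT-II is assumed, and the "i + j > 2" condition is read as excluding the DC coefficient from the sum. Pure Python is used only to keep the example self-contained.

```python
import math


def dct_1d(values):
    """Orthonormal DCT-II of a 1-D sequence (assumed normalization)."""
    n = len(values)
    out = []
    for k in range(n):
        s = sum(v * math.cos(math.pi * (2 * x + 1) * k / (2 * n))
                for x, v in enumerate(values))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def dct_2d(block):
    """Separable 2-D DCT of a block given as a list of rows."""
    rows = [dct_1d(row) for row in block]          # transform along the width
    cols = [dct_1d(col) for col in zip(*rows)]     # transform along the height
    return [list(r) for r in zip(*cols)]           # back to [row][column] layout


def block_energy(block):
    """Weighted sum of absolute DCT coefficients, following Eq. (1)."""
    h, w = len(block), len(block[0])
    coeffs = dct_2d(block)
    energy = 0.0
    for i in range(1, w + 1):
        for j in range(1, h + 1):
            if i + j <= 2:
                continue  # assumption: the DC term contributes 0
            weight = math.exp(abs((i * j / (w * h)) ** 2 - 1.0))
            energy += weight * abs(coeffs[j - 1][i - 1])
    return energy


def frame_energy(ctu_energies):
    """Average of the per-CTU energies of one frame."""
    return sum(ctu_energies) / len(ctu_energies)
```

Under these assumptions a flat block has (numerically) zero energy because only its DC coefficient is non-zero, while a textured block such as a checkerboard yields a positive value.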
Figure: Hk of Tears of Steel sequence. Black circles denote the regions of shot transitions.
h_k: Mean Squared Error (MSE) of the CTU-level energy values of frame k with respect to those of the previous frame k − 1, normalized to H_k.
$$h_k = \frac{\sum_{i=1}^{M} \left(H_k(i) - H_{k-1}(i)\right)^2}{M H_k} \qquad (2)$$
where H_k(i) is the energy of the ith CTU of frame k and M denotes the number of CTUs in frame k.
ϵ_k: gradient of h per frame, given by:
$$\epsilon_k = \frac{h_{k-1} - h_k}{h_{k-1}} \qquad (3)$$
Note
If h_k = 0, the kth frame is a duplicate of the (k − 1)th frame.
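A minimal sketch of Eqs. (2) and (3) follows, assuming the per-CTU energies of two consecutive frames are already available as equal-length lists. The function names are hypothetical, and the handling of zero denominators is left out because the slides do not specify it.

```python
def temporal_feature(curr_ctu_energies, prev_ctu_energies):
    """h_k from Eq. (2): MSE of CTU energies between frames k and k-1,
    divided by the average CTU energy H_k of frame k."""
    if len(curr_ctu_energies) != len(prev_ctu_energies):
        raise ValueError("both frames must have the same number of CTUs")
    m = len(curr_ctu_energies)
    frame_avg = sum(curr_ctu_energies) / m                      # H_k
    mse = sum((c - p) ** 2
              for c, p in zip(curr_ctu_energies, prev_ctu_energies)) / m
    return mse / frame_avg


def gradient(h_prev, h_curr):
    """Gradient of h from Eq. (3): (h_{k-1} - h_k) / h_{k-1}."""
    return (h_prev - h_curr) / h_prev
```

For two identical frames the feature is 0, which matches the note about duplicate frames.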
Phase 2: Successive Elimination Algorithm
Step 1: while parsing all video frames do
    if ϵ_k > T1 then
        frame k ← IDR-frame, a new shot.
    else if ϵ_k ≤ T2 then
        frame k ← P-frame or B-frame, not a new shot.
T1, T2: maximum and minimum thresholds for the gradient ϵ_k
Note
The frames are classified into three categories in this step:
1 a new shot
2 not a new shot
3 not decided
In the next steps of the algorithm, only frames of category (3) are considered.
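The three-way classification of Step 1 can be sketched as below, reading the first test as a strict "greater than T1" comparison. The names `Decision` and `classify_frame` are hypothetical; the default thresholds are the values T1 = 50 and T2 = 10 reported in the evaluation.

```python
from enum import Enum


class Decision(Enum):
    NEW_SHOT = "IDR-frame"          # category 1
    NOT_NEW_SHOT = "P- or B-frame"  # category 2
    UNDECIDED = "not decided"       # category 3, passed on to Steps 2 and 3


def classify_frame(gradient, t1=50.0, t2=10.0):
    """Step 1: compare the per-frame gradient against both thresholds."""
    if gradient > t1:
        return Decision.NEW_SHOT
    if gradient <= t2:
        return Decision.NOT_NEW_SHOT
    return Decision.UNDECIDED
```

A frame whose gradient lies between the thresholds, such as 21.68, stays undecided and becomes a candidate for the later steps.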
f: video frame rate (fps)
Q: set of frames whose gradient ϵ_k satisfies T1 ≥ ϵ_k > T2
q0: current frame number in the set Q
q−1: previous frame number in the set Q
q1: next frame number in the set Q
Step 2: while parsing Q do
    if q0 − q−1 > f and q1 − q0 > f then
        q0 ← IDR-frame, a new shot.
        Eliminate q0 from Q.
Step 3: while parsing Q do
    if q0 − q−1 > f and q1 − q0 ≤ f then
        compare q0 with each frame q from the subset of Q for which q1 − q0 ≤ f
        The frame with the highest ϵ value ← IDR-frame, a new shot.
Working Example
Table: Step 1.
Frame   H_k     ϵ_k
33      52162   21.68
54      52119   13.51
65      52625   19.21
86      52038   10.12
97      52499   17.34
161     47790   11.53
833     48644   11.49
1409    40367   14.51
1665    35321   19.93
1686    40463   10.72
1889    38475   12.16
2205    37218   10.08
2536    35793   10.49

Table: Step 2.
Frame   H_k     ϵ_k     q0 − q−1   q1 − q0
33      52162   21.68   33         21
54      52119   13.51   21         11
65      52625   19.21   11         21
86      52038   10.12   21         11
97      52499   17.34   11         64
161     47790   11.53   64         672
833     48644   11.49   672        576
1409    40367   14.51   576        256
1665    35321   19.93   256        21
1686    40463   10.72   21         203
1889    38475   12.16   203        316
2205    37218   10.08   316        331
2536    35793   10.49   331        -

Table: Step 3.
Frame   H_k     ϵ_k     q0 − q−1   q1 − q0
33      52162   21.68   33         21
54      52119   13.51   21         11
65      52625   19.21   11         21
86      52038   10.12   21         11
97      52499   17.34   11         64
1665    35321   19.93   256        21
1686    40463   10.72   21         203
2536    35793   10.49   331        -
This example uses the FunOnTheRiver (24 fps) test sequence. The frames detected in each step, to be encoded as IDR-frames, are:
Step 1: none
Step 2: 161, 833, 1409, 1889, 2205
Step 3: 33, 1665, 2536
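The working example can be reproduced with the sketch below, which is one possible reading of Steps 2 and 3 rather than the authors' code. The function name is hypothetical, and several points are assumptions: both comparisons in Step 2 are taken as "greater than f", the gap before the first candidate is measured from frame 0 (as the tables do for frame 33), a candidate without a successor is left for Step 3, and Step 3 is implemented by grouping the remaining candidates into runs whose successive gaps are at most f and keeping the one with the highest gradient in each run.

```python
def successive_elimination(candidates, fps):
    """candidates: list of (frame_number, gradient), sorted by frame number.
    Returns the frames marked as IDR in Step 2 and in Step 3."""
    frames = [frame for frame, _ in candidates]
    n = len(frames)

    # Step 2: isolated candidates, more than fps frames from both neighbours.
    step2 = []
    for idx, frame in enumerate(frames):
        prev_gap = frame - (frames[idx - 1] if idx > 0 else 0)
        next_gap = frames[idx + 1] - frame if idx + 1 < n else None
        if prev_gap > fps and next_gap is not None and next_gap > fps:
            step2.append(frame)

    # Step 3: among the remaining candidates, keep the strongest of each run.
    resolved = set(step2)
    remaining = [c for c in candidates if c[0] not in resolved]
    step3, run = [], []
    for frame, grad in remaining:
        if run and frame - run[-1][0] > fps:
            step3.append(max(run, key=lambda c: c[1])[0])
            run = []
        run.append((frame, grad))
    if run:
        step3.append(max(run, key=lambda c: c[1])[0])
    return step2, step3


# Candidate frames and gradients taken from the Step 1 table above.
fun_on_the_river = [
    (33, 21.68), (54, 13.51), (65, 19.21), (86, 10.12), (97, 17.34),
    (161, 11.53), (833, 11.49), (1409, 14.51), (1665, 19.93),
    (1686, 10.72), (1889, 12.16), (2205, 10.08), (2536, 10.49),
]
step2_frames, step3_frames = successive_elimination(fun_on_the_river, fps=24)
```

With these assumptions, `step2_frames` is `[161, 833, 1409, 1889, 2205]` and `step3_frames` is `[33, 1665, 2536]`, matching the frames listed for Steps 2 and 3.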
Evaluation
Test Methodology
Test videos: JVET test sequences [5] and professionally produced UHD HDR cinematic content [6] with typical multi-scene content
System: Dual-processor server with Intel Xeon Gold 5218R (80 cores, 2.10 GHz)
Benchmark algorithm: default shot detection algorithm in x265
T1 = 50 and T2 = 10 for the proposed algorithm; determined experimentally
Metrics: accuracy, precision, recall [7], and F-measure [8]
[5] Jill Boyce et al. JVET-J1010: JVET common test conditions and software reference configurations. July 2018.
[6] M. H. Pinson. “The Consumer Digital Video Library [Best of the Web]”. In: IEEE Signal Processing Magazine 30.4 (2013), pp. 172–174. doi: 10.1109/MSP.2013.2258265.
[7] Markus Junker, Rainer Hoch, and Andreas Dengel. “On the Evaluation of Document Analysis Components by Recall, Precision, and Accuracy”. In: (Apr. 2000). doi: 10.1109/ICDAR.1999.791887.
[8] Sasaki Yutaka. “The truth of the F-measure”. In: https://www.toyota-ti.ac.jp/Lab/Denshi/COIN/people/yutaka.sasaki/F-measure-YS-26Oct07.pdf. 2007.
Vignesh V Menon Efficient Content-Adaptive Feature-based Shot Detection for HTTP Adaptive Streaming 17
Evaluation
Evaluation
Experimental Results
Table: Shot detection results
Video           Actual     Benchmark algorithm                      Proposed algorithm
                shot-cuts  Accuracy  Precision  Recall   F-measure  Accuracy  Precision  Recall   F-measure
BigBuckBunny    10         99.88%    100.00%    80.00%   88.89%     100.00%   100.00%    100.00%  100.00%
Dinner          4          99.89%    100.00%    75.00%   85.71%     99.89%    100.00%    75.00%   85.71%
FoodMarket4     2          99.72%    -          0%       -          99.86%    100.00%    50.00%   66.67%
sintel trailer  14         99.86%    100.00%    85.71%   92.31%     99.93%    100.00%    92.86%   96.30%
snow mnt        3          99.47%    -          0%       -          99.65%    100.00%    33.33%   50.00%
Tears of Steel  13         99.93%    100.00%    92.31%   96.00%     100.00%   100.00%    100.00%  100.00%
Busy City       11         99.64%    50.00%     18.18%   26.67%     99.87%    100.00%    63.64%   77.78%
FunOnTheRiver   12         99.60%    0%         0%       -          99.80%    85.71%     50.00%   63.16%
Remarks
1 Actual shot-cuts: the ground truth, i.e., the number of real shot transitions in the considered test videos, determined manually.
2 The recall of the proposed algorithm is 25% higher than that of the benchmark algorithm.
3 The F-measure of the proposed algorithm is 20% higher than that of the benchmark algorithm.
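For reference, the four metrics in the table follow the standard definitions based on true/false positives and negatives. The sketch below is a generic illustration with a hypothetical function name; undefined ratios are returned as None, which corresponds to the "-" entries in the table.

```python
def detection_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F-measure from per-frame counts.
    tp: shot-cuts detected, fp: false detections,
    fn: missed shot-cuts, tn: frames correctly left unmarked."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if (tp + fp) else None
    recall = tp / (tp + fn) if (tp + fn) else None
    if precision is None or recall is None or precision + recall == 0:
        f_measure = None  # undefined, shown as "-" in the table
    else:
        f_measure = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_measure": f_measure}


# BigBuckBunny, benchmark: 8 of 10 shot-cuts found and no false detections
# (inferred from 80% recall and 100% precision); tn=1000 is a made-up count.
bbb_benchmark = detection_metrics(tp=8, fp=0, fn=2, tn=1000)
```

Here `bbb_benchmark["f_measure"]` evaluates to about 0.8889, in line with the 88.89% reported for that row; the accuracy value depends on the made-up `tn` and is not meant to match the table.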
Table: Detection rate statistics of the algorithms
Algorithm TPR FPR
Benchmark 53.62% 0.03%
Proposed 78.26% 0.01%
Runtime per frame: 0.1% of the total time taken for encoding each frame.
The algorithm needs to be run only once for a video. The decisions made can be used for
all remaining representations in HAS applications.
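The TPR values appear consistent with pooling the detections of all eight test videos. This is an inference from the per-video results table, not something the slides state: the sketch below recovers the number of detected shot-cuts per video as recall × actual shot-cuts and divides the total by the total number of actual shot-cuts (69). The function name is hypothetical.

```python
# Per-video values copied from the shot detection results table.
actual_cuts = [10, 4, 2, 14, 3, 13, 11, 12]
benchmark_recall = [80.00, 75.00, 0.0, 85.71, 0.0, 92.31, 18.18, 0.0]
proposed_recall = [100.00, 75.00, 50.00, 92.86, 33.33, 100.00, 63.64, 50.00]


def pooled_tpr(actual, recall_percent):
    """True positive rate (in %) over all videos pooled together."""
    detected = [round(a * r / 100) for a, r in zip(actual, recall_percent)]
    return 100 * sum(detected) / sum(actual)


benchmark_tpr = pooled_tpr(actual_cuts, benchmark_recall)  # 37 of 69 shot-cuts
proposed_tpr = pooled_tpr(actual_cuts, proposed_recall)    # 54 of 69 shot-cuts
```

Rounded to two decimals, these give 53.62% and 78.26%, the TPR values in the table.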
Conclusions and Future Directions
Conclusions
Proposed a shot detection algorithm as a feature-based pre-processing step for x265-based
HEVC encoding in VoD HAS applications.
Identified a DCT-based energy function as a feature to determine shot cuts.
Proposed a successive elimination algorithm to remove the false detections during gradual
shot transitions.
The proposed algorithm gives better-balanced shot detections compared to the benchmark
algorithm.
Future Directions
This work can be extended to compute the complexity of each shot relative to that of the entire video sequence using the feature metric, and to predict the ideal bitrate per resolution for each shot.
Further encoding parameter decisions, such as optimal block partitioning and quantization offsets, can also be predicted.
The approach can be extended to support more recent codecs, e.g., VVC.
Q & A
Thank you for your attention!
Vignesh V Menon (vignesh.menon@aau.at)
Hadi Amirpour (hadi.amirpourazarian@aau.at)
Mohammad Ghanbari (ghan@essex.ac.uk)
Christian Timmerer (Christian.Timmerer@aau.at)
IEEE ICIP'22:Efficient Content-Adaptive Feature-based Shot Detection for HTTP Adaptive Streaming
