A COMPREHENSIVE SURVEY ON
VIDEO FRAME INTERPOLATION
TECHNIQUES - 2020
BY ANIL SINGH PARIHAR · DISHA VARSHNEY · KSHITIJA
PANDYA · ASHRAY AGGARWAL
PRESENTED BY :
SHAHABUDDIN AHMED
ROLL NO - 220EE1473
UNDER THE SUPERVISION OF :
DR. DIPTI PATRA
CONTENTS
Overview of The Survey
Benchmark Datasets
Video Frame Interpolation Methods
Key Aspects:
• High Visual Quality
• Low Complexity
• High Efficiency of Interpolated Output Image
VFI Applications
Video post-processing
Surveillance
Video Restoration Tasks
OVERVIEW OF THIS SURVEY
• A comprehensive evaluation of the advanced deep learning-based methods
developed in the past decade, giving readers a detailed overview of the
comparative performance and research results of recent state-of-the-art
methods.
• An insightful analysis of system requirements, algorithm complexity, quality of
results, and characterization of methods based on different schemes of image
processing and frame rate conversion, emphasizing the pros and cons of these
methods as outlined in the reviewed works.
• A discussion of potential challenges in the frame rate conversion domain,
identifying shortcomings of existing techniques and outlining potential
directions for future work.
VIDEO FRAME INTERPOLATION METHODS
BENCHMARK DATASETS
Most Frequently used Datasets : -
UCF101
MIDDLEBURY
VIMEO – 90K
Other Relevant Datasets : -
ADOBE240
KITTI
DAVIS
UCF101
• UCF101 is an action recognition dataset collected from user-uploaded
YouTube videos.
• The dataset contains 13,320 videos divided into 101 action categories.
• The videos of each action category are organized into 25 groups, each
containing 4-7 videos of the action.
• Videos in the same group generally share common features such as
object appearance and background.
MIDDLEBURY
• There are two subsets in the Middlebury dataset: the Other set, which
provides the ground-truth middle frames, and the Evaluation set, which
hides the ground truth and is evaluated by uploading the results to
the benchmark website.
• There are four types of data to test different aspects of optical flow
algorithms:
  (1) sequences with non-rigid motion, where the ground-truth flow is
  determined by tracking hidden fluorescent texture;
  (2) realistic synthetic sequences;
  (3) high-frame-rate video used to study interpolation error;
  (4) modified stereo sequences of static scenes.
VIMEO – 90K
• It contains more than 89,000 videos of 720p or higher resolution,
downloaded from the Vimeo video-sharing platform.
• The motion of objects in the Vimeo-90K dataset is much larger than that
in UCF101.
• It includes 51,313 triplets for training. Each triplet is made up of 3
consecutive video frames with a resolution of 448×256 pixels.
• Authors have trained their networks to predict the middle frame (i.e., t =
0.5) of each triplet. The test set contains 3,782 triplets, and because of its
high-quality videos it is widely used for comparing the performance of
different algorithms.
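As a minimal sketch of how a predicted middle frame of a triplet could be scored against its ground truth, the snippet below computes PSNR for a naive averaging baseline. The slides do not name an evaluation metric, so PSNR is an assumption here, and `blend_midframe` and `psnr` are hypothetical helper names. The frames are tiny synthetic arrays, not Vimeo-90K data.

```python
import numpy as np

def blend_midframe(frame0, frame2):
    """Naive baseline for t = 0.5: average the two outer frames of a triplet."""
    return (frame0.astype(np.float64) + frame2.astype(np.float64)) / 2.0

def psnr(pred, target, max_val=255.0):
    """Peak signal-to-noise ratio in dB between a prediction and the ground truth."""
    diff = pred.astype(np.float64) - target.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

if __name__ == "__main__":
    # Synthetic stand-in for one triplet (frame0, middle, frame2), 8-bit grayscale.
    frame0 = np.full((4, 6), 10, dtype=np.uint8)
    middle = np.full((4, 6), 18, dtype=np.uint8)  # hypothetical ground truth
    frame2 = np.full((4, 6), 30, dtype=np.uint8)
    prediction = blend_midframe(frame0, frame2)
    print("PSNR of averaging baseline (dB):", psnr(prediction, middle))
```

A learned interpolation method would replace `blend_midframe`; the scoring step stays the same.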
OTHER DATASETS
Xiph
DAVIS
KITTI
CNN and kernel-based Methods
| Paper / Article | Authors and Year | Research |
|---|---|---|
| FlowNet: Learning Optical Flow with Convolutional Networks | Dosovitskiy et al. (2015) | The network is entirely convolutional and can be trained using any triplet of an image sequence, which can be of different resolutions. They used the Charbonnier loss function and trained it on the KITTI dataset. |
| Learning Image Matching by Simply Watching Video | Long et al. (2016) | They developed a deep CNN that can be trained without any supervision. They exploited the temporal coherency that occurs naturally in real-world videos and used it to calculate sensitivity maps, i.e., gradients with respect to the input obtained via backpropagation. |
| Visual Dynamics: Probabilistic Future Frame Synthesis via Cross Convolutional Networks | Xue et al. (2016) | Their network predicts multiple extrapolated frames from a single frame. It consists of five components: (1) a variational auto-encoder that captures motion information; (2) a kernel decoder that learns motion kernels from the output of the motion encoder; (3) an image encoder that estimates feature maps from an image; (4) their novel cross-convolutional layer, which convolves the feature maps with the motion kernels; and (5) a regressor. |
| Video Frame Interpolation via Adaptive Convolution | Niklaus et al. (2017) | They proposed a fully convolutional deep neural network that predicts a spatially adaptive convolution kernel for each pixel from the two given successive input frames. A separate kernel is calculated for each pixel of the interpolated frame to estimate the value of that pixel. The predicted pixel-wise kernels are then convolved with the input frames to generate the interpolated frame. |
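The adaptive convolution idea attributed to Niklaus et al. above (one kernel per output pixel, applied to patches of both input frames) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the kernels would normally be predicted by a network, whereas here they are hand-built by the hypothetical helper `averaging_kernels`. The frames are grayscale, and the kernel layout `(H, W, 2, k, k)` is chosen only for readability.

```python
import numpy as np

def adaptive_conv_interpolate(frame0, frame1, kernels):
    """
    Apply a separate kernel at every output pixel.

    frame0, frame1: (H, W) float arrays (grayscale for simplicity).
    kernels: (H, W, 2, k, k) array with odd k; kernels[y, x, f] weights the
             k x k patch of frame f centered at pixel (y, x).
    """
    h, w = frame0.shape
    k = kernels.shape[-1]
    r = k // 2
    # Replicate borders so every pixel has a full k x k patch.
    padded = [np.pad(f, r, mode="edge") for f in (frame0, frame1)]
    out = np.zeros((h, w), dtype=np.float64)
    for y in range(h):
        for x in range(w):
            for f in range(2):
                patch = padded[f][y:y + k, x:x + k]
                out[y, x] += np.sum(kernels[y, x, f] * patch)
    return out

def averaging_kernels(h, w, k):
    """Hypothetical stand-in for network output: weight 0.5 on each patch center."""
    kernels = np.zeros((h, w, 2, k, k), dtype=np.float64)
    kernels[:, :, :, k // 2, k // 2] = 0.5
    return kernels

if __name__ == "__main__":
    f0 = np.arange(12, dtype=np.float64).reshape(3, 4)
    f1 = f0 + 2.0
    print(adaptive_conv_interpolate(f0, f1, averaging_kernels(3, 4, 3)))
```

With these center-only kernels the result reduces to a plain average of the two frames; off-center weights are what let a predicted kernel account for motion.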
| Paper / Article | Authors and Year | Research |
|---|---|---|
| Video Frame Interpolation via Adaptive Separable Convolution | Niklaus et al. (2017) | They improved on their previous approach by using separable convolutions. Instead of estimating the whole 2D kernel for each pixel, they estimated spatially adaptive pairs of 1D convolution kernels for each pixel, thus reducing the number of parameters to be estimated. It optimizes the algorithm within the allowable range. |
| Video Frame Synthesis using Deep Voxel Flow | Liu et al. (2017) | It calculates a dense voxel flow and uses it to generate an interpolated frame. The voxel flow captures the motion changes in the temporal domain, and intermediate frames can be generated using trilinear interpolation. They proposed an end-to-end, fully differentiable network that adopts a fully convolutional encoder–decoder architecture with a bottleneck layer that calculates the voxel flow. |
| Deep Video Frame Interpolation using Cyclic Frame Generation | Liu et al. (2019) | It introduces a novel loss called cycle consistency loss, which can be integrated with any frame interpolation method. They postulated that, given three consecutive frames I1, I2, and I3, the frames interpolated from the pairs (I1, I2) and (I2, I3) can in turn be used to generate another frame, which should match frame I2 in a cyclic manner. |
| Tracking and Frame Rate Enhancement for Real-Time 2D Human Pose Estimation | Vidanpathirana et al. (2019) | It attempted to reduce optical flow errors by designing a pose tracking system that operates on a system of queues in a multi-threaded environment and provides a fast point tracking solution to boost the frame rate of the pose estimation system. |
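The parameter saving from separable kernels mentioned above can be checked numerically. In this sketch a pair of 1D kernels is expanded into a 2D kernel by an outer product, and applying the pair directly gives the same result as applying the full kernel. The function names and the kernel size used in the demo are illustrative assumptions, not values taken from the slides.

```python
import numpy as np

def separable_kernel(vertical, horizontal):
    """Expand a pair of 1D kernels into the equivalent 2D kernel (outer product)."""
    return np.outer(vertical, horizontal)

def apply_separable(patch, vertical, horizontal):
    """Weight an n x n patch with the 1D pair without forming the 2D kernel."""
    return vertical @ patch @ horizontal

def parameter_counts(n):
    """Values to estimate per pixel and per input frame: full 2D kernel vs. 1D pair."""
    return n * n, 2 * n

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 5  # illustrative kernel size
    v, hz = rng.random(n), rng.random(n)
    patch = rng.random((n, n))
    full = np.sum(separable_kernel(v, hz) * patch)
    print("same result:", np.isclose(full, apply_separable(patch, v, hz)))
    print("parameters (2D, separable):", parameter_counts(n))
```

The saving grows with kernel size, since the full kernel needs n² values while the pair needs only 2n.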
| Paper / Article | Authors and Year | Research |
|---|---|---|
| AdaCoF: Adaptive Collaboration of Flows for Video Frame Interpolation | Lee et al. (2020) | They designed a new warping module called adaptive collaboration of flows (AdaCoF), based on an operation that can use any number of pixels at any location. |
| Video Frame Interpolation via Deformable Separable Convolution | Cheng et al. (2020) | They proposed to estimate kernels adaptively from more relevant pixels, calling the operation deformable separable convolution (DSepConv). This allows a smaller kernel size with relevant features for handling large motion. DSepConv uses an encoder–decoder network for feature extraction. |
| Multiple Video Frame Interpolation via Enhanced Deformable Separable Convolution | Cheng et al. (2020) | They were able to reduce the number of parameters to be trained while maintaining the same results. They were also able to generate multiple interpolated frames between two consecutive frames, making it the first kernel-based approach to do so. However, the results for arbitrary-time interpolation were not as good as those of state-of-the-art flow-based approaches. |
Flow Based Methods
• High-quality video frame interpolation often depends on precise motion
estimation. Mathematical or deep learning models are trained to establish a
strong correlation between consecutive frames so that the continuity of flow is
preserved, based on the actual optical displacement of flow vectors and the
trajectory of visual components, with the help of occlusion reasoning and color
consistency methods. [1]
• So far, flow-based methods have achieved results comparable to the latest
GAN [2], CNN, and hybrid approaches, and they continue to evolve to overcome
the heavy computational requirements of deep learning methods on real-time
benchmarks.
[1] Yazdi et al., Real-time object tracking based on an adaptive transition model and extended Kalman filter (2019)
[2] Caballero et al., Frame interpolation with multiscale deep loss functions and generative adversarial networks (2017)
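To make the flow-based idea concrete, here is a minimal sketch of interpolating an intermediate frame from a given optical flow field. It is not any specific method from the survey. It assumes linear motion and a locally smooth flow, so the flow seen from the intermediate frame is approximated by scaling the frame 0 to frame 1 flow. It uses nearest-neighbour sampling and includes no occlusion reasoning. The function names are hypothetical.

```python
import numpy as np

def backward_warp(frame, flow):
    """
    Sample `frame` at (x + u, y + v) for every output pixel (x, y).
    flow[..., 0] is the horizontal displacement u, flow[..., 1] the vertical v.
    Nearest-neighbour lookup with clamping at the image borders.
    """
    h, w = frame.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(np.rint(xs + flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.rint(ys + flow[..., 1]).astype(int), 0, h - 1)
    return frame[src_y, src_x]

def interpolate_with_flow(frame0, frame1, flow_0_to_1, t=0.5):
    """Blend both frames after warping them to time t (0 < t < 1)."""
    # Assumption: linear motion, so the flows from time t are scaled copies.
    flow_t_to_0 = -t * flow_0_to_1
    flow_t_to_1 = (1.0 - t) * flow_0_to_1
    warped0 = backward_warp(frame0, flow_t_to_0)
    warped1 = backward_warp(frame1, flow_t_to_1)
    return (1.0 - t) * warped0 + t * warped1

if __name__ == "__main__":
    # A single bright pixel moves 4 pixels to the right between the frames.
    frame0 = np.zeros((5, 8)); frame0[2, 1] = 1.0
    frame1 = np.zeros((5, 8)); frame1[2, 5] = 1.0
    flow = np.zeros((5, 8, 2)); flow[..., 0] = 4.0
    middle = interpolate_with_flow(frame0, frame1, flow)
    print(np.argwhere(middle > 0))  # location of the pixel in the middle frame
```

In practice the flow would come from an estimator rather than being given, and bilinear sampling with occlusion masks would replace the simple blend.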
Path selective interpolation
• Dhruv Mahajan et al. (2009) [40] proposed a path framework, parallel to an inverse
optical flow approach, that computes the background motion of an arbitrary
intermediate pixel ‘p’ in the input frames.
• Bo Yan et al. (2013) improved this scheme by introducing two leading
innovations, the first of which uses standard optical flow to supervise path
direction by constraining path length.
• Yizhou Fan et al. (2016) further guided the optimization of path construction
by combining the conventional path-based framework of Mahajan and Bo Yan with
useful feature points extracted from the input frames. Integrating semantic
information identifies critical pixels in the input frames and supervises the
method toward accurate motion pattern recognition via optimal energy minimization,
thus avoiding wrong path selection and achieving more natural results.
GAN-based interpolation
• Inspired by the success of deep convolutional neural networks in video
processing tasks and by the research on GANs that followed Goodfellow's work in
2014, the first GAN interpolation network, FINNiGAN [111], designed by Mark
Koren et al. in 2016, managed to reduce the traditional “ghosting” and “tearing”
artifacts and compensated for fast-motion discrepancies. A few developments are
discussed below.
[Figure: the general architecture of a GAN framework]
• FINNiGAN enhances frame rate by combining a convolutional neural network
framework with generative adversarial networks.
• The idea is to prevent the loss of structural information that commonly occurs
during up-sampling by using a SIN (structure interpolation network) that produces
the structure of the intermediate frame.
• Zhe Hu et al. (2018) [112] proposed a multi-scale structure that shares parameters
across various layers and avoids costly global optimization, unlike usual flow-based
methods. MSFSN (multi-scale frame synthesis network) became the first model to
provide flexibility by parametrizing the temporal locus of the frame of interest.
• Chenguang Li et al. (2018) [113] exploited a multi-scale CNN architecture to support
long and varied motion and employed an additional WGAN-GP loss (Wasserstein
generative adversarial network loss with gradient penalty) [114] to achieve more
natural results. The model has a slim generator network that requires relatively
little storage and runs at high speed thanks to its residual structure and
cumulative generation of high-resolution frames.
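For reference, the critic loss of the general WGAN-GP formulation cited as [114] is written out below. How Li et al. combine or weight it with their other loss terms is not stated in these slides, so this is only the generic objective. The symbols D (critic), P_r (real frame distribution), P_g (generated frame distribution), and λ (penalty weight) are the usual notation rather than names taken from the slides.

```latex
L_D = \mathbb{E}_{\tilde{x} \sim P_g}\big[D(\tilde{x})\big]
    - \mathbb{E}_{x \sim P_r}\big[D(x)\big]
    + \lambda \, \mathbb{E}_{\hat{x} \sim P_{\hat{x}}}
      \Big[\big(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\big)^2\Big],
\qquad
\hat{x} = \epsilon x + (1 - \epsilon)\,\tilde{x}, \quad \epsilon \sim U[0, 1]
```

The last term penalizes critic gradients whose norm deviates from 1 at points sampled between real and generated frames.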
Conclusion
This paper puts forward a comprehensive survey of classic video frame
interpolation techniques using deep learning. It presents a broad
overview of the widely used benchmark evaluations. The five
modalities exhibit their unique features and branch into diverse choices
of deep learning techniques that use their properties productively.
The inherent spatial, temporal, and structural attributes of a video
sequence are identified. From the aspect of spatiotemporal structural
encoding, the survey highlights the pros and cons of the available techniques.
Based on the key insights underlined by the survey, the problem of video
frame interpolation offers promising research opportunities.
New techniques in deep learning will enlarge the scope for improving
frame interpolation. GAN-based training paradigms show a
promising future in the field of frame interpolation. Combining frame
interpolation with other video processing tasks also seems to interest
researchers.
THANK YOU
