Visual Summary of Egocentric Photostreams by Representative Keyframes
Marc Bolaños, Ricard Mestre, Estefanía Talavera, Xavier Giró-i-Nieto and Petia Radeva
Motivation
Lifelogging wearable cameras can produce 1,500 images/day, more than 500,000 images/year.
Automatic summarization methods could help in many applications. In particular, we are working on:
● Memory aid for Mild Cognitive Impairment patients.
● Automatic nutrition diary.
Goal
Extract the visual summary of a whole day, capturing the most representative information for describing the day.
Storytelling
● Have breakfast with the family
● Go for a walk
● Go shopping
● Take the bus
● Have a coffee with a friend
State of the Art
High temporal resolution egocentric data.
1. Event segmentation.
2. Detection of salient objects and people.
3. Subset selection of video shots based on:
a. Story
b. Importance
c. Diversity
Lu, Zheng, and Kristen Grauman. "Story-driven summarization for egocentric video." IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013.
State of the Art
Low temporal resolution egocentric data.
1. Event segmentation.
2. Selection of the keyframes, comparing several methods:
a. Middle image of each segment.
b. Image close to the average value in the segment (centroid-like).
c. Image with highest “quality”.
Doherty, Aiden R., et al. "Investigating keyframe selection methods in the novel domain of passively captured visual lifelogs." Proceedings of the 2008 International Conference on Content-based Image and Video Retrieval. ACM, 2008.
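The first two baseline selection methods listed on this slide can be sketched in a few lines. This is only an illustrative sketch: the Euclidean distance, the 2-D toy features and the function names are assumptions, not details taken from Doherty et al. The "quality" criterion is left out because the slide does not define it.

```python
import math

def middle_keyframe(segment):
    """Return the frame index in the middle of the segment."""
    return segment[len(segment) // 2]

def centroid_keyframe(segment, features):
    """Return the frame whose feature vector is closest to the segment mean."""
    dim = len(features[segment[0]])
    mean = [sum(features[i][d] for i in segment) / len(segment) for d in range(dim)]
    # Euclidean distance to the mean is an assumed choice of metric.
    return min(segment, key=lambda i: math.dist(features[i], mean))

# Hypothetical 2-D features for the five frames of one segment.
features = [[0.0, 0.0], [0.1, 0.0], [4.0, 4.0], [1.0, 1.0], [1.5, 0.5]]
segment = [0, 1, 2, 3, 4]

middle = middle_keyframe(segment)
centroid = centroid_keyframe(segment, features)
```

On this toy segment the two criteria disagree: the middle frame is an outlier, while the centroid-like choice lands on a frame near the bulk of the segment.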
Methodology ( I )
[Figure: methodology diagram]
Methodology ( II )
[Figure: methodology diagram]
Frames Characterization
Frames are characterized with a Convolutional Neural Network (CNN) trained on ImageNet.
Jia, Yangqing, et al. "Caffe: Convolutional architecture for fast feature embedding." Proceedings of the ACM International Conference on Multimedia. ACM, 2014.
Events Segmentation ( I )
Applying agglomerative clustering and tuning the cut-off parameter yields a good segmentation of all the events in a day.
[Figure: cut-off parameter]
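To make the role of the cut-off concrete, here is a minimal pure-Python sketch of agglomerative clustering that stops merging once the closest pair of clusters is farther apart than the cut-off. The slide does not state the linkage or the metric, so average linkage and Euclidean distance are assumptions, as are the toy features and the function name.

```python
import math

def agglomerative(features, cutoff):
    """Average-linkage clustering; returns one cluster label per frame."""
    clusters = [[i] for i in range(len(features))]

    def linkage(a, b):
        # Average pairwise Euclidean distance between two clusters (assumed).
        total = sum(math.dist(features[i], features[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the closest pair of clusters (i < j).
        d, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if d > cutoff:
            break  # every remaining pair is farther apart than the cut-off
        merged = clusters.pop(j)  # j > i, so index i stays valid
        clusters[i].extend(merged)

    labels = [0] * len(features)
    for label, members in enumerate(clusters):
        for frame in members:
            labels[frame] = label
    return labels

# Hypothetical features: frames 0, 1 and 4 look alike, frames 2 and 3 look alike.
features = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [0.0, 0.1]]
labels = agglomerative(features, cutoff=1.0)
```

A larger cut-off merges more aggressively (eventually into a single event), a smaller one leaves more, shorter events. Note that the clustering looks only at visual features, so frames 0, 1 and 4 share a label even though they are not adjacent in time.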
Events Segmentation ( II )
Division - Fusion post-processing to obtain a more robust segmentation.
Division: splits visually similar events that are separated in time and gives them different labels.
Fusion: merges very short sub-events that are not considered relevant enough.
[Figure panels: a) After Agglomerative Clustering, b) After Division, c) After Fusion]
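The two post-processing steps can be sketched on a sequence of per-frame cluster labels. The slide does not say how a short sub-event is merged or what counts as short, so absorbing it into the preceding segment and the `min_length` threshold are assumptions made for this sketch.

```python
def divide(labels):
    """Division: each temporally contiguous run of equal labels gets its own id."""
    segments, current = [], 0
    for t, label in enumerate(labels):
        if t > 0 and label != labels[t - 1]:
            current += 1
        segments.append(current)
    return segments

def fuse(segments, min_length):
    """Fusion: absorb runs shorter than min_length into a neighbouring run."""
    # Collect contiguous runs as [start, end) pairs.
    runs = []
    for t, seg in enumerate(segments):
        if t > 0 and seg == segments[t - 1]:
            runs[-1][1] = t + 1
        else:
            runs.append([t, t + 1])
    merged = []
    for start, end in runs:
        if merged and end - start < min_length:
            merged[-1][1] = end  # assumed rule: merge into the preceding run
        else:
            merged.append([start, end])
    # A short first run has no predecessor, so it joins the following run.
    if len(merged) > 1 and merged[0][1] - merged[0][0] < min_length:
        merged[1][0] = merged[0][0]
        merged.pop(0)
    out = []
    for new_id, (start, end) in enumerate(merged):
        out.extend([new_id] * (end - start))
    return out

# Hypothetical cluster labels over time: label 0 occurs twice, far apart,
# and a single stray frame (label 2) interrupts the day.
cluster_labels = [0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 2, 1, 1, 1]
divided = divide(cluster_labels)
fused = fuse(divided, min_length=2)
```

After division the two separate occurrences of label 0 become distinct events; after fusion the one-frame event is absorbed, leaving four events.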
Keyframe Selection
Visual similarity-based keyframe selection criteria.
[Figure: Distances Matrix; Similarity-based probabilities; Random Walk; Minimum Distance]
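The slide names the two criteria without giving formulas, so the following is one plausible reading rather than the authors' exact method. Minimum Distance is taken to mean the frame with the smallest summed distance to the other frames of the event, and Random Walk the frame with the highest stationary probability when transition probabilities are proportional to visual similarity. The Gaussian kernel, `sigma`, the iteration count and the toy features are all assumptions.

```python
import math

def distance_matrix(features):
    """Pairwise Euclidean distances between the frames of one event."""
    return [[math.dist(a, b) for b in features] for a in features]

def minimum_distance_keyframe(dist):
    """Frame with the smallest summed distance to all others (the medoid)."""
    return min(range(len(dist)), key=lambda i: sum(dist[i]))

def random_walk_keyframe(dist, sigma=1.0, iterations=100):
    """Frame with the highest stationary probability of a similarity-driven walk."""
    n = len(dist)
    if n == 1:
        return 0
    # Assumed Gaussian similarity; self-transitions are excluded.
    sim = [[0.0 if i == j else math.exp(-dist[i][j] ** 2 / (2 * sigma ** 2))
            for j in range(n)] for i in range(n)]
    # Row-normalise similarities into transition probabilities.
    trans = [[s / sum(row) for s in row] for row in sim]
    # Power iteration from a uniform start towards the stationary distribution.
    p = [1.0 / n] * n
    for _ in range(iterations):
        p = [sum(p[i] * trans[i][j] for i in range(n)) for j in range(n)]
    return max(range(n), key=lambda j: p[j])

# Hypothetical features for one event; frame 4 is a visual outlier.
features = [[0.0, 0.0], [1.0, 0.0], [0.5, 0.0], [0.0, 1.0], [3.0, 3.0]]
dist = distance_matrix(features)
by_min_distance = minimum_distance_keyframe(dist)
by_random_walk = random_walk_keyframe(dist)
```

On this toy event both criteria pick the same central frame and ignore the outlier; on real data they can differ, which is presumably why the slide lists them as separate criteria.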
Summary Results
Evaluation ( I )
Datasets
● 5 days
● 3 users
● 4005 images
● Segmentation ground truth
Talavera, E., Dimiccoli, M., Bolaños, M., Aghaei, M., & Radeva, P. R-clustering for egocentric video segmentation. IbPRIA 2015, Santiago de Compostela, Spain. Proceedings (Vol. 9117, p. 327). Springer.
Clustering
● Jaccard Index
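For reference, the Jaccard index of two sets of frames is the size of their intersection divided by the size of their union. The sketch below applies it to segments given as sets of frame indices. How the scores are aggregated over a day is not stated on the slide, so averaging the best match per ground-truth event is an assumption, as are the function names and the toy segments.

```python
def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two collections of frame indices."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def mean_best_jaccard(predicted, ground_truth):
    """Average, over ground-truth events, of the best-matching predicted segment."""
    best = [max(jaccard(gt, p) for p in predicted) for gt in ground_truth]
    return sum(best) / len(best)

# Hypothetical segmentations of ten frames into two events.
predicted = [range(0, 5), range(5, 10)]
ground_truth = [range(0, 4), range(4, 10)]
score = mean_best_jaccard(predicted, ground_truth)
```

A score of 1.0 means every ground-truth event is matched exactly; shifting a boundary by one frame, as in the toy data, lowers it only slightly.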
Evaluation ( II )
Keyframe Selection
● Blind taste test with 30 users for quality evaluation.
[Figure credit: brandchannel.com]
Lu, Zheng, and Kristen Grauman. "Story-driven summarization for egocentric video." IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013.
Individual Keyframes Quality Evaluation
Representative images of the event #1
● Do you think the image on the left can represent the event? (Yes / No)
● Do you think the image in the center can represent the event? (Yes / No)
● Do you think the image on the right can represent the event? (Yes / No)
● What is the most representative image of the event? (Left / Center / Right)
Evaluation ( III )
Keyframe Selection: General Summary Quality Evaluation
Visual summaries of the day
● Do you think that this set can summarize the whole day? (Yes / No)
● Finally, which one do you think is the best visual summary of the day? (Summary 1 / Summary 2 / Summary 3 / Summary 4)
Some of the summaries you will see might be very similar (differing only in a few images). In that case you can choose any of them.
Evaluation - Individual Keyframes
[Chart: What is the most representative image of the event?]
[Chart: Do you think that the image on the left/center/right can represent the event?]
Evaluation - General Summary
[Chart: Can this set of images represent the complete day?]
[Chart: Which summary is the best, in your opinion?]
Conclusions
● New keyframe selection methodology taking into account visual and temporal information.
● Keyframe selection using CNN-based global information and graph analysis.
● 88-86% user acceptance of our summaries.
● 58% of users chose our summaries as the best option.
Future Work
● Use semantic information (e.g. objects, people, actions).
● Clinical application for Mild Cognitive Impairment patients.
