You Only Look Once:
Unified, Real-Time Object Detection
Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi
전희선
1. Introduction
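• Earlier models perform object detection and classification as separate steps → falls short of mimicking the human visual system
• YOLO instead treats object detection and classification as a single regression problem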
Advantages
- Extremely fast
- Reasons globally about the image
- Learns generalizable representations of objects
Disadvantages
- Lags behind state-of-the-art detection systems in accuracy
2. Unified Detection
1. Divide the image into an S×S grid
(S×S grid cells in total)
Hyperparameters:
S (number of grid divisions per side)
B (number of bounding boxes per grid cell)
C (number of classes)
2. Predict B bounding boxes for each grid cell
+ compute a confidence score for each bounding box
Components of each bounding box
(x, y): center of the bounding box (relative to the grid cell)
(w, h): width and height of the bounding box (relative to the whole image)
confidence: confidence score
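The box encoding above can be turned back into pixel coordinates with a little arithmetic. The following is a minimal sketch, assuming a square S×S grid and row/column cell indices; the function name `decode_box` and the 448×448 image size in the example are illustrative choices, not something the slides specify.

```python
def decode_box(row, col, x, y, w, h, S, img_w, img_h):
    """Convert one grid-relative box to pixel corners (x1, y1, x2, y2).

    (x, y) are offsets inside grid cell (row, col), each in [0, 1].
    (w, h) are fractions of the full image width and height.
    """
    cell_w = img_w / S
    cell_h = img_h / S
    # Box center in pixels: cell origin plus the offset inside the cell.
    cx = (col + x) * cell_w
    cy = (row + y) * cell_h
    # Box size in pixels: fractions of the whole image.
    box_w = w * img_w
    box_h = h * img_h
    return (cx - box_w / 2, cy - box_h / 2, cx + box_w / 2, cy + box_h / 2)


# A box centered in the middle cell of a 7x7 grid (hypothetical values).
print(decode_box(3, 3, 0.5, 0.5, 0.25, 0.5, S=7, img_w=448, img_h=448))
```

Keeping (x, y) relative to the cell and (w, h) relative to the image bounds every regression target to the range 0 to 1.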
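Confidence Score:
Reflects how confident the model is that the box contains an object and
how accurate the predicted box is
$\Pr(\text{Object}) \cdot \mathrm{IOU}^{\text{truth}}_{\text{pred}}$
($\Pr(\text{Object})$ is 1 if an object is in the grid cell, 0 otherwise)
IOU (Intersection Over Union):
Measures how much the predicted region and the ground-truth region overlap
$\mathrm{IOU}^{\text{truth}}_{\text{pred}} = \dfrac{\text{area}(\text{truth} \cap \text{pred})}{\text{area}(\text{truth} \cup \text{pred})}$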
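The IOU definition can be sketched in a few lines. This assumes axis-aligned boxes given as corner tuples (x1, y1, x2, y2); that representation and the function name `iou` are choices made for this example.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (may be empty).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp at zero so disjoint boxes get zero intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


truth = (0, 0, 2, 2)
pred = (1, 1, 3, 3)
print(iou(truth, pred))  # overlap area 1, union area 7
```

With an IOU in hand, the confidence target for a box is simply `iou(truth, pred)` when the cell contains an object and 0 otherwise.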
3. Compute C conditional class probabilities for each grid cell
→ assign the class with the highest probability
Conditional Class Probability:
$\Pr(\text{Class}_i \mid \text{Object})$
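4. Final detection!
At test time, compute a class-specific confidence score for each box:
$\Pr(\text{Class}_i \mid \text{Object}) \cdot \Pr(\text{Object}) \cdot \mathrm{IOU}^{\text{truth}}_{\text{pred}} = \Pr(\text{Class}_i) \cdot \mathrm{IOU}^{\text{truth}}_{\text{pred}}$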
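As a rough illustration of the class-specific confidence score, the sketch below multiplies a cell's conditional class probabilities by one box's confidence and picks the top class. The helper names and the example numbers are hypothetical.

```python
def class_scores(class_probs, box_confidence):
    """Class-specific confidence: Pr(Class_i | Object) times box confidence."""
    return [p * box_confidence for p in class_probs]


def best_class(class_probs, box_confidence):
    """Return (class index, score) for the highest-scoring class of one box."""
    scores = class_scores(class_probs, box_confidence)
    idx = max(range(len(scores)), key=scores.__getitem__)
    return idx, scores[idx]


# One grid cell with C = 3 classes and one box with confidence 0.5.
cell_class_probs = [0.1, 0.7, 0.2]
idx, score = best_class(cell_class_probs, box_confidence=0.5)
print(idx, score)
```

Because the class probabilities are shared by all B boxes of a cell, only the box confidence distinguishes the scores of boxes in the same cell.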
2.1 Network Design
Based on the GoogLeNet model
Instead of Inception modules, uses 1×1 reduction layers followed by 3×3 conv layers
First 20 conv layers (modified from GoogLeNet): feature extractor
Last 4 conv layers + FC layers: object classifier
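Final output tensor size
= S × S × (B*5 + C)
= 7 × 7 × (2*5 + 20)
= 7 × 7 × 30
S (number of grid divisions per side) = 7
B (number of bounding boxes per grid cell) = 2
C (number of classes) = 20
Each grid cell's vector holds:
- x, y, w, h, confidence for each bounding box (see slide 5; here the number of bounding boxes is 2)
- per-class probabilities $\Pr(\text{Class}_i \mid \text{Object})$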
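The tensor-size arithmetic can be checked with a short sketch. The per-cell layout assumed here (all box values first, then class probabilities) is an illustration only; the ordering in a real implementation may differ, and both function names are hypothetical.

```python
def output_shape(S, B, C):
    """Shape of the prediction tensor: S x S cells, B*5 + C values per cell."""
    return (S, S, B * 5 + C)


def split_cell_vector(cell, B, C):
    """Split one cell's flat vector into B boxes and C class probabilities.

    Assumed layout: [box_1 (x, y, w, h, conf), ..., box_B, class_1..class_C].
    """
    if len(cell) != B * 5 + C:
        raise ValueError("cell vector has the wrong length")
    boxes = [tuple(cell[5 * b:5 * b + 5]) for b in range(B)]
    class_probs = list(cell[5 * B:])
    return boxes, class_probs


print(output_shape(7, 2, 20))
boxes, class_probs = split_cell_vector(list(range(30)), B=2, C=20)
print(len(boxes), len(class_probs))
```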
2.2 Training – Loss Function
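For reference, the individual loss terms discussed in this section can be written as a single sum-squared-error expression. This follows the formulation in the YOLO paper rather than anything recoverable from the slide text itself: hats denote predictions, $\mathbb{1}_{ij}^{\text{obj}}$ is 1 when box $j$ of cell $i$ is responsible for an object, and $\mathbb{1}_{i}^{\text{obj}}$ is 1 when an object appears in cell $i$.

```latex
\begin{aligned}
\mathcal{L} ={}& \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}}
  \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
&+ \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}}
  \left[ \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2 + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right] \\
&+ \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left( C_i - \hat{C}_i \right)^2 \\
&+ \lambda_{\text{noobj}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{noobj}} \left( C_i - \hat{C}_i \right)^2 \\
&+ \sum_{i=0}^{S^2} \mathbb{1}_{i}^{\text{obj}} \sum_{c \in \text{classes}} \left( p_i(c) - \hat{p}_i(c) \right)^2
\end{aligned}
```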
For bounding box j of grid cell i where an object is present,
compute the loss on x, y
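For bounding box j of grid cell i where an object is present,
compute the loss on w, h
(square roots are used so that a small deviation in a large box counts less than the same deviation in a small box)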
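The effect of the square root is easy to see numerically. The sketch below applies the same absolute error of 0.05 to a large box and a small box and compares the resulting width/height loss; the function name `wh_loss` and the box sizes are made up for illustration.

```python
import math


def wh_loss(w, h, w_hat, h_hat):
    """Sum-squared error on the square roots of width and height."""
    return ((math.sqrt(w) - math.sqrt(w_hat)) ** 2
            + (math.sqrt(h) - math.sqrt(h_hat)) ** 2)


# Same absolute deviation (0.05) in both width and height.
large = wh_loss(0.80, 0.80, 0.85, 0.85)  # large box
small = wh_loss(0.10, 0.10, 0.15, 0.15)  # small box
print(large, small, large < small)
```

Without the square root, both boxes would receive exactly the same penalty for this deviation.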
For bounding box j of grid cell i where an object is present,
compute the loss on the confidence score
($C_i = 1$)
For bounding box j of grid cell i where no object is present,
compute the loss on the confidence score
($C_i = 0$)
For grid cell i where an object is present,
compute the loss on the conditional class probabilities
($p_i(c) = 1$ for the correct class, $p_i(c) = 0$ otherwise)
Weighting: terms for cells with objects typically get about 10 times the weight of terms for cells without objects.
2.2 Training – Hyperparameters
1. Pretrain the first 20 conv layers on the ImageNet 1000-class dataset
+ add 4 conv layers and 2 FC layers, then train on the PASCAL VOC dataset
2. $\lambda_{\text{coord}} = 5$, $\lambda_{\text{noobj}} = 0.5$ (typically a 10× weight where objects are present)
3. Batch size = 64
4. Dropout rate = 0.5
5. Activation function = leaky ReLU
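The settings above could be collected as plain constants, as in the sketch below. The negative slope of 0.1 for leaky ReLU is taken from the YOLO paper and is not stated on the slide, so treat it as an assumption here; all names are illustrative.

```python
# Training settings listed on the slide.
LAMBDA_COORD = 5.0
LAMBDA_NOOBJ = 0.5
BATCH_SIZE = 64
DROPOUT_RATE = 0.5


def leaky_relu(x, negative_slope=0.1):
    """Leaky ReLU: identity for positive inputs, small slope otherwise.

    The default slope of 0.1 is an assumption (value used in the YOLO paper).
    """
    return x if x > 0 else negative_slope * x


print(LAMBDA_COORD / LAMBDA_NOOBJ)  # ratio between the two loss weights
print(leaky_relu(2.0), leaky_relu(-2.0))
```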
2.3 Inference
2.4 Limitations of YOLO
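Each grid cell predicts only B boxes and a single class → hard to detect objects that appear close together in groups
Struggles to accurately predict bounding boxes with new or unusual shapes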
