Visual Explanations from Hadamard Product in Multimodal Deep Networks

Kim, Jin-Hwa; Zhang, Byoung-Tak

arXiv:1712.06228 (cs)

[Submitted on 18 Dec 2017]

Abstract: Visual explanations of a model's learned representations help us understand the fundamentals of learning. Previous attentional models visualized the attended regions of an image or text using their learned weights, to confirm that the attention mechanism worked as intended. Kim et al. (2016) show that the Hadamard product in multimodal deep networks, which is well known as the joint function in visual question answering tasks, implicitly performs an attentional mechanism for visual inputs. In this work, we extend their work to show that the Hadamard product in multimodal deep networks performs an attentional mechanism not only for visual inputs but also for textual inputs simultaneously, using the proposed gradient-based visualization technique. The attentional effect of the Hadamard product is visualized for both visual and textual inputs by analyzing the two inputs and the output of the Hadamard product with the proposed method, and is compared with the learned attentional weights of a visual question answering model.
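As a rough intuition for why a Hadamard product can behave like attention on both inputs, the toy sketch below scores a joint representation and takes the gradient of that score with respect to each input. It is not the visualization technique proposed in the paper, whose details the abstract does not give. The function names (`hadamard`, `score`, `grad_wrt_visual`, `grad_wrt_text`), the linear readout `w`, and the three-dimensional vectors are all hypothetical choices made for illustration.

```python
def hadamard(a, b):
    """Element-wise (Hadamard) product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return [x * y for x, y in zip(a, b)]


def score(q, v, w):
    """Toy scalar output: s = sum_k w_k * q_k * v_k.

    q: textual (question) feature vector   (assumed, already projected)
    v: visual feature vector               (assumed, already projected)
    w: linear readout weights              (assumed)
    """
    return sum(wk * fk for wk, fk in zip(w, hadamard(q, v)))


def grad_wrt_visual(q, w):
    """ds/dv = w * q: the textual features gate the visual gradient."""
    return hadamard(w, q)


def grad_wrt_text(v, w):
    """ds/dq = w * v: the visual features gate the textual gradient."""
    return hadamard(w, v)


# Hypothetical inputs: the second textual dimension is switched off.
q = [1.0, 0.0, 2.0]
v = [0.5, 3.0, 1.0]
w = [1.0, 1.0, 1.0]

s = score(q, v, w)
dv = grad_wrt_visual(q, w)
dq = grad_wrt_text(v, w)
```

In this toy setting each input acts as a per-dimension gate on the gradient flowing to the other input: a zero in `q` removes the corresponding visual dimension from the gradient, and the same holds in the other direction. That symmetry is consistent with the abstract's claim that the effect applies to visual and textual inputs simultaneously, although the paper's actual analysis may differ.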

Comments:	8 pages, 5 figures, including appendix, NIPS 2017 Workshop on Visually-Grounded Interaction and Language (ViGIL)
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:1712.06228 [cs.CV]
	(or arXiv:1712.06228v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.1712.06228

Submission history

From: Jin-Hwa Kim
[v1] Mon, 18 Dec 2017 02:37:20 UTC (1,391 KB)