Guiding the flowing of semantics: Interpretable video captioning via POS tag
X Xiao, L Wang, B Fan, S Xiang… - Proceedings of the 2019 …, 2019 - aclanthology.org
In current video captioning models, the video frames are collected in one network and
their semantics are mixed into a single feature, which not only increases the difficulty of caption
decoding but also decreases the interpretability of the captioning models. To address these
problems, we propose an Adaptive Semantic Guidance Network (ASGN), which instantiates
the whole video semantics into different POS-aware semantics under the supervision of
part-of-speech (POS) tags. In the encoding process, the POS tag activates the related neurons and …