戚金清
Paper type: Journal article
Publication date: 2019-12-01
Journal: PATTERN RECOGNITION
Indexed in: EI, SCIE
Volume: 96
ISSN: 0031-3203
Keywords: Saliency detection; Natural language; Textual-visual pairwise; Self-supervision
Abstract: Natural language processing has achieved remarkable performance in numerous computer tasks, but the potential of textual information has not been fully explored in visual saliency detection. In this paper, we learn to detect salient objects from natural language by addressing two essential issues: finding the semantic content that matches the corresponding linguistic concept, and recovering fine details without any pixel-level annotations. We first propose the Feature Matching Network (FMN) to explore the internal relation between the linguistic concept and the visual image in the semantic space. The FMN simultaneously establishes textual-visual pairwise affinities and generates a language-aware coarse saliency map. To refine the coarse map, the Recurrent Fine-tune Network (RFN) is proposed to progressively improve its predictions by self-supervision. Our approach leverages only the caption to provide important cues about the salient object, yet generates a finely detailed foreground map at a detection speed of 72 FPS without any post-processing. Extensive experiments demonstrate that our method takes full advantage of the textual information of natural language in saliency detection, and performs favorably against state-of-the-art approaches on most existing datasets. (C) 2019 Elsevier Ltd. All rights reserved.