赵哲焕 (Zhao Zhehuan)
Paper type: Conference paper
Published: 2021-06-21
Pages: 1436-1440
Keywords: Referring Image Segmentation; Convolutional Neural Networks; Dense Convolution
Abstract: Referring image segmentation is an important task that aims to segment the object referred to by a natural language expression. Existing methods usually concatenate visual and linguistic features. In doing so, they underestimate the importance of language-to-vision and object-to-object relationships when the expression mentions multiple entities. We therefore propose a new network, the Context-Based Network (CBN), to locate the correct referent more accurately. CBN is composed of two modules: Intra Relation Selection (Intra-RS) and Inter Relation Selection (Inter-RS). Intra-RS captures object-to-object relationships in an embedding space of visual and linguistic features, while Inter-RS uses multi-scale linguistic features as a guide to match the most similar region in the image feature maps. In addition, we apply spatial pyramid pooling to gather global information and address the limited receptive field problem. Experimental results on four public datasets show that CBN achieves performance comparable to other state-of-the-art methods.
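As a rough illustration of the general technique only (a PSPNet-style pyramid, not CBN's actual module), the NumPy sketch below average-pools a (C, H, W) feature map into 1×1, 2×2 and 4×4 grids, upsamples each grid back to H×W, and concatenates the results with the input so that every location also carries coarse, image-wide context. The bin sizes (1, 2, 4), the function names, and the nearest-neighbor upsampling are hypothetical choices; PSPNet-style modules usually also apply a 1×1 convolution per level and bilinear upsampling, which are omitted here.

```python
import numpy as np


def adaptive_avg_pool(x: np.ndarray, bins: int) -> np.ndarray:
    """Average-pool a (C, H, W) map into a (C, bins, bins) grid.

    Bin i spans rows floor(i*H/bins) to ceil((i+1)*H/bins), the usual
    adaptive-pooling rule, so every pixel falls into at least one bin.
    """
    c, h, w = x.shape
    out = np.empty((c, bins, bins), dtype=np.float64)
    for i in range(bins):
        r0, r1 = (i * h) // bins, -(-((i + 1) * h) // bins)  # floor, ceil
        for j in range(bins):
            c0, c1 = (j * w) // bins, -(-((j + 1) * w) // bins)
            out[:, i, j] = x[:, r0:r1, c0:c1].mean(axis=(1, 2))
    return out


def upsample_nearest(p: np.ndarray, h: int, w: int) -> np.ndarray:
    """Nearest-neighbor upsample a (C, b, b) grid back to (C, h, w)."""
    bins = p.shape[1]
    rows = (np.arange(h) * bins) // h  # source row for each output row
    cols = (np.arange(w) * bins) // w  # source column for each output column
    return p[:, rows][:, :, cols]


def spatial_pyramid_pool(x: np.ndarray, bin_sizes=(1, 2, 4)) -> np.ndarray:
    """Concatenate the input with pooled-and-upsampled pyramid levels.

    Output has C * (1 + len(bin_sizes)) channels at the input resolution.
    """
    _, h, w = x.shape
    levels = [upsample_nearest(adaptive_avg_pool(x, b), h, w) for b in bin_sizes]
    return np.concatenate([x] + levels, axis=0)


feat = np.arange(32, dtype=np.float64).reshape(2, 4, 4)
print(spatial_pyramid_pool(feat).shape)  # (8, 4, 4)
```

The 1×1 level is what supplies global information: each of its channels is a single image-wide average broadcast to every pixel, so a location's representation no longer depends only on the receptive field of the preceding convolutions.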