    孙亮 (Sun Liang)

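    • Associate Professor       Master's Supervisor
    • Gender: Male
    • Alma mater: Jilin University
    • Degree: Ph.D.
    • Affiliation: School of Computer Science and Technology
    • Discipline: Computer Application Technology
    • Office: Innovation Park Building, B802
    • Contact: 15998564404
    • Email: liangsun@dlut.edu.cn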

    Image Captioning with Bidirectional Semantic Attention-Based Guiding of Long Short-Term Memory

    Paper type: Journal article

    Publication date: 2021-01-09

    Journal: NEURAL PROCESSING LETTERS

    Volume: 50

    Issue: 1

    Pages: 103-119

    ISSN: 1370-4621

    Keywords: Image captioning; Semantic attention mechanism; Convolution neural network; Bidirectional guiding LSTM

    Abstract: Automatically describing contents of an image using natural language has drawn much attention because it not only integrates computer vision and natural language processing but also has practical applications. Using an end-to-end approach, we propose a bidirectional semantic attention-based guiding of long short-term memory (Bag-LSTM) model for image captioning. The proposed model consciously refines image features from previously generated text. By fine-tuning the parameters of convolution neural networks, Bag-LSTM obtains more text-related image features via feedback propagation than other models. As opposed to existing guidance-LSTM methods which directly add image features into each unit of an LSTM block, our fine-tuned model dynamically leverages more text-conditional image features, acquired by the semantic attention mechanism, as guidance information. Moreover, we exploit bidirectional gLSTM as the caption generator, which is capable of learning long-term relations between visual features and semantic information by making use of both historical and future contextual information. In addition, variations of the Bag-LSTM model are proposed in an effort to sufficiently describe high-level visual-language interactions. Experiments on the Flickr8k and MSCOCO benchmark datasets demonstrate the effectiveness of the model compared with baseline algorithms; for example, it scores 51.2% higher than BRNN on the CIDEr metric.