Dalian University of Technology Homepage Platform, Sun Liang: An attention mechanism based convolutional LSTM network for video action recognition

Sun Liang (孙亮)

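Associate Professor, Master's Supervisor
Gender: Male
Alma mater: Jilin University
Degree: Doctorate
Affiliation: School of Computer Science and Technology
Discipline: Computer Application Technology
Office: Innovation Park Building, B802
Contact: 15998564404
Email: liangsun@dlut.edu.cn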


An attention mechanism based convolutional LSTM network for video action recognition

Paper type: Journal article

Publication date: 2019-07-01

Journal: MULTIMEDIA TOOLS AND APPLICATIONS

Indexed in: SCIE, EI

Volume: 78

Issue: 14

Pages: 20533-20556

ISSN: 1380-7501

Keywords: Attention mechanism; Convolutional LSTM; Spatial transformer; Video action recognition

Abstract: As an important issue in video classification, human action recognition is becoming a hot topic in computer vision. Effectively representing the spatial static and temporal dynamic information of videos is an important problem in video action recognition. This paper proposes an attention mechanism based convolutional LSTM action recognition algorithm to improve the accuracy of recognition by extracting the salient regions of actions in videos effectively. First, GoogleNet is used to extract the features of video frames. Then, those feature maps are processed by the spatial transformer network for the attention. Finally, the sequential information of the features is modeled via the convolutional LSTM to classify the action in the original video. To accelerate the training speed, we adopt the analysis of temporal coherence to reduce the redundant features extracted by GoogleNet with trivial accuracy loss. In comparison with the state-of-the-art algorithms for video action recognition, competitive results are achieved on three widely-used datasets, UCF-11, HMDB-51 and UCF-101. Moreover, by using the analysis of temporal coherence, desirable results are obtained while the training time is reduced.
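The abstract does not say how temporal coherence is measured, so the following is only a rough sketch of the general idea of dropping redundant per-frame features before the recurrent stage. It assumes cosine similarity between feature vectors as the coherence measure and a fixed threshold of 0.95; the names `cosine_similarity`, `select_keyframes`, and `threshold` are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_keyframes(
    features: Sequence[Sequence[float]], threshold: float = 0.95
) -> List[int]:
    """Return the indices of the frames to keep.

    Assumed criterion (not from the paper): a frame is kept only when its
    similarity to the most recently kept frame falls below `threshold`.
    """
    if not features:
        return []
    kept = [0]  # always keep the first frame
    for i in range(1, len(features)):
        if cosine_similarity(features[kept[-1]], features[i]) < threshold:
            kept.append(i)
    return kept


# Toy per-frame feature vectors standing in for CNN features.
frames = [
    [1.0, 0.0],
    [0.99, 0.01],  # nearly identical to frame 0, so it is dropped
    [0.0, 1.0],
    [0.01, 0.99],  # nearly identical to frame 2, so it is dropped
    [1.0, 1.0],
]
kept_indices = select_keyframes(frames, threshold=0.95)
```

In this toy input, only the frames whose features differ noticeably from the last kept frame survive, which is the sense in which redundant features could be skipped to shorten training. A real pipeline would operate on flattened or pooled feature maps rather than two-element lists.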
