Dalian University of Technology Homepage Platform: Luo Zhongxuan, Chinese Homepage: On the tag localization of web video

Luo Zhongxuan (罗钟铉)

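Professor, Doctoral Supervisor, Master's Supervisor
Main position: Assistant to the President and Dean of the School of Software
Gender: Male
Alma mater: Dalian University of Technology
Degree: Doctorate
Affiliation: School of Software; International School of Information and Software
Disciplines: Software Engineering; Computer Application Technology
Office: Main Building, Dalian University of Technology
Contact: +86-411-84708315
Email: zxluo@dlut.edu.cn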


On the tag localization of web video

Paper type: Journal article

Publication date: 2016-07-01

Journal: MULTIMEDIA SYSTEMS

Indexed in: SCIE, EI, Scopus

Volume: 22

Issue: 4, SI

Pages: 405-412

ISSN: 0942-4962

Keywords: Video annotation; Tag localization; Video retrieval

Abstract: Social videos are now pervasive on the web. They are characterized by rich accompanying contextual information, which describes the content of the videos and thus greatly facilitates video search and browsing. Generally, contextual data such as tags are provided at the whole-video level, without any temporal indication of when they actually appear in the video, let alone spatial annotation of object-related tags in the video frames. However, many tags describe only parts of the video content. Therefore, tag localization, the process of assigning tags to the underlying relevant video segments, frames, or even regions within frames, is attracting increasing research interest, and a benchmark dataset for the fair evaluation of tag localization algorithms is highly desirable. In this paper, we describe and release a dataset called DUT-WEBV, which contains about 4,000 videos collected from the YouTube portal by issuing 50 concepts as queries. These concepts cover a wide range of semantic aspects, including scenes like "mountain", events like "flood", objects like "cows", sites like "gas station", and activities like "handshaking", offering great challenges to the tag (i.e., concept) localization task. For each video of a tag, we carefully annotate the time durations during which the tag appears in the video and, for object-related tags, also label the spatial location of the object with a mask in the frames. Besides the video itself, contextual information such as thumbnail images, titles, and YouTube categories is also provided. Together with this benchmark dataset, we present a baseline for tag localization using a multiple instance learning approach. Finally, we discuss some open research issues for tag localization in web videos.
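The temporal side of tag localization described in the abstract can be illustrated with a minimal Python sketch. It assumes the usual multiple instance learning view, in which a video is a bag of instances (for example, frames or shots) and a video-level tag is positive if at least one instance is. This is not the paper's baseline: the function names, the 0.5 threshold, the one-second instance length, and the example scores are all hypothetical, and the per-instance scores are assumed to come from some classifier that is not shown.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start_seconds, end_seconds)


def bag_score(instance_scores: List[float]) -> float:
    """MIL assumption: a bag (video) is as positive as its most positive instance."""
    return max(instance_scores) if instance_scores else 0.0


def localize_tag(
    instance_scores: List[float],
    threshold: float = 0.5,            # hypothetical decision threshold
    seconds_per_instance: float = 1.0,  # hypothetical instance duration
) -> List[Segment]:
    """Merge consecutive instances scoring at or above the threshold into time segments."""
    segments: List[Segment] = []
    start = None  # index of the first instance in the current positive run
    for i, score in enumerate(instance_scores):
        if score >= threshold and start is None:
            start = i
        elif score < threshold and start is not None:
            segments.append((start * seconds_per_instance, i * seconds_per_instance))
            start = None
    if start is not None:  # close a run that reaches the end of the video
        end = len(instance_scores) * seconds_per_instance
        segments.append((start * seconds_per_instance, end))
    return segments


def temporal_iou(a: Segment, b: Segment) -> float:
    """Intersection over union of two time segments, e.g. predicted vs. annotated."""
    intersection = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - intersection
    return intersection / union if union > 0 else 0.0


if __name__ == "__main__":
    scores = [0.1, 0.8, 0.9, 0.2, 0.7]  # made-up per-second scores for one tag
    print(bag_score(scores))
    print(localize_tag(scores))
    print(temporal_iou((1.0, 3.0), (2.0, 4.0)))
```

Predicted segments produced this way could be compared against annotated time durations of the kind the dataset provides, for instance with the temporal overlap measure above; whether the paper evaluates in this manner is not stated in the abstract.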
