姚念民

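Personal Information

Professor

Doctoral supervisor

Master's supervisor

Position: Director, Intelligent Computing Teaching and Research Section

Gender: Male

Alma mater: Jilin University

Degree: Doctorate

Affiliation: School of Computer Science and Technology

Disciplines: Computer Application Technology; Computer Software and Theory

Office: Room A820, Innovation Park Building

Contact: 13304609362

Email: lucos@dlut.edu.cn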

Publications

Generating word and document matrix representations for document classification

Paper type: Journal article

Publication date: 2020-07-01

Journal: NEURAL COMPUTING & APPLICATIONS

Indexed in: SCIE

Volume: 32

Issue: 14

Pages: 10087-10108

ISSN: 0941-0643

Keywords: Document-level classification; Word matrix; Document matrix; Subwindows

Abstract: We present doc2matrix, an effective word and document matrix representation architecture based on a linear operation, to learn representations for document-level classification. Unlike traditional vector representations, it represents each word or document as a matrix. Doc2matrix defines suitable subwindows as the unit of text scale, and generates a word matrix and a document matrix by stacking the information from these subwindows. The resulting document matrix not only contains more fine-grained semantic and syntactic information than the original representation but also introduces abundant two-dimensional features. Experiments on four document-level classification tasks demonstrate that the proposed architecture generates higher-quality word and document representations and outperforms previous models based on linear operations. Among the classifiers compared, a convolution-based classifier is the most suitable for our document matrix. Furthermore, we demonstrate through both theoretical and experimental analysis that the convolution operation better captures the two-dimensional features of the proposed document matrix.