Li Yan (李燕)

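Personal Information

Associate Professor

Master's Supervisor

Gender: Female

Alma mater: Dalian University of Technology

Degree: Ph.D.

Affiliation: School of Chemical Engineering

Email: yanli@dlut.edu.cn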

Publications

Understanding the Aquatic Toxicity of Pesticide: Structure-Activity Relationship and Molecular Descriptors to Distinguish the Ratings of Toxicity

Paper type: Journal article

Publication date: 2009-12-01

Journal: QSAR & Combinatorial Science

Indexed in: SCIE, Scopus

Volume: 28

Issue: 11-12

Pages: 1418-1431

ISSN: 1611-020X

Keywords: Genetic algorithm; Linear discriminant analysis; Pesticide; QSAR; Structure-property relationships; Aquatic toxicity; Medicinal chemistry

Abstract: The purpose of this work is to develop robust, interpretable structure-activity relationship (SAR) models for assessing the aquatic toxicity of pesticides. A data set of 1600 chemicals was collected, comprising 533 nontoxic (C0), 287 slightly toxic (C1), 329 moderately toxic (C2), 231 highly toxic (C3), and 220 very highly toxic (C4) compounds with respect to aquatic organisms. Their chemical structures were encoded into 196 molecular descriptors, including 2D topological and electrotopological state variables as well as the MlogP and AlogP parameters. Two variable selection techniques, the Stepwise procedure and Genetic Algorithms (GA), were coupled with linear discriminant analysis (LDA) to obtain stable and thoroughly validated QSARs. The results reveal that AlogP classifies the C0 versus C4 compounds with an accuracy of 70.4% but separates the other groups poorly, while MlogP shows no pronounced correlation with aquatic toxicity for any of the groups. Using all the theoretical descriptors, the GA-LDA models for the C(0,4), C(1,3), C(1,4), and C(2,4) classifications are acceptable, with external prediction accuracies ranging from 66.3% to 80.6%. The selected descriptors, which account for molecular size, electrotopological state, and hydrophobicity, were found to be crucial to modeling aquatic toxicity. The robustness and predictive performance of the proposed models were verified using both internal validation (leave-one-out cross-validation and Y-scrambling) and external validation (randomly selected). The results demonstrate that Genetic Algorithms have a substantial advantage over the Stepwise procedure, generating more reliable models while using far fewer descriptors for all the data sets.
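
The internal validation steps named in the abstract (leave-one-out cross-validation and Y-scrambling) can be illustrated with a minimal sketch. This is not the authors' implementation and does not use the paper's data. It assumes a single synthetic descriptor (a stand-in for a hydrophobicity parameter such as AlogP), two classes, and equal class priors, in which case LDA reduces to a threshold at the midpoint of the two class means. The GA descriptor selection is not shown, and all function and variable names are hypothetical.

```python
import random
from statistics import mean


def fit_threshold(xs, ys):
    """One-descriptor, two-class LDA under equal priors and a shared variance:
    the decision boundary is the midpoint of the two class means."""
    m0 = mean(x for x, y in zip(xs, ys) if y == 0)
    m1 = mean(x for x, y in zip(xs, ys) if y == 1)
    high_class = 1 if m1 >= m0 else 0  # class assigned above the threshold
    return (m0 + m1) / 2.0, high_class


def predict(model, x):
    threshold, high_class = model
    return high_class if x > threshold else 1 - high_class


def loo_accuracy(xs, ys):
    """Leave-one-out: refit without sample i, then predict sample i."""
    hits = 0
    for i in range(len(xs)):
        model = fit_threshold(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        hits += int(predict(model, xs[i]) == ys[i])
    return hits / len(xs)


def y_scrambling(xs, ys, n_rounds, rng):
    """Shuffle the class labels and re-run LOO; a sound model should lose
    its accuracy once the link between descriptor and label is destroyed."""
    scores = []
    for _ in range(n_rounds):
        shuffled = ys[:]
        rng.shuffle(shuffled)
        scores.append(loo_accuracy(xs, shuffled))
    return scores


rng = random.Random(0)
# Synthetic descriptor values (assumption): class 0 centred at 1.0, class 1 at 3.0.
xs = [rng.gauss(1.0, 1.0) for _ in range(60)] + [rng.gauss(3.0, 1.0) for _ in range(60)]
ys = [0] * 60 + [1] * 60

real_acc = loo_accuracy(xs, ys)
scrambled_accs = y_scrambling(xs, ys, 20, rng)
print(f"LOO accuracy (true labels):      {real_acc:.3f}")
print(f"LOO accuracy (scrambled, mean):  {mean(scrambled_accs):.3f}")
```

If the accuracy with true labels is clearly higher than the accuracies obtained with scrambled labels, the model is unlikely to be a chance correlation. A multi-descriptor study would replace the threshold rule with a full LDA fit.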