by Roberto Todeschini on Wed Mar 14, 2007 11:55 am
Dear Lionello,
Theoretically, in multiple linear regression (MLR), orthogonal (or orthogonalized) descriptors are usually recommended, while in classification methods some correlation among descriptors can be useful.
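However, with regression methods such as PLS (Partial Least Squares) or PCR (Principal Component Regression), the problem of descriptor correlation is avoided, because the regression is performed on a few mutually orthogonal latent variables rather than on the original descriptors.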
Personally, I prefer variable selection methods (using, for example, genetic algorithms) combined with MLR, which give simple final models made up of a small number of original descriptors. Using validation techniques together with the selection usually avoids choosing highly correlated descriptors.
In the following papers, these aspects are discussed:
Todeschini R., Consonni V., Mauri A. and Pavan M. (2004). Detecting "bad" regression models: multicriteria fitness functions in regression analysis. Anal.Chim.Acta, 515, 199-208.
Todeschini R., Consonni V. and Maiocchi A. (1998). The K Correlation Index: Theory Development and its Applications in Chemometrics. Chemometrics & Intell.Lab.Syst., 46, 13-29.
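To make the idea of selection coupled with validation concrete, here is a minimal sketch in plain Python. It is not the genetic algorithm mentioned above: it swaps in an exhaustive search over subsets of at most two descriptors (only feasible for a handful of descriptors) and scores each MLR model by leave-one-out Q2. The data are synthetic and all function names (solve, fit_mlr, q2_loo, best_subset) are hypothetical, chosen only for this illustration.

```python
from itertools import combinations

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def fit_mlr(rows, y):
    """Least-squares MLR coefficients (intercept first) via normal equations."""
    Z = [[1.0] + list(r) for r in rows]
    p = len(Z[0])
    A = [[sum(z[i] * z[j] for z in Z) for j in range(p)] for i in range(p)]
    b = [sum(z[i] * yi for z, yi in zip(Z, y)) for i in range(p)]
    return solve(A, b)

def q2_loo(rows, y):
    """Leave-one-out Q2 = 1 - PRESS / total sum of squares."""
    mean_y = sum(y) / len(y)
    press = 0.0
    for i in range(len(y)):
        coef = fit_mlr(rows[:i] + rows[i + 1:], y[:i] + y[i + 1:])
        pred = coef[0] + sum(c * v for c, v in zip(coef[1:], rows[i]))
        press += (y[i] - pred) ** 2
    return 1.0 - press / sum((yi - mean_y) ** 2 for yi in y)

def best_subset(X, y, max_size=2):
    """Exhaustive stand-in for a GA: keep the subset with the highest Q2."""
    best = (None, float("-inf"))
    for size in range(1, max_size + 1):
        for subset in combinations(range(len(X[0])), size):
            rows = [[row[j] for j in subset] for row in X]
            score = q2_loo(rows, y)
            if score > best[1]:
                best = (subset, score)
    return best

# Synthetic descriptors: x2 is almost a copy of x1 (highly correlated pair).
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x2 = [1.1, 1.9, 3.2, 3.8, 5.1, 6.2, 6.8, 8.1]
x3 = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
noise = [0.01, -0.01, 0.004, -0.006, 0.008, -0.004, 0.002, -0.004]
y = [2 * a + 3 * c + e for a, c, e in zip(x1, x3, noise)]
X = [list(row) for row in zip(x1, x2, x3)]

subset, q2 = best_subset(X, y)
print("selected descriptor indices:", subset, "Q2 =", round(q2, 4))
```

In this toy case the validated search keeps only one member of the correlated pair (x1, x2) together with x3; including both would add nothing to the predictive ability. With real descriptor pools the exhaustive loop would have to be replaced by a stochastic search such as a GA.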
In any case, descriptor orthogonalization can be used to produce orthogonal descriptors, but the meaning of the individual descriptors is lost.
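As an illustration of what orthogonalization does, here is a minimal sketch in plain Python, assuming a simple sequential (Gram-Schmidt type) scheme on mean-centered descriptors: each descriptor is replaced by the residual left after regressing it on the descriptors that precede it. The helper names (center, dot, orthogonalize) and the numbers are made up for the example; the papers listed below describe the actual procedures and their variants.

```python
def center(v):
    """Subtract the mean so that orthogonality means zero correlation."""
    m = sum(v) / len(v)
    return [x - m for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def orthogonalize(descriptors):
    """Sequentially orthogonalize descriptor columns in the given order."""
    ortho = []
    for d in descriptors:
        r = center(d)
        for q in ortho:
            coef = dot(r, q) / dot(q, q)   # regression slope of r on q
            r = [ri - coef * qi for ri, qi in zip(r, q)]
        ortho.append(r)
    return ortho

# Two correlated toy descriptors (hypothetical values).
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 4.0, 5.0, 4.0, 5.0]

o1, o2 = orthogonalize([x1, x2])
print("o2 =", [round(v, 3) for v in o2])
print("o1 . o2 =", round(dot(o1, o2), 12))
```

Here o2 is no longer x2 but x2 minus 0.6 times the centered x1, so it cannot be read as the original descriptor any more, and the result changes if the descriptors are taken in a different order.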
Some references about descriptor orthogonalization:
Amic D., Davidovic-Amic D. and Trinajstic N. (1995). Calculation of Retention Times of Anthocyanins with Orthogonalized Topological Indices. J.Chem.Inf.Comput.Sci., 35, 136-139.
Amic D., Davidovic-Amic D., Beslo D., Lucic B. and Trinajstic N. (1997). The Use of the Ordered Orthogonalized Multivariate Linear Regression in a Structure-Activity Study of Coumarin and Flavonoid Derivatives as Inhibitors of Aldose Reductase. J.Chem.Inf.Comput.Sci., 37, 581-586.
Araujo O. and Morales D. A. (1996). An Alternative Approach to Orthogonal Graph Theoretical Invariants. Chem.Phys.Lett., 257, 393-396.
Araujo O. and Morales D. A. (1996). A Theorem About the Algebraic Structure Underlying Orthogonal Graph Invariants. J.Chem.Inf.Comput.Sci., 36, 1051-1053.
Araujo O. and Morales D. A. (1998). Properties of New Orthogonal Graph Theoretical Invariants in Structure-Property Correlations. J.Chem.Inf.Comput.Sci., 38, 1031-1037.
Du Yiping, Liang Yi-Zeng, Li Boyan and Xu Chengjian. (2002). Orthogonalization of Block Variables by Subspace-Projection for Quantitative Structure Property Relationship (QSPR) Research. J.Chem.Inf.Comput.Sci., 42, 993-1003.
González Díaz H., Marrero Y., Hernandez I., Bastida I., Tenorio E., Nasco O., Uriarte E., Castañedo N., Cabrera M. A., Aguila E., Marrero O., Morales A. and Pérez M. (2003). 3D-MEDNEs: An Alternative "In Silico" Technique for Chemical Research in Toxicology. 1. Prediction of Chemically Induced Agranulocytosis. Chem.Res.Toxicol., 16, 1318-1327.
Ivanciuc O., Taraviras S. L. and Cabrol-Bass D. (2000). Quasi-Orthogonal Basis Sets of Molecular Graph Descriptors as a Chemical Diversity Measure. J.Chem.Inf.Comput.Sci., 40, 126-134.
Klein D. J., Randic M., Babic D., Lucic B., Nikolic S. and Trinajstic N. (1997). Hierarchical Orthogonalization of Descriptors. Int.J.Quant.Chem., 63, 215-222.
Lucic B., Nikolic S., Trinajstic N. and Juretic D. (1995). The Structure-Property Models can be Improved Using the Orthogonalized Descriptors. J.Chem.Inf.Comput.Sci., 35, 532-538.
Lucic B. and Trinajstic N. (1997). New Developments in QSPR/QSAR Modeling Based on Topological Indices. SAR & QSAR Environ.Res., 7, 45-62.
Mracec M., Muresan S., Simon Z. and Naray-Szabo G. (1997). QSARs with Orthogonal Descriptors on Psychotomimetic Phenylalkylamines. Quant.Struct.-Act.Relat., 16, 459-464.
Pogliani L. (1994). Structure Property Relationships of Amino Acids and Some Dipeptides. Amino Acids, 6, 141-153.
Pogliani L. (1995). Molecular Modeling by Linear Combinations of Connectivity Indexes. J.Phys.Chem., 99, 925-937.
Pogliani L. (1996). Modeling Purines and Pyrimidines with the Linear Combination of Connectivity Indices-Molecular Connectivity "LCCI-MC" Method. J.Chem.Inf.Comput.Sci., 36, 1082-1091.
Pogliani L. (1997). Modeling Properties of Biochemical Compounds with Connectivity Terms. Amino Acids, 13, 237-255.
Randic M. (1991). Search for Optimal Molecular Descriptors. Croat.Chem.Acta, 64, 43-54.
Randic M. (1991). Orthogonal Molecular Descriptors. New J.Chem., 15, 517-525.
Randic M. (1991). Correlation of Enthalpy of Octanes with Orthogonal Connectivity Indices. J.Mol.Struct.(Theochem), 233, 45-59.
Randic M. (1991). Resolution of Ambiguities in Structure-Property Studies by Use of Orthogonal Descriptors. J.Chem.Inf.Comput.Sci., 31, 311-320.
Randic M. (1992). Similarity Based on Extended Basis Descriptors. J.Chem.Inf.Comput.Sci., 32, 686-692.
Randic M. and Trinajstic N. (1993). Viewpoint 4 - Comparative Structure-Property Studies: the Connectivity Basis. J.Mol.Struct.(Theochem), 284, 209-221.
Randic M. (1993). Fitting of Nonlinear Regressions by Orthogonalized Power Series. J.Comput.Chem., 14, 363-370.
Randic M. (1996). Orthosimilarity. J.Chem.Inf.Comput.Sci., 36, 1092-1097.
Šoškic M., Plavšic D. and Trinajstic N. (1996). Link Between Orthogonal and Standard Multiple Linear Regression Models. J.Chem.Inf.Comput.Sci., 36, 829-832.
Šoškic M., Plavšic D. and Trinajstic N. (1996). 2-Difluoromethylthio-4,6-bis-(monoalkylamino)-1,3,5-triazines as Inhibitors of Hill Reaction: A QSAR Study with Orthogonalized Descriptors. J.Chem.Inf.Comput.Sci., 36, 146-150.