A QSAR model for predicting blood-brain barrier permeability (BBBP) across a large, heterogeneous set of 136 compounds has been developed using approximate similarity (AS) matrices as predictors and partial least squares (PLS) as the multivariate regression technique. AS values fuse information on both isomorphic similarity and nonisomorphic dissimilarity in order to build an accurate predictive space. Besides applying AS values to heterogeneous data sets, the work defines a new graph isomorphism concept based on the extended maximum common subgraph (EMCS). It is used to build AS spaces that consider the atoms and bonds bridging the isomorphic and nonisomorphic substructures. This isomorphism detection aims to account for the position and nature of the nucleus substituents, thus allowing accurate models to be developed for large and diverse sets of compounds. After an outlier study, training and test stages were carried out, and the results obtained with several AS approaches were compared. Validation with several test sets gave high predictive ability in all cases (Q2 = 0.81; standard error in prediction, SEP = 0.29).
QSAR models based on isomorphic and nonisomorphic data fusion for predicting the blood brain barrier permeability
J. Comput. Chem. 2007, 28, 1252-1260.
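
The abstract reports its validation results as Q2 and SEP without defining them. The sketch below shows one common way such statistics are computed, assuming Q2 = 1 - PRESS/SS_tot (with SS_tot taken about the mean of the observed values) and SEP as the root-mean-square prediction error. These definitions are assumptions; the paper may use variants (for example a bias-corrected SEP or a different reference mean). The function names and the example numbers are hypothetical and are not data from the paper.

```python
import math


def q2(y_obs, y_pred):
    """Predictive Q2 = 1 - PRESS / SS_tot (assumed definition)."""
    mean_obs = sum(y_obs) / len(y_obs)
    # PRESS: sum of squared differences between observed and predicted values
    press = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))
    # SS_tot: sum of squared deviations of the observed values from their mean
    ss_tot = sum((o - mean_obs) ** 2 for o in y_obs)
    return 1.0 - press / ss_tot


def sep(y_obs, y_pred):
    """Standard error in prediction as RMS prediction error (assumed definition)."""
    press = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))
    return math.sqrt(press / len(y_obs))


# Made-up observed and predicted values, for illustration only
y_obs = [0.0, 1.0, 2.0, 3.0]
y_pred = [0.5, 1.0, 2.0, 2.5]
print(f"Q2 = {q2(y_obs, y_pred):.2f}, SEP = {sep(y_obs, y_pred):.2f}")
```

Under these definitions, Q2 near 1 means the prediction errors are small relative to the spread of the observed values, and SEP is expressed in the same units as the modelled property.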