Prediction of pH-dependent aqueous solubility of druglike molecules.
Niclas Tue Hansen, Irene Kouskoumvekaki, Flemming Steen Jørgensen¹, Søren Brunak and Svava Ósk Jónsdóttir.
J Chem Inf Model: 46(6): 2601-9, 2006.

Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, DK-2800 Lyngby, Denmark
¹Danish University of Pharmaceutical Sciences, Universitetsparken 2, DK-2100 Copenhagen, Denmark

PMID: 17125200


In this work, the Henderson-Hasselbalch (HH) equation is used to build a tool for predicting the pH-dependent aqueous solubility of drugs and drug candidates. A new prediction method for intrinsic solubility was developed, based on artificial neural networks trained on a druglike subset of 4548 compounds from the PHYSPROP database. Acid/base dissociation constants (pKa) are predicted with the commercial tool Marvin, after validation on a data set of 467 molecules from the PHYSPROP database. The best-performing network for intrinsic solubility has a cross-validated root mean square error (RMSE) of 0.70 log S units, while the Marvin pKa plug-in has an RMSE of 0.71 pH units. A data set of 27 drugs with experimentally determined pH-solubility curves was assembled from the literature to validate the combined pH-dependent model, giving a mean RMSE of 0.79 log S units. Finally, the combined model was applied to profile the low-pH solubility space of five large vendor libraries.
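
To illustrate how the HH equation ties intrinsic solubility and pKa to a pH-solubility curve, here is a minimal sketch for the textbook case of a single ionizable group. It is not the authors' implementation: the function name `log_solubility`, its parameters, and the example values are hypothetical, and the abstract does not say how multiprotic compounds are handled. For a monoprotic acid the standard form is log S = log S0 + log10(1 + 10^(pH - pKa)); for a monoprotic base the exponent is pKa - pH.

```python
import math


def log_solubility(log_s0, pka, ph, kind):
    """Return log S at a given pH for a compound with one ionizable group.

    log_s0 : intrinsic solubility (log S of the neutral form)
    pka    : acid dissociation constant of the ionizable group
    ph     : pH of the aqueous solution
    kind   : "acid" or "base" (hypothetical labels used in this sketch)
    """
    if kind == "acid":
        # Acids ionize, and so dissolve better, as pH rises above the pKa.
        exponent = ph - pka
    elif kind == "base":
        # Bases ionize, and so dissolve better, as pH falls below the pKa.
        exponent = pka - ph
    else:
        raise ValueError("kind must be 'acid' or 'base'")
    return log_s0 + math.log10(1.0 + 10.0 ** exponent)


if __name__ == "__main__":
    # Illustrative values only, not taken from the paper.
    for ph in (1.0, 3.0, 5.0, 7.0):
        print(ph, round(log_solubility(-4.0, 4.5, ph, "acid"), 2))
```

In a combined model of the kind described above, `log_s0` would come from the intrinsic solubility predictor and `pka` from the pKa predictor. Note that this simple form grows without bound as the ionized fraction increases, since it ignores limits such as salt solubility.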