pHSol - 1.1

pH-dependent aqueous solubility of druglike molecules

The pHSol 1.1 server predicts pH-dependent aqueous solubility of druglike molecules.



At most 500 compounds per submission.

The compounds are kept confidential and will be deleted after processing.


For publication of results, please cite:

Prediction of pH-dependent aqueous solubility of druglike molecules.
Niclas Tue Hansen, Irene Kouskoumvekaki, Flemming Steen Jørgensen, Søren Brunak and Svava Ósk Jónsdóttir.
J Chem Inf Model: 46(6): 2601-9, 2006.


1. Specify the input molecules

This web-server calculates intrinsic solubility and pH-dependent solubility profiles of drugs and drug-like molecules from molecular structure.

There are three possible ways of entering molecular structure information.
1) Insert the SMILES strings of the molecules in the window to the left.
2) Read the SMILES strings from a file (*.smi)
3) Read the structures from an SDF file (*.sdf)

The SMILES entry window and the SMI file use the same format: a SMILES string followed by a space and an identifier. Enter multiple structures on separate lines.

CC(=O)OC1=CC=CC=C1C(=O)O Aspirin
CCOC(=O)C1=CC=C(C=C1)N Benzocaine
C1C2C(C(C1Cl)Cl)C3(C(=C(C2(C3(Cl)Cl)Cl)Cl)Cl)Cl Chlordane
CCOC(=O)CC(C(=O)OCC)SP(=S)(OC)OC Malathion
C1=CC=C(C=C1)C2(C(=O)NC(=O)N2)C3=CC=CC=C3 Phenytoin
CC12CCC3C(C1CCC2O)CCC4=CC(=O)CCC34C Testosterone

Identifiers can be omitted; in that case, enter one SMILES string per line.
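To check a file before submitting it, the input format described above can be parsed with a few lines of Python. This is only a minimal sketch: the function name `parse_smi` and the constant `MAX_COMPOUNDS` are hypothetical, and the server's own parser may treat whitespace or malformed lines differently.

```python
from typing import List, Tuple

MAX_COMPOUNDS = 500  # submission limit stated on the server page


def parse_smi(text: str) -> List[Tuple[str, str]]:
    """Parse SMI-formatted text into (smiles, identifier) pairs.

    The identifier is everything after the first run of whitespace; it is
    an empty string when a line holds a SMILES string only.
    """
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        parts = line.split(None, 1)  # split on the first whitespace run only
        smiles = parts[0]
        identifier = parts[1].strip() if len(parts) > 1 else ""
        records.append((smiles, identifier))
    if len(records) > MAX_COMPOUNDS:
        raise ValueError(
            f"{len(records)} compounds exceed the limit of {MAX_COMPOUNDS}"
        )
    return records


example = """CC(=O)OC1=CC=CC=C1C(=O)O Aspirin
CCOC(=O)C1=CC=C(C=C1)N Benzocaine
CC12CCC3C(C1CCC2O)CCC4=CC(=O)CCC34C
"""
records = parse_smi(example)
```

The sketch does not validate the SMILES strings themselves; it only separates each string from its optional identifier and enforces the 500-compound limit.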

2. Customize your run

Click on the button labelled "Intrinsic solubilities only" if you only want to calculate the intrinsic solubilities.
Click on the button labelled "Generate graphics" if you want to calculate the pH-dependent solubility profiles as well.

3. Submit the job

Click on the "Submit" button. The status of your job (either 'queued' or 'running') will be displayed and constantly updated until it terminates and the server output appears in the browser window.

At any time during the wait you may enter your e-mail address and simply leave the window. Your job will continue; you will be notified by e-mail when it has terminated. The e-mail message will contain the URL under which the results are stored; they will remain on the server for 24 hours for you to collect them.

Prediction of intrinsic aqueous solubility (logS0): The prediction server calculates intrinsic solubility values (the solubility of the non-ionized compound) from chemical structure information. The prediction uses a neural network model trained on a drug-like data set of 4548 compounds from the PHYSPROP database and tested on two different external validation sets. The model is built on nine 2D MOE (Molecular Operating Environment) descriptors and was trained with three-fold cross-validation as a fully connected feed-forward neural network with one hidden layer and nine hidden neurons. (The chemical structures can be fed to the server in SMILES or SDF format, or sketched in the JME applet. The structures are converted to 2D SDF format with the Molconvert program from ChemAxon, and the descriptors are generated with MOE and fed to the neural network predictor.)
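The shape of the network described above (nine descriptor inputs, one hidden layer of nine neurons, one output) can be illustrated with a short forward pass. This is a sketch of the architecture only: the weights are random placeholders, not the trained pHSol parameters, and the tanh hidden activation, the linear output unit and all function names are assumptions that the text does not specify.

```python
import math
import random
from typing import List, Sequence, Tuple

N_DESCRIPTORS = 9  # nine 2D MOE descriptors, per the text
N_HIDDEN = 9       # nine hidden neurons, per the text

Weights = Tuple[List[List[float]], List[float], List[float], float]


def make_placeholder_weights(seed: int = 0) -> Weights:
    """Return random placeholder weights (NOT the trained model)."""
    rng = random.Random(seed)
    w_hidden = [[rng.uniform(-1, 1) for _ in range(N_DESCRIPTORS)]
                for _ in range(N_HIDDEN)]
    b_hidden = [rng.uniform(-1, 1) for _ in range(N_HIDDEN)]
    w_out = [rng.uniform(-1, 1) for _ in range(N_HIDDEN)]
    b_out = rng.uniform(-1, 1)
    return w_hidden, b_hidden, w_out, b_out


def forward(descriptors: Sequence[float], weights: Weights) -> float:
    """One forward pass through a fully connected 9-9-1 network."""
    if len(descriptors) != N_DESCRIPTORS:
        raise ValueError(f"expected {N_DESCRIPTORS} descriptors")
    w_hidden, b_hidden, w_out, b_out = weights
    # Hidden layer: weighted sum of all inputs, then tanh (assumed activation).
    hidden = [
        math.tanh(sum(w * x for w, x in zip(row, descriptors)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    # Output layer: linear combination of hidden activations (assumed linear).
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


weights = make_placeholder_weights(seed=1)
output = forward([0.1] * N_DESCRIPTORS, weights)
```

With placeholder weights the output has no chemical meaning; the sketch only shows how nine descriptor values flow through the single hidden layer to one predicted value.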


Confidence estimate for the intrinsic solubility prediction: A simple confidence index is assigned to each predicted solubility value, indicating how well the compound matches the chemical space of the training set. Compounds for which the predicted target value or the most important descriptor falls within two standard deviations of the corresponding training-set values are considered to have high prediction accuracy. For compounds that fall within three standard deviations for either property, the prediction accuracy is rated moderate, and compounds that fall outside this range are rated as having low prediction accuracy.
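The confidence rule above can be sketched as a small classification function. The training-set means and standard deviations are not given on this page, so they are passed in as hypothetical parameters. The text's "or" / "either" is read literally here (the smaller of the two distances decides); replacing `min` with `max` would give a stricter reading. Neither choice is confirmed as the server's actual behaviour.

```python
def sd_distance(value: float, mean: float, std: float) -> float:
    """Distance of a value from the training-set mean, in standard deviations."""
    if std <= 0:
        raise ValueError("standard deviation must be positive")
    return abs(value - mean) / std


def confidence_index(
    pred_logs0: float,
    descriptor: float,
    logs0_mean: float,
    logs0_std: float,
    desc_mean: float,
    desc_std: float,
) -> str:
    """Return 'high', 'moderate' or 'low' from the 2-SD / 3-SD thresholds."""
    # Literal reading of the text: it is enough for either property to be
    # close to the training set, so the smaller distance is used (assumption).
    z = min(
        sd_distance(pred_logs0, logs0_mean, logs0_std),
        sd_distance(descriptor, desc_mean, desc_std),
    )
    if z <= 2.0:
        return "high"
    if z <= 3.0:
        return "moderate"
    return "low"


# Hypothetical training-set statistics, for illustration only.
label = confidence_index(-2.0, 1.0, logs0_mean=-2.5, logs0_std=1.5,
                         desc_mean=1.0, desc_std=0.5)
```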


Prediction of pH-dependent aqueous solubility: The predicted pH-solubility profiles (logS) are calculated using the Henderson-Hasselbalch (HH) equation, using the logS0 values predicted with the server described above and acid-base dissociation coefficients (pKa values) computed with the Marvin program from ChemAxon.


In case of a monoprotic acid the HH equation has the form

$$\log S = \log S_0 + \log\left(1 + 10^{\mathrm{pH} - \mathrm{p}K_\mathrm{a}}\right)$$


in the case of a monoprotic base it becomes

$$\log S = \log S_0 + \log\left(1 + 10^{\mathrm{p}K_\mathrm{a} - \mathrm{pH}}\right)$$


and for an ampholyte, the above two equations are combined to give

$$\log S = \log S_0 + \log\left(1 + 10^{\mathrm{pH} - \mathrm{p}K_\mathrm{a}(\text{acid})} + 10^{\mathrm{p}K_\mathrm{a}(\text{base}) - \mathrm{pH}}\right)$$


The salt solubility limit of the compounds is not implemented in the present model but will be included in the next version; for this purpose, a new data set has been measured by colleagues at Warsaw University of Technology in Poland.
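The three Henderson-Hasselbalch forms above can be combined into one function, since the monoprotic acid and base cases are the ampholyte equation with one term left out. The sketch below is illustrative only: the function and parameter names are hypothetical, and the example values (logS0 = -1.389, pKa = 3.41) are the Aspirin values from the sample server output shown on this page.

```python
import math
from typing import Optional


def log_s(
    ph: float,
    log_s0: float,
    pka_acid: Optional[float] = None,
    pka_base: Optional[float] = None,
) -> float:
    """pH-dependent solubility (log S) from the Henderson-Hasselbalch equation.

    Pass pka_acid for a monoprotic acid, pka_base for a monoprotic base,
    and both for an ampholyte. With neither, log S equals log S0.
    """
    ionized = 0.0
    if pka_acid is not None:
        ionized += 10.0 ** (ph - pka_acid)   # acid term: 10^(pH - pKa)
    if pka_base is not None:
        ionized += 10.0 ** (pka_base - ph)   # base term: 10^(pKa - pH)
    return log_s0 + math.log10(1.0 + ionized)


# Profile for Aspirin at integer pH values 1 to 7.
profile = [(ph, log_s(ph, -1.389, pka_acid=3.41)) for ph in range(1, 8)]
```

At pH = pKa the ionized and non-ionized forms are present in equal amounts, so log S exceeds log S0 by log 2 (about 0.30). Because no salt solubility limit is applied, the computed log S of an acid keeps rising with pH, which is the limitation noted above.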


# Compound                   p_logS0    p_pKa          p_pKb          Rel.
# ========================================================================
Aspirin                      -1.389     3.41                          high
# ========================================================================




Prediction of pH-dependent aqueous solubility of druglike molecules.
Niclas Tue Hansen, Irene Kouskoumvekaki, Flemming Steen Jørgensen¹, Søren Brunak and Svava Ósk Jónsdóttir.
J Chem Inf Model: 46(6): 2601-9, 2006.

Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, DK-2800 Lyngby, Denmark
¹Danish University of Pharmaceutical Sciences, Universitetsparken 2, DK-2100 Copenhagen, Denmark

PMID: 17125200


In the present work, the Henderson-Hasselbalch (HH) equation has been employed for the development of a tool for the prediction of pH-dependent aqueous solubility of drugs and drug candidates. A new prediction method for the intrinsic solubility was developed, based on artificial neural networks that have been trained on a druglike PHYSPROP subset of 4548 compounds. For the prediction of acid/base dissociation coefficients, the commercial tool Marvin has been used, following validation on a data set of 467 molecules from the PHYSPROP database. The best performing network for intrinsic solubility predictions has a cross-validated root mean square error (RMSE) of 0.70 log S-units, while the Marvin pKa plug-in has an RMSE of 0.71 pH-units. A data set of 27 drugs with experimentally determined pH-solubility curves was assembled from the literature for the validation of the combined pH-dependent model, giving a mean RMSE of 0.79 log S-units. Finally, the combined model has been applied on profiling the solubility space at low pH of five large vendor libraries.

