Current predictor: Aerobic biodegradation

Last update: June 26, 2022


The regression model was trained on more than 12,000 data points. Its inputs are SMILES strings (converted to fingerprints), test duration (e.g., 28 days), test guideline (e.g., OECD 301B), and analytical principle (e.g., CO2 evolution). The model predicts a specific biodegradation percentage.
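
As an illustration of how the four inputs described above might be combined into one numeric feature vector, here is a minimal sketch. It is not the predictor's actual preprocessing: the category lists `GUIDELINES` and `PRINCIPLES`, the helper names, and the 8-bit toy fingerprint are all assumptions made for the example, and the fingerprint is supplied directly as bits rather than computed from a SMILES string.

```python
from typing import List, Sequence

# Hypothetical category vocabularies (assumption): the source does not list
# which guidelines or analytical principles the model was trained on.
GUIDELINES = ["OECD 301B", "OECD 301C", "OECD 301D", "OECD 301F"]
PRINCIPLES = ["CO2 evolution", "O2 consumption", "DOC removal"]


def one_hot(value: str, vocabulary: Sequence[str]) -> List[float]:
    """Encode a categorical value as a one-hot vector over a fixed vocabulary."""
    if value not in vocabulary:
        raise ValueError(f"unknown category: {value!r}")
    return [1.0 if value == item else 0.0 for item in vocabulary]


def build_features(
    fingerprint_bits: Sequence[int],
    time_days: float,
    guideline: str,
    principle: str,
) -> List[float]:
    """Concatenate fingerprint bits, test duration, and one-hot encoded metadata."""
    return (
        [float(bit) for bit in fingerprint_bits]
        + [float(time_days)]
        + one_hot(guideline, GUIDELINES)
        + one_hot(principle, PRINCIPLES)
    )


# Toy 8-bit fingerprint (assumption); real fingerprints are far longer.
example = build_features([0, 1, 1, 0, 0, 1, 0, 0], 28, "OECD 301B", "CO2 evolution")
print(len(example))
```

In practice the fingerprint would come from a cheminformatics toolkit, and the resulting vector would be passed to the trained regressor.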

ML algorithms:
A total of 12 ML algorithms were examined, including Ridge, Lasso, k-nearest neighbors, support vector regression, decision tree, random forest, extra trees, bagging, adaptive boosting, gradient boosting, and XGBoost.

Chemical representation:
A total of 9 types of chemical representations were examined, including various molecular fingerprints, molecular descriptors, and combinations of the two.

Other notes:
Hyperparameter optimization was performed to tune the model. The model's applicability domain was determined from chemical similarity, calculated as the Tanimoto index between fingerprints.
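
The Tanimoto-based similarity check mentioned above can be sketched as follows. This is a minimal illustration, not the predictor's implementation: fingerprints are represented as sets of on-bit indices, and both the decision rule (maximum similarity to any training compound) and the `threshold` value of 0.5 are assumptions, since the source does not state how similarity is turned into an in-domain or out-of-domain call.

```python
from typing import Iterable, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto index: shared on-bits divided by on-bits present in either fingerprint."""
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / union


def max_similarity(query: Set[int], training: Iterable[Set[int]]) -> float:
    """Highest Tanimoto similarity between the query and any training fingerprint."""
    return max((tanimoto(query, fp) for fp in training), default=0.0)


def in_domain(query: Set[int], training: Iterable[Set[int]], threshold: float = 0.5) -> bool:
    """Assumed rule: in domain if the nearest training compound meets the threshold."""
    return max_similarity(query, training) >= threshold


# Toy fingerprints given as sets of on-bit indices (assumption).
training_set = [{1, 4, 7, 9}, {2, 4, 8}, {1, 2, 3, 4}]
query_fp = {1, 4, 7, 10}

print(max_similarity(query_fp, training_set))  # 3 shared bits / 5 total bits = 0.6
print(in_domain(query_fp, training_set))       # True with the assumed 0.5 threshold
```

A prediction for a query compound with low maximum similarity to the training set would be treated with more caution under this kind of rule.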