Aerobic biodegradation is a natural process in which microorganisms, in the presence of oxygen, produce enzymes that break complex organic substances down into smaller, simpler compounds. The metabolic and enzymatic steps involved usually end in carbon dioxide and water as the final products.
The fundamental units of biodegradation are functional groups, as these are the ‘elements’ that undergo the transformation during the course of microbial catabolism.
Different chemicals may share functional groups, other structural features, and physicochemical properties. If a relationship can be established between these chemical properties and biodegradability, the biodegradability of compounds that have not yet been tested can be predicted.
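The idea of predicting from shared structure can be illustrated with a minimal sketch. It assumes each compound is described by counts of a few functional groups and that an untested compound borrows the label of its most similar reference compound. The group list, the compound names (`ref_1` and so on), the counts, and the labels are all hypothetical placeholders, not measured data, and this nearest-neighbor approach is an illustration rather than the method used at Aropha.

```python
# Hypothetical functional-group vocabulary; a real model would use a far
# richer chemical representation.
GROUPS = ["ester", "hydroxyl", "carboxyl", "halogen", "aromatic_ring"]

# Placeholder reference set: counts follow the order of GROUPS, and
# label 1/0 stands for "biodegradable"/"not biodegradable".
# These values are illustrative only, not experimental results.
REFERENCE = [
    {"name": "ref_1", "counts": [1, 1, 0, 0, 0], "label": 1},
    {"name": "ref_2", "counts": [0, 0, 0, 3, 1], "label": 0},
    {"name": "ref_3", "counts": [0, 2, 1, 0, 0], "label": 1},
]


def count_similarity(a, b):
    """Generalized Jaccard (min/max) similarity between two count vectors."""
    shared = sum(min(x, y) for x, y in zip(a, b))
    total = sum(max(x, y) for x, y in zip(a, b))
    # Two all-zero vectors are treated as identical.
    return shared / total if total else 1.0


def predict_by_nearest_neighbor(query_counts, reference=REFERENCE):
    """Return (label, similarity, name) of the most similar reference compound."""
    best = max(reference, key=lambda r: count_similarity(query_counts, r["counts"]))
    return best["label"], count_similarity(query_counts, best["counts"]), best["name"]


# A hypothetical untested compound with two halogens and one aromatic ring.
label, similarity, nearest = predict_by_nearest_neighbor([0, 0, 0, 2, 1])
print(label, similarity, nearest)
```

The similarity score also gives a rough sense of how far the query lies from anything in the reference set, which is one reason a prediction for a very dissimilar compound deserves less trust.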
Advances in data science in recent years have encouraged the development of machine learning models, trained on documented experimental data, for quick and easy prioritization of newly developed compounds.
At Aropha, in addition to determining biodegradability experimentally, we are excited to bring artificial intelligence to in silico prediction based on existing experimental studies. We currently focus mainly on the aquatic environment and have established a dataset of more than 12,000 data points from studies that follow standardized experimental guidelines such as the OECD 301 methods.
We investigated various machine learning algorithms, chemical representations, and categorical encoding methods to obtain the best model performance. This work produced two models: a classification model and a regression model.
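The difference between the two model types, and what categorical encoding means here, can be sketched as follows. The text above does not say how the class labels are defined or which categorical variables are encoded, so this sketch makes two assumptions: the regression target is the percent degradation reported by a study, and the class label is derived from the OECD 301 pass levels (60% of ThOD or ThCO2, 70% DOC removal). It also ignores the 10-day window requirement of the guideline for simplicity. The function names, the guideline subset, and the example values are hypothetical.

```python
# OECD 301 pass levels in percent, keyed by the measured quantity.
PASS_LEVEL = {
    "DOC": 70.0,    # dissolved organic carbon removal
    "ThOD": 60.0,   # theoretical oxygen demand
    "ThCO2": 60.0,  # theoretical CO2 production
}

# Assumed categorical variable: the test guideline a data point came from.
GUIDELINES = ["OECD 301B", "OECD 301D", "OECD 301F"]


def to_class_label(percent_degradation, measure):
    """Turn a regression target (percent) into a class label (1 = pass level reached)."""
    return int(percent_degradation >= PASS_LEVEL[measure])


def one_hot(value, categories):
    """One-hot encode a categorical value; unknown values map to all zeros."""
    return [1 if value == category else 0 for category in categories]


def build_row(descriptor_values, guideline):
    """Join numeric chemical descriptors with the encoded guideline."""
    return list(descriptor_values) + one_hot(guideline, GUIDELINES)


# Hypothetical data point: two numeric descriptors, a closed bottle test
# (OECD 301D, measured against ThOD), and 72.5% degradation.
features = build_row([0.4, 2.0], "OECD 301D")
regression_target = 72.5
class_target = to_class_label(regression_target, "ThOD")
print(features, regression_target, class_target)
```

A regression model would be fitted to `regression_target`, while a classification model would be fitted to `class_target`; both could share the same feature rows.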