Advancing Biodegradability Assessment Through AI


Aropha has built the machine learning models below, which mainly cover the biodegradation of organic chemicals in aquatic environments. New models are being added.

Aerobic biodegradation in water

Aerobic biodegradation -- regression

Predicts continuous biodegradation percentages at different incubation times, based on more than 12,000 data points covering both ready and inherent biodegradation in water. The data span multiple standard guidelines, such as OECD 301 and OECD 302.

Aerobic biodegradation in water

Aerobic biodegradation -- classification

Predicts whether an organic chemical is readily biodegradable (reaching the 60% or 70% degradation pass level within 28 days), based on more than 6,000 data points on ready biodegradation in water. The data cover multiple standard guidelines and test principles.
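The 60% and 70% pass levels can be illustrated with a short labeling sketch. It assumes the convention in OECD 301, where the pass level is 70% for tests that measure DOC removal and 60% for respirometric tests (oxygen consumption or CO2 evolution). How the classes in Aropha's dataset were actually assigned is not stated here, so the mapping, the function name, and the example values are all illustrative. The sketch also ignores the 10-day window criterion.

```python
# Pass levels following OECD 301 (assumed mapping for this sketch):
# 70% for DOC-removal tests, 60% for respirometric tests.
PASS_LEVEL_BY_MEASUREMENT = {
    "DOC removal": 70.0,
    "O2 consumption": 60.0,
    "CO2 evolution": 60.0,
}


def ready_biodegradability_class(degradation_pct, measurement):
    """Return 1 if the 28-day result reaches the pass level, else 0."""
    threshold = PASS_LEVEL_BY_MEASUREMENT[measurement]
    return 1 if degradation_pct >= threshold else 0


# Made-up 28-day results, used only to exercise the function.
examples = [
    (65.0, "O2 consumption"),
    (65.0, "DOC removal"),
    (82.0, "DOC removal"),
]
labels = [ready_biodegradability_class(pct, kind) for pct, kind in examples]
print(labels)
```

The same 65% result is labeled differently depending on what the test measures, which is why the test principle matters when building class labels.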


Datasets used to develop the above models

Aerobic biodegradation regression

Contains over 12,000 data points. The inputs are SMILES strings, the guideline (e.g., OECD 301F), and the principle (e.g., closed respirometer); the output is the biodegradation percentage.
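Because the guideline and the principle are categorical inputs, they need a numeric encoding before a model can use them. The sketch below shows one common option, one-hot encoding, in plain Python. The records, the field names, and the encoding choice are assumptions for illustration and do not describe Aropha's actual preprocessing. The SMILES string would be converted to numeric features separately and joined with these columns.

```python
# Illustrative records; field names are hypothetical.
records = [
    {"smiles": "CCO", "guideline": "OECD 301F", "principle": "closed respirometer"},
    {"smiles": "c1ccccc1", "guideline": "OECD 301B", "principle": "CO2 evolution"},
]


def one_hot(value, vocabulary):
    """Return a 0/1 list with a single 1 at the position of `value`."""
    return [1 if value == item else 0 for item in vocabulary]


# Build fixed, sorted vocabularies so that column order is reproducible.
guidelines = sorted({r["guideline"] for r in records})
principles = sorted({r["principle"] for r in records})


def encode_conditions(record):
    """Concatenate the one-hot vectors for guideline and principle."""
    return one_hot(record["guideline"], guidelines) + one_hot(
        record["principle"], principles
    )


encoded = [encode_conditions(r) for r in records]
for record, row in zip(records, encoded):
    print(record["smiles"], row)
```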

Aerobic biodegradation classification

Contains over 6,000 chemicals. The inputs are SMILES strings (converted to fingerprints) and the outputs are the classes (0 or 1). Only ready biodegradation data with a test duration of 28 days are considered.
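A rough sketch of that preparation step is shown below: keep only the 28-day records, turn each SMILES string into a fixed-length bit vector, and collect the 0/1 classes. To stay free of dependencies, the fingerprint here is a toy stand-in that hashes short character fragments of the SMILES text. It is not a chemically meaningful fingerprint; a real workflow would use a cheminformatics toolkit instead. All names and rows are hypothetical placeholders, not experimental data.

```python
import zlib


def toy_fingerprint(smiles, n_bits=64, max_len=3):
    """Toy stand-in: hash SMILES character fragments into a bit vector."""
    bits = [0] * n_bits
    for size in range(1, max_len + 1):
        for start in range(len(smiles) - size + 1):
            fragment = smiles[start:start + size]
            # crc32 is deterministic across runs, unlike the built-in hash().
            bits[zlib.crc32(fragment.encode("utf-8")) % n_bits] = 1
    return bits


# Placeholder rows; the labels are not measured values.
rows = [
    {"smiles": "CCO", "days": 28, "label": 1},
    {"smiles": "c1ccccc1", "days": 14, "label": 0},
    {"smiles": "CC(=O)O", "days": 28, "label": 1},
]

# Keep only records with a 28-day test duration.
rows_28d = [row for row in rows if row["days"] == 28]

X = [toy_fingerprint(row["smiles"]) for row in rows_28d]
y = [row["label"] for row in rows_28d]
print(len(X), len(X[0]), y)
```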





Sample Python Code

Example Python code (in Jupyter Notebook) for making predictions with the model files.

Aerobic biodegradation regression

A downloadable Jupyter Notebook that guides you step by step through making your own predictions with the model file, including data preparation, prediction, accuracy evaluation, and results export.
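As a rough idea of what the accuracy evaluation and results export steps can look like for a regression model, the sketch below computes RMSE and R² and writes the results as CSV. It is not taken from the notebook: the predictions are placeholder numbers rather than output from the model file, and the column names are hypothetical.

```python
import csv
import io
import math


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(y_true, y_pred):
    """Coefficient of determination (1 - SS_res / SS_tot)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Placeholder values, not measurements or real model predictions.
smiles = ["CCO", "CC(=O)O", "c1ccccc1"]
observed = [80.0, 90.0, 10.0]
predicted = [75.0, 85.0, 20.0]

error = rmse(observed, predicted)
r2 = r_squared(observed, predicted)
print(f"RMSE: {error:.2f}  R2: {r2:.3f}")

# Export to an in-memory buffer; swap in open("results.csv", "w", newline="")
# to write a file instead.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["smiles", "observed_pct", "predicted_pct"])
for row in zip(smiles, observed, predicted):
    writer.writerow(row)
csv_text = buffer.getvalue()
```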

Aerobic biodegradation classification

A downloadable Jupyter Notebook that guides you step by step through making your own predictions with the model file, including data preparation, prediction, accuracy evaluation, and results export.
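For a classifier, the accuracy evaluation step usually starts from the confusion counts. The sketch below is a minimal, dependency-free version and is not taken from the notebook. The labels are placeholders, with 1 standing for readily biodegradable and 0 for not.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    return {
        "tp": sum(1 for t, p in pairs if t == 1 and p == 1),
        "tn": sum(1 for t, p in pairs if t == 0 and p == 0),
        "fp": sum(1 for t, p in pairs if t == 0 and p == 1),
        "fn": sum(1 for t, p in pairs if t == 1 and p == 0),
    }


# Placeholder labels, not real test results or model predictions.
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]

counts = confusion_counts(y_true, y_pred)
accuracy = (counts["tp"] + counts["tn"]) / len(y_true)
sensitivity = counts["tp"] / (counts["tp"] + counts["fn"])
specificity = counts["tn"] / (counts["tn"] + counts["fp"])
print(counts, accuracy, sensitivity, specificity)
```

Reporting sensitivity and specificity alongside accuracy helps when the two classes are unevenly represented in the data.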


Frameworks and libraries used to develop these models


Python

The most widely used programming language for machine learning.

Jupyter Notebook

One of the most widely used web applications for machine learning. It lets users create and share documents that contain live code, equations, visualizations, and narrative text.


scikit-learn

One of the most useful tools, providing dozens of ML models for classification, regression, clustering, and more. It is a simple and efficient tool for predictive data analysis.


pandas

One of the most popular tools for working with Excel files, CSV files, and dataframes. It is a fast, powerful, flexible, and easy-to-use open source data analysis and manipulation tool.


RDKit

One of the most popular tools for working with organic chemistry. It allows users to draw chemicals, calculate molecular fingerprints, perform similarity calculations, and more.


Matplotlib

One of the most widely used libraries for creating static, animated, and interactive visualizations in Python.


TensorFlow

An end-to-end open source platform for machine learning, widely used for developing deep neural network models.


PyTorch

An open source machine learning framework that accelerates the path from research prototyping to production deployment.


Pay per test or subscription

Pay Per Test

Each voucher covers one chemical, which can be run against unlimited models and conditions.

  • Ideal for occasional prioritization
  • High model prediction accuracy
  • High-quality datasets and wide applicability domain
  • Formal prediction report available
  • No software download and installation
  • Make predictions anytime and anywhere
  • High data security
  • Pay per test/chemical


Subscription

With a monthly subscription, one license can be used to predict an unlimited number of chemicals.
$269/month, billed annually

  • All features in Pay Per Test
  • Ideal for large-volume prioritization
  • One license for unlimited tests/chemicals
  • Dedicated support from our scientists
  • Receive newest updates on models and datasets
  • Eligible for new models
  • Cancel anytime
  • Monthly subscription (billed annually)