Advancing Biodegradability Assessment Through AI

Models

These are the machine learning models built at Aropha so far; they mainly cover the biodegradation of organic chemicals in aquatic environments. New models are being added.

Aerobic biodegradation in water

Aerobic biodegradation -- regression

Predicts continuous biodegradation percentages at different incubation times, based on more than 12,000 data points covering both ready and inherent biodegradation in water. The data follow multiple standard guidelines, such as OECD 301 and OECD 302.

Aerobic biodegradation in water

Aerobic biodegradation -- classification

Predicts whether an organic chemical is readily biodegradable (reaching the 60% or 70% degradation pass level within 28 days), based on more than 6,000 data points on ready biodegradation in water. The data cover multiple standard guidelines and test principles.
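As a rough illustration of the pass criterion described above, the sketch below turns a 28-day degradation percentage into a 0/1 class. The function name `is_readily_biodegradable` and the `pass_level` parameter are hypothetical and are not taken from Aropha's code. In OECD 301, the 70% level applies to methods based on DOC removal and the 60% level to respirometric methods, so the caller is assumed to pick the level that matches the test principle. The sketch ignores any 10-day window requirement.

```python
def is_readily_biodegradable(degradation_pct: float, pass_level: float = 60.0) -> int:
    """Return 1 if the 28-day degradation reaches the pass level, else 0.

    Assumption: pass_level is chosen by the caller (60 or 70, in percent)
    according to the test principle; "reaching" is treated as >=.
    """
    if pass_level not in (60.0, 70.0):
        raise ValueError("pass_level must be 60 or 70")
    return int(degradation_pct >= pass_level)


# Hypothetical 28-day results, for illustration only
print(is_readily_biodegradable(82.0, pass_level=60.0))  # above the 60% level
print(is_readily_biodegradable(65.0, pass_level=70.0))  # below the 70% level
```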

Datasets

The datasets used to develop the above models.

Aerobic biodegradation regression

Contains over 12,000 data points. The inputs are SMILES strings, the guideline (e.g., OECD 301F), and the principle (e.g., closed respirometer); the output is the biodegradation percentage.
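To make the input/output layout of such a dataset concrete, here is a minimal sketch of how one record might be represented and how its categorical fields could be one-hot encoded. The `Record` class, the `one_hot` helper, and the example rows are all hypothetical; the percentages are placeholders rather than measured values, and the actual dataset columns and encoding may differ.

```python
from dataclasses import dataclass


@dataclass
class Record:
    smiles: str             # chemical structure as a SMILES string
    guideline: str          # e.g. "OECD 301F"
    principle: str          # e.g. "closed respirometer"
    degradation_pct: float  # target: biodegradation percentage


def one_hot(value: str, categories: list[str]) -> list[int]:
    """Encode a categorical value as a 0/1 vector over the known categories."""
    return [int(value == c) for c in categories]


# Placeholder rows for illustration only; the percentages are not measured values.
records = [
    Record("CCO", "OECD 301F", "closed respirometer", 75.0),
    Record("c1ccccc1", "OECD 301D", "closed bottle", 40.0),
]

guidelines = sorted({r.guideline for r in records})
principles = sorted({r.principle for r in records})

# Categorical part of each input row; SMILES-derived features would be appended.
X_categorical = [
    one_hot(r.guideline, guidelines) + one_hot(r.principle, principles)
    for r in records
]
y = [r.degradation_pct for r in records]

print(X_categorical)
print(y)
```

In practice the SMILES string would also be converted to numeric features (for example with RDKit) and concatenated with these columns before training.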

Aerobic biodegradation classification

Contains over 6,000 chemicals, with SMILES strings (converted to fingerprints) as the inputs and classes (0 or 1) as the outputs. Only ready biodegradation data with a test time of 28 days are considered.

Sample Python Code

Example Python code (in Jupyter Notebook format) for making predictions with the model files.

Aerobic biodegradation regression

A downloadable Jupyter Notebook guiding you to perform your own predictions step by step using the model file, including data preparation, prediction, accuracy evaluation, and results export.
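The accuracy evaluation step for a regression model could look roughly like the sketch below, which computes RMSE and R² in plain Python. Which metrics the notebook actually reports is not stated here, so both the metric choice and the function names `rmse` and `r_squared` are assumptions; the observed and predicted values are placeholders.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Placeholder biodegradation percentages, for illustration only
observed = [10.0, 50.0, 90.0]
predicted = [20.0, 50.0, 80.0]

print(f"RMSE: {rmse(observed, predicted):.2f}")
print(f"R^2:  {r_squared(observed, predicted):.4f}")
```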

Aerobic biodegradation classification

A downloadable Jupyter Notebook guiding you to perform your own predictions step by step using the model file, including data preparation, prediction, accuracy evaluation, and results export.
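For the classification case, the evaluation and export steps might be sketched as follows, using only the Python standard library. The `accuracy` helper, the column names, and the example classes are hypothetical placeholders; the notebook itself may use pandas and scikit-learn and a different output layout.

```python
import csv
import io


def accuracy(y_true, y_pred):
    """Fraction of chemicals whose predicted class matches the observed class."""
    correct = sum(int(t == p) for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Placeholder inputs and classes, for illustration only
smiles = ["CCO", "c1ccccc1", "CC(=O)O"]
observed = [1, 0, 1]
predicted = [1, 1, 1]

print(f"Accuracy: {accuracy(observed, predicted):.3f}")

# Export results as CSV; an in-memory buffer stands in for a file on disk.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["smiles", "observed_class", "predicted_class"])
writer.writerows(zip(smiles, observed, predicted))
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead, the same `csv.writer` calls can be used with a file opened via `open(path, "w", newline="")`.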

Tools

Frameworks and libraries used to develop these models.

Python3

One of the most widely used programming languages for machine learning.

Jupyter Notebook

One of the most widely used web applications for machine learning. It allows users to create and share documents that contain live code, equations, visualizations, and narrative text.

Scikit-learn

One of the most useful tools providing dozens of ML models for classification, regression, clustering, and so on. It is a simple and efficient tool for predictive data analysis.

Pandas

One of the most popular tools for working with Excel files, CSV files, and dataframes. It is a fast, powerful, flexible, and easy-to-use open source data analysis and manipulation tool.

RDKit

One of the most popular tools for working with organic chemistry data. It allows users to draw chemicals, calculate molecular fingerprints, perform similarity calculations, and more.

Matplotlib

One of the most widely used libraries for creating static, animated, and interactive visualizations in Python.

TensorFlow

An end-to-end open source platform for machine learning, widely used for developing deep neural network models.

PyTorch

An open source machine learning framework that accelerates the path from research prototyping to production deployment.

Licensing

Pay per test or subscription

Pay Per Test

Each voucher covers one chemical, which can be run against unlimited models and conditions.
$69/voucher


  • Ideal for occasional prioritization
  • High model prediction accuracy
  • High-quality datasets and wide applicability domain
  • Formal prediction report available
  • No software download and installation
  • Make predictions anytime and anywhere
  • High data security
  • Pay per test/chemical

Subscription

With a monthly subscription, one license can be used to predict an unlimited number of chemicals.
$269/month, billed annually


  • All features in Pay Per Test
  • Ideal for large-volume prioritization
  • One license for unlimited tests/chemicals
  • Dedicated support from our scientists
  • Receive newest updates on models and datasets
  • Eligible for new models
  • Cancel anytime
  • Monthly subscription (billed annually)