A Comparative Analysis of Machine Learning Algorithms for Multi-Task Prediction of the Parameters of the Pectin Hydrolysis–Extraction Process
Title: Evaluating Machine Learning Models for Multi-Task Parameter Prediction in Pectin Hydrolysis and Extraction
Abstract
This study applies machine learning to the problem of managing pectin hydrolysis and extraction, an intricate process governed by multiple parameters. It draws on a distinctive dataset of 1,000 laboratory trials performed under strictly controlled conditions. The experiments used seven types of plant raw material and varied four key process variables: temperature (85 to 130 °C), pressure (0.9 to 2.2 atm), holding time (3 to 10 minutes), and pH (1.5 to 2.0). Four output metrics were recorded for each trial: degree of esterification, molecular weight, galacturonic acid content, and overall pectin yield.
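As a minimal sketch of the experimental design described above, the inputs and outputs can be written down as a small schema. The numeric ranges and the list of outputs come from the abstract; the field names, the `Trial` class, and the `out_of_range` helper are illustrative assumptions, and the seven raw-material types are left unnamed because the abstract does not list them.

```python
from __future__ import annotations

from dataclasses import dataclass

# Studied ranges as stated in the abstract; the key names are assumptions.
PROCESS_RANGES = {
    "temperature_c": (85.0, 130.0),
    "pressure_atm": (0.9, 2.2),
    "holding_time_min": (3.0, 10.0),
    "ph": (1.5, 2.0),
}

# The four monitored outputs (identifier names are assumptions).
TARGETS = (
    "degree_of_esterification",
    "molecular_weight",
    "galacturonic_acid_content",
    "pectin_yield",
)


@dataclass(frozen=True)
class Trial:
    """One laboratory trial: a raw-material label plus four process variables."""

    raw_material: str  # one of seven types; not named in the abstract
    temperature_c: float
    pressure_atm: float
    holding_time_min: float
    ph: float

    def out_of_range(self) -> list[str]:
        """Return the names of process variables outside the studied ranges."""
        return [
            name
            for name, (low, high) in PROCESS_RANGES.items()
            if not low <= getattr(self, name) <= high
        ]
```

A check of this kind is mainly useful at prediction time, since a model trained on these ranges would be extrapolating for any setting that falls outside them.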
To address this multi-task regression problem, the researchers trained and evaluated 11 algorithms: regularized linear models, support vector regression, k-nearest neighbors, a multilayer perceptron, and ensemble methods including Random Forest, Gradient Boosting, Extra Trees, XGBoost, and CatBoost. After hyperparameter optimization, CatBoost was the top-performing model, with an average R-squared of approximately 0.946.
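The "average R-squared" reported above can be read as the per-target coefficient of determination, 1 - SS_res / SS_tot, averaged over the four outputs. The sketch below shows that calculation under the assumption of an unweighted mean across targets; the abstract does not say how the average was formed, and the function names are illustrative.

```python
from statistics import fmean


def r2_score(y_true, y_pred):
    """Coefficient of determination for a single target."""
    mean_true = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined for a constant target")
    return 1.0 - ss_res / ss_tot


def mean_r2(y_true_by_target, y_pred_by_target):
    """Return (unweighted mean R^2, per-target R^2) for a multi-task model.

    Both arguments map a target name to a sequence of values,
    aligned by sample index.
    """
    scores = {
        name: r2_score(y_true_by_target[name], y_pred_by_target[name])
        for name in y_true_by_target
    }
    return fmean(scores.values()), scores
```

Because R-squared is scale-free, averaging it across targets with very different units (for example a percentage yield and a molecular weight) is reasonable, which is one likely reason for summarizing a multi-task model this way.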
Feature-importance analysis identified the type of raw material as the most significant factor, accounting for 63.6% of total importance; temperature and holding time also contributed substantially. The resulting analytical pipeline was packaged for production use and made accessible through an interactive web interface. The results indicate that combining ensemble methods with robust statistical analysis and interpretable AI can significantly reduce reliance on physical experimentation, establishing a foundation for intelligent control in pectin manufacturing.
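A figure such as "63.6% of the total importance" is a share: each feature's raw importance score divided by the sum over all features. The sketch below shows that normalization only. The function name is an assumption, and the abstract does not state which importance measure produced the raw scores.

```python
def importance_shares(raw_importances):
    """Convert non-negative raw importance scores to percentages of the total.

    `raw_importances` maps a feature name to its raw score; the returned
    shares sum to 100 (up to floating-point rounding).
    """
    if any(value < 0 for value in raw_importances.values()):
        raise ValueError("importance scores must be non-negative")
    total = sum(raw_importances.values())
    if total == 0:
        raise ValueError("at least one importance score must be positive")
    return {
        name: 100.0 * value / total
        for name, value in raw_importances.items()
    }
```

For example, `importance_shares({"x1": 3.0, "x2": 1.0})` assigns 75% to `x1` and 25% to `x2`; these inputs are placeholders, not values from the study.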
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC





