Analyzing the Performance of CO₂ Solubility Prediction Models in Imidazolium-Based Ionic Liquids
Introduction to CO₂ Solubility Prediction
CO₂ solubility in imidazolium-based ionic liquids (ILs) has become an important topic in environmental science and engineering. Because of the potential of ILs in carbon capture technologies, accurate prediction of CO₂ solubility is essential for optimizing these systems. The research compared several machine learning models using statistical error indicators and graphical analyses in order to identify the best-performing one.
Statistical Error Evaluation
To assess the accuracy of the predictive models, the researchers compared predicted solubility ($x_{CO_{2}\,pred}$) against experimental solubility ($x_{CO_{2}\,exp}$) using five statistical error metrics:
- Mean Absolute Error (MAE): The average magnitude of the prediction errors, without regard to their direction.
- Mean Squared Error (MSE): The average of the squared differences between predicted and experimental values, which penalizes large errors more heavily.
- Root Mean Square Error (RMSE): The square root of the MSE, which expresses the typical error magnitude in the same units as the solubility.
- Standard Deviation (SD): The spread of the relative errors, indicating the precision of the predictions.
- Coefficient of Determination (R²): The proportion of variance in the dependent variable that is explained by the independent variable(s).
The mathematical representations for these metrics were established as follows:
- MSE is defined as:
$$MSE = \frac{1}{n}\sum_{i = 1}^{n} \left( x_{i,CO_{2}\,exp} - x_{i,CO_{2}\,pred} \right)^{2}$$
- MAE is defined as:
$$MAE = \frac{1}{n}\sum_{i = 1}^{n} \left| x_{i,CO_{2}\,exp} - x_{i,CO_{2}\,pred} \right|$$
- RMSE is calculated as:
$$RMSE = \sqrt{\frac{\sum_{i = 1}^{n} \left( x_{i,CO_{2}\,exp} - x_{i,CO_{2}\,pred} \right)^{2}}{n}}$$
- SD is expressed as:
$$SD = \sqrt{\frac{\sum_{i = 1}^{n} \left( \frac{x_{i,CO_{2}\,exp} - x_{i,CO_{2}\,pred}}{x_{i,CO_{2}\,exp}} \right)^{2}}{n - 1}}$$
- R² is calculated as:
$$R^{2} = 1 - \frac{\sum_{i = 1}^{n} \left( x_{i,CO_{2}\,exp} - x_{i,CO_{2}\,pred} \right)^{2}}{\sum_{i = 1}^{n} \left( x_{i,CO_{2}\,pred} - \bar{x}_{CO_{2}\,exp} \right)^{2}}$$
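The five definitions above translate directly into code. The following is a minimal sketch in plain Python, not the study's implementation; the function name `error_metrics` and the dictionary keys are illustrative choices. It follows the formulas as written here, including the relative-error form of SD and the R² denominator built from predicted values and the experimental mean.

```python
import math


def error_metrics(x_exp, x_pred):
    """Return MSE, MAE, RMSE, SD and R2 for paired experimental/predicted values.

    Assumes equal-length sequences with at least two points and nonzero
    experimental values (SD divides by each experimental value).
    """
    if len(x_exp) != len(x_pred) or len(x_exp) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(x_exp)
    residuals = [e - p for e, p in zip(x_exp, x_pred)]

    mse = sum(r * r for r in residuals) / n
    mae = sum(abs(r) for r in residuals) / n
    rmse = math.sqrt(mse)

    # SD as defined in this article: squared relative errors over n - 1.
    sd = math.sqrt(sum((r / e) ** 2 for r, e in zip(residuals, x_exp)) / (n - 1))

    # R2 as written in this article: the denominator uses predicted values
    # minus the experimental mean. The more common textbook form uses the
    # experimental values in the denominator instead.
    mean_exp = sum(x_exp) / n
    ss_res = sum(r * r for r in residuals)
    ss_den = sum((p - mean_exp) ** 2 for p in x_pred)
    r2 = 1.0 - ss_res / ss_den

    return {"MSE": mse, "MAE": mae, "RMSE": rmse, "SD": sd, "R2": r2}


# Made-up values purely for illustration, not data from the study.
print(error_metrics([0.20, 0.40, 0.60], [0.21, 0.38, 0.61]))
```

Because RMSE is the square root of MSE, the two always rank models identically; reporting both mainly helps when one is wanted in the units of the solubility itself.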
As shown in Table 3, the models analyzed included DNN, DBN, TabNet, GrowNet, RF, and SVR. Outliers were identified and removed, yielding a refined dataset of 612 training and 153 testing points.
Model Performance Insights
Among the analyzed models, GrowNet performed best, with $R^{2} = 0.9962$, RMSE = 0.0073, MSE (%) = 0.0054, and MAE (%) = 0.5324. This performance can be attributed to its gradient boosting architecture, which allows it to learn complex, nonlinear relationships within the dataset.
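As a quick consistency check on the reported figures: the text does not define the percentage form, so assume that "MSE (%)" denotes the MSE multiplied by 100. Under that assumption, the reported MSE and RMSE should satisfy RMSE = √MSE, and they do to within rounding:

$$\sqrt{\frac{0.0054}{100}} = \sqrt{5.4 \times 10^{-5}} \approx 0.0073$$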
In contrast, traditional models such as RF and SVR performed well on the training data but generalized poorly to unseen data, which the analysis attributes to their simpler architectures. Such models tend to capture only basic relationships and may miss the nonlinear interdependencies inherent in CO₂ solubility datasets.
Graphical Error Analysis
Visual representations help in interpreting model performance and validating prediction accuracy. Graphical analyses, including cross plots, error distributions, and cumulative frequency curves, give immediate insight into each model's strengths and weaknesses.
Cross Plot Analysis
Cross plots of predicted against experimental values give a visual guide to predictive accuracy. A model whose predictions agree closely with the experimental data shows dense clustering along the 45-degree (X = Y) line. In this case, the GrowNet and BNN models showed close agreement, while ETM-1 and ETM-2 showed high scatter, indicating poor predictive performance.
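A cross plot can be paired with a simple numeric summary: fit a least-squares line through the (experimental, predicted) pairs and see how far its slope and intercept are from the ideal values of 1 and 0. This is an illustrative sketch rather than something the study reports; `fit_cross_plot` is a hypothetical helper name.

```python
def fit_cross_plot(x_exp, x_pred):
    """Least-squares fit x_pred = slope * x_exp + intercept.

    A slope near 1 and an intercept near 0 mean the points follow
    the X = Y line on a cross plot.
    """
    if len(x_exp) != len(x_pred) or len(x_exp) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(x_exp)
    mean_e = sum(x_exp) / n
    mean_p = sum(x_pred) / n
    sxx = sum((e - mean_e) ** 2 for e in x_exp)
    if sxx == 0:
        raise ValueError("experimental values must not all be identical")
    sxy = sum((e - mean_e) * (p - mean_p) for e, p in zip(x_exp, x_pred))
    slope = sxy / sxx
    intercept = mean_p - slope * mean_e
    return slope, intercept


# Made-up values purely for illustration, not data from the study.
slope, intercept = fit_cross_plot([0.1, 0.3, 0.5], [0.11, 0.29, 0.52])
print(f"slope={slope:.3f}, intercept={intercept:.3f}")
```

The fitted line describes systematic bias only. A model can have a slope of 1 and still scatter widely, so this summary complements the plot and the error metrics rather than replacing them.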
Error Distribution Curves
An error distribution curve plots the residual error of each prediction for a given model. The closer the points cluster to the (Y = 0) line, the more accurate the model's predictions. GrowNet and BNN consistently showed small residuals, confirming their reliability. In contrast, ETM-1 and ETM-2 deviated significantly from the experimental values at solubility levels above 0.3.
Cumulative Frequency Curves
These curves plot cumulative frequency against residual error, showing what fraction of the data a model predicts within a given error. The figures show that ETM-1 and ETM-2 predict 90% of the data with a residual error of 10%, whereas the GrowNet model predicts 90% of the data with residual errors below 1%.
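The "90% of the data within a given error" reading of such a curve can be computed directly. The sketch below assumes the residual error is the absolute relative error in percent, which the text suggests but does not state; `error_at_cumulative_frequency` is a hypothetical name.

```python
import math


def error_at_cumulative_frequency(x_exp, x_pred, fraction=0.90):
    """Smallest absolute relative error (%) that covers `fraction` of the points.

    Assumes nonzero experimental values and 0 < fraction <= 1.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if len(x_exp) != len(x_pred) or not x_exp:
        raise ValueError("need two non-empty, equal-length sequences")
    # Sorted errors are the x-axis of the cumulative frequency curve.
    errors = sorted(abs(e - p) / abs(e) * 100.0 for e, p in zip(x_exp, x_pred))
    # Number of points that must fall at or below the returned error.
    k = math.ceil(fraction * len(errors))
    return errors[k - 1]


# Made-up values purely for illustration, not data from the study.
x_exp = [0.10, 0.20, 0.30, 0.40, 0.50]
x_pred = [0.101, 0.198, 0.303, 0.396, 0.52]
print(error_at_cumulative_frequency(x_exp, x_pred, fraction=0.8))
```

Comparing this single number across models gives the same ranking as reading the curves at the 90% level, for example about 10% for a weak model against below 1% for a strong one.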
Additional Error Analysis Techniques
Group Error Plots
Group error plots show the absolute error against ranges of input parameter values, such as temperature and pressure, and so indicate how well a model performs across different conditions. The analysis found that the performance of all models improved with increasing temperature, and that GrowNet consistently outperformed the others.
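The grouping behind such a plot amounts to binning the data by one input and averaging the absolute error within each bin. The following sketch assumes half-open bins with user-chosen edges; the bin edges and sample values are invented for illustration and `mean_abs_error_by_bin` is a hypothetical name.

```python
def mean_abs_error_by_bin(values, x_exp, x_pred, edges):
    """Mean absolute error per half-open bin [edges[i], edges[i + 1]).

    `values` holds the input used for grouping (for example temperature).
    Returns one entry per bin; bins with no points yield None.
    """
    n_bins = len(edges) - 1
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for v, e, p in zip(values, x_exp, x_pred):
        for i in range(n_bins):
            if edges[i] <= v < edges[i + 1]:
                sums[i] += abs(e - p)
                counts[i] += 1
                break  # each point belongs to at most one bin
    return [s / c if c else None for s, c in zip(sums, counts)]


# Made-up values purely for illustration, not data from the study.
temperatures = [300, 310, 330, 350]
x_exp = [0.50, 0.45, 0.40, 0.35]
x_pred = [0.47, 0.47, 0.41, 0.35]
print(mean_abs_error_by_bin(temperatures, x_exp, x_pred, edges=[298, 323, 373]))
```

Running the same function with pressure as `values` gives the companion grouping by pressure.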
Model Trend Analysis
By directly examining how variations in pressure and temperature influence predicted CO₂ solubility, researchers can check model predictions against expected physical behavior. For instance, at constant pressure, CO₂ solubility tends to decrease with rising temperature, a trend accurately captured by the GrowNet model.
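A trend check of this kind can be automated by sweeping temperature at fixed pressure and testing whether the predictions decrease monotonically. In the sketch below, `toy_model` is a hypothetical stand-in with no physical meaning; in practice a trained model's prediction function would be passed in its place.

```python
def decreases_with_temperature(predict, pressure, temperatures):
    """True if predict(T, P) strictly decreases as T increases at fixed P."""
    ordered = sorted(temperatures)
    preds = [predict(t, pressure) for t in ordered]
    return all(later < earlier for earlier, later in zip(preds, preds[1:]))


def toy_model(temperature, pressure):
    """Hypothetical stand-in for a trained model; not a physical correlation."""
    return pressure / (pressure + 0.1 * temperature)


print(decreases_with_temperature(toy_model, pressure=10.0,
                                 temperatures=[298, 313, 333, 353]))
```

A model that fits the data well but fails this check at some pressures may be interpolating noise, which is why trend analysis is a useful complement to the error metrics.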
SHAP Value Analysis
SHAP (SHapley Additive exPlanations) values quantify the influence and importance of each input feature on a model's predictions. SHAP analysis of the GrowNet model identified pressure as the most significant factor influencing CO₂ solubility, in line with classical thermodynamic expectations.
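The idea behind SHAP can be illustrated with exact Shapley values on a tiny example. The sketch below does not use the `shap` package and is not the study's procedure. It enumerates all feature subsets for a hypothetical two-feature linear model (`toy_model`, with invented coefficients) and treats a feature as "absent" by setting it to a single baseline value, which is a simplifying assumption.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of `model` at point `x` relative to `baseline`.

    `x` and `baseline` are dicts with the same keys. Cost grows as 2^n,
    so this is only practical for a handful of features.
    """
    names = list(x)
    n = len(names)

    def value(subset):
        # Features in `subset` take their actual values; the rest stay at baseline.
        return model({k: (x[k] if k in subset else baseline[k]) for k in names})

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {name}) - value(s))
        phi[name] = total
    return phi


def toy_model(inputs):
    """Hypothetical linear stand-in; coefficients are invented for illustration."""
    return 0.5 * inputs["pressure"] - 0.2 * inputs["temperature"]


x = {"pressure": 4.0, "temperature": 1.0}
baseline = {"pressure": 0.0, "temperature": 0.0}
print(shapley_values(toy_model, x, baseline))
```

The contributions always sum to the difference between the model output at `x` and at the baseline, which is the additivity property that gives SHAP its name. Ranking features by the magnitude of these contributions over many data points is what produces an importance ordering such as pressure first.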
Conclusion
The statistical and graphical analyses consistently favor the GrowNet model for predicting CO₂ solubility. Because it captures nonlinear relationships effectively and performs well across the validation metrics, it is well placed to support the optimization of future carbon capture technologies based on imidazolium-based ionic liquids.
Rigorous assessment of this kind progressively improves predictive capability, which matters both for environmental sustainability and for industrial carbon capture applications.