Performance Analysis of Hybrid Deep Learning Models
Introduction
In the realm of machine learning, hybrid models that combine multiple architectures often deliver significant performance advantages. This article examines the performance of three hybrid deep learning models (CNN-LSTM-RNN, LSTM-CNN-RNN, and RNN-CNN-LSTM) in classifying stabilized NS (numerical slope) slopes as stable or unstable. Each model's efficacy is assessed through several classification metrics: accuracy, F1 score, precision, recall, Cohen's kappa, and the Matthews correlation coefficient (MCC).
Model Performance Overview
Table 6 provides a detailed evaluation of each model’s performance across training and testing phases. Notably, the CNN-LSTM-RNN model boasts the highest training accuracy of 0.996, showcasing its ability to fit the training data exceptionally well. However, when evaluated on unseen data, the LSTM-CNN-RNN model triumphs, achieving an accuracy of 0.996, slightly edging out the CNN-LSTM-RNN’s 0.992.
The RNN-CNN-LSTM model lags marginally, recording 0.992 in training and 0.994 during testing. Delving deeper, the F1 score—a crucial metric for balancing precision and recall—indicates that CNN-LSTM-RNN is the best performer with 0.995 during training and 0.990 during testing. The LSTM-CNN-RNN follows closely with 0.992 for training and similar results in testing, while RNN-CNN-LSTM displays a slightly lower F1 score.
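As a concrete reference, accuracy and F1 can be computed directly from confusion-matrix counts. The minimal sketch below uses the CNN-LSTM-RNN training counts quoted later in the article (2567 TP, 1790 TN, 18 FP, 47 FN) purely for illustration; raw counts like these need not reproduce Table 6's rounded figures exactly.

```python
# Accuracy and F1 from raw confusion-matrix counts.
# Counts are illustrative, taken from the CNN-LSTM-RNN training matrix.
tp, tn, fp, fn = 2567, 1790, 18, 47

total = tp + tn + fp + fn
accuracy = (tp + tn) / total      # fraction of all predictions that are correct
f1 = 2 * tp / (2 * tp + fp + fn)  # harmonic mean of precision and recall

print(f"accuracy = {accuracy:.3f}, F1 = {f1:.3f}")
```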
Precision and Recall
Precision is a pivotal measure that outlines the proportion of predicted positive instances that are truly positive. Here, RNN-CNN-LSTM outshines the others with impressive scores of 0.999 during training and a perfect 1.000 on testing, underscoring its effectiveness at minimizing false positives. The CNN-LSTM-RNN and LSTM-CNN-RNN models record testing precisions of 0.983 and 1.000, respectively.
In terms of recall, which evaluates how well the model captures actual positive instances, the CNN-LSTM-RNN model excels with a score of 0.997 during training and 0.998 in testing. The LSTM-CNN-RNN shows a slightly lower recall during training but performs adequately in testing, while RNN-CNN-LSTM lags behind in both scenarios.
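Both quantities follow directly from confusion-matrix counts; a minimal sketch, again using the CNN-LSTM-RNN training counts for illustration:

```python
# Precision and recall from raw confusion-matrix counts (illustrative).
tp, fp, fn = 2567, 18, 47

precision = tp / (tp + fp)  # of predicted positives, how many were correct
recall = tp / (tp + fn)     # of actual positives, how many were found

print(f"precision = {precision:.3f}, recall = {recall:.3f}")
```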
Kappa Statistics
Kappa statistics quantify the agreement between predicted and actual classifications after correcting for the agreement expected by chance. Yet again, CNN-LSTM-RNN proves its mettle, recording 0.991 during training and 0.983 in testing, signifying the best alignment between predicted and actual labels. The other models show similar strengths, with LSTM-CNN-RNN at 0.986 and RNN-CNN-LSTM at 0.984 during training, and comparable performance in the testing phase.
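Cohen's kappa can be sketched from the same kind of confusion counts. The counts below are illustrative (taken from the CNN-LSTM-RNN training matrix quoted later), so the resulting value need not match Table 6's rounded figures:

```python
# Cohen's kappa: observed agreement corrected for chance agreement.
tp, tn, fp, fn = 2567, 1790, 18, 47  # illustrative confusion counts
n = tp + tn + fp + fn

p_observed = (tp + tn) / n  # raw agreement (i.e., accuracy)
# Chance agreement: probability that prediction and truth coincide at random,
# given each side's class marginals.
p_chance = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2

kappa = (p_observed - p_chance) / (1 - p_chance)
print(f"kappa = {kappa:.3f}")
```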
Matthews Correlation Coefficient (MCC)
The Matthews correlation coefficient is essential for appraising model performance across all four outcomes of binary classification: true positives, false positives, true negatives, and false negatives. Here, the CNN-LSTM-RNN model once again leads with an MCC of 0.991 in training and 0.983 in testing. LSTM-CNN-RNN and RNN-CNN-LSTM report close MCC values, further highlighting the robustness of hybrid models.
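A minimal sketch of the MCC computation, using the same illustrative confusion counts as above (so the value is for demonstration, not a reproduction of Table 6):

```python
import math

tp, tn, fp, fn = 2567, 1790, 18, 47  # illustrative confusion counts

# MCC weighs all four outcomes; +1 is perfect prediction, 0 is random,
# and -1 is total disagreement between prediction and truth.
mcc = (tp * tn - fp * fn) / math.sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
)
print(f"MCC = {mcc:.3f}")
```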
Hyperparameter Optimization
To achieve optimal performance, a thorough examination of hyperparameters was conducted; the best-optimized parameters for hybrid model development are summarized in Table 7. The integration of CNN and RNN architectures, along with long short-term memory (LSTM) units, enables precise analysis of both the spatial and temporal data related to slope stability.
- CNN filters (ranging from 39 to 57) extract critical spatial features from key geotechnical parameters such as soil index, cohesion, and pore pressure.
- LSTM units (ranging from 85 to 128) concentrate on long-term dependencies crucial for understanding slope behavior.
- Learning rates varied from 0.000357 to 0.005664, ensuring robust training without sacrificing accuracy.
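The text does not specify the search strategy behind these ranges; as a sketch, they could be explored with a simple random search. The bounds below come from the bullets above, while the log-uniform sampling of the learning rate is an assumption on my part:

```python
import math
import random

# Search bounds reported in the text.
FILTER_RANGE = (39, 57)          # CNN filters
LSTM_RANGE = (85, 128)           # LSTM units
LR_RANGE = (0.000357, 0.005664)  # learning rate

def sample_config(rng):
    """Draw one candidate hyperparameter configuration from the reported ranges."""
    return {
        "cnn_filters": rng.randint(*FILTER_RANGE),
        "lstm_units": rng.randint(*LSTM_RANGE),
        # Log-uniform sampling is a common choice for learning rates spanning
        # an order of magnitude (an assumption; the text does not state it).
        "learning_rate": 10 ** rng.uniform(
            math.log10(LR_RANGE[0]), math.log10(LR_RANGE[1])
        ),
    }

rng = random.Random(0)  # fixed seed so the sketch is reproducible
for _ in range(3):
    print(sample_config(rng))
```

Each sampled configuration would then be trained and scored on a validation split, keeping the best performer.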
Hyperparameter Importance
Figure 5 highlights the relative importance of each hyperparameter in optimizing the hybrid model, where the learning rate emerges as the most critical factor (importance score: 0.54). This aligns with the notion that careful tuning of this parameter can significantly influence model accuracy and convergence.
- CNN filters account for 0.24, indicating their crucial role in extracting spatial features.
- The RNN’s contribution stands at 0.12 for modeling sequential dependencies.
- LSTM units show a lesser importance score of 0.06, suggesting that simpler recurrent architectures suffice for the current task or that the temporal variations are not especially complex.
Confusion Matrix Analysis
The confusion matrices presented in Figures 6, 7, and 8 reveal the ability of each model to generalize well to unseen data. The CNN-LSTM-RNN model showcases 2567 True Positives and 1790 True Negatives with a mere 18 False Positives and 47 False Negatives during training. Similarly, the LSTM-CNN-RNN model displays 2598 True Positives and 1804 True Negatives, suggesting stable performance across both training and testing phases.
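Counts like those above can be tallied from raw labels with a small helper; a sketch assuming binary labels encoded as 1 (positive class) and 0:

```python
def confusion_counts(y_true, y_pred):
    """Tally TP, TN, FP, FN for binary labels (1 = positive class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

# Tiny demonstration with hypothetical labels.
print(confusion_counts([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))  # → (2, 1, 1, 1)
```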
Overfitting Score Analysis
To analyze overfitting, we turn to Table 8, which details performance differences between training and testing phases. Here’s a closer look at model architectures:
- RNN-CNN-LSTM shows the smallest deviation across multiple metrics, indicating robust generalization capabilities and a lower tendency toward overfitting.
- LSTM-CNN-RNN exhibits slightly larger performance differences.
- In contrast, the CNN-LSTM-RNN architecture suffers from the highest performance discrepancies.
The percentage variation totals (Table 9) further indicate that the RNN-CNN-LSTM is the most consistent performer, showcasing strong generalization ability. The LSTM-CNN-RNN follows closely, while the CNN-LSTM-RNN shows higher variability, confirming its potential overfitting challenges.
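An overfitting score of this kind can be formed as the absolute train-test gap per metric, summed across metrics. The sketch below uses only the CNN-LSTM-RNN accuracy and F1 values reported in Table 6 for illustration; Table 8's exact metric set may differ.

```python
# Train/test metrics for CNN-LSTM-RNN, as reported in Table 6.
train = {"accuracy": 0.996, "f1": 0.995}
test = {"accuracy": 0.992, "f1": 0.990}

# Per-metric absolute gap; a larger total suggests weaker generalization.
gaps = {m: abs(train[m] - test[m]) for m in train}
total_gap = sum(gaps.values())

print(f"per-metric gaps: {gaps}")
print(f"total gap: {total_gap:.3f}")
```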
Training and Validation Loss
The training versus validation loss curves in Figures 9, 10, and 11 indicate the models' robustness. For the CNN-LSTM-RNN, both losses decline steadily without noticeable divergence, signifying effective learning without overfitting. Similar patterns are observed in the other two architectures, reinforcing that none of the models memorizes the training dataset; instead, each captures the essential patterns, supporting its reliability.
In conclusion, each hybrid model presents unique strengths tailored to application-specific requirements. Such detailed analyses ensure a clearer understanding of how these architectures perform across different metrics, paving the way for informed decisions in future implementations.