Evaluating the Efficiency and Superiority of FCResNet5 Through Experimental Validation
We designed two experiments to validate the efficiency and superiority of our proposed method, FCResNet5, for ship-radiated noise classification. The first is an ablation study that examines the impact of key model components; the second is a comparative experiment that measures our method against state-of-the-art approaches.
Ablation Experiment: Dissecting FCResNet5
Investigating Key Design Factors
Our ablation study examines the design factors that contribute to the performance of FCResNet5: the comparison between time-frequency and non-time-frequency input features, the influence of frequency bandwidth selection, the effect of window overlap during feature extraction, the role of frequency channelization, and the choice of network architecture. Together, these analyses show how FCResNet5 balances accuracy against computational efficiency, which makes it well suited to real-world applications.
Comparing Time-Frequency and Non-Time-Frequency Features
To assess the effectiveness of different input representations for ship-radiated noise classification, we trained ResNet18 on seven feature types: four time-frequency features (STFT, Mel, CQT, and Gamma-tone) and three non-time-frequency features (MFCC, Wavelet, and Cepstrum). Each feature type was evaluated on five randomly generated data splits, with results averaged over ten repeated runs.
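The evaluation protocol described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the 80/20 train/test ratio, the seeds, the reading of "ten repeated runs" as ten runs per split, and the names `make_splits`, `evaluate`, and `train_and_score` are all assumptions introduced here.

```python
import random
import statistics


def make_splits(n_samples, n_splits=5, train_frac=0.8, base_seed=0):
    """Return n_splits random (train_idx, test_idx) partitions of range(n_samples).

    train_frac and base_seed are illustrative assumptions.
    """
    cut = int(train_frac * n_samples)
    splits = []
    for split_id in range(n_splits):
        rng = random.Random(base_seed + split_id)  # one reproducible shuffle per split
        idx = list(range(n_samples))
        rng.shuffle(idx)
        splits.append((idx[:cut], idx[cut:]))
    return splits


def evaluate(train_and_score, n_samples, n_splits=5, n_runs=10):
    """Average the accuracy returned by train_and_score over every (split, run) pair."""
    scores = []
    for split_id, (train_idx, test_idx) in enumerate(make_splits(n_samples, n_splits)):
        for run in range(n_runs):
            seed = 1000 * split_id + run  # distinct seed for each repeated run
            scores.append(train_and_score(train_idx, test_idx, seed))
    return statistics.mean(scores), statistics.stdev(scores)


def dummy_scorer(train_idx, test_idx, seed):
    """Hypothetical stand-in for training a model and returning test accuracy."""
    return 0.70 + random.Random(seed).uniform(-0.02, 0.02)


mean_acc, std_acc = evaluate(dummy_scorer, n_samples=100)
print(f"accuracy: {mean_acc:.4f} +/- {std_acc:.4f}")
```

In practice `dummy_scorer` would be replaced by a function that trains the classifier on `train_idx` and reports accuracy on `test_idx`. Reporting the spread alongside the mean makes it easier to judge whether differences between feature types exceed run-to-run variation.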
Table 4 summarizes our findings: time-frequency representations consistently outperform their non-time-frequency counterparts. STFT achieved the highest average accuracy at 72.38%, followed closely by Mel at 71.20%. These results indicate that time-frequency features preserve more of the discriminative information needed for effective classification.
To assess this difference visually, we computed t-SNE embeddings of the extracted features. The visualizations in Fig. 7 show more distinct and compact clusters for time-frequency representations, indicating stronger inter-class separability. In contrast, non-time-frequency features, particularly Wavelet and Cepstrum, displayed diffuse distributions, which limit their discriminative power. Based on this analysis, we adopted the four time-frequency features as the primary input representations for subsequent experiments.
Bandwidth Selection Justification
We next validated the rationale behind our 2 kHz bandwidth selection. Using ResNet18, we examined how various upper and lower frequency limits affect model performance; the results are presented in Fig. 8 and Table 5. Classification performance generally decreased as the bandwidth widened, suggesting that extending the upper limit introduces unwanted interference. These results support limiting the data bandwidth to within 2 kHz.
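Band-limiting a spectrogram amounts to keeping only the frequency bins that fall inside the chosen range. The sketch below shows one way to compute that bin range for an STFT. The 16 kHz sample rate, the 1024-point FFT, and the function names are assumptions for illustration, since the text does not state the extraction parameters.

```python
import math


def band_limit_bins(sample_rate, n_fft, f_low=0.0, f_high=2000.0):
    """Return (start, stop) STFT bin indices covering [f_low, f_high] Hz.

    Bin k of an n_fft-point STFT is centred at k * sample_rate / n_fft Hz.
    stop is exclusive, so the result can be used directly as a slice.
    """
    n_bins = n_fft // 2 + 1  # number of one-sided frequency bins
    start = max(0, math.ceil(f_low * n_fft / sample_rate))
    stop = min(n_bins, math.floor(f_high * n_fft / sample_rate) + 1)
    if start >= stop:
        raise ValueError("empty frequency band")
    return start, stop


def crop_spectrogram(spec, start, stop):
    """Keep frequency rows start..stop-1 of a [freq][time] spectrogram."""
    return spec[start:stop]


# Assumed example parameters: 16 kHz audio, 1024-point FFT, 4 time frames.
full_spec = [[0.0] * 4 for _ in range(1024 // 2 + 1)]
start, stop = band_limit_bins(sample_rate=16000, n_fft=1024)
cropped = crop_spectrogram(full_spec, start, stop)
print(start, stop, len(cropped))  # prints: 0 129 129
```

Under these assumed parameters the input height drops from 513 to 129 frequency bins, so band-limiting also reduces the size of the network input.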
The Role of Window Overlap in Feature Extraction
To measure the impact of the windowing strategy on model performance, we varied the overlap ratio during time-frequency feature extraction, testing four settings: no overlap (0%), 25%, 50%, and 75%. The results in Table 6 show that overlap generally improved classification performance. For instance, STFT accuracy rose from 78.03% without overlap to 78.78% with 75% overlap. However, higher overlap also led to longer training times, so the accuracy gain comes at a computational cost.
Ultimately, we adopted the no-overlap setting for our default configuration to maintain a balance between accuracy and efficiency.
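The cost side of this trade-off follows from how overlap changes the hop length, and therefore the number of frames per recording. The sketch below makes that relationship explicit. The 1024-sample window, the 32000-sample signal length, and the absence of edge padding are assumptions chosen for illustration.

```python
def frame_count(n_samples, win_length, overlap):
    """Return (hop_length, n_frames) for a given fractional window overlap.

    Assumes no padding: only windows that fit entirely inside the signal count.
    """
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    hop = max(1, round(win_length * (1.0 - overlap)))
    if n_samples < win_length:
        return hop, 0
    return hop, 1 + (n_samples - win_length) // hop


# Assumed example: 32000 samples, 1024-sample window.
for overlap in (0.0, 0.25, 0.5, 0.75):
    hop, frames = frame_count(32000, 1024, overlap)
    print(f"overlap={overlap:.2f} hop={hop} frames={frames}")
# frames per setting: 31, 41, 61, 122
```

With these assumed sizes, moving from 0% to 75% overlap roughly quadruples the number of time frames the network must process, which is consistent with the longer training times noted above.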
Evaluating Frequency Channelization
Next, we evaluated the effectiveness of Frequency Channelization (FC) by applying it to three models: ResNet18, RCMoE-balance, and CFTAnet. Introducing FC led to a modest increase in parameter count but a significant reduction in computational cost, particularly for ResNet18 and RCMoE-balance, whose FLOPs dropped by over 90%. This reduction translated into shorter training times, confirming that FC improves training efficiency while preserving, and occasionally improving, classification accuracy, especially for lightweight models like CFTAnet.
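The pattern of slightly more parameters but far fewer operations can be illustrated with a toy cost model. The sketch below assumes that FC splits the frequency axis into contiguous sub-bands and stacks them as input channels; the text does not spell out the mechanism, so this reading is an assumption, as are the 128 x 64 spectrogram, the eight sub-bands, the two 64-channel 3 x 3 convolutions, and all function names.

```python
def channelize(spec, n_bands):
    """Split a [freq][time] spectrogram into n_bands contiguous sub-bands.

    Returns a [band][freq_in_band][time] structure, i.e. sub-bands as channels.
    """
    n_freq = len(spec)
    if n_freq % n_bands != 0:
        raise ValueError("n_bands must divide the number of frequency bins")
    band_height = n_freq // n_bands
    return [spec[b * band_height:(b + 1) * band_height] for b in range(n_bands)]


def conv_cost(c_in, c_out, height, width, kernel=3):
    """Parameters and multiply-accumulates of a stride-1, same-padded conv (no bias)."""
    params = kernel * kernel * c_in * c_out
    return params, params * height * width


def stack_cost(c_in, height, width, widths=(64, 64)):
    """Total (params, MACs) of consecutive conv layers at a fixed resolution."""
    total_params = total_macs = 0
    for c_out in widths:
        params, macs = conv_cost(c_in, c_out, height, width)
        total_params += params
        total_macs += macs
        c_in = c_out
    return total_params, total_macs


spec = [[0.0] * 64 for _ in range(128)]       # assumed 128 freq bins x 64 frames
bands = channelize(spec, n_bands=8)           # 8 channels of 16 x 64
plain = stack_cost(c_in=1, height=128, width=64)
fc = stack_cost(c_in=len(bands), height=len(bands[0]), width=len(bands[0][0]))
print("plain (params, MACs):", plain)
print("FC    (params, MACs):", fc)
```

With these toy sizes the parameter count rises from 37,440 to 41,472, while the multiply-accumulate count falls from roughly 307 million to roughly 42 million. The first layer gains weights because it sees more input channels, but every later layer operates on a feature map that is eight times smaller. These figures describe only the toy stack and are not the paper's measurements.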
Exploring Optimal Network Architectures
To find the most suitable network architecture for frequency channelization, we compared descending and ascending channel configurations across varying depths. A descending channel configuration consistently yielded higher accuracy, but it required more parameters and computational resources. Our final design for FCResNet5 balances efficiency and performance and is tailored to the frequency-segmented inputs we employed.
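One reason a descending configuration tends to cost more is that its widest layers run at the highest spatial resolution. The toy comparison below illustrates this. The channel widths (64, 32, 16), the eight input channels, the 16 x 64 input size, and the halving of resolution after each stage are hypothetical choices and do not reproduce the configurations tested in the paper.

```python
def stage_costs(c_in, height, width, widths, kernel=3):
    """Total (params, MACs) for one 3x3 conv per stage, halving resolution after each.

    Counts convolution weights only (no bias, no classifier head).
    """
    total_params = total_macs = 0
    for c_out in widths:
        params = kernel * kernel * c_in * c_out
        total_params += params
        total_macs += params * height * width
        c_in = c_out
        height, width = max(1, height // 2), max(1, width // 2)  # downsample
    return total_params, total_macs


# Hypothetical widths; the input is assumed to be 8 channels of 16 x 64.
descending = stage_costs(8, 16, 64, widths=(64, 32, 16))
ascending = stage_costs(8, 16, 64, widths=(16, 32, 64))
print("descending (params, MACs):", descending)
print("ascending  (params, MACs):", ascending)
```

In this toy setting the descending stack uses 27,648 weights and about 9.7 million multiply-accumulates, against 24,192 weights and about 3.5 million for the ascending stack. The difference in operations is larger than the difference in parameters because the descending stack places its 64-channel layer where the feature map is largest.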
Comparative Experiment: FCResNet5 versus the State-of-the-Art
To evaluate the effectiveness of FCResNet5, we conducted two complementary comparative experiments. The first assessed classification performance across four time-frequency spectral features: STFT, Mel, CQT, and Gamma-tone. The second evaluated the robustness of different models under varying signal-to-noise ratio (SNR) conditions, simulating real-world scenarios with degraded acoustic quality.
Performance Across Spectral Features
In our first comparative study, we compared the classification performance of FCResNet5 with that of established models: RCMoE-balance, CFTAnet, and ResNet18. The dataset split was designed to cover a diverse data distribution. Tables 11 and 12 report the results under different overlap conditions.
While ResNet18 often performed best with STFT and Mel inputs, FCResNet5 outperformed all other models when using CQT and Gamma-tone features. FCResNet5 thus achieves competitive accuracy while offering substantial efficiency benefits, making it a solid candidate for deployment in resource-constrained settings.
Robustness Evaluation Under Varying SNR Conditions
To evaluate model robustness under noisy conditions, we added simulated Gaussian noise at several SNR levels. As illustrated in Fig. 13, all models lost accuracy as the noise level increased, but FCResNet5 consistently achieved the highest accuracy when the SNR was above 0 dB, underscoring its suitability for cleaner environments. Its performance dropped notably at lower SNR levels, however, which points to future work on improving low-SNR resilience while maintaining efficiency.
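Adding Gaussian noise at a target SNR is commonly done by scaling the noise variance to the measured signal power. The sketch below shows that standard construction. It is not taken from the paper, and the function names, the synthetic 50 Hz test tone, and the seed are assumptions for illustration.

```python
import math
import random


def add_gaussian_noise(signal, snr_db, seed=None):
    """Return signal plus white Gaussian noise scaled to the requested SNR in dB.

    Uses SNR_dB = 10 * log10(P_signal / P_noise), with power as the mean square.
    """
    rng = random.Random(seed)
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def measured_snr_db(clean, noisy):
    """Empirical SNR of noisy relative to clean, in dB."""
    signal_power = sum(c * c for c in clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy))
    return 10.0 * math.log10(signal_power / noise_power)


# Synthetic stand-in for a recording: a 50 Hz tone sampled at 1 kHz.
clean = [math.sin(2.0 * math.pi * 50.0 * t / 1000.0) for t in range(20000)]
for target in (10.0, 0.0, -10.0):
    noisy = add_gaussian_noise(clean, target, seed=0)
    print(f"target={target:+.0f} dB measured={measured_snr_db(clean, noisy):+.2f} dB")
```

The measured SNR differs slightly from the target because the noise is a finite random sample; the gap shrinks as the signal gets longer.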
Taken together, these experiments characterize the efficacy of FCResNet5 for ship-radiated noise classification. The ablation study and the comparisons with state-of-the-art methods clarify the model's accuracy, its efficiency, and its suitability for practical applications.