Saturday, August 2, 2025

Comprehensive Pelvic MRI Dataset for Deep Learning in Endometriosis Organ Segmentation


Interrater Agreement Analysis: A Deep Dive into Accuracy in Endometriosis Imaging

Understanding Interrater Agreement

Interrater agreement analysis matters in medical imaging because it measures how reliably different raters segment the same structures. In endometriosis, a prevalent gynecological condition, accurate segmentation of key anatomical structures such as the uterus and ovaries can significantly influence diagnosis and treatment options.

This analysis used Krippendorff’s alpha (α), a chance-corrected agreement statistic, to evaluate agreement among three raters, each of whom provided a binary segmentation map labeling every voxel in the first dataset. The uterus, a larger and more consistently shaped organ, showed a Krippendorff’s α of 0.73, indicating good agreement. The ovaries yielded a lower α of 0.46, reflecting the difficulty raters had with these smaller, more complexly shaped structures.
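
For readers who want to see how such a coefficient can be computed, the following is a minimal sketch of Krippendorff’s α for nominal binary labels. It assumes every rater labeled every voxel (no missing ratings). The function name `krippendorff_alpha_binary` and the toy masks are illustrative and are not taken from the study.

```python
import numpy as np


def krippendorff_alpha_binary(masks):
    """Krippendorff's alpha for binary nominal labels with no missing ratings.

    masks: array-like of shape (raters, ...), values 0/1; each voxel is one unit.
    """
    arr = np.asarray(masks)
    ratings = arr.reshape(arr.shape[0], -1).astype(bool)
    n_raters, n_units = ratings.shape
    if n_raters < 2:
        raise ValueError("at least two raters are required")

    ones = ratings.sum(axis=0)   # raters marking each voxel as foreground
    zeros = n_raters - ones      # raters marking each voxel as background

    # Off-diagonal coincidence count: disagreeing (0, 1) pairs within each voxel.
    o01 = float((ones * zeros).sum()) / (n_raters - 1)
    n1 = float(ones.sum())       # total foreground labels over all raters
    n0 = float(zeros.sum())      # total background labels over all raters
    n = float(n_raters * n_units)

    if n0 == 0 or n1 == 0:
        return float("nan")      # undefined when only one label ever occurs

    # alpha = 1 - D_o / D_e, simplified for two nominal categories.
    return 1.0 - (n - 1.0) * o01 / (n0 * n1)


# Toy example: three raters, six voxels (illustrative values only).
rater_masks = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
]
print(krippendorff_alpha_binary(rater_masks))
```

On real volumes, each rater’s 3D mask would be flattened the same way, so the statistic is driven by the voxels on which raters disagree, relative to the disagreement expected from the overall label frequencies.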

Segmentation Quality Metrics

To measure overall segmentation quality, we calculated the Dice Similarity Coefficient (DSC), which showed the same gap in accuracy between the uterus and ovaries. The uterus achieved an average DSC of 0.73 ± 0.18, while the ovaries lagged behind at 0.48 ± 0.24. This discrepancy underscores the variability of manual segmentation, particularly for smaller organs like the ovaries, which have a more intricate morphology.
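
As an illustration of the metric, here is a minimal sketch of the DSC between two binary masks, plus a helper that averages it over all rater pairs. How the study aggregated its mean and standard deviation (over pairs, subjects, or both) is not stated here, so `pairwise_dice` is only one plausible reading. Both function names are hypothetical.

```python
from itertools import combinations

import numpy as np


def dice(mask_a, mask_b):
    """Dice Similarity Coefficient: 2|A ∩ B| / (|A| + |B|) for binary masks."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    total = int(a.sum()) + int(b.sum())
    if total == 0:
        return 1.0  # assumption: two empty masks count as perfect overlap
    return 2.0 * int(np.logical_and(a, b).sum()) / total


def pairwise_dice(masks):
    """Mean and standard deviation of the DSC over all pairs of masks."""
    scores = [dice(a, b) for a, b in combinations(masks, 2)]
    return float(np.mean(scores)), float(np.std(scores))
```

Because the DSC ignores voxels that both masks label as background, a boundary error of a few voxels costs a small structure such as an ovary far more than it costs a large one such as the uterus.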

Evaluating Agreement Through Gwet’s AC2

Complementing Krippendorff’s α, we also employed Gwet’s AC2 to evaluate the pairwise interrater reliability. Gwet’s coefficients ranged between 0.85 and 0.87 for the uterus, showcasing a high level of agreement among raters. In contrast, the ovaries had a median value of 0.72, reflecting a moderate level of agreement that could be attributed to the challenges inherent in accurately contouring the ovary on MRI scans.
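
The pairwise coefficient can be sketched as follows. For two raters and binary nominal labels, Gwet’s AC2 with identity weights reduces to AC1, so this sketch implements AC1. That choice is an assumption about the setup rather than a detail confirmed by the study, and the function name `gwet_ac1_binary` is hypothetical.

```python
import numpy as np


def gwet_ac1_binary(mask_a, mask_b):
    """Gwet's AC1 for two raters and binary labels (equals AC2 with identity weights)."""
    a = np.asarray(mask_a, dtype=bool).ravel()
    b = np.asarray(mask_b, dtype=bool).ravel()
    if a.shape != b.shape:
        raise ValueError("masks must contain the same number of voxels")

    observed = float(np.mean(a == b))                 # fraction of voxels with equal labels
    prevalence = (float(a.mean()) + float(b.mean())) / 2.0
    chance = 2.0 * prevalence * (1.0 - prevalence)    # Gwet's chance agreement for 2 classes
    return (observed - chance) / (1.0 - chance)
```

When the foreground is rare, the chance term in this formula is small, so the coefficient stays close to the raw voxel-wise agreement. This is one reason such a coefficient can be noticeably higher than Krippendorff’s α on the same masks.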

Visualizing Interrater Agreement

To visualize the segmentations, we examined pairwise agreement and similarity using figures that report the DSC and Gwet’s AC2 scores for each pair of raters. These figures showed how the raters performed relative to one another and confirmed that the discrepancies in ovary segmentation were consistent across all raters, suggesting that manual segmentation of the ovaries requires closer attention and perhaps more training.

Implications for Clinical Application

The findings of our interrater agreement analysis have significant implications for clinical practice, especially for the development of automated tools for ovary segmentation. The lower interrater agreement observed for the ovaries indicates a need for improved methodology, perhaps through advanced automated segmentation techniques.

This analysis also sets the stage for future explorations into the automated segmentation of endometriomas, as our next steps will involve enrolling additional subjects to further evaluate and refine these processes.

Transition to Auto-segmentation Methods

Building on the insights from our agreement analysis, we turned to developing an auto-segmentation method tailored to the uterus and ovaries. This method uses deep learning to enhance segmentation accuracy for structures that raters struggled to segment consistently by hand.

Data Preprocessing

A critical first step in developing the auto-segmentation pipeline was careful preprocessing of the selected subjects from the second dataset. The dataset was partitioned into training, validation, and test subsets at the patient level, so that no patient contributes images to more than one subset. The MRI data were clipped to remove extreme intensity values and normalized to standardize the value range across images. This preprocessing aimed to emphasize the anatomical features of the ovaries and so support more effective training of the segmentation model.
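
The preprocessing steps described above can be sketched as follows. The split fractions, the 1st and 99th percentile clipping bounds, the min-max scaling to [0, 1], and the function names are all assumptions chosen for illustration; the article does not state the values actually used.

```python
import numpy as np


def split_patients(patient_ids, fractions=(0.7, 0.15, 0.15), seed=0):
    """Split unique patient IDs into train/val/test so no patient spans two subsets.

    The fractions are illustrative defaults, not values from the study.
    """
    unique = sorted(set(patient_ids))
    rng = np.random.default_rng(seed)
    shuffled = [unique[i] for i in rng.permutation(len(unique))]
    n_train = int(round(fractions[0] * len(unique)))
    n_val = int(round(fractions[1] * len(unique)))
    return {
        "train": shuffled[:n_train],
        "val": shuffled[n_train:n_train + n_val],
        "test": shuffled[n_train + n_val:],
    }


def clip_and_normalize(volume, low_pct=1.0, high_pct=99.0):
    """Clip intensities to a percentile window, then rescale to [0, 1].

    The percentile bounds are assumptions; other windows are equally plausible.
    """
    volume = np.asarray(volume, dtype=np.float32)
    lo, hi = (float(v) for v in np.percentile(volume, [low_pct, high_pct]))
    if hi <= lo:
        return np.zeros_like(volume)  # constant image: nothing to rescale
    clipped = np.clip(volume, lo, hi)
    return (clipped - lo) / (hi - lo)
```

Splitting on patient IDs rather than on individual slices is what keeps the subsets independent: slices from one patient are highly correlated, so mixing them across subsets would inflate validation and test scores.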

Development of the RAovSeg Pipeline

The RAovSeg pipeline, illustrated in our accompanying figures, operates through two main components: ResClass and AttUSeg. ResClass, our classifier based on a modified ResNet architecture, was trained on 2D MRI slices to efficiently identify those containing ovarian structures. Dropout layers and regularization techniques were used in this component to mitigate overfitting and support robust classification performance.
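
One way a slice classifier and a segmenter can be combined is sketched below. It assumes that slices rejected by the classifier are left empty and that only accepted slices are segmented; the article does not spell out this gating logic. `classify_slice` and `segment_slice` are hypothetical stand-ins for the two trained models, and the 0.5 threshold is illustrative.

```python
from typing import Callable

import numpy as np


def segment_volume(
    volume: np.ndarray,
    classify_slice: Callable[[np.ndarray], float],
    segment_slice: Callable[[np.ndarray], np.ndarray],
    threshold: float = 0.5,
) -> np.ndarray:
    """Two-stage sketch: segment only the slices the classifier accepts.

    volume: array of shape (slices, height, width).
    classify_slice: returns the probability that a slice contains the target.
    segment_slice: returns a binary mask of shape (height, width).
    """
    volume = np.asarray(volume)
    mask = np.zeros(volume.shape, dtype=bool)
    for index, slice_2d in enumerate(volume):
        if classify_slice(slice_2d) >= threshold:
            mask[index] = segment_slice(slice_2d)
        # Rejected slices stay empty, which removes any false positives
        # the segmenter might otherwise have produced there.
    return mask
```

Under this assumption, the classifier acts as a filter: the segmenter never produces foreground on slices judged not to contain the target structure.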

In the segmentation phase, AttUSeg utilizes an Attention U-Net architecture explicitly designed to focus on the intricate details of the ovaries. A specialized loss function, the Focal Tversky Loss, was implemented to handle the imbalanced class distribution typical of segmentation tasks involving small structures.
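
To make the loss concrete, here is a minimal NumPy sketch of a Focal Tversky loss for a single binary mask. The weights `alpha=0.7` and `beta=0.3`, the exponent `gamma=0.75`, and the convention of raising (1 − Tversky index) directly to `gamma` are assumptions; implementations differ on the exponent convention, and the article does not report the settings used for AttUSeg. A training pipeline would use a differentiable framework version rather than NumPy.

```python
import numpy as np


def focal_tversky_loss(probs, target, alpha=0.7, beta=0.3, gamma=0.75, eps=1e-7):
    """Focal Tversky loss, (1 - TI) ** gamma, for predicted foreground probabilities.

    TI = TP / (TP + alpha * FN + beta * FP). With alpha > beta, missed
    foreground voxels are penalized more heavily than false alarms.
    All hyperparameter defaults are illustrative assumptions.
    """
    p = np.asarray(probs, dtype=np.float64).ravel()
    t = np.asarray(target, dtype=np.float64).ravel()

    tp = float((p * t).sum())             # soft true positives
    fn = float(((1.0 - p) * t).sum())     # soft false negatives
    fp = float((p * (1.0 - t)).sum())     # soft false positives

    tversky = (tp + eps) / (tp + alpha * fn + beta * fp + eps)
    return max(0.0, 1.0 - tversky) ** gamma
```

Because the index is built only from foreground-related counts, the very large number of correctly predicted background voxels does not dilute the loss, which is the property that makes it suitable for small structures.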

Quantitative and Qualitative Evaluation

We evaluated our segmentation method using both quantitative measures, such as the DSC, and qualitative assessments that compare the generated segmentation outputs against the ground truth. The preliminary results indicated a promising average DSC of 0.290, surpassing existing models such as nnU-Net, which recorded a DSC of 0.272.

In terms of qualitative results, we can visually discern improvements in segmentation accuracy pre- and post-processing. Figures illustrating segmentation results reveal how our methodology managed to mitigate false positives effectively, enhancing the clarity and reliability of segmentation outputs, especially near the ovarian structures.

Ultimately, the promising results of the RAovSeg pipeline suggest a pathway for integrating automated segmentation methods into clinical practices, which could significantly improve the diagnostic process for conditions like endometriosis.

By addressing segmentation challenges through advanced technologies and methodologies, we are laying the groundwork for enhancing patient care and outcomes in gynecological health.
