Monday, July 21, 2025

Developing an Interpretable Machine Learning Model to Predict Axillary Lymph Node Metastasis in Invasive Breast Cancer Using MRI Radiomics


Understanding the Patient Cohort in Advanced Breast Cancer Research

Overview of the Patient Selection Process

The study under discussion was a retrospective analysis of patient data collected at the Affiliated Hospital of North Sichuan Medical College from June 2021 to December 2023. The Institutional Review Board waived the requirement for informed consent, allowing researchers to use existing clinical information while complying with ethical standards and national regulations (IRB approval number: 2023ER131-1). The initial cohort consisted of 344 patients with pathologically confirmed breast cancer (BC).

Inclusion Criteria

To ensure a focused perspective on a specific subgroup of patients, the research established stringent inclusion criteria:

  1. Confirmation of invasive breast cancer (IBC) via surgical or puncture pathology.
  2. Lymph node metastasis (LNM) status confirmed through surgical or pathology methods.
  3. Availability of complete pathological and immunohistochemical results.
  4. No prior history of biopsy, surgery, chemotherapy, or radiotherapy before the MRI evaluation.
  5. Patients had undergone both dynamic contrast-enhanced (DCE) and diffusion-weighted imaging (DWI) scans prior to treatment.

Applying these criteria narrowed the cohort to 183 patients: 107 with axillary lymph node metastasis (ALNM) and 76 in the non-ALNM group (depicted in Fig. 1). The patients’ median age was 51 years, with a range of 24 to 84 years.

Imaging and Data Acquisition Techniques

Imaging was performed on a United Imaging Healthcare 3 T MRI scanner with a dedicated 10-channel breast coil. Patients were positioned prone so that both breasts and the associated axillary regions could be imaged. Axial T1-weighted images were acquired over multiple phases of DCE MRI, with technical parameters chosen to maximize resolution and accuracy. The contrast agent Gd-DTPA was administered for the DCE acquisition to enhance visualization of the breast tissue. The entire scan took approximately 9 minutes and 36 seconds.

Clinical Data Collected

Alongside imaging data, patient demographic information such as age and menopausal status was extracted from clinical records. Hormone receptor analysis followed the guidelines of the American Society of Clinical Oncology (ASCO) and the College of American Pathologists (CAP). Estrogen receptor (ER) and progesterone receptor (PR) status was defined by the percentage of nuclear staining, while human epidermal growth factor receptor 2 (HER-2) positivity was determined from a combination of immunohistochemical scoring and in situ hybridization results.
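The receptor definitions can be illustrated with a minimal sketch. The article does not state the cutoffs the study used, so the thresholds below are assumptions taken from the commonly cited ASCO/CAP recommendations (at least 1% of tumor nuclei staining for ER/PR; an immunohistochemistry score of 3+, or 2+ with in situ hybridization amplification, for HER-2). The function names are hypothetical.

```python
from typing import Optional


def hormone_receptor_positive(nuclear_staining_pct: float,
                              cutoff_pct: float = 1.0) -> bool:
    """ER or PR status from the percentage of tumor nuclei that stain.

    The 1% default cutoff is an assumption, not a value from the article.
    """
    return nuclear_staining_pct >= cutoff_pct


def her2_positive(ihc_score: int, ish_amplified: Optional[bool] = None) -> bool:
    """HER-2 status from an IHC score (0 to 3) and, if needed, an ISH result.

    Assumed rule: 3+ is positive, 2+ is equivocal and resolved by ISH,
    0 and 1+ are negative.
    """
    if ihc_score == 3:
        return True
    if ihc_score == 2:
        if ish_amplified is None:
            raise ValueError("IHC 2+ is equivocal; an ISH result is required")
        return ish_amplified
    return False
```

For example, `her2_positive(2, ish_amplified=True)` returns `True`, while `her2_positive(2)` raises an error because an equivocal score cannot be resolved without the hybridization result.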

Radiomic Feature Extraction

Feature extraction began with delineating regions of interest (ROIs) in 3D Slicer, a common tool in radiomics. Two experienced imaging physicians drew these ROIs layer by layer, taking multiple tumor foci into account; in cases of multicentric or multifocal disease, the analysis focused on the largest tumor. Extraction with the Python package PyRadiomics produced a dataset of 1,223 features.

Inter-Observer Agreement Assessment

To evaluate the reliability of the feature extraction process, a random subset of DCE and DWI images was analyzed to assess inter-observer agreement. Intraclass correlation coefficients (ICCs) were computed, and features were filtered according to their stability. After this quality control step, 1,138 features were retained for the DCE sequence and 923 for the DWI sequence, which supports the robustness of subsequent analyses.
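The ICC-based stability filter can be sketched as follows. The article does not say which ICC form or cutoff was used, so this example assumes the two-way random-effects, absolute-agreement, single-rater form, often written ICC(2,1), and a hypothetical cutoff of 0.75. Each feature is represented as a table with one row per ROI and one column per reader.

```python
def icc_2_1(ratings):
    """ICC(2,1) for a table with one row per subject and one column per reader.

    Assumes at least two rows, two columns, and some variation in the values.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares: subjects, readers, residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)               # between-subject mean square
    msc = ss_cols / (k - 1)               # between-reader mean square
    mse = ss_err / ((n - 1) * (k - 1))    # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def stable_features(feature_ratings, threshold=0.75):
    """Keep feature names whose ICC meets the (assumed) stability cutoff."""
    return [name for name, ratings in feature_ratings.items()
            if icc_2_1(ratings) >= threshold]
```

A feature on which both readers produce identical values for every ROI gets an ICC of 1 and is kept; a feature whose values shift systematically between readers is penalized by the absolute-agreement form and may be dropped.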

Model Development and Feature Selection

The researchers carried out feature selection and model development in Python, with a focus on optimizing predictive performance. A total of 24 features across the DCE and DWI sequences were retained after selection steps that included variance thresholding and the Least Absolute Shrinkage and Selection Operator (LASSO). These features were then used to construct three predictive models from the DCE and DWI data.
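The two named selection steps can be sketched in plain Python. This is an illustration under stated assumptions, not the study's pipeline: the variance cutoff and the penalty `alpha` are hypothetical values, the LASSO is fitted by simple coordinate descent on standardized features, and in practice `alpha` is usually chosen by cross-validation.

```python
from statistics import mean, pvariance


def variance_filter(columns, threshold=0.0):
    """Drop features whose population variance is not above the threshold."""
    return {name: vals for name, vals in columns.items()
            if pvariance(vals) > threshold}


def standardize(values):
    """Scale to zero mean and unit variance (requires non-constant values)."""
    mu, sd = mean(values), pvariance(values) ** 0.5
    return [(v - mu) / sd for v in values]


def soft_threshold(rho, alpha):
    """Shrink rho toward zero by alpha; values within [-alpha, alpha] become 0."""
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0


def lasso_select(columns, y, alpha, n_iter=200):
    """Return the nonzero LASSO coefficients, keyed by feature name.

    Minimizes (1 / 2n) * ||y - Xw||^2 + alpha * ||w||_1 on centered y and
    standardized columns, so each coordinate update is one soft-threshold.
    """
    names = list(columns)
    cols = [standardize(columns[name]) for name in names]
    y_mean = mean(y)
    residual = [v - y_mean for v in y]    # all weights start at zero
    n = len(y)
    w = [0.0] * len(names)
    for _ in range(n_iter):
        for j, col in enumerate(cols):
            # Correlation of feature j with the residual excluding its own term.
            rho = sum(c * (r + w[j] * c) for c, r in zip(col, residual)) / n
            new_w = soft_threshold(rho, alpha)
            delta = new_w - w[j]
            if delta:
                residual = [r - delta * c for r, c in zip(residual, col)]
                w[j] = new_w
    return {name: coef for name, coef in zip(names, w) if coef != 0.0}
```

Running `variance_filter` first removes constant features, which would otherwise cause a division by zero in `standardize`. The L1 penalty then sets the coefficients of weakly informative features exactly to zero, which is what makes LASSO usable as a selection step.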

Statistical Analysis

All analyses were performed in SPSS and Python, using statistical tests suited to continuous and categorical data. The Mann-Whitney U test was used for skewed continuous data, while the χ² test assessed associations in count data. Model performance was assessed using the area under the curve (AUC), sensitivity, specificity, and accuracy.
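The four performance metrics can be computed from labels and model scores as in the sketch below. It assumes the label 1 means ALNM and 0 means non-ALNM, and the 0.5 decision threshold is a hypothetical choice. The AUC is computed as the fraction of (ALNM, non-ALNM) pairs in which the ALNM case receives the higher score, which is the Mann-Whitney U statistic divided by the number of pairs.

```python
def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def confusion_metrics(y_true, scores, threshold=0.5):
    """Sensitivity, specificity, and accuracy at a fixed score threshold."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),   # share of ALNM cases detected
        "specificity": tn / (tn + fp),   # share of non-ALNM cases cleared
        "accuracy": (tp + tn) / len(y_true),
    }
```

Unlike the other three metrics, the AUC does not depend on the chosen threshold, which is why it is usually reported alongside them.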


This article summarizes the study's patient selection and data handling strategies, covering MRI acquisition, radiomic feature extraction, and the statistical methods used to predict axillary lymph node metastasis in invasive breast cancer.
