Monday, December 29, 2025

Enhancing Transfusion Decisions in Acute Upper Gastrointestinal Bleeding: A Novel Machine Learning Approach with Clinical Validation

Clinical Data Collection in AUGIB: A Deep Dive

Introduction to the Study

Careful clinical data collection underpins any attempt to improve the management of acute upper gastrointestinal bleeding (AUGIB). This study was approved by the Ethics Review Committee of the First Hospital of Shanxi Medical University and conducted in accordance with the Declaration of Helsinki. It was designed as a single-center retrospective observational cohort study.

Data Source and Population

The study analyzed the electronic medical records of 1,251 patients who presented to the emergency department with AUGIB between January 2022 and December 2023. After screening, 1,177 patients met the inclusion criteria and formed the final analysis cohort. For external validation, a further 209 patients meeting the same criteria were drawn from Fenyang Hospital over the same period.

Inclusion and Exclusion Criteria

Patients were included if they were adults aged 18 years or older with a diagnosis of AUGIB, confirmed either by endoscopy or by characteristic clinical symptoms such as hematemesis, melena, or coffee-ground emesis. Patients were excluded if they were under 18, had concurrent lower gastrointestinal bleeding, lacked outcome indicators, or had more than 20% missing data for crucial variables. The selection process is summarized in the accompanying flowchart of data processing.
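
The exclusion rules above can be expressed as a simple record filter. The sketch below is illustrative only: the field names and the list of "crucial" variables are hypothetical, since the article does not give the study's actual schema.

```python
from typing import Optional

# Hypothetical field names; the study's real variable list is not given.
CRUCIAL_VARIABLES = ["hemoglobin", "systolic_bp", "pulse_rate", "inr", "creatinine"]
MAX_MISSING_FRACTION = 0.20


def missing_fraction(record: dict, variables: list) -> float:
    """Share of the listed variables that are absent or None in a record."""
    missing = sum(1 for name in variables if record.get(name) is None)
    return missing / len(variables)


def exclusion_reason(record: dict) -> Optional[str]:
    """Return the first exclusion criterion a record meets, or None if eligible."""
    if record["age"] < 18:
        return "under 18"
    if record.get("lower_gi_bleeding"):
        return "concurrent lower GI bleeding"
    if record.get("transfused") is None:
        return "missing outcome"
    if missing_fraction(record, CRUCIAL_VARIABLES) > MAX_MISSING_FRACTION:
        return "more than 20% of crucial variables missing"
    return None


def select_cohort(records: list) -> list:
    """Keep only the records that meet none of the exclusion criteria."""
    return [r for r in records if exclusion_reason(r) is None]
```

Returning the reason rather than a bare boolean makes it straightforward to count exclusions per criterion, which is the information a screening flowchart reports.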

Original Data Set Overview

The dataset comprised 67 clinical features categorized into four key groups:

  1. Demographics: Patient age, gender, comorbidities (like hypertension and liver cirrhosis), and medication history (including anticoagulants).
  2. Vital Signs: Data collected upon admission included systolic blood pressure, pulse rate, shock index, and the Glasgow Coma Scale score.
  3. Emergency Laboratory Indicators: Key laboratory results, including hemoglobin levels, coagulation markers, and liver and kidney function markers, were recorded before any transfusion.
  4. Time-Related Parameters: This included the time from hospital arrival to the establishment of intravenous access and blood collection.
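
Of the vital-sign features listed above, the shock index is a derived value: pulse rate divided by systolic blood pressure. A minimal sketch follows; the function and parameter names are illustrative, not taken from the study.

```python
def shock_index(pulse_rate_bpm: float, systolic_bp_mmhg: float) -> float:
    """Shock index = pulse rate (beats/min) / systolic blood pressure (mmHg)."""
    if systolic_bp_mmhg <= 0:
        raise ValueError("systolic blood pressure must be positive")
    return pulse_rate_bpm / systolic_bp_mmhg
```

For example, a pulse of 110 beats/min with a systolic pressure of 100 mmHg gives a shock index of 1.1.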

In addition, unstructured text from emergency records was processed with natural language processing to extract key diagnostic information. The outcome variables were whether a blood transfusion was administered and the volumes of the different blood products given.

Data Preprocessing Techniques

Preprocessing converts the raw records into a form suitable for analysis and modeling. Categorical variables, such as gender and the presence of specific diseases, were converted to numerical form using One-Hot Encoding. Missing values were handled with a multimodal imputation strategy that applied a different method to each data type. Categorical variables were imputed with Multivariate Imputation by Chained Equations (MICE) to preserve clinical medication patterns, integer-valued variables were imputed with the median, and continuous variables were imputed with random forest regression.
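
Two of the simpler steps, One-Hot Encoding and median imputation, can be sketched in plain Python. This is a minimal illustration under assumed inputs, not the study's pipeline; MICE and random forest imputation need a statistical library and are not shown.

```python
from statistics import median


def one_hot(values: list, categories: list) -> list:
    """Encode each value as a 0/1 vector with one position per category."""
    return [[1 if value == category else 0 for category in categories]
            for value in values]


def impute_median(values: list) -> list:
    """Replace None entries with the median of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]
```

Here `one_hot(["F", "M"], ["F", "M"])` gives one indicator column per category, and `impute_median` fills gaps with a value that is not distorted by outliers.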

To address the class imbalance between the transfusion group (40.4% of cases) and the non-transfusion group, the Synthetic Minority Over-sampling Technique (SMOTE) was applied. A log1p transformation was used to reduce the skew of long-tailed variables, and Min-Max normalization rescaled features to a common range while preserving their relative differences.
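
The core idea of SMOTE is to create each synthetic minority sample by interpolating between a real minority sample and one of its nearest minority-class neighbors. The sketch below shows that idea for numeric features only. The function name, the default `k`, and the seeding are assumptions for illustration; a real analysis would normally use a tested library implementation.

```python
import math
import random


def smote_sketch(minority: list, n_new: int, k: int = 5, seed: int = 0) -> list:
    """Generate n_new synthetic points from a list of minority-class points.

    Each synthetic point lies on the line segment between a randomly chosen
    minority point and one of its k nearest minority neighbors.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        # Nearest neighbors by Euclidean distance within the minority class.
        neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic
```

Oversampling of this kind is usually applied to the training data only, so that the test and external validation cohorts keep their original class balance.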

Statistical Analysis

Categorical data were presented as counts and percentages, while continuous data were expressed as means with standard deviations or medians with interquartile ranges. Differences between groups were evaluated using the χ² test or Fisher’s exact test. Statistical significance was set at p < 0.05 for comparisons between the transfusion and non-transfusion cohorts.
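
For a 2×2 table, such as a binary comorbidity against transfusion status, the χ² test can be computed directly. The sketch below uses Pearson's statistic without continuity correction and the closed-form p-value for one degree of freedom. Any counts passed to it here are hypothetical; the article reports no contingency tables.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple:
    """Pearson chi-square test for the 2x2 table [[a, b], [c, d]].

    Returns (statistic, p_value). With one degree of freedom the p-value
    is erfc(sqrt(statistic / 2)).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    observed = [a, b, c, d]
    expected = [row1 * col1 / n, row1 * col2 / n,
                row2 * col1 / n, row2 * col2 / n]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value
```

Fisher's exact test is the usual alternative when expected cell counts are small, which is where the χ² approximation becomes unreliable.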

Model Construction Insights

A novel hierarchical ensemble model based on Multi-Task Learning (MTL) was developed to handle classification and regression tasks simultaneously. The model is designed to mirror the clinical decision-making process, combining hierarchical feature selection with task-level collaborative optimization. The ensemble is intended to improve interpretability and predictive robustness in blood transfusion decision-making.

Key Innovations in the Model

  1. Hierarchical Feature Selection: The two-stage approach decreases dimensionality and boosts interpretability while maintaining pertinent features.
  2. Hybrid Ensemble Learning: A combination of models like CatBoost and XGBoost ensures effective handling of both categorical and continuous variables in AUGIB data.
  3. Multi-Task Collaborative Optimization: This strategy uses a feature space shared across tasks so that the classification and regression tasks can inform each other.
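
One way to picture a hierarchical, two-task prediction is as a gate: an ensemble of classifiers estimates whether transfusion is needed, and an ensemble of regressors estimates volume only when it is. The article does not describe how the study's model combines its components, so the averaging, the 0.5 threshold, and all names below are assumptions made purely for illustration.

```python
from typing import Callable, Sequence

Features = Sequence[float]
Model = Callable[[Features], float]


def ensemble_mean(models: Sequence[Model], x: Features) -> float:
    """Average the outputs of several models for one patient."""
    return sum(model(x) for model in models) / len(models)


def predict_transfusion(x: Features,
                        classifiers: Sequence[Model],
                        regressors: Sequence[Model],
                        threshold: float = 0.5) -> dict:
    """Stage 1: classify the need for transfusion. Stage 2: estimate volume."""
    probability = ensemble_mean(classifiers, x)
    transfuse = probability >= threshold
    # The volume estimate is only produced when stage 1 predicts transfusion.
    volume = ensemble_mean(regressors, x) if transfuse else 0.0
    return {"probability": probability, "transfuse": transfuse, "volume": volume}
```

In practice the callables would wrap fitted models such as CatBoost or XGBoost; any function that maps a feature vector to a number works for trying out the sketch.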

Evaluation of Model Performance

The study used stratified sampling to split the 1,177 patients into a training cohort of 942 and a testing cohort of 235, with 5-fold cross-validation used for evaluation. External validation on the independent cohort tested how well the model generalizes beyond the original center.
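
A stratified split keeps the proportion of transfused patients roughly equal in the training and testing cohorts. The sketch below implements an 80/20 stratified split and a stratified 5-fold assignment in plain Python. The article does not state the test fraction, whether the folds were stratified, or the seed, so those choices are assumptions.

```python
import random
from collections import defaultdict


def stratified_split(labels: list, test_fraction: float = 0.2, seed: int = 0) -> tuple:
    """Split sample indices into train/test while preserving class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    train, test = [], []
    for members in by_class.values():
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return sorted(train), sorted(test)


def stratified_folds(labels: list, indices: list, k: int = 5, seed: int = 0) -> list:
    """Assign the given indices to k folds, dealing out each class in turn."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index in indices:
        by_class[labels[index]].append(index)
    folds = [[] for _ in range(k)]
    for members in by_class.values():
        rng.shuffle(members)
        for position, index in enumerate(members):
            folds[position % k].append(index)
    return folds
```

With 1,177 labels of which roughly 40.4% are positive (for instance 476 transfused and 701 not, an assumed breakdown), this split yields 942 training and 235 testing indices, matching the cohort sizes reported above.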

For the classification task, evaluation metrics included accuracy, precision, and recall, along with the area under the receiver operating characteristic curve (AUC-ROC). For the regression tasks, mean squared error (MSE) and mean absolute error (MAE) were calculated after inverse transformation, so that errors are expressed in actual transfusion volumes.
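
Computing regression errors in original units means undoing the preprocessing before comparing predictions with observed volumes. The sketch below assumes the target was transformed with log1p and then Min-Max scaled, so the inverse applies those steps in reverse order. The order, and whether both steps were applied to the target at all, are not stated in the article.

```python
import math


def forward_transform(value: float, log_min: float, log_max: float) -> float:
    """Apply log1p, then Min-Max scale using the given log-space bounds."""
    return (math.log1p(value) - log_min) / (log_max - log_min)


def inverse_transform(scaled: float, log_min: float, log_max: float) -> float:
    """Undo Min-Max scaling, then undo log1p with expm1."""
    return math.expm1(scaled * (log_max - log_min) + log_min)


def mse(y_true: list, y_pred: list) -> float:
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true: list, y_pred: list) -> float:
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
```

Predictions made in the transformed space would be passed through `inverse_transform` first, and `mse` and `mae` would then be computed against the untransformed observed volumes.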

Conclusion

The study shows how structured clinical data collection, type-aware preprocessing, and multi-task ensemble modeling can be combined to support transfusion decisions in AUGIB. With internal testing and external validation at a second hospital, the approach is aimed at improving clinical decision-making and patient outcomes in emergency medicine.
