Saturday, August 2, 2025

Enhancing Patient Care: Developing a Natural Language Processing Model for Extracting Patient-Reported Symptoms

Exploring NLP in Pharmaceutical Care: A Study on Extracting Symptoms and Adverse Events

Introduction

As healthcare adopts more technology, the way clinical information is processed and interpreted is changing. This article looks at a study, approved by the ethics committee for research involving human subjects at the Keio University Faculty of Pharmacy, that aimed to improve the extraction of symptoms and adverse events (AEs) from clinical texts in pharmaceutical care records.

Overview of the Study

The goal of the research was to apply Natural Language Processing (NLP) methods to extract patient-reported information from the “Subjective” sections of pharmaceutical care records maintained by community pharmacists. The study had two parts: first, evaluating the performance of an existing general-purpose Named Entity Recognition (NER) model for Japanese clinical documents; second, developing a new NER model customized for this research.

Methodology

The research used two layers of annotation: patient-centric (PCA) and clinician-centric (CCA). PCA focused on expressions from the patient’s perspective, whereas CCA applied clinician-oriented NER guidelines. Annotating the same texts both ways allowed symptom extraction to be assessed from each perspective.

A central part of the study was creating and refining ground-truth data, which became the basis for evaluating the models. Researchers compared model outputs against these annotations to measure each model’s accuracy.

Data Collection: Pharmaceutical Care Records

The study drew on 2,180,902 pharmaceutical care records covering 291,150 patients. These records, collected from Nakajima Pharmacy (a community pharmacy network in Hokkaido, Japan), were filtered to isolate patient statements from the “Subjective” sections. The focus was on patients prescribed anticancer drugs, which enriched the dataset with records likely to contain descriptions of AEs.

Records were restricted to a limited time frame, and entries were randomly sampled across pharmacies, yielding a dataset of 1,054 records from 314 unique patients.
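The filter-then-sample step described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's actual pipeline: the record layout, the field names (`pharmacy_id`, `anticancer`, `subjective`), the example texts, and the `sample_records` helper are all hypothetical, and the article does not state how many records were drawn per pharmacy.

```python
import random

# Hypothetical records: every field name and value here is made up for illustration.
RECORDS = [
    {"record_id": 1, "patient_id": "P1", "pharmacy_id": "A", "anticancer": True,
     "subjective": "My fingertips have felt numb since last week."},
    {"record_id": 2, "patient_id": "P2", "pharmacy_id": "A", "anticancer": False,
     "subjective": "No problems with the blood pressure tablets."},
    {"record_id": 3, "patient_id": "P3", "pharmacy_id": "B", "anticancer": True,
     "subjective": "I felt sick for two days after the infusion."},
    {"record_id": 4, "patient_id": "P3", "pharmacy_id": "B", "anticancer": True,
     "subjective": ""},
    {"record_id": 5, "patient_id": "P4", "pharmacy_id": "B", "anticancer": True,
     "subjective": "My appetite is back to normal."},
]


def sample_records(records, per_pharmacy, seed=0):
    """Keep non-empty records of patients on anticancer drugs, then sample per pharmacy."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    by_pharmacy = {}
    for rec in records:
        if rec["anticancer"] and rec["subjective"].strip():
            by_pharmacy.setdefault(rec["pharmacy_id"], []).append(rec)

    sampled = []
    for pharmacy_id in sorted(by_pharmacy):
        pool = by_pharmacy[pharmacy_id]
        # Never ask for more records than the pharmacy has.
        sampled.extend(rng.sample(pool, min(per_pharmacy, len(pool))))
    return sampled


sampled = sample_records(RECORDS, per_pharmacy=1, seed=42)
print([rec["record_id"] for rec in sampled])
```

Sampling within each pharmacy, rather than from the pooled records, keeps any single large pharmacy from dominating the annotated set; whether the study stratified this way is an assumption.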

Data from Patient Blog Articles

To evaluate how well the newly developed NER model generalizes, the study also used patient-authored blogs from the “Life Palette” web community. This broadened the scope of the study and introduced a more diverse set of patient expressions related to symptoms and complaints.

Between March 2008 and November 2014, relevant blog articles were collected and analyzed. This additional data source helped validate the model’s applicability across various contexts, thus enriching the overall findings of the research.

Annotation Guidelines

The study employed two distinct annotation methodologies, CCA and PCA, both focused on extracting symptoms and AEs.

Clinician-Centric Annotations (CCA)

The CCA method adapted conventional NER guidelines to align with the expected outputs of the MedNER-CR-JA model, a pre-trained NER model designed to identify medical entities in clinical texts.

This approach detailed specific rules for recognizing terminologies, even accommodating colloquial expressions and identifying ambiguities common in clinical documentation. Noteworthy examples from this methodology included how to categorize expressions of symptom absence and phrases referencing medication control.

Patient-Centric Annotations (PCA)

Conversely, the PCA methodology zeroed in on the patients’ linguistic expressions, crafting a custom guideline that included nuanced definitions and tagging rules relevant to AEs. This customization allowed annotators to address both medical and everyday language, crucial for capturing a patient’s subjective experience accurately.

The PCA guidelines also included logic for categorizing symptoms based on temporality and expression nuances, reinforcing the study’s patient-centered approach.
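Before annotated spans can train an NER model, they are commonly converted into one label per token. The sketch below assumes the widely used BIO scheme (B = beginning of an entity, I = inside, O = outside); the article does not say which scheme the study used. The `SYMPTOM` label, the English example sentence, and the helper names are hypothetical, and whitespace tokenization stands in for a real Japanese tokenizer.

```python
def whitespace_tokens(text):
    """Split on whitespace and keep each token's character offsets."""
    tokens, pos = [], 0
    for word in text.split():
        start = text.index(word, pos)
        end = start + len(word)
        tokens.append((word, start, end))
        pos = end
    return tokens


def spans_to_bio(tokens, spans):
    """tokens: list of (text, start, end); spans: list of (start, end, label)."""
    tags = ["O"] * len(tokens)
    for span_start, span_end, label in spans:
        inside = False
        for i, (_, tok_start, tok_end) in enumerate(tokens):
            # A token belongs to the span only if it lies fully inside it.
            if tok_start >= span_start and tok_end <= span_end:
                tags[i] = ("I-" if inside else "B-") + label
                inside = True
    return tags


text = "fingers feel numb after the new medicine"
phrase = "fingers feel numb"  # the annotated patient expression
start = text.index(phrase)
spans = [(start, start + len(phrase), "SYMPTOM")]  # hypothetical label name

tokens = whitespace_tokens(text)
tags = spans_to_bio(tokens, spans)
for (word, _, _), tag in zip(tokens, tags):
    print(f"{word}\t{tag}")
```

Here the whole everyday phrase is tagged as one entity, which is the kind of patient wording the PCA guidelines are described as capturing.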

Development of New Named Entity Recognition Models

To improve symptom extraction, the researchers developed new NER models using the PCA data. These models were built on pre-trained Transformer language models such as BERT and LUKE.

The development process involved a two-stage training approach: initial pre-training and subsequent fine-tuning using the specially curated PCA dataset. During fine-tuning, additional layers such as a Conditional Random Field (CRF) were incorporated to optimize sequence labeling.
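To show why a CRF layer helps sequence labeling, the sketch below implements Viterbi decoding, the standard way a linear-chain CRF picks the best tag sequence at prediction time. It is an illustration only: the scores are invented, the three-tag set is an assumption, and in a real model the emission scores would come from the encoder (for example BERT) and the transition scores would be learned during fine-tuning.

```python
def viterbi(emissions, transitions, tags):
    """Return the highest-scoring tag path.

    emissions: one dict per token mapping tag -> score.
    transitions: dict mapping (previous_tag, current_tag) -> score; missing pairs score 0.
    """
    # best[t][tag] = (score of the best path ending in `tag` at position t, previous tag)
    best = [{tag: (emissions[0][tag], None) for tag in tags}]
    for t in range(1, len(emissions)):
        column = {}
        for cur in tags:
            prev_tag, prev_score = max(
                ((p, best[t - 1][p][0] + transitions.get((p, cur), 0.0)) for p in tags),
                key=lambda item: item[1],
            )
            column[cur] = (prev_score + emissions[t][cur], prev_tag)
        best.append(column)

    # Backtrack from the best final tag.
    last = max(tags, key=lambda tag: best[-1][tag][0])
    path = [last]
    for t in range(len(emissions) - 1, 0, -1):
        last = best[t][last][1]
        path.append(last)
    path.reverse()
    return path


TAGS = ["O", "B", "I"]
# Invented emission scores for a three-token phrase.
EMISSIONS = [
    {"O": 1.0, "B": 0.8, "I": 0.0},
    {"O": 0.2, "B": 0.1, "I": 1.0},
    {"O": 0.3, "B": 0.0, "I": 0.9},
]
# Penalize the invalid step O -> I (an entity cannot start with "inside").
TRANSITIONS = {("O", "I"): -5.0}

greedy = [max(TAGS, key=lambda tag: scores[tag]) for scores in EMISSIONS]
decoded = viterbi(EMISSIONS, TRANSITIONS, TAGS)
print("greedy :", greedy)
print("viterbi:", decoded)
```

Choosing the top tag for each token independently yields an invalid sequence that begins an entity with `I`, whereas the transition penalty steers Viterbi to a well-formed `B, I, I` path.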

Performance Evaluation Metrics

Model performance was evaluated under two matching criteria, position matching and exact matching. The study used the standard metrics of precision, recall, and F1 score to measure each model’s effectiveness.

  1. Precision is the fraction of predicted symptoms that were correct.
  2. Recall is the fraction of actual (annotated) symptoms that the model predicted.
  3. F1 score is the harmonic mean of precision and recall, which is particularly valuable when dealing with imbalanced datasets.

These metrics facilitated a clear comparison of model outputs against annotated data, contributing to a thorough evaluation of both the pre-existing MedNER-CR-JA model and the newly developed models.
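The sketch below computes these three metrics over entity spans under two matching rules. The article does not define its criteria, so the definitions here are assumptions: "exact" requires identical start and end offsets, and "position" is read as any character overlap between a predicted and an annotated span. The span values are invented.

```python
def exact_match(pred, gold):
    """Spans match only if start and end offsets are identical."""
    return pred == gold


def overlap_match(pred, gold):
    """Spans match if they share at least one character (end offsets are exclusive)."""
    return pred[0] < gold[1] and gold[0] < pred[1]


def precision_recall_f1(pred_spans, gold_spans, match):
    # Predictions that match some annotated span count toward precision.
    matched_preds = sum(any(match(p, g) for g in gold_spans) for p in pred_spans)
    # Annotated spans that were found by some prediction count toward recall.
    matched_golds = sum(any(match(p, g) for p in pred_spans) for g in gold_spans)
    precision = matched_preds / len(pred_spans) if pred_spans else 0.0
    recall = matched_golds / len(gold_spans) if gold_spans else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Invented (start, end) character spans for one record.
gold = [(0, 17), (30, 36)]
pred = [(0, 17), (28, 36), (50, 55)]

exact = precision_recall_f1(pred, gold, exact_match)
overlap = precision_recall_f1(pred, gold, overlap_match)
print("exact  : P=%.2f R=%.2f F1=%.2f" % exact)
print("overlap: P=%.2f R=%.2f F1=%.2f" % overlap)
```

In this example the second prediction starts two characters early, so it counts as an error under exact matching but as a hit under overlap matching, which is why the two criteria can give noticeably different scores for the same model.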

Applicability to Broader Patient Narratives

Additionally, the researchers investigated the model’s applicability to diverse data sources by utilizing blog entries from patients with breast cancer. By cross-referencing the annotated complaints with clinical texts, the team aimed to determine how well the new model could extract clinically relevant symptoms from patient narratives, further emphasizing its relevance in real-world applications.

In essence, the study contributes to advancing NLP capabilities in clinical settings and highlights the importance of adapting technology to the needs of both healthcare professionals and patients. Models of this kind could enhance patient care by giving a more accurate picture of patients’ experiences and symptoms, which may in turn improve treatment outcomes.
