Understanding the Use of GenAI in Patient Message Drafting
Overview of GenAI Research in Healthcare
Healthcare organizations are increasingly turning to Generative AI (GenAI) to improve patient communication. A synthesis of recent studies shows how GenAI is being used to draft responses to patient messages, with the aim of improving both efficiency and content quality in clinical settings.
Study Selection and Characteristics
A comprehensive literature search was conducted across five prominent databases: ACM Digital Library, IEEE Xplore, PubMed, Scopus, and Web of Science, yielding 3,980 potentially relevant papers. After eliminating duplicates, 1,977 articles remained for screening. Rigorous evaluation based on titles and abstracts led to 693 papers progressing to full-text review. Ultimately, 23 studies met the established inclusion criteria, all published between 2023 and 2025, with the majority originating from the United States.
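The screening funnel described above can be checked with a few lines of arithmetic. The following is a minimal sketch that uses only the counts reported in the text; the stage labels and the `stage_drops` helper are illustrative names, not taken from the review.

```python
# Counts come from the paragraph above; the stage labels are illustrative.
funnel = [
    ("records identified", 3980),
    ("after duplicate removal", 1977),
    ("full-text review", 693),
    ("included studies", 23),
]

def stage_drops(stages):
    """Return (stage name, records excluded, share retained) per transition."""
    rows = []
    for (_, prev), (name, cur) in zip(stages, stages[1:]):
        rows.append((name, prev - cur, cur / prev))
    return rows

for name, excluded, retained in stage_drops(funnel):
    print(f"{name}: excluded {excluded}, retained {retained:.1%}")
```

Read this way, 2,003 records were removed as duplicates, 1,284 were excluded at title and abstract screening, and 670 were excluded at full-text review.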
Among these, 16 papers were featured in medical and informatics journals such as JAMA Network Open and Journal of the American Medical Informatics Association, while seven were published in specialty-focused journals like Urology Practice and Ophthalmology Science. The publication types varied, with 17 being full-length research articles, along with a mix of research letters, brief communications, a perspective, and a commentary.
Settings and Objectives
These studies primarily evaluated GenAI applications in two distinct settings: live Electronic Health Record (EHR) systems and simulated environments. Seven studies investigated GenAI’s performance in real-world EHR systems, specifically within the Epic system, utilizing OpenAI’s GPT-4 to generate draft replies. Most sought to assess the content quality of AI-generated drafts, while others examined the operational efficiency of GenAI implementations and its effect on user perceptions.
The remaining 16 studies ran their assessments in controlled, simulated environments, focusing largely on content quality, response accuracy, and user feedback on AI-generated drafts. Different versions of GPT models, including customized and retrieval-augmented variants, were frequently tested against a range of communication scenarios.
Evaluating GenAI in Clinical Contexts
Primary care—encompassing internal medicine, family medicine, and pediatrics—was the backdrop for many of the studies, with researchers also addressing various specialty domains such as dermatology, urology, and oncology. Topics of patient communication ranged from basic administrative inquiries, like appointment scheduling, to more complex medical inquiries.
Participant Dynamics in Studies
Participants in these studies included a range of clinical experts (physicians, advanced practice providers, nurses, and medical trainees) who reviewed AI drafts and provided feedback. A smaller number of studies also included non-clinical participants, mainly patient advisors, who rated the tone of the responses and their overall satisfaction with them.
Methods of Evaluation
The studies employed an array of evaluation methods, primarily utilizing human ratings on Likert scales to assess GenAI responses. Additional computational metrics were applied in some cases to measure text length, response time, and overall accuracy.
Evaluation outcomes were categorized into five primary groups:
- Information Quality: This included assessments of accuracy, completeness, relevance, and factual integrity of AI-generated replies.
- Communication Quality: Here, aspects such as empathy, tone, readability, and overall clarity were key focal points.
- User Perception, Experience, and Preference: Studies captured how actual users perceived the AI responses, including trust in the AI’s capabilities.
- Utilization and Efficiency: Metrics were gathered on how often AI drafts were used versus blank replies, alongside time spent on reading and editing.
- Composite Measures: These combined elements of quality and appropriateness in assessing overall response adequacy.
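As an illustration of how Likert ratings in these categories might be aggregated, here is a minimal sketch. The ratings, the dimension names, and the `summarize` helper are hypothetical and are not drawn from any of the reviewed studies; individual studies may have used different scales or summary statistics.

```python
from statistics import mean, median

# Hypothetical 1-5 Likert ratings from three reviewers for a single AI draft.
ratings = {
    "information_quality": [4, 5, 4],
    "communication_quality": [5, 5, 4],
}

def summarize(scores):
    """Return the mean (rounded to 2 places) and median rating per dimension."""
    return {
        dim: {"mean": round(mean(vals), 2), "median": median(vals)}
        for dim, vals in scores.items()
    }

summary = summarize(ratings)
for dim, stats in summary.items():
    print(dim, stats)
```

Reporting the median alongside the mean is a common choice for ordinal scales, since Likert points are not guaranteed to be evenly spaced.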
Consensus from Early Findings
Findings across the 23 studies indicated a favorable reception of GenAI drafts, which often matched or surpassed human-written responses in several areas. Drafts were frequently rated as comparable to human responses in information accuracy and as more effective in communication style, especially in terms of empathy. GPT-4 emerged as the most successful model in these evaluations.
However, several risks were identified, including inconsistent AI performance, especially in complex clinical inquiries. Alarmingly, a few studies highlighted scenarios where AI drafts posed potential risks to patient safety.
Perception of AI Responses
Participants struggled to distinguish AI-generated from human-written content and often failed to identify authorship accurately in blinded evaluations. Although many rated AI responses positively, satisfaction dipped slightly when AI involvement was disclosed.
Adoption Challenges Despite Positive Attitudes
Despite recognizing the benefits of GenAI, real-world adoption remained subdued. Clinical experts appreciated the value of AI-drafted templates for easing their workload, but the actual utilization rates were relatively low, staying around 20% across pilot implementations.
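A utilization rate of this kind is typically a simple ratio. The sketch below assumes it is defined as the share of offered drafts that clinicians actually used as the starting point for a reply; that definition, the function name, and the example counts are assumptions for illustration, and individual studies may have measured utilization differently.

```python
def utilization_rate(drafts_used: int, drafts_offered: int) -> float:
    """Share of offered AI drafts that were used as the basis for a reply."""
    if drafts_offered <= 0:
        raise ValueError("drafts_offered must be positive")
    if not 0 <= drafts_used <= drafts_offered:
        raise ValueError("drafts_used must be between 0 and drafts_offered")
    return drafts_used / drafts_offered

# Hypothetical pilot: 40 of 200 offered drafts were used.
rate = utilization_rate(40, 200)
print(f"utilization: {rate:.0%}")
```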
Efficiency and Burnout Insights
No significant evidence of time savings was found, but clinicians reported perceived improvements in efficiency and reduced cognitive load. AI drafts were longer than human-written replies, which added to the editing burden, yet clinicians often felt that GenAI tools eased workload pressures.
The Role of Prompt Engineering
Prompt engineering emerged as a crucial strategy in enhancing GenAI’s performance. Studies showed that iterative improvements in prompts led to better user acceptance and increased satisfaction with AI-generated messages. Specific prompts were crafted to assist in producing clearer and more relevant content.
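To make the idea of iterating on a prompt concrete, here is a minimal sketch of a parameterized prompt template. The wording of the instructions, the `build_prompt` function, and the `max_words` parameter are all hypothetical; none of them is taken from the reviewed studies or from any vendor's configuration, and the sketch does not call a model.

```python
# Hypothetical instruction text; real deployments would refine this iteratively.
TEMPLATE = (
    "You are drafting a reply to a patient portal message for clinician review.\n"
    "Write in plain language, keep an empathetic tone, and stay under "
    "{max_words} words.\n"
    "Do not give a diagnosis or change medications; suggest contacting the "
    "clinic when the question needs clinical judgment.\n\n"
    "Patient message:\n{patient_message}\n\n"
    "Draft reply:"
)

def build_prompt(patient_message: str, max_words: int = 150) -> str:
    """Fill the template with the patient's message and a length limit."""
    message = patient_message.strip()
    if not message:
        raise ValueError("patient_message must not be empty")
    return TEMPLATE.format(max_words=max_words, patient_message=message)

print(build_prompt("Can I get a refill of my blood pressure medication?"))
```

Keeping constraints such as the length limit as parameters makes it easier to compare prompt variants, for example when testing whether shorter drafts reduce editing effort.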
Through structured evaluation, these studies document the growing role of GenAI in healthcare communication and point to areas that need further research and development.