Background: Surveys show that many people are willing to use generative artificial intelligence (AI) for health questions. Prior research has largely focused on chatbot accuracy, with some studies finding that both physicians and consumers overwhelmingly prefer chatbot-generated text over physician responses.

Objective: This study aimed to characterize and compare the emotional content of responses from physicians and 2 AI chatbots (OpenAI’s ChatGPT and Google’s Gemini) and to assess differences in reading level and use of medical disclaimers.

Methods: A public, patient-deidentified telehealth website was used to compile 100 physician-answered questions. The same questions were posed to both chatbots between May 18 and 19, 2025. Two coders classified the emotional content of each sentence using a predefined codebook and reviewed the classifications for agreement. Emotions were ranked as primary, secondary, and tertiary by the proportion of sentences classified as each emotion per response. Multinomial logistic regression compared emotional rankings using physician responses as the reference. Word count, Flesch Reading Ease, and Flesch-Kincaid Grade Level were analyzed via ANOVA with the Tukey honestly significant difference test. Disclaimer use was compared between chatbots using a χ² test.

Results: Primary emotions were overwhelmingly neutral; the only exceptions were one response from each chatbot in which anger was primary. For secondary emotions, the odds of hope were 80.28% (95% CI 37.71%-93.76%) lower for ChatGPT, while the odds of fear were 3.29 (95% CI 1.44-7.49) times higher for Gemini. For tertiary emotions, the odds of compassion were 1.94 (95% CI 1.06-3.54) times higher, and the odds of having no tertiary emotion were 84.33% (95% CI 64.72%-93.04%) lower, for Gemini. Gemini responses averaged 889.1 (SD 305.7) words, ChatGPT 476.5 (SD 109.5), and physicians 193.5 (SD 113.6).
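The two readability metrics used in the analysis are closed-form formulas over words, sentences, and syllables. A minimal sketch of both (the syllable counter below is a naive vowel-group heuristic, an assumption for illustration; published tools use more careful counters):

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable count via vowel groups; drops a trailing silent 'e'."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith(("le", "ee")) and n > 1:
        n -= 1
    return max(n, 1)


def readability(text: str) -> tuple[float, float]:
    """Return (Flesch Reading Ease, Flesch-Kincaid Grade Level).

    FRE  = 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words)
    FKGL = 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59
    Higher FRE = easier text; higher FKGL = higher US grade level.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / len(sentences)   # average words per sentence
    spw = syllables / len(words)        # average syllables per word
    fre = 206.835 - 1.015 * wps - 84.6 * spw
    fkgl = 0.39 * wps + 11.8 * spw - 15.59
    return round(fre, 1), round(fkgl, 1)
```

The formulas make the reported pattern intuitive: Gemini's longer sentences and more polysyllabic vocabulary push its Flesch Reading Ease down and its grade level up relative to physician responses.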
Gemini had the lowest average Flesch Reading Ease score at 39.9 (SD 8.8), followed by ChatGPT at 45.8 (SD 12.8), while physicians had the highest at 51.9 (SD 13.6). Gemini had the highest average Flesch-Kincaid Grade Level at 11.3 (SD 1.5), followed by ChatGPT at 9.9 (SD 1.9) and physicians at 9.2 (SD 2.4). Gemini was significantly more likely than ChatGPT to include a disclaimer (χ²₁=49.2; P<.001).

Conclusions: Chatbot responses were significantly (P<.001) longer and more difficult to read than physician responses and were more likely to contain a wider range of emotions. Qualitatively, chatbot responses were more varied both in their presentation and in the breadth of the emotions themselves. The findings of this study could inform more emotionally connected physician responses to patient message queries.

Introduction

Health communication extends beyond facts to include emotional reassurance, persuasion, and support, all of which are central to how people make health decisions. As more people turn to the internet for health information, generative art...