With healthcare systems facing mounting strain from long wait times and rising expenses, a growing number of people are turning to AI chatbots like ChatGPT for help diagnosing their symptoms. According to a recent survey, around one in six American adults now seek health advice from chatbots at least once a month.
Yet depending too heavily on chatbot responses can be problematic, partly because users often don’t know how to phrase their questions to get accurate or useful health suggestions. A recent Oxford-led study found that this gap in communication significantly undermines the effectiveness of these tools.
“There’s a breakdown on both sides,” said Adam Mahdi, director of graduate studies at the Oxford Internet Institute and a co-author of the study, in an interview with TechCrunch. “People using these tools weren’t making better decisions than those relying on traditional methods like online searches or personal judgment.”
Researchers gave roughly 1,300 UK participants doctor-created medical scenarios and asked them to identify possible conditions and decide on next steps (such as seeing a doctor or going to the hospital), both with the help of chatbots and using their own usual approaches. The study tested three popular AI models: OpenAI’s GPT-4o (which powers ChatGPT), Cohere’s Command R+, and Meta’s Llama 3.
The results were concerning. Not only did chatbot users struggle to correctly identify relevant conditions, but they also tended to downplay how serious those conditions might be. Participants frequently left out important details when prompting the chatbots, and the advice they received in return mixed solid insights with questionable guidance.
“The answers often blended strong and weak recommendations, which made them hard to interpret,” Mahdi noted. He also pointed out that current methods for evaluating AI models don’t adequately capture the complexity of human interaction.
This study comes at a time when major tech companies are investing heavily in AI for health. Apple is reportedly working on an AI assistant focused on wellness, while Amazon is developing tools to analyze health-related social data, and Microsoft is building AI systems to help triage patient messages.
Still, there’s hesitancy across the medical field. The American Medical Association discourages doctors from using tools like ChatGPT to make clinical decisions, and even the developers of these models caution against relying on them for diagnoses.
“Healthcare decisions should come from trusted sources,” Mahdi emphasized. “Just like new medications go through clinical trials, AI systems need real-world testing before they’re rolled out.”