A new survey has found that one in five general practitioners (GPs) in the UK are using AI tools like ChatGPT to assist with daily tasks such as suggesting diagnoses and writing patient letters.
The research, published in the journal BMJ Health and Care Informatics, surveyed 1,006 GPs across the UK about their use of AI chatbots in clinical practice.
Some 20% reported using generative AI tools, with ChatGPT being the most popular. Of those using AI, 29% said they employed it to generate documentation after patient appointments, while 28% used it to suggest potential diagnoses.
“These findings signal that GPs may derive value from these tools, particularly with administrative tasks and to support clinical reasoning,” the study authors noted.
We have no idea how many papers OpenAI used to train its models, but it’s certainly more than any doctor could have read. ChatGPT gives quick, convincing answers and is very easy to use, unlike searching research papers manually.
Does that mean ChatGPT is generally accurate for clinical advice? Absolutely not. Large language models (LLMs) like ChatGPT are pre-trained on massive amounts of general data, making them more flexible but dubiously accurate for specific medical tasks.
It’s also easy to lead them on: the models tend to side with the user’s assumptions, a problematically sycophantic behavior.
Moreover, some researchers state that ChatGPT can be conservative or prudish when handling delicate topics like sexual health.
As Stephen Hughes from Anglia Ruskin University wrote in The Conversation, “I asked ChatGPT to diagnose pain when passing urine and a discharge from the male genitalia after unprotected sexual intercourse. I was intrigued to see that I received no response. It was as if ChatGPT blushed in some coy computerised way. Removing mentions of sexual intercourse resulted in ChatGPT giving a differential diagnosis that included gonorrhoea, which was the condition I had in mind.”
As Dr. Charlotte Blease, lead author of the study, commented: “Despite a lack of guidance about these tools and unclear work policies, GPs report using them to assist with their job. The medical community will need to find ways to both educate physicians and trainees about the potential benefits of these tools in summarizing information but also the risks in terms of hallucinations, algorithmic biases and the potential to compromise patient privacy.”
That last point is key. Passing patient information into AI systems likely constitutes a breach of privacy and patient trust.
Dr. Ellie Mein, medico-legal adviser at the Medical Defence Union, agreed on the key issues: “Along with the uses identified in the BMJ paper, we’ve found that some doctors are turning to AI programs to help draft complaint responses for them. We have cautioned MDU members about the issues this raises, including inaccuracy and patient confidentiality. There are also data protection considerations.”
She added: “When dealing with patient complaints, AI drafted responses may sound plausible but can contain inaccuracies and reference incorrect guidelines which can be hard to spot when woven into very eloquent passages of text. It’s vital that doctors use AI in an ethical way and comply with relevant guidance and regulations.”
Probably the most critical questions amid all this are: How accurate is ChatGPT in a medical context? And how great might the risks of misdiagnosis or other issues be if this continues?
Generative AI in medical practice
As GPs increasingly experiment with AI tools, researchers are working to evaluate how they compare to traditional diagnostic methods.
A study published in Expert Systems with Applications conducted a comparative analysis between ChatGPT, conventional machine learning models, and other AI systems for medical diagnoses.
The researchers found that while ChatGPT showed promise, it was often outperformed by traditional machine learning models specifically trained on medical datasets. For example, multi-layer perceptron neural networks achieved the highest accuracy in diagnosing diseases based on symptoms, with rates of 81% and 94% on two different datasets.
Researchers concluded that while ChatGPT and similar AI tools show potential, “their answers can be often ambiguous and out of context, so providing incorrect diagnoses, even if it is asked to provide an answer only considering a specific set of classes.”
This aligns with other recent studies examining AI’s potential in medical practice.
For example, research published in JAMA Network Open tested GPT-4’s ability to analyze complex patient cases. While it showed promising results in some areas, GPT-4 still made errors, some of which could be dangerous in real clinical scenarios.
There are some exceptions, though. One study conducted by the New York Eye and Ear Infirmary of Mount Sinai (NYEE) demonstrated how GPT-4 can match or exceed human ophthalmologists in diagnosing and treating eye diseases.
For glaucoma, GPT-4 provided highly accurate and detailed responses that exceeded those of real eye specialists.
AI developers such as OpenAI and NVIDIA are training purpose-built medical AI assistants to support clinicians, hopefully making up for shortfalls in base frontier models like GPT-4.
OpenAI has already partnered with health tech company Color Health to create an AI “copilot” for cancer care, demonstrating how these tools are set to become more specific to clinical practice.
Weighing up benefits and risks
There are countless studies comparing specially trained AI models to humans in identifying diseases from diagnostic images such as MRI and X-ray.
AI techniques have outperformed doctors in everything from cancer and eye disease diagnosis to Alzheimer’s and Parkinson’s early detection. One, named “Mia,” proved effective in analyzing over 10,000 mammogram scans, flagging known cancer cases, and uncovering cancer in 11 women that doctors had missed.
However, these purpose-built AI tools are certainly not the same as pasting notes and findings into a language model like ChatGPT and asking it to infer a diagnosis from that alone.
Nevertheless, that’s a difficult temptation to resist. It’s no secret that healthcare services are overwhelmed. NHS waiting times continue to soar to all-time highs, and even obtaining GP appointments in some areas is a grim task.
AI tools target time-consuming admin, which explains their allure for overwhelmed doctors. We’ve seen this mirrored across numerous public sector fields, such as education, where teachers are widely using AI to create materials, mark work, and more.
So, will your doctor paste your notes into ChatGPT and write you a prescription based on the results at your next visit? Quite possibly. It’s just another frontier where the technology’s promise to save time is so hard to deny.
The best path forward may be to develop a code of use. The British Medical Association has called for clear policies on integrating AI into clinical practice.
“The medical community will need to find ways to both educate physicians and trainees and guide patients about the safe adoption of these tools,” the BMJ study authors concluded.
Aside from advice and education, ongoing research, clear guidelines, and a commitment to patient safety will be essential to realizing AI’s benefits while offsetting risks.