A new study evaluating the quality of eye health advice generated by an artificial intelligence (AI) chatbot compared with that of ophthalmologists has found the two are comparable for a range of patient questions, regardless of their complexity.
The Stanford University-led team said large language models (LLMs) like ChatGPT appear capable of performing various tasks, including answering patient eyecare questions, but have not yet been evaluated in direct comparison with ophthalmologists.
It comes after a separate study earlier in 2023 that found ChatGPT answered almost 50% of questions correctly in an exam resource commonly used by trainees preparing for ophthalmology certification.
This latest cross-sectional study was based on 200 eyecare questions from an online medical forum, which received responses written by American Academy of Ophthalmology (AAO)–affiliated ophthalmologists.
Next, a masked panel of eight board-certified ophthalmologists was asked to distinguish between answers generated by the ChatGPT chatbot and answers written by humans. The panel did so with 61% accuracy.
Ratings of the quality of chatbot and human answers were not significantly different regarding inclusion of incorrect information, likelihood of harm caused, extent of harm, or deviation from ophthalmologist community standards, the study found.
“These results suggest ophthalmologists and a large language model may provide comparable quality of ophthalmic advice for a range of patient questions, regardless of their complexity,” the researchers stated in JAMA Ophthalmology.
In other findings, of the 800 evaluations of chatbot-written answers, 168 (21.0%) were marked as human-written, while 517 of the 800 evaluations of human-written answers (64.6%) were marked as AI-written.
Compared with human answers, chatbot answers were more frequently rated as probably or definitely written by AI. However, chatbot answers were comparable with human answers in their likelihood of containing incorrect or inappropriate material, and did not differ in either the likelihood or the extent of harm.
In their conclusion, the researchers noted additional research was needed to assess patient attitudes toward LLM-augmented ophthalmologists vs fully autonomous AI content generation.
“[This is] to evaluate clarity and acceptability of LLM-generated answers from the patient perspective, to test the performance of LLMs in a greater variety of clinical contexts, and to determine an optimal manner of utilizing LLMs that is ethical and minimises harm,” they wrote.
More reading
ChatGPT scores nearly 50% on ophthalmology practice exam