Research exploring the applications of large language models like ChatGPT in ophthalmology has exploded onto the scene in the last two years. Experts discuss how far the technology has come, and how far it still must go if it’s to help carry the workload of eyecare professionals.
A reality where autonomous artificial intelligence (AI) systems are integrated into everyday practice and used for the diagnosis and management of disease seemed implausible not long ago.
Part of the Australian optometry landscape saw a shift in its approach to disease management with the emergence of technology like Eyetelligence (since rebranded as Optain) – an AI software tool, developed by Australian experts, that helps clinicians screen, grade and make management decisions for glaucoma, neovascular age-related macular degeneration (nAMD) and diabetic retinopathy (DR).
Developed by Professor Mingguang He from the Centre for Eye Research Australia (CERA), the platform has been rolled out in optometry practices across Australia and New Zealand, and international markets such as Japan, Europe and the United States.
And in 2023, the Food and Drug Administration (FDA) approved EyeArt – the first and only AI technology cleared by the FDA to detect both mild and vision-threatening DR without human oversight.
Now, a body of research is emerging demonstrating that AI – and large language models (LLMs) in particular – is on track to match the knowledge of eyecare professionals in practice. Alongside the ability to analyse ophthalmic images, AI now offers the potential to aid more complex diagnoses and treatment plans.
As computing power has increased, the emergence of deep learning about a decade ago brought numerous developments in AI, including LLMs, which came onto the scene in a big way with the debut of ChatGPT in November 2022. Within the past two years, LLMs in ophthalmology have begun appearing in the medical literature.
Currently, the technology has applications in medical transcription, electronic health record enhancement, clinical decision support and automated patient communication, among others. Beyond the practice, AI is being explored in physician education and research – incorporated into lectures to familiarise the new generation of physicians with the emerging technology.
In one recent study published in Eye, the Google Gemini AI chatbot, formerly known as Bard, was shown to pass ophthalmology board exams.
Meanwhile, another study, published in February 2024 in JAMA Ophthalmology, showed that LLMs can match or outperform ophthalmologists in the diagnosis and recommended treatment of patients with glaucoma and retinal disease.
This research from the New York Eye and Ear Infirmary of Mount Sinai (NYEE), conducted by ophthalmology resident Dr Andy Huang and his team, explored the potential of LLMs – specifically GPT-4 – by assessing the diagnostic accuracy and comprehensiveness of LLM-generated responses compared with those of fellowship-trained glaucoma and retina specialists.
It aimed to validate the potential utility of LLMs as an adjunct in practising evidence-based medicine, particularly in ophthalmic subspecialties where access to specialists may be limited.
For Dr Huang, it was his own application of LLMs in his day-to-day tasks that inspired him to explore the concept further.
“Personally, I found the application of GPT-4 extremely useful and enjoyable, as it significantly increased efficiency and broadened my diagnostic capabilities for day-to-day tasks in both the hospital and clinic,” Dr Huang tells Insight. “This experience inspired me to explore its utility on an evidence-based level and ultimately led to the initiation of this project.”
The team recruited 12 attending specialists and three senior trainees from the Department of Ophthalmology at the Icahn School of Medicine at Mount Sinai.
A basic set of 20 questions (10 each for glaucoma and retina) from the American Academy of Ophthalmology’s list of commonly asked questions by patients was randomly selected, along with 20 de-identified patient cases from Mount Sinai-affiliated eye clinics.
Responses from both the GPT-4 system and human specialists were then statistically analysed and rated for accuracy.
The results showed that AI matched or outperformed human specialists in both accuracy and completeness of its medical advice and assessments. More specifically, AI demonstrated superior performance in response to glaucoma questions and case-management advice, while reflecting a more balanced outcome in retina questions, where AI matched humans in accuracy but exceeded them in completeness.
The results, for Dr Huang and the research team, were both surprising and anticipated.
“While we expected the LLM to perform well due to its advanced training on vast amounts of big data and medical data, the extent to which it matched or outperformed human specialists, especially generating assessments and plans for glaucoma cases, was remarkable,” he says.
He adds that the results highlighted the model’s potential to support and enhance clinical decision-making in ways the team had not fully anticipated.
“This research and its findings also inspired us to look into more specific utility within each subspecialty.”
The findings have significant implications for the future of eyecare, and Dr Huang says he envisions a future where AI tools are routinely used to augment clinical practice.
“This could lead to more accurate and comprehensive patient care, and reduced workload for ophthalmologists, which may ultimately improve patient outcomes,” he says.
AI could also provide valuable support in under-resourced settings, helping to bridge gaps in access to specialised care.
Looking ahead, Dr Huang says studies should focus on validating these findings across larger and more diverse patient populations, as well as in different clinical settings and sites, and on exploring the utility and integration of AI tools into existing clinical workflows and electronic health record systems.
However, he notes that the unprecedented capability of this technology comes with new challenges – which should be the focus of subsequent research.
“Future research must address the ethical, legal, and regulatory challenges associated with the use of AI in healthcare, ensuring that these technologies are used safely and equitably,” Dr Huang says.
“There are countless factors at play, and we do not currently have a grasp of the implications that usage of AI may have if fully integrated into healthcare.”
However, to reach integration into healthcare, AI must still be improved in terms of accuracy, reliability, and comprehensiveness.
“Technologically, these systems must be able to integrate seamlessly with clinical workflows and handle a wide range of ophthalmic conditions and medical conditions/physical limitations – which I predict will be a massive hurdle,” Dr Huang says.
Utility in practice
Dr Jorge Cuadros from the School of Optometry at the University of California, Berkeley in the US, says that although LLMs are just beginning to emerge in everyday ophthalmology practice, many patients have been using chatbots such as ChatGPT, LLaMA or Bard to describe symptoms and receive possible diagnoses, from conjunctivitis to retinal detachment.
Meanwhile, in eyecare clinics, he says the models have been used as support tools for differential diagnosis – sometimes detecting rare diseases from free-text inputs.
The models have also been used to expedite tedious and repetitive tasks, such as generating progress notes and providing personalised summaries for patients.
“LLMs applied to electronic medical records have been used with success to identify patients who could be enrolled in clinical trials or could benefit from uncommon treatments,” Dr Cuadros says.
“Google’s MedLM and several other models have adopted foundation models to refine their use to healthcare from simple decision tasks to complex workflows.”
He says that experts within the field have noted practitioners are experiencing burnout caused by the demands and complexity of information systems in healthcare.
“LLMs can relieve and potentially prevent this burnout by performing many repetitive and tedious tasks and allow practitioners to focus more on their patients,” Dr Cuadros says.
AI promises opportunities for patients to participate in their medical management, particularly related to chronic diseases. According to Dr Cuadros, patients will be able to compile and organise more data – much of it captured at home – to improve existing conditions and predict future conditions.
“Testing has already become less invasive as LLMs and other forms of AI allow us to more accurately determine, for example, whose glaucoma will progress faster, or who will develop myopia and other conditions,” he says.
Humans are irreplaceable
Meanwhile, in optometry, LLM applications haven't been widely explored and the extent of their current use in the field is unknown. However, clinician-scientist Dr Angelica Ly from the University of New South Wales (UNSW) sees their utility and potential in some contexts but warns that they are not a replacement for critical thinking.
According to Dr Ly, a common reason for using LLMs is to save time and improve efficiency as they are useful for automating tedious tasks.
“Some possible applications of LLMs in optometry include as an aid to write clinical notes, referrals and patient education materials. They might also be used for patient education, or clinical decision support,” she says.
“Unlike a healthcare practitioner, who has been trained to critically evaluate and apply their information sources, LLMs were not designed to do so. They simply generate ‘one word at a time’ based on probability. They cannot fact check. They regularly make things up and even generate references that do not exist.”
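Dr Ly's point about probability-driven, word-at-a-time generation can be illustrated with a toy sketch in Python. The vocabulary and probabilities below are entirely hypothetical, invented purely for illustration; a real LLM learns distributions over tens of thousands of tokens from its training data:

```python
import random

# Toy next-word probability tables (hypothetical values, illustration only).
NEXT_WORD_PROBS = {
    "the": {"patient": 0.5, "retina": 0.3, "diagnosis": 0.2},
    "patient": {"presented": 0.6, "reported": 0.4},
    "presented": {"with": 1.0},
    "with": {"blurred": 0.7, "red": 0.3},
    "blurred": {"vision": 1.0},
    "red": {"eyes": 1.0},
}

def generate(start: str, max_words: int = 6) -> str:
    """Generate text one word at a time by sampling the next word from
    its probability distribution; no step in the loop checks facts."""
    words = [start]
    for _ in range(max_words):
        dist = NEXT_WORD_PROBS.get(words[-1])
        if dist is None:  # no known continuation; stop
            break
        choices = list(dist)
        weights = list(dist.values())
        words.append(random.choices(choices, weights=weights)[0])
    return " ".join(words)

print(generate("the"))  # e.g. "the patient presented with blurred vision"
```

The sketch shows why fluency is no guarantee of truth: each word is chosen only because it is statistically likely to follow the previous one, which is how plausible-sounding but fabricated output can arise.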
She adds that it is reasonable and common to think of LLMs as search engines for information retrieval, especially because some LLM services have in-built web-browsing capabilities. However, LLMs should not be relied on as a source of information.
Because of the potential for LLMs to provide factually inaccurate or incomplete information, there is a risk that they may misinform patients and perpetuate existing biases – potentially causing harm.
“While LLMs show great promise in being able to support and enhance the efficiency of healthcare, they should augment, rather than replace clinician-led care,” Dr Ly says.
Although the responses look and sound convincing, Dr Ly says current LLMs are unable to reason and therefore are vulnerable to factual inaccuracies.
“As when chatting with humans, it is useful to consider how a person’s experiences or knowledge influence their speech. Likewise, the data sources used to train an LLM heavily influence its responses,” she says.
With LLMs, the more data that is available, the better the tool performs. Therefore, the scarcity and specificity of optometric data on the internet limit the performance of most general LLMs when attempting to address optometry-related queries.
Dr Ly says LLMs are not designed to reason and thus cannot assess the quality of their input data, making them susceptible to bias; they favour data that appear frequently on the internet – regardless of accuracy.
“A fun way to remember this is using the expression, ‘Garbage in, garbage out’ – GIGO for short – which describes when flawed input data produces nonsense output. Input data includes unvalidated sources from the world wide web, where anyone, including non-experts, can publish content,” she says.
In patient settings, one important attribute is empathy – something unique to humans. And although LLMs can be taught ‘artificial empathy’, this is no substitute for the verbal and non-verbal cues that humans are sensitive to.
Dr Cuadros agrees, saying although recent studies have shown that generative AI models were judged as more empathetic than human clinicians, there is still much value in creating a human connection between patients and their providers.
“Patients and practitioners create a bond that can sometimes last generations and adds to general wellbeing,” he says.
Dr Ly adds that LLMs can be trained to use empathetic language, providing responses comparable to experts in terms of empathy. However, in a healthcare context, empathy includes more than just words – it extends to verbal and non-verbal cues such as gestures, which often encourage patients to divulge more about their experience, including the data required for a correct diagnosis.
For Dr Ly, best practice when utilising LLMs is to use them as an adjunct because, for now, humans are irreplaceable.
“A good rule of thumb for eyecare professionals using LLMs is to always de-identify and double check, meaning be aware of the potential for privacy breaches using these technologies, and always double check for missing, biased or factually inaccurate content,” she says.
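As a concrete illustration of the ‘de-identify’ half of that rule, here is a minimal sketch of scrubbing obvious identifiers from a clinical note before it reaches an LLM tool. The patterns below are hypothetical examples only; a real clinical de-identification pipeline would need validated, far broader coverage:

```python
import re

# Hypothetical patterns for common identifiers (illustration only; not a
# validated clinical de-identification tool).
PATTERNS = [
    (re.compile(r"\b\d{2}/\d{2}/\d{4}\b"), "[DATE]"),                 # e.g. 14/05/2024
    (re.compile(r"\b(?:Mr|Mrs|Ms|Dr)\.?\s+[A-Z][a-z]+\b"), "[NAME]"),  # titled names
    (re.compile(r"\b\d{10}\b"), "[PHONE]"),                            # 10-digit numbers
    (re.compile(r"\b[A-Z]{2}\d{6}\b"), "[MRN]"),                       # made-up record format
]

def deidentify(note: str) -> str:
    """Replace obvious identifiers with placeholder tags before LLM use."""
    for pattern, placeholder in PATTERNS:
        note = pattern.sub(placeholder, note)
    return note

note = "Mrs Tan, MRN AB123456, seen 14/05/2024 re: suspected nAMD. Phone 0412345678."
print(deidentify(note))
# -> "[NAME], MRN [MRN], seen [DATE] re: suspected nAMD. Phone [PHONE]."
```

Even with such a step, the second half of the rule still applies: any LLM output built on that input must be double-checked for missing, biased or inaccurate content.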
Similarly, Dr Cuadros says that while the models may act autonomously, the legal and moral liability is still borne by the practitioner.
“Just as with clinical trials that guide our treatment of diseases, we must ensure that the ground truth used to develop algorithms is free from bias. This is more difficult to do when the ground truth comes from millions of subjects,” he says.
Adopting in practice
As with many innovations in healthcare, it takes time to adopt them into daily practice. While clinical trials and regulatory oversight ensure safety, there is growing interest in real-world validation before widespread adoption of new methods and technologies.
“This validation is more important due to the ‘black box’ nature of the models,” Dr Cuadros says. “Recent innovations allow us to view the features and aspects of the data that are used by the AI to make a decision, however, most of the time we still don’t know how they arrive at their conclusions.”
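One common family of techniques for peering inside the ‘black box’ Dr Cuadros describes is occlusion-based saliency: masking parts of an input and watching how the model’s output changes. Below is a toy sketch, with a stand-in scoring function in place of a real retinal-image model; every name and value here is hypothetical, not drawn from any tool mentioned in this article:

```python
import numpy as np

def model_score(image: np.ndarray) -> float:
    """Stand-in for a trained classifier's disease probability.
    (Hypothetical: it simply favours bright central pixels.)"""
    h, w = image.shape
    return float(image[h // 4 : 3 * h // 4, w // 4 : 3 * w // 4].mean())

def occlusion_saliency(image: np.ndarray, patch: int = 4) -> np.ndarray:
    """Mask each patch in turn; large score drops mark the regions
    the model relied on for its decision."""
    base = model_score(image)
    h, w = image.shape
    saliency = np.zeros((h // patch, w // patch))
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            occluded = image.copy()
            occluded[i : i + patch, j : j + patch] = 0.0  # mask this patch
            saliency[i // patch, j // patch] = base - model_score(occluded)
    return saliency

image = np.random.rand(16, 16)  # stand-in for a fundus photograph
print(occlusion_saliency(image).round(3))  # central patches score highest
```

Techniques like this reveal which features drove a decision, but, as Dr Cuadros notes, they still fall short of explaining how the model arrived at its conclusion.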
An even greater barrier to adoption, he says, is the need to change deeply entrenched habits and workflows that have developed over years of traditional eyecare practice.
“Just as electronic health records initially disrupted care, eyecare providers may initially feel disrupted as they incorporate AI into their day-to-day operations.”
Dr Cuadros says regulations to ensure access to new LLM tools by under-served communities are important – especially because it is often the data from these communities that is used to create the models.
“We need to ensure that new technology reduces disparities among different populations,” he says.
“Dr Naama Hammel said, ‘AI won’t replace clinicians, but clinicians who use AI may replace those who don’t’.”
He adds that the adoption of tools to enhance, rather than replace, clinicians is the best short-term approach to LLMs.
“Healthcare is changing in many ways, including breakthrough drug discovery, genomics, robotics, and collaboration. AI and LLMs are just one of several transformative changes to healthcare.”
More reading
AI shown to match or outperform ophthalmologists
AI models predict long-term visual acuity for high myopia
AI in eyecare: myths and attitudes