Artificial intelligence (AI) applications known as large language models (LLMs) are beginning to be used in some practices to generate encounter notes from verbal input, replacing the human scribes who perform a similar role. Just like today's scribes, these applications promise to reduce the drudgery of clinicians' endless "clicking" and other burnout-inducing aspects of EHR use.
But will they? Our view is...maybe.
Today's LLMs are known to "make things up," a behavior that has been termed "hallucination." Therefore, anything they produce that goes into a patient's record requires careful review and, where necessary, editing. Over the long term, we have two concerns about the need for such proofreading and its possible impact on reducing the drudgery of EHRs.
- Such review is time-consuming if done compulsively and adds its own drudgery.
- We don't think clinicians will do it very well, and many will skip it altogether.
Keep in mind that clinicians who don't use scribes rarely review the encounter notes they hastily produce, despite the voluminous "clicks" required to produce them. Further, the resulting notes often rely on shortcuts that produce normal findings on the history or physical exam, use "pull-forward" components from earlier notes, and utilize other gimmicks to ostensibly save time. However, these shortcuts can also produce errors.
There is a solution, but it will take some development, testing, and proper regulation. We believe that future EHRs with embedded LLMs will take as input clinicians' verbalization during an encounter. The clinician will purposely verbalize not only the history taken from the patient, but also his or her findings during the physical exam -- like what is done today with scribes. This normally constitutes the subjective and objective components of an encounter SOAP note. The clinician can verbalize other objective components as well, such as the results of laboratory and other procedures performed at the time of the encounter. Finally, the clinician may even verbalize the assessment and plan components as input for the LLM.
We believe that future LLMs can be developed to produce encounter notes in which the LLM is strictly limited to producing a SOAP note based solely on this verbalized input. With this limitation, we expect that the error rate can be reduced to no greater than that of results produced by laboratory and other medical devices, whose output currently enters EHRs without individual review and verification by a clinician. A SOAP note produced in this manner would not require clinician review, thereby saving considerable time without sacrificing accuracy. In fact, the accuracy would likely be greater than that of current EHR-generated notes because the LLM would ensure that anything contained in the note actually occurred during the encounter. These new LLM-produced notes will also be much more readable, since they will be neither voluminous lists of structured and repetitive text nor shorthand notes.
We believe that future EHRs with embedded LLMs will eventually go well beyond producing encounter notes based solely on this verbalized input. That is, their role will also be to provide cognitive enhancement for clinicians well beyond what today's scribes can do. In many or most cases, rather than the clinician verbalizing the assessment and plan, the LLM will generate a suggested assessment and plan for review by the clinician. In doing so, the LLM will utilize input far beyond what was verbalized during the encounter. It will utilize other patient information available on the internet as well as knowledge sources such as guidelines and protocols. We refer to this output, as distinguished from the output based on the verbalized input, as the "generative content" of the encounter note. Although future LLMs will likely hallucinate less than current versions, this generative output will still require review by a clinician before entering the patient record or being acted upon.
Why do we think clinicians will be more willing to review the generative content, and will review it more reliably than verbalized content? For starters, the consequences of errors in the assessment and plan are potentially greater than those in the subjective and objective components. Since generative content is produced from broader sources requiring more complicated algorithms, it will be more difficult to ensure the accuracy of this component without clinician review. Further, this component deals with the cognitive enhancement of clinicians' decision-making and reflects directly on their legal responsibilities and clinical acumen. Clinicians will feel compelled to ensure this content reflects their own thinking.
Thus, we are proposing that future regulation of EHRs that utilize LLMs distinguish between those portions of the encounter note produced solely from verbalized input and those portions that contain generative content. The regulations for the verbalized portion should assure virtually error-free content that can go directly into the EHR without clinician review. In contrast, the regulations for the generative content should require documentation of clinician review.
In conclusion, we need two important changes to achieve the desired outcome in using LLMs to generate encounter notes:
- Modifying LLM software to limit input solely to what was verbalized during an encounter (for those verbalized portions of the encounter note), and demonstrating that the resulting error rate is no greater than for other medical devices.
- Developing regulations for approval of future EHRs that utilize LLMs to produce encounter notes. These regulations should distinguish between the verbalized content and the generative content, and should require documentation of clinician review for the generative content.
With these changes, we look forward to more accurate and less burdensome EHRs. Without them, we are skeptical of the value of LLMs for this purpose.
Donald W. Simborg, MD, is a founding member of the American College of Medical Informatics, Co-founder of HL7, former founder and CEO of two EHR companies, and former CIO of the University of California San Francisco, currently retired. Eta S. Berner, EdD, is a professor emerita in the Department of Health Services Administration in the School of Health Professions at University of Alabama at Birmingham.