Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Application of generative language models to orthopaedic practice

Caterson, Jessica, Ambler, Olivia, Cereceda-Monteoliva, Nicholas, Horner, Matthew, Jones, Andrew and Poacher, Arwel Tomos 2024. Application of generative language models to orthopaedic practice. BMJ Open 14 (3), e076484. 10.1136/bmjopen-2023-076484

PDF (bmjopen-2023-076484.pdf) - Published Version
Available under License Creative Commons Attribution Non-commercial.

Download (369kB)

Abstract

Objective: To explore whether the large language models (LLMs) Generative Pre-trained Transformer (GPT)-3 and ChatGPT can write clinical letters and predict management plans for common orthopaedic scenarios.

Design: Fifteen scenarios were generated, and ChatGPT and GPT-3 were prompted to write clinical letters and, separately, to generate management plans for identical scenarios with the plans removed.

Main outcome measures: Letters were assessed for readability using the Readable Tool. Accuracy of letters and management plans was assessed by three independent orthopaedic surgery clinicians.

Results: Both models generated complete letters for all scenarios after single prompting. Readability was compared using Flesch-Kincaid Grade Level (ChatGPT: 8.77 (SD 0.918); GPT-3: 8.47 (SD 0.982)), Flesch Reading Ease (ChatGPT: 58.2 (SD 4.00); GPT-3: 59.3 (SD 6.98)), Simple Measure of Gobbledygook (SMOG) Index (ChatGPT: 11.6 (SD 0.755); GPT-3: 11.4 (SD 1.01)), and reach (ChatGPT: 81.2%; GPT-3: 80.3%). ChatGPT produced more accurate letters (8.7/10 (SD 0.60) vs 7.3/10 (SD 1.41), p=0.024) and management plans (7.9/10 (SD 0.63) vs 6.8/10 (SD 1.06), p<0.001) than GPT-3. However, both LLMs sometimes omitted key information or added guidance that was, at worst, inaccurate.

Conclusions: This study shows that LLMs are effective for generation of clinical letters. With little prompting, the letters are readable and mostly accurate. However, they are not consistent, and they include inappropriate omissions or insertions. Furthermore, management plans produced by LLMs are generic but often accurate. In the future, a healthcare-specific language model trained on accurate and secure data could provide an excellent tool for increasing the efficiency of clinicians through summarisation of large volumes of data into a single clinical letter.
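For readers unfamiliar with the readability measures reported in the abstract, the following is a minimal sketch of the standard published formulas for Flesch Reading Ease, Flesch-Kincaid Grade Level and the SMOG Index. It is not the Readable Tool used in the study, whose internals are not described here, and it will not reproduce the reported scores exactly. The function names, the `text_counts` helper and the vowel-group syllable counter are illustrative assumptions; the syllable counter in particular is a crude heuristic.

```python
import math
import re


def flesch_reading_ease(words, sentences, syllables):
    # Higher scores indicate easier text (roughly 60-70 is plain English).
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words, sentences, syllables):
    # Approximate US school grade needed to read the text.
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def smog_index(polysyllables, sentences):
    # polysyllables = number of words with three or more syllables,
    # normalised to a 30-sentence sample.
    return 1.0430 * math.sqrt(polysyllables * (30 / sentences)) + 3.1291


def count_syllables(word):
    # ASSUMPTION: crude heuristic that counts groups of consecutive vowels.
    # Real tools use dictionaries or more careful rules.
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def text_counts(text):
    # Hypothetical helper: returns (words, sentences, syllables, polysyllables).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    per_word = [count_syllables(w) for w in words]
    polysyllables = sum(1 for n in per_word if n >= 3)
    return len(words), len(sentences), sum(per_word), polysyllables


if __name__ == "__main__":
    letter = "The knee is swollen. Review in clinic."
    w, s, syl, poly = text_counts(letter)
    print(round(flesch_reading_ease(w, s, syl), 1))
    print(round(flesch_kincaid_grade(w, s, syl), 1))
    print(round(smog_index(poly, s), 1))
```

The formulas take raw counts so that they can be paired with any tokeniser or syllable counter; swapping in a better syllable counter changes the scores but not the formulas.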

Item Type: Article
Date Type: Published Online
Status: Published
Schools: Biosciences
Additional Information: License information from Publisher: LICENSE 1: URL: http://creativecommons.org/licenses/by-nc/4.0/, Start Date: 2024-03-14, Type: open-access
Publisher: BMJ Publishing Group
Date of First Compliant Deposit: 21 March 2024
Date of Acceptance: 8 January 2024
Last Modified: 21 Mar 2024 14:30
URI: https://orca.cardiff.ac.uk/id/eprint/167430
