Performance of Artificial Intelligence Chatbots in Answering Clinical Questions on Japanese Practical Guidelines for Implant-based Breast Reconstruction

Makoto Shiraishi; Yoshihiro Sowa; Koichi Tomita; Yasunobu Terao; Toshihiko Satake; Mayu Muto; Yuhei Morita; Shino Higai; Yoshihiro Toyohara; Yasue Kurokawa; Ataru Sunaga; Mutsumi Okazaki

doi:10.1007/s00266-024-04515-y

Makoto Shiraishi, Yoshihiro Sowa^*, Koichi Tomita, Yasunobu Terao, Toshihiko Satake, Mayu Muto, Yuhei Morita, Shino Higai, Yoshihiro Toyohara, Yasue Kurokawa, Ataru Sunaga, Mutsumi Okazaki

^*Corresponding author for this work

Department of Plastic, Reconstructive and Aesthetic Surgery

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

Background: Artificial intelligence (AI) chatbots, including ChatGPT-4 (GPT-4) and Grok-1 (Grok), have been shown to be potentially useful in several medical fields, but have not been examined in plastic and aesthetic surgery. The aim of this study is to evaluate the responses of these AI chatbots for clinical questions (CQs) related to the guidelines for implant-based breast reconstruction (IBBR) published by the Japan Society of Plastic and Reconstructive Surgery (JSPRS) in 2021.

Methods: CQs in the JSPRS guidelines were used as question sources. Responses from two AI chatbots, GPT-4 and Grok, were evaluated for accuracy, informativeness, and readability by five Japanese Board-certified breast reconstruction specialists and five Japanese clinical fellows of plastic surgery.

Results: GPT-4 outperformed Grok significantly in terms of accuracy (p < 0.001), informativeness (p < 0.001), and readability (p < 0.001) when evaluated by plastic surgery fellows. Compared to the original guidelines, Grok scored significantly lower in all three areas (all p < 0.001). The accuracy of GPT-4 was rated to be significantly higher based on scores given by plastic surgery fellows compared to those of breast reconstruction specialists (p = 0.012), whereas there was no significant difference between these scores for Grok.

Conclusions: The study suggests that GPT-4 has the potential to assist in interpreting and applying clinical guidelines for IBBR but importantly there is still a risk that AI chatbots can misinform. Further studies are needed to understand the broader role of current and future AI chatbots in breast reconstruction surgery.

Level of Evidence IV: This journal requires that authors assign a level of evidence to each article. For a full description of these Evidence-Based Medicine Ratings, please refer to Table of Contents or online Instructions to Authors www.springer.com/00266.

Original language	English
Article number	e042099
Pages (from-to)	1947-1953
Number of pages	7
Journal	Aesthetic Plastic Surgery
Volume	49
Issue number	7
DOIs	https://doi.org/10.1007/s00266-024-04515-y
State	Published - 2025/04

Keywords

Artificial intelligence
Breast implant
Breast reconstruction
ChatGPT
Grok

ASJC Scopus subject areas

Surgery

Cite this

Shiraishi, M., Sowa, Y., Tomita, K., Terao, Y., Satake, T., Muto, M., Morita, Y., Higai, S., Toyohara, Y., Kurokawa, Y., Sunaga, A., & Okazaki, M. (2025). Performance of Artificial Intelligence Chatbots in Answering Clinical Questions on Japanese Practical Guidelines for Implant-based Breast Reconstruction. Aesthetic Plastic Surgery, 49(7), 1947-1953. Article e042099. https://doi.org/10.1007/s00266-024-04515-y

@article{b151bea6529b4ff1b8f1005d2b49b118,

title = "Performance of Artificial Intelligence Chatbots in Answering Clinical Questions on Japanese Practical Guidelines for Implant-based Breast Reconstruction",

abstract = "Background: Artificial intelligence (AI) chatbots, including ChatGPT-4 (GPT-4) and Grok-1 (Grok), have been shown to be potentially useful in several medical fields, but have not been examined in plastic and aesthetic surgery. The aim of this study is to evaluate the responses of these AI chatbots for clinical questions (CQs) related to the guidelines for implant-based breast reconstruction (IBBR) published by the Japan Society of Plastic and Reconstructive Surgery (JSPRS) in 2021. Methods: CQs in the JSPRS guidelines were used as question sources. Responses from two AI chatbots, GPT-4 and Grok, were evaluated for accuracy, informativeness, and readability by five Japanese Board-certified breast reconstruction specialists and five Japanese clinical fellows of plastic surgery. Results: GPT-4 outperformed Grok significantly in terms of accuracy (p < 0.001), informativeness (p < 0.001), and readability (p < 0.001) when evaluated by plastic surgery fellows. Compared to the original guidelines, Grok scored significantly lower in all three areas (all p < 0.001). The accuracy of GPT-4 was rated to be significantly higher based on scores given by plastic surgery fellows compared to those of breast reconstruction specialists (p = 0.012), whereas there was no significant difference between these scores for Grok. Conclusions: The study suggests that GPT-4 has the potential to assist in interpreting and applying clinical guidelines for IBBR but importantly there is still a risk that AI chatbots can misinform. Further studies are needed to understand the broader role of current and future AI chatbots in breast reconstruction surgery. Level of Evidence IV: This journal requires that authors assign a level of evidence to each article. For a full description of these Evidence-Based Medicine Ratings, please refer to Table of Contents or online Instructions to Authors www.springer.com/00266.",

keywords = "Artificial intelligence, Breast implant, Breast reconstruction, ChatGPT, Grok",

author = "Makoto Shiraishi and Yoshihiro Sowa and Koichi Tomita and Yasunobu Terao and Toshihiko Satake and Mayu Muto and Yuhei Morita and Shino Higai and Yoshihiro Toyohara and Yasue Kurokawa and Ataru Sunaga and Mutsumi Okazaki",

note = "Publisher Copyright: {\textcopyright} Springer Science+Business Media, LLC, part of Springer Nature and International Society of Aesthetic Plastic Surgery 2024.",

year = "2025",

month = apr,

doi = "10.1007/s00266-024-04515-y",

language = "English",

volume = "49",

pages = "1947--1953",

journal = "Aesthetic Plastic Surgery",

issn = "0364-216X",

publisher = "Springer New York",

number = "7",

}

Shiraishi, M, Sowa, Y, Tomita, K, Terao, Y, Satake, T, Muto, M, Morita, Y, Higai, S, Toyohara, Y, Kurokawa, Y, Sunaga, A & Okazaki, M 2025, 'Performance of Artificial Intelligence Chatbots in Answering Clinical Questions on Japanese Practical Guidelines for Implant-based Breast Reconstruction', Aesthetic Plastic Surgery, vol. 49, no. 7, e042099, pp. 1947-1953. https://doi.org/10.1007/s00266-024-04515-y

TY - JOUR

T1 - Performance of Artificial Intelligence Chatbots in Answering Clinical Questions on Japanese Practical Guidelines for Implant-based Breast Reconstruction

AU - Shiraishi, Makoto

AU - Sowa, Yoshihiro

AU - Tomita, Koichi

AU - Terao, Yasunobu

AU - Satake, Toshihiko

AU - Muto, Mayu

AU - Morita, Yuhei

AU - Higai, Shino

AU - Toyohara, Yoshihiro

AU - Kurokawa, Yasue

AU - Sunaga, Ataru

AU - Okazaki, Mutsumi

N1 - Publisher Copyright: © Springer Science+Business Media, LLC, part of Springer Nature and International Society of Aesthetic Plastic Surgery 2024.

PY - 2025/4

Y1 - 2025/4

AB - Background: Artificial intelligence (AI) chatbots, including ChatGPT-4 (GPT-4) and Grok-1 (Grok), have been shown to be potentially useful in several medical fields, but have not been examined in plastic and aesthetic surgery. The aim of this study is to evaluate the responses of these AI chatbots for clinical questions (CQs) related to the guidelines for implant-based breast reconstruction (IBBR) published by the Japan Society of Plastic and Reconstructive Surgery (JSPRS) in 2021. Methods: CQs in the JSPRS guidelines were used as question sources. Responses from two AI chatbots, GPT-4 and Grok, were evaluated for accuracy, informativeness, and readability by five Japanese Board-certified breast reconstruction specialists and five Japanese clinical fellows of plastic surgery. Results: GPT-4 outperformed Grok significantly in terms of accuracy (p < 0.001), informativeness (p < 0.001), and readability (p < 0.001) when evaluated by plastic surgery fellows. Compared to the original guidelines, Grok scored significantly lower in all three areas (all p < 0.001). The accuracy of GPT-4 was rated to be significantly higher based on scores given by plastic surgery fellows compared to those of breast reconstruction specialists (p = 0.012), whereas there was no significant difference between these scores for Grok. Conclusions: The study suggests that GPT-4 has the potential to assist in interpreting and applying clinical guidelines for IBBR but importantly there is still a risk that AI chatbots can misinform. Further studies are needed to understand the broader role of current and future AI chatbots in breast reconstruction surgery. Level of Evidence IV: This journal requires that authors assign a level of evidence to each article. For a full description of these Evidence-Based Medicine Ratings, please refer to Table of Contents or online Instructions to Authors www.springer.com/00266.

KW - Artificial intelligence

KW - Breast implant

KW - Breast reconstruction

KW - ChatGPT

KW - Grok

UR - http://www.scopus.com/inward/record.url?scp=105003943586&partnerID=8YFLogxK

U2 - 10.1007/s00266-024-04515-y

DO - 10.1007/s00266-024-04515-y

M3 - Article

C2 - 39592492

AN - SCOPUS:105003943586

SN - 0364-216X

VL - 49

SP - 1947

EP - 1953

JO - Aesthetic Plastic Surgery

JF - Aesthetic Plastic Surgery

IS - 7

M1 - e042099

ER -
