TY - JOUR
T1 - Developing Artificial Intelligence Models for Extracting Oncologic Outcomes from Japanese Electronic Health Records
AU - Araki, Kenji
AU - Matsumoto, Nobuhiro
AU - Togo, Kanae
AU - Yonemoto, Naohiro
AU - Ohki, Emiko
AU - Xu, Linghua
AU - Hasegawa, Yoshiyuki
AU - Satoh, Daisuke
AU - Takemoto, Ryota
AU - Miyazaki, Taiga
N1 - Publisher Copyright: © 2022, The Author(s).
PY - 2023/3
Y1 - 2023/3
AB - Introduction: A framework that uses artificial intelligence (AI) to extract oncological outcomes from large-scale databases is not well established. Thus, we aimed to develop AI models to extract outcomes in patients with lung cancer using unstructured text data from the electronic health records (EHRs) of multiple hospitals. Methods: We constructed AI models (Bidirectional Encoder Representations from Transformers [BERT], Naïve Bayes, and Longformer) for tumor evaluation using the University of Miyazaki Hospital (UMH) database. These data included both structured and unstructured data from progress notes, radiology reports, and discharge summaries. The BERT model was then applied to the Life Data Initiative (LDI) data set from six hospitals. Study outcomes included the performance of the AI models and the time to progression of disease (TTP) for each line of treatment, based on the treatment responses extracted by the AI models. Results: For the UMH data set, the BERT model achieved higher recall and F1 scores than the Naïve Bayes and Longformer models, although its precision was lower than that of Naïve Bayes (precision 0.42 vs. 0.47 and 0.22; recall 0.63 vs. 0.46 and 0.33; F1 score 0.50 vs. 0.46 and 0.27, respectively). When this BERT model was applied to the LDI data, its prediction accuracy remained similar. Kaplan–Meier plots of TTP (months) showed similar trends for the BERT-predicted data and the manually curated data in the first line (median 14.9 [95% confidence interval 11.5, 21.1] vs. 16.8 [12.6, 21.8]), the second line (7.8 [6.7, 10.7] vs. 7.8 [6.7, 10.7]), and later lines of treatment. Conclusion: We developed AI models to extract treatment responses in patients with lung cancer from a large EHR database; however, the model requires further improvement.
KW - Artificial intelligence
KW - BERT
KW - Electronic health records database
KW - Lung cancer
KW - Real-world data
KW - Retrospective study
UR - http://www.scopus.com/inward/record.url?scp=85144687846&partnerID=8YFLogxK
U2 - 10.1007/s12325-022-02397-7
DO - 10.1007/s12325-022-02397-7
M3 - Article
C2 - 36547809
AN - SCOPUS:85144687846
SN - 0741-238X
VL - 40
SP - 934
EP - 950
JO - Advances in Therapy
JF - Advances in Therapy
IS - 3
ER -