Developing Artificial Intelligence Models for Extracting Oncologic Outcomes from Japanese Electronic Health Records

Kenji Araki; Nobuhiro Matsumoto; Kanae Togo; Naohiro Yonemoto; Emiko Ohki; Linghua Xu; Yoshiyuki Hasegawa; Daisuke Satoh; Ryota Takemoto; Taiga Miyazaki

doi:10.1007/s12325-022-02397-7

Kenji Araki, Nobuhiro Matsumoto, Kanae Togo^*, Naohiro Yonemoto, Emiko Ohki, Linghua Xu, Yoshiyuki Hasegawa, Daisuke Satoh, Ryota Takemoto, Taiga Miyazaki

^*Corresponding author for this work

Department of Medical Statistics

Research output: Contribution to journal › Article › peer-review

11 Citations (Scopus)

Abstract

Introduction: A framework that extracts oncological outcomes from large-scale databases using artificial intelligence (AI) is not well established. Thus, we aimed to develop AI models to extract outcomes in patients with lung cancer using unstructured text data from electronic health records of multiple hospitals. Methods: We constructed AI models (Bidirectional Encoder Representations from Transformers [BERT], Naïve Bayes, and Longformer) for tumor evaluation using the University of Miyazaki Hospital (UMH) database. These data included both structured and unstructured data from progress notes, radiology reports, and discharge summaries. The BERT model was applied to the Life Data Initiative (LDI) data set of six hospitals. Study outcomes included the performance of AI models and time to progression of disease (TTP) for each line of treatment based on the treatment response extracted by AI models. Results: For the UMH data set, the BERT model outperformed the Naïve Bayes and Longformer models on recall and F1 score, although its precision was lower than that of the Naïve Bayes model (precision, 0.42 vs. 0.47 and 0.22; recall, 0.63 vs. 0.46 and 0.33; F1 score, 0.50 vs. 0.46 and 0.27). When this BERT model was applied to LDI data, prediction accuracy remained similar. The Kaplan–Meier plots of TTP (months) showed similar trends between the data predicted by the BERT model and the manually curated data for the first (median 14.9 [95% confidence interval 11.5, 21.1] and 16.8 [12.6, 21.8]), the second (7.8 [6.7, 10.7] and 7.8 [6.7, 10.7]), and the later lines of treatment. Conclusion: We developed AI models to extract treatment responses in patients with lung cancer using a large EHR database; however, the model requires further improvement.
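The F1 scores quoted in the abstract can be cross-checked against the reported precision and recall. The following is a minimal sketch, not the authors' evaluation code: it assumes F1 is the usual harmonic mean of precision and recall, and the helper name `f1_score` and the dictionary layout are illustrative only. Because the inputs are already rounded to two decimals, the recomputed values can be expected to match the reported F1 scores only to within about 0.01.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs as reported in the abstract, rounded to two decimals.
reported = {
    "BERT": (0.42, 0.63),
    "Naive Bayes": (0.47, 0.46),
    "Longformer": (0.22, 0.33),
}

# Recompute F1 for each model from the rounded inputs.
recomputed = {name: f1_score(p, r) for name, (p, r) in reported.items()}

for name, f1 in recomputed.items():
    print(f"{name}: recomputed F1 = {f1:.3f}")
```

The recomputation also makes the trade-off visible: BERT's higher recall more than offsets its slightly lower precision relative to Naïve Bayes, which is why its F1 score is the highest of the three.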

Original language	English
Pages (from-to)	934-950
Number of pages	17
Journal	Advances in Therapy
Volume	40
Issue number	3
DOI	https://doi.org/10.1007/s12325-022-02397-7
Publication status	Published - Mar 2023

ASJC Scopus subject areas

Pharmacology (medical)

Access to Document

10.1007/s12325-022-02397-7

Cite this

@article{f821dfbcff6a4a0cb561851d14d71af5,

title = "Developing Artificial Intelligence Models for Extracting Oncologic Outcomes from Japanese Electronic Health Records",

abstract = "Introduction: A framework that extracts oncological outcomes from large-scale databases using artificial intelligence (AI) is not well established. Thus, we aimed to develop AI models to extract outcomes in patients with lung cancer using unstructured text data from electronic health records of multiple hospitals. Methods: We constructed AI models (Bidirectional Encoder Representations from Transformers [BERT], Na{\"i}ve Bayes, and Longformer) for tumor evaluation using the University of Miyazaki Hospital (UMH) database. This data included both structured and unstructured data from progress notes, radiology reports, and discharge summaries. The BERT model was applied to the Life Data Initiative (LDI) data set of six hospitals. Study outcomes included the performance of AI models and time to progression of disease (TTP) for each line of treatment based on the treatment response extracted by AI models. Results: For the UMH data set, the BERT model exhibited higher precision accuracy compared to the Na{\"i}ve Bayes or the Longformer models, respectively (precision [0.42 vs. 0.47 or 0.22], recall [0.63 vs. 0.46 or 0.33] and F1 scores [0.50 vs. 0.46 or 0.27]). When this BERT model was applied to LDI data, prediction accuracy remained quite similar. The Kaplan–Meier plots of TTP (months) showed similar trends for the first (median 14.9 [95% confidence interval 11.5, 21.1] and 16.8 [12.6, 21.8]), the second (7.8 [6.7, 10.7] and 7.8 [6.7, 10.7]), and the later lines of treatment for the predicted data by the BERT model and the manually curated data. Conclusion: We developed AI models to extract treatment responses in patients with lung cancer using a large EHR database; however, the model requires further improvement.",

keywords = "Artificial intelligence, BERT, Electronic health records database, Lung cancer, Real-world data, Retrospective study",

author = "Kenji Araki and Nobuhiro Matsumoto and Kanae Togo and Naohiro Yonemoto and Emiko Ohki and Linghua Xu and Yoshiyuki Hasegawa and Daisuke Satoh and Ryota Takemoto and Taiga Miyazaki",

note = "Publisher Copyright: {\textcopyright} 2022, The Author(s).",

year = "2023",

month = mar,

doi = "10.1007/s12325-022-02397-7",

language = "English",

volume = "40",

pages = "934--950",

journal = "Advances in Therapy",

issn = "0741-238X",

publisher = "Adis",

number = "3",

}

TY - JOUR

T1 - Developing Artificial Intelligence Models for Extracting Oncologic Outcomes from Japanese Electronic Health Records

AU - Araki, Kenji

AU - Matsumoto, Nobuhiro

AU - Togo, Kanae

AU - Yonemoto, Naohiro

AU - Ohki, Emiko

AU - Xu, Linghua

AU - Hasegawa, Yoshiyuki

AU - Satoh, Daisuke

AU - Takemoto, Ryota

AU - Miyazaki, Taiga

PY - 2023/3

Y1 - 2023/3

N2 - Introduction: A framework that extracts oncological outcomes from large-scale databases using artificial intelligence (AI) is not well established. Thus, we aimed to develop AI models to extract outcomes in patients with lung cancer using unstructured text data from electronic health records of multiple hospitals. Methods: We constructed AI models (Bidirectional Encoder Representations from Transformers [BERT], Naïve Bayes, and Longformer) for tumor evaluation using the University of Miyazaki Hospital (UMH) database. This data included both structured and unstructured data from progress notes, radiology reports, and discharge summaries. The BERT model was applied to the Life Data Initiative (LDI) data set of six hospitals. Study outcomes included the performance of AI models and time to progression of disease (TTP) for each line of treatment based on the treatment response extracted by AI models. Results: For the UMH data set, the BERT model exhibited higher precision accuracy compared to the Naïve Bayes or the Longformer models, respectively (precision [0.42 vs. 0.47 or 0.22], recall [0.63 vs. 0.46 or 0.33] and F1 scores [0.50 vs. 0.46 or 0.27]). When this BERT model was applied to LDI data, prediction accuracy remained quite similar. The Kaplan–Meier plots of TTP (months) showed similar trends for the first (median 14.9 [95% confidence interval 11.5, 21.1] and 16.8 [12.6, 21.8]), the second (7.8 [6.7, 10.7] and 7.8 [6.7, 10.7]), and the later lines of treatment for the predicted data by the BERT model and the manually curated data. Conclusion: We developed AI models to extract treatment responses in patients with lung cancer using a large EHR database; however, the model requires further improvement.

AB - Introduction: A framework that extracts oncological outcomes from large-scale databases using artificial intelligence (AI) is not well established. Thus, we aimed to develop AI models to extract outcomes in patients with lung cancer using unstructured text data from electronic health records of multiple hospitals. Methods: We constructed AI models (Bidirectional Encoder Representations from Transformers [BERT], Naïve Bayes, and Longformer) for tumor evaluation using the University of Miyazaki Hospital (UMH) database. This data included both structured and unstructured data from progress notes, radiology reports, and discharge summaries. The BERT model was applied to the Life Data Initiative (LDI) data set of six hospitals. Study outcomes included the performance of AI models and time to progression of disease (TTP) for each line of treatment based on the treatment response extracted by AI models. Results: For the UMH data set, the BERT model exhibited higher precision accuracy compared to the Naïve Bayes or the Longformer models, respectively (precision [0.42 vs. 0.47 or 0.22], recall [0.63 vs. 0.46 or 0.33] and F1 scores [0.50 vs. 0.46 or 0.27]). When this BERT model was applied to LDI data, prediction accuracy remained quite similar. The Kaplan–Meier plots of TTP (months) showed similar trends for the first (median 14.9 [95% confidence interval 11.5, 21.1] and 16.8 [12.6, 21.8]), the second (7.8 [6.7, 10.7] and 7.8 [6.7, 10.7]), and the later lines of treatment for the predicted data by the BERT model and the manually curated data. Conclusion: We developed AI models to extract treatment responses in patients with lung cancer using a large EHR database; however, the model requires further improvement.

KW - Artificial intelligence

KW - BERT

KW - Electronic health records database

KW - Lung cancer

KW - Real-world data

KW - Retrospective study

UR - http://www.scopus.com/inward/record.url?scp=85144687846&partnerID=8YFLogxK

U2 - 10.1007/s12325-022-02397-7

DO - 10.1007/s12325-022-02397-7

M3 - Article

C2 - 36547809

AN - SCOPUS:85144687846

SN - 0741-238X

VL - 40

SP - 934

EP - 950

JO - Advances in Therapy

JF - Advances in Therapy

IS - 3

ER -
