Information entropy-based differential evolution with extremely randomized trees and LightGBM for protein structural class prediction

Yu Zhang, Shangce Gao*, Pengxing Cai, Zhenyu Lei, Yirui Wang

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

26 Scopus citations

Abstract

Determining protein tertiary structure is the basis of current genetic engineering, medicinal design, and other biological applications. Protein structural class plays a significant role in tertiary structure folding and in the functional analysis of proteins. However, new amino acid sequences are accumulating far faster than tertiary structures are being determined. Existing methods for confirming protein folding cannot keep pace with the volume of sequences or with the needs of protein engineering. High-accuracy prediction on low-similarity protein datasets is therefore particularly critical for deriving the tertiary structure from the primary structure. In this paper, we construct a novel super-large-scale feature set for the primary structure based on secondary structure, evolutionary information, chemical properties, and global descriptors. These diverse and massive features are used to predict the protein structural class with a novel feature selection algorithm and a gradient boosting decision tree model. To verify the effectiveness and robustness of the proposed method, named IDEGBM, we use 10-fold cross-validation on four benchmark datasets: 25PDB, FC699, D1189, and D640. Experimental results show that our method outperforms other state-of-the-art prediction models in both accuracy and efficiency. Furthermore, a representative protein is used to show that IDEGBM can be applied to improve conformation prediction of protein tertiary structure.
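The 10-fold cross-validation protocol mentioned in the abstract can be illustrated with a minimal standard-library sketch. This is not the authors' IDEGBM pipeline: the function names (`stratified_k_folds`, `cross_val_accuracy`, `majority_class`), the stratification by class, and the placeholder classifier are all assumptions made for illustration, and a real experiment would plug in the selected features and a gradient boosting model instead.

```python
import random
from collections import defaultdict


def stratified_k_folds(labels, k=10, seed=0):
    """Split sample indices into k folds, spreading each class evenly.

    Hypothetical helper: stratification is an assumption, not something
    the abstract specifies.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    offset = 0  # keeps total fold sizes balanced across classes
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for j, idx in enumerate(indices):
            folds[(offset + j) % k].append(idx)
        offset += len(indices)
    return folds


def cross_val_accuracy(features, labels, fit_predict, k=10, seed=0):
    """Mean held-out accuracy over k folds.

    fit_predict(train_X, train_y, test_X) must return one predicted
    label per row of test_X.
    """
    scores = []
    for test_idx in stratified_k_folds(labels, k, seed):
        if not test_idx:
            continue  # skip empty folds on very small datasets
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        predictions = fit_predict(
            [features[i] for i in train_idx],
            [labels[i] for i in train_idx],
            [features[i] for i in test_idx],
        )
        correct = sum(p == labels[i] for p, i in zip(predictions, test_idx))
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


def majority_class(train_X, train_y, test_X):
    """Placeholder classifier: always predicts the most common training label."""
    most_common = max(set(train_y), key=train_y.count)
    return [most_common] * len(test_X)
```

To evaluate a real model, `majority_class` would be replaced by a function that fits the chosen classifier on the training fold and returns its predictions for the held-out fold; any feature selection step should be fitted inside each training fold so that held-out samples do not influence it.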

Original language: English
Article number: 110064
Journal: Applied Soft Computing
Volume: 136
State: Published - 2023/03

Keywords

  • Evolutionary algorithm
  • Feature selection
  • Prediction model
  • Protein structural class
  • Single objective optimization

ASJC Scopus subject areas

  • Software
