Abstract
Vaccines are the most effective tool for preventing and managing infectious diseases. A critical challenge in vaccine development is selecting suitable target antigens from the thousands of proteins produced by pathogens, and artificial intelligence is expected to play a significant role in addressing it. In this study, we develop PLGDL, a framework for protective antigen prediction that combines Protein Language and Geometric Deep Learning models. The framework leverages both primary sequence features and three-dimensional structural features of protein antigens, thereby reducing the biases associated with manually curated features. Our integrated model is robust across both constructed and public datasets and is applicable to viruses, bacteria, and eukaryotic pathogens. Notably, when applied to the ongoing Mpox outbreak, the model not only quickly identifies multiple known antigens but also discovers a protective antigen, G10R. By synergistically combining protein language and geometric deep learning models, our study provides a high-performance screening tool for protective vaccine antigen prediction and offers substantive insights and methodological advances for rapid vaccine development.
