Pichia-CLM: A language model-based codon optimization pipeline for Komagataella phaffii.

Authors: Narayanan Harini, Love J Christopher
The preference in synonymous codon usage, the so-called codon usage bias (CUB), is governed by several factors, such as the host organism, the context and function of the gene, and the position of the codon within the gene itself. We demonstrated that this mapping can be learned from the host's genome using language models and subsequently applied to codon optimization of heterologous proteins expressed by the host. This pipeline, called the Pichia-Codon language model (Pichia-CLM), was applied to the industrial host organism Komagataella phaffii. With this approach, production of heterologous proteins was enhanced up to threefold compared to their native sequences. Furthermore, Pichia-CLM consistently yielded constructs with enhanced productivity for proteins of varied complexity compared to commercially available tools. Finally, we showed that Pichia-CLM generates sequences whose codon usage properties resemble those of the host's own host cell proteins, and that it learned features such as avoiding negative cis-regulatory and repeat elements from patterns in the genome data. These results show the potential of language models to learn patterns without bias and to design robust sequences for improved protein production.
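
To make the notion of codon usage bias concrete, the following is a minimal sketch of how per-amino-acid codon frequencies can be tallied from a set of coding sequences and used for a naive "most frequent codon" back-translation. This is only a simple frequency baseline for illustration, not the Pichia-CLM language model described above. The function names, the partial codon table, and the toy sequences are all hypothetical and are not taken from the paper or from the K. phaffii genome.

```python
from collections import Counter, defaultdict

# Assumption: a small subset of the standard genetic code, enough for the toy data.
CODON_TO_AA = {
    "GCT": "A", "GCC": "A", "GCA": "A", "GCG": "A",
    "AAA": "K", "AAG": "K",
    "GAA": "E", "GAG": "E",
    "ATG": "M",
}


def codon_usage(cds_list):
    """Return {amino_acid: {codon: relative frequency}} over in-frame codons."""
    counts = defaultdict(Counter)
    for cds in cds_list:
        usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
        for i in range(0, usable, 3):
            codon = cds[i:i + 3].upper()
            aa = CODON_TO_AA.get(codon)
            if aa is not None:  # skip codons outside the partial table
                counts[aa][codon] += 1
    return {
        aa: {codon: n / sum(ctr.values()) for codon, n in ctr.items()}
        for aa, ctr in counts.items()
    }


def most_frequent_codon_encode(protein, usage):
    """Back-translate a protein by picking the most frequent codon per residue."""
    return "".join(max(usage[aa], key=usage[aa].get) for aa in protein)


# Hypothetical toy coding sequences (made up for illustration only).
toy_cds = ["ATGGCTAAGGAAGCT", "ATGGCCAAGGAGGCT", "ATGGCTAAAGAAGCA"]
usage = codon_usage(toy_cds)
encoded = most_frequent_codon_encode("MAKE", usage)
```

A frequency table like this ignores the gene context and codon position that the abstract names as drivers of CUB, which is the kind of information a language model trained on the host genome could, in principle, capture.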
