Scalable search of massively pooled nucleic acid samples enabled by a molecular database query language

Abstract

The surge in nucleic acid analytics requires scalable storage and retrieval systems akin to the electronic databases used to organize digital data. Such a system could transform disease diagnosis, ecological preservation, and molecular surveillance of biothreats. Current storage systems keep nucleic acid samples in individual containers and therefore require single-sample retrieval, which falls short of digital databases that allow complex, combinatorial retrieval from aggregated data. Here, we leverage protective microcapsules with combinatorial DNA labeling that enable arbitrary retrieval from pooled biosamples, analogous to queries in Structured Query Language (SQL). Ninety-six encapsulated, pooled mock SARS-CoV-2 genomic samples barcoded with patient metadata are used to demonstrate queries with simultaneous matches to sample collection date ranges, locations, and patient health statuses, illustrating how such flexible queries can yield immunological or epidemiological insights. The approach applies to any biosample database labeled with orthogonal barcodes, enabling complex post-hoc analysis, for example to study global biothreat epidemiology.
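
The query logic described in the abstract (simultaneous matches on a date range, a location, and a health status) can be pictured with a small in-silico analogy. The Python sketch below is only an illustration under stated assumptions and is not the authors' molecular protocol. The `Capsule` class, the `select` function, the barcode strings, and the 8 x 4 x 3 layout of the 96-sample pool are all hypothetical and synthetic. Each clause is a set of acceptable barcodes: a capsule matches a clause if it carries any barcode in it (OR), and matches the query only if it satisfies every clause (AND).

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Capsule:
    """Hypothetical stand-in for one encapsulated sample and its metadata barcodes."""
    sample_id: int
    barcodes: frozenset


def select(pool, *clauses):
    """Return capsules that carry at least one barcode from every clause.

    OR within a clause (e.g. a date range), AND across clauses.
    """
    return [c for c in pool if all(c.barcodes & clause for clause in clauses)]


# Synthetic pool (assumption): 8 months x 4 sites x 3 statuses = 96 capsules.
months = [f"month:{m:02d}" for m in range(1, 9)]
sites = [f"site:{s}" for s in "ABCD"]
statuses = [f"status:{s}" for s in ("healthy", "mild", "severe")]

pool = [
    Capsule(i, frozenset(labels))
    for i, labels in enumerate(product(months, sites, statuses))
]

# Roughly: SELECT * WHERE month BETWEEN 3 AND 5 AND site = 'A' AND status = 'severe'
hits = select(
    pool,
    {"month:03", "month:04", "month:05"},  # date range expressed as an OR over months
    {"site:A"},
    {"status:severe"},
)
print(len(pool), len(hits))
```

In this toy layout each month, site, and status combination occurs exactly once, so the example query returns one capsule per month in the range.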

Journal:	medRxiv	Impact factor:
Year:	2024	Citation:	2024 Apr 15:2024.04.12.24305660.
DOI:	10.1101/2024.04.12.24305660	Research area:	Signal transduction
