Unique Molecular Identifiers reveal a novel sequencing artefact with implications for RNA-Seq based gene expression analysis

Abstract

Attaching Unique Molecular Identifiers (UMI) to RNA molecules in the first step of sequencing library preparation establishes a distinct identity for each input molecule. This makes it possible to eliminate the effects of PCR amplification bias, which is particularly important where many PCR cycles are required, for example, in single cell studies. After PCR, molecules sharing a UMI are assumed to be derived from the same input molecule. In our single cell RNA-Seq studies of Physcomitrella patens, we discovered that reads sharing a UMI, and therefore presumed to be derived from the same mRNA molecule, frequently map to different, but closely spaced locations. This behaviour occurs in all such libraries that we have produced, and in multiple other UMI-containing RNA-Seq data sets in the public domain. This apparent paradox, that reads of identical origin map to distinct genomic coordinates may be partially explained by PCR stutter, which is often seen in low-entropy templates and those containing simple tandem repeats. In the absence of UMI this artefact is undetectable. We show that the common assumption that sequence reads having different mapping coordinates are derived from different starting molecules does not hold. Unless taken into account, this artefact is likely to result in over-estimation of certain transcript abundances, depending on the counting method employed.

期刊：	Scientific Reports	影响因子：	3.800
时间：	2018	起止号：	2018 Sep 3;8(1):13121.
doi：	10.1038/s41598-018-31064-7	研究方向：	信号转导

Unique Molecular Identifiers reveal a novel sequencing artefact with implications for RNA-Seq based gene expression analysis

独特的分子标识符揭示了一种新的测序假象，这对基于RNA-Seq的基因表达分析具有重要意义。

Abstract

特别声明