Identification and characterization of abundant repetitive sequences in Eragrostis tef cv. Enatite genome

Background

Eragrostis tef is an allotetraploid (2n = 4x = 40), annual C4 grass with an estimated nuclear genome size of 730 Mbp. It is widely grown in Ethiopia, where it provides basic nutrition for more than half of the population. Although a draft assembly of the E. tef genome was made available in 2014, the repetitive portion of the genome has not yet been analyzed in detail. Repetitive sequences constitute most of the DNA in eukaryotic genomes, and transposable elements (TEs) are usually the most abundant repetitive component in plant genomes. TEs contribute to genome size variation, cause mutations, can result in chromosomal rearrangements, and influence gene regulation. An extensive, in-depth characterization of the repetitive component is therefore essential for understanding the evolution and function of the genome.

Conclusions

By analyzing a large sample of randomly sheared reads, we obtained a library of the repetitive sequences of E. tef. The approach was designed to avoid the underestimation of repeat content that is characteristic of whole-genome assembly projects. The data collected represent a valuable resource for further analysis of the genome of this important orphan crop.

Results

Using new paired-end sequence data and a de novo repeat identification strategy, we identified the most repetitive elements in the E. tef genome. Putative repeat sequences were annotated based on similarity to known repeat groups in other grasses. Altogether, we identified 1,389 medium- and highly repetitive sequences that collectively represent about 27% of the E. tef genome. Phylogenetic analyses of the most important classes of TEs were carried out in a comparative framework that included paralogous elements from rice and maize. Finally, an abundant tandem repeat accounting for more than 4% of the whole genome was identified and partially characterized.
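To put the reported percentages in absolute terms, the short sketch below converts them into approximate sequence lengths. It is illustrative arithmetic only, not part of the authors' analysis: it assumes the estimated 730 Mbp genome size stated in the Background, and the function name `fraction_to_mbp` is hypothetical.

```python
# Figures taken from the abstract; the genome size is itself an estimate.
GENOME_SIZE_MBP = 730
REPEAT_FRACTION = 0.27          # "about 27%" of the genome
TANDEM_REPEAT_FRACTION = 0.04   # "more than 4%", so this is a lower bound


def fraction_to_mbp(fraction: float, genome_size_mbp: float = GENOME_SIZE_MBP) -> float:
    """Convert a genome fraction into an approximate size in Mbp (hypothetical helper)."""
    return fraction * genome_size_mbp


repeat_mbp = fraction_to_mbp(REPEAT_FRACTION)
tandem_mbp = fraction_to_mbp(TANDEM_REPEAT_FRACTION)

print(f"Identified repeats: ~{repeat_mbp:.0f} Mbp")
print(f"Tandem repeat: at least ~{tandem_mbp:.0f} Mbp")
```

Under these assumptions, the 1,389 identified sequences correspond to roughly 197 Mbp, and the tandem repeat alone to at least about 29 Mbp.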

Journal:	BMC Plant Biology	Impact factor:	4.300
Year:	2016	Volume/pages:	2016 Feb 1:16:39.
doi:	10.1186/s12870-016-0725-4	Research area:	Signal transduction
