Identification and characterization of abundant repetitive sequences in Eragrostis tef cv. Enatite genome

Authors: Yohannes Gedamu Gebre, Edoardo Bertolini, Mario Enrico Pè, Andrea Zuccolo

Background

Eragrostis tef is an allotetraploid (2n = 4x = 40), annual C4 grass with an estimated nuclear genome size of 730 Mbp. It is widely grown in Ethiopia, where it provides basic nutrition for more than half of the population. Although a draft assembly of the E. tef genome was made available in 2014, the repetitive portion of the genome has not yet been analyzed in detail. Repetitive sequences constitute most of the DNA in eukaryotic genomes, and transposable elements (TEs) are usually the most abundant repetitive component in plant genomes. They contribute to genome size variation, cause mutations, can result in chromosomal rearrangements, and influence gene regulation. An extensive, in-depth characterization of the repetitive component is therefore essential for understanding the evolution and function of the genome.

Conclusions

By analyzing a large sample of randomly sheared reads, we obtained a library of the repetitive sequences of E. tef. The approach we used was designed to avoid the underestimation of repeat contribution that is characteristic of whole-genome assembly projects. The data collected represent a valuable resource for further analysis of the genome of this important orphan crop.

Results

Using new paired-end sequence data and a de novo repeat identification strategy, we identified the most repetitive elements in the E. tef genome. Putative repeat sequences were annotated based on similarity to known repeat groups in other grasses. Altogether, we identified 1,389 medium- to highly repetitive sequences that collectively represent about 27% of the teff genome. Phylogenetic analyses of the most important classes of TEs were carried out in a comparative framework including paralog elements from rice and maize. Finally, an abundant tandem repeat accounting for more than 4% of the whole genome was identified and partially characterized.
