
dc.contributor.author: Contreras-Moreira, Bruno
dc.contributor.author: Filippi, Carla Valeria
dc.contributor.author: Naamati, Guy
dc.contributor.author: García Girón, Carlos
dc.contributor.author: Allen, James E.
dc.contributor.author: Flicek, Paul
dc.date.accessioned: 2021-12-10T13:45:33Z
dc.date.available: 2021-12-10T13:45:33Z
dc.date.issued: 2021-09
dc.identifier.issn: 1940-3372
dc.identifier.other: https://doi.org/10.1002/tpg2.20143
dc.identifier.uri: http://hdl.handle.net/20.500.12123/10882
dc.identifier.uri: https://acsess.onlinelibrary.wiley.com/doi/full/10.1002/tpg2.20143
dc.description.abstract: The annotation of repetitive sequences within plant genomes can help in the interpretation of observed phenotypes. Moreover, repeat masking is required for tasks such as whole-genome alignment, promoter analysis, or pangenome exploration. Although homology-based annotation methods are computationally expensive, k-mer strategies for masking are orders of magnitude faster. Here, we benchmarked a two-step approach, where repeats were first called by k-mer counting and then annotated by comparison to curated libraries. This hybrid protocol was tested on 20 plant genomes from Ensembl, with the k-mer-based Repeat Detector (Red) and two repeat libraries (REdat, last updated in 2013, and nrTEplants, curated for this work). Custom libraries produced by RepeatModeler were also tested. We obtained repeated genome fractions that matched those reported in the literature but with shorter repeated elements than those produced directly by sequence homology. Inspection of the masked regions that overlapped genes revealed no preference for specific protein domains. Most Red-masked sequences could be successfully classified by sequence similarity, with the complete protocol taking less than 2 h on a desktop Linux box. A guide to curating your own repeat libraries and the scripts for masking and annotating plant genomes can be obtained at https://github.com/Ensembl/plant-scripts.
dc.format: application/pdf [es_AR]
dc.language.iso: eng [es_AR]
dc.publisher: Wiley [es_AR]
dc.rights: info:eu-repo/semantics/openAccess [es_AR]
dc.source: The Plant Genome 14 (3) : e20143 (November 2021) [es_AR]
dc.subject: Genomas [es_AR]
dc.subject: Genomes [eng]
dc.subject: Fitogenética [es_AR]
dc.subject: Plant Genetics [eng]
dc.subject: Genética [es_AR]
dc.subject: Genetics [eng]
dc.title: K-mer counting and curated libraries drive efficient annotation of repeats in plant genomes [es_AR]
dc.type: info:ar-repo/semantics/artículo [es_AR]
dc.type: info:eu-repo/semantics/article [es_AR]
dc.type: info:eu-repo/semantics/publishedVersion [es_AR]
dc.description.origen: Instituto de Biotecnología [es_AR]
dc.description.fil: Affiliation: Contreras-Moreira, Bruno. European Bioinformatics Institute. European Molecular Biology Laboratory; United Kingdom [es_AR]
dc.description.fil: Affiliation: Filippi, Carla Valeria. Instituto Nacional de Tecnología Agropecuaria (INTA). Instituto de Agrobiotecnología y Biología Molecular (IABIMO); Argentina [es_AR]
dc.description.fil: Affiliation: Filippi, Carla Valeria. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina [es_AR]
dc.description.fil: Affiliation: Filippi, Carla Valeria. European Bioinformatics Institute. European Molecular Biology Laboratory; United Kingdom [es_AR]
dc.description.fil: Affiliation: Naamati, Guy. European Bioinformatics Institute. European Molecular Biology Laboratory; United Kingdom [es_AR]
dc.description.fil: Affiliation: García Girón, Carlos. European Bioinformatics Institute. European Molecular Biology Laboratory; United Kingdom [es_AR]
dc.description.fil: Affiliation: Allen, James E. European Bioinformatics Institute. European Molecular Biology Laboratory; United Kingdom [es_AR]
dc.description.fil: Affiliation: Flicek, Paul. European Bioinformatics Institute. European Molecular Biology Laboratory; United Kingdom [es_AR]
dc.subtype: cientifico
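
The abstract in this record describes a first step in which repeats are called by k-mer counting. As a rough illustration of that idea only, the Python sketch below soft-masks (lowercases) every base covered by a k-mer that occurs at least a threshold number of times in the sequence. It is not the Repeat Detector (Red) algorithm and not code from the plant-scripts repository; the function names and the values of `k` and `min_count` are assumptions chosen for a toy example.

```python
from collections import Counter


def count_kmers(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in a DNA sequence (case-insensitive)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def soft_mask_repeats(seq: str, k: int = 4, min_count: int = 3) -> str:
    """Lowercase every base covered by a k-mer seen at least min_count times.

    Toy illustration: k and min_count are arbitrary assumptions,
    not parameters taken from the article.
    """
    upper = seq.upper()
    counts = count_kmers(upper, k)
    masked = [False] * len(upper)
    for i in range(len(upper) - k + 1):
        if counts[upper[i:i + k]] >= min_count:
            # Mark all k positions spanned by this frequent k-mer.
            for j in range(i, i + k):
                masked[j] = True
    return "".join(b.lower() if m else b for b, m in zip(upper, masked))


def masked_fraction(masked_seq: str) -> float:
    """Fraction of bases that are soft-masked (lowercase)."""
    if not masked_seq:
        return 0.0
    return sum(c.islower() for c in masked_seq) / len(masked_seq)


if __name__ == "__main__":
    example = "ACGTACGTACGTTTGCA"  # hypothetical toy sequence
    masked = soft_mask_repeats(example, k=4, min_count=3)
    print(masked, round(masked_fraction(masked), 3))
```

In the protocol the abstract describes, the masked regions would then be annotated by comparison to a curated repeat library; that second step is not sketched here.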

