Upcoming Ensembl Platform Transition

This is the final release of its kind on this website.

In summer 2026, this site will bring you to the new Ensembl platform currently at beta.ensembl.org.
Please bookmark this archive to retain access to the current site, tools and functionality until they are available on the new platform -> jun2026.archive.ensembl.org

Automatic annotation using RNA-seq data

RNA-seq Genebuild flowchart

  • For each tissue or individual, the raw reads are aligned to the genome using BWA. This step allows us to quickly identify regions of the genome that are actively transcribed. We use the results from all tissues to create one set of alignment blocks roughly corresponding to exons.

  • Read pairing information is used to group exons into one set of approximate transcript structures called proto-transcripts.

  • Next, both the pooled (merged) reads and also reads from each tissue that were partially mapped by BWA are re-aligned to the proto-transcripts using Exonerate and a short word length to create a merged or tissue-specific set of spliced alignments representing canonical and non-canonical introns.

  • Gene models are created by combining the proto-transcripts with the spliced reads to create all possible variants, the variant with the most read support is displayed.

References

Incorporating RNA-seq data into the zebrafish Ensembl genebuild John E. Collins, Simon White, Stephen M.J. Searle and Derek L. Stemple Genome Research 2012