Repeats
Repetitive sequence is found throughout genomes. It is important to mask repeats before gene annotation, as unmasked repeats can cause non-specific gene hits. Repeats are also useful for studying evolution and for DNA fingerprinting.
Types of repeats
| Repeat type | Definition | 
|---|---|
| Centromere | The region of the chromosome at which the two sister chromatids are joined during mitosis and meiosis, mostly composed of satellite DNA. | 
| Low complexity regions | Poly-purine or poly-pyrimidine stretches, or regions of extremely high AT or GC content. | 
| RNA repeats | Non-functional copies of RNA genes which have been reintegrated into the genome with the assistance of a reverse transcriptase. | 
| Satellite repeats | Multiple copies of the same base sequence on a DNA sequence. The repeated pattern can vary in length from a single base to several thousand bases long. | 
| Simple repeats | Duplications of simple sets of DNA bases (typically 1-5bp) such as A, CA, CGG etc. | 
| Tandem repeats | Typically found at the centromeres and telomeres of chromosomes, these are duplications of more complex 100-200 base sequences. | 
| LTRs | Long terminal repeats. | 
| Type I Transposons/LINE | Long Interspersed Elements. Retrotransposed elements in the genome containing open reading frames encoding (often inactive) reverse transcription machinery. | 
| Type I Transposons/SINE | Short Interspersed Elements. Retrotransposed elements of less than 500 bp that contain tRNA-, snRNA- or rRNA-derived sequence and require other mobile elements to be transposed. Alu elements are a type of SINE. | 
| Type II Transposons | Elements that have been transposed and duplicated around the genome by excision and ligation. | 
| Unknown | Repeats that cannot be classified. | 
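To make the "simple repeats" definition above concrete, here is a minimal Python sketch that finds tandem runs of a 1-5 bp unit in a sequence string. It is an illustration only, not how Ensembl or any repeat-finding program detects repeats; the function name `find_simple_repeats` and the `min_copies` threshold are assumptions chosen for the example.

```python
import re

def find_simple_repeats(seq, max_unit=5, min_copies=4):
    """Return (start, end, unit, copies) for each tandem run of a short unit.

    Coordinates are 0-based and half-open. max_unit and min_copies are
    illustrative thresholds, not values used by any real pipeline.
    """
    # Lazy unit match so the shortest repeating unit is reported
    # (e.g. "A" rather than "AA"), followed by min_copies - 1 or more copies.
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_copies - 1))
    hits = []
    for m in pattern.finditer(seq.upper()):
        unit = m.group(1)
        copies = (m.end() - m.start()) // len(unit)
        hits.append((m.start(), m.end(), unit, copies))
    return hits

if __name__ == "__main__":
    for hit in find_simple_repeats("GGTCACACACACACATTGAAAAAAAAGC"):
        print(hit)
```

On the example sequence this reports a run of six CA units and a run of eight A bases. A regular expression like this only finds perfect copies; real repeats usually contain mismatches and indels, which is why dedicated tools are used in practice.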
Masking the repeats
Ensembl masks repeats using RepeatMasker and Dust.
Repeats can be viewed in the browser and extracted using our APIs. You can also download repeat-masked sequence from our FTP site, either hard-masked (rm) where repeats are replaced with Ns, or soft-masked (sm) where repeats are in lower-case text.
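The relationship between the two masking styles can be shown with a short Python sketch. Assuming a soft-masked sequence is already in memory as a string, the first function converts it to the hard-masked form and the second recovers the repeat coordinates from the lower-case runs. The function names are hypothetical and this is not part of any Ensembl API.

```python
import re

def hard_mask(soft_masked):
    """Convert soft-masked sequence to hard-masked: lower-case bases become N."""
    return re.sub(r"[a-z]", "N", soft_masked)

def masked_intervals(soft_masked):
    """Return 0-based, half-open (start, end) intervals of lower-case runs."""
    return [(m.start(), m.end()) for m in re.finditer(r"[a-z]+", soft_masked)]

if __name__ == "__main__":
    seq = "ACGTacacacGGTTaaaaC"
    print(hard_mask(seq))         # ACGTNNNNNNGGTTNNNNC
    print(masked_intervals(seq))  # [(4, 10), (14, 18)]
```

The conversion only works in one direction: soft-masked sequence keeps the underlying bases, so it can always be hard-masked later, whereas the bases replaced by Ns in hard-masked sequence cannot be recovered.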
