Mappability Track Settings
 
Mappability or Uniqueness of Reference Genome from ENCODE

Subtracks:

View          Lab        Window Size  Track Name
Alignability  CRG-Guigo  100bp        Alignability of 100mers by GEM from ENCODE/CRG(Guigo)
Alignability  CRG-Guigo  24bp         Alignability of 24mers by GEM from ENCODE/CRG(Guigo)
Alignability  CRG-Guigo  36bp         Alignability of 36mers by GEM from ENCODE/CRG(Guigo)
Alignability  CRG-Guigo  40bp         Alignability of 40mers by GEM from ENCODE/CRG(Guigo)
Alignability  CRG-Guigo  50bp         Alignability of 50mers by GEM from ENCODE/CRG(Guigo)
Alignability  CRG-Guigo  75bp         Alignability of 75mers by GEM from ENCODE/CRG(Guigo)
Blacklisted   DAC        Varied       DAC Blacklisted Regions from ENCODE/DAC(Kundaje)
Blacklisted   Duke       Varied       Duke Excluded Regions from ENCODE/OpenChrom(Duke)
Uniqueness    Duke       20bp         Uniqueness of 20bp Windows from ENCODE/OpenChrom(Duke)
Uniqueness    Duke       35bp         Uniqueness of 35bp Windows from ENCODE/OpenChrom(Duke)

Assembly: Human Feb. 2009 (GRCh37/hg19)

Description

These tracks display the level of sequence uniqueness in the reference GRCh37/hg19 genome assembly. They were generated with several window sizes, and high signal indicates areas where the sequence is unique.

Display Conventions and Configuration

This track is a multi-view composite track whose data types are grouped into separate views. Each view contains multiple subtracks representing different sequence lengths or methods of preparation. See the Genome Browser documentation for instructions on configuring multi-view tracks. The Mappability track consists of the following views:

Alignability
These tracks measure how often the sequence found at a particular location aligns within the whole genome. Unlike the uniqueness measures, alignability tolerates up to 2 mismatches. The tracks are displayed as signals ranging from 0 to 1 and have several configuration options.

Uniqueness
These tracks directly measure sequence uniqueness throughout the reference genome. They are displayed as signals ranging from 0 to 1 and have several configuration options.

Blacklisted Regions
Both blacklisted-region tracks attempt to identify regions of the reference genome that are troublesome for high-throughput sequencing aligners. These problems may be caused by repetitive elements or other anomalies. Each track contains a set of regions of varying length and has no special configuration options.

Methods

Alignability

The CRG Alignability tracks display how uniquely k-mer sequences align to a region of the genome. The data were generated with the GEM-mappability program. The method is equivalent to mapping sliding windows of k-mers (where k was set to 24, 36, 40, 50, 75 or 100 nt to produce these tracks) back to the genome with the GEM mapper, allowing up to 2 mismatches. For each window, a mappability score was computed as S = 1/(number of matches found in the genome): S = 1 means one match in the genome, S = 0.5 means two matches, and so on. The CRG Alignability tracks were generated independently of the ENCODE project, within the GEM (GEnome Multitool) project.
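The score S = 1/(number of matches) can be illustrated with a brute-force sketch. This is not GEM, which uses an indexed mapper. The names `alignability` and `hamming_within` are hypothetical. For simplicity the sketch compares each k-mer only against k-mers of the same forward-strand sequence and counts a hit wherever the Hamming distance is at most `max_mm`. How GEM handles the reverse strand and indels is not covered here.

```python
def hamming_within(a: str, b: str, max_mm: int) -> bool:
    """Return True if equal-length strings a and b differ at <= max_mm positions."""
    mismatches = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            if mismatches > max_mm:
                return False
    return True


def alignability(genome: str, k: int, max_mm: int = 2) -> list:
    """Hypothetical brute-force alignability: S = 1 / (number of matches).

    Every k-mer start position is scored by counting how many k-mers of the
    same forward-strand sequence lie within max_mm mismatches of it.
    Quadratic in genome length; only suitable for toy inputs.
    """
    n = len(genome) - k + 1
    kmers = [genome[i:i + k] for i in range(n)]
    scores = []
    for query in kmers:
        hits = sum(1 for target in kmers if hamming_within(query, target, max_mm))
        scores.append(1.0 / hits)  # hits >= 1 because a k-mer always matches itself
    return scores
```

On the toy sequence "ACGTACGTTT" with k = 4 and no mismatches allowed, the two copies of ACGT each score 0.5 and every other window scores 1. Allowing one mismatch also pairs CGTA with CGTT, which halves their scores. This shows why tolerating mismatches lowers alignability relative to exact uniqueness.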

Uniqueness

The Duke Uniqueness tracks display how unique each sequence of a particular length is on the positive strand, starting at a particular base. Thus, the 20 bp track reflects the uniqueness of all 20-base sequences, with the score assigned to the first base of the sequence. Scores are normalized to the range 0 to 1: 1 represents a completely unique sequence and 0 represents a sequence that occurs more than 4 times in the genome (excluding chrN_random and alternative haplotypes). A score of 0.5 indicates that the sequence occurs exactly twice; likewise 0.33 indicates three occurrences and 0.25 four occurrences. The Duke Uniqueness tracks were generated for the ENCODE project as tools in the development of the Open Chromatin: DNaseI HS, FAIRE, TFBS and Synthesis tracks.
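The Duke scoring rule (1/count for up to 4 occurrences, 0 beyond that, score placed on the first base) can be sketched with an exact k-mer counter. This is an illustration, not Duke's pipeline. The function name `duke_uniqueness` is hypothetical. The sketch counts occurrences on the forward strand only. It identifies random contigs and alternative haplotypes by the substrings "_random" and "_hap" in the sequence name, which is an assumed hg19-style naming rule.

```python
from collections import Counter


def duke_uniqueness(chroms: dict, k: int) -> dict:
    """Hypothetical per-base uniqueness scores for each k-mer start position.

    Score = 1 / count if the k-mer occurs at most 4 times, else 0.
    The score for window [i, i + k) is stored at index i (its first base).
    """
    # Assumed naming rule: drop chrN_random contigs and *_hap* alternative haplotypes.
    kept = {name: seq for name, seq in chroms.items()
            if "_random" not in name and "_hap" not in name}

    # Count every exact k-mer across all kept sequences (forward strand only).
    counts = Counter()
    for seq in kept.values():
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1

    scores = {}
    for name, seq in kept.items():
        per_base = []
        for i in range(len(seq) - k + 1):
            c = counts[seq[i:i + k]]
            per_base.append(1.0 / c if c <= 4 else 0.0)
        scores[name] = per_base
    return scores
```

Because excluded contigs never enter the counter, a k-mer that is duplicated only in chrN_random still scores 1 on the main chromosome.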

Blacklisted Regions

The DAC Blacklisted Regions aim to identify a comprehensive set of regions in the human genome that have anomalous, unstructured, high signal or read counts in next-generation sequencing experiments, independent of cell line and type of experiment. These regions were identified from 80 open chromatin tracks (DNase and FAIRE datasets) and 20 ChIP-seq input/control tracks, spanning about 60 human tissue types and cell lines in total. The regions tend to have a very high ratio of multi-mapping to uniquely mapping reads and high variance in mappability. Some of them overlap pathological repeat elements such as satellite, centromeric and telomeric repeats. However, simple mappability-based filters do not account for most of these regions. It is therefore recommended to use this blacklist alongside mappability filters. The DAC Blacklisted Regions track was generated for the ENCODE project.
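The recommendation to combine the blacklist with a mappability filter can be sketched as a peak filter. A peak is rejected if it overlaps any blacklisted region or if its mean mappability falls below a threshold. Everything here is an assumption for illustration: the function names, the half-open 0-based (BED-style) coordinates, the per-base mappability arrays, and the 0.5 threshold, which is not specified by the source. A linear scan keeps the sketch short; large inputs would call for an interval index.

```python
def overlaps_blacklist(chrom, start, end, blacklist):
    """True if half-open [start, end) on chrom overlaps any blacklisted region."""
    return any(c == chrom and s < end and start < e for c, s, e in blacklist)


def keep_peak(peak, blacklist, mappability, min_mean_map=0.5):
    """Hypothetical filter: drop blacklisted peaks, then apply a mappability cutoff.

    peak:        (chrom, start, end), half-open 0-based coordinates
    blacklist:   list of (chrom, start, end) regions
    mappability: dict chrom -> list of per-base scores in [0, 1]
    """
    chrom, start, end = peak
    if overlaps_blacklist(chrom, start, end, blacklist):
        return False  # any overlap with a blacklisted region rejects the peak
    values = mappability[chrom][start:end]
    if not values:
        return False  # empty interval or outside the mappability array
    return sum(values) / len(values) >= min_mean_map
```

The two checks are independent, which matches the note above: a peak can pass the mappability cutoff and still be rejected by the blacklist.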

The Duke Excluded Regions track displays genomic regions for which mapped sequence tags were filtered out before signal generation and peak calling for Open Chromatin: DNaseI HS and FAIRE tracks. This track contains problematic regions for short sequence tag signal detection (such as satellites and rRNA genes). The Duke Excluded Regions track was generated for the ENCODE project.
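The order of operations described above, filtering tags first and then generating signal, can be sketched as follows. This is a hypothetical illustration, not the Duke pipeline. It assumes each tag is reduced to a single (chrom, position) pair, that excluded regions are half-open (chrom, start, end) intervals, and that the "signal" is a simple per-base tag count. The name `filter_and_count` is invented for the example.

```python
def filter_and_count(tags, excluded, chrom_sizes):
    """Hypothetical pre-filter: drop tags in excluded regions, then count per base.

    tags:        iterable of (chrom, pos), 0-based tag positions
    excluded:    list of (chrom, start, end), half-open excluded regions
    chrom_sizes: dict chrom -> length, used to allocate the signal arrays
    """
    signal = {chrom: [0] * size for chrom, size in chrom_sizes.items()}
    for chrom, pos in tags:
        if any(c == chrom and s <= pos < e for c, s, e in excluded):
            continue  # tag lies inside an excluded region, so it never reaches the signal
        signal[chrom][pos] += 1
    return signal
```

Because excluded tags are removed before counting, they cannot create spurious peaks downstream. This is the purpose the Duke Excluded Regions track serves for the DNaseI HS and FAIRE data.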

Release Notes

This is Release 3 (October 2011) of this track, which adds the DAC Blacklisted Regions, Duke Uniqueness and Duke Excluded Regions tracks.

Credits

The CRG Alignability track was created by Thomas Derrien and Paolo Ribeca in Roderic Guigo's lab at the Centre for Genomic Regulation (CRG), Barcelona, Spain. Thomas Derrien was supported by funds from NHGRI for the ENCODE project, while Paolo Ribeca was funded by a Consolider grant CDS2007-00050 from the Spanish Ministerio de Educación y Ciencia.

The Duke Uniqueness and Duke Excluded Regions tracks were created by Terry Furey and Debbie Winter at Duke University's Institute for Genome Sciences & Policy (IGSP), and by Stefan Graf at the University of Cambridge, Department of Oncology and CR-UK Cambridge Research Institute (CRI). We thank NHGRI for ENCODE funding support.

The DAC Blacklisted Regions were created by Anshul Kundaje at Stanford University in the labs of Batzoglou and Sidow, in cooperation with Ewan Birney at the European Bioinformatics Institute (EBI). We thank NHGRI for ENCODE funding support. (Contact: Anshul Kundaje).

References

Derrien T, Estelle J, Marco Sola S, Knowles DG, Raineri E, Guigo R, Ribeca P. Fast computation and applications of genome mappability. PLoS One. 2012;7(1):e30377.

Data Release Policy

Data users may freely use all data in this track. ENCODE labs that contributed annotations have exempted the data displayed here from the ENCODE data release policy restrictions.