Schema for DNase Clusters - DNase I Hypersensitivity Peak Clusters from ENCODE (95 cell types)
  Database: hg38    Primary Table: wgEncodeRegDnaseClustered    Row Count: 2,107,358   Data last updated: 2019-01-08
Format description: BED5+ with a count, list of sources, and list of source scores for combined data
field         example                    SQL type          description
bin           585                        int(10) unsigned  Indexing field to speed chromosome range queries.
chrom         chr1                       varchar(255)      Reference sequence chromosome or scaffold
chromStart    9980                       int(10) unsigned  Start position in chromosome
chromEnd      10410                      int(10) unsigned  End position in chromosome
name          8                          varchar(255)      Name of item
score         72                         int(10) unsigned  Display score (0-1000)
sourceCount   8                          int(10) unsigned  Number of sources
sourceIds     88,71,65,9,11,17,66,87,    longblob          Source ids
sourceScores  72,38,31,29,56,55,38,48,   longblob          Source scores
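
The bin field follows the UCSC binning scheme for range queries. The sketch below assumes the standard scheme from the UCSC source tree: five levels of bins, the finest 128 kb (2^17 bp) wide, each coarser level 8 times wider, covering coordinates below 512 Mb. It omits the extended scheme used for longer sequences. The function names are illustrative, not a UCSC API. Under these assumptions it reproduces the bin values 585 and 586 seen in the sample rows.

```python
# Standard UCSC binning scheme (sketch): offsets of the first bin at each level,
# from the finest (128 kb bins) to the coarsest (one 512 Mb bin).
BIN_OFFSETS = [512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0]  # 585, 73, 9, 1, 0
BIN_FIRST_SHIFT = 17  # finest bins span 2**17 = 131,072 bp
BIN_NEXT_SHIFT = 3    # each coarser level is 2**3 = 8x wider


def bin_from_range(start: int, end: int) -> int:
    """Return the smallest bin that fully contains the half-open range [start, end)."""
    start_bin = start >> BIN_FIRST_SHIFT
    end_bin = (end - 1) >> BIN_FIRST_SHIFT
    for offset in BIN_OFFSETS:
        if start_bin == end_bin:
            return offset + start_bin
        start_bin >>= BIN_NEXT_SHIFT
        end_bin >>= BIN_NEXT_SHIFT
    raise ValueError(f"range {start}-{end} exceeds the standard binning scheme")


def bins_overlapping(start: int, end: int) -> list[int]:
    """Return every bin that could hold an item overlapping [start, end)."""
    bins: list[int] = []
    start_bin = start >> BIN_FIRST_SHIFT
    end_bin = (end - 1) >> BIN_FIRST_SHIFT
    for offset in BIN_OFFSETS:
        bins.extend(range(offset + start_bin, offset + end_bin + 1))
        start_bin >>= BIN_NEXT_SHIFT
        end_bin >>= BIN_NEXT_SHIFT
    return bins


print(bin_from_range(9980, 10410))      # 585
print(bin_from_range(136680, 136830))   # 586
```

A range query can then restrict candidate rows to `bin IN (...)` using the list from `bins_overlapping`, before testing chromStart and chromEnd directly.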

Connected Tables and Joining Fields
        hg38.wgEncodeRegDnaseClusteredSources.id (via wgEncodeRegDnaseClustered.sourceIds)
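
Because sourceIds holds a comma-separated list rather than a single key, one simple way to follow this link is in client code: split the list and look each id up in the sources table. The sketch below assumes the sources table has been loaded into an id-to-name mapping; the names in it are hypothetical placeholders, and any columns beyond id are an assumption.

```python
def parse_id_list(blob: str) -> list[int]:
    """Split a comma-separated list such as '62,73,' into integers.

    The stored lists end with a trailing comma, so the empty last
    element produced by split() is skipped.
    """
    return [int(x) for x in blob.split(",") if x]


def resolve_sources(source_ids: str, sources_by_id: dict[int, str]) -> list[str]:
    """Look up each id from a sourceIds value in an id -> name mapping."""
    return [sources_by_id[i] for i in parse_id_list(source_ids)]


# Hypothetical mapping; in practice it would be built from the rows of
# hg38.wgEncodeRegDnaseClusteredSources, keyed on its id column.
sources = {62: "cellTypeA", 73: "cellTypeB"}
print(resolve_sources("62,73,", sources))  # ['cellTypeA', 'cellTypeB']
```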

Sample Rows
 
bin  chrom  chromStart  chromEnd  name  score  sourceCount  sourceIds                  sourceScores
585  chr1   9980        10410     8     72     8            88,71,65,9,11,17,66,87,    72,38,31,29,56,55,38,48,
585  chr1   16160       16330     8     82     8            36,7,84,91,32,17,87,69,    52,11,82,64,47,30,55,50,
585  chr1   56200       56350     1     113    1            31,                        113,
585  chr1   90140       90290     1     32     1            65,                        32,
585  chr1   96520       96670     1     48     1            3,                         48,
585  chr1   104900      105050    1     44     1            42,                        44,
585  chr1   115660      115810    2     574    2            62,73,                     574,115,
585  chr1   118520      118670    1     35     1            62,                        35,
586  chr1   136680      136830    1     29     1            73,                        29,
586  chr1   138460      138610    1     27     1            73,                        27,
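
Rows like these can be read with a small parser. This sketch assumes tab-separated text with one row per line (the separator is an assumption, not part of the schema). It also checks two patterns visible in every sample row above: name repeats sourceCount, and score equals the largest value in sourceScores. Treating those as general rules is an assumption.

```python
from dataclasses import dataclass


@dataclass
class DnaseCluster:
    bin: int
    chrom: str
    chrom_start: int
    chrom_end: int
    name: str
    score: int
    source_count: int
    source_ids: list[int]
    source_scores: list[int]


def parse_int_list(blob: str) -> list[int]:
    # Lists end with a trailing comma, so drop the empty final field.
    return [int(x) for x in blob.split(",") if x]


def parse_row(line: str) -> DnaseCluster:
    """Parse one tab-separated row in schema column order."""
    f = line.rstrip("\n").split("\t")
    return DnaseCluster(
        bin=int(f[0]), chrom=f[1], chrom_start=int(f[2]), chrom_end=int(f[3]),
        name=f[4], score=int(f[5]), source_count=int(f[6]),
        source_ids=parse_int_list(f[7]), source_scores=parse_int_list(f[8]),
    )


row = parse_row("585\tchr1\t115660\t115810\t2\t574\t2\t62,73,\t574,115,")
# One id and one score per source.
assert row.source_count == len(row.source_ids) == len(row.source_scores)
# Patterns observed in the sample rows (assumed, not guaranteed by the schema).
assert row.name == str(row.source_count)
assert row.score == max(row.source_scores)
```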

Note: all start coordinates in our database are 0-based, not 1-based.
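
Together with the UCSC convention that chromEnd is exclusive (half-open intervals), this means converting a row to the 1-based, fully closed form of a browser position string only shifts the start. A minimal sketch, with illustrative function names and no thousands separators:

```python
def to_one_based(chrom: str, chrom_start: int, chrom_end: int) -> str:
    """Convert a 0-based, half-open [chromStart, chromEnd) interval to a
    1-based, fully closed position string."""
    return f"{chrom}:{chrom_start + 1}-{chrom_end}"


def interval_length(chrom_start: int, chrom_end: int) -> int:
    # With half-open coordinates, length is a plain subtraction (no +1).
    return chrom_end - chrom_start


print(to_one_based("chr1", 9980, 10410))  # chr1:9981-10410
print(interval_length(9980, 10410))       # 430
```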

DNase Clusters (wgEncodeRegDnaseClustered) Track Description
 

Description

This track shows clusters of DNaseI hypersensitivity derived from assays in 95 cell types by the John Stamatoyannopoulos lab at the University of Washington from September 2007 to January 2011, as part of the first production phase of the ENCODE project. Regulatory regions in general, and promoters in particular, tend to be DNase-sensitive.

Additional views of these data are available in the DNaseI HS track. The peaks in that track are the basis for the clusters shown here, which combine peaks from the different cell lines. Please note that the colors in the DNase tracks are based on similarity of cell types, whereas the ENCODE hg38 Transcription track, Layered H3K4Me1 track, Layered H3K4Me3 track, and Layered H3K27Ac track use a different cell-type coloring that matches their previous versions lifted over from the hg19 assembly.

Display Conventions and Configuration

A gray box indicates the extent of the hypersensitive region. The darkness of the box is proportional to the maximum signal strength observed in any cell line. The number to the left of the box shows how many cell lines are hypersensitive in the region. The track can be configured to restrict the display to elements above a specified score in the range 1-1000 (the score is based on signal strength).
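
The display rules can be sketched in a few lines. The linear gray mapping here is hypothetical; the browser's actual shading levels may differ. The sketch also treats the score threshold as inclusive, which is an assumption. The sample rows are taken from the table above.

```python
from typing import NamedTuple


class Cluster(NamedTuple):
    chrom_start: int
    chrom_end: int
    score: int         # 0-1000, based on the maximum signal in any cell line
    source_count: int  # the number drawn to the left of the box


def gray_level(score: int) -> int:
    """Hypothetical linear shading: 0 -> white (255), 1000 -> black (0)."""
    score = max(0, min(score, 1000))  # clamp to the documented range
    return 255 - round(score * 255 / 1000)


def visible(clusters: list[Cluster], min_score: int) -> list[Cluster]:
    """Keep only clusters whose score reaches the configured threshold."""
    return [c for c in clusters if c.score >= min_score]


rows = [
    Cluster(9980, 10410, 72, 8),
    Cluster(115660, 115810, 574, 2),
    Cluster(136680, 136830, 29, 1),
]
for c in visible(rows, min_score=50):
    print(c.source_count, c.chrom_start, c.chrom_end, gray_level(c.score))
```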

Methods

Raw sequence data files were processed by the UCSC ENCODE DNase analysis pipeline (July 2014 specification), diagrammed below:

[Figure: ENCODE DNase Pipeline. Credit: Qian Alvin Qin, X. Liu lab]

Briefly, sequence files were aligned to the hg38 (GRCh38) genome assembly augmented with 'sponge' sequence (ref). Multi-mapped reads were removed, as were reads that aligned to 'sponge' or mitochondrial sequence. Results from all replicates were pooled and then processed by the Hotspot program to call peaks.
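
The read-filtering and pooling steps can be sketched as follows. This is not the pipeline's code: the alignment record, its `num_hits` field, and the sponge contig names are hypothetical stand-ins, and the Hotspot peak-calling step is not shown.

```python
from dataclasses import dataclass

MITOCHONDRIAL = "chrM"
# Hypothetical contig names standing in for the 'sponge' sequence.
SPONGE_CONTIGS = {"sponge_1", "sponge_2"}


@dataclass(frozen=True)
class Alignment:
    read_name: str
    chrom: str
    num_hits: int  # how many genomic locations the read aligned to


def keep(aln: Alignment) -> bool:
    """Apply the three read filters described above."""
    if aln.num_hits != 1:  # multi-mapped (or unmapped)
        return False
    if aln.chrom in SPONGE_CONTIGS:  # absorbed by sponge sequence
        return False
    return aln.chrom != MITOCHONDRIAL  # drop mitochondrial reads


def pool_replicates(replicates: list[list[Alignment]]) -> list[Alignment]:
    """Filter every replicate and concatenate the survivors for peak calling."""
    return [aln for rep in replicates for aln in rep if keep(aln)]


rep1 = [Alignment("r1", "chr1", 1), Alignment("r2", "chr1", 2)]
rep2 = [Alignment("r3", "chrM", 1), Alignment("r4", "sponge_1", 1), Alignment("r5", "chr2", 1)]
pooled = pool_replicates([rep1, rep2])
```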

Peaks of DNaseI hypersensitivity from the ENCODE DNase Analysis Pipeline at UCSC were assigned normalized scores (by UCSC regClusterMakeTableOfTables) in the range 0-1000 based on the narrowPeak signalValue and then clustered on score (by UCSC regCluster) to generate singly-linked clusters. Additional documentation on the methods used to identify hypersensitive sites is available from the DNaseI HS track.
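
The exact scoring and linkage rules of regClusterMakeTableOfTables and regCluster are not given here. The sketch below only illustrates the two ideas, under stated assumptions. Normalization is a hypothetical linear scaling against each source's strongest peak. Single linkage is simplified to "a peak joins a cluster if it overlaps any member," and score plays no part in the linkage, although regCluster may apply additional criteria. The output mirrors the table's columns: sourceIds, sourceScores, and a score equal to the top source score, as in the sample rows.

```python
from dataclasses import dataclass, field


@dataclass
class Peak:
    chrom: str
    start: int
    end: int
    source_id: int
    signal_value: float  # narrowPeak signalValue


@dataclass
class Cluster:
    chrom: str
    start: int
    end: int
    source_ids: list[int] = field(default_factory=list)
    source_scores: list[int] = field(default_factory=list)

    @property
    def score(self) -> int:
        # Matches the sample rows: the cluster score is the top source score.
        return max(self.source_scores)


def normalize(peaks: list[Peak]) -> list[int]:
    """Hypothetical 0-1000 normalization: scale each peak's signalValue
    against the strongest peak from the same source."""
    top: dict[int, float] = {}
    for p in peaks:
        top[p.source_id] = max(top.get(p.source_id, 0.0), p.signal_value)
    return [
        max(0, round(1000 * p.signal_value / top[p.source_id])) if top[p.source_id] > 0 else 0
        for p in peaks
    ]


def single_linkage(peaks: list[Peak], scores: list[int]) -> list[Cluster]:
    """Group peaks that overlap, directly or through a chain of overlapping
    peaks, into one cluster per connected run on each chromosome."""
    order = sorted(range(len(peaks)), key=lambda i: (peaks[i].chrom, peaks[i].start))
    clusters: list[Cluster] = []
    for i in order:
        p = peaks[i]
        current = clusters[-1] if clusters else None
        if current is not None and current.chrom == p.chrom and p.start < current.end:
            current.end = max(current.end, p.end)  # extend the chain
        else:
            current = Cluster(p.chrom, p.start, p.end)  # start a new cluster
            clusters.append(current)
        current.source_ids.append(p.source_id)
        current.source_scores.append(scores[i])
    return clusters


peaks = [
    Peak("chr1", 100, 250, 1, 10.0),
    Peak("chr1", 200, 350, 2, 4.0),
    Peak("chr1", 500, 650, 1, 5.0),
    Peak("chr2", 100, 250, 2, 2.0),
]
for c in single_linkage(peaks, normalize(peaks)):
    print(c.chrom, c.start, c.end, len(c.source_ids), c.score, c.source_ids, c.source_scores)
```

Sorting by start lets a single sweep find each connected run. A new peak belongs to the current cluster exactly when it begins before the cluster's furthest end.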

Credits

This track is based on sequence data from the University of Washington ENCODE group, with subsequent processing by UCSC. For additional credits and references, see the DNaseI HS track.