Schema for DNase HS - DNase I Hypersensitivity in 95 cell types from ENCODE
  Database: hg38    Primary Table: wgEncodeRegDnaseUwAg04449Peak    Row Count: 231,172   Data last updated: 2022-11-07
Format description: BED6+4 Peaks of signal enrichment based on pooled, normalized (interpreted) data.
On download server: MariaDB table dump directory
field        example  SQL type              info    description
bin          586      smallint(5) unsigned  range   Indexing field to speed chromosome range queries.
chrom        chr1     varchar(255)          values  Reference sequence chromosome or scaffold
chromStart   180800   int(10) unsigned      range   Start position in chromosome
chromEnd     180950   int(10) unsigned      range   End position in chromosome
name         p1       varchar(255)          values  Name given to a region (preferably unique). Use . if no name is assigned
score        23       int(10) unsigned      range   Indicates how dark the peak will be displayed in the browser (0-1000)
strand       .        char(1)               values  + or - or . for unknown
signalValue  6        float                 range   Measurement of average enrichment for the region
pValue       5.6845   float                 range   Statistical significance of signal value (-log10). Set to -1 if not used.
qValue       -1       float                 range   Statistical significance with multiple-test correction applied (FDR -log10). Set to -1 if not used.
peak         -1       int(11)               range   Point-source called for this peak; 0-based offset from chromStart. Set to -1 if no point-source called.
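
The bin values can be reproduced from chromStart and chromEnd. The sketch below assumes this table uses the standard UCSC binning scheme (smallest bins of 128 kb, each level 8 times wider), which this page does not state explicitly; the function names are illustrative, not part of any UCSC API.

```python
# Constants of the standard UCSC binning scheme (assumed to apply here).
BIN_OFFSETS = [512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0]
BIN_FIRST_SHIFT = 17  # smallest bins span 2**17 bases (128 kb)
BIN_NEXT_SHIFT = 3    # each level up is 2**3 = 8 times wider


def bin_from_range(start: int, end: int) -> int:
    """Return the smallest bin that fully contains the interval [start, end)."""
    start_bin = start >> BIN_FIRST_SHIFT
    end_bin = (end - 1) >> BIN_FIRST_SHIFT
    for offset in BIN_OFFSETS:
        if start_bin == end_bin:
            return offset + start_bin
        start_bin >>= BIN_NEXT_SHIFT
        end_bin >>= BIN_NEXT_SHIFT
    raise ValueError("interval exceeds the 512 Mb covered by this scheme")


def bins_overlapping(start: int, end: int) -> list:
    """List every bin, at every level, that can hold features overlapping [start, end)."""
    bins = []
    start_bin = start >> BIN_FIRST_SHIFT
    end_bin = (end - 1) >> BIN_FIRST_SHIFT
    for offset in BIN_OFFSETS:
        bins.extend(range(offset + start_bin, offset + end_bin + 1))
        start_bin >>= BIN_NEXT_SHIFT
        end_bin >>= BIN_NEXT_SHIFT
    return bins


# The example row (chr1:180800-180950) falls in bin 586.
print(bin_from_range(180800, 180950))
```

A range query can then restrict its search to rows whose bin is in bins_overlapping(start, end) before comparing chromStart and chromEnd, which is what makes the field useful as an index.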

Sample Rows
 
bin  chrom  chromStart  chromEnd  name  score  strand  signalValue  pValue   qValue  peak
586  chr1   180800      180950    p1    23     .       6            5.6845   -1      -1
587  chr1   267920      268070    p2    54     .       16           14.5928  -1      -1
589  chr1   629200      629350    p3    1000   .       38128        323.306  -1      -1
589  chr1   629860      630010    p4    1000   .       11006        323.306  -1      -1
589  chr1   632380      632530    p5    1000   .       3661         323.306  -1      -1
589  chr1   632620      632770    p6    1000   .       818          323.306  -1      -1
589  chr1   633580      633730    p7    1000   .       10878        323.306  -1      -1
589  chr1   633960      634110    p8    1000   .       32787        323.306  -1      -1
590  chr1   778680      778830    p9    1000   .       356          323.306  -1      -1
590  chr1   779780      779930    p10   24     .       7            3.68203  -1      -1
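
A row of this table can be read into a typed record as sketched below. The sketch assumes the table dump is tab-separated with the eleven fields in schema order; the Peak class and parse_row function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Peak:
    bin: int
    chrom: str
    chrom_start: int
    chrom_end: int
    name: str
    score: int
    strand: str
    signal_value: float
    p_value: float  # stored as -log10(p); -1 when unused
    q_value: float  # stored as -log10(q); -1 when unused
    peak: int       # offset from chrom_start; -1 when no point source

    def p_value_linear(self) -> Optional[float]:
        """Undo the -log10 transform; None when the field is unused."""
        return None if self.p_value < 0 else 10.0 ** (-self.p_value)

    def summit(self) -> Optional[int]:
        """0-based summit position on the chromosome; None when not called."""
        return None if self.peak < 0 else self.chrom_start + self.peak


def parse_row(line: str) -> Peak:
    """Parse one tab-separated row (assumed dump layout) into a Peak."""
    f = line.rstrip("\n").split("\t")
    if len(f) != 11:
        raise ValueError(f"expected 11 fields, got {len(f)}")
    return Peak(int(f[0]), f[1], int(f[2]), int(f[3]), f[4], int(f[5]),
                f[6], float(f[7]), float(f[8]), float(f[9]), int(f[10]))


row = "586\tchr1\t180800\t180950\tp1\t23\t.\t6\t5.6845\t-1\t-1"
p = parse_row(row)
print(p.name, p.chrom_end - p.chrom_start, p.p_value_linear(), p.summit())
```

In the sample rows qValue and peak are both -1, so only the pValue conversion yields a number; summit() returns None for every row shown.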

Note: all start coordinates in our database are 0-based, not 1-based. Intervals are half-open, so chromEnd is the first position after the feature.
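
As a small illustration of the coordinate convention, the helpers below convert between the 0-based start stored in the table and a 1-based, inclusive position string. They assume the interval is half-open (chromEnd excluded), and the function names are hypothetical.

```python
def bed_to_one_based(chrom: str, chrom_start: int, chrom_end: int) -> str:
    """Convert a 0-based, half-open interval to a 1-based, inclusive position."""
    return f"{chrom}:{chrom_start + 1}-{chrom_end}"


def one_based_to_bed(position: str):
    """Convert a 1-based, inclusive position string back to table coordinates."""
    chrom, span = position.split(":")
    start, end = span.replace(",", "").split("-")
    return chrom, int(start) - 1, int(end)


# First sample row: chromStart 180800, chromEnd 180950 (150 bases).
print(bed_to_one_based("chr1", 180800, 180950))  # chr1:180801-180950
```

Only the start shifts by one; the end value is the same in both systems, and the feature length is simply chromEnd minus chromStart.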

DNase HS (wgEncodeRegDnase) Track Description
 

Description

These tracks contain the results of DNase I hypersensitivity experiments performed by the John Stamatoyannopoulos lab at the University of Washington from September 2007 to January 2011, as part of the first production phase of the ENCODE project. Colors were assigned to cell types based on similarity of signal.

Other views of this data (along with additional documentation) are available from the hg19 ENCODE UW DNaseI HS track.

Display Conventions and Configuration

This track is a composite annotation track containing multiple subtracks, one for each cell type. The display mode and filtering of each subtrack can be individually controlled. For more information about track configuration, see Configuring Multi-View Tracks.

Methods

Raw sequence data files were processed by the UCSC ENCODE DNase analysis pipeline (July 2014 specification), diagrammed here:

[Figure: ENCODE DNase Pipeline. Credit: Qian Alvin Qin, X. Liu lab]

Briefly, sequence files were aligned to the hg38 (GRCh38) genome assembly augmented with 'sponge' sequence (Miga et al., 2015). Multi-mapped reads were removed, as were reads that aligned to 'sponge' or mitochondrial sequence. Results from all replicates were pooled and further processed by the Hotspot program to call peaks and broader regions of activity ('hotspots'), and to create signal density graphs. Signal graphs were normalized so that the genome-wide average value is 1.
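
The normalization step can be pictured with the following minimal sketch. It is not the pipeline's code: it assumes the signal is given as non-overlapping, half-open intervals with a value each, that bases outside any interval count as zero, and that genome_size is the number of bases averaged over. The function name is hypothetical.

```python
def normalize_to_unit_mean(intervals, genome_size):
    """Scale interval values so the per-base average over genome_size is 1.

    intervals: list of (start, end, value) tuples, non-overlapping, half-open.
    """
    total = sum((end - start) * value for start, end, value in intervals)
    if total <= 0:
        raise ValueError("signal must have a positive total to be normalized")
    mean = total / genome_size
    return [(start, end, value / mean) for start, end, value in intervals]


# Toy example: 40 bases, signal on the first 20 only.
raw = [(0, 10, 2.0), (10, 20, 6.0)]
print(normalize_to_unit_mean(raw, genome_size=40))
```

After scaling, a value of 1 means average signal and a value of 3 means three times the genome-wide average, which makes signal graphs from different cell types comparable on one axis.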

The cell types were clustered into a binary tree, and a rainbow of colors was cast across the leaf nodes, so that cell types with similar signal receive similar colors.

[Figure: ENCODE cell clustering by similarity. Credit: Chris Eisenhart, J. Kent lab]
(Please note that the ENCODE hg38 Transcription track, Layered H3K4Me1 track, Layered H3K4Me3 track, and Layered H3K27Ac track use different coloring, which matches the coloring of their previous versions lifted from the hg19 assembly.)
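
One way to picture casting a rainbow onto the leaves of a binary tree is sketched below. The tree shape, the cell-type names, and the choice of hue range are all hypothetical; the clustering and color assignment actually used for this track are not specified on this page.

```python
import colorsys


def leaves_in_order(tree):
    """Return leaf names left to right; a tree is a name or a (left, right) pair."""
    if isinstance(tree, str):
        return [tree]
    left, right = tree
    return leaves_in_order(left) + leaves_in_order(right)


def rainbow_colors(tree, max_hue=5 / 6):
    """Spread hues from red (0) to max_hue evenly across the ordered leaves."""
    leaves = leaves_in_order(tree)
    steps = max(len(leaves) - 1, 1)
    colors = {}
    for i, name in enumerate(leaves):
        r, g, b = colorsys.hsv_to_rgb(max_hue * i / steps, 1.0, 1.0)
        colors[name] = (round(r * 255), round(g * 255), round(b * 255))
    return colors


# Hypothetical tree: cellA and cellB cluster together, as do cellC and cellD.
tree = (("cellA", "cellB"), ("cellC", "cellD"))
for name, rgb in rainbow_colors(tree).items():
    print(name, rgb)
```

Because sibling leaves sit next to each other in the left-to-right order, closely clustered cell types end up with neighboring hues, while distant clusters land far apart in the rainbow.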

Credits

The processed data for this track were produced by UCSC. Credits for the primary data underlying this track are included in the ENCODE UW DNaseI HS track description.

References

Miga KH, Eisenhart C, Kent WJ. Utilizing mapping targets of sequences underrepresented in the reference assembly to reduce false positive alignments. Nucleic Acids Res. 2015 Nov 16;43(20):e133. PMID: 26163063

Thurman RE, Rynes E, Humbert R, Vierstra J, Maurano MT, Haugen E, Sheffield NC, Stergachis AB, Wang H, Vernot B et al. The accessible chromatin landscape of the human genome. Nature. 2012 Sep 6;489(7414):75-82. PMID: 22955617; PMC: PMC3721348

See also the references in the ENCODE UW DNaseI HS track.