Schema for Segmental Dups - Duplications of >1000 Bases of Non-RepeatMasked Sequence
  Database: mm10    Primary Table: genomicSuperDups    Row Count: 659,775   Data last updated: 2014-04-14
Format description: Summary of large genomic Duplications (>1KB >90% similar)
field           example                      SQL type          description
bin             611                          smallint(6)       Indexing field to speed chromosome range queries.
chrom           chr1                         varchar(255)      Reference sequence chromosome or scaffold
chromStart      3531709                      int(10) unsigned  Start position in chromosome
chromEnd        3535022                      int(10) unsigned  End position in chromosome
name            chr5:133839256               varchar(255)      Other chromosome involved
score           0                            int(10) unsigned  Score based on the raw BLAST alignment score. Set to 0 and not used in later versions.
strand          -                            char(1)           Value should be + or -
otherChrom      chr5                         varchar(255)      Other chromosome or scaffold
otherStart      133839256                    int(10) unsigned  Start in other sequence
otherEnd        133842605                    int(10) unsigned  End in other sequence
otherSize       3349                         int(10) unsigned  Total size of other chromosome
uid             386                          int(10) unsigned  Unique id shared by the query and subject
posBasesHit     1000                         int(10) unsigned  For future use
testResult      N/A                          varchar(255)      For future use
verdict         N/A                          varchar(255)      For future use
chits           N/A                          varchar(255)      For future use
ccov            N/A                          varchar(255)      For future use
alignfile       align_both/0004/both0022602  varchar(255)      alignment file path
alignL          3371                         int(10) unsigned  spaces/positions in alignment
indelN          19                           int(10) unsigned  number of indels
indelS          80                           int(10) unsigned  indel spaces
alignB          3291                         int(10) unsigned  bases Aligned
matchB          3070                         int(10) unsigned  aligned bases that match
mismatchB       221                          int(10) unsigned  aligned bases that do not match
transitionsB    121                          int(10) unsigned  number of transitions
transversionsB  100                          int(10) unsigned  number of transversions
fracMatch       0.932847                     float             fraction of matching bases
fracMatchIndel  0.927492                     float             fraction of matching bases with indels
jcK             0.0703516                    float             K-value calculated with Jukes-Cantor
k2K             0.0705369                    float             Kimura K
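
The derived columns at the end of the table can be recomputed from the count columns. The sketch below is not taken from the pipeline that built the table. It applies the textbook Jukes-Cantor and Kimura two-parameter distance formulas, and it assumes that fracMatchIndel treats each indel event as one extra alignment position. Under those assumptions it reproduces the example values listed above to within rounding. The function name derived_stats is hypothetical.

```python
import math

def derived_stats(align_b, match_b, transitions_b, transversions_b, indel_n):
    """Recompute fracMatch, fracMatchIndel, jcK and k2K from raw counts.

    Assumptions (not confirmed by the schema): standard Jukes-Cantor and
    Kimura 2-parameter formulas; each indel event adds one position to the
    denominator of fracMatchIndel.
    """
    mismatch_b = transitions_b + transversions_b
    frac_match = match_b / align_b
    frac_match_indel = match_b / (align_b + indel_n)

    # Jukes-Cantor distance from the proportion of mismatched aligned bases
    p = mismatch_b / align_b
    jc_k = -0.75 * math.log(1 - 4 * p / 3)

    # Kimura 2-parameter distance from transition (ts) and transversion (tv) proportions
    ts = transitions_b / align_b
    tv = transversions_b / align_b
    k2_k = -0.5 * math.log((1 - 2 * ts - tv) * math.sqrt(1 - 2 * tv))
    return frac_match, frac_match_indel, jc_k, k2_k

# Counts taken from the example column of the schema table
stats = derived_stats(align_b=3291, match_b=3070, transitions_b=121,
                      transversions_b=100, indel_n=19)
print(["%.6f" % value for value in stats])
```

The printed values can be compared against the fracMatch, fracMatchIndel, jcK and k2K examples in the table.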

Sample Rows
 
bin chrom chromStart chromEnd name score strand otherChrom otherStart otherEnd otherSize uid posBasesHit testResult verdict chits ccov alignfile alignL indelN indelS alignB matchB mismatchB transitionsB transversionsB fracMatch fracMatchIndel jcK k2K
611 chr1 3531709 3535022 chr5:133839256 0 - chr5 133839256 133842605 3349 386 1000 N/A N/A N/A N/A align_both/0004/both0022602 3371 19 80 3291 3070 221 121 100 0.932847 0.927492 0.0703516 0.0705369
626 chr1 5423434 5429545 chr10:55653424 0 + chr10 55653424 55659587 6163 570 1000 N/A N/A N/A N/A align_both/0000/both0000536 6167 8 60 6107 5873 234 136 98 0.961683 0.960425 0.0393301 0.0394048
626 chr1 5423434 5429532 chrX:108566838 0 + chrX 108566838 108572997 6159 498 1000 N/A N/A N/A N/A align_both/0004/both0022644 6159 9 61 6098 5894 204 124 80 0.966546 0.965122 0.0342226 0.0342915
626 chr1 5423435 5429437 chr5:7766445 0 + chr5 7766445 7772518 6073 387 1000 N/A N/A N/A N/A align_both/0004/both0022622 6076 11 77 5999 5732 267 149 118 0.955493 0.953744 0.0458827 0.0459669
626 chr1 5423435 5429545 chr4:97080152 0 + chr4 97080152 97086317 6165 355 1000 N/A N/A N/A N/A align_both/0004/both0022620 6172 9 69 6103 5877 226 141 85 0.962969 0.961551 0.0379764 0.0380718
626 chr1 5423442 5429532 chr2:151880429 0 + chr2 151880429 151886585 6156 301 1000 N/A N/A N/A N/A align_both/0004/both0022609 6183 13 120 6063 5887 176 102 74 0.970971 0.968894 0.0296052 0.0296465
626 chr1 5423455 5429514 chr3:11880252 0 + chr3 11880252 11886348 6096 317 1000 N/A N/A N/A N/A align_both/0004/both0022610 6119 22 83 6036 5597 439 239 200 0.92727 0.923902 0.0765027 0.0767171
626 chr1 5423456 5429545 chr18:89137183 0 + chr18 89137183 89143333 6150 16972 1000 N/A N/A N/A N/A align_both/0004/both0022323 6154 9 69 6085 5879 206 130 76 0.966146 0.964719 0.0346416 0.0347246
626 chr1 5423456 5429545 chr18:15366528 0 + chr18 15366528 15372672 6144 16957 1000 N/A N/A N/A N/A align_both/0004/both0022172 6145 7 57 6088 5889 199 118 81 0.967313 0.966202 0.033421 0.0334797
626 chr1 5423456 5429437 chr11:41080586 0 + chr11 41080586 41086626 6040 1420 1000 N/A N/A N/A N/A align_both/0000/both0001261 6050 15 79 5971 5660 311 167 144 0.947915 0.94554 0.053982 0.0540787
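
The bin values in the sample rows (611 and 626) are consistent with the standard UCSC binning scheme, in which the smallest bins span 128 kb and each higher level is eight times coarser. The sketch below is a Python transcription of that scheme under the assumption that this table uses the standard (non-extended) variant. The constant and function names are chosen for illustration.

```python
# Standard UCSC binning constants: five levels, smallest bin = 2**17 bases (128 kb).
BIN_OFFSETS = (512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0)
BIN_FIRST_SHIFT = 17
BIN_NEXT_SHIFT = 3   # each level up is 2**3 = 8 times coarser

def bin_from_range(start, end):
    """Return the smallest bin that fully contains the half-open range [start, end)."""
    start_bin = start >> BIN_FIRST_SHIFT
    end_bin = (end - 1) >> BIN_FIRST_SHIFT
    for offset in BIN_OFFSETS:
        if start_bin == end_bin:
            return offset + start_bin
        start_bin >>= BIN_NEXT_SHIFT
        end_bin >>= BIN_NEXT_SHIFT
    raise ValueError("range %d-%d does not fit the standard binning scheme" % (start, end))

print(bin_from_range(3531709, 3535022))  # first sample row: 611
print(bin_from_range(5423434, 5429545))  # second sample row: 626
```

A range query can then restrict itself to the handful of bins that may overlap the region of interest before comparing chromStart and chromEnd, which is why the field speeds up lookups.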

Note: all start coordinates in our database are 0-based, not 1-based. End coordinates are exclusive (half-open intervals), so the length of an item is chromEnd - chromStart.
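
As a small illustration of the coordinate note, the sketch below converts a table interval to the 1-based, inclusive form usually shown in a browser position box. It assumes the usual UCSC convention that table intervals are 0-based and half-open. Both function names are hypothetical.

```python
def to_browser_position(chrom, chrom_start, chrom_end):
    """Format a 0-based, half-open table interval as a 1-based, inclusive position."""
    return "%s:%d-%d" % (chrom, chrom_start + 1, chrom_end)

def interval_length(chrom_start, chrom_end):
    """Length in bases of a 0-based, half-open interval."""
    return chrom_end - chrom_start

# First sample row: chr1, chromStart 3531709, chromEnd 3535022
print(to_browser_position("chr1", 3531709, 3535022))  # chr1:3531710-3535022
print(interval_length(3531709, 3535022))              # 3313
```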

Segmental Dups (genomicSuperDups) Track Description
 

Description

This track shows regions detected as putative genomic duplications within the golden path. The following display conventions are used to distinguish levels of similarity:

  • Light to dark gray: 90 - 98% similarity
  • Light to dark yellow: 98 - 99% similarity
  • Light to dark orange: greater than 99% similarity
  • Red: duplications of greater than 98% similarity that lack sufficient Segmental Duplication Database evidence (most likely missed overlaps)
For a region to be included in the track, at least 1 Kb of the total sequence (containing at least 500 bp of non-RepeatMasked sequence) had to align and a sequence identity of at least 90% was required.
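
The similarity bands above can be expressed as a simple lookup on the fracMatch column. This is only a sketch of the display convention, not the browser's rendering code: it assumes fracMatch is the similarity measure used for coloring, it assigns the exact boundary values 98% and 99% to the yellow band (the list does not say which band owns them), and it leaves out the red class because that depends on Segmental Duplication Database evidence not present in the table. The function name is hypothetical.

```python
def similarity_band(frac_match):
    """Map a fracMatch value to the color family described in the track conventions.

    Returns None for values below the 90% inclusion threshold.
    Boundary handling at 0.98 and 0.99 is an assumption of this sketch.
    """
    if frac_match < 0.90:
        return None
    if frac_match < 0.98:
        return "gray"
    if frac_match <= 0.99:
        return "yellow"
    return "orange"

# fracMatch of the first sample row
print(similarity_band(0.932847))  # gray
```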

Methods

Segmental duplications play an important role in both genomic disease and gene evolution. This track displays an analysis of the global organization of these long-range segments of identity in genomic sequence.

Large recent duplications (>= 1 kb and >= 90% identity) were detected by identifying high-copy repeats, removing these repeats from the genomic sequence ("fuguization") and searching all sequence for similarity. The repeats were then reinserted into the pairwise alignments, the ends of alignments trimmed, and global alignments were generated. For a full description of the "fuguization" detection method, see Bailey et al. (2001) in the References section below. This method has become known as WGAC (whole-genome assembly comparison); for example, see Bailey et al. (2002).
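
The remove-then-reinsert idea in the paragraph above can be illustrated with a toy sketch. This is not the WGAC pipeline. It assumes repeats are marked as lowercase (soft-masked) bases, which is an assumption of this example, and the names fuguize and restore_interval are hypothetical. It only shows how removed repeats can be tracked so that an interval found on the stripped sequence maps back to coordinates on the original sequence.

```python
def fuguize(seq):
    """Drop soft-masked (lowercase) bases.

    Returns the stripped sequence and, for each kept base, its index
    in the original sequence.
    """
    kept = []
    origin = []
    for index, base in enumerate(seq):
        if base.isupper():
            kept.append(base)
            origin.append(index)
    return "".join(kept), origin

def restore_interval(origin, start, end):
    """Map a half-open interval on the stripped sequence back to original coordinates."""
    return origin[start], origin[end - 1] + 1

masked = "ACGTacgtacGGCCTTaaaaTTGA"   # lowercase runs stand in for repeats
stripped, origin = fuguize(masked)
print(stripped)                         # ACGTGGCCTTTTGA
print(restore_interval(origin, 2, 12))  # (2, 22)
```

The restored interval spans the repeats that lay between the matched unique bases, which mirrors the step in which repeats are reinserted into the pairwise alignments before trimming and global alignment.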

Credits

These data were provided by Archana Raja and Evan Eichler at the University of Washington.

References

Bailey JA, Gu Z, Clark RA, Reinert K, Samonte RV, Schwartz S, Adams MD, Myers EW, Li PW, Eichler EE. Recent segmental duplications in the human genome. Science. 2002 Aug 9;297(5583):1003-7. PMID: 12169732

Bailey JA, Yavor AM, Massa HF, Trask BJ, Eichler EE. Segmental duplications: organization and impact within the current human genome project assembly. Genome Res. 2001 Jun;11(6):1005-17. PMID: 11381028; PMC: PMC311093