Schema for Segmental Dups - Duplications of >1000 Bases of Non-RepeatMasked Sequence
  Database: hg38    Primary Table: genomicSuperDups    Row Count: 69,894   Data last updated: 2014-10-14
Format description: Summary of large genomic duplications (>1 kb, >90% similar)
On download server: MariaDB table dump directory
field           example                      SQL type          info    description
bin             585                          smallint(6)       range   Indexing field to speed chromosome range queries.
chrom           chr1                         varchar(255)      values  Reference sequence chromosome or scaffold
chromStart      10000                        int(10) unsigned  range   Start position in chromosome
chromEnd        87112                        int(10) unsigned  range   End position in chromosome
name            chr15:101906152              varchar(255)      values  Other chromosome involved
score           0                            int(10) unsigned  range   Score based on the raw BLAST alignment score. Set to 0 and not used in later versions.
strand          -                            char(1)           values  Value should be + or -
otherChrom      chr15                        varchar(255)      values  Other chromosome or scaffold
otherStart      101906152                    int(10) unsigned  range   Start in other sequence
otherEnd        101981189                    int(10) unsigned  range   End in other sequence
otherSize       75037                        int(10) unsigned  range   Total size of other chromosome
uid             11764                        int(10) unsigned  range   Unique id shared by the query and subject
posBasesHit     1000                         int(10) unsigned  range   For future use
testResult      N/A                          varchar(255)      values  For future use
verdict         N/A                          varchar(255)      values  For future use
chits           N/A                          varchar(255)      values  For future use
ccov            N/A                          varchar(255)      values  For future use
alignfile       align_both/0009/both0046049  varchar(255)      values  alignment file path
alignL          77880                        int(10) unsigned  range   spaces/positions in alignment
indelN          71                           int(10) unsigned  range   number of indels
indelS          3611                         int(10) unsigned  range   indel spaces
alignB          74269                        int(10) unsigned  range   bases Aligned
matchB          73743                        int(10) unsigned  range   aligned bases that match
mismatchB       526                          int(10) unsigned  range   aligned bases that do not match
transitionsB    331                          int(10) unsigned  range   number of transitions
transversionsB  195                          int(10) unsigned  range   number of transversions
fracMatch       0.992918                     float             range   fraction of matching bases
fracMatchIndel  0.991969                     float             range   fraction of matching bases with indels
jcK             0.00711601                   float             range   K-value calculated with Jukes-Cantor
k2K             0.00711937                   float             range   Kimura K
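
The derived columns at the end of the schema can be recomputed from the count columns. The sketch below is a reconstruction, not the pipeline's own code: it assumes fracMatch is matchB / alignB, that jcK is the standard Jukes-Cantor distance, and that k2K is the standard Kimura two-parameter distance. The formula used for fracMatchIndel (each indel event counted as one extra aligned position) is inferred from the example values and is not stated in the schema. The function name `alignment_stats` is hypothetical.

```python
import math


def alignment_stats(align_b, match_b, mismatch_b, transitions_b,
                    transversions_b, indel_n):
    """Recompute the derived columns from the count columns of one row."""
    p = mismatch_b / align_b          # proportion of mismatched bases
    ts = transitions_b / align_b      # proportion of transitions (P)
    tv = transversions_b / align_b    # proportion of transversions (Q)
    return {
        "fracMatch": match_b / align_b,
        # Assumption inferred from the example rows: every indel event
        # adds one position to the denominator.
        "fracMatchIndel": match_b / (align_b + indel_n),
        # Jukes-Cantor: K = -3/4 * ln(1 - 4p/3)
        "jcK": -0.75 * math.log(1 - 4 * p / 3),
        # Kimura two-parameter: K = -1/2 * ln((1 - 2P - Q) * sqrt(1 - 2Q))
        "k2K": -0.5 * math.log((1 - 2 * ts - tv) * math.sqrt(1 - 2 * tv)),
    }


# Counts taken from the example column of the schema.
stats = alignment_stats(align_b=74269, match_b=73743, mismatch_b=526,
                        transitions_b=331, transversions_b=195, indel_n=71)
for name, value in stats.items():
    print(f"{name}\t{value:.8f}")
```

With the example counts, the four values agree with the stored fracMatch, fracMatchIndel, jcK and k2K to about six decimal places.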

Sample Rows
 
bin chrom chromStart chromEnd name score strand otherChrom otherStart otherEnd otherSize uid posBasesHit testResult verdict chits ccov alignfile alignL indelN indelS alignB matchB mismatchB transitionsB transversionsB fracMatch fracMatchIndel jcK k2K
585 chr1 10000 87112 chr15:101906152 0 - chr15 101906152 101981189 75037 11764 1000 N/A N/A N/A N/A align_both/0009/both0046049 77880 71 3611 74269 73743 526 331 195 0.992918 0.991969 0.00711601 0.00711937
585 chr1 10000 20818 chr12:10043 0 + chr12 10043 20853 10810 7822 1000 N/A N/A N/A N/A align_both/0003/both0017002 10947 23 266 10681 10500 181 105 76 0.983054 0.980942 0.0171404 0.017154
585 chr1 10000 19844 chrX:156020216 0 - chrX 156020216 156030574 10358 3851 1000 N/A N/A N/A N/A align_both/0014/both0071854 10437 20 672 9765 9598 167 107 60 0.982898 0.980889 0.0172999 0.0173215
585 chr1 10169 37148 chr1:180723 0 + chr1 180723 207666 26943 1 1000 N/A N/A N/A N/A align_both/0014/both0071547 27025 30 128 26897 26628 269 164 105 0.989999 0.988896 0.0100684 0.0100743
585 chr1 10464 40733 chr2:113572720 0 - chr2 113572720 113602409 29689 1980 1000 N/A N/A N/A N/A align_both/0014/both0071629 30313 43 668 29645 29285 360 233 127 0.987856 0.986426 0.0122431 0.0122543
585 chr1 10485 19844 chr16:10426 0 + chr16 10426 19533 9107 14041 1000 N/A N/A N/A N/A align_both/0009/both0046306 9377 14 288 9089 8970 119 72 47 0.986907 0.985389 0.0132084 0.0132182
585 chr1 10485 40733 chr9:10843 0 + chr9 10843 40515 29672 3547 1000 N/A N/A N/A N/A align_both/0014/both0071825 30274 15 628 29646 29475 171 107 64 0.994232 0.993729 0.00579036 0.00579252
585 chr1 18392 87112 chr19:60000 0 + chr19 60000 128672 68672 21278 1000 N/A N/A N/A N/A align_both/0013/both0065164 68799 42 206 68593 68280 313 189 124 0.995437 0.994828 0.00457709 0.00457824
585 chr1 20863 40733 chr12:22923 0 + chr12 22923 44900 21977 7824 1000 N/A N/A N/A N/A align_both/0003/both0017004 22015 22 2183 19832 19613 219 136 83 0.988957 0.987861 0.0111249 0.0111326
585 chr1 70007 87112 chr6:60000 0 + chr6 60000 77075 17075 2927 1000 N/A N/A N/A N/A align_both/0014/both0071692 17136 19 92 17044 16908 136 85 51 0.992021 0.990916 0.0080221 0.00802624
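
A row can be turned into typed values with a few lines of Python. This is a minimal sketch that assumes the table dump stores one row per line with tab-separated fields in the schema order. `FIELDS` and `parse_row` are hypothetical names, and the row is rebuilt here from the first sample row rather than read from a file.

```python
# Column names and Python types, in the order listed in the schema.
FIELDS = [
    ("bin", int), ("chrom", str), ("chromStart", int), ("chromEnd", int),
    ("name", str), ("score", int), ("strand", str), ("otherChrom", str),
    ("otherStart", int), ("otherEnd", int), ("otherSize", int), ("uid", int),
    ("posBasesHit", int), ("testResult", str), ("verdict", str),
    ("chits", str), ("ccov", str), ("alignfile", str), ("alignL", int),
    ("indelN", int), ("indelS", int), ("alignB", int), ("matchB", int),
    ("mismatchB", int), ("transitionsB", int), ("transversionsB", int),
    ("fracMatch", float), ("fracMatchIndel", float), ("jcK", float),
    ("k2K", float),
]


def parse_row(line):
    """Split one tab-separated line and convert each field to its type."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return {name: cast(value) for (name, cast), value in zip(FIELDS, values)}


# First sample row, joined with tabs (assumed dump delimiter).
line = "\t".join([
    "585", "chr1", "10000", "87112", "chr15:101906152", "0", "-", "chr15",
    "101906152", "101981189", "75037", "11764", "1000", "N/A", "N/A", "N/A",
    "N/A", "align_both/0009/both0046049", "77880", "71", "3611", "74269",
    "73743", "526", "331", "195", "0.992918", "0.991969", "0.00711601",
    "0.00711937",
])
row = parse_row(line)

# Relationships that hold in the sample rows (observed, not documented).
print(row["otherEnd"] - row["otherStart"] == row["otherSize"])
print(row["matchB"] + row["mismatchB"] == row["alignB"])
print(row["alignB"] + row["indelS"] == row["alignL"])
```

The three checks at the end hold for every sample row shown here. They are useful as sanity tests when loading the table, but they are observations about these rows, not guarantees from the schema.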

Note: all start coordinates in our database are 0-based, not 1-based. End coordinates are exclusive, so chromEnd - chromStart gives the length of a region.
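
As an illustration of the coordinate convention, the sketch below converts a 0-based, end-exclusive interval into the 1-based position string used in the browser's position box, and computes the value of the bin column for an interval. The function names are hypothetical. The bin calculation assumes the standard UCSC binning scheme (smallest bins of 128 kb, each level eight times larger, valid for positions below 512 Mb); it is a reimplementation for illustration, not code taken from this page.

```python
def to_one_based(chrom, start, end):
    """Format a 0-based, end-exclusive interval as a 1-based position."""
    return f"{chrom}:{start + 1}-{end}"


def region_length(start, end):
    """Length of a 0-based, end-exclusive interval."""
    return end - start


# Offsets of the first bin at each level, from finest (128 kb) to coarsest.
BIN_OFFSETS = (512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0)
BIN_FIRST_SHIFT = 17  # 2**17 = 128 kb, size of the smallest bin
BIN_NEXT_SHIFT = 3    # each coarser level covers 8 bins of the level below


def bin_from_range(start, end):
    """Smallest bin that fully contains the interval [start, end)."""
    start_bin = start >> BIN_FIRST_SHIFT
    end_bin = (end - 1) >> BIN_FIRST_SHIFT
    for offset in BIN_OFFSETS:
        if start_bin == end_bin:
            return offset + start_bin
        start_bin >>= BIN_NEXT_SHIFT
        end_bin >>= BIN_NEXT_SHIFT
    raise ValueError("interval is outside the range of the standard scheme")


print(to_one_based("chr1", 10000, 87112))
print(region_length(10000, 87112))
print(bin_from_range(10000, 87112))
```

For the first sample row (chr1, 10000 to 87112), the computed bin is 585, the same value stored in the bin column.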

Segmental Dups (genomicSuperDups) Track Description
 

Description

This track shows regions detected as putative genomic duplications within the golden path. The following display conventions are used to distinguish levels of similarity:

  • Light to dark gray: 90 - 98% similarity
  • Light to dark yellow: 98 - 99% similarity
  • Light to dark orange: greater than 99% similarity
  • Red: duplications of greater than 98% similarity that lack sufficient Segmental Duplication Database evidence (most likely missed overlaps)
To be included in the track, a region had to align over at least 1 kb of total sequence (containing at least 500 bp of non-RepeatMasked sequence) with a sequence identity of at least 90%.
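
The display conventions above can be expressed as a small lookup. This sketch assumes the similarity used for coloring corresponds to the fracMatch column and that the boundaries fall exactly at 0.98 and 0.99; neither detail is stated in the description. The red class depends on Segmental Duplication Database evidence that is not stored in this table, so it is left out. The function name `similarity_class` is hypothetical.

```python
def similarity_class(frac_match):
    """Return the color family for a similarity value, or None if below 90%.

    Boundary handling at exactly 0.98 and 0.99 is an assumption.
    """
    if frac_match < 0.90:
        return None        # below the inclusion threshold of the track
    if frac_match < 0.98:
        return "gray"      # 90 - 98% similarity
    if frac_match <= 0.99:
        return "yellow"    # 98 - 99% similarity
    return "orange"        # greater than 99% similarity


# fracMatch values from the first two sample rows.
for value in (0.992918, 0.983054):
    print(value, similarity_class(value))
```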

Methods

Segmental duplications play an important role in both genomic disease and gene evolution. This track displays an analysis of the global organization of these long-range segments of identity in genomic sequence.

Large recent duplications (>= 1 kb and >= 90% identity) were detected by identifying high-copy repeats, removing these repeats from the genomic sequence ("fuguization") and searching all sequence for similarity. The repeats were then reinserted into the pairwise alignments, the ends of alignments trimmed, and global alignments were generated. For a full description of the "fuguization" detection method, see Bailey et al., 2001. This method has become known as WGAC (whole-genome assembly comparison); for example, see Bailey et al., 2002.
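
The idea of removing repeats before the similarity search and reinserting them afterwards can be illustrated with a toy example. This is not the WGAC implementation described in Bailey et al., 2001. It only shows the coordinate bookkeeping, under the assumption that repeats are marked as lowercase (soft-masked) bases. The function names `fuguize` and `to_original` are hypothetical.

```python
def fuguize(seq):
    """Strip lowercase (assumed repeat-masked) bases from a sequence.

    Returns the stripped sequence and the removed intervals as
    0-based, end-exclusive (start, end) pairs in original coordinates.
    """
    kept = []
    removed = []
    run_start = None
    for i, base in enumerate(seq):
        if base.islower():
            if run_start is None:
                run_start = i          # a masked run begins here
        else:
            if run_start is not None:
                removed.append((run_start, i))
                run_start = None
            kept.append(base)
    if run_start is not None:          # masked run reaches the end
        removed.append((run_start, len(seq)))
    return "".join(kept), removed


def to_original(pos, removed):
    """Map a 0-based position in the stripped sequence back to the original."""
    for start, end in removed:
        if start <= pos:
            pos += end - start         # skip over the reinserted repeat
        else:
            break
    return pos


stripped, removed = fuguize("ACGTacgtTTGG")
print(stripped, removed)
print(to_original(4, removed))
```

Alignments found on the stripped sequence can be mapped back with `to_original`, which corresponds to the step in which repeats are reinserted into the pairwise alignments.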

Credits

These data were provided by Ginger Cheng, Xinwei She, Archana Raja, Tin Louie and Evan Eichler at the University of Washington.

References

Bailey JA, Gu Z, Clark RA, Reinert K, Samonte RV, Schwartz S, Adams MD, Myers EW, Li PW, Eichler EE. Recent segmental duplications in the human genome. Science. 2002 Aug 9;297(5583):1003-7. PMID: 12169732

Bailey JA, Yavor AM, Massa HF, Trask BJ, Eichler EE. Segmental duplications: organization and impact within the current human genome project assembly. Genome Res. 2001 Jun;11(6):1005-17. PMID: 11381028; PMC: PMC311093