1000 Genomes Phase 3 Integrated Variant Calls: SNVs, Indels, SVs

Track collection: 1000 Genomes Archive



Haplotype sorting display

Enable Haplotype sorting display
Haplotype sorting order:
using middle variant in viewing window as anchor.
If this mode is selected and genotypes are phased or homozygous, then each genotype is split into two independent haplotypes. These local haplotypes are clustered by similarity around a central variant. Haplotypes are reordered for display using the clustering tree, which is drawn in the left label area. Local haplotype blocks can often be identified using this display.
To anchor the sorting to a particular variant, click on the variant in the genome browser, and then click on the 'Use this variant' button on the next page.
using the order in which samples appear in the underlying VCF file
Haplotype clustering tree leaf shape:
draw branches whose samples are all identical as <
draw branches whose samples are all identical as [
Allele coloring scheme:
reference alleles invisible, alternate alleles in black
reference alleles invisible, alternate alleles in red for non-synonymous, green for synonymous, blue for UTR/noncoding, black otherwise
reference alleles in blue, alternate alleles in red
first base of allele (A = red, C = blue, G = green, T = magenta)
Haplotype sorting display height:

Filters

Exclude variants with Quality/confidence score (QUAL) less than
Exclude variants with these FILTER values:
PASS (All filters passed)
Minimum minor allele frequency (if INFO column includes AF or AC+AN):
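
How the three filters above might combine can be sketched in Python. This is only an illustration, not the Genome Browser's implementation: the function names, parameter names, and default thresholds are hypothetical, and for multi-allelic records the sketch assumes that only the first alternate allele is considered.

```python
def parse_info(info):
    """Parse a VCF INFO string such as 'AC=3;AN=10;DB' into a dict."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def minor_allele_frequency(info):
    """Return the minor allele frequency from AF, else from AC/AN, else None.

    Assumption: for multi-allelic records only the first ALT allele is used.
    """
    fields = parse_info(info)
    if "AF" in fields:
        af = float(fields["AF"].split(",")[0])
    elif "AC" in fields and "AN" in fields and int(fields["AN"]) > 0:
        af = int(fields["AC"].split(",")[0]) / int(fields["AN"])
    else:
        return None
    return min(af, 1.0 - af)


def passes_filters(qual, filter_value, info, min_qual=0.0,
                   excluded_filters=(), min_maf=0.0):
    """Apply the QUAL, FILTER, and minimum-MAF checks to one VCF record."""
    # A QUAL of '.' means "missing" in VCF; such records are not excluded here.
    if qual != "." and float(qual) < min_qual:
        return False
    # FILTER may hold several semicolon-separated values.
    if any(f in excluded_filters for f in filter_value.split(";")):
        return False
    maf = minor_allele_frequency(info)
    if maf is not None and maf < min_maf:
        return False
    return True
```

For example, `passes_filters("100", "PASS", "AC=3;AN=10", min_qual=30, min_maf=0.05)` keeps the record, because the frequency falls back to AC/AN when AF is absent.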


Display data as a density graph:

Assembly: Human Feb. 2009 (GRCh37/hg19)
Data last updated at UCSC: 2016-03-01

Description

This track shows 84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants discovered by the 1000 Genomes Project through its Phase 3 sequencing of 2,504 genomes from 26 populations worldwide.

The variant genotypes have been phased by the 1000 Genomes Project (i.e., the two alleles of each diploid genotype have been assigned to two haplotypes, one inherited from each parent). This extra information enables a clustering of independent haplotypes by local similarity for display.
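
To make the idea of splitting genotypes into haplotypes concrete, here is a minimal Python sketch. It assumes diploid VCF genotype strings in which `|` separates phased alleles and `/` separates unphased ones; the function names are hypothetical and this is not the browser's own code.

```python
def split_phased_genotype(gt):
    """Split a diploid VCF genotype such as '0|1' into two allele indexes.

    Returns None for missing or unphased heterozygous genotypes, which cannot
    be assigned to haplotypes. Unphased homozygous genotypes ('1/1') are
    accepted because both haplotypes carry the same allele.
    """
    if "|" in gt:
        a, b = gt.split("|")
    elif "/" in gt:
        a, b = gt.split("/")
        if a != b:
            return None
    else:
        return None
    if a == "." or b == ".":
        return None
    return int(a), int(b)


def sample_haplotypes(genotypes):
    """Turn one sample's genotypes across variants into two haplotypes."""
    hap_a, hap_b = [], []
    for gt in genotypes:
        alleles = split_phased_genotype(gt)
        if alleles is None:
            # Keep positions aligned; None marks an unusable genotype.
            hap_a.append(None)
            hap_b.append(None)
        else:
            hap_a.append(alleles[0])
            hap_b.append(alleles[1])
    return hap_a, hap_b
```

Applied to every sample, this yields two haplotypes per individual, so the 2,504 genomes give 5,008 haplotypes to cluster.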

Display Conventions

In "dense" mode, a vertical line is drawn at the position of each variant. In "pack" mode, since these variants have been phased, the display shows a clustering of haplotypes in the viewed range, sorted by similarity of alleles weighted by proximity to a central variant. The clustering view can highlight local patterns of linkage.
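
One way to picture "similarity of alleles weighted by proximity to a central variant" is a distance in which mismatches near the anchor count more than mismatches far from it. The Python sketch below is an assumption for illustration only: the weighting function `1 / (1 + offset)` and the function name are hypothetical, and the browser's actual weighting is not described here.

```python
def weighted_distance(hap_a, hap_b, center):
    """Sum of mismatch weights between two haplotypes.

    hap_a, hap_b: equal-length lists of allele indexes, one per variant.
    center: index of the anchor variant within those lists.
    Assumed weighting: 1 / (1 + number of variants away from the anchor).
    """
    total = 0.0
    for i, (a, b) in enumerate(zip(hap_a, hap_b)):
        if a != b:
            total += 1.0 / (1 + abs(i - center))
    return total
```

With this weighting, two haplotypes that differ only at the anchor are farther apart (distance 1.0) than two that differ only at a neighboring variant (distance 0.5), so haplotypes sharing the anchor allele tend to be grouped together.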

In the clustering display, each sample's phased diploid genotype is split into two independent haplotypes. Each haplotype is placed in a horizontal row of pixels; when the number of haplotypes exceeds the number of vertical pixels for the track, multiple haplotypes fall in the same pixel row and pixels are averaged across haplotypes.
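
The pixel averaging described above can be sketched as follows. This is a simplified illustration, assuming alleles coded as 0 (reference) and 1 (non-reference) and an even division of haplotypes among rows; the function name is hypothetical and the browser's exact binning may differ.

```python
def average_into_rows(column, n_rows):
    """Average one variant's alleles into n_rows pixel rows.

    column: 0/1 allele values for one variant, in haplotype display order.
    n_rows: number of vertical pixels available (must be positive).
    Returns one value per row in [0, 1]: the fraction of non-reference alleles.
    """
    n = len(column)
    if n <= n_rows:
        # Enough room: each haplotype gets its own row.
        return [float(v) for v in column]
    rows = []
    for r in range(n_rows):
        start = r * n // n_rows
        end = (r + 1) * n // n_rows
        chunk = column[start:end]
        rows.append(sum(chunk) / len(chunk))
    return rows
```

A row value between 0 and 1 would then be drawn as an intermediate shade, which is why dense regions of the display can appear gray rather than strictly black or white.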

Each variant is a vertical bar with white (invisible) representing the reference allele and black representing the non-reference allele(s). Tick marks are drawn at the top and bottom of each variant's vertical bar to make the bar more visible when most alleles are reference alleles. The vertical bar for the central variant used in clustering is outlined in purple. In order to avoid long compute times, the range of alleles used in clustering may be limited; alleles used in clustering have purple tick marks at the top and bottom.

The clustering tree is displayed to the left of the main image. It does not represent relatedness of individuals; it simply shows the arrangement of local haplotypes by similarity. When a rightmost branch is purple, it means that all haplotypes in that branch are identical, at least within the range of variants used in clustering.
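
The notion of haplotypes being "identical within the range of variants used in clustering" can be illustrated with a short Python sketch. The function name and the half-open `start`/`end` range convention are assumptions made for this example.

```python
from collections import defaultdict


def group_identical(haplotypes, start, end):
    """Group haplotype indexes whose alleles match over variants [start, end).

    haplotypes: list of equal-length allele lists.
    Returns a list of groups, each a list of indexes into `haplotypes`,
    in order of first appearance.
    """
    groups = defaultdict(list)
    for idx, hap in enumerate(haplotypes):
        groups[tuple(hap[start:end])].append(idx)
    return list(groups.values())
```

Narrowing the range can merge groups: haplotypes that differ only outside the clustering range fall into the same group, which matches the caveat that identity holds only within that range.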

Methods

The genomes of 2,504 individuals were sequenced using both whole-genome sequencing (mean depth = 7.4x) and targeted exome sequencing (mean depth = 65.7x). Quoting the Phase 3 publication (1000 Genomes Project Consortium, 2015):

"In contrast to earlier phases of the project, we expanded analysis beyond bi-allelic events to include multi-allelic SNPs, indels, and a diverse set of structural variants (SVs). An overview of the sample collection, data generation, data processing, and analysis is given in Extended Data Fig. 1. Variant discovery used an ensemble of 24 sequence analysis tools (Supplementary Table 2), and machine-learning classifiers to separate high-quality variants from potential false positives, balancing sensitivity and specificity. Construction of haplotypes started with estimation of long-range phased haplotypes using array genotypes for project participants and, where available, their first degree relatives; continued with the addition of high confidence bi-allelic variants that were analysed jointly to improve these haplotypes; and concluded with the placement of multi-allelic and structural variants onto the haplotype scaffold one at a time."

Credits

Thanks to the 1000 Genomes Project for making these data freely available.

References

1000 Genomes Project Consortium, Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, Korbel JO, Marchini JL, McCarthy S, McVean GA et al. A global reference for human genetic variation. Nature. 2015 Oct 1;526(7571):68-74. PMID: 26432245