Schema for B.1.1.7 in USA - Early Lineage B.1.1.7 (a.k.a. 501Y.V1, VOC 202012/01) Sequences in the United States: SNVs and Deletions
  Database: wuhCor1    Primary Table: lineageB_1_1_7_US
VCF File: /gbdb/wuhCor1/lineageB_1_1_7_US/lineageB_1_1_7_US_first9.vcf.gz
Format description: The fields of a Variant Call Format data line
See the Variant Call Format specification for more details
chrom: An identifier from the reference genome
pos: The reference position, with the 1st base having position 1
id: Semicolon-separated list of unique identifiers where available
ref: Reference base(s)
alt: Comma-separated list of alternate non-reference alleles called on at least one of the samples
qual: Phred-scaled quality score for the assertion made in ALT, i.e. -10 * log10(probability that the call in ALT is wrong)
filter: PASS if this position has passed all filters. Otherwise, a semicolon-separated list of codes for filters that fail
info: Additional information encoded as a semicolon-separated series of short keys with optional comma-separated values
format: If genotype columns are specified in header, a colon-separated list of short keys starting with GT
genotypes: If genotype columns are specified in header, a tab-separated set of genotype column values; each value is a colon-separated list of values corresponding to keys in the format column
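
As an illustration of the field layout described above, here is a minimal Python sketch that splits one VCF data line into those fields and converts a Phred-scaled qual value back to an error probability. The names `VcfRecord`, `parse_vcf_line` and `phred_to_error_prob` are hypothetical, and the example line is made up for illustration rather than copied from the track's file.

```python
from typing import NamedTuple, Optional


class VcfRecord(NamedTuple):
    chrom: str
    pos: int                # 1-based reference position
    ids: list
    ref: str
    alts: list
    qual: Optional[float]   # None when the column is "."
    filters: list
    info: dict
    format_keys: list
    genotypes: list         # one dict per sample, keyed by format_keys


def parse_vcf_line(line: str) -> VcfRecord:
    """Split one tab-separated VCF data line into its fields."""
    cols = line.rstrip("\n").split("\t")
    chrom, pos, id_, ref, alt, qual, filt, info = cols[:8]
    # FORMAT keys and per-sample values are both colon-separated.
    format_keys = cols[8].split(":") if len(cols) > 8 else []
    genotypes = [dict(zip(format_keys, g.split(":"))) for g in cols[9:]]
    info_dict = {}
    if info != ".":
        for entry in info.split(";"):
            key, _, value = entry.partition("=")
            # Keys without a value are flags; values may be comma-separated.
            info_dict[key] = value.split(",") if value else True
    return VcfRecord(
        chrom=chrom,
        pos=int(pos),
        ids=[] if id_ == "." else id_.split(";"),
        ref=ref,
        alts=[] if alt == "." else alt.split(","),
        qual=None if qual == "." else float(qual),
        filters=[] if filt == "." else filt.split(";"),
        info=info_dict,
        format_keys=format_keys,
        genotypes=genotypes,
    )


def phred_to_error_prob(qual: float) -> float:
    """Invert qual = -10 * log10(p) to recover p."""
    return 10 ** (-qual / 10)


# Made-up example line with two haploid samples (not taken from the track).
example = "NC_045512v2\t23063\tA23063T\tA\tT\t30\tPASS\tAC=1;AN=2\tGT\t0\t1"
record = parse_vcf_line(example)
```

The sketch does no validation and ignores header lines; a full VCF library would be the better choice for anything beyond a quick inspection.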


B.1.1.7 in USA (lineageB_1_1_7_US) Track Description


Lineage B.1.1.7 (Rambaut et al.), also known as 20B/501Y.V1 (Nextstrain) and Variant of Concern (VOC) 202012/01 (Public Health England), spread rapidly in England in November and December 2020 (Volz et al.). It has a large number of mutations, including non-synonymous substitutions and deletions, and as of Jan. 4, 2021, over 7,000 sequences from 29 countries had been submitted to GISAID. The first confirmed B.1.1.7 sequence in the United States was announced on Dec. 29, 2020.

This track shows single-nucleotide substitutions and deletions from the SARS-CoV-2 reference genome in the B.1.1.7 consensus sequence and the first nine genome sequences in the United States that were assigned to B.1.1.7.

The track was generated using hgPhyloPlace, the Genome Browser's web front end to UShER (Turakhia et al., see Methods below). UShER places uploaded sequences in a global phylogenetic tree and also extracts subtrees showing each sample's local phylogenetic context. hgPhyloPlace generates a JSON file for each subtree which can be displayed using The first nine U.S. B.1.1.7 sequences have been placed in five clusters which correlate with geographic location. Here are links to view the subtrees at

Display Conventions

In "dense" mode, a vertical line is drawn at each position where there is a mutation. In "squish" and "pack" modes, the display shows a plot of all samples' mutations, with samples ordered using the phylogenetic tree in order to highlight patterns of linkage. "Full" display mode shows each mutation on its own row, ordered by position instead of lineage.

Each sample is placed in a horizontal row of pixels; when the number of samples exceeds the number of vertical pixels for the track, multiple samples fall in the same pixel row and pixels are averaged across samples.
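
The row-sharing behavior described above can be sketched as follows. This is only an assumed binning scheme for illustration (the Genome Browser's actual drawing code may bin and average differently), and `rows_for_samples` is a hypothetical name.

```python
def rows_for_samples(alleles, n_rows):
    """Average per-sample allele values (0 = reference, 1 = non-reference)
    at one genome position into at most n_rows pixel rows.

    `alleles` is assumed to be listed in tree order, one value per sample.
    """
    n = len(alleles)
    if n <= n_rows:
        # Enough rows: each sample gets its own pixel row.
        return [float(a) for a in alleles]
    sums = [0.0] * n_rows
    counts = [0] * n_rows
    for i, allele in enumerate(alleles):
        row = i * n_rows // n  # consecutive samples share a row
        sums[row] += allele
        counts[row] += 1
    # Every row receives at least one sample because n > n_rows.
    return [s / c for s, c in zip(sums, counts)]
```

For example, six samples drawn into three rows put two neighboring samples in each row, and each row's value is the mean of its two samples.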

Each mutation is a vertical bar at its position in the SARS-CoV-2 genome with white (invisible) representing the reference allele; the non-reference allele is shown in red if it changes the protein sequence of a gene, green if it falls within a gene but does not change the protein, and black if it does not fall within a gene. Tick marks are drawn at the top and bottom of each mutation's vertical bar to make the bar more visible when most alleles are reference alleles.
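
The coloring rule above amounts to a small decision table. A minimal sketch, with the hypothetical function name `allele_color` and boolean inputs that are assumed to have been computed elsewhere:

```python
def allele_color(is_reference: bool, in_gene: bool, changes_protein: bool) -> str:
    """Return the display color for one allele, following the rule in the text."""
    if is_reference:
        return "white"   # reference alleles are effectively invisible
    if not in_gene:
        return "black"   # non-reference allele outside any gene
    # Inside a gene: red if the protein sequence changes, green otherwise.
    return "red" if changes_protein else "green"
```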

The phylogenetic tree showing inferred relationships between the samples is depicted in the left column of the display. Mousing over this will show the sample identifiers. With the default font size (or smaller), the leaves of the tree are labeled by sample identifiers. For larger font sizes, the track height will need to be increased in order for the labels to fit. The track height can be adjusted in the track controls, which can be reached by clicking on the gray button to the left of the tree or by right-clicking on the image.


Methods

The B.1.1.7 consensus sequence was determined from COG-UK sequences assigned to B.1.1.7 with early sample collection dates. The nine U.S. B.1.1.7 genome sequences available as of Jan. 2, 2021 were downloaded from GenBank and GISAID and uploaded to hgPhyloPlace. hgPhyloPlace uses UShER (Turakhia et al.) to place uploaded SARS-CoV-2 genome sequences in a global phylogenetic tree and generates custom tracks for the Genome Browser showing single-nucleotide substitutions in the uploaded sequences. It ignores insertion/deletion mutations and works only with substitutions, because those are adequate for inferring phylogeny. However, since B.1.1.7 has four deletions, three of which cause amino acid deletions from genes, minimap2 (Li) was used to align B.1.1.7 to the reference genome so that deletions could be displayed in addition to substitutions.
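
To illustrate how deletions can be read out of a pairwise alignment, the sketch below walks a CIGAR string (the alignment encoding used in SAM output, which minimap2 can produce) and reports each deletion's reference position and length. The exact minimap2 invocation and post-processing used for this track are not stated here, so this is an assumed approach, and `deletions_from_cigar` is a hypothetical helper.

```python
import re

# One CIGAR operation: a length followed by an operation code.
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def deletions_from_cigar(cigar: str, ref_start: int = 1):
    """Return (1-based reference start, length) for each deletion (D) operation.

    `ref_start` is the 1-based reference position where the alignment begins.
    """
    deletions = []
    ref_pos = ref_start
    for length, op in _CIGAR_OP.findall(cigar):
        n = int(length)
        if op == "D":
            deletions.append((ref_pos, n))
        # Only these operations consume reference bases.
        if op in "MDN=X":
            ref_pos += n
    return deletions


# Made-up CIGAR for illustration: 100 aligned bases, a 6-base deletion,
# 50 aligned bases, a 3-base deletion, then 20 aligned bases.
example_deletions = deletions_from_cigar("100M6D50M3D20M")
```

Insertions (I) and clipping (S, H) advance only the query, so they do not shift the reported reference coordinates.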

Data Access

The first sequences from California, Colorado, Florida and New York are available from GenBank:

All nine sequences are available from GISAID. GISAID data displayed in the Genome Browser are subject to GISAID's Terms and Conditions. SARS-CoV-2 genome sequences and metadata are available for download from GISAID EpiCoV™.

COG-UK releases daily updates of sequences and metadata; scroll down to the "Latest Sequence Data" section of the Data page for links.

The mutations in the B.1.1.7 consensus sequence and the sequences available from GenBank may be downloaded in Variant Call Format (VCF): lineageB_1_1_7_US_first7.vcf.gz
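
Once downloaded, a gzip-compressed VCF file can be inspected with the Python standard library alone. The sketch below tallies substitutions and deletions; it assumes the conventional VCF encoding in which a deletion has a REF allele longer than its ALT allele, which may not match how this particular file encodes deletions. The function name `count_variant_types` is hypothetical.

```python
import gzip


def count_variant_types(vcf_gz_path):
    """Count single-nucleotide substitutions and deletions in a .vcf.gz file."""
    counts = {"snv": 0, "deletion": 0, "other": 0}
    with gzip.open(vcf_gz_path, "rt") as vcf:
        for line in vcf:
            if line.startswith("#"):
                continue  # skip meta-information and column header lines
            cols = line.rstrip("\n").split("\t")
            ref, alts = cols[3], cols[4].split(",")
            for alt in alts:
                if len(ref) == 1 and len(alt) == 1 and alt not in (".", "*"):
                    counts["snv"] += 1
                elif len(alt) < len(ref):
                    counts["deletion"] += 1
                else:
                    counts["other"] += 1
    return counts


# Usage, assuming the file has been saved in the current directory:
# print(count_variant_types("lineageB_1_1_7_US_first7.vcf.gz"))
```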

The mutation-annotated phylogenetic tree file used by UShER to place the sequences may be downloaded in order to run UShER locally:


This work is made possible by the open sharing of genetic data by research groups from all over the world. We gratefully acknowledge the authors and the originating laboratories where the clinical specimen or virus isolate was first obtained and the submitting laboratories, where sequence data have been generated and submitted to public databases, on which this research is based.


Rambaut A, Holmes EC, O'Toole Á, Hill V, McCrone JT, Ruis C, du Plessis L, Pybus OG. A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology. Nat Microbiol. 2020 Nov;5(11):1403-1407. PMID: 32669681

Rambaut A, Loman N, Pybus O, Barclay W, Barrett J, Carabelli A, Connor T, Peacock T, Robertson DL, Volz E, et al. Preliminary genomic characterisation of an emergent SARS-CoV-2 lineage in the UK defined by a novel set of spike mutations. Virological. 2020 Dec 18.

Volz E, Mishra S, Chand M, Barrett JC, Johnson E, Geidelberg L, Hinsley WR, Laydon DJ, Dabrera G, O'Toole Á, et al. Transmission of SARS-CoV-2 Lineage B.1.1.7 in England: Insights from linking epidemiological and genetic data. Virological. 2020 Dec 31.

Turakhia Y, Thornlow B, Hinrichs AS, De Maio N, Gozashti L, Lanfear R, Haussler D, Corbett-Detig R. Ultrafast Sample Placement on Existing Trees (UShER) Empowers Real-Time Phylogenetics for the SARS-CoV-2 Pandemic. bioRxiv. 2020 Sep 28.

Li H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 2018 Sep 15;34(18):3094-3100. PMID: 29750242; PMC: PMC6137996