Variants of Concern Track Settings
 
Mutations in Variants of Concern (VOC) or Interest (VOI) identified from GISAID sequences (Feb 5, 2021)

Mutations    Variant                  Track Name
Amino Acid   B.1.1.7 (UK)             B.1.1.7 VOC ("UK variant", 20I/501Y.V1) amino acid mutations in 9838 GISAID sequences (Feb 5, 2021)
Amino Acid   B.1.351 (South Africa)   B.1.351 VOC ("South Africa variant", 20H/501Y.V2) amino acid mutations in 793 GISAID sequences (Feb 5, 2021)
Amino Acid   B.1.429 (California)     B.1.429 VOI ("California variant", 20C, CA VUI1) amino acid mutations in 1360 GISAID sequences (Feb 5, 2021)
Amino Acid   P.1 (Brazil)             P.1 VOC ("Brazil variant", 20J/501Y.V3) amino acid mutations in 78 GISAID sequences (Feb 5, 2021)
Nucleotide   B.1.1.7 (UK)             B.1.1.7 VOC ("UK variant", 20I/501Y.V1) nucleotide mutations in 9838 GISAID sequences (Feb 5, 2021)
Nucleotide   B.1.351 (South Africa)   B.1.351 VOC ("South Africa variant", 20H/501Y.V2) nucleotide mutations in 793 GISAID sequences (Feb 5, 2021)
Nucleotide   B.1.429 (California)     B.1.429 VOI ("California variant", 20C, CA VUI1) nucleotide mutations in 1360 GISAID sequences (Feb 5, 2021)
Nucleotide   P.1 (Brazil)             P.1 VOC ("Brazil variant", 20J/501Y.V3) nucleotide mutations in 78 GISAID sequences (Feb 5, 2021)

Source data version: Released Feb. 17, 2021

Description

This track displays amino acid and nucleotide mutations in three SARS-CoV-2 Variants of Concern (B.1.1.7, B.1.351, and P.1) and one Variant of Interest (B.1.429), as defined in late January 2021. Mutations were identified from viral sequences at GISAID. Variant incidence and geographic distribution information is available from links to the Outbreak.info web resource on the track details pages.

The related track B.1.1.7 in USA displays a phylogenetic tree of the first B.1.1.7 variant sequences collected in the United States.

Display Conventions

Track colors follow the Pangolin lineage nomenclature (Rambaut et al.) conventions in use when this track was generated. The Greek-letter names defined by the World Health Organization (WHO), as published by the CDC, are listed in the table below.

Variant    WHO name
P.1        Gamma
B.1.1.7    Alpha
B.1.351    Beta
B.1.429    Epsilon

Mutations in the amino acid track are named with the format:

        [Reference amino acid][1-based coordinate in peptide][Alternate amino acid]. E.g., L452R

Mutations in the nucleotide track are named with the format:

        [Reference nucleotide][1-based coordinate in genome][Alternate nucleotide]. E.g., T22918G

Insertions and deletions in both tracks are named with the format:

        [del/ins]_[1-based genomic coordinate of first affected nucleotide]. E.g., del_21991
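The three naming conventions can be parsed mechanically. The sketch below is a minimal illustration, not part of the track's pipeline: the function names, the `"aa"`/`"nt"` track labels, and the use of `*` for a stop codon are assumptions. The track type must be supplied because a name such as `A123T` is valid in both conventions. The `codon_span` helper shows how a 1-based peptide position maps to genome coordinates, assuming a single contiguous plus-strand CDS with no ribosomal frameshift (so it does not apply as written to ORF1ab past the frameshift site).

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

# Amino acid name: reference residue, 1-based peptide position, alternate residue.
# Accepting "*" as a stop codon is an assumption about the track's naming.
AA_RE = re.compile(r"^([ACDEFGHIKLMNPQRSTVWY*])(\d+)([ACDEFGHIKLMNPQRSTVWY*])$")
# Nucleotide name: reference base, 1-based genome position, alternate base.
NT_RE = re.compile(r"^([ACGT])(\d+)([ACGT])$")
# Indel name (shared by both tracks): del/ins, underscore, 1-based genome position.
INDEL_RE = re.compile(r"^(del|ins)_(\d+)$")


@dataclass(frozen=True)
class Mutation:
    kind: str             # "substitution", "del" or "ins"
    ref: Optional[str]    # reference residue/base; None for indels
    pos: int              # 1-based peptide (aa track) or genome (nt track, indels) position
    alt: Optional[str]    # alternate residue/base; None for indels


def parse_mutation(name: str, track: str) -> Mutation:
    """Parse a track item name; `track` is "aa" or "nt" (hypothetical labels)."""
    if track not in ("aa", "nt"):
        raise ValueError(f"track must be 'aa' or 'nt', got {track!r}")
    m = INDEL_RE.match(name)
    if m:  # indels use the same convention in both tracks
        return Mutation(m.group(1), None, int(m.group(2)), None)
    m = (AA_RE if track == "aa" else NT_RE).match(name)
    if not m:
        raise ValueError(f"unrecognized {track} mutation name: {name!r}")
    ref, pos, alt = m.groups()
    return Mutation("substitution", ref, int(pos), alt)


def codon_span(cds_start: int, aa_pos: int) -> Tuple[int, int]:
    """1-based genome coordinates (first, last) of the codon for a 1-based peptide position.

    Assumes one contiguous plus-strand CDS with no frameshift.
    """
    first = cds_start + 3 * (aa_pos - 1)
    return first, first + 2
```

For example, the spike CDS begins at position 21563 of the NC_045512.2 reference, so `codon_span(21563, 452)` returns `(22916, 22918)`, the codon affected by the amino acid mutation L452R.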

Methods

For each virus variant, SARS-CoV-2 genome sequences containing all characteristic mutations of the lineage were downloaded from GISAID using the lineage search feature. Downloads were restricted to complete, high-coverage genomes; when a query returned more results than the download limit of 10,000 sequences per query, it was further restricted to the earliest sample collection dates. Sequences were aligned to the SARS-CoV-2 reference genome using the global_profile_alignment.sh script from the sarscov2phylo repository. Single-nucleotide substitutions were extracted from the alignment using the UCSC tool faToVcf (available from the UCSC download server or from Bioconda; it also requires the SARS-CoV-2 reference sequence). Single-nucleotide substitutions present at a frequency of at least 0.95 were retained, and all others were discarded.
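The 0.95 frequency filter on substitutions can be illustrated in plain Python. This is a hedged sketch of the filtering idea only; in the actual pipeline faToVcf performs the extraction, and this is not its implementation. The function name is hypothetical, and the sketch assumes the aligned sequences have already been projected onto reference coordinates (same length as the reference, insertions removed). It also assumes frequency is computed over all sequences, with N, gap, and other non-ACGT characters counted as not carrying the alternate allele.

```python
from collections import Counter
from typing import List


def frequent_substitutions(reference: str, aligned: List[str],
                           min_freq: float = 0.95) -> List[str]:
    """Return substitutions carried by at least `min_freq` of the sequences.

    Assumes each sequence is already in reference coordinates. The denominator
    is the total number of sequences; non-ACGT characters never count as an
    alternate allele (both are assumptions of this sketch).
    """
    n = len(aligned)
    if n == 0:
        return []
    if any(len(seq) != len(reference) for seq in aligned):
        raise ValueError("aligned sequences must match the reference length")
    ref = reference.upper()
    kept = []
    for i, ref_base in enumerate(ref):
        # Tally the base observed at this column across all sequences.
        counts = Counter(seq[i].upper() for seq in aligned)
        for alt, count in counts.items():
            if alt in "ACGT" and alt != ref_base and count / n >= min_freq:
                # Name it in the nucleotide-track style, e.g. T22918G (1-based).
                kept.append(f"{ref_base}{i + 1}{alt}")
    return kept
```

Because `min_freq` is above 0.5, at most one alternate base per position can pass the default filter, so each retained position yields a single name.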

For indel detection, the Minimap2 suite of tools was used as follows:

        minimap2 --cs [Reference Sequence] [Set of Unaligned Sequences] | paftools.js call -L 10000 -

Indels present at a frequency of at least 0.85 were retained.
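The 0.85 indel threshold works the same way, counting how many sequences carry each indel. The minimal sketch below does not parse `paftools.js call` output. It assumes, hypothetically, that each sequence's calls have already been converted into a set of `(kind, position)` pairs using the track's 1-based convention. Using a set per sequence ensures that each sequence counts at most once toward a given indel.

```python
from collections import Counter
from typing import List, Set, Tuple

# Hypothetical representation: ("del" or "ins", 1-based genome position of first affected base).
Indel = Tuple[str, int]


def frequent_indels(calls_per_sequence: List[Set[Indel]],
                    min_freq: float = 0.85) -> List[str]:
    """Return indel names (e.g. del_21991) found in at least `min_freq` of sequences."""
    n = len(calls_per_sequence)
    if n == 0:
        return []
    # Each set contributes at most one count per indel.
    counts = Counter(indel for calls in calls_per_sequence for indel in calls)
    kept = [(kind, pos) for (kind, pos), c in counts.items() if c / n >= min_freq]
    # Sort by genome position, then kind, and format with the track's naming convention.
    return [f"{kind}_{pos}" for kind, pos in sorted(kept, key=lambda x: (x[1], x[0]))]
```

The denominator here is all sequences in the set, including those with no indel calls. The original methods do not say how sequences with failed or partial alignments were counted, so treat that choice as an assumption.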

The results were then combined and formatted by lineageVariants.py. The entire pipeline was run using lineageVariants.sh.

Data Access

You can download the data files for this track from the UCSC Download Server. The data can be explored interactively with the Table Browser or the Data Integrator. The data can be accessed from scripts through our API.
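The following is a minimal sketch of scripted access through the UCSC REST API using only the Python standard library. It assumes the `wuhCor1` SARS-CoV-2 assembly and the `NC_045512v2` chromosome name. The subtrack name passed as `track` is a placeholder: this document does not state the track's internal names, so look them up first (for example, with the API's `list/tracks?genome=wuhCor1` endpoint). The helper function names are hypothetical.

```python
import json
from typing import Any, Dict, Optional
from urllib.parse import urlencode
from urllib.request import urlopen

API_ROOT = "https://api.genome.ucsc.edu"


def track_data_url(track: str, genome: str = "wuhCor1",
                   chrom: str = "NC_045512v2",
                   start: Optional[int] = None,
                   end: Optional[int] = None) -> str:
    """Build a getData/track request URL.

    start/end are optional; see the UCSC API documentation for their
    coordinate convention.
    """
    params: Dict[str, Any] = {"genome": genome, "track": track, "chrom": chrom}
    if start is not None and end is not None:
        params["start"] = start
        params["end"] = end
    return f"{API_ROOT}/getData/track?{urlencode(params)}"


def fetch_json(url: str) -> Dict[str, Any]:
    """Download and decode a JSON API response (requires network access)."""
    with urlopen(url) as response:
        return json.load(response)
```

A call such as `fetch_json(track_data_url("<subtrackName>"))` would return the API's JSON response for that subtrack, with `<subtrackName>` replaced by a real track name.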

Credits

This work is made possible by the open sharing of genetic data by research groups from all over the world. We gratefully acknowledge their contributions. We thank Rob Lanfear at the Australian National University for developing and maintaining the sarscov2phylo web resource. We also thank the Su, Wu, and Andersen labs at Scripps Research for creating the Outbreak.info resource. The lineageVariants scripts were developed and run at UCSC by Nick Keener.

References

Rambaut A, Holmes EC, O'Toole Á, Hill V, McCrone JT, Ruis C, du Plessis L, Pybus OG. A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology. Nat Microbiol. 2020 Nov;5(11):1403-1407. PMID: 32669681

Rambaut A, Loman N, Pybus O, Barclay W, Barrett J, Carabelli A, Connor T, Peacock T, Robertson DL, Volz E, et al. Preliminary genomic characterisation of an emergent SARS-CoV-2 lineage in the UK defined by a novel set of spike mutations. Virological. 2020 Dec 18.

Volz E, Mishra S, Chand M, Barrett JC, Johnson E, Geidelberg L, Hinsley WR, Laydon DJ, Dabrera G, O'Toole Á, et al. Transmission of SARS-CoV-2 Lineage B.1.1.7 in England: Insights from linking epidemiological and genetic data. Virological. 2020 Dec 31.

Tegally et al. Emergence and rapid spread of a new severe acute respiratory syndrome-related coronavirus 2 (SARS-CoV-2) lineage with multiple spike mutations in South Africa. medRxiv preprint. 2020 Dec 21.

Zhang et al. Emergence of a novel SARS-CoV-2 strain in Southern California, USA. medRxiv preprint. 2021 Jan 20.

Voloch et al. Genomic characterization of a novel SARS-CoV-2 lineage from Rio de Janeiro, Brazil. medRxiv preprint. 2020 Dec 26.

Lanfear R. A global phylogeny of SARS-CoV-2 sequences from GISAID. Zenodo. 2020. DOI: 10.5281/zenodo.3958883

Li H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 2018 Sep 15;34(18):3094-3100. PMID: 29750242; PMC: PMC6137996

Gangavarapu K, Alkuzweny M, Cano M, Haag E, Latif AA, Mullen JL, Rush B, Tsueng G, Zhou J, Andersen KG, Wu C, Su AI, Hughes LD. Outbreak.info.