Smoothed PhyloCSF++ PhyloCSF++ +3 Track Settings
 
Smoothed PhyloCSF++ Strand + Frame 3

Track collection: Smoothed PhyloCSF++

Assembly: S. cerevisiae Apr. 2011 (SacCer_Apr2011/sacCer3)
Data last updated at UCSC: 2022-01-20 17:47:24

PhyloCSF++ tracks

Introduction

PhyloCSF++ scores the coding potential of genomic regions from a whole-genome multiple sequence alignment (MSA). The scores were computed with PhyloCSF++ [1], a fast and easy-to-use implementation of the method PhyloCSF [2, 3]. A more detailed description of the underlying method is available here.

Description

PhyloCSF++ raw tracks

The raw tracks (one for each of the six reading frames) score each codon individually. Green tracks represent frames on the positive strand; red tracks represent frames on the negative strand. A positive score indicates that the codon is coding, and a negative score indicates that it is non-coding. The scores are unbounded and do not take the other codons in the region into account. Hence, we generally recommend using the smoothened tracks (named "PhyloCSF++ +1", etc.).

PhyloCSF++ (smoothened) tracks

The scores in the smoothened tracks are based on posterior probabilities obtained by smoothing the raw tracks with an HMM. They are normalized to lie in the interval [-15, +15]. Positive scores indicate codons in coding regions; negative scores indicate codons in non-coding regions.
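
To illustrate how HMM smoothing can turn noisy per-codon scores into bounded scores, here is a minimal sketch. It is not the PhyloCSF++ implementation: the two-state model (coding / non-coding), the transition probabilities, the prior, and the reading of raw scores as log-likelihood ratios in decibans are all assumptions chosen for illustration, and the function name smooth_scores is hypothetical.

```python
import math

def smooth_scores(raw, p_stay_coding=0.99, p_stay_noncoding=0.99,
                  prior_coding=0.5, limit=15.0):
    """Forward-backward smoothing of per-codon scores with a 2-state HMM.

    Assumptions (illustrative only): each raw score is a log-likelihood
    ratio coding vs. non-coding in decibans, and the transition
    probabilities and prior are made-up values.
    Returns clipped log-odds (in decibans) of the coding state per codon.
    """
    n = len(raw)
    if n == 0:
        return []
    # Emission likelihoods up to a common factor: (non-coding, coding).
    # The score is clamped before exponentiation to avoid overflow.
    emit = [(1.0, 10 ** (max(-300.0, min(300.0, s)) / 10)) for s in raw]
    trans = [[p_stay_noncoding, 1 - p_stay_noncoding],
             [1 - p_stay_coding, p_stay_coding]]

    # Forward pass, normalized at every step for numerical stability.
    fwd = []
    for t in range(n):
        if t == 0:
            pred = [1 - prior_coding, prior_coding]
        else:
            pred = [sum(fwd[-1][j] * trans[j][k] for j in (0, 1))
                    for k in (0, 1)]
        cur = [pred[k] * emit[t][k] for k in (0, 1)]
        z = cur[0] + cur[1]
        fwd.append([c / z for c in cur])

    # Backward pass, also normalized.
    bwd = [[1.0, 1.0] for _ in range(n)]
    for t in range(n - 2, -1, -1):
        cur = [sum(trans[k][j] * emit[t + 1][j] * bwd[t + 1][j]
                   for j in (0, 1)) for k in (0, 1)]
        z = cur[0] + cur[1]
        bwd[t] = [c / z for c in cur]

    # Posterior of the coding state, converted to clipped log-odds.
    smoothed = []
    eps = 1e-12
    for t in range(n):
        non = fwd[t][0] * bwd[t][0]
        cod = fwd[t][1] * bwd[t][1]
        p = min(max(cod / (non + cod), eps), 1 - eps)
        log_odds = 10 * math.log10(p / (1 - p))
        smoothed.append(max(-limit, min(limit, log_odds)))
    return smoothed
```

In this sketch, a single mildly negative codon surrounded by strongly positive codons (for example smooth_scores([20, 20, -3, 20, 20])) still receives a positive smoothed score, which is the practical reason for preferring the smoothened tracks over the raw ones.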

PhyloCSF++ power

The power track gives a confidence measure for the PhyloCSF scores, based on the branch length sum. For each position in the genome, it has a confidence score in the interval [0, 1] that corresponds to how many species were aligned at that position in the MSA (taking the phylogenetic distances of these species into account). In other words, if only a few closely related species were aligned at a position, that position has a lower confidence score.
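
One way to picture a branch-length-based confidence is as the fraction of the total tree length covered by the subtree connecting the aligned species. The sketch below assumes exactly that definition; the toy tree, its branch lengths, the species names, and the function name branch_length_fraction are hypothetical and do not come from PhyloCSF++.

```python
def branch_length_fraction(parent, length, aligned):
    """Fraction of total branch length spanned by the aligned species.

    parent: maps each node to its parent (the root maps to None).
    length: maps each non-root node to the length of the branch above it.
    aligned: iterable of leaf names present in the alignment column.
    """
    aligned = set(aligned)
    total = sum(length.values())
    # Count, for every branch, how many aligned leaves sit below it.
    below = {}
    for leaf in aligned:
        node = leaf
        while parent[node] is not None:
            below[node] = below.get(node, 0) + 1
            node = parent[node]
    # A branch belongs to the connecting subtree only if it separates
    # some aligned leaves from others, i.e. not all of them are below it.
    k = len(aligned)
    covered = sum(length[node] for node, c in below.items() if c < k)
    return covered / total

# Hypothetical toy tree: root -> speciesA, root -> inner -> (speciesB, speciesC)
toy_parent = {"root": None, "speciesA": "root", "inner": "root",
              "speciesB": "inner", "speciesC": "inner"}
toy_length = {"speciesA": 0.125, "inner": 0.25,
              "speciesB": 0.25, "speciesC": 0.375}
```

With this toy tree, all three species give a fraction of 1.0, the two closely related species speciesB and speciesC give 0.625, and a single species gives 0.0, mirroring the statement that few, closely related species yield lower confidence.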

Overview of tracks

The tracks can be downloaded here.

Species | Assembly | Model | MSA | Last updated | Species subset (intersection of model and MSA)
Rat (Rattus norvegicus) | rn6 | 100vertebrates | 20way-multiz | 2021-03-07 | rn6, mm10, ailMel1, ornAna2, galGal5, melGal5, xenTro7, danRer10, micOch1, hg38, panTro5, rheMac8, cavPor3, felCat8, bosTau8, oryCun2, canFam3, monDom5
Fugu (Takifugu rubripes) | fr3 / fugu5 | 100vertebrates | 8way-multiz | 2021-03-07 | fr3, tetNig2, oreNil1, oryLat2, danRer7, gasAcu1, latCha1, gadMor1
Stickleback (Gasterosteus aculeatus) | gasAcu1 | 100vertebrates | 8way-multiz | 2021-03-07 | gasAcu1, danRer4, fr2, oryLat1, tetNig1, galGal3, mm8, hg18
Tarsier (Tarsius syrichta) | tarSyr2 | 29mammals | 20way-multiz | 2021-03-07 | tarSyr2, micMur1, tupBel1, otoGar3, hg38, panTro4, rheMac3, mm10, canFam3
Yeast (Saccharomyces cerevisiae) | sacCer3 | 7yeast | 7way-multiz | 2021-03-07 | sacCer3, sacPar, sacMik, sacKud, sacBay, sacCas, sacKlu
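
The last column lists the species shared by the model and the MSA. As a small sketch of that intersection, the function below keeps the MSA order and drops every species the model does not know. The function name and the two example lists are hypothetical placeholders, not the actual contents of any model or alignment.

```python
def species_subset(model_species, msa_species):
    """Species present in both the model and the MSA, in MSA order."""
    known = set(model_species)
    return [name for name in msa_species if name in known]

# Hypothetical placeholder names, for illustration only.
example_model = ["sp1", "sp2", "sp3", "sp5"]
example_msa = ["sp3", "sp4", "sp1"]
subset = species_subset(example_model, example_msa)
```

Only species in this subset can contribute to the scores, since a species missing from either the model or the alignment cannot be evaluated.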

PhyloCSF++ vs. PhyloCSF

You might wonder how these tools differ. Technically speaking, they give the exact same scores (except for very minor differences in the smoothened scores due to randomization in the initialization of the HMM).

PhyloCSF++ was developed to make tracks available for more species. Unfortunately, the original implementation of PhyloCSF does not allow tracks to be created without additional coding. Furthermore, PhyloCSF++ is faster, supports multi-threading, and is available as static binaries, on bioconda, and as C++ code (hopefully making it easier for users to compile and run). It also comes with additional tools that use these tracks to annotate the transcripts in your GFF/GTF files with PhyloCSF and confidence scores.
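
As a rough picture of what annotating a transcript with track scores could involve, the sketch below averages per-position scores over a set of exon intervals. This is an assumption made for illustration, not the procedure used by the PhyloCSF++ annotation tools; the function name, the list-based score representation, and the 0-based half-open interval convention are all hypothetical.

```python
def mean_score_over_exons(scores, exons):
    """Average the per-position scores covered by the given exons.

    scores: list of scores indexed by 0-based position (illustrative
            stand-in for a track; real tracks are not stored this way).
    exons: list of (start, end) intervals, 0-based and half-open.
    Returns None when the exons cover no positions.
    """
    covered = []
    for start, end in exons:
        covered.extend(scores[start:end])
    if not covered:
        return None
    return sum(covered) / len(covered)
```

For example, with scores [1, 2, 3, 4, 5, 6] and exons [(0, 2), (4, 6)], the positions 0, 1, 4 and 5 are averaged.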

PhyloCSF++ was developed by a different group. Its underlying method is the only connection to PhyloCSF.

Citation

If you use the tracks or the software in your work, please consider citing the PhyloCSF++ paper [1]. For citing the original method, see [2].

References

  1. Pockrandt C et al. PhyloCSF++: A fast and user-friendly implementation of PhyloCSF with annotation tools. bioRxiv, 2021.
  2. Lin MF et al. PhyloCSF: a comparative genomics method to distinguish protein-coding and non-coding regions. Bioinformatics, 2011.
  3. Mudge JM et al. Discovery of high-confidence human protein-coding genes and exons by whole-genome PhyloCSF helps elucidate 118 GWAS loci. Genome Research, 2019.