Schema for N-SCAN - N-SCAN Gene Predictions
Database: equCab2 Primary Table: nscanGene Row Count: 19,612   Data last updated: 2008-08-19
Format description: A gene prediction with some additional info. On download server: MariaDB table dump directory
field | example | SQL type | info | description |
---|---|---|---|---|
bin | 585 | smallint(5) unsigned | range | Indexing field to speed chromosome range queries. |
name | chr1.001.1 | varchar(255) | values | Name of gene (usually transcript_id from GTF) |
chrom | chr1 | varchar(255) | values | Reference sequence chromosome or scaffold |
strand | + | char(1) | values | + or - for strand |
txStart | 6739 | int(10) unsigned | range | Transcription start position (or end position for minus strand item) |
txEnd | 16275 | int(10) unsigned | range | Transcription end position (or start position for minus strand item) |
cdsStart | 7126 | int(10) unsigned | range | Coding region start (or end position for minus strand item) |
cdsEnd | 16275 | int(10) unsigned | range | Coding region end (or start position for minus strand item) |
exonCount | 13 | int(10) unsigned | range | Number of exons |
exonStarts | 6739,11198,11908,12331,1300... | longblob | | Exon start positions (or end positions for minus strand item) |
exonEnds | 7196,11261,11968,12406,1304... | longblob | | Exon end positions (or start positions for minus strand item) |
score | 0 | int(11) | range | score |
name2 | chr1.001 | varchar(255) | values | Alternate name (e.g. gene_id from GTF) |
cdsStartStat | cmpl | enum('none', 'unk', 'incmpl', 'cmpl') | values | Status of CDS start annotation (none, unknown, incomplete, or complete) |
cdsEndStat | cmpl | enum('none', 'unk', 'incmpl', 'cmpl') | values | Status of CDS end annotation (none, unknown, incomplete, or complete) |
exonFrames | 0,1,1,1,1,1,2,2,0,1,2,2,0, | longblob | | Reading frame of the start of the CDS region of the exon, in the direction of transcription (0,1,2), or -1 if there is no CDS region. |
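
The field layout above maps naturally onto a small record type. The following Python sketch is illustrative only: the `GenePred` class, `parse_int_list`, and the method names are hypothetical and not part of any UCSC tool. It parses the comma-terminated exon lists and recomputes `exonFrames` as the number of coding bases preceding each exon in the direction of transcription, modulo 3. It assumes a complete CDS that begins in frame 0 (`cdsStartStat = cmpl`); for incomplete predictions the stored frames should be treated as authoritative.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


def parse_int_list(field: str) -> List[int]:
    """Parse a comma-separated list with a trailing comma, e.g. '6739,11198,'."""
    return [int(x) for x in field.split(",") if x]


@dataclass
class GenePred:
    """Hypothetical container for the coordinate fields of one nscanGene row."""
    name: str
    strand: str
    cds_start: int
    cds_end: int
    exon_starts: List[int]
    exon_ends: List[int]

    def cds_blocks(self) -> List[Optional[Tuple[int, int]]]:
        """Coding part of each exon as (start, end), or None if the exon has no CDS."""
        blocks = []
        for s, e in zip(self.exon_starts, self.exon_ends):
            cs, ce = max(s, self.cds_start), min(e, self.cds_end)
            blocks.append((cs, ce) if cs < ce else None)
        return blocks

    def cds_length(self) -> int:
        """Total number of coding bases."""
        return sum(b[1] - b[0] for b in self.cds_blocks() if b is not None)

    def exon_frames(self) -> List[int]:
        """Frame per exon: coding bases before it (in transcription order) mod 3."""
        blocks = self.cds_blocks()
        order = list(range(len(blocks)))
        if self.strand == "-":
            order.reverse()  # minus-strand transcripts start at the last exon
        frames = [-1] * len(blocks)
        coding_bases = 0
        for i in order:
            if blocks[i] is None:
                continue  # exon lies entirely in a UTR, frame stays -1
            frames[i] = coding_bases % 3
            coding_bases += blocks[i][1] - blocks[i][0]
        return frames


# First two sample rows of this table (plus strand, then minus strand).
row1 = GenePred(
    "chr1.001.1", "+", 7126, 16275,
    parse_int_list("6739,11198,11908,12331,13000,13299,14082,14420,15297,15446,15640,15879,16197,"),
    parse_int_list("7196,11261,11968,12406,13048,13354,14172,14484,15364,15570,15751,15967,16275,"),
)
row2 = GenePred(
    "chr1.002.1", "-", 26182, 76729,
    parse_int_list("26182,29598,32905,33246,55771,66546,70706,71047,76714,79171,"),
    parse_int_list("26274,30201,32998,33373,55833,67068,70799,71174,76773,79537,"),
)

print(row1.exon_frames())  # [0, 1, 1, 1, 1, 1, 2, 2, 0, 1, 2, 2, 0]
print(row2.exon_frames())  # [1, 1, 1, 0, 1, 1, 1, 0, 0, -1]
print(row1.cds_length(), row2.cds_length())  # 993 1734
```

Both recomputed frame lists match the `exonFrames` values stored in the sample rows, and both CDS lengths are multiples of 3, as expected for predictions with complete start and end status.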
Connected Tables and Joining Fields
equCab2.nscanPep.name (via nscanGene.name)
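
As a rough illustration of this join, the sketch below builds two tiny in-memory SQLite tables and links them on `name`. It is not a query against the UCSC servers. The `nscanPep` column names (`name`, `seq`) are an assumption, since that table's schema is not shown here, and the peptide value is a placeholder rather than real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE nscanGene (name TEXT, chrom TEXT, strand TEXT,
                            txStart INTEGER, txEnd INTEGER);
    -- Assumed layout: one peptide sequence per transcript name.
    CREATE TABLE nscanPep (name TEXT, seq TEXT);
    """
)
conn.execute("INSERT INTO nscanGene VALUES ('chr1.001.1', 'chr1', '+', 6739, 16275)")
conn.execute("INSERT INTO nscanPep VALUES ('chr1.001.1', '<placeholder peptide>')")

# Join each gene prediction to its predicted peptide via the shared name field.
JOIN_SQL = """
    SELECT g.name, g.chrom, g.txStart, g.txEnd, p.seq
    FROM nscanGene AS g
    JOIN nscanPep AS p ON p.name = g.name
    WHERE g.chrom = ?
"""
rows = conn.execute(JOIN_SQL, ("chr1",)).fetchall()
print(rows)
```

The same `JOIN ... ON p.name = g.name` clause should carry over to a MariaDB client, provided the assumed column names hold.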
Sample Rows
bin | name | chrom | strand | txStart | txEnd | cdsStart | cdsEnd | exonCount | exonStarts | exonEnds | score | name2 | cdsStartStat | cdsEndStat | exonFrames |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
585 | chr1.001.1 | chr1 | + | 6739 | 16275 | 7126 | 16275 | 13 | 6739,11198,11908,12331,13000,13299,14082,14420,15297,15446,15640,15879,16197, | 7196,11261,11968,12406,13048,13354,14172,14484,15364,15570,15751,15967,16275, | 0 | chr1.001 | cmpl | cmpl | 0,1,1,1,1,1,2,2,0,1,2,2,0, |
585 | chr1.002.1 | chr1 | - | 26182 | 79537 | 26182 | 76729 | 10 | 26182,29598,32905,33246,55771,66546,70706,71047,76714,79171, | 26274,30201,32998,33373,55833,67068,70799,71174,76773,79537, | 0 | chr1.002 | cmpl | cmpl | 1,1,1,0,1,1,1,0,0,-1, |
586 | chr1.003.1 | chr1 | - | 132458 | 143876 | 132458 | 143850 | 9 | 132458,133461,134140,136900,138150,138683,139178,142530,143673, | 132643,133534,134328,137042,138327,138844,139328,142690,143876, | 0 | chr1.003 | cmpl | cmpl | 1,0,1,0,0,1,1,0,0, |
586 | chr1.004.1 | chr1 | - | 171785 | 210215 | 171785 | 210186 | 5 | 171785,183810,200411,201850,209836, | 172632,184727,200819,201989,210215, | 0 | chr1.004 | cmpl | cmpl | 2,0,0,2,0, |
73 | chr1.005.1 | chr1 | - | 234227 | 274259 | 234227 | 274051 | 11 | 234227,237017,258760,259225,266776,268647,269283,271614,272080,272773,273948, | 235295,237104,259100,259547,266971,268983,269601,271935,272386,273082,274259, | 0 | chr1.005 | cmpl | cmpl | 0,0,2,1,1,1,1,1,1,1,0, |
587 | chr1.006.1 | chr1 | + | 293524 | 294551 | 293594 | 294551 | 1 | 293524, | 294551, | 0 | chr1.006 | cmpl | cmpl | 0, |
587 | chr1.007.1 | chr1 | - | 332217 | 333396 | 332217 | 333153 | 1 | 332217, | 333396, | 0 | chr1.007 | cmpl | cmpl | 0, |
587 | chr1.008.1 | chr1 | + | 356462 | 358918 | 357982 | 358918 | 2 | 356462,357892, | 356726,358918, | 0 | chr1.008 | cmpl | cmpl | -1,0, |
587 | chr1.009.1 | chr1 | + | 366271 | 367317 | 366343 | 367317 | 2 | 366271,366883, | 366782,367317, | 0 | chr1.009 | cmpl | cmpl | 0,1, |
73 | chr1.010.1 | chr1 | - | 383043 | 397171 | 383043 | 396889 | 11 | 383043,383438,387319,387801,388467,390099,390550,391090,393238,396000,396864, | 383207,383551,387401,387898,388529,390190,390607,391171,393330,396075,397171, | 0 | chr1.010 | cmpl | cmpl | 1,2,1,0,1,0,0,0,1,1,0, |
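
The `bin` values in these rows can be reproduced from `txStart` and `txEnd`. The sketch below is a Python rendering of the standard UCSC binning scheme (five levels, with the smallest bins covering 128 kb and each level above covering 8 times more). It assumes this table uses the standard scheme rather than the extended one for very long sequences; the function name is chosen here for illustration.

```python
# Offset of the first bin at each level, from finest (128 kb) to coarsest (512 Mb).
BIN_OFFSETS = [512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0]
BIN_FIRST_SHIFT = 17  # 2**17 bases = 128 kb, the size of the smallest bin
BIN_NEXT_SHIFT = 3    # each coarser level is 2**3 = 8 times larger


def bin_from_range(start: int, end: int) -> int:
    """Smallest bin that fully contains the 0-based, half-open range [start, end)."""
    start_bin = start >> BIN_FIRST_SHIFT
    end_bin = (end - 1) >> BIN_FIRST_SHIFT
    for offset in BIN_OFFSETS:
        if start_bin == end_bin:
            return offset + start_bin
        start_bin >>= BIN_NEXT_SHIFT
        end_bin >>= BIN_NEXT_SHIFT
    raise ValueError(f"range {start}-{end} does not fit the standard scheme")


# (txStart, txEnd, stored bin) taken from the sample rows above.
samples = [
    (6739, 16275, 585),
    (132458, 143876, 586),
    (234227, 274259, 73),
    (293524, 294551, 587),
    (383043, 397171, 73),
]
for tx_start, tx_end, stored in samples:
    print(tx_start, tx_end, bin_from_range(tx_start, tx_end), stored)
```

Rows with bin 73 are predictions that straddle a 128 kb boundary, so they fall into a bin one level up. A range query therefore has to check every bin level that overlaps the region, not only the finest one.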
Note: all start coordinates in our database are 0-based, not 1-based.
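
In practice, the 0-based starts pair with end coordinates that are exclusive, so an interval's length is simply `end - start`. Converting a row to the 1-based, fully closed form used in browser position strings only requires adding 1 to the start. A minimal sketch, with helper names that are illustrative rather than part of any UCSC API:

```python
from typing import Tuple


def to_one_based(start: int, end: int) -> Tuple[int, int]:
    """Convert 0-based half-open [start, end) to 1-based closed [start + 1, end]."""
    return start + 1, end


def position_string(chrom: str, start: int, end: int) -> str:
    """Format a database interval as a 1-based position such as 'chr1:6740-16275'."""
    one_start, one_end = to_one_based(start, end)
    return f"{chrom}:{one_start}-{one_end}"


# txStart and txEnd of the first sample row.
print(position_string("chr1", 6739, 16275))  # chr1:6740-16275
print(16275 - 6739)                          # 9536 bases, identical in both systems
```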
N-SCAN (nscanGene) Track Description
Description
This track shows gene predictions using the N-SCAN gene structure prediction
software provided by the Computational Genomics Lab at Washington University
in St. Louis, MO, USA.
Methods
N-SCAN PASA-EST
N-SCAN combines biological-signal modeling in the target genome sequence
with information from a multiple-genome alignment to generate de novo
gene predictions. It extends the TWINSCAN target-informant genome pair to allow for
an arbitrary number of informant sequences as well as richer models of
sequence evolution. N-SCAN models the phylogenetic relationships between the
aligned genome sequences, context-dependent substitution rates, insertions,
and deletions.
To create the horse predictions, N-SCAN used human (hg18) as the informant.
N-SCAN PASA-EST incorporates EST alignments into N-SCAN. Similar to the conservation
sequence models in TWINSCAN, separate probability models are developed for EST
alignments to genomic sequence in exons, introns, splice sites and UTRs,
reflecting the EST alignment patterns in these regions. N-SCAN PASA-EST is more
accurate than N-SCAN while retaining the ability to discover novel genes to
which no ESTs align.
No manual annotation was performed to generate any of the gene models.
Credits
Thanks to Michael Brent's Computational Genomics Group at Washington
University in St. Louis for providing these data.
Special thanks for this implementation of N-SCAN to Aaron Tenney in
the Brent lab, and Robert Zimmermann, currently at Max F. Perutz
Laboratories in Vienna, Austria.
References
Gross SS, Brent MR.
Using multiple alignments to improve gene prediction.
J Comput Biol. 2006 Mar;13(2):379-93.
PMID: 16597247
Haas BJ, Delcher AL, Mount SM, Wortman JR, Smith RK Jr, Hannick LI, Maiti R, Ronning CM,
Rusch DB, Town CD et al.
Improving the Arabidopsis genome annotation using
maximal transcript alignment assemblies.
Nucleic Acids Res. 2003 Oct 1;31(19):5654-66.
PMID: 14500829; PMC: PMC206470
Korf I, Flicek P, Duan D, Brent MR.
Integrating genomic homology into gene structure prediction.
Bioinformatics. 2001;17 Suppl 1:S140-8.
PMID: 11473003
van Baren MJ, Brent MR.
Iterative gene prediction and pseudogene removal improves genome annotation.
Genome Res. 2006 May;16(5):678-85.
PMID: 16651666; PMC: PMC1457044