Schema for N-SCAN - N-SCAN Gene Predictions

| field | example | SQL type | info | description |
|---|---|---|---|---|
| bin | 585 | smallint(5) unsigned | range | Indexing field to speed chromosome range queries. |
| name | chr1.001.1 | varchar(255) | values | Name of gene (usually transcript_id from GTF) |
| chrom | chr1 | varchar(255) | values | Reference sequence chromosome or scaffold |
| strand | | char(1) | values | + or - for strand |
| txStart | 6739 | int(10) unsigned | range | Transcription start position (or end position for minus strand item) |
| txEnd | 16275 | int(10) unsigned | range | Transcription end position (or start position for minus strand item) |
| cdsStart | 7126 | int(10) unsigned | range | Coding region start (or end position for minus strand item) |
| cdsEnd | 16275 | int(10) unsigned | range | Coding region end (or start position for minus strand item) |
| exonCount | | int(10) unsigned | range | Number of exons |
| exonStarts | 6739,11198,11908,12331,1300... | longblob | | Exon start positions (or end positions for minus strand item) |
| exonEnds | 7196,11261,11968,12406,1304... | longblob | | Exon end positions (or start positions for minus strand item) |
| score | | int(11) | range | score |
| name2 | chr1.001 | varchar(255) | values | Alternate name (e.g. gene_id from GTF) |
| cdsStartStat | cmpl | enum('none', 'unk', 'incmpl', 'cmpl') | values | Status of CDS start annotation (none, unknown, incomplete, or complete) |
| cdsEndStat | cmpl | enum('none', 'unk', 'incmpl', 'cmpl') | values | Status of CDS end annotation (none, unknown, incomplete, or complete) |
| exonFrames | 0,1,1,1,1,1,2,2,0,1,2,2,0, | longblob | | Reading frame of the start of the CDS region of the exon, in the direction of transcription (0,1,2), or -1 if there is no CDS region. |
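The bin field exists so that a range query can test a handful of bin numbers instead of scanning every row. The sketch below shows how such a bin could be computed from txStart and txEnd. It assumes the standard UCSC binning scheme (a smallest bin size of 128 kb, a factor of 8 between levels, and the level offsets listed in the code); those constants are not stated on this page, and the function names are illustrative only.

```python
# Assumed constants of the standard UCSC binning scheme (not given on this page).
BIN_OFFSETS = (512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0)  # finest level first
BIN_FIRST_SHIFT = 17  # 2**17 = 128 kb, size of the smallest bins
BIN_NEXT_SHIFT = 3    # each coarser level is 8 times larger


def bin_from_range(start, end):
    """Return the smallest bin that fully contains the half-open range [start, end)."""
    start_bin = start >> BIN_FIRST_SHIFT
    end_bin = (end - 1) >> BIN_FIRST_SHIFT
    for offset in BIN_OFFSETS:
        if start_bin == end_bin:
            return offset + start_bin
        start_bin >>= BIN_NEXT_SHIFT
        end_bin >>= BIN_NEXT_SHIFT
    raise ValueError("range is outside the span covered by this binning scheme")


def overlapping_bins(start, end):
    """Return every bin, at every level, that can hold items overlapping [start, end)."""
    bins = []
    start_bin = start >> BIN_FIRST_SHIFT
    end_bin = (end - 1) >> BIN_FIRST_SHIFT
    for offset in BIN_OFFSETS:
        bins.extend(range(offset + start_bin, offset + end_bin + 1))
        start_bin >>= BIN_NEXT_SHIFT
        end_bin >>= BIN_NEXT_SHIFT
    return bins


# txStart and txEnd of chr1.001.1, taken from the example column above.
print(bin_from_range(6739, 16275))
# Bins a query over the first 200,000 bases of a chromosome would need to check.
print(overlapping_bins(0, 200000))
```

A range query would then restrict rows to `bin IN (...)` using the list from `overlapping_bins` before comparing txStart and txEnd, which is what lets the index on bin narrow the search.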

equCab2.nscanPep.name (via nscanGene.name)

| bin | name | chrom | strand | txStart | txEnd | cdsStart | cdsEnd | exonCount | exonStarts | exonEnds | score | name2 | cdsStartStat | cdsEndStat | exonFrames |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 585 | chr1.001.1 | chr1 | | 6739 | 16275 | 7126 | 16275 | | 6739,11198,11908,12331,13000,13299,14082,14420,15297,15446,15640,15879,16197, | 7196,11261,11968,12406,13048,13354,14172,14484,15364,15570,15751,15967,16275, | | chr1.001 | cmpl | cmpl | 0,1,1,1,1,1,2,2,0,1,2,2,0, |
| 585 | chr1.002.1 | chr1 | | 26182 | 79537 | 26182 | 76729 | | 26182,29598,32905,33246,55771,66546,70706,71047,76714,79171, | 26274,30201,32998,33373,55833,67068,70799,71174,76773,79537, | | chr1.002 | cmpl | cmpl | 1,1,1,0,1,1,1,0,0,-1, |
| 586 | chr1.003.1 | chr1 | | 132458 | 143876 | 132458 | 143850 | | 132458,133461,134140,136900,138150,138683,139178,142530,143673, | 132643,133534,134328,137042,138327,138844,139328,142690,143876, | | chr1.003 | cmpl | cmpl | 1,0,1,0,0,1,1,0,0, |
| 586 | chr1.004.1 | chr1 | | 171785 | 210215 | 171785 | 210186 | | 171785,183810,200411,201850,209836, | 172632,184727,200819,201989,210215, | | chr1.004 | cmpl | cmpl | 2,0,0,2,0, |
| | chr1.005.1 | chr1 | | 234227 | 274259 | 234227 | 274051 | | 234227,237017,258760,259225,266776,268647,269283,271614,272080,272773,273948, | 235295,237104,259100,259547,266971,268983,269601,271935,272386,273082,274259, | | chr1.005 | cmpl | cmpl | 0,0,2,1,1,1,1,1,1,1,0, |
| 587 | chr1.006.1 | chr1 | | 293524 | 294551 | 293594 | 294551 | | 293524, | 294551, | | chr1.006 | cmpl | cmpl | |
| 587 | chr1.007.1 | chr1 | | 332217 | 333396 | 332217 | 333153 | | 332217, | 333396, | | chr1.007 | cmpl | cmpl | |
| 587 | chr1.008.1 | chr1 | | 356462 | 358918 | 357982 | 358918 | | 356462,357892, | 356726,358918, | | chr1.008 | cmpl | cmpl | -1,0, |
| 587 | chr1.009.1 | chr1 | | 366271 | 367317 | 366343 | 367317 | | 366271,366883, | 366782,367317, | | chr1.009 | cmpl | cmpl | 0,1, |
| | chr1.010.1 | chr1 | | 383043 | 397171 | 383043 | 396889 | | 383043,383438,387319,387801,388467,390099,390550,391090,393238,396000,396864, | 383207,383551,387401,387898,388529,390190,390607,391171,393330,396075,397171, | | chr1.010 | cmpl | cmpl | 1,2,1,0,1,0,0,0,1,1,0, |
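The exonStarts and exonEnds columns hold comma-separated lists with a trailing comma, so they need a little parsing before use. The following is a minimal sketch of turning a row into exon intervals and clipping them to the coding region. It assumes coordinates are zero-based and half-open (so a length is end minus start), which is the usual convention for tables of this kind but is not stated on this page; the helper names are hypothetical.

```python
def parse_int_list(blob):
    """Split a comma-separated list such as '6739,11198,' into integers."""
    return [int(item) for item in blob.split(",") if item]


def coding_segments(exon_starts, exon_ends, cds_start, cds_end):
    """Clip each exon to [cds_start, cds_end) and keep the non-empty pieces."""
    segments = []
    for start, end in zip(exon_starts, exon_ends):
        seg_start = max(start, cds_start)
        seg_end = min(end, cds_end)
        if seg_start < seg_end:
            segments.append((seg_start, seg_end))
    return segments


# Values copied from the chr1.001.1 sample row.
exon_starts = parse_int_list(
    "6739,11198,11908,12331,13000,13299,14082,14420,15297,15446,15640,15879,16197,")
exon_ends = parse_int_list(
    "7196,11261,11968,12406,13048,13354,14172,14484,15364,15570,15751,15967,16275,")
cds_start, cds_end = 7126, 16275

segments = coding_segments(exon_starts, exon_ends, cds_start, cds_end)
cds_length = sum(end - start for start, end in segments)

print(len(exon_starts), "exons")
print(cds_length, "coding bases")

# chr1.008.1: the first exon lies entirely before cdsStart, so it yields no
# coding segment, which corresponds to its exonFrames entry of -1.
utr_example = coding_segments([356462, 357892], [356726, 358918], 357982, 358918)
print(utr_example)
```

For chr1.001.1 the first exon is trimmed to start at cdsStart (7126), since the bases from 6739 up to that point are untranslated.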

Description

This track shows gene predictions produced by the N-SCAN gene structure prediction software, provided by the Computational Genomics Lab at Washington University in St. Louis, MO, USA.

Methods

N-SCAN PASA-EST

N-SCAN combines biological-signal modeling in the target genome sequence with information from a multiple-genome alignment to generate de novo gene predictions. It extends the TWINSCAN target-informant genome pair to allow for an arbitrary number of informant sequences as well as richer models of sequence evolution. N-SCAN models the phylogenetic relationships between the aligned genome sequences, context-dependent substitution rates, insertions, and deletions.

To create predictions for horse, N-SCAN uses human (hg18) as the informant.

N-SCAN PASA-EST incorporates EST alignments into N-SCAN. Similar to the conservation sequence models in TWINSCAN, separate probability models are developed for EST alignments to genomic sequence in exons, introns, splice sites, and UTRs, reflecting the EST alignment patterns in these regions. N-SCAN PASA-EST is more accurate than N-SCAN while retaining the ability to discover novel genes to which no ESTs align.

No manual annotation was performed to generate any of the gene models.

Credits

Thanks to Michael Brent's Computational Genomics Group at Washington University in St. Louis for providing these data.

Special thanks for this implementation of N-SCAN to Aaron Tenney in the Brent lab, and Robert Zimmermann, currently at Max F. Perutz Laboratories in Vienna, Austria.

References

Gross SS, Brent MR. Using multiple alignments to improve gene prediction. J Comput Biol. 2006 Mar;13(2):379-93. PMID: 16597247

Haas BJ, Delcher AL, Mount SM, Wortman JR, Smith RK Jr, Hannick LI, Maiti R, Ronning CM, Rusch DB, Town CD et al. Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res. 2003 Oct 1;31(19):5654-66. PMID: 14500829; PMC: PMC206470

Korf I, Flicek P, Duan D, Brent MR. Integrating genomic homology into gene structure prediction. Bioinformatics. 2001;17 Suppl 1:S140-8. PMID: 11473003

van Baren MJ, Brent MR. Iterative gene prediction and pseudogene removal improves genome annotation. Genome Res. 2006 May;16(5):678-85. PMID: 16651666; PMC: PMC1457044