Searchable Track Hub Quick Start Guide

Track Hubs are a method of displaying remotely-hosted annotation data quickly and flexibly on any UCSC assembly or remotely-hosted sequence. Making your annotation data searchable is an important improvement to the usability of your hub, especially if your annotations are not otherwise represented on the Browser. This Quick Start Guide walks through making a searchable track hub from a GFF3 file: converting it to genePred, then bed, then bigBed, and finally creating a trix search index file. The example uses the "useOneFile" feature to avoid any need for separate genomes.txt and trackDb.txt files.

STEP 1: Downloads

Gather your settings and data files in a publicly-accessible directory (such as a university web-server, CyVerse, or GitHub). For more information on this, please see the hosting guide.

Copy the hub.txt file using wget, curl, or copy-paste:

wget http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubSearchable/hub.txt

Download some example GFF3 data from Gencode. This file happens to be long non-coding RNAs (lncRNAs):

wget ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_32/gencode.v32.long_noncoding_RNAs.gff3.gz

Next, you will need to download four Genome Browser utilities to convert the GFF3 file to bigBed format and run the search index command. Similar utilities exist to convert other file types. The downloads are operating-system specific:

Utility Name      MacOS Download    Linux Download
gff3ToGenePred    Download          Download
genePredToBed     Download          Download
bedToBigBed       Download          Download
ixIxx             Download          Download

STEP 2: Format Data

To format the data, first make the downloaded utilities executable:

chmod +x gff3ToGenePred genePredToBed bedToBigBed ixIxx
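
Before moving on, it can help to confirm that all four utilities are present and executable. The sketch below assumes the binaries were downloaded into the current directory; check_tools is a hypothetical helper name, not part of the UCSC utilities.

```shell
# check_tools is a hypothetical helper, not part of the UCSC utilities.
# It lists every named file in the current directory that is missing or not executable.
check_tools() {
  for tool in "$@"; do
    if [ ! -x "./$tool" ]; then
      echo "missing or not executable: $tool"
    fi
  done
}

check_tools gff3ToGenePred genePredToBed bedToBigBed ixIxx
```

If the command prints nothing, all four files are in place and executable.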

Then run the first conversion from GFF3 to genePred. Be sure to include -geneNameAttr=gene_name so that the gene symbol, rather than the ID number, is used as name2, and sort the output by chromosome and start position:

gff3ToGenePred -geneNameAttr=gene_name gencode.v32.long_noncoding_RNAs.gff3.gz stdout | sort -k2,2 -k4n,4n > gencode.v32.lncRNAs.genePred
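
To see what the sort keys in that command do, the sketch below applies the same sort to a tiny made-up file. The file names and the three records are hypothetical and contain only the first four genePred columns (name, chrom, strand, txStart).

```shell
# Hypothetical three-record excerpt with the first four genePred columns:
# name, chrom, strand, txStart (tab-separated).
printf 'tx3\tchr2\t+\t500\n' > sample_sort.genePred
printf 'tx2\tchr1\t-\t900\n' >> sample_sort.genePred
printf 'tx1\tchr1\t+\t100\n' >> sample_sort.genePred

# Same sort keys as the command above:
# chromosome (column 2) first, then numeric start position (column 4).
sort -k2,2 -k4n,4n sample_sort.genePred > sample_sort.sorted.genePred

# Show name, chromosome, and start of the sorted records.
cut -f1,2,4 sample_sort.sorted.genePred
```

The records come out in the order tx1, tx2, tx3: both chr1 records precede the chr2 record, and within chr1 the record starting at 100 precedes the one starting at 900.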

Convert that genePred file to a bed file:

genePredToBed gencode.v32.lncRNAs.genePred gencode.v32.lncRNAs.bed

Compress and index that bed file into bigBed format, adding -extraIndex=name to allow searches by transcript ID (EnstID):

bedToBigBed -extraIndex=name gencode.v32.lncRNAs.bed https://genome.ucsc.edu/goldenPath/help/hg38.chrom.sizes gencode.v32.lncRNAs.bb
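
bedToBigBed expects its input to be sorted by chromosome and start position. If the conversion complains about sort order, a check along the following lines can confirm whether the bed file is the problem. This is a sketch: bed_is_sorted is a hypothetical helper, the sample file is made up, and the use of LC_ALL=C (byte-by-byte comparison of chromosome names) is an assumption about the ordering the utility wants.

```shell
# Hypothetical BED excerpt; the second record is deliberately out of order.
printf 'chr1\t900\t950\ttxB\n' > sample.bed
printf 'chr1\t100\t200\ttxA\n' >> sample.bed

# bed_is_sorted is a hypothetical helper: it succeeds when the file is ordered by
# chromosome (column 1) and then numeric start (column 2).
# -c checks order without writing output; -s ignores differences in other columns.
bed_is_sorted() {
  LC_ALL=C sort -s -c -k1,1 -k2,2n "$1" 2>/dev/null
}

if bed_is_sorted sample.bed; then
  echo "sample.bed is sorted"
else
  echo "sample.bed is not sorted"
fi
```

For the sample file above, the check reports that the file is not sorted, because the record starting at 900 comes before the one starting at 100.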

If you would like to stop here, you will be able to display your bigBed hub and search for the names that were indexed into the bigBed file (EnstID). You will not be able to use the searchTrix trackDb setting, which requires creating a key-and-value search index for your file as shown below.

STEP 3: Create Search Index

If you want to link your annotation names to anything other than the field referenced in the -extraIndex option, you will need to make an index file. We will make an input file that links one identifier (EnstID) with search terms composed of gene symbols and EnstIDs. Below is one example of a command to create an input file for the search indexing command:

cat gencode.v32.lncRNAs.genePred | awk '{print $1, $12, $1}' > input.txt

To examine or download that file, you can click here. Note that the first word on each line is the key referenced in the BED file, and the words that follow are its associated aliases; searching for any alias leads to the location of the key. These search terms are case insensitive and allow partial word searches.
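
To make the key-and-alias layout concrete, the sketch below runs the same awk program on a single made-up genePred record. The transcript ID and coordinates are placeholders; only column 1 (name) and column 12 (name2) matter here.

```shell
# One illustrative extended-genePred record (tab-separated, 15 columns).
# The transcript ID and coordinates are placeholders, not real Gencode values.
printf 'ENST00000000001.1\tchr1\t+\t100\t200\t200\t200\t1\t100,\t200,\t0\tFAM87B\tnone\tnone\t-1,\n' > sample_name2.genePred

# Same awk program as above: the key, the gene symbol, then the key again as its own alias.
awk '{print $1, $12, $1}' sample_name2.genePred
```

The printed line is "ENST00000000001.1 FAM87B ENST00000000001.1": the first word is the key, and the gene symbol and the ID itself are the searchable aliases.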

Finally, you will make the index file (.ix) and the index of that index (.ixx), which helps the search run quickly even in large files.

ixIxx input.txt out.ix out.ixx
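
Before uploading, you may want to confirm that a term you expect to find is actually present in the input file. The sketch below is a rough stand-in only: it scans a file in the input.txt layout, does not read the .ix or .ixx files, and does not reproduce the Browser's search. lookup_key is a hypothetical helper, the sample IDs are placeholders, and treating a "partial word" as a prefix match is an assumption.

```shell
# Hypothetical input file in the same layout as input.txt: key followed by aliases.
printf 'ENST00000000001.1 FAM87B ENST00000000001.1\n' > sample_input.txt
printf 'ENST00000000002.1 LINC01128 ENST00000000002.1\n' >> sample_input.txt

# lookup_key is a hypothetical helper, not the trix search itself.
# $1 = search term, $2 = input file.
# Prints the key of every line that has an alias starting with the term, ignoring case.
lookup_key() {
  awk -v term="$1" '
    BEGIN { term = tolower(term) }
    {
      for (i = 2; i <= NF; i++) {
        if (index(tolower($i), term) == 1) { print $1; break }
      }
    }' "$2"
}

lookup_key fam87 sample_input.txt
```

Here the lowercase partial term fam87 finds the key ENST00000000001.1 through its FAM87B alias.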

STEP 4: View and Search

Enter the URL to your hub on the My Hubs tab of the Track Data Hubs page. Alternatively, you can substitute your hub.txt URL into the following web address:

genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&hubUrl=YourUrlHere

If you would like to look at an already-made example, click the following link which includes hideTracks=1 to hide other tracks. After the link is a picture of what the hub should look like:

https://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&hideTracks=1&hubUrl=http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubSearchable/hub.txt

A display of the Searchable hub track

Once your hub displays, you should be able to type in a gene symbol or Enst ID and scroll down the results page until you see your search results.

Typing a search term in the search box

You can type your search term (fam87b) in the box above the ideogram and submit the search. Note that it is not case sensitive. Scrolling to the bottom of the search results page, you will see your searchable hub keyword that was linked with your search term. Clicking it will bring you to the position of your search term.

Search hit for fam87b Search results for fam87b

If you are having problems, be sure all your files are publicly-accessible and that your server accepts byte-ranges. You can check using the following command to verify "Accept-Ranges: bytes" displays:

curl -IL http://yourURL/hub.txt
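
If you would rather not read the headers by eye, the check can be scripted roughly as follows. has_byte_ranges is a hypothetical helper; the demonstration feeds it canned headers so that it runs offline, and the real curl call appears only as a comment with the same placeholder URL as above.

```shell
# has_byte_ranges is a hypothetical helper: it reads HTTP response headers on stdin
# and succeeds if one of them is "Accept-Ranges: bytes" (matched case-insensitively).
has_byte_ranges() {
  grep -qi '^accept-ranges:[[:space:]]*bytes'
}

# Usage against a real server (replace the placeholder URL with your own):
#   curl -sIL http://yourURL/hub.txt | has_byte_ranges && echo "byte ranges OK"

# Offline demonstration with canned headers:
printf 'HTTP/1.1 200 OK\r\nAccept-Ranges: bytes\r\n\r\n' | has_byte_ranges && echo "byte ranges OK"
```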

Note that the Browser waits 5 minutes before checking for any changes to these files. When editing hub.txt, genomes.txt, and trackDb.txt, you can shorten this delay by adding udcTimeout=1 to your URL. For more information, see the Debugging and Updating Track Hubs section of the Track Hub User Guide.
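
As a small convenience while you are editing, the sketch below assembles a Browser URL that loads a hub with udcTimeout=1 appended. hub_debug_url is a hypothetical helper; the assembly name should be whichever one your hub.txt genome line declares, and the hub URL shown is a placeholder.

```shell
# hub_debug_url is a hypothetical helper that prints a Browser URL loading the given hub
# with udcTimeout=1 appended.
# $1 = assembly name from the genome line of your hub.txt, $2 = URL of your hub.txt.
hub_debug_url() {
  printf 'https://genome.ucsc.edu/cgi-bin/hgTracks?db=%s&hubUrl=%s&udcTimeout=1\n' "$1" "$2"
}

hub_debug_url hg38 http://yourURL/hub.txt
```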

Understanding hub.txt with useOneFile

The hub.txt file is a configuration file with names, descriptions, and paths to other files. The example below uses the setting useOneFile on to indicate that all the settings and paths appear in the hub.txt file alone, as opposed to having two additional settings files (genomes.txt and trackDb.txt). To see the actual hub.txt file for the above example, click here.

The most important settings for making the hub searchable appear in the third section, in what would formerly be the trackDb.txt file. The searchIndex and searchTrix settings indicate which fields are indexed in the bigBed file and where to find the .ix file, respectively.

hub MyHubsNameWithoutSpaces
shortLabel My Hub's Name
longLabel Name up to 80 characters versus shortLabel limited to 17 characters
email myEmail@address
descriptionUrl aboutMyHub.html
useOneFile on

genome assembly_database_2

track uniqueNameNoSpacesOrDots
type track_type
bigDataUrl track_data_url
shortLabel label 17 chars
longLabel long label up to 80 chars
visibility hide/dense/squish/pack/full
searchIndex field,field2
searchTrix path/to/.ix/file
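
For comparison with the template, here is a sketch of how it might be filled in for the files built in this guide. It is not the actual example hub.txt: the hub name, track name, labels, and the output file name hub.example.txt are all placeholders, and the relative paths assume the bigBed and .ix files sit in the same directory as hub.txt. The type bigBed 12 line assumes the 12-column bed produced by genePredToBed.

```shell
# Write a filled-in example to hub.example.txt (a hypothetical file name, chosen so that
# it does not overwrite the hub.txt downloaded in Step 1). Names and labels are placeholders.
cat > hub.example.txt <<'EOF'
hub mySearchableHub
shortLabel lncRNA Search Hub
longLabel Gencode v32 lncRNAs with searchable gene symbols and transcript IDs
email myEmail@address
descriptionUrl aboutMyHub.html
useOneFile on

genome hg38

track gencodeV32lncRNAs
type bigBed 12
bigDataUrl gencode.v32.lncRNAs.bb
shortLabel lncRNAs
longLabel Gencode v32 long non-coding RNAs
visibility pack
searchIndex name
searchTrix out.ix
EOF

# Confirm that the two search settings made it into the file.
grep -E '^(searchIndex|searchTrix) ' hub.example.txt
```

Here searchIndex name matches the field given to -extraIndex in Step 2, and searchTrix points at the out.ix file created in Step 3; the out.ixx file is assumed to sit next to it.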

Additional Resources