Genome Graphs User's Guide

Contents

Introduction
Formatting, uploading and importing data
Quick start
Displaying data in Genome Graphs
Viewing data in the Genome Browser
Viewing data in the Gene Sorter
Deleting data
Correlating data sets

Questions and feedback on this User's Guide are welcome.

User questions and answers on Genome Graphs and other topics are available in the Genome Browser mailing list.

Introduction

Genome Graphs is a tool for displaying genome-wide data sets such as the results of genome-wide SNP association studies, linkage studies and homozygosity mapping.

Using the Genome Graphs tool, you can:

To return to Genome Graphs from any other location on the Genome Browser website, use your browser's Back button, or click Home on the blue navigation bar, then click the Genome Graphs link.

Note that only the "standard" chromosomes are displayed in the Genome Graphs display; haplotype and mitochondrial chromosomes are not displayed.

This User's Guide is aimed at both novice and advanced Genome Graphs users. If you are new to the Genome Graphs tool, read the Quick Start section to learn the basics using some sample data. Advanced users may want to proceed directly to the section that addresses a particular area of functionality in detail.

Formatting, uploading and importing data

Formatting data

Genome Graphs allows you to upload data from files that reside on your computer. Several file formats are accepted by the program. For all formats there is a single line for each marker. Each line starts with information on the marker, and ends with the numerical values associated with that marker. The markers can be of one of the following types:

The marker-value pairs in each line of the file can be separated with a single space, a tab, or a comma. The file can contain multiple values for each marker. In that case, a separate graph will be created for each value column in the input file.

For example, chromosome base markers with only one value associated with the marker would be entered like this:

chrX 100000 1.23

dbSNP rsID markers with two values associated with the marker would be entered like this:

rs10218492 0.384 0.882
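
As an illustration of the format rules above, the following Python sketch splits one input line into a marker and its values. The function name parse_marker_line is hypothetical, and the rule used to recognize chromosome base markers (a first field beginning with "chr") is an assumption made for this example. It is not the parser that Genome Graphs itself uses.

```python
import re


def parse_marker_line(line):
    """Split one input line into (marker, values).

    Fields may be separated by a space, a tab, or a comma. This sketch
    also tolerates runs of separators, which is more lenient than the
    single-separator rule described in the text.

    Assumption for this sketch: a first field beginning with "chr" marks
    a chromosome base marker, which occupies two fields (name, position).
    Any other first field is treated as a one-field marker ID.
    """
    fields = re.split(r"[ \t,]+", line.strip())
    if fields[0].startswith("chr"):
        marker = (fields[0], int(fields[1]))
        raw_values = fields[2:]
    else:
        marker = fields[0]
        raw_values = fields[1:]
    # Each value column would become its own graph.
    return marker, [float(v) for v in raw_values]


print(parse_marker_line("chrX 100000 1.23"))
print(parse_marker_line("rs10218492 0.384 0.882"))
```

Each position in the returned list of values corresponds to one value column, and therefore to one graph.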

The Genome Graphs program will map the marker IDs to the genome. In cases where the marker maps to more than one location in the genome, the value(s) in your input file will be associated with each location.

If the value associated with your marker is positive, do not include a sign (e.g., "+"). Include a sign ("-") only if the value is negative.
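
If you generate your input file from a script, the sign rule is easy to follow. The sketch below is one possible way to write a line in Python; the helper name format_line and the choice of a tab as separator are assumptions for illustration only.

```python
def format_line(marker, values):
    """Build one tab-separated input line for a marker and its values.

    The "g" format writes "-" for negative numbers and never writes
    a leading "+" for positive ones.
    """
    return "\t".join([marker] + [f"{v:g}" for v in values])


print(format_line("rs10218492", [0.384, -0.882]))
```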

Note that markers can only be mapped to assemblies that already have a track containing your marker type. You cannot, for example, use dbSNP rsID markers for the cow genome, as it does not have a SNP track.

Uploading data

Once you have created your input file, you must upload it to Genome Graphs. From the main Genome Graphs page, choose the clade, genome, and assembly to which your data pertains. If you are unsure of the UCSC assembly name, check this page. Then, click the upload button to go to the upload page.

To upload a file in any of the supported formats, locate the file on your computer using the controls next to "file name", and then submit. The other controls on this form are optional, but can be used to enhance the display. In general, the controls that default to "best guess" will not need modification, since the default guess is almost always correct.

The controls for display min and max values and connecting lines may be adjusted later via the configuration page. Here is a description of each control:

Importing data

In addition to supplying your own genome-wide data files, you can also import existing database tables from an assembly into the Genome Graphs tool. Any table containing positional information can be imported. This includes tables of the following types: BED, PSL, wiggle, MAF, and bedGraph. Custom track tables can be imported as well. Tables made by Genome Graphs (chromGraph) cannot be imported because they are already in the format used by the tool, so no conversion is necessary. All tables imported into Genome Graphs are converted into a custom track of type chromGraph using a window size of 10,000 bases.
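
To give a feel for what a fixed window size means, the following Python sketch groups positional values into 10,000-base windows. It is only an illustration: the function name bin_by_window is hypothetical, and summarizing each window by the mean of its values is an assumption, since this guide does not describe how the conversion computes the value for a window.

```python
from collections import defaultdict

WINDOW_SIZE = 10_000  # window size stated in the text


def bin_by_window(records, window_size=WINDOW_SIZE):
    """Group (chrom, position, value) records into fixed-size windows.

    Returns a dict mapping (chrom, window_start) to a summary value.
    Assumption: the summary is the mean of the values in the window.
    """
    sums = defaultdict(lambda: [0.0, 0])  # running total and count
    for chrom, pos, value in records:
        window_start = (pos // window_size) * window_size
        acc = sums[(chrom, window_start)]
        acc[0] += value
        acc[1] += 1
    return {key: total / count for key, (total, count) in sorted(sums.items())}


records = [("chr1", 5, 1.0), ("chr1", 9999, 3.0), ("chr1", 10000, 5.0)]
print(bin_by_window(records))
```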

To import a table or custom track, choose the group, track, and table from the lists, then click the submit button. The other controls are optional, though completing them will enhance the display. The controls for display min and max values and connecting lines can be set later via the configuration page as well. Here is a description of each control.

Quick start

Use the examples in this section of the User's Guide to get a feel for how the tool works. Refer to other sections in this User's Guide for details and instructions for more advanced features.

The Genome Graphs tool comes pre-loaded with sample data. These sample data sets are from real-world genome-wide studies. Use these data sets to quickly see what the tool looks like when data is displayed. To view the sample data, choose a data set from the "graph" drop-down list, then choose your desired display color from the "in" drop-down list. The tool will display the data set directly above the chromosomes in Genome Graphs. Read on to learn how to customize the display.

Example #1 — SNPs on chr22

Follow these steps to display in Genome Graphs all of the highest quality SNPs on chromosome 22 for the hg18 assembly whose predicted functional role is "coding non-synonymous" (where there is a change in the peptide for the allele with respect to the reference assembly). Note that there are no SNPs on the p-arm of chromosome 22.

This data set is formatted in the "marker value" style. The markers are dbSNP rsIDs. The associated value is +1 if the SNP is on the positive strand, and -1 if the SNP is on the negative strand. Here are the first ten rows of the data file:

rs1007298       +1
rs1007863       +1
rs10154509      +1
rs10154678      +1
rs10154785      +1
rs1018448       +1
rs10212022      +1
rs1022478       +1
rs1042311       +1
rs1042435       +1 

Step 1. Upload the data into the Genome Graphs tool
Copy the entire sample data set into a text editor and save the file to your computer. This data set is associated with the human assembly: hg18 (Mar. 2006). Be sure to configure the Genome Graphs tool to use the hg18 assembly like so:

clade:		Vertebrate
genome:		Human
assembly:	Mar. 2006 

Upload the file into the Genome Graphs tool. You can configure each control on the upload page, or just leave them set to their default values. The upload process may take some time, as the program is actually mapping each rsID in the input file to its location(s) in the genome.

Step 2. Display the graph in Genome Graphs
Now that your input file has been uploaded to the server, you will want to display it in the Genome Graphs tool. To display your uploaded data, simply choose the graph name from the "graph" drop-down list, then choose your desired display color from the "in" drop-down list. Your graph will be displayed directly above the chromosomes in Genome Graphs. You should see the data plotted directly above chromosome 22.

Step 3. View the graph in the Genome Browser
From the Genome Graphs display, click anywhere on the graph or on chromosome 22 to open the Genome Browser for hg18 centered at that location on chr22. The graph will be drawn as a track near the top of the Genome Browser display.

Displaying data in Genome Graphs

Once you have uploaded your data, you will want to display it in the Genome Graphs tool. To display your uploaded data, simply choose the graph name from the "graph" drop-down list, then choose the color in which you would like it to be displayed from the "in" drop-down list. Your graph will be displayed directly above the chromosomes in Genome Graphs. Read on to learn how to customize the display.

Configuring the display

Configuring the graphs display
To go to the configuration page, click the configure button on the main Genome Graphs page. This is the page from which you can configure many overall aspects of the Genome Graphs display. Individual graphs can also be configured (see the next section for help on that).

On this page you will find the following controls:

When you have completed configuring the display, click the submit button to return to the Genome Graphs display.

Configuring individual graphs
Near the bottom of the Configuration page, you will see a list of the graphs that you have uploaded. Click on the hyperlinked graph name to configure that graph. This configuration pertains to the Genome Graphs view.

You can set the range of the display by editing the display min/max values. This will restrict the Genome Graphs display for this graph to that data range. The axis will be labeled at 1/3 and 2/3 of the data range that you set.
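
The axis labeling can be sketched as a small calculation. The Python below assumes that "1/3 and 2/3 of the data range" means the values one third and two thirds of the way from the display minimum to the display maximum; the function name axis_label_values is hypothetical.

```python
def axis_label_values(display_min, display_max):
    """Return the two values at which the axis would be labeled.

    Assumption: labels fall one third and two thirds of the way
    between the display minimum and the display maximum.
    """
    span = display_max - display_min
    return (display_min + span / 3, display_min + 2 * span / 3)


print(axis_label_values(0, 9))
```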

If your data is sparse, you may want to draw lines between your data points. You can configure that by editing the value of the "draw connecting lines between markers separated by up to ... bases" control. The default value is 25,000,000 bases.
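
The effect of this setting can be sketched as follows. The Python below decides which neighboring markers on one chromosome would be joined by a line for a given maximum gap. The function name connected_pairs is hypothetical, and treating a gap exactly equal to the limit as connected is an assumption based on the wording "up to".

```python
DEFAULT_MAX_GAP = 25_000_000  # default value stated in the text


def connected_pairs(positions, max_gap=DEFAULT_MAX_GAP):
    """Return the pairs of adjacent marker positions that would be joined.

    positions: base positions of markers on a single chromosome.
    Adjacent markers are joined only if they are at most max_gap apart.
    """
    ordered = sorted(positions)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a <= max_gap]


# The first two markers are 19,000,000 bases apart and are joined;
# the last two are 40,000,000 bases apart and are not.
print(connected_pairs([1_000_000, 20_000_000, 60_000_000]))
```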

When you have completed configuring the display, click the submit button twice to return to the Genome Graphs display.

Setting a significance threshold

Most genome-wide data has some amount of noise and is only interesting when the data values are above a certain value. You can set this value using the significance threshold input box. Enter a decimal number in this input box and press Enter. The display will now have a light gray line across the graph at this data value. If you have more than one graph displayed, the significance threshold only pertains to the graphs that contain the significance threshold in the displayed data range.

The significance threshold works in concert with the browse regions and sort genes buttons; it will affect the regions that are displayed once you click either of these two buttons.

To open the Genome Browser with a view of all of the regions in your graph that include data points that pass the significance threshold, click the browse regions button. This will open the Genome Browser with a navigation pane on the left side of the screen. This pane will contain links to all regions which pass your significance threshold. Note that if you are displaying more than one graph, the significant regions are based only on the first graph in the display list.

To view a list of genes which are in regions that pass the significance threshold, click the sort genes button. This will open the Gene Sorter with only the genes that are in significant locations with respect to your data.

If you would rather view all of your regions without restricting the output to only those regions that pass the significance threshold, simply delete any values from the significance threshold input box and press Enter before clicking browse regions.

Setting a data region

The data region is the span of bases that will be added to either side of the data points in your graphs which exceed the significance threshold. Set the data region by editing the region padding value on the configuration page. The combination of setting the data region and the significance threshold will affect two things:

For example, take a data set that contains the following data:

chr2 100100000 2.3 
chr2 100100500 4.5 
chr2 100101000 1.2

If you set the significance threshold at 4.0, one data point in the data set passes that threshold. If you then set the data region (the region padding value) to 200, then the one significant data point will be padded on each side by 200 base pairs. In that case, the only resulting significant data region will be chr2:100,100,300-100,100,700.

If instead you set the data region to 2,000, then the one significant data point will be padded on each side by 2,000 base pairs. In that case, the resulting significant data region will be chr2:100,098,500-100,102,500.
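
The arithmetic in this example can be reproduced with a short Python sketch. The function name significant_regions is hypothetical. The sketch keeps only points whose value exceeds the threshold and pads each by the same number of bases on both sides. Whether a value exactly equal to the threshold counts as significant, and whether overlapping regions are merged, are not covered by this example, so the sketch makes no attempt to model them.

```python
def significant_regions(points, threshold, padding):
    """Return (chrom, start, end) for each point above the threshold.

    points: iterable of (chrom, position, value) tuples.
    padding: number of bases added on each side of a significant point.
    Assumption: only values strictly greater than the threshold qualify.
    """
    regions = []
    for chrom, pos, value in points:
        if value > threshold:
            regions.append((chrom, pos - padding, pos + padding))
    return regions


points = [
    ("chr2", 100100000, 2.3),
    ("chr2", 100100500, 4.5),
    ("chr2", 100101000, 1.2),
]

for padding in (200, 2000):
    for chrom, start, end in significant_regions(points, 4.0, padding):
        print(f"{chrom}:{start:,}-{end:,}")
```

With the data above, the two printed regions match the two regions given in the text.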

Viewing data in the Genome Browser

To view your graphs in the Genome Browser, click the browse regions button. This will open the Genome Browser with your graph(s) displayed as track(s). You can configure and edit your track as you can any other track in the Genome Browser. In addition to the Genome Browser, you will also see a pane on the left-hand side, which contains links to all of the significant regions in your data. Please note that if you are displaying more than one graph in Genome Graphs, the significant regions are based only on the first graph in the display list.

You can also navigate to the Genome Browser by clicking directly on a graph or chromosome in Genome Graphs. The Genome Browser will open with a 1,000,000 bp window centered on the location on which you clicked.

Viewing data in the Gene Sorter

To view the set of genes that are in significant regions in your data, click the sort genes button. This will open the Gene Sorter with a filter to include only genes that are located in regions in your input data that are above the significance threshold. Please note that if you are displaying more than one graph in Genome Graphs, the significant genes are based only on the first graph in the display list.

If the graph was uploaded using markers, then a custom Gene Sorter column with the same name as the graph will be created. This column will list all markers for each gene that contain values above the significance threshold.

Deleting data

There are several ways to delete your data once it has been uploaded. If you are viewing your data as a track in the Genome Browser, you can click on the mini-button or track control for the track and delete the track using the Remove custom track button. You can also reset your cart, which resets the browser interface settings to their defaults and deletes all custom tracks and data. Do this by clicking "Reset All User Settings" under the top blue Genome Browser menu.

Your data will be saved on our server for at least 48 hours from the time you last access it, unless it is saved in a Session.

Correlating data sets

To calculate how well your data sets correlate with one another, click the correlate button. This will calculate and display the correlation coefficient (R) for each pair of your data sets. R, also known as Pearson's correlation coefficient, measures the extent to which two graphs move together. The value of R ranges from -1 to 1. A positive R indicates that the graphs tend to move in the same direction, while a negative R indicates that they tend to move in opposite directions. R-Squared (which is simply R*R) measures how much of the variation in one graph can be explained by a linear dependence on the other graph. R-Squared ranges from 0, when the two graphs show no linear relationship, to 1, when one graph is completely determined by a linear function of the other.
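
For readers who want to check a reported value by hand, here is a minimal Python sketch of Pearson's R and R-Squared for two lists of values. The function name pearson_r is hypothetical, and the sketch assumes the two lists are already paired up value by value; how Genome Graphs pairs the data points of two graphs is not described in this guide.

```python
import math


def pearson_r(xs, ys):
    """Compute Pearson's correlation coefficient for two paired lists.

    Raises ValueError if the lists differ in length or have fewer than
    two values. A list with no variation causes a division by zero.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


r = pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
print("R =", r, "R-Squared =", r * r)
```

In this example the second list is exactly twice the first, so the two move together perfectly.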

To return to Genome Graphs, click the return to graphs button.