Lung Transcriptome Affymetrix Probe Sequence Database

March 17, 2004

http://lungtranscriptome.bwh.harvard.edu/pseqdatabase.html

 

The LungTranscriptome is an online portal to genomics information.  It is freely available for academic use, and we subscribe to the open access movement in both publishing and data sharing.  We are currently trying to expand the utility and scope of this site and hope that in the future it will house a microarray library as well as useful information on how to handle the enormous amount of information generated in large-scale expression profiling studies.  The Affymetrix Probe Sequence Database serves as an interface to the information recently published by our group in the paper titled "Increased Measurement Accuracy for Sequence Verified Probes".

 

The database currently has three main components: a visualization tool that shows where Affymetrix probes map onto RefSeq mRNAs; a database downloads section that gives users access to the entire mapping files or a user-defined subset of those files; and an interface to files that indicate the quality of individual probe sets.  This document describes the user interface for each of these three components.


 

Visualization Tool

            This tool is available at http://lungtranscriptome.bwh.harvard.edu/cgi-bin/affyProbe.cgi.  The purpose of this tool is to graphically display where Affymetrix probes map onto specific RefSeq mRNAs.  It accepts as input a list of probe sets and their corresponding platform ID.  The following example illustrates how to use the visualization tool and provides a glimpse of the intended output.

 

The first step is to obtain a username and password.  Registration is free FOR ACADEMICS ONLY and has to be done only once.  If you are not an academic, please contact us before using the site.  After registering and logging in, the user is presented with the following form:

 

 

 

The user should enter one or more Affymetrix Probe Set identifiers in the 'Probe Sets' text area (one per line) and then select the corresponding platform from the drop-down menu.  An example query input is probe set 1002_f_at on the Hu95 platform.  Clicking the 'Search' button starts the program.

 

 

 

After a delay that depends on the number of probe sets submitted, the site returns a page composed of a Query Summary Table and a Query Results Table.

 

 

The Query Summary Table simply indicates whether or not each probe set was located in our database.  If a probe set is not located, none of its probes map to any high quality RefSeq mRNA.  For the probe sets that are located, clicking the link in this table jumps to the information for that specific probe set.

Each probe set has its own "Summary" table that indicates the probe set title, the platform, the total number of probes on the chip for this probe set, the total number of Unigenes and the total number of unique RefSeqs that the probes for this probe set mapped to.  For each individual identified RefSeq target, there is a table that contains information describing the RefSeq with hyperlinks to NCBI.  The last row of this table also indicates the probe numbers that map to this target (e.g. 1-11 indicates probe 1 to probe 11 inclusive, while 1,5-11 indicates probe 1 as well as probes 5 to 11).
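Users who copy the probe numbers out of the results table may want them as an explicit list.  The following is a minimal Python sketch of how the notation described above could be expanded; the function name parse_probe_numbers is ours for illustration and is not part of the site.

```python
def parse_probe_numbers(spec):
    """Expand a probe list such as "1,5-11" into a sorted list of integers."""
    probes = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas or spaces
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            if start > end:
                raise ValueError(f"descending range: {part!r}")
            probes.update(range(start, end + 1))  # ranges are inclusive
        else:
            probes.add(int(part))
    return sorted(probes)


print(parse_probe_numbers("1-11"))    # probes 1 to 11 inclusive
print(parse_probe_numbers("1,5-11"))  # probe 1 plus probes 5 to 11
```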

To the right of this table is the graphic that illustrates the precise location of the mapped probes (note that the images are usually too large to fit in the browser window, so the user will have to scroll to the right in order to see the entire picture).  The image has four main sections: the narrow black solid line at the top shows the length of the entire transcript; the solid blue line is the entire transcribed molecule (e.g. 5' UTR, exons, 3' UTR); the solid orange line is the coding region (e.g. exons only); and the black rectangles are the locations of the individual probes.

 

 


Database Download Tool

            This tool is available at http://lungtranscriptome.bwh.harvard.edu/cgi-bin/bulkdownloads.cgi.  The purpose of this tool is to aid users who want either the entire database of RefSeq-overlapping probes or a small subset of this database.  It accepts as input the desired Affymetrix platform and whether, and how, the user wants to limit the output.  If the output is to be limited, it also accepts an input file that contains the identifiers the user wants to retrieve, as well as the desired output format.  Note that for this option we limit the number of identifiers to 25.  Users who want more mapping information should download the entire mapping database and query it locally.

 

            The first step is to obtain a username and password.  Registration is free FOR ACADEMICS ONLY and has to be done only once.  If you are not an academic, please contact us before using the site.  After registering and logging in, the user is presented with the following form:

 

 

            The user should begin by choosing the desired platform and deciding whether they want the entire RefSeq mapping or a defined subset of the mapping.  If the user wants the entire mapping, they can proceed by clicking the "Search" button.  However, if a subset of the database is desired, they must select a file at the "Input File" prompt and then select the desired output type.  The input file should be a text file that has one identifier on each line.  Valid identifiers are Affymetrix Probe Sets, Unigenes or RefSeq Sequence Identifiers.  The output types are HTML (a neat table), tab-delimited and comma-separated (for cutting and pasting into a text editor).  Below is an example usage in which the user submits an input file (called limit) consisting of two identifiers (NM_007925 and 94305_at) and chooses to query the RefSeq mappings from the Mu74Av2 platform.  Because the HTML output was selected, the server returns a formatted HTML table.
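For users who prepare the input file with a script rather than by hand, the following minimal Python sketch writes one identifier per line and enforces the limit of 25 identifiers mentioned above.  The function name write_identifier_file is ours, and dropping blank and duplicate entries is our own choice; we do not claim that the server requires it.

```python
import os
import tempfile

MAX_IDENTIFIERS = 25  # limit stated in this document for subset queries


def write_identifier_file(identifiers, path):
    """Write one identifier per line; return the identifiers actually written."""
    cleaned = []
    for ident in identifiers:
        ident = ident.strip()
        if ident and ident not in cleaned:  # skip blanks and duplicates (our choice)
            cleaned.append(ident)
    if len(cleaned) > MAX_IDENTIFIERS:
        raise ValueError(
            f"{len(cleaned)} identifiers given, but at most {MAX_IDENTIFIERS} are allowed"
        )
    with open(path, "w", newline="\n") as handle:
        for ident in cleaned:
            handle.write(ident + "\n")
    return cleaned


# Recreate the example input file "limit" from the text in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "limit")
written = write_identifier_file(["NM_007925", "94305_at"], path)
print(written)
```

The resulting file can then be selected at the "Input File" prompt.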

 

 


Sequence Verified Downloads Tool

            This tool is available at http://lungtranscriptome.bwh.harvard.edu/cgi-bin/seqverified.cgi.  The purpose of this tool is to aid users who want a file that indicates whether or not each probe set on a given platform maps to a RefSeq mRNA.

 

            The first step is to obtain a username and password.  Registration is free FOR ACADEMICS ONLY and has to be done only once.  If you are not an academic, please contact us before using the site.  After registering and logging in, the user is presented with the following form:

 

            The user should simply click on the file that they wish to download.  The files are tab-delimited text files that contain the probe set and a binary classifier that indicates whether the probe set is sequence verified (1) or not sequence verified (0).  For more information on sequence verification see our recent publication (Mecham et al.).
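Once a file has been downloaded, it can be read with a few lines of Python.  The sketch below assumes that the probe set is in the first column and the 0/1 classifier in the second, and it skips any line that does not fit that pattern (for example a header row, if one is present).  The function name and the sample probe set names are invented for illustration and are not real data.

```python
import csv
import io


def read_sequence_verified(handle):
    """Return {probe_set: True/False} from a tab-delimited probe set / flag file."""
    status = {}
    for row in csv.reader(handle, delimiter="\t"):
        # Assumed layout: column 1 = probe set, column 2 = "1" or "0".
        if len(row) < 2 or row[1].strip() not in ("0", "1"):
            continue  # skip blank lines or a possible header row
        status[row[0].strip()] = row[1].strip() == "1"
    return status


# Invented sample content standing in for a downloaded file.
sample = io.StringIO("probe_A_at\t1\nprobe_B_at\t0\n")
status = read_sequence_verified(sample)
verified = [name for name, ok in status.items() if ok]
print(verified)
```

For a real download, the user would pass an open file instead of the sample, e.g. open(filename, newline="").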