NCBI Bioinformatics Resources: An Introduction: GEO

GEO

The three main goals of GEO are to:

  1. Provide a database of high-throughput functional genomic data (see Data organization)
  2. Support complete and well-annotated data deposits from the research community (see Submission guide)
  3. Allow users to query, locate, review and download studies and gene expression profiles of interest (see Query and analysis)

There are three types of GEO submitter records:

  1. A Platform record describes an array or sequencer and, for array-based platforms, a data table defining the array template. Sample records are linked from Platform records.
    Example Platform record
  2. A Sample record describes the sample source, the protocols used in its analysis, and the expression data derived from it. A Sample can reference only one Platform.
    Example Sample record
  3. A Series record links together a group of related Samples and describes a whole study.
    Example Series record

These three types of records are organized into two higher-level categories for querying and analysis:

  1. A DataSet represents a curated collection of biologically and statistically comparable GEO Samples. All Samples in a DataSet reference the same Platform. DataSets can be searched using the GEO DataSets database.
    Example DataSet record
  2. A Profile consists of the expression measurements for an individual gene across all Samples in a DataSet. Profiles can be searched using the GEO Profiles database.
    Example Profile record
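
The relationships between these record types can be sketched as a small data model. The Python below is only an illustration and does not reflect how GEO stores its records: the class names mirror the record types described above, the field names are assumptions, and every accession except GPL1261 is a hypothetical placeholder. It encodes two rules stated above: a Sample references exactly one Platform, and all Samples in a DataSet share the same Platform.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Platform:
    accession: str  # e.g. "GPL1261"


@dataclass(frozen=True)
class Sample:
    accession: str
    platform: Platform  # a Sample references exactly one Platform


@dataclass
class Series:
    accession: str
    samples: List[Sample] = field(default_factory=list)  # related Samples in one study


@dataclass
class DataSet:
    accession: str
    samples: List[Sample]

    def __post_init__(self) -> None:
        # All Samples in a DataSet must reference the same Platform.
        platforms = {s.platform.accession for s in self.samples}
        if len(platforms) > 1:
            raise ValueError(
                f"DataSet {self.accession} mixes Platforms: {sorted(platforms)}"
            )


# Hypothetical placeholder accessions, for illustration only.
mouse_array = Platform("GPL1261")
sample_a = Sample("SAMPLE_A", mouse_array)
sample_b = Sample("SAMPLE_B", mouse_array)
study = Series("SERIES_1", [sample_a, sample_b])
dataset = DataSet("DATASET_1", [sample_a, sample_b])
```

Building a DataSet from Samples that reference different Platforms raises a ValueError in this sketch, which mirrors the same-Platform rule.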

Example: Find gene expression studies that use mouse as a model organism for melanoma on a specific platform.

  1. Go to GEO DataSets.
  2. Search: mouse[organism] AND melanoma AND gpl1261[accession]. GPL1261 is a specific mouse array platform.
  3. The search retrieves the GPL1261 Platform record, Sample records, Series records, and DataSet records. In the left-hand filter menu, under Entry type, select DataSets. DataSets are curated collections of comparable GEO Samples.
  4. Select the DataSet "Melanotransferrin effect on the brain". Melanotransferrin (Mtf) is highly expressed in melanomas and at lower levels in normal tissue.
  5. Click the Sample Subsets button to see that this DataSet contains two wild-type and four knockout samples. As the name "knockout" implies, the Mtf gene has been rendered inoperative in the knockout samples.
  6. Clicking on the Expression Profiles button will show all of the expression profiles from this DataSet in GEO Profiles.
  7. Comparing samples: Mtf interacts with the gene Derlin 1 (gene symbol Derl1). Let's examine the difference in Derl1 expression levels between the wild-type and the knockout samples.
  8. In the DataSet record click on the Data Analysis Tools button.
  9. In the Find gene name or symbol box enter derl1 and click Go.
  10. The search returns four GEO Profiles records that correspond to different spots on the array; three different sequences were used for Derl1.
  11. In general, Derl1 expression is higher in the wild-type samples than in the knockout samples (the pink line in the third profile record indicates an unreliable result). This suggests that Mtf may stimulate Derl1 expression, although a comparison of this kind does not establish it.
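
The search in steps 1 to 3 can also be expressed programmatically. The Python sketch below is not part of the original guide. It builds the same query string and an NCBI E-utilities ESearch URL for it. The helper names build_geo_query and build_esearch_url are hypothetical. The sketch assumes that GEO DataSets is addressed as db=gds in E-utilities and that adding gds[Entry Type] to the query is equivalent to selecting DataSets in the filter menu. It only constructs the URL and does not send a request.

```python
from typing import Optional
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities ESearch service.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_geo_query(organism: str, keyword: str, platform: str,
                    entry_type: Optional[str] = None) -> str:
    """Combine the search terms from step 2 with AND, using Entrez field tags."""
    terms = [f"{organism}[organism]", keyword, f"{platform}[accession]"]
    if entry_type:
        # Assumption: "gds[Entry Type]" restricts results to DataSet records.
        terms.append(f"{entry_type}[Entry Type]")
    return " AND ".join(terms)


def build_esearch_url(term: str, db: str = "gds", retmax: int = 20) -> str:
    """Return an ESearch URL for the query; the request itself is not sent."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


# The query from step 2, then the same query restricted to DataSets (step 3).
query = build_geo_query("mouse", "melanoma", "gpl1261")
datasets_only = build_geo_query("mouse", "melanoma", "gpl1261", entry_type="gds")
url = build_esearch_url(datasets_only)
print(query)
print(url)
```

Opening the resulting URL in a browser, or fetching it with any HTTP client, should return the IDs of the matching records as JSON. NCBI asks heavy users of E-utilities to limit their request rate, so a script that loops over many queries should pause between calls.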