NCBI Bioinformatics Resources: An Introduction: BioProject & BioSample

BioProject & BioSample

BioProject can be searched directly by keyword or field:

Find BioProjects by: Search text example(s)
Project data type: "metagenome"[Project Data Type]
Publication information: 19643200[PMID]
Material used: "material transcriptome"[Properties]
Sample scope: "scope environment"[Properties]
Species name: Escherichia coli[organism]
Submitter organization, consortium, or center: JGI[Submitter Organization]
Taxonomic class: Insecta[organism]
BioProject database identifier: PRJNA33823[bioproject] or 33823[uid] or 33823[bioproject]
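
The field tags listed above can also be composed programmatically. The following is a minimal Python sketch, assuming the search is sent to the NCBI E-utilities ESearch endpoint instead of being typed into the web search box. The helper names `tagged` and `esearch_url` are hypothetical, and the code only builds the request URL; it does not contact NCBI.

```python
from urllib.parse import urlencode

# E-utilities ESearch endpoint (the programmatic counterpart of the search box).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def tagged(value, field, quote=False):
    """Return a field-tagged term such as 'Escherichia coli[organism]'.

    Hypothetical helper: set quote=True for phrase values that the table
    shows in double quotes, e.g. "metagenome"[Project Data Type].
    """
    text = f'"{value}"' if quote else value
    return f"{text}[{field}]"


def esearch_url(db, term, retmax=20):
    """Build (but do not send) an ESearch request URL for the given database."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH}?{urlencode(params)}"


# Combine two rows of the table with AND, as in the web search box.
term = " AND ".join([
    tagged("metagenome", "Project Data Type", quote=True),
    tagged("Escherichia coli", "organism"),
])
url = esearch_url("bioproject", term)

print(term)
print(url)
```

Opening the printed URL in a browser should return the matching BioProject UIDs as JSON. `urlencode` takes care of escaping the quotes, spaces, and square brackets.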


BioProjects link reciprocally to their constituent BioSample records.

Example: To find the sequencing data from the 2014 metagenomic survey of the New York City subway system:

  1. Enter "New York City" AND subway in the BioProject search box and click Search
  2. Note filters on the left-hand side to narrow a search if too many results are retrieved
  3. Select urban metagenome (accession: PRJNA271013)
  4. Scroll down to Project Data: there are 1572 SRA experiments and 1457 BioSample records
  5. Right-click on 1457 to see all of the BioSample records
  6. To see details, at top left change Summary to Full
  7. To download all the BioSample records, at top right click on Send to: File
  8. Select Format: Full (text) (or choose another format) and click Create File
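
The same BioProject-to-BioSample navigation can be sketched in Python with the NCBI E-utilities (ESearch to resolve the accession, ELink to follow the links). This is an illustration under stated assumptions, not the guide's own method: the function names are hypothetical, and the JSON field names (`esearchresult`, `idlist`, `linksets`, `linksetdbs`, `links`) reflect the layout E-utilities is assumed to return with `retmode=json`. The `fetch` argument is injectable so the logic can be tried without network access.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def eutils_url(tool, **params):
    """Build a request URL for an E-utilities tool such as 'esearch' or 'elink'."""
    return f"{EUTILS}/{tool}.fcgi?{urlencode(params)}"


def http_get(url):
    """Default fetcher: download the response body as text."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


def biosample_ids_for_project(accession, fetch=http_get):
    """Return BioSample UIDs linked to a BioProject accession (hypothetical helper)."""
    # Steps 1-3 of the walkthrough: find the BioProject record by accession.
    search = json.loads(fetch(eutils_url(
        "esearch", db="bioproject", term=f"{accession}[bioproject]", retmode="json")))
    uids = search["esearchresult"]["idlist"]
    if not uids:
        return []

    # Steps 4-5: follow the links from the BioProject to its BioSample records.
    links = json.loads(fetch(eutils_url(
        "elink", dbfrom="bioproject", db="biosample", id=uids[0], retmode="json")))
    ids = []
    for linkset in links.get("linksets", []):
        for linksetdb in linkset.get("linksetdbs", []):
            ids.extend(linksetdb.get("links", []))

    # Several link types may list the same sample; keep the first occurrence only.
    return list(dict.fromkeys(ids))
```

Calling `biosample_ids_for_project("PRJNA271013")` would query NCBI for the subway metagenome project. The number of UIDs returned may differ from the count shown on the web page if records have been added or withdrawn since.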

If a particular sample or location is of interest, use the geolocation information in the BioSample records, or use the PathoMap Website linked from the BioProject record.

On the PathoMap Website, select both Subway Lines and Data Points under Reference in the right-hand menu, and the organism of interest in the left-hand menu. Clicking on the sample location on the map will provide information about the sample.

BioSample can be searched by keyword or by using field tags and filters. Field tags include:

Find BioSamples by: Search text example(s)
Accession number: SAMN02048828[accn]
Attribute: "cell type fibroblast"[Attribute]
Attribute name: "cell line"[Attribute Name]
Author: "John Smith"[Author]
Filter: "biosample sra"[filter]
Organism: Mus musculus[organism]
Properties: "package migs/mims/mimarks water"[Properties]
Publication date: 2013/1:2013/3[Publication date]
Submitter organization: "bioinformatics unit, max planck institute for immunobiology and epigenetics"[Submitter Organization]
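
The publication date example uses a start:end range in year/month form. As a small illustration, the Python sketch below builds such a range term and combines it with an organism term. The function name `publication_date_range` is hypothetical, and the output format simply mirrors the 2013/1:2013/3[Publication date] example above.

```python
def publication_date_range(start, end):
    """Build a term like '2013/1:2013/3[Publication date]'.

    start and end are (year, month) tuples; hypothetical helper for illustration.
    """
    for year, month in (start, end):
        if not 1 <= month <= 12:
            raise ValueError(f"month out of range: {year}/{month}")
    if start > end:
        raise ValueError("start date must not be later than end date")
    (y1, m1), (y2, m2) = start, end
    return f"{y1}/{m1}:{y2}/{m2}[Publication date]"


# Mouse BioSamples published in the first quarter of 2013.
term = " AND ".join([
    "Mus musculus[organism]",
    publication_date_range((2013, 1), (2013, 3)),
])
print(term)
```

The resulting string can be pasted into the BioSample search box as is.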


Example: To find BioSamples from Lactobacillus acidophilus bacteria for which SRA data is available:

  1. Enter Lactobacillus acidophilus[organism] AND "biosample sra"[filter] in the BioSample search box and click Search. Note the links to BioProject and Sequence Read Archive (SRA) for each result.
  2. To limit to BioSamples from the USFDA, add AND USFDA[Submitter Organization] to the search
  3. For the first result, click on the title link to see the full BioSample record
  4. Under Related information in the right-hand discovery menu of the BioSample record, click on BioProject. This will show you the BioProjects with which this BioSample is associated: the US Food and Drug Administration’s “Live Microbial Ingredients Survey” and the “Refseq Prokaryotic Genome Annotation Project.” Click on the "Live Microbial Ingredients Survey" link.
  5. Under Project Data, note the links to SRA experiments, other BioSample, Assembly, Nucleotide, and Protein records, and links to publications in PubMed and PMC.
  6. The Navigate Across box on the right links to other BioProjects that include this organism.
  7. The See Genome information link leads to the record for Lactobacillus acidophilus in the Genome database. The Genome record links to the reference genome, all sequenced genomes for this organism, publications related to the genome, and a dendrogram of the sequenced strains.
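
The search and the BioSample-to-BioProject hop in this example can be expressed as request URLs for the NCBI E-utilities. The Python sketch below only builds the strings and sends nothing. The function names are hypothetical, and the UID `12345` is a placeholder, not a real BioSample record; in practice it would come from the search results.

```python
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def and_terms(*clauses):
    """Join search clauses with AND, as typed in the search box."""
    return " AND ".join(clauses)


def elink_url(dbfrom, db, uid):
    """Build an ELink URL that follows links from one database to another."""
    return f"{EUTILS}/elink.fcgi?{urlencode({'dbfrom': dbfrom, 'db': db, 'id': uid})}"


# Step 1: L. acidophilus BioSamples that have SRA data.
base = and_terms("Lactobacillus acidophilus[organism]", '"biosample sra"[filter]')

# Step 2: narrow the same search to one submitter.
narrowed = and_terms(base, "USFDA[Submitter Organization]")

# Step 4: from one BioSample (placeholder UID) to its associated BioProjects.
to_projects = elink_url("biosample", "bioproject", "12345")

print(base)
print(narrowed)
print(to_projects)
```

Building the narrowed query from the base query mirrors how the walkthrough refines a search: each added clause can only shrink the result set.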

NCBI tools are redundant and interlinked: you can get to the same information in multiple ways.