BioProject
A BioProject is a collection of biological data related to a large-scale research effort, such as genome and transcriptome sequencing, epigenomic analyses, genome-wide association studies (GWAS), and variation analyses.
A BioProject results in high-volume submissions to NCBI's primary databases. A BioProject record gives users a single place to find links to the diverse data types generated for that project.
The NCBI Handbook (2nd edition): BioProject. Describes the scope, history, data, access, report formats, and related tools for BioProject.
BioProject can be searched directly by keyword or field:
| Find BioProjects by | Search text example(s) |
| --- | --- |
| Project data type | "metagenome"[Project Data Type] |
| Publication information | 19643200[PMID] |
| Material used | "material transcriptome"[Properties] |
| Sample scope | "scope environment"[Properties] |
| Species name | Escherichia coli[organism] |
| Submitter organization, consortium, or center | JGI[Submitter Organization] |
| Taxonomic class | Insecta[organism] |
| BioProject database identifier | PRJNA33823[bioproject] or 33823[uid] or 33823[bioproject] |
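The field-tagged searches in the table can also be built programmatically. The following is a minimal sketch, assuming NCBI's E-utilities `esearch` endpoint and only the Python standard library. The helper names `tagged_term` and `esearch_url` are illustrative and not part of any NCBI library, and the code only constructs the request URL rather than sending it.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities esearch endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def tagged_term(text, field, quote=False):
    """Return an Entrez search term such as 'JGI[Submitter Organization]'.

    Hypothetical helper: it simply appends the field tag in square brackets,
    optionally wrapping the search text in double quotes.
    """
    value = f'"{text}"' if quote else text
    return f"{value}[{field}]"


def esearch_url(term, db="bioproject", retmax=20):
    """Build an esearch URL for the term; no network request is made here."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH}?{urlencode(params)}"


# Reproduce the first row of the table above.
term = tagged_term("metagenome", "Project Data Type", quote=True)
print(term)
print(esearch_url(term))
```

The resulting URL can be opened in a browser or passed to an HTTP client of your choice. The response lists the numeric UIDs of matching BioProject records.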
Example: To find the BioProject record for the 2014 metagenomic survey of the New York City subway system:
- Enter "New York City" AND subway in the BioProject search box and click Search
- Note filters on the left-hand side to narrow a search if too many results are retrieved
- Select urban metagenome (accession: PRJNA271013)
- Note the information about the project contained in the record; in particular, the links to associated publications and related resources.
- In the right-hand menu, note the links to Related information in other NCBI databases
BioProject records link reciprocally to their constituent BioSample and SRA records:
- Scroll down to Project Data: this project comprises 1457 BioSample records and 1572 SRA experiments.
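The reciprocal links between a BioProject and its BioSample and SRA records can also be requested through E-utilities. This is a rough sketch, assuming the `elink` endpoint and assuming that the numeric UID of PRJNA271013 is 271013 (by analogy with the PRJNA33823 / 33823[uid] row in the search table). `elink_url` is a hypothetical helper that only builds the URL.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities elink endpoint.
ELINK = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"


def elink_url(uid, target_db, source_db="bioproject"):
    """Build an elink URL from one record to its linked records in target_db."""
    params = {"dbfrom": source_db, "db": target_db, "id": uid}
    return f"{ELINK}?{urlencode(params)}"


# Assumed UID for the "urban metagenome" project (PRJNA271013).
for db in ("biosample", "sra"):
    print(elink_url(271013, db))
```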
BioSample
The BioSample database contains descriptions of biological source materials used in experimental assays.
Typical examples of BioSamples include cell lines, primary tissue biopsies, individual organisms or environmental isolates.
The NCBI Handbook (2nd edition, 2013): BioSample. Describes the scope, history, data, and access points for BioSample.
BioSample can be searched independently by keyword or using field tags and filters.
Types of BioSamples: https://submit.ncbi.nlm.nih.gov/biosample/template/
BioSample records can also be accessed from their associated BioProject:
Example: To find all the BioSample records from the 2014 metagenomic survey of the New York City subway system:
- From the BioProject "urban metagenome" record under Project Data, right-click on Other Datasets: BioSample: 1457 to see all of the BioSample records for this BioProject
- To see details for each sample, at top left change Summary to Full
- To download all the BioSample records, at top right click on Send to: File
- Select Format: Full (text) (or choose another format) and click Create File
If a particular sample or location is of interest, use the geolocation information in the BioSample records, or use the PathoMap Website linked from the BioProject record.
On the PathoMap Website, select both Subway Lines and Data Points under Reference in the right-hand menu, and the organism of interest in the left-hand menu. Clicking on the sample location on the map will provide information about the sample.
NCBI tools are redundant and interlinked -- you can get to the same information in multiple ways.
Sequence Read Archive (SRA)
NIH's primary archive of high-throughput sequencing data.
SRA is part of the International Nucleotide Sequence Database Collaboration (INSDC), which comprises the NCBI Sequence Read Archive (SRA), the European Bioinformatics Institute (EBI), and the DNA Data Bank of Japan (DDBJ). Data submitted to any of the three organizations are shared among them.
SRA Help. How to submit, search, and download SRA data and work with it on Google Cloud Platform (GCP) and Amazon Web Services (AWS).
SRA can be searched independently. SRA records associated with a specific BioProject or BioSample are also linked from those records.
Each SRA record is given a unique accession number based on the source database (SRA, European Bioinformatics Institute (EBI), or DNA Data Bank of Japan (DDBJ)), and the type of record (Study, Sample, Experiment, Run):
- Study (e.g., the SRA record associated with a specific BioProject): SRP#, ERP#, or DRP#
- Sample (e.g., the SRA record associated with a specific BioSample): SRS#, ERS#, or DRS#
- Experiment (e.g., the SRA record for a specific experiment or run(s)): SRX#, ERX#, or DRX#
- Run (e.g., the SRA record for a specific run): SRR#, ERR#, or DRR#
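The prefix scheme above can be decoded mechanically: the first letter gives the source database and the third letter gives the record type. The sketch below is illustrative only. `classify_accession` is a hypothetical helper, and it assumes accessions consist of exactly a three-letter prefix followed by digits.

```python
import re

# First letter: source database. Third letter: record type.
SOURCES = {"S": "NCBI SRA", "E": "EBI", "D": "DDBJ"}
RECORD_TYPES = {"P": "Study", "S": "Sample", "X": "Experiment", "R": "Run"}

# Assumed shape: [S|E|D] + "R" + [P|S|X|R] + one or more digits.
_ACCESSION = re.compile(r"^([SED])R([PSXR])(\d+)$")


def classify_accession(accession):
    """Return (source database, record type) for an SRA-style accession."""
    match = _ACCESSION.match(accession.strip().upper())
    if match is None:
        raise ValueError(f"not a recognized SRA accession: {accession!r}")
    source, kind, _digits = match.groups()
    return SOURCES[source], RECORD_TYPES[kind]


print(classify_accession("SRX836091"))   # ('NCBI SRA', 'Experiment')
print(classify_accession("SRR1748784"))  # ('NCBI SRA', 'Run')
```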
Example: To find all the SRA records and sequence data from the 2014 metagenomic survey of the New York City subway system:
- From the BioProject "urban metagenome" record under Project Data, right-click on Sequence Data: 1572. This will display the records for all 1572 SRA experiments for this BioProject.
To see the details for each experiment including the sequence data, click on its title.
Example: Shotgun sequencing of environmental sample: Sample P00189 (Accession SRX836091)
- Right-click on the title of the third record, Shotgun sequencing of environmental sample: Sample P00189
- Note the information about this experiment: the instrument used, links to the BioProject, BioSample, and SRA Study records, spot descriptor, and run information.
- In the Run table under Runs, right-click on the run accession number SRR1748784
- The Metadata tab of the Run browser shows information about the run.
- The Analysis tab shows a taxonomy of organisms identified from the sequence data.
- The Reads tab shows the sequence data.
- The Data Access tab provides links to access the sequence data.
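The sequence data for a run can also be retrieved from the command line instead of through the Run browser. The sketch below assumes the SRA Toolkit is installed and that its `prefetch` and `fasterq-dump` programs are on the PATH. The function names and the `fastq` output directory are illustrative choices, not NCBI conventions.

```python
import shutil
import subprocess


def download_commands(run_accession, outdir="fastq"):
    """Return the SRA Toolkit commands to fetch a run and convert it to FASTQ."""
    return [
        ["prefetch", run_accession],
        ["fasterq-dump", run_accession, "--outdir", outdir],
    ]


def download_run(run_accession, outdir="fastq"):
    """Run the commands in order; requires the SRA Toolkit to be installed."""
    for cmd in download_commands(run_accession, outdir):
        if shutil.which(cmd[0]) is None:
            raise RuntimeError(f"{cmd[0]} not found; install the SRA Toolkit first")
        subprocess.run(cmd, check=True)


# Show the commands for the run used in the example above without executing them.
for cmd in download_commands("SRR1748784"):
    print(" ".join(cmd))
```

Calling `download_run("SRR1748784")` would execute both steps. Run data can be large, so check available disk space first.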