NCBI Bioinformatics Resources: An Introduction: Find sequences

From protein to sequence

Objective: Starting with an organism and a protein, find the protein sequence and the gene's coding region.

Example: Find the protein sequence and gene coding region for the pathogenicity factor listeriolysin O from the bacterium Listeria monocytogenes.

Searching for gene and protein information

Begin the search in the Gene database, because it has less redundancy than the Protein database (the same search in Protein retrieves over 700 records).

Search: Listeria monocytogenes[orgn] AND listeriolysin O[protein name]

(see Searchable fields in Gene)
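The same search can also be run programmatically. The following is a minimal sketch, assuming NCBI's public E-utilities esearch endpoint is used; the function name build_gene_search_url and the retmax default are illustrative choices, not part of this guide. It only builds the request URL from the query shown above and does not contact NCBI.

```python
from urllib.parse import urlencode

# Public E-utilities endpoint for searching Entrez databases.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_gene_search_url(organism, protein_name, retmax=20):
    """Build an esearch URL for the Gene database (hypothetical helper).

    The field tags [orgn] and [protein name] are copied from the
    search string used in this guide.
    """
    term = f"{organism}[orgn] AND {protein_name}[protein name]"
    params = {"db": "gene", "term": term, "retmode": "json", "retmax": retmax}
    return f"{ESEARCH_URL}?{urlencode(params)}"


url = build_gene_search_url("Listeria monocytogenes", "listeriolysin O")
print(url)
```

Opening the printed URL in a browser, or fetching it with any HTTP client, should return the matching Gene record IDs; NCBI asks automated clients to keep request rates low.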

Gene search

Two records for gene symbol hly are retrieved. We will examine the second record, which is associated with an NC_ accession number (specifying a complete genomic molecule that is usually a reference assembly; see RefSeq accession numbers and molecule types).

To find the gene coding sequence, look at the Genomic regions, transcripts, and products section or the NCBI Reference Sequences (RefSeq) section of the Gene record:


Clicking on the GenBank link displays the GenBank record in the Nucleotide database. The coding sequence for the gene hly can be found under CDS in the Features section of the record (outlined in red):

The GenBank record for this gene also shows its location on the chromosome and the translated protein sequence (outlined in blue). The protein sequence can also be found by clicking on the protein accession number in the Nucleotide record or in the RefSeq section of the Gene record.

Sample GenBank record
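To show where the CDS location and translated protein sit in a GenBank flat file, here is a minimal sketch that reads the Features section of a record held as text. The record below is an invented toy example (TOY0001, gene toyA), not the real hly record, and extract_cds is a hypothetical helper. It assumes single-line locations and joins wrapped qualifier lines without spaces, which suits /translation but not free-text notes. For real work, an established parser such as Biopython's is a safer choice.

```python
import re

# Invented toy record for illustration only; NOT the real hly entry.
TOY_RECORD = '''LOCUS       TOY0001                   60 bp    DNA     linear   BCT 01-JAN-2000
FEATURES             Location/Qualifiers
     source          1..60
                     /organism="Example organism"
     gene            10..45
                     /gene="toyA"
     CDS             10..45
                     /gene="toyA"
                     /translation="MKTAYI
                     AKQRL"
ORIGIN
//
'''


def extract_cds(genbank_text):
    """Return one dict (location plus qualifiers) per CDS feature."""
    features = []
    current = None    # dict for the CDS being read, or None
    last_key = None   # qualifier that a wrapped line would extend
    in_features = False
    for line in genbank_text.splitlines():
        if line.startswith("FEATURES"):
            in_features = True
            continue
        if not in_features:
            continue
        if line and not line.startswith(" "):
            break  # a new top-level section (e.g. ORIGIN) ends the table
        key_match = re.match(r"^ {5}(\S+)\s+(\S+)", line)
        if key_match:
            # A feature key in column 6 starts a new feature.
            current, last_key = None, None
            if key_match.group(1) == "CDS":
                current = {"location": key_match.group(2)}
                features.append(current)
            continue
        if current is None:
            continue  # qualifier of a non-CDS feature
        text = line.strip()
        qualifier = re.match(r"^/(\w+)=(.*)$", text)
        if qualifier:
            last_key = qualifier.group(1)
            current[last_key] = qualifier.group(2).strip('"')
        elif last_key is not None and text:
            current[last_key] += text.strip('"')  # wrapped qualifier value
    return features


cds_features = extract_cds(TOY_RECORD)
for cds in cds_features:
    print(cds["location"], cds["gene"], cds["translation"])
```

The location (here 10..45) gives the coding region's coordinates on the sequence, and the /translation qualifier holds the protein sequence, matching the two parts of the record described above.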