Introduction to NCBI Bioinformatics Resources: Gene Search

A concise introduction to the various bioinformatic data available from NCBI.

Searching for Gene Information

NCBI Gene


We illustrate by finding what we can about the gene that controls lactose digestion in people.

We start with a simple search in PubMed for information on Lactose Intolerance.

1. Search PubMed with our term and click Search.

     NCBI returns various PubMed articles that deal with lactose intolerance. 

2. We then use the Find Related Data box at the bottom of the right-hand column. This returns the records in the Gene database for genes mentioned in the PubMed articles.

     Change the database to Gene and click Find Item.
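The two steps above can also be sketched programmatically with NCBI's E-utilities, which expose an ELink endpoint for following links from one database to another. The sketch below only builds the request URL and does not contact NCBI. The helper name `build_pubmed_to_gene_link_url` and the PubMed IDs are hypothetical placeholders, not values taken from this guide.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_pubmed_to_gene_link_url(pubmed_ids):
    """Build an ELink URL asking for Gene records linked to PubMed articles.

    pubmed_ids: iterable of PubMed IDs (placeholders in this example).
    """
    params = {
        "dbfrom": "pubmed",  # database we are starting from
        "db": "gene",        # database we want related records in
        "id": ",".join(str(pmid) for pmid in pubmed_ids),
    }
    return f"{EUTILS_BASE}/elink.fcgi?{urlencode(params)}"


# Placeholder PubMed IDs, for illustration only.
url = build_pubmed_to_gene_link_url([12345678, 23456789])
print(url)
```

Opening the printed URL in a browser should return an XML document listing linked Gene IDs, which mirrors what Find Related Data does on the web page.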

 

The resulting page includes the gene for lactase in humans, LCT. Be careful to pick the correct record by making sure that it is for the organism that you are interested in.

Clicking LCT [Homo sapiens (human)] brings us to the Gene database record for LCT:


Alternatively, if we already knew the gene name, we could have started with a direct search of the Gene database:
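As a rough programmatic counterpart to the direct search, the sketch below builds an ESearch request against the Gene database. It assumes the `[Gene Name]` and `[Organism]` field tags are appropriate for restricting the query. The function name `build_gene_search_url` is a hypothetical helper, and the code only constructs the URL without sending it.

```python
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_gene_search_url(symbol, organism):
    """Build an ESearch URL for a gene symbol restricted to one organism."""
    # Field tags in square brackets limit each term to one search field.
    term = f"{symbol}[Gene Name] AND {organism}[Organism]"
    params = {"db": "gene", "term": term, "retmode": "json"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


search_url = build_gene_search_url("LCT", "Homo sapiens")
print(search_url)
```

Restricting by organism in the query serves the same purpose as checking the organism name in the results list on the web page.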


In either case, we will display the LCT gene record:

 

We now have detailed information about the gene, including:

  • Gene ID
  • Official Symbol and Full Name
  • Organism
  • Other names used for this gene
  • Other organisms that have this gene
  • Summary of the role this gene plays

The full record is fairly large, so the Table of Contents at the top of the right-hand column can be used as an index into the complete record.

The Table of Contents links to sections within the Gene record:

Genomic context: chromosomal location and Exon count
Genomic regions, transcripts, products: graphical view of gene features
Bibliography: related citations in PubMed
Variation: links to variants in ClinVar, dbVar
General gene information: markers, homology, clone names, gene ontology
General protein information: names and accession numbers of protein products
NCBI Reference sequences: links to curated and annotated reference sequence records for the gene (accession number prefix NG), mRNA (NM) and protein (NP).

  • NG accession number links to the GenBank record, FASTA sequence, and Sequence viewer in the Nucleotide database.
  • NM accession number links to the mRNA record in the Nucleotide database.
  • NP accession number links to the protein record in the Protein database.
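The prefix convention described above can be captured in a small lookup. This is a minimal sketch covering only the three prefixes named in this guide (NG, NM, NP). The function name `classify_refseq` and the accession strings used in the example are made-up placeholders.

```python
# Prefix -> (type of record, NCBI database that holds it), per the list above.
REFSEQ_PREFIXES = {
    "NG": ("gene (genomic region)", "Nucleotide"),
    "NM": ("mRNA", "Nucleotide"),
    "NP": ("protein", "Protein"),
}


def classify_refseq(accession):
    """Return (record type, database) for an NG_, NM_ or NP_ accession."""
    prefix, underscore, _rest = accession.partition("_")
    if not underscore or prefix not in REFSEQ_PREFIXES:
        raise ValueError(f"Unrecognized RefSeq prefix in {accession!r}")
    return REFSEQ_PREFIXES[prefix]


# Placeholder accessions, for illustration only.
for acc in ["NG_000000.1", "NM_000000.1", "NP_000000.1"]:
    record_type, database = classify_refseq(acc)
    print(f"{acc}: {record_type} record in the {database} database")
```

RefSeq uses other prefixes as well. They are deliberately left out here because this guide does not discuss them.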

In the NCBI Reference Sequences section, we can click the RefSeq accession number to see the curated Nucleotide GenBank record for this gene.

GenBank record:

We now have the latest curated, detailed information about the gene sequence, as well as sequence data for its products.

The RefSeq identifier combines the accession number with the version number of the sequence data. It is important to use the RefSeq ID when searching the Nucleotide database because the underlying sequence data changes over time. You always want to be using the latest sequence available.
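To make the accession-plus-version structure concrete, the sketch below splits an identifier at its final dot. The helper name `split_accession_version` and the identifier `NM_000000.3` are hypothetical examples rather than the actual LCT record.

```python
def split_accession_version(refseq_id):
    """Split an 'ACCESSION.VERSION' identifier into (accession, version)."""
    accession, dot, version = refseq_id.rpartition(".")
    if not dot or not accession or not version.isdigit():
        raise ValueError(f"Expected ACCESSION.VERSION, got {refseq_id!r}")
    return accession, int(version)


# Made-up identifier used only to show the format.
accession, version = split_accession_version("NM_000000.3")
print(accession, version)
```

Comparing the integer versions of two identifiers that share an accession is one simple way to tell which sequence revision is newer.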

The total length of the sequence in base pairs (bp) is also shown.

Various sequence data can be viewed further down in this record:

 

Clicking on any of the features will highlight the feature's sequence:

  • mRNA - connects the individual exons into the resulting messenger RNA
  • exon - highlights the sequence of that exon
  • CDS - highlights the coding sequence for the protein product of this gene
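The relationship between these features can be illustrated with a toy example: an mRNA or CDS feature is a set of coordinate ranges on the genomic sequence, and joining those ranges yields the feature's sequence. The sequence and coordinates below are invented for illustration and are not taken from the LCT record. Positions are assumed to be 1-based and inclusive, as in GenBank feature locations.

```python
def extract_feature(sequence, segments):
    """Join 1-based, inclusive (start, end) segments of a sequence."""
    return "".join(sequence[start - 1:end] for start, end in segments)


# Toy genomic sequence: flanking and intron bases are lowercase.
genomic = "cc" + "GAATGGCC" + "gtaaag" + "TTTTAACT" + "gg"

# Invented feature locations on the toy sequence.
exons = [(3, 10), (17, 24)]  # two exons separated by an intron
cds = [(5, 10), (17, 22)]    # coding part only, from start to stop codon

mrna_sequence = extract_feature(genomic, exons)
cds_sequence = extract_feature(genomic, cds)

print("mRNA:", mrna_sequence)
print("CDS: ", cds_sequence)
```

In this toy case the CDS is shorter than the mRNA because the first two and last two exon bases are treated as untranslated.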

 

NCBI provides a very good guide to the GenBank fields on its website: NCBI Sample GenBank Record