To illustrate, we find out what we can about the gene that controls lactose digestion in humans.
We start with a simple PubMed search for information on lactose intolerance.
1. Enter the term in the PubMed search box and click Search.
NCBI returns a list of PubMed articles about lactose intolerance.
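The same PubMed search can also be run programmatically through NCBI's E-utilities web service. The sketch below assumes Python 3 and its standard library. It only builds the ESearch request URL. `esearch_url` is our own hypothetical helper, not part of any NCBI library. Opening the URL in a browser returns an XML list of matching PubMed IDs (PMIDs).

```python
from urllib.parse import urlencode

# Base address of NCBI's E-utilities service.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an ESearch URL asking `db` for the IDs of records matching `term`.

    Hypothetical helper: it only constructs the URL and does not contact NCBI.
    """
    params = {"db": db, "term": term, "retmax": retmax}
    return EUTILS + "esearch.fcgi?" + urlencode(params)


url = esearch_url("pubmed", "lactose intolerance")
print(url)
# https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=lactose+intolerance&retmax=20
```

If you script many requests, note that NCBI limits request rates: three requests per second without an API key.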
2. Next, we use the Find Related Data box at the bottom of the right-hand column. It returns the Gene database records for genes mentioned in the PubMed articles.
Change the database to Gene and click Find Item.
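The Find Related Data step corresponds roughly to the E-utilities ELink service. ELink follows links from records in one database (here PubMed) to another (Gene). This is a minimal sketch assuming Python 3. The helper name and the PMIDs are placeholders, not real search results.

```python
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def elink_url(ids: list, dbfrom: str = "pubmed", db: str = "gene") -> str:
    """Build an ELink URL that finds `db` records linked to the given `dbfrom` IDs.

    Hypothetical helper: it only constructs the URL and does not contact NCBI.
    Passing the IDs as one comma-separated value links them as a single set.
    """
    params = {"dbfrom": dbfrom, "db": db, "id": ",".join(str(i) for i in ids)}
    return EUTILS + "elink.fcgi?" + urlencode(params)


# Placeholder PMIDs standing in for the IDs returned by the PubMed search.
url = elink_url(["11111111", "22222222"])
print(url)
```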
The results include LCT, the human lactase gene. Make sure you pick the record for the organism you are interested in, because genes with the same name from other species can also appear in the list.
Clicking LCT [Homo sapiens (human)] takes us to the Gene database record for LCT.
Alternatively, if we already knew the gene name, we could have started with a direct search of the Gene database.
In either case, we end up at the LCT gene record.
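A direct Gene search can also be written as an ESearch query. To avoid the wrong-organism problem, the query restricts the match to one species. This sketch assumes Python 3 and the Gene database's `[gene]` (gene name) and `[orgn]` (organism) search field tags. `gene_search_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def gene_search_url(symbol: str, organism: str) -> str:
    """Build an ESearch URL for a gene name restricted to one organism.

    Hypothetical helper. The [gene] and [orgn] field tags (assumed here) limit
    the match so that same-named genes from other species are excluded.
    """
    term = f"{symbol}[gene] AND {organism}[orgn]"
    return EUTILS + "esearch.fcgi?" + urlencode({"db": "gene", "term": term})


url = gene_search_url("LCT", "Homo sapiens")
print(url)
```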
The record gives detailed information about the gene.
The full record is long, so the Table of Contents at the top of the right-hand column serves as an index into it.
The Table of Contents links to sections within the Gene record:
Genomic context: chromosomal location and exon count
Genomic regions, transcripts, products: graphical view of gene features
Bibliography: related citations in PubMed
Variation: links to variants in ClinVar and dbVar
General gene information: markers, homology, clone names, Gene Ontology
General protein information: names and accession numbers of protein products
NCBI Reference sequences: links to curated and annotated reference sequence records for the gene (accession number prefix NG), mRNA (NM) and protein (NP).
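The accession prefix alone tells you which kind of molecule a RefSeq record describes. This is a small sketch assuming Python 3. It covers only the three prefixes named above; other RefSeq prefixes exist. The accessions in the example are made up.

```python
# Accession prefixes listed in the Gene record's "NCBI Reference Sequences" section.
REFSEQ_PREFIXES = {
    "NG": "genomic region",
    "NM": "mRNA",
    "NP": "protein",
}


def refseq_molecule(accession: str) -> str:
    """Return the molecule type implied by a RefSeq accession's prefix."""
    prefix, sep, _rest = accession.partition("_")
    if not sep:
        raise ValueError(f"not a RefSeq accession: {accession!r}")
    # Prefixes not covered in this section fall through to "other".
    return REFSEQ_PREFIXES.get(prefix, "other")


for acc in ["NG_123456.1", "NM_123456.1", "NP_123456.1"]:  # made-up accessions
    print(acc, "->", refseq_molecule(acc))
```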
In the NCBI Reference Sequences section, we can click the RefSeq accession number to open the curated GenBank record for this gene in the Nucleotide database.
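The same GenBank record can also be retrieved with the EFetch service. With `db=nuccore`, `rettype=gb` and `retmode=text`, EFetch returns the plain-text GenBank flat file. This sketch assumes Python 3. `genbank_url` is a hypothetical helper, and the accession is a made-up placeholder for the one shown in the Gene record.

```python
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def genbank_url(accession: str) -> str:
    """Build an EFetch URL that returns a Nucleotide record as a GenBank flat file.

    Hypothetical helper: it only constructs the URL and does not contact NCBI.
    """
    params = {"db": "nuccore", "id": accession, "rettype": "gb", "retmode": "text"}
    return EUTILS + "efetch.fcgi?" + urlencode(params)


url = genbank_url("NG_123456.1")  # made-up accession
print(url)
```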
We now have the latest curated information about the gene sequence, as well as sequence data for its products.
A RefSeq identifier combines the accession number with a version number written after a period (accession.version), and the version increases each time the sequence is revised. Because the underlying sequence data change, it is important to search the Nucleotide database by RefSeq accession: a search on the accession without the version suffix retrieves the current version, and you always want to be using the latest sequence available.
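The accession.version convention can be handled with a few lines of code, for example to check which of several saved versions is newest. This is a sketch assuming Python 3. The helper names and accessions are hypothetical. `latest` assumes all inputs share the same base accession.

```python
import re
from typing import List, Optional, Tuple

# Two capital letters, "_", digits, then an optional ".version" suffix.
ACCESSION_RE = re.compile(r"([A-Z]{2}_\d+)(?:\.(\d+))?")


def split_accession(text: str) -> Tuple[str, Optional[int]]:
    """Split 'NM_123456.2' into ('NM_123456', 2); the version is None if absent."""
    m = ACCESSION_RE.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"not a RefSeq accession: {text!r}")
    version = int(m.group(2)) if m.group(2) else None
    return m.group(1), version


def latest(versions: List[str]) -> str:
    """Return the highest-numbered version (all inputs share one base accession)."""
    return max(versions, key=lambda v: split_accession(v)[1] or 0)


print(split_accession("NM_123456.2"))  # made-up accession
print(latest(["NM_123456.1", "NM_123456.3", "NM_123456.2"]))
```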
The LOCUS line at the top of the record also gives the total length of the sequence in base pairs (bp).
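If you save the GenBank flat file, the sequence length can be read back from the LOCUS line. This sketch assumes Python 3 and the usual layout of a nucleotide LOCUS line, in which the length is followed by "bp". The header line in the example is made up for illustration.

```python
import re
from typing import Tuple

# In a nucleotide record the LOCUS line carries the name, then the length and "bp".
LOCUS_RE = re.compile(r"LOCUS\s+(\S+)\s+(\d+)\s+bp\b")


def locus_length(genbank_text: str) -> Tuple[str, int]:
    """Return (locus name, length in bp) from the LOCUS line of a GenBank record."""
    for line in genbank_text.splitlines():
        m = LOCUS_RE.match(line)
        if m:
            return m.group(1), int(m.group(2))
    raise ValueError("no nucleotide LOCUS line found")


# Made-up record header, for illustration only.
sample = "LOCUS       NG_123456               5000 bp    DNA     linear   PRI 01-JAN-2024\n"
print(locus_length(sample))
```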
Further down, the record lists the annotated features (such as the gene, its mRNAs, and coding sequences) and, at the end, the sequence itself.
Clicking any feature highlights that feature's sequence.
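The bases that belong to each feature are set by its location string. GenBank locations are 1-based and inclusive, `join()` concatenates pieces (for example, the exons of a coding sequence), and `complement()` marks a feature on the opposite strand. The sketch below is a hypothetical helper, assuming Python 3, that handles only these simple forms. Fuzzy ends such as `<1` and references to other records are out of scope.

```python
import re

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def _reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA string (other characters are left unchanged)."""
    return seq.translate(_COMPLEMENT)[::-1]


def _split_top_level(text: str) -> list:
    """Split on commas that are not nested inside parentheses."""
    parts, current, depth = [], [], 0
    for ch in text:
        if ch == "," and depth == 0:
            parts.append("".join(current))
            current = []
            continue
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        current.append(ch)
    parts.append("".join(current))
    return parts


def feature_sequence(location: str, seq: str) -> str:
    """Return the bases covered by a simple GenBank location string."""
    location = location.strip()
    if location.startswith("complement(") and location.endswith(")"):
        inner = location[len("complement("):-1]
        return _reverse_complement(feature_sequence(inner, seq))
    if location.startswith("join(") and location.endswith(")"):
        inner = location[len("join("):-1]
        return "".join(feature_sequence(p, seq) for p in _split_top_level(inner))
    m = re.fullmatch(r"(\d+)(?:\.\.(\d+))?", location)
    if m is None:
        raise ValueError(f"unsupported location: {location!r}")
    start = int(m.group(1))
    end = int(m.group(2) or m.group(1))
    return seq[start - 1:end]  # 1-based inclusive -> Python slice


seq = "AAACCCGGGTTT"  # made-up sequence
print(feature_sequence("join(1..3,7..9)", seq))
print(feature_sequence("complement(4..6)", seq))
```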
NCBI's website has a good tutorial on the GenBank fields: NCBI Sample GenBank Record