CpG sites are regions of DNA where a cytosine nucleotide is followed by a guanine nucleotide in the linear sequence of bases along one strand, read in the 5'→3' direction. "CpG" stands for cytosine and guanine separated by a phosphate; in DNA, phosphates link adjacent nucleosides together. The "CpG" notation distinguishes a cytosine followed by a guanine on the same strand from a cytosine base-paired to a guanine on the opposite strand.
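As a minimal illustration of the definition, the Python sketch below scans a single strand, assumed to be supplied 5'→3' as a plain string, and reports every position where a C is immediately followed by a G. The function name `cpg_positions` is hypothetical.

```python
def cpg_positions(seq: str) -> list[int]:
    """Return 0-based indices of every C immediately followed by a G.

    The string is treated as one strand read 5'->3', so this finds CpG
    dinucleotides, not C:G base pairs between the two strands.
    """
    s = seq.upper()  # accept lower-case input as well
    return [i for i in range(len(s) - 1) if s[i] == "C" and s[i + 1] == "G"]


print(cpg_positions("ACGTTCGAGC"))  # -> [1, 5]
```

The "GC" at indices 8 and 9 is not reported: a guanine followed by a cytosine (GpC) is a different dinucleotide.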
Frequency in vertebrates
CpG dinucleotides have long been observed to occur much less frequently in vertebrate genomes than would be expected by chance. For example, in a genome with 42% GC content (like the human genome), cytosine and guanine each make up about 21% of the bases, so if neighbouring bases were independent, a cytosine followed by a guanine would be expected with a frequency of 0.21 × 0.21 = 0.0441, or 4.41%. The observed frequency of CpG dinucleotides in the human genome is about 1%, less than one quarter of the expected frequency. Scarano et al. proposed that this CpG deficiency arises because, in genomes where CpG cytosines are methylated, methylcytosines are more vulnerable to transition mutation (spontaneous deamination of 5-methylcytosine yields thymine, a C→T transition).
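The arithmetic above can be reproduced with a short Python sketch. It assumes, as the example does, that C and G each account for half of the GC content and that adjacent bases are independent. For an actual sequence it computes the observed/expected CpG ratio commonly used in the literature (often attributed to Gardiner-Garden and Frommer), CpG count × length / (C count × G count). The function names are hypothetical.

```python
def expected_cpg_fraction(gc_content: float) -> float:
    """Expected CpG frequency per dinucleotide position.

    Assumes C and G each make up half of the GC content and that
    neighbouring bases are independent.
    """
    p = gc_content / 2  # frequency of C, and likewise of G
    return p * p


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (CpG count * length) / (C count * G count).

    The expected CpG count is approximately C * G / length, so the ratio
    is about 1.0 when there is no CpG depletion.
    """
    s = seq.upper()
    c, g = s.count("C"), s.count("G")
    if c == 0 or g == 0:
        return 0.0  # no CpG is possible without both bases
    # "CG" cannot overlap itself, so str.count gives the full CpG count.
    return s.count("CG") * len(s) / (c * g)


expected = expected_cpg_fraction(0.42)
print(f"{expected:.2%}")         # -> 4.41%
print(f"{0.01 / expected:.2f}")  # observed 1% vs expected -> 0.23
```

The second printed value, about 0.23, is the "less than one quarter" ratio described in the text.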
Some regions of DNA have a higher concentration of CpG sites; these are known as CpG islands. Roughly half of all genes in mammalian genomes have a CpG island at or near the start of the gene. For this reason, the presence of a CpG island is used to help predict and annotate genes. The higher CpG density may be linked to the reduced cytosine methylation often observed in CpG islands: unmethylated cytosines are less vulnerable to transition mutations, so a higher equilibrium density of CpGs would survive.
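Gene-annotation tools locate CpG islands with sequence criteria. One widely cited definition, often attributed to Gardiner-Garden and Frommer, requires a stretch of at least 200 bp with GC content above 50% and an observed/expected CpG ratio above 0.6. Other definitions use different thresholds. The Python sketch below applies that rule with a simple sliding window and merges overlapping hits. The function name and default parameters are illustrative assumptions, not a reference implementation.

```python
def cpg_island_windows(seq: str, window: int = 200, step: int = 1,
                       min_gc: float = 0.5, min_obs_exp: float = 0.6):
    """Return merged (start, end) intervals, 0-based and end-exclusive,
    covering every window that passes the GC and obs/exp thresholds."""
    s = seq.upper()
    hits = []
    for start in range(0, len(s) - window + 1, step):
        w = s[start:start + window]
        c, g = w.count("C"), w.count("G")
        if c == 0 or g == 0:
            continue  # obs/exp is undefined without both bases
        gc = (c + g) / window
        obs_exp = w.count("CG") * window / (c * g)
        if gc > min_gc and obs_exp > min_obs_exp:
            hits.append((start, start + window))

    # Merge overlapping or touching windows into single islands.
    merged = []
    for start, end in hits:
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# A CpG-rich block (positions 300-599) flanked by CpG-free sequence.
demo = "A" * 300 + "CG" * 150 + "A" * 300
print(cpg_island_windows(demo))
```

Because windows that only partly overlap the CpG-rich block can still pass the thresholds, the reported island extends somewhat past the block itself; real tools typically add trimming or stricter criteria.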
Methylation, silencing, and cancer
Methylation of CpG sites within gene promoters can silence those genes, a feature found in a number of human cancers (for example, the silencing of tumour suppressor genes). Conversely, hypomethylation of CpG sites has been associated with the overexpression of oncogenes in cancer cells.