GC box gene transcriptions

Editor-In-Chief: Henry A. Hoff

A GC box is also known as a GSG box.^[1]

A "GC box is a distinct pattern of nucleotides found in the promoter region of some eukaryotic genes upstream of the TATA box and approximately 110 bases upstream from the transcription initiation site. It has a consensus sequence GGGCGG which is position dependent and orientation independent. The GC elements are bound by transcription factors and have similar functions to enhancers."^[2]

Boxes

A "repeating sequence of nucleotides that forms a transcription or a regulatory signal"^[3] is a box.

GC box theory

Def. "[a] sequence of contiguous guanine, guanine, guanine, cytosine, and guanine, in that order, along a DNA strand"^[4] is called a GC box.

GC elements

The GC elements are bound by transcription factors and have similar functions to enhancers.^[5]

Alu repeats

Karyotype from a female human lymphocyte (46, XX). Chromosomes were hybridized with a probe for Alu elements (green) and counterstained with TOPRO-3 (red). Alu elements were used as a marker for chromosomes and chromosome bands rich in genes. Credit: Andreas Bolzer, Gregor Kreth, Irina Solovei, Daniela Koehler, Kaan Saracoglu, Christine Fauth, Stefan Müller, Roland Eils, Christoph Cremer, Michael R. Speicher, Thomas Cremer.

"GC-rich genomic sequences [include those] such as Alu repeats."^[6]

"An Alu element is a short stretch of DNA originally characterized by the action of the Alu (Arthrobacter luteus) restriction endonuclease.^[7] Alu elements of different kinds occur in large numbers in primate genomes. In fact, Alu elements are the most abundant transposable elements in the human genome."^[8]

"The Alu family is a family of repetitive elements in the human genome. Modern Alu elements are about 300 base pairs long and are therefore classified as short interspersed elements (SINEs) among the class of repetitive DNA elements. The typical structure is 5'Part A- A5TACA6 -Part B - PolyA Tail - 3', where Part A and Part B are similar [nucleotide] sequences, but of opposite direction."^[8]

There are over one million Alu elements interspersed throughout the human genome, and it is estimated that about 10.7% of the human genome consists of Alu sequences. However, fewer than 0.5% are polymorphic.^[9]

Alu elements are retrotransposons and resemble DNA copies of RNAs transcribed by RNA polymerase III. Alu elements do not encode protein products and depend on LINE retrotransposons for their replication.^[10]

"Alu elements in primates form a fossil record that is relatively easy to decipher because Alu elements insertion events have a characteristic signature that is both easy to read and faithfully recorded in the genome from generation to generation. The study of Alu elements thus reveals details of ancestry because individuals will only share a particular Alu element insertion if they have a common ancestor."^[8]

Most human Alu element insertions can be found in the corresponding positions in the genomes of other primates, but about 7,000 Alu insertions are unique to humans.^[11]

"Full-length Alu elements are ~300 bp long and are commonly found in introns, 3′ untranslated regions of genes and intergenic genomic regions".^[12] Human subfamilies include Y, Yc1, Yc2, Ya5, Ya5a2, Yb8, and Yb9.^[12] A source of simple sequence repeats is an A-rich region "that contains the sequence A₅TACA₆".^[12]

"[T]here are ~24 CpG positions in a new Alu insertion ... the decay of methylated CpG dinucleotides into TpG dinucleotides would also tend to increase the pair-wise divergence between Alu repeats over time, thereby decreasing the recombination between elements."^[12]

CpG sites

"CpG sites or CG sites are regions of DNA where a cytosine nucleotide occurs next to a guanine nucleotide in the linear sequence of bases along its length. "CpG" is shorthand for "—C—phosphate—G—", that is, cytosine and guanine separated by only one phosphate; phosphate links any two nucleosides together in DNA. The "CpG" notation is used to distinguish this linear sequence from the CG base-pairing of cytosine and guanine. The CpG notation can also be interpreted as the cytosine being 5 prime to the guanine base."^[13] "The "p" in CpG refers to the phosphodiester bond between the cytosine and the guanine, which indicates that the C and the G are next to each other in sequence, regardless of being single- or double- stranded. In a CpG site, both C and G are found on the same strand of DNA or RNA and are connected by a phosphodiester bond. This is a covalent bond between atoms, stable and permanent as opposed to the three hydrogen bonds established after base-pairing of C and G in opposite strands of DNA."^[6]

CpG islands

There are regions of the genome that have a higher concentration of CpG sites, known as CpG islands. Many genes in mammalian genomes have CpG islands associated with the start of the gene^[14] (promoter regions). Because of this, the presence of a CpG island is used to help in the prediction and annotation of genes.

"The usual formal definition of a CpG island is a region with at least 200 bp [base pairs], and a GC percentage that is greater than 50%, and with an observed-to-expected CpG ratio that is greater than 60%. The "observed-to-expected CpG ratio" is calculated by formula ((Num of CpG/(Num of C × Num of G)) × Total number of nucleotides in the sequence)."^[15]
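The three criteria quoted above can be checked with a few lines of Python. This is only a minimal sketch: the function name `cpg_island_stats` and the returned dictionary keys are illustrative assumptions, not part of any cited tool, and the thresholds are simply the ones stated in the definition.

```python
def cpg_island_stats(seq: str) -> dict:
    """Evaluate one sequence against the quoted CpG island definition.

    Hypothetical helper for illustration; thresholds follow the text:
    length >= 200 bp, GC fraction > 0.5, observed/expected CpG > 0.6.
    """
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # a C immediately followed by a G on the same strand

    gc_fraction = (c + g) / n if n else 0.0
    # (Num of CpG / (Num of C x Num of G)) x total number of nucleotides
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0

    return {
        "length": n,
        "gc_fraction": gc_fraction,
        "obs_exp_cpg": obs_exp,
        "is_island": n >= 200 and gc_fraction > 0.5 and obs_exp > 0.6,
    }


if __name__ == "__main__":
    # Toy sequences, not real genomic data.
    print(cpg_island_stats("CG" * 100))
    print(cpg_island_stats("AT" * 100))
```

In practice the test is usually applied to a sliding window along a longer sequence; the sketch evaluates a single window only.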

In mammalian genomes, CpG islands are typically 300-3,000 base pairs in length, and have been found in or near approximately 40% of promoters of mammalian genes.^[16] About 70% of human promoters have a high CpG content. Given the frequency of GC two-nucleotide sequences, the number of CpG dinucleotides is much lower than would be expected.^[17]

"CpG islands are characterized by CpG dinucleotide content of at least 60% of that which would be statistically expected (~4–6%), whereas the rest of the genome has much lower CpG frequency (~1%), a phenomenon called CG suppression. Unlike CpG sites in the coding region of a gene, in most instances the CpG sites in the CpG islands of promoters are unmethylated if the genes are expressed."^[6]

Methylation

"Cytosines in CpG dinucleotides can be methylated to form 5-methylcytosine. In mammals, methylating the cytosine within a gene can turn the gene off, a mechanism that is part of a larger field of science studying gene regulation that is called epigenetics. Enzymes that add a methyl group are called DNA methyltransferases."^[13]

In mammals, 70% to 80% of CpG cytosines are methylated.^[18]

"CpG dinucleotides have long been observed to occur with a much lower frequency in the sequence of vertebrate genomes than would be expected due to random chance. For example, in the human genome, which has a 42% GC content, a pair of nucleotides consisting of cytosine followed by guanine would be expected to occur 0.21 * 0.21 = 4.41% of the time. The frequency of CpG dinucleotides in human genomes is 1% — less than one-quarter of the expected frequency."^[13]
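The arithmetic in the quotation can be reproduced directly. The sketch below assumes, as the quotation does, that C and G each make up half of the GC content and that neighbouring bases are independent; the function name is a hypothetical one chosen for illustration.

```python
def expected_cpg_frequency(gc_content: float) -> float:
    """Expected frequency of the dinucleotide CpG under independence.

    Assumes C and G are equally frequent, so each has probability
    gc_content / 2 (an assumption taken from the quoted calculation).
    """
    p_c = gc_content / 2
    p_g = gc_content / 2
    return p_c * p_g


expected = expected_cpg_frequency(0.42)  # human genome GC content from the text
observed = 0.01                          # observed CpG frequency from the text
ratio = observed / expected

print(f"expected CpG frequency: {expected:.2%}")
print(f"observed / expected:    {ratio:.2f}")
```

The ratio comes out at roughly 0.23, consistent with the statement that the observed frequency is less than one-quarter of the expected one.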

Unmethylated CpG sites can be detected by Toll-Like Receptor 9^[19] "(TLR 9) on plasmacytoid dendritic cells and B cells in humans. This is used to detect intracellular viral, fungal, and bacterial pathogen DNA."^[13]

Methylation is central to imprinting, along with histone modifications.^[20] Most of the methylation occurs a short distance from the CpG islands (at "CpG island shores") rather than in the islands themselves.^[21]

Methylation of CpG sites within the promoters of genes can lead to their silencing, a feature found in a number of human cancers (for example the silencing of tumor suppressor genes). In contrast, the hypomethylation of CpG sites has been associated with the over-expression of oncogenes within cancer cells.^[22]

Deamination

The CpG deficiency is due to an increased vulnerability of methylcytosines to spontaneously deaminate to thymine in genomes with CpG cytosine methylation.^[23]

Mutations

Alu elements are a common source of mutation in humans, but such mutations are often confined to non-coding regions where they have little discernible impact on the bearer.^[24]

The mutagenic effect of Alu^[25] and retrotransposons in general^[26] "has played a major role in the recent evolution of the human genome."^[8]

The first report of Alu-mediated recombination causing a prevalent inherited predisposition to cancer was a 1995 report about hereditary nonpolyposis colorectal cancer.^[27]

"The human diseases caused by Alu insertions include":^[12]

The following diseases have been associated with single-nucleotide DNA variations in Alu elements impacting transcription levels:^[28]

The ACE gene, encoding angiotensin-converting enzyme, has two common variants, one with an Alu insertion (ACE-I) and one with the Alu deleted (ACE-D). This variation has been linked to changes in sporting ability: the presence of the Alu element is associated with better performance in endurance-oriented events (e.g. triathlons), whereas its absence is associated with strength- and power-oriented performance.^[29]

The opsin gene duplication which resulted in the re-gaining of trichromacy in Old World primates (including humans) is flanked by an Alu element,^[30] "implicating the role of Alu in the evolution of three colour vision."^[8]

Consensus sequences

"A GC box sequence, one of the most common regulatory DNA elements of eukaryotic genes, is recognized by the Sp1 transcription factor; its consensus sequence is represented as 5'-G/T G/A GGCG G/T G/A G/A C/T-3' [or 5′-KRGGCGKRRY-3′] (Briggs et al., 1986)."^[31]

Transcription start sites

"In promoters containing multiple GC boxes but lacking the TATAA box, transcription start sites may be single and specific, as observed in the nerve growth factor receptor gene (42) and the cellular retinol-binding protein gene (37), or there may be multiple heterogeneous start sites, such as those found in the c-myb (4), insulin receptor (45), and Ha-ras (21) genes. ... GC boxes are responsible for directing transcription from the major and the minor start sites. ... All TATAA-less promoters have at least two GC boxes".^[32]

"CpG islands typically occur at or near the transcription start site of genes, particularly housekeeping genes, in vertebrates.^[17] Normally a C (cytosine) base followed immediately by a G (guanine) base (a CpG) is rare in vertebrate DNA because the cytosines in such an arrangement tend to be methylated. This methylation helps distinguish the newly synthesized DNA strand from the parent strand, which aids in the final stages of DNA proofreading after duplication. However, over evolutionary time methylated cytosines tend to turn into thymines because of spontaneous deamination. While there is a special enzyme in human (Thymine-DNA glycosylase, or TDG) that specifically replaces T's from T/G mismatches, it is not sufficiently effective to prevent the relatively rapid mutation of the dinucleotides. The result is that CpGs are relatively rare. The existence of CpG islands is usually explained by the existence of selective forces for relatively high CpG content, or low levels of methylation in that genomic area, perhaps having to do with the regulation of gene expression. Recently a study showed that most CpG islands are a result of non-selective forces. ^[33]"^[6]

Transcription factors

"[A] GC box-binding factor is required for transcription and ... a truncated promoter containing one GC box is transcriptionally inactive (44). ... the DNA-protein interactions occurring at the GC boxes in the DHFR promoter are functionally distinct and that factors binding to the GC boxes must interact in a position-dependent manner."^[32]

Human genes

"A large subclass of polymerase II promoters lacks both TATAA and CCAAT sequence motifs but contains multiple GC boxes. This promoter class includes several housekeeping genes (e.g., the genes encoding dihydrofolate reductase [DHFR] ..., hydroxymethylglutaryl coenzyme A reductase [39], hypoxanthine guanine phosphoribosyltransferase [33], and adenosine deaminase [46]) [and] nonhousekeeping genes (e.g., the transforming growth factor alpha [9, 23], rat malic enzyme [36], human c-Ha-ras [21], epidermal growth factor receptor [22], and nerve growth factor receptor [42] genes)."^[32]

Some 12,000 SP-1 binding sites are found in the human genome.^[34]

Hypotheses

The GC box does not indicate the TSS for A1BG.
A1BG has no GC boxes in either promoter.
A1BG is not transcribed by a GC box.
A GC box does not participate in the transcription of A1BG.

GC box samplings

"A GC box [...] consensus sequence is represented as 5'-(G/T)(G/A)GGCG(G/T)(G/A)(G/A)(C/T)-3' [or 5′-KRGGCGKRRY-3′] (Briggs et al., 1986)."^[31]

The Basic programs (starting with SuccessablesGC.bas) were written to compare nucleotide sequences with the sequences on either the template strand (-) or the coding strand (+) of the DNA, in the negative direction (-) or the positive direction (+). For each combination, the list below gives the program name, the pattern it looks for, the number of matches found, and each match with its position:

negative strand in the negative direction is SuccessablesGC--.bas, looking for (G/T)(G/A)GGCG(G/T)(G/A)(G/A)(C/T), 0,
negative strand in the positive direction is SuccessablesGC-+.bas, looking for (G/T)(G/A)GGCG(G/T)(G/A)(G/A)(C/T), 0,
positive strand in the negative direction is SuccessablesGC+-.bas, looking for (G/T)(G/A)GGCG(G/T)(G/A)(G/A)(C/T), 2, TGGGCGTGGT at 1898, TGGGCGTGGT at 3048,
positive strand in the positive direction is SuccessablesGC++.bas, looking for (G/T)(G/A)GGCG(G/T)(G/A)(G/A)(C/T), 0,
complement, negative strand, negative direction is SuccessablesGCc--.bas, looking for (A/C)(C/T)CCGC(A/C)(C/T)(C/T)(A/G), 2, ACCCGCACCA at 1898, ACCCGCACCA at 3048,
complement, negative strand, positive direction is SuccessablesGCc-+.bas, looking for (A/C)(C/T)CCGC(A/C)(C/T)(C/T)(A/G), 0,
complement, positive strand, negative direction is SuccessablesGCc+-.bas, looking for (A/C)(C/T)CCGC(A/C)(C/T)(C/T)(A/G), 0,
complement, positive strand, positive direction is SuccessablesGCc++.bas, looking for (A/C)(C/T)CCGC(A/C)(C/T)(C/T)(A/G), 0,
inverse complement, negative strand, negative direction is SuccessablesGCci--.bas, looking for (A/G)(C/T)(C/T)(A/C)CGCC(C/T)(A/C), 1, ACTCCGCCCA at 3092,
inverse complement, negative strand, positive direction is SuccessablesGCci-+.bas, looking for (A/G)(C/T)(C/T)(A/C)CGCC(C/T)(A/C), 0,
inverse complement, positive strand, negative direction is SuccessablesGCci+-.bas, looking for (A/G)(C/T)(C/T)(A/C)CGCC(C/T)(A/C), 1, GCTCCGCCTC at 1505,
inverse complement, positive strand, positive direction is SuccessablesGCci++.bas, looking for (A/G)(C/T)(C/T)(A/C)CGCC(C/T)(A/C), 0,
inverse, negative strand, negative direction is SuccessablesGCi--.bas, looking for (C/T)(G/A)(G/A)(G/T)GCGG(G/A)(G/T), 1, CGAGGCGGAG at 1505,
inverse, negative strand, positive direction is SuccessablesGCi-+.bas, looking for (C/T)(G/A)(G/A)(G/T)GCGG(G/A)(G/T), 0,
inverse, positive strand, negative direction is SuccessablesGCi+-.bas, looking for (C/T)(G/A)(G/A)(G/T)GCGG(G/A)(G/T), 1, TGAGGCGGGT at 3092,
inverse, positive strand, positive direction is SuccessablesGCi++.bas, looking for (C/T)(G/A)(G/A)(G/T)GCGG(G/A)(G/T), 0.
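The four search patterns in the list (direct, complement, inverse, inverse complement) can all be derived from the single consensus KRGGCGKRRY. The following Python sketch is not the Basic code itself, which is not reproduced on this page; it is an assumed, minimal equivalent using regular expressions. The names `variants` and `find_motif` are hypothetical, and positions are reported as 1-based start offsets, which may not match the numbering convention used by the Basic programs.

```python
import re

# IUPAC codes needed for the GC box consensus and its derived patterns.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "K": "[GT]", "R": "[AG]", "Y": "[CT]", "M": "[AC]",
}
# Complement of each code: K (G/T) <-> M (A/C), R (A/G) <-> Y (C/T).
COMPLEMENT = str.maketrans("ACGTKRYM", "TGCAMYRK")


def variants(consensus: str) -> dict:
    """Return the four patterns searched for in the sampling list."""
    comp = consensus.translate(COMPLEMENT)
    return {
        "direct": consensus,
        "complement": comp,
        "inverse": consensus[::-1],
        "inverse complement": comp[::-1],
    }


def find_motif(consensus: str, sequence: str) -> list:
    """List (1-based start, matched text) for every match, overlaps included."""
    body = "".join(IUPAC[code] for code in consensus)
    pattern = re.compile(f"(?=({body}))")  # lookahead so overlapping hits are kept
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    gc_box = variants("KRGGCGKRRY")
    # Toy sequence built around one of the matches reported above.
    toy = "AATGGGCGTGGTAA"
    for name, pat in gc_box.items():
        print(name, pat, find_motif(pat, toy))
```

Applying each of the four patterns to each strand and direction of a real promoter sequence would give the sixteen cases in the list; the toy sequence here only demonstrates the matching step.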

Acknowledgement examples

The content on this page was first contributed by: Henry A. Hoff.

Initial content for this page in some instances came from Wikiversity.

References

[1] Lundin, M.; Nehlin, J. O.; Ronne, H. (1994-03-01). "Importance of a flanking AT-rich region in target site recognition by the GC box-binding zinc finger protein MIG1". Molecular and Cellular Biology. 14 (3): 1979–1985. doi:10.1128/MCB.14.3.1979. PMID 8114729.
[2] Zuperbri (10 June 2012). "GC box". San Francisco, California: Wikimedia Foundation, Inc. Retrieved 2013-06-15.
[3] 74.100.224.95 (10 January 2010). "Box (disambiguation)". San Francisco, California: Wikimedia Foundation, Inc. Retrieved 2013-06-15.
[4] "GC box". San Francisco, California: Wikimedia Foundation, Inc. November 10, 2012. Retrieved 2013-01-27.
[5] Klug WS, Cummings MR, Spencer CA, Palladino MA (2009). Concepts of Genetics: Ninth Edition. San Francisco: Pearson Benjamin Cummings. pp. 463–464. ISBN 978-0-321-54098-0.
[6] "CpG island". San Francisco, California: Wikimedia Foundation, Inc. October 2, 2012. Retrieved 2013-02-07.
[7] Schmid CW, Deininger PL (1975). "Sequence organization of the human genome". Cell. 6: 345–358. doi:10.1016/0092-8674(75)90184-1. PMID 1052772.
[8] "Alu element". San Francisco, California: Wikimedia Foundation, Inc. February 6, 2013. Retrieved 2013-02-07.
[9] Roy-Engel AM, Carroll ML, Vogel E; et al. (September 2001). "Alu insertion polymorphisms for the study of human genomic diversity". Genetics. 159 (1): 279–90. PMID 11560904.
[10] Kramerov DA, Vassetzky NS (2005). "Short retroposons in eukaryotic genomes" (PDF). Int. Rev. Cytol. 247: 165–221. doi:10.1016/S0074-7696(05)47004-7. PMID 16344113.
[11] The Chimpanzee Sequencing and Analysis Consortium (September 2005). "Initial sequence of the chimpanzee genome and comparison with the human genome". Nature. 437 (7055): 69–87. doi:10.1038/nature04072. PMID 16136131.
[12] Batzer MA, Deininger PL (May 2002). "Alu repeats and human genomic diversity" (PDF). Nat. Rev. Genet. 3 (5): 370–9. doi:10.1038/nrg798. PMID 11988762.
[13] "CpG site". San Francisco, California: Wikimedia Foundation, Inc. January 30, 2013. Retrieved 2013-02-07.
[14] Hartl DL, Jones EW (2005). Genetics: Analysis of Genes and Genomes (6 ed.). Mississauga: Jones & Bartlett, Canada. p. 477. ISBN 0-7637-1511-5.
[15] Gardiner-Garden M, Frommer M (1987). "CpG islands in vertebrate genomes". Journal of Molecular Biology. 196 (2): 261–82. doi:10.1016/0022-2836(87)90689-9. PMID 3656447.
[16] Fatemi M, Pao MM, Jeong S, Gal-Yam EN, Egger G, Weisenberger DJ, Jones PA (2005). "Footprinting of mammalian promoters: use of a CpG DNA methyltransferase revealing nucleosome positions at a single molecule level". Nucleic Acids Res. 33 (20): e176. doi:10.1093/nar/gni180. PMC 1292996. PMID 16314307.
[17] Saxonov S, Berg P, Brutlag DL (2006). "A genome-wide analysis of CpG dinucleotides in the human genome distinguishes two distinct classes of promoters". Proc Natl Acad Sci USA. 103 (5): 1412–7. doi:10.1073/pnas.0510310103. PMC 1345710. PMID 16432200.
[18] Jabbari K, Bernardi G (May 2004). "Cytosine methylation and CpG, TpG (CpA) and TpA frequencies". Gene. 333: 143–9. doi:10.1016/j.gene.2004.02.043. PMID 15177689.
[19] Ramirez-Ortiz ZG, Specht CA, Wang JP, Lee CK, Bartholomeu DC, Gazzinelli RT, Levitz SM (2008). "Toll-like receptor 9-dependent immune activation by unmethylated CpG motifs in Aspergillus fumigatus DNA". Infect Immun. 76 (5): 2123–9. doi:10.1128/IAI.00047-08. PMID 18332208.
[20] Feil R, Berger F (2007). "Convergent evolution of genomic imprinting in plants and mammals". Trends Genet. 23 (4): 192–9. doi:10.1016/j.tig.2007.02.004. PMID 17316885.
[21] Irizarry RA, Ladd-Acosta C, Wen B, Wu Z, Montano C, Onyango P, Cui H, Gabo K, Rongione M, Webster M, Ji H, Potash JB, Sabunciyan S, Feinberg AP (2009). "The human colon cancer methylome shows similar hypo- and hypermethylation at conserved tissue-specific CpG island shores". Nature Genetics. 41 (2): 178–86. PMID 19151715.
[22] Jones PA, Laird PW (February 1999). "Cancer epigenetics comes of age". Nat. Genet. 21 (2): 163–7. doi:10.1038/5947. PMID 9988266.
[23] Scarano E, Iaccarino M, Grippo P, Parisi E (1967). "The heterogeneity of thymine methyl group origin in DNA pyrimidine isostichs of developing sea urchin embryos". Proceedings of the National Academy of Sciences USA. 57 (5): 1394–400. doi:10.1073/pnas.57.5.1394. PMC 224485. PMID 5231746.
[24] International Human Genome Sequencing Consortium (2001). "Initial sequencing and analysis of the human genome". Nature. 409 (6822): 860–921. doi:10.1038/35057062. PMID 11237011.
[25] Shen S, Lin L, Cai JJ, Jiang P, Kenkel EJ, Stroik MR, Sato S, Davidson BL, Xing Y (2011). "Widespread establishment and regulatory impact of Alu exons in human genes". PNAS. 108 (7): 2837–42. doi:10.1073/pnas.1012834108.
[26] Cordaux R, Batzer MA (2009). "The impact of retrotransposons on human genome evolution" (PDF). Nature Reviews Genetics. 10: 691–703. doi:10.1038/nrg2640. PMID 19763152.
[27] Nyström-Lahti M, Kristo P, Nicolaides NC; et al. (November 1995). "Founding mutations and Alu-mediated recombination in hereditary colon cancer". Nat. Med. 1 (11): 1203–6. doi:10.1038/nm1195-1203. PMID 7584997.
[28] "SNPedia: SNP in the promoter region of the myeloperoxidase MPO gene".
[29] Puthucheary Z, Skipworth J, Rawal J, Loosemore M, Van Someren K, Montgomery H (2011). "The ACE Gene and Human Performance: 12 Years On". Sports Medicine. 41: 433–448. doi:10.2165/11588720-000000000-00000. PMID 21615186.
[30] Dulai KS, Von Dornum M, Mollon JD, Hunt DM (1999). "The Evolution of Trichromatic Color Vision by Opsin Gene Duplication in New World and Old World Primates". Genome Research. 9 (7): 629–638. doi:10.1101/gr.9.7.629. PMID 10413401.
[31] H Imataka, K Sogawa, KI Yasumoto, Y Kikuchi, K Sasano, A Kobayashi, M Hayami, and Y Fujii-Kuriyama (October 1992). "Two regulatory proteins that bind to the basic transcription element (BTE), a GC box sequence in the promoter region of the rat P-4501A1 gene" (PDF). The EMBO Journal. 11 (10): 3663–71. PMID 1356762. Retrieved 2013-01-27.
[32] Michael C. Blake, Robert C. Jambou, Andrew G. Swick, Jeanne W. Kahn, and Jane Clifford Azizkhan (December 1990). "Transcriptional Initiation Is Controlled by Upstream GC-Box Interactions in a TATAA-Less Promoter" (PDF). Molecular and Cellular Biology. 10 (12): 6632–41. doi:10.1128/MCB.10.12.6632. PMID 2247077. Retrieved 2013-01-27.
[33] Cohen N, Kenigsberg E, Tanay A (2011). "Primate CpG Islands Are Maintained by Heterogeneous Evolutionary Regimes Involving Minimal Selection". Cell. 145 (5): 773–86. doi:10.1016/j.cell.2011.04.024. PMID 21620139.
[34] Zhang, Bosen; Song, Liwei; Cai, Jiali; Li, Lei; Xu, Hong; Li, Mengying; Wang, Jiamin; Shi, Minmin; Chen, Hao; Jia, Hao; Hou, Zhaoyuan (2019). "The LIM protein Ajuba/SP1 complex forms a feed forward loop to induce SP1 target genes and promote pancreatic cancer cell proliferation". Journal of Experimental & Clinical Cancer Research. 38 (1): 205. doi:10.1186/s13046-019-1203-2. ISSN 1756-9966. PMID 31101117.

v t e Gene project
Articles	Complex locus A1BG and ZNF497 Grainyhead-like Genes in Regulating Development and Genetic Defects Lysenin Lysine: biosynthesis, catabolism and roles RIG-I like receptors ShK toxin: history, structure and therapeutic applications for autoimmune diseases
Categories	Biochemistry Biology Genetics Medicine
Laboratories	AGC box gene transcription laboratory ATA box gene transcription laboratory C and D boxes gene transcription laboratory CArG box gene transcription laboratory CGCG box gene transcription laboratory CRE box gene transcription laboratory E2 box gene transcription laboratory Enhancer box gene transcription laboratory Factor II B recognition element gene transcription laboratory GA responsive complex gene transcription laboratory GC box gene transcription laboratory H box gene transcription laboratory HNF6 gene transcription laboratory HY box gene transcription laboratory Initiator element gene transcription laboratory Metal responsive element gene transcription laboratory STAT5 gene transcription laboratory TATA box gene transcription laboratory
Lessons	A1BG gene transcription programming Amino Acids Enzymes Enzyme catalysis Enzyme structure and function Eukaryotic transcription Gene regulation in prokaryotes
Lists	Biomolecules
Modules	Module:Infobox gene Module:InfoboxImage
Original research	Gene project
Projects	Biochemistry Gene project History of biology Molecular Biology Molecular evolution Topobiology
Proposals	Gene expressions/Cost sharing and research products Gene expressions in human exploration beyond low earth orbits Gene expressions/Project narrative