AGC box gene transcriptions

Latest revision as of 16:35, 29 August 2023

Editor-In-Chief: Henry A. Hoff

This is a digital photograph of Arabidopsis thaliana. Credit: Alberto Salguero Quiles en Getafe (Madrid), España.

"The GCC box, also referred to as the AGC box (10), GCC element (11), or AGCCGCC sequence (13), is an ethylene-responsive element found in the promoters of a large number of [pathogenesis related] PR genes whose expression is up-regulated following pathogen attack."^[1]

Consensus sequences

The AGC box has the consensus sequence 3'-AGCCGCC-5' in the direction of transcription.^[2]

AGC

"AGC is a binding site for factors responding to pathogen attacks (Ohme-Takagi et al., 2000)".^[3]

Inverse copies

For "AGC, one copy in inverse orientation of the AGC box (AGCCGCC) [is] present as two copies (-1346 and -1314) in the ERE".^[2]
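The relation between the AGC box and its inverse copy can be checked with a minimal sketch. The helper name `reverse_complement` is an illustrative assumption and does not come from the cited sources; the sketch only shows that the inverse complement of AGCCGCC is GGCGGCT.

```python
# Hypothetical helper; the name is illustrative, not from the cited sources.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the inverse (reverse) complement of an uppercase DNA sequence."""
    # Complement each base, then reverse the string.
    return seq.translate(COMPLEMENT)[::-1]


print(reverse_complement("AGCCGCC"))  # GGCGGCT
```

Applying the function twice returns the original sequence, which is a quick sanity check on the complement table.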

Enhancers

"Enhancer activity, ethylene responsiveness, and binding of nuclear proteins depend on the integrity of two copies of the AGC box, AGCCGCC, present in the promoters of several ethylene-responsive genes."^[2]

"The GLB enhancer contains two copies of the sequence AGCCGCC, which is conserved in several genes showing expression patterns similar to the GLB gene, as well as a sequence identical at 6 of 7 bp."^[4]

Glucanase promoters

"One common motif, AGCCGCC (AGC box), has been found to be present in nearly all chitinase and glucanase promoters so far analyzed (Ohme-Takagi and Shinshi 1990; Hart et al. 1993)."^[5]

DNA-binding proteins

"cDNA clones have been identified representing 4 novel DNA-binding proteins, called ethylene-responsive element binding proteins (EREBPs), that specifically bind the ERE AGC box".^[2]

Functional non-coding DNA

Functional "non-coding DNA is involved in the regulation of gene expression and thus in the evolution of novelties and adaptation between species [...] Functional non-coding sequences fall into two main categories: protein binding sites such as transcription factor binding sites (TFBSs), enhancers [such as the AGC box], and silencers, which are involved in the control of gene expression, and sequences that control chromatin organization such as insulators and matrix attachment regions".^[6]

Pathogenesis-related genes

"Genes of PR-1 and -5 proteins have now been identified in the genomes of various species of organisms, including humans and nematodes. PR proteins may contribute to the innate immunity of plants as well as to that of other organisms."^[7]

Ostreococcus


This is a photomicrograph of Ostreococcus. Credit: Wenche Eikrem and Jahn Throndsen, University of Oslo.

"Ocean-dwelling phytoplankton from the genus Ostreococcus emerge at the primitive root of the green plant lineage, dating back nearly 1.5 billion years. Today, these microscopic, free-living creatures, among the smallest eukaryotes ever characterized, barely a micron in diameter, contribute to a significant share of the world’s total photosynthetic activity. These “picophytoplankton” also exhibit great diversity that contrasts sharply with the dearth of ecological niches available to them in aquatic ecosystems. This observation, known as the “paradox of the plankton,” has long puzzled biologists."^[8]

"Plumbing the depths of molecular-level information of related species, genomics offers a novel glimpse into this paradox. The researchers compared the genomes of two Ostreococcus species, O. lucimarinus and O. tauri, and saw dramatic changes in genome structure and metabolic capabilities."^[8]

“We found several striking features of genome organization. Overlapping genes conserved across the species may enable them to cross-regulate their expression, while species-specific chromosomes with horizontally transferred genes can account for changes in the cell surface to adapt to different ecological niches.”^[8]

“This work builds on the community’s emerging understanding about how carbon fixation is carried out by picoplankton.”^[9]

“From an applied perspective, we are learning some of the tricks nature has employed to ‘engineer’ an extremely small eukaryote to thrive in nature–which may well find applications in bioengineering. It was particularly interesting to see the predicted use of selenium-containing enzymes as one of the tricks to maintain such tiny cells. There are many mechanisms that can account for species formation in photosynthetic phytoplankton, and this is just one of the major pieces to this long-standing puzzle for biologists.”^[9]

“Assimilation of atmospheric CO₂ by marine phytoplankton is a global-scale process that is responsible for about half of the biosphere net primary production. This active absorption of hundreds of millions of tons of carbon per day is essential for maintaining the control of the planet’s climate by counteracting greenhouse effects due to human activities. Clearly, this storage capacity is affected by changes in the photosynthetic efficiency of the algae, which in turn is linked to the environmental conditions experienced by these organisms in their environment.”^[10]

Nicotiana

The osmotin-like protein (OLP) "has no intron and ... its promoter region contains two AGCCGCC sequences that are conserved in most basic PR-protein genes."^[11]

The "AGCCGCC sequence(s) is a DNA element(s) responsive to ethylene. An EREBP2 protein, isolated as one of the proteins binding the AGCCGCC sequence of the tobacco rβ-1,3-glucanase gene, also was found to bind to the AGCCGCC sequence(s) of OLP gene. These results suggest that the ethylene-induced expression of OLP is regulated by trans-acting factor(s) common to basic PR-proteins."^[11]

"AGCCGCC sequences were found at -46 to -52 and -161 to -167. There was no repeated sequence (-938 to -903)".^[11]

"Expression of the osmotin gene is similar to that of the OLP gene. The osmotin gene also has several AGCCGCC sequences; a complete AGCCGCC (from -50 to -44), a slightly modified CGCCGCC (from -144 to -138), and an AGCCGCC sequence in reverse orientation (from -162 to -156)."^[11]

Arabidopsis


This is an image of the flowers of Arabidopsis thaliana, a specimen of about 15 cm, in the first week of March 2004. Credit: Alberto Salguero Quiles in Getafe (Madrid), Spain.

In Arabidopsis thaliana "an ethylene-inducible, GCC box DNA-binding protein interacts with an ocs element binding protein".^[1]

"In yeast and mammalian systems, it is well established that transcriptional down-regulation by DNA-binding repressors involves core histone deacetylation, mediated by their interaction within a complex containing histone deacetylase (e.g. HDA1), as well as various proteins (e.g. SIN3, SAP18, SAP30, and RhAp46). [An] Arabidopsis thaliana gene related in sequence to SAP18, designated AtSAP18, functions in transcription regulation in plants subjected to salt stress."^[12]

Evidence has been provided "that SAP18 and HDA1 function as transcriptional repressors. [Further] they associate with Ethylene-Responsive Element binding Factors (ERFs) to create a hormone-sensitive multimeric repressor complex under conditions of environmental stress."^[12]

"At the molecular level, the actions of ethylene upon gene expression involve Ethylene Responsive element binding Factors (ERFs), which display GCC box-specific binding activities in Arabidopsis (Ohme-Takagi and Shinshi, 1995). ERFs contain a highly conserved DNA binding domain (the ERF domain) consisting of 58-59 amino acids (Ohme-Takagi and Shinshi, 1995), which binds with high affinity to the GCC box (Hao et al., 1998)."^[12]

Peaches

"An AGC box (AGCCGCC) was found [from peach (Prunus persica L. Batsch cv. Loring)] between 886 and 892 bp upstream of the translation start site which has been shown in other ethylene-responsive PR genes to be a binding site for ethylene-responsive binding factor proteins (ERF proteins) (Ohme-Takagi and Shinshi, 1995; Sato et al., 1996; Jia and Martin, 1999; Fujimoto et al., 2000)."^[3]

"The peach ACO1 does have an AGC box that has been found to bind ethylene responsive elements in response to pathogen infections (Ohme-Takagi et al., 2000; Rushton et al., 2002). Only the apple ACO1 also contains this sequence. In addition, both PpACO1 and the apple ACO1 have a MADS box transcription factor binding site (CarG) (Tilly et al., 1998), but none of the other ACO genes do."^[3]

E2F4


Structure of the E2F4 protein shown is based on PyMOL rendering of PDB 1cf7. Credit: Emw.

Gene ID: 1874 - "The protein encoded by this gene is a member of the E2F family of transcription factors. The E2F family plays a crucial role in the control of cell cycle and action of tumor suppressor proteins and is also a target of the transforming proteins of small DNA tumor viruses. The E2F proteins contain several evolutionally conserved domains found in most members of the family. These domains include a DNA binding domain, a dimerization domain which determines interaction with the differentiation regulated transcription factor proteins (DP), a transactivation domain enriched in acidic amino acids, and a tumor suppressor protein association domain which is embedded within the transactivation domain. This protein binds to all three of the tumor suppressor proteins pRB, p107 and p130, but with higher affinity to the last two. It plays an important role in the suppression of proliferation-associated genes, and its gene mutation and increased expression may be associated with human cancer."^[13]

"The AGC triplet repeat in the coding region of the E2F-4 gene, a member of the family, has been reported to be mutated in colorectal cancers with a microsatellite instability (MSI) phenotype. We found a wider range variation of the repeat number in DNAs from tumors, the corresponding normal mucosa, and healthy individuals. A total of 5 repeat variants, ranging from 8 to 17 AGC repeats, was detected in 6 (9.7%) of the 62 healthy individuals and 8 (8.9%) of the 90 normal DNAs of the patients. The wild-type 13 repeat was present in all of these individuals. The variation of the AGC repeat number may be a polymorphism. Further, loss of heterozygosity (LOH) at the E2F-4 locus in the tumor tissues of 2 (25%) of the 8 informative cases was detected."^[14]
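Counting tandem AGC repeats, as in the repeat-number variation quoted above, can be sketched as follows. The function name `agc_repeat_runs` and the toy sequence are assumptions made for illustration; the toy sequence is not the real E2F-4 coding region and only borrows the wild-type repeat number of 13 from the quotation.

```python
import re


def agc_repeat_runs(seq: str) -> list[int]:
    """Return the length, in repeat units, of every run of tandem AGC triplets."""
    # (?:AGC)+ matches one or more consecutive AGC triplets.
    return [len(m.group(0)) // 3 for m in re.finditer(r"(?:AGC)+", seq)]


# Toy sequence (an assumption): a 13-unit run and a separate 2-unit run.
toy = "TTC" + "AGC" * 13 + "GAT" + "AGC" * 2
print(agc_repeat_runs(toy))  # [13, 2]
```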

Hypotheses

An AGC box occurs in the human genome.

AGC box samplings

The Basic programs (starting with SuccessablesAGC.bas) were written to compare nucleotide sequences with the sequences on either the template strand (-) or the coding strand (+) of the DNA, in the negative direction (-) or the positive direction (+), with the number of nts extended from 958 to 4445. The programs, the sequences they look for, and the occurrences found are:

negative strand in the negative direction is SuccessablesAGC--.bas, looking for AGCCGCC: 0.
negative strand in the positive direction is SuccessablesAGC-+.bas, looking for AGCCGCC: 0.
positive strand in the negative direction is SuccessablesAGC+-.bas, looking for AGCCGCC: 0.
positive strand in the positive direction is SuccessablesAGC++.bas, looking for AGCCGCC: 0.
inverse complement, negative strand, negative direction is SuccessablesAGCci--.bas, looking for GGCGGCT: 0.
inverse complement, negative strand, positive direction is SuccessablesAGCci-+.bas, looking for GGCGGCT: 0.
inverse complement, positive strand, negative direction is SuccessablesAGCci+-.bas, looking for GGCGGCT: 1, GGCGGCT at 1754.
inverse complement, positive strand, positive direction is SuccessablesAGCci++.bas, looking for GGCGGCT: 0.
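The Basic programs themselves are not reproduced here, so the searches listed above can only be approximated. The following minimal Python sketch, under stated assumptions, scans one sequence for a motif and for its inverse complement and reports 1-based start positions. The function names, the toy sequence, and the position convention are all assumptions, not the original programs' behavior.

```python
# Hypothetical stand-in for one motif search; not the original Basic code.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the inverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_motif(seq: str, motif: str) -> list[int]:
    """Return 1-based start positions of every match, overlaps included."""
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start + 1)  # convert 0-based index to 1-based
        start = seq.find(motif, start + 1)  # step by one to allow overlaps
    return positions


# Toy sequence (an assumption): one AGC box and one inverse complement.
toy = "TTAGCCGCCAAGGCGGCTT"
motif = "AGCCGCC"
print(find_motif(toy, motif))                      # [3]
print(find_motif(toy, reverse_complement(motif)))  # [12]
```

Running the same function on each strand and each direction would give one result per program in the list; how the original programs orient and number each strand is not stated in this resource.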

AGC box negative direction (2596-1) distal promoters

Positive strand, negative direction: GGCGGCT at 1754.

AGC random dataset samplings

AGCr0: 1, AGCCGCC at 2380.
AGCr1: 0.
AGCr2: 0.
AGCr3: 2, AGCCGCC at 4138, AGCCGCC at 1452.
AGCr4: 1, AGCCGCC at 80.
AGCr5: 1, AGCCGCC at 4353.
AGCr6: 0.
AGCr7: 0.
AGCr8: 0.
AGCr9: 1, AGCCGCC at 2449.
AGCr0ci: 1, GGCGGCT at 3548.
AGCr1ci: 0.
AGCr2ci: 1, GGCGGCT at 4349.
AGCr3ci: 1, GGCGGCT at 1443.
AGCr4ci: 1, GGCGGCT at 4110.
AGCr5ci: 0.
AGCr6ci: 0.
AGCr7ci: 0.
AGCr8ci: 0.
AGCr9ci: 0.
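How the random datasets were generated is not described in this resource. As a rough sketch only, ten random sequences could be drawn and searched as below. The uniform base frequencies, the sequence length of 4560 (taken from the largest coordinate in the region headings), the seed, and all function names are assumptions. The helper `expected_hits` gives the expected number of matches of a k-mer in a uniformly random sequence.

```python
import random


def random_sequence(length: int, rng: random.Random) -> str:
    """Draw a DNA sequence with uniform, independent base frequencies."""
    return "".join(rng.choice("ACGT") for _ in range(length))


def count_motif(seq: str, motif: str) -> int:
    """Count matches of motif in seq, overlaps included."""
    count = 0
    start = seq.find(motif)
    while start != -1:
        count += 1
        start = seq.find(motif, start + 1)
    return count


def expected_hits(length: int, k: int) -> float:
    """Expected matches of one fixed k-mer under uniform base frequencies."""
    return (length - k + 1) / 4 ** k


rng = random.Random(0)  # fixed seed (an assumption) for repeatability
counts = [count_motif(random_sequence(4560, rng), "AGCCGCC") for _ in range(10)]
print(counts)  # ten per-dataset counts; values depend on the seed
print(round(expected_hits(4560, 7), 3))  # 0.278
```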

AGCr arbitrary (evens) (4560-2846) UTRs

AGCr0ci: GGCGGCT at 3548.
AGCr2ci: GGCGGCT at 4349.
AGCr4ci: GGCGGCT at 4110.

AGCr alternate (odds) (4560-2846) UTRs

AGCr3: AGCCGCC at 4138.
AGCr5: AGCCGCC at 4353.

AGCr arbitrary positive direction (odds) (4445-4265) core promoters

AGCr5: AGCCGCC at 4353.

AGCr alternate positive direction (evens) (4445-4265) core promoters

AGCr2ci: GGCGGCT at 4349.

AGCr arbitrary positive direction (odds) (4265-4050) proximal promoters

AGCr3: AGCCGCC at 4138.

AGCr alternate positive direction (evens) (4265-4050) proximal promoters

AGCr4ci: GGCGGCT at 4110.

AGCr arbitrary negative direction (evens) (2596-1) distal promoters

AGCr0: AGCCGCC at 2380.
AGCr4: AGCCGCC at 80.

AGCr alternate negative direction (odds) (2596-1) distal promoters

AGCr3: AGCCGCC at 1452.
AGCr9: AGCCGCC at 2449.
AGCr3ci: GGCGGCT at 1443.

AGCr arbitrary positive direction (odds) (4050-1) distal promoters

AGCr3: AGCCGCC at 1452.
AGCr9: AGCCGCC at 2449.
AGCr3ci: GGCGGCT at 1443.

AGCr alternate positive direction (evens) (4050-1) distal promoters

AGCr0: AGCCGCC at 2380.
AGCr4: AGCCGCC at 80.
AGCr0ci: GGCGGCT at 3548.

AGC box analysis and results

"An AGC box (AGCCGCC) was found [from peach (Prunus persica L. Batsch cv. Loring)] between 886 and 892 bp upstream of the translation start site which has been shown in other ethylene-responsive PR genes to be a binding site for ethylene-responsive binding factor proteins (ERF proteins) (Ohme-Takagi and Shinshi, 1995; Sato et al., 1996; Jia and Martin, 1999; Fujimoto et al., 2000)."^[3]

Reals or randoms	Promoters	Direction	Numbers	Strands	Occurrences	Averages (± 0.1)
Reals	UTR	negative	0	2	0	0
Randoms	UTR	arbitrary negative	3	10	0.3	0.25
Randoms	UTR	alternate negative	2	10	0.2	0.25
Reals	Core	negative	0	2	0	0
Randoms	Core	arbitrary negative	0	10	0	0
Randoms	Core	alternate negative	0	10	0	0
Reals	Core	positive	0	2	0	0
Randoms	Core	arbitrary positive	1	10	0.1	0.1
Randoms	Core	alternate positive	1	10	0.1	0.1
Reals	Proximal	negative	0	2	0	0
Randoms	Proximal	arbitrary negative	0	10	0	0
Randoms	Proximal	alternate negative	0	10	0	0
Reals	Proximal	positive	0	2	0	0
Randoms	Proximal	arbitrary positive	1	10	0.1	0.1
Randoms	Proximal	alternate positive	1	10	0.1	0.1
Reals	Distal	negative	1	2	0.5	0.5
Randoms	Distal	arbitrary negative	2	10	0.2	0.25
Randoms	Distal	alternate negative	3	10	0.3	0.25
Reals	Distal	positive	0	2	0	0
Randoms	Distal	arbitrary positive	3	10	0.3	0.3
Randoms	Distal	alternate positive	3	10	0.3	0.3
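The arithmetic behind the Occurrences and Averages columns can be sketched as follows, assuming that Occurrences is Numbers divided by Strands and that the random average is the mean of the arbitrary and alternate rows. That reading is inferred from the table values; the function names are illustrative.

```python
def occurrences(number_found: int, strands: int) -> float:
    """Occurrences per strand (or per random dataset)."""
    return number_found / strands


def random_average(arbitrary: float, alternate: float) -> float:
    """Mean of the arbitrary and alternate random occurrences."""
    return (arbitrary + alternate) / 2


# UTR rows of the table above: 3 of 10 (arbitrary) and 2 of 10 (alternate).
utr_arbitrary = occurrences(3, 10)
utr_alternate = occurrences(2, 10)
print(round(random_average(utr_arbitrary, utr_alternate), 2))  # 0.25

# Real distal promoter, negative direction: 1 occurrence over 2 strands.
print(occurrences(1, 2))  # 0.5
```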

Comparison:

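The only real AGC box found is the inverse complement GGCGGCT at 1754 in the negative-direction distal promoter, where the real occurrence (0.5) is greater than the random average (0.25); no real AGC box occurs in the UTR, core, or proximal promoters. This suggests that the real AGC box is likely active or activable.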

GCC box samplings

Copying GCCGCC in "⌘F" yields one between ZSCAN22 and A1BG and two between ZNF497 and A1BG as can be found by the computer programs.

The Basic programs (starting with SuccessablesGCC.bas) were written to compare nucleotide sequences with the sequences on either the template strand (-) or the coding strand (+) of the DNA, in the negative direction (-) or the positive direction (+), with the number of nts extended from 958 to 4445. The sequences they look for and the occurrences found are:

negative strand, negative direction, looking for GCCGCC: 1, GCCGCC at 2727.
positive strand, negative direction, looking for GCCGCC: 0.
negative strand, positive direction, looking for GCCGCC: 2, GCCGCC at 1757, GCCGCC at 904.
positive strand, positive direction, looking for GCCGCC: 1, GCCGCC at 356.
inverse complement, negative strand, negative direction, looking for GGCGGC: 0.
inverse complement, positive strand, negative direction, looking for GGCGGC: 1, GGCGGC at 1753.
inverse complement, positive strand, positive direction, looking for GGCGGC: 0.
inverse complement, negative strand, positive direction, looking for GGCGGC: 3, GGCGGC at 1902, GGCGGC at 1794, GGCGGC at 354.

GCC negative direction (2811-2596) proximal promoters

Negative strand, negative direction: GCCGCC at 2727.

GCC negative direction (2596-1) distal promoters

Positive strand, negative direction: GGCGGC at 1753.

GCC positive direction (4050-1) distal promoters

Negative strand, positive direction: GCCGCC at 1757, GCCGCC at 904.
Negative strand, positive direction: GGCGGC at 1902, GGCGGC at 1794, GGCGGC at 354.
Positive strand, positive direction: GCCGCC at 356.
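Sorting a hit into a promoter region, as done in the listings above, can be sketched with the coordinate ranges given in the section headings. Treating the range boundaries as inclusive, assigning a shared boundary to the region listed first, and placing the UTR range under the negative direction (as in the results tables) are assumptions; the names `REGIONS` and `classify` are illustrative.

```python
# Ranges copied from the section headings; inclusive bounds are an assumption.
REGIONS = {
    "negative": [("UTR", 2846, 4560), ("proximal", 2596, 2811), ("distal", 1, 2596)],
    "positive": [("core", 4265, 4445), ("proximal", 4050, 4265), ("distal", 1, 4050)],
}


def classify(position: int, direction: str):
    """Return the first listed region containing position, or None."""
    for name, low, high in REGIONS[direction]:
        if low <= position <= high:
            return name
    return None  # position falls outside every range stated in the headings


print(classify(2727, "negative"))  # proximal
print(classify(1753, "negative"))  # distal
print(classify(1757, "positive"))  # distal
```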

GCC random dataset samplings

GCCr0: 3, GCCGCC at 3407, GCCGCC at 2380, GCCGCC at 1384.
GCCr1: 0.
GCCr2: 3, GCCGCC at 3586, GCCGCC at 2598, GCCGCC at 1966.
GCCr3: 3, GCCGCC at 4138, GCCGCC at 2792, GCCGCC at 1452.
GCCr4: 4, GCCGCC at 1092, GCCGCC at 1089, GCCGCC at 1022, GCCGCC at 80.
GCCr5: 1, GCCGCC at 4353.
GCCr6: 0.
GCCr7: 1, GCCGCC at 1770.
GCCr8: 2, GCCGCC at 2518, GCCGCC at 2473.
GCCr9: 3, GCCGCC at 2666, GCCGCC at 2449, GCCGCC at 1415.
GCCr0ci: 1, GGCGGC at 3547.
GCCr1ci: 0.
GCCr2ci: 1, GGCGGC at 4348.
GCCr3ci: 1, GGCGGC at 1442.
GCCr4ci: 1, GGCGGC at 4109.
GCCr5ci: 2, GGCGGC at 2932, GGCGGC at 678.
GCCr6ci: 1, GGCGGC at 4434.
GCCr7ci: 0.
GCCr8ci: 1, GGCGGC at 4280.
GCCr9ci: 3, GGCGGC at 3896, GGCGGC at 3628, GGCGGC at 1727.

GCCr arbitrary (evens) (4560-2846) UTRs

GCCr0: GCCGCC at 3407.
GCCr2: GCCGCC at 3586.
GCCr0ci: GGCGGC at 3547.
GCCr2ci: GGCGGC at 4348.
GCCr4ci: GGCGGC at 4109.
GCCr6ci: GGCGGC at 4434.
GCCr8ci: GGCGGC at 4280.

GCCr alternate (odds) (4560-2846) UTRs

GCCr3: GCCGCC at 4138.
GCCr5: GCCGCC at 4353.
GCCr5ci: GGCGGC at 2932.
GCCr9ci: GGCGGC at 3896, GGCGGC at 3628.

GCCr arbitrary positive direction (odds) (4445-4265) core promoters

GCCr5: GCCGCC at 4353.

GCCr alternate positive direction (evens) (4445-4265) core promoters

GCCr2ci: GGCGGC at 4348.
GCCr6ci: GGCGGC at 4434.
GCCr8ci: GGCGGC at 4280.

GCCr arbitrary negative direction (evens) (2811-2596) proximal promoters

GCCr2: GCCGCC at 2598.

GCCr alternate negative direction (odds) (2811-2596) proximal promoters

GCCr3: GCCGCC at 2792.
GCCr9: GCCGCC at 2666.

GCCr arbitrary positive direction (odds) (4265-4050) proximal promoters

GCCr3: GCCGCC at 4138.

GCCr alternate positive direction (evens) (4265-4050) proximal promoters

GCCr4ci: GGCGGC at 4109.

GCCr arbitrary negative direction (evens) (2596-1) distal promoters

GCCr0: GCCGCC at 2380, GCCGCC at 1384.
GCCr2: GCCGCC at 1966.
GCCr4: GCCGCC at 1092, GCCGCC at 1089, GCCGCC at 1022, GCCGCC at 80.
GCCr8: GCCGCC at 2518, GCCGCC at 2473.

GCCr alternate negative direction (odds) (2596-1) distal promoters

GCCr3: GCCGCC at 1452.
GCCr7: GCCGCC at 1770.
GCCr9: GCCGCC at 2449, GCCGCC at 1415.
GCCr3ci: GGCGGC at 1442.
GCCr5ci: GGCGGC at 678.
GCCr9ci: GGCGGC at 1727.

GCCr arbitrary positive direction (odds) (4050-1) distal promoters

GCCr3: GCCGCC at 2792, GCCGCC at 1452.
GCCr7: GCCGCC at 1770.
GCCr9: GCCGCC at 2666, GCCGCC at 2449, GCCGCC at 1415.
GCCr3ci: GGCGGC at 1442.
GCCr5ci: GGCGGC at 2932, GGCGGC at 678.
GCCr9ci: GGCGGC at 3896, GGCGGC at 3628, GGCGGC at 1727.

GCCr alternate positive direction (evens) (4050-1) distal promoters

GCCr0: GCCGCC at 3407, GCCGCC at 2380, GCCGCC at 1384.
GCCr2: GCCGCC at 3586, GCCGCC at 2598, GCCGCC at 1966.
GCCr4: GCCGCC at 1092, GCCGCC at 1089, GCCGCC at 1022, GCCGCC at 80.
GCCr8: GCCGCC at 2518, GCCGCC at 2473.
GCCr0ci: GGCGGC at 3547.

GCC box analysis and results

"Expression of the osmotin gene is similar to that of the OLP gene. The osmotin gene also has several AGCCGCC sequences; a complete AGCCGCC (from -50 to -44), a slightly modified CGCCGCC (from -144 to -138), and an AGCCGCC sequence in reverse orientation (from -162 to -156)."^[11]

Reals or randoms	Promoters	Direction	Numbers	Strands	Occurrences	Averages (± 0.1)
Reals	UTR	negative	0	2	0	0
Randoms	UTR	arbitrary negative	7	10	0.7	0.6
Randoms	UTR	alternate negative	5	10	0.5	0.6
Reals	Core	negative	0	2	0	0
Randoms	Core	arbitrary negative	0	10	0	0
Randoms	Core	alternate negative	0	10	0	0
Reals	Core	positive	0	2	0	0
Randoms	Core	arbitrary positive	1	10	0.1	0.2
Randoms	Core	alternate positive	3	10	0.3	0.2
Reals	Proximal	negative	1	2	0.5	0.5 ± 0.5 (--1,+-0)
Randoms	Proximal	arbitrary negative	1	10	0.1	0.15
Randoms	Proximal	alternate negative	2	10	0.2	0.15
Reals	Proximal	positive	0	2	0	0
Randoms	Proximal	arbitrary positive	1	10	0.1	0.1
Randoms	Proximal	alternate positive	1	10	0.1	0.1
Reals	Distal	negative	1	2	0.5	0.5 ± 0.5 (--0,+-1)
Randoms	Distal	arbitrary negative	9	10	0.9	0.8
Randoms	Distal	alternate negative	7	10	0.7	0.8
Reals	Distal	positive	6	2	3	3 ± 2 (-+5,++1)
Randoms	Distal	arbitrary positive	12	10	1.2	1.25
Randoms	Distal	alternate positive	13	10	1.3	1.25
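Entries such as "3 ± 2 (-+5,++1)" in the Reals rows appear to combine the counts on the two strands as a mean plus or minus half their range. That reading is an inference from the numbers, not something stated in the table, and the function name below is illustrative.

```python
def mean_and_half_range(counts: list[int]) -> tuple[float, float]:
    """Return (mean, half of max minus min) for per-strand counts."""
    mean = sum(counts) / len(counts)
    half_range = (max(counts) - min(counts)) / 2
    return mean, half_range


# Distal positive reals: 5 on the negative strand, 1 on the positive strand.
print(mean_and_half_range([5, 1]))  # (3.0, 2.0)
# Proximal negative reals: 1 and 0.
print(mean_and_half_range([1, 0]))  # (0.5, 0.5)
```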

Comparison:

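The occurrences of real GCC boxes in the negative-direction proximal promoters (0.5 versus a random average of 0.15) and in the positive-direction distal promoters (3 versus 1.25) are greater than the randoms, whereas the negative-direction distal occurrence (0.5) is lower than the random average (0.8). This suggests that the real GCC boxes are likely active or activable.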

GCC boxes occur in the following:

AGC boxes: "The GCC box, also referred to as the AGC box (10), GCC element (11), or AGCCGCC sequence (13), is an ethylene-responsive element found in the promoters of a large number of [pathogenesis related] PR genes whose expression is up-regulated following pathogen attack."^[1]
DNA damage response elements (DRE) (Sumrada, core): "A consensus sequence, 5'-TAGCCGCCGRRRR-3' (where R = an unspecified purine nucleoside [A/G]), was generated from these data."^[15]
GGC triplets: "The transcription factors Uga3, Dal81 and Leu3 belong to the class III family (Zn(II)₂Cys₆ proteins), and they recognize highly related sequences rich in GGC triplets [15]."^[16]
Kozak sequences: GCCGCC(A/G)CCATGG.^[17]
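Consensus sequences with degenerate positions, such as the R (A or G) in the DRE consensus or the (A/G) in the Kozak sequence, can be matched by translating them into regular expressions. This is a minimal sketch: the mapping covers only a few IUPAC codes, the test sequences are made up, and the function name is an assumption.

```python
import re

# Partial IUPAC table (an assumption: only the codes needed here).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}


def consensus_to_regex(consensus: str) -> re.Pattern:
    """Compile a consensus written in IUPAC codes into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in consensus))


dre = consensus_to_regex("TAGCCGCCGRRRR")
print(bool(dre.fullmatch("TAGCCGCCGAGAG")))  # True: all four R positions are A or G
print(bool(dre.fullmatch("TAGCCGCCGACAG")))  # False: C is not a purine

kozak = consensus_to_regex("GCCGCCRCCATGG")  # GCCGCC(A/G)CCATGG
print(bool(kozak.search("AAGCCGCCACCATGGCC")))  # True
```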

Ethylene signaling pathway