HMG box gene transcriptions

Figure: A large lymphocyte. Credit: Guy Waterval.

"Upstream Binding Factor (UBF) is important for activation of ribosomal RNA transcription and belongs to a family of proteins containing nucleic acid binding domains, termed HMG-boxes, with similarity to High Mobility Group (HMG) chromosomal proteins."[1]

Chromosomal proteins

"Most HMG box proteins contain two or more HMG boxes and appear to bind DNA in a relatively sequence-aspecific manner (5, 13, 15, 16 and references therein). [...] they all appear to bind to the minor groove of the A/T A/T C A A A G-motif (10, 14, 18-20)."[2]

"Previous studies in lymphocytes have described two DNA-binding HMG box proteins, TCF-1 and LEF-1, with affinity for the A/TA/TCAAAG motif found in several T cell-specific enhancers."[3]

"The high mobility group-1 (HMG) box was originaly identified by Tjian and co-workers in the transcription factor UBF as a region of homology to HMG-1 proteins (Jantzen et al., 1990). UBF reportedly contained four such regions of -80 amino acids; one of these boxes was shown to mediate DNA binding."[3]

"Interestingly, the sequence-specific HMG boxes characterized to date display high afinity to the A/TA/TCAAAG motif despite a low level of amino acid homology (typically <25% identity)."[3]

"Human LEF-1 was originally identified as a T cell-specific protein binding to the TTCAAAG motif in the TCR-α enhancer (Waterman et al., 1991)."[3]

"As analysed by gel retardation, the Sox-4 HMG box indeed bound to the AACAAAG motif (probe MWε-1; Figure 2B, lane 1). As described for other HMG boxes, Sox-4 interacted with DNA bases within the minor groove: substitution of A/T pairs for I/C pairs, which leaves the surface of the minor groove intact (Star and Hawley, 1991), had no apparent effect on binding affinity (lanes 2 and 4)."[3]

Consensus sequences

"In mammals, the Tcf/Lef family consists of four genes: Tcf‐1, Lef‐1, Tcf‐3 and Tcf‐4. All TCF/LEF proteins display several common structural features (48,49). They contain a nearly identical DNA‐binding domain, the HMG box, recognizing the consensus sequence A/T A/T CAAA."[4]

"Both directed and random screen studies have identified a consensus recognition sequence for the HMG DBD; 5′-SCTTTGATS-3′ [...] (van de Wetering et al. 1997; van Beest et al. 2000; Hallikas and Taipale 2006; Atcha et al. 2007)."[5]

"The domain [SCTTTGATS] is called the “C clamp” to highlight the absolute requirement for four cysteine residues in DNA binding (Atcha et al. 2007) [...]."[5]

"The C clamp carries specificity for a secondary, GC-rich sequence called a “Helper site” [(C/G)C(C/G)G(C/G)] that can occur with variable spacing and orientation relative to the Wnt response element (Atcha et al. 2007; Chang et al. 2008)."[5]

High mobility group proteins

Gene ID: 6932 is TCF7 transcription factor 7 on 5q31.1: "This gene encodes a member of the T-cell factor/lymphoid enhancer-binding factor family of high mobility group (HMG) box transcriptional activators. This gene is expressed predominantly in T-cells and plays a critical role in natural killer cell and innate lymphoid cell development. The encoded protein forms a complex with beta-catenin and activates transcription through a Wnt/beta-catenin signaling pathway. Mice with a knockout of this gene are viable and fertile, but display a block in T-lymphocyte differentiation. Alternative splicing results in multiple transcript variants. Naturally-occurring isoforms lacking the N-terminal beta-catenin interaction domain may act as dominant negative regulators of Wnt signaling."[6]

  1. NP_001128323.2 transcription factor 7 isoform 3: "Transcript Variant: This variant (3, also known as A), differs in the 5' UTR, has multiple coding region differences, uses a downstream start codon, and differs in the 3' UTR, compared to variant 1. The resulting isoform (3) is shorter at the N-terminus and has a distinct C-terminus, compared to isoform 1."[6]
  2. NP_001333354.1 transcription factor 7 isoform 5: "Transcript Variant: This variant (8) contains an alternate exon in the coding region, compared to variant 1. The resulting isoform (5) is longer, compared to isoform 1."[6]
  3. NP_001333379.1 transcription factor 7 isoform 7: "Transcript Variant: This variant (9) differs in the 5' UTR, has multiple coding region differences, and uses a downstream start codon, compared to variant 1. The resulting isoform (7) is shorter at the N-terminus and has a distinct C-terminus, compared to isoform 1."[6]
  4. NP_001353431.1 transcription factor 7 isoform 8 [variant 10].[6]
  5. NP_003193.2 transcription factor 7 isoform 1: "Transcript Variant: This variant (1) encodes isoform (1)."[6]
  6. NP_963963.1 transcription factor 7 isoform 2: "Transcript Variant: This variant (2, also known as B), differs in the 5' UTR, has multiple coding region differences, and uses a downstream start codon, compared to variant 1. The resulting isoform (2) is shorter at the N-terminus, compared to isoform 1. Both variants 2 and 5 encode the same isoform."[6]
  7. NP_963965.1 transcription factor 7 isoform 4: "Transcript Variant: This variant (4, also known as C), differs in the 5' UTR, has multiple coding region differences, uses a downstream start codon, and differs in the 3' UTR, compared to variant 1. The resulting isoform (4) is shorter at the N-terminus and has a distinct C-terminus, compared to isoform 1."[6]
  8. NP_998813.1 transcription factor 7 isoform 2: "Transcript Variant: This variant (5) differs in the 5' UTR, has multiple coding region differences, and uses a downstream start codon, compared to variant 1. The resulting isoform (2) is shorter at the N-terminus, compared to isoform 1. Both variants 2 and 5 encode the same isoform."[6]

Gene ID: 6934 is TCF7L2 transcription factor 7 like 2 on 10q25.2-q25.3: "This gene encodes a high mobility group (HMG) box-containing transcription factor that plays a key role in the Wnt signaling pathway. The protein has been implicated in blood glucose homeostasis. Genetic variants of this gene are associated with increased risk of type 2 diabetes. Several transcript variants encoding multiple different isoforms have been found for this gene."[7]

  1. NP_001139746.1 transcription factor 7-like 2 isoform 1: "Transcript Variant: This variant (1) encodes the longest isoform."[7]
  2. NP_001139755.1 transcription factor 7-like 2 isoform 3: "Transcript Variant: This variant (3) has multiple differences in the coding region, compared to variant 1, one of which results in a translational frameshift. The resulting protein (isoform 3) has a distinct C-terminus and is shorter than isoform 1."[7]
  3. NP_001139756.1 transcription factor 7-like 2 isoform 4: "Transcript Variant: This variant (4) has multiple differences in the coding region, compared to variant 1, one of which results in a translational frameshift. The resulting protein (isoform 4) has a distinct C-terminus and is shorter than isoform 1."[7]
  4. NP_001139757.1 transcription factor 7-like 2 isoform 5: "Transcript Variant: This variant (5) has multiple differences in the coding region but maintains the reading frame, compared to variant 1. This variant encodes isoform 5, which is shorter than isoform 1."[7]
  5. NP_001139758.1 transcription factor 7-like 2 isoform 6: "Transcript Variant: This variant (6) has multiple differences in the coding region, compared to variant 1, one of which results in a translational frameshift. The resulting protein (isoform 6) has a distinct C-terminus and is shorter than isoform 1."[7]
  6. NP_001185454.1 transcription factor 7-like 2 isoform 7: "Transcript Variant: This variant (7) has multiple differences in the coding region, compared to variant 1, one of which results in a translational frameshift. The resulting protein (isoform 7) has a distinct C-terminus and is shorter than isoform 1."[7]
  7. NP_001185455.1 transcription factor 7-like 2 isoform 8: "Transcript Variant: This variant (8) has multiple differences in the coding region but maintains the reading frame, compared to variant 1. This variant encodes isoform 8, which is shorter than isoform 1."[7]
  8. NP_001185456.1 transcription factor 7-like 2 isoform 9: "Transcript Variant: This variant (9) has multiple differences in the coding region, compared to variant 1, one of which results in a translational frameshift. The resulting protein (isoform 9) has a distinct C-terminus and is shorter than isoform 1."[7]
  9. NP_001185457.1 transcription factor 7-like 2 isoform 10: "Transcript Variant: This variant (10) has multiple differences in the coding region, compared to variant 1, one of which results in a translational frameshift. The resulting protein (isoform 10) has a distinct C-terminus and is shorter than isoform 1."[7]
  10. NP_001185458.1 transcription factor 7-like 2 isoform 11: "Transcript Variant: This variant (11) has multiple differences in the coding region, compared to variant 1, one of which results in a translational frameshift. The resulting protein (isoform 11) has a distinct C-terminus and is shorter than isoform 1."[7]
  11. NP_001185459.1 transcription factor 7-like 2 isoform 12: "Transcript Variant: This variant (12) has multiple differences in the coding region, compared to variant 1, one of which results in a translational frameshift. The resulting protein (isoform 12) has a distinct C-terminus and is shorter than isoform 1."[7]
  12. NP_001185460.1 transcription factor 7-like 2 isoform 13: "Transcript Variant: This variant (13) has multiple differences in the coding region, compared to variant 1, one of which results in a translational frameshift. The resulting protein (isoform 13) has a distinct C-terminus and is shorter than isoform 1."[7]
  13. NP_001336799.1 transcription factor 7-like 2 isoform 14: "Transcript Variant: This variant (14) lacks alternate exons in the 5' UTR, lacks a portion of the 5' coding region, and initiates translation at an alternate start codon, compared to variant 1. The encoded isoform (14) has a distinct N-terminus and is shorter than isoform 1."[7]
  14. NP_001336800.1 transcription factor 7-like 2 isoform 15: "Transcript Variant: This variant (15) lacks alternate exons in the 5' UTR, lacks a portion of the 5' coding region, and initiates translation at an alternate start codon, compared to variant 1. The encoded isoform (15) has a distinct N-terminus and is shorter than isoform 1."[7]
  15. NP_001350430.1 transcription factor 7-like 2 isoform 16 [variant 16].[7]
  16. NP_001354872.1 transcription factor 7-like 2 isoform 17 [variant 17].[7]
  17. NP_110383.2 transcription factor 7-like 2 isoform 2: "Transcript Variant: This variant (2) has multiple differences in the coding region but maintains the reading frame, compared to variant 1. This variant encodes isoform 2, which is shorter than isoform 1."[7]

Gene ID: 51176 is LEF1 lymphoid enhancer binding factor 1: "This gene encodes a transcription factor belonging to a family of proteins that share homology with the high mobility group protein-1. The protein encoded by this gene can bind to a functionally important site in the T-cell receptor-alpha enhancer, thereby conferring maximal enhancer activity. This transcription factor is involved in the Wnt signaling pathway, and it may function in hair cell differentiation and follicle morphogenesis. Mutations in this gene have been found in somatic sebaceous tumors. This gene has also been linked to other cancers, including androgen-independent prostate cancer. Alternative splicing results in multiple transcript variants."[8]

  1. NP_001124185.1 lymphoid enhancer-binding factor 1 isoform 2: "Transcript Variant: This variant (2) lacks an alternate in-frame exon in the central coding region, compared to variant 1, resulting in an isoform (2) that is shorter than isoform 1. [...] SOX-TCF_HMG-box, class I member of the HMG-box superfamily of DNA-binding proteins. These proteins contain a single HMG box, and bind the minor groove of DNA in a highly sequence-specific manner. Members include SRY and its homologs in insects and vertebrates, and transcription factor-like proteins, TCF-1, -3, -4, and LEF-1. They appear to bind the minor groove of the A/T C A A A G/C-motif."[8]
  2. NP_001124186.1 lymphoid enhancer-binding factor 1 isoform 3: "Transcript Variant: This variant (3) lacks both an in-frame exon in the central coding region and an exon in the 3' coding region that causes a frameshift, compared to variant 1. The encoded isoform (3) has a distinct C-terminus and is shorter than isoform 1."[8]
  3. NP_001159591.1 lymphoid enhancer-binding factor 1 isoform 4: "Transcript Variant: This variant (4) differs in the 5' UTR and 5' coding region, and lacks an alternate in-frame exon in the central coding region, compared to variant 1. The encoded isoform (4) has a distinct N-terminus and is shorter than isoform 1."[8]
  4. NP_057353.1 lymphoid enhancer-binding factor 1 isoform 1: "Transcript Variant: This variant (1) represents the longest transcript and encodes the longest isoform (1)."[8]

HMG box samplings

Pasting the response element consensus sequence (A/T)(A/T)CAAAG into a "⌘F" find box returns no matches between ZNF497 and A1BG or between ZSCAN22 and A1BG, because a literal text search does not expand the (A/T) alternatives; the computer programs below do find matches.

The Basic programs testing the consensus sequence (A/T)(A/T)CAAAG (starting with SuccessablesHMG.bas) compare nucleotide sequences with the sequences on either the template strand (-) or the coding strand (+) of the DNA, in the negative direction (-) or the positive direction (+). For each program, the list gives the strand and direction searched, the sequence looked for, and what was found:

  1. negative strand, negative direction, looking for (A/T)(A/T)CAAAG, 0.
  2. negative strand, positive direction, looking for (A/T)(A/T)CAAAG, 0.
  3. positive strand, negative direction, looking for (A/T)(A/T)CAAAG, 1, ATCAAAG at 2891.
  4. positive strand, positive direction, looking for (A/T)(A/T)CAAAG, 0.
  5. complement, negative strand, negative direction, looking for (A/T)(A/T)GTTTC, 1, TAGTTTC at 2891.
  6. complement, negative strand, positive direction, looking for (A/T)(A/T)GTTTC, 0.
  7. complement, positive strand, negative direction, looking for (A/T)(A/T)GTTTC, 0.
  8. complement, positive strand, positive direction, looking for (A/T)(A/T)GTTTC, 0.
  9. inverse complement, negative strand, negative direction, looking for CTTTG(A/T)(A/T), 2, CTTTGTT at 1585, CTTTGTT at 229.
  10. inverse complement, negative strand, positive direction, looking for CTTTG(A/T)(A/T), 0.
  11. inverse complement, positive strand, negative direction, looking for CTTTG(A/T)(A/T), 0.
  12. inverse complement, positive strand, positive direction, looking for CTTTG(A/T)(A/T), 0.
  13. inverse negative strand, negative direction, looking for GAAAC(A/T)(A/T), 0.
  14. inverse negative strand, positive direction, looking for GAAAC(A/T)(A/T), 0.
  15. inverse positive strand, negative direction, looking for GAAAC(A/T)(A/T), 2, GAAACAA at 1585, GAAACAA at 229.
  16. inverse positive strand, positive direction, looking for GAAAC(A/T)(A/T), 0.
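The four search patterns in the list above (direct, complement, inverse complement, inverse) can all be derived from the one consensus string. The following is a minimal Python sketch, not the original Basic programs; it assumes the IUPAC code W = A/T, reports 1-based start positions (the position convention of the original programs is not stated), and uses a made-up toy sequence.

```python
import re

IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "[AT]", "S": "[CG]"}
COMPLEMENT = str.maketrans("ACGTWS", "TGCAWS")


def orientations(consensus: str) -> dict:
    """Derive the four patterns searched for from a single consensus."""
    comp = consensus.translate(COMPLEMENT)
    return {
        "direct": consensus,
        "complement": comp,
        "inverse complement": comp[::-1],
        "inverse": consensus[::-1],
    }


def find_sites(consensus: str, sequence: str) -> list:
    """Return (matched text, 1-based start) for every match, overlaps included."""
    body = "".join(IUPAC[base] for base in consensus)
    pattern = re.compile("(?=(" + body + "))")  # lookahead keeps overlapping hits
    return [(m.group(1), m.start() + 1) for m in pattern.finditer(sequence)]


toy = "GGATCAAAGCCCTTTGTTGG"  # hypothetical sequence, for illustration only
for name, cons in orientations("WWCAAAG").items():
    print(name, cons, find_sites(cons, toy))
```

On the toy sequence only the direct and inverse complement patterns match, which mirrors how each entry in the list pairs one pattern with one strand and direction.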

HMG UTRs

  1. Positive strand, negative direction: ATCAAAG at 2891.

HMG distal promoters

  1. Negative strand, negative direction: CTTTGTT at 1585, CTTTGTT at 229.

HMG random dataset samplings

  1. HMGr0: 3, TTCAAAG at 4166, AACAAAG at 3503, TTCAAAG at 3338.
  2. HMGr1: 2, TACAAAG at 3722, TACAAAG at 2071.
  3. HMGr2: 0.
  4. HMGr3: 1, AACAAAG at 278.
  5. HMGr4: 3, TACAAAG at 3777, AACAAAG at 3593, ATCAAAG at 672.
  6. HMGr5: 1, TACAAAG at 3734.
  7. HMGr6: 1, TACAAAG at 1499.
  8. HMGr7: 2, ATCAAAG at 2949, TTCAAAG at 252.
  9. HMGr8: 4, AACAAAG at 2658, ATCAAAG at 1142, TTCAAAG at 935, AACAAAG at 620.
  10. HMGr9: 0.
  11. HMGr0ci: 5, CTTTGTT at 4178, CTTTGAT at 3842, CTTTGAA at 3415, CTTTGTA at 986, CTTTGTT at 617.
  12. HMGr1ci: 1, CTTTGTT at 1640.
  13. HMGr2ci: 0.
  14. HMGr3ci: 0.
  15. HMGr4ci: 0.
  16. HMGr5ci: 1, CTTTGTT at 1983.
  17. HMGr6ci: 2, CTTTGAA at 1712, CTTTGAT at 257.
  18. HMGr7ci: 0.
  19. HMGr8ci: 3, CTTTGTA at 2944, CTTTGTT at 1167, CTTTGAT at 1149.
  20. HMGr9ci: 0.
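For orientation, the counts above can be compared with the number of matches expected by chance. The sketch below assumes independent, equiprobable nucleotides and a dataset length of 4560 bases (the largest coordinate used on this page); the source states neither assumption for the random datasets, so the figure is only a rough guide.

```python
def expected_matches(choices_per_position, length):
    """Expected match count for a degenerate motif in a uniform random sequence."""
    probability = 1.0
    for n_allowed in choices_per_position:
        probability *= n_allowed / 4  # chance that this position matches
    windows = length - len(choices_per_position) + 1
    return probability * windows


# (A/T)(A/T)CAAAG: two positions allow 2 bases, five positions allow 1.
hmg_expected = expected_matches([2, 2, 1, 1, 1, 1, 1], 4560)
print(round(hmg_expected, 2))  # 1.11

# Counts copied from the list above (HMGr0-HMGr9 and HMGr0ci-HMGr9ci).
observed_direct = [3, 2, 0, 1, 3, 1, 1, 2, 4, 0]
observed_ci = [5, 1, 0, 0, 0, 1, 2, 0, 3, 0]
print(sum(observed_direct) / len(observed_direct))  # 1.7
print(sum(observed_ci) / len(observed_ci))          # 1.2
```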

HMGr arbitrary UTRs

  1. HMGr0: TTCAAAG at 4166, AACAAAG at 3503, TTCAAAG at 3338.
  2. HMGr4: TACAAAG at 3777, AACAAAG at 3593.
  3. HMGr0ci: CTTTGTT at 4178, CTTTGAT at 3842, CTTTGAA at 3415.
  4. HMGr8ci: CTTTGTA at 2944.

HMGr alternate UTRs

  1. HMGr1: TACAAAG at 3722.
  2. HMGr5: TACAAAG at 3734.
  3. HMGr7: ATCAAAG at 2949.

HMGr arbitrary negative direction proximal promoters

  1. HMGr8: AACAAAG at 2658.

HMGr alternate positive direction proximal promoters

  1. HMGr0: TTCAAAG at 4166.
  2. HMGr0ci: CTTTGTT at 4178.

HMGr arbitrary negative direction distal promoters

  1. HMGr4: ATCAAAG at 672.
  2. HMGr6: TACAAAG at 1499.
  3. HMGr8: ATCAAAG at 1142, TTCAAAG at 935, AACAAAG at 620.
  4. HMGr0ci: CTTTGTA at 986, CTTTGTT at 617.
  5. HMGr6ci: CTTTGAA at 1712, CTTTGAT at 257.
  6. HMGr8ci: CTTTGTT at 1167, CTTTGAT at 1149.

HMGr alternate negative direction distal promoters

  1. HMGr1: TACAAAG at 2071.
  2. HMGr3: AACAAAG at 278.
  3. HMGr7: TTCAAAG at 252.
  4. HMGr1ci: CTTTGTT at 1640.
  5. HMGr5ci: CTTTGTT at 1983.

HMGr arbitrary positive direction distal promoters

  1. HMGr1: TACAAAG at 3722, TACAAAG at 2071.
  2. HMGr3: AACAAAG at 278.
  3. HMGr5: TACAAAG at 3734.
  4. HMGr7: ATCAAAG at 2949, TTCAAAG at 252.
  5. HMGr1ci: CTTTGTT at 1640.
  6. HMGr5ci: CTTTGTT at 1983.

HMGr alternate positive direction distal promoters

  1. HMGr0: AACAAAG at 3503, TTCAAAG at 3338.
  2. HMGr4: TACAAAG at 3777, AACAAAG at 3593, ATCAAAG at 672.
  3. HMGr6: TACAAAG at 1499.
  4. HMGr8: AACAAAG at 2658, ATCAAAG at 1142, TTCAAAG at 935, AACAAAG at 620.
  5. HMGr0ci: CTTTGAT at 3842, CTTTGAA at 3415, CTTTGTA at 986, CTTTGTT at 617.
  6. HMGr6ci: CTTTGAA at 1712, CTTTGAT at 257.
  7. HMGr8ci: CTTTGTA at 2944, CTTTGTT at 1167, CTTTGAT at 1149.

HMG box analysis and results

"Most HMG box proteins contain two or more HMG boxes and appear to bind DNA in a relatively sequence-aspecific manner (5, 13, 15, 16 and references therein). [...] they all appear to bind to the minor groove of the A/T A/T C A A A G-motif (10, 14, 18-20)."[2]

Reals or randoms  Promoters  Direction           Numbers  Strands  Occurrences  Averages (± 0.1)
Reals             UTR        negative            1        2        0.5          0.5 ± 0.5 (--0,+-1)
Randoms           UTR        arbitrary negative  9        10       0.9          0.6
Randoms           UTR        alternate negative  3        10       0.3          0.6
Reals             Core       negative            0        2        0            0
Randoms           Core       arbitrary negative  0        10       0            0
Randoms           Core       alternate negative  0        10       0            0
Reals             Core       positive            0        2        0            0
Randoms           Core       arbitrary positive  0        10       0            0
Randoms           Core       alternate positive  0        10       0            0
Reals             Proximal   negative            0        2        0            0
Randoms           Proximal   arbitrary negative  1        10       0.1          0.05
Randoms           Proximal   alternate negative  0        10       0            0.05
Reals             Proximal   positive            0        2        0            0
Randoms           Proximal   arbitrary positive  0        10       0            0.1
Randoms           Proximal   alternate positive  2        10       0.2          0.1
Reals             Distal     negative            2        2        1            1 ± 1 (--2,+-0)
Randoms           Distal     arbitrary negative  11       10       1.1          0.8
Randoms           Distal     alternate negative  5        10       0.5          0.8
Reals             Distal     positive            0        2        0            0
Randoms           Distal     arbitrary positive  8        10       0.8          1.35
Randoms           Distal     alternate positive  19       10       1.9          1.35
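The arithmetic behind the Occurrences and Averages columns can be reproduced with a short sketch. It assumes that an occurrence is the number of hits divided by the number of strands or datasets, that the random average is the mean of the arbitrary and alternate occurrences, and that the "±" on the real rows is half the spread between the two strands; these readings are inferred from the numbers and are not stated by the source.

```python
def occurrences(number, strands):
    """Hits per strand (reals) or per random dataset (randoms)."""
    return number / strands


def mean_and_half_range(per_strand_counts):
    """Mean of the per-strand counts and half the spread between them."""
    mean = sum(per_strand_counts) / len(per_strand_counts)
    half_range = (max(per_strand_counts) - min(per_strand_counts)) / 2
    return mean, half_range


# Random rows from the table: (arbitrary hits, alternate hits), 10 datasets each.
random_rows = {
    "UTR negative": (9, 3),
    "Proximal negative": (1, 0),
    "Proximal positive": (0, 2),
    "Distal negative": (11, 5),
    "Distal positive": (8, 19),
}
for region, (arbitrary, alternate) in random_rows.items():
    occ_arbitrary = occurrences(arbitrary, 10)
    occ_alternate = occurrences(alternate, 10)
    average = (occ_arbitrary + occ_alternate) / 2
    print(f"{region}: {occ_arbitrary} {occ_alternate} average {average:.2f}")

# Real rows: hits on the negative and positive strands, negative direction.
print(mean_and_half_range([0, 1]))  # UTR: (0.5, 0.5)
print(mean_and_half_range([2, 0]))  # Distal: (1.0, 1.0)
```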

Comparison:

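The occurrences of real HMG boxes exceed the random averages only in the negative-direction distal promoters (1 versus 0.8); in the UTRs (0.5 versus 0.6) and the positive-direction distal promoters (0 versus 1.35) they do not. This suggests that the real HMG boxes in the negative-direction distal promoters may be active or activable.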

Helper site samplings

The Basic programs testing the consensus sequence (C/G)C(C/G)G(C/G) (starting with SuccessablesHelp.bas) compare nucleotide sequences with the sequences on either the template strand (-) or the coding strand (+) of the DNA, in the negative direction (-) or the positive direction (+). For each strand and direction, the list gives the number of matches and what was found:

  1. Negative strand, negative direction: 16, CCGGG at 3929, CCGGC at 3874, CCGGG at 3576, CCCGG at 3567, CCCGC at 3044, GCCGC at 2726, CCCGC at 2723, CCCGC at 2012, CCCGC at 1894, CCCGC at 1808, CCCGC at 1759, CCCGC at 1241, GCGGC at 1154, CCGGC at 512, CCCGG at 511, CCGGC at 373.
  2. Positive strand, negative direction: 17, GCCGG at 4324, GCGGG at 4000, GCGGG at 3091, GCGGC at 2725, CCGGG at 2318, GCCGG at 2317, CCGGG at 2192, GCGGC at 1753, GCGGG at 1681, GCGGG at 1251, CCGGG at 1239, GCCGG at 1238, GCGGC at 957, CCGGG at 514, GCCGG at 513, CCGGG at 375, GCCGG at 374.
  3. Negative strand, positive direction: 64, GCGGG at 4440, GCGGG at 4430, CCGGG at 4245, CCCGC at 4237, CCGGC at 4003, CCCGG at 4002, GCGGG at 3671, CCGGG at 3558, CCCGG at 3557, CCCGC at 3325, GCCGC at 3226, GCGGG at 2486, GCCGC at 2355, GCGGC at 1902, CCGGC at 1847, GCGGC at 1794, GCGGG at 1765, GCCGG at 1759, GCCGC at 1756, GCGGG at 1707, GCGGG at 1681, GCCGC at 1648, GCGGC at 1637, GCGGG at 1591, GCGGC at 1582, CCGGC at 1547, GCGGC at 1438, GCGGC at 1423, GCGGC at 1338, GCGGC at 1323, CCGGC at 1295, GCGGC at 1255, CCGGC at 1211, GCGGC at 1171, GCGGC at 1148, CCGGC at 1043, GCGGC at 1034, GCGGG at 1026, GCGGC at 1003, GCGGG at 972, CCGGG at 911, GCCGG at 910, GCCGC at 903, GCGGG at 872, CCGGG at 811, GCCGG at 810, GCGGC at 751, CCCGG at 743, GCGGC at 721, GCGGC at 667, GCGGC at 637, GCGGC at 583, GCGGC at 499, GCGGG at 490, GCCGG at 484, CCGGG at 477, GCCGG at 476, CCGGG at 443, GCGGG at 407, CCCGC at 393, CCGGC at 376, CCCGG at 375, GCGGC at 354, GCGGC at 332.
  4. Positive strand, positive direction: 68, CCCGC at 4438, CCCGC at 4428, CCCGG at 4304, GCGGG at 4292, CCGGG at 4228, CCCGG at 4227, CCGGG at 3500, CCCGG at 3499, GCCGC at 1918, CCCGC at 1900, GCCGG at 1848, GCCGG at 1795, CCCGC at 1792, GCGGG at 1770, CCCGC at 1767, GCGGC at 1758, CCGGC at 1755, CCCGG at 1754, CCGGG at 1739, CCCGC at 1717, GCGGG at 1673, GCGGG at 1657, CCGGC at 1647, CCCGG at 1646, GCCGC at 1583, CCGGG at 1570, CCCGC at 1562, GCCGC at 1548, GCGGG at 1499, CCGGC at 1486, GCGGG at 1399, CCGGC at 1386, GCCGC at 1296, GCGGG at 1247, CCCGC at 1226, GCCGC at 1212, GCCGG at 1172, GCGGC at 1163, CCGGG at 1150, GCCGG at 1149, GCGGC at 1079, GCCGC at 1044, GCCGG at 981, CCCGC at 974, CCGGC at 950, GCCGG at 881, CCCGC at 874, CCGGC at 850, CCGGC at 765, GCCGG at 764, GCCGG at 722, GCCGC at 638, GCCGC at 540, GCGGG at 453, CCCGC at 445, CCGGG at 421, CCCGG at 420, CCCGC at 405, CCGGG at 390, CCCGG at 389, CCGGG at 372, GCCGC at 355, CCCGC at 352, GCCGG at 326, CCCGG at 283, CCCGG at 248, CCGGG at 200, CCGGG at 93.
  5. Helper sites: the complement inverse (ci) of (C/G)C(C/G)G(C/G) is the same as the direct sequence (C/G)C(C/G)G(C/G), so the direct searches above also cover it.
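Two properties of the Helper site search can be illustrated with a small sketch: adjacent hits such as "CCGGG at 2318, GCCGG at 2317" overlap, so the scan has to report overlapping matches, and the set of sequences matching (C/G)C(C/G)G(C/G) is unchanged by reverse complementation. The code is an illustrative Python sketch rather than the original Basic programs, and the toy sequence and 1-based positions are assumptions.

```python
import re
from itertools import product

# Lookahead with a capture group so that overlapping hits are all reported.
HELPER = re.compile(r"(?=([CG]C[CG]G[CG]))")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def helper_sites(sequence):
    """Return (matched text, 1-based start) for every Helper site match."""
    return [(m.group(1), m.start() + 1) for m in HELPER.finditer(sequence)]


def reverse_complement(seq):
    """Return the reverse complement (the 'complement inverse') of a sequence."""
    return seq.translate(COMPLEMENT)[::-1]


toy = "ATGCCGGGTA"  # hypothetical sequence, for illustration only
print(helper_sites(toy))  # [('GCCGG', 3), ('CCGGG', 4)]

# All eight concrete sequences covered by (C/G)C(C/G)G(C/G).
helpers = {a + "C" + b + "G" + c for a, b, c in product("CG", repeat=3)}
# Reverse-complementing each one gives back the same set of eight.
print({reverse_complement(h) for h in helpers} == helpers)  # True
```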

Helper (4560-2846) UTRs

  1. Negative strand, negative direction: CCGGG at 3929, CCGGC at 3874, CCGGG at 3576, CCCGG at 3567, CCCGC at 3044.
  2. Positive strand, negative direction: GCCGG at 4324, GCGGG at 4000, GCGGG at 3091.

Helper positive direction (4445-4265) core promoters

  1. Negative strand, positive direction: GCGGG at 4440, GCGGG at 4430.
  2. Positive strand, positive direction: CCCGC at 4438, CCCGC at 4428, CCCGG at 4304, GCGGG at 4292.

Helper negative direction (2811-2596) proximal promoters

  1. Negative strand, negative direction: GCCGC at 2726, CCCGC at 2723.
  2. Positive strand, negative direction: GCGGC at 2725.

Helper positive direction (4265-4050) proximal promoters

  1. Negative strand, positive direction: CCGGG at 4245, CCCGC at 4237.
  2. Positive strand, positive direction: CCGGG at 4228, CCCGG at 4227.

Helper negative direction (2596-1) distal promoters

  1. Negative strand, negative direction: CCCGC at 2012, CCCGC at 1894, CCCGC at 1808, CCCGC at 1759, CCCGC at 1241, GCGGC at 1154, CCGGC at 512, CCCGG at 511, CCGGC at 373.
  2. Positive strand, negative direction: CCGGG at 2318, GCCGG at 2317, CCGGG at 2192, GCGGC at 1753, GCGGG at 1681, GCGGG at 1251, CCGGG at 1239, GCCGG at 1238, GCGGC at 957, CCGGG at 514, GCCGG at 513, CCGGG at 375, GCCGG at 374.

Helper positive direction (4050-1) distal promoters

  1. Negative strand, positive direction: CCGGC at 4003, CCCGG at 4002, GCGGG at 3671, CCGGG at 3558, CCCGG at 3557, CCCGC at 3325, GCCGC at 3226, GCGGG at 2486, GCCGC at 2355, GCGGC at 1902, CCGGC at 1847, GCGGC at 1794, GCGGG at 1765, GCCGG at 1759, GCCGC at 1756, GCGGG at 1707, GCGGG at 1681, GCCGC at 1648, GCGGC at 1637, GCGGG at 1591, GCGGC at 1582, CCGGC at 1547, GCGGC at 1438, GCGGC at 1423, GCGGC at 1338, GCGGC at 1323, CCGGC at 1295, GCGGC at 1255, CCGGC at 1211, GCGGC at 1171, GCGGC at 1148, CCGGC at 1043, GCGGC at 1034, GCGGG at 1026, GCGGC at 1003, GCGGG at 972, CCGGG at 911, GCCGG at 910, GCCGC at 903, GCGGG at 872, CCGGG at 811, GCCGG at 810, GCGGC at 751, CCCGG at 743, GCGGC at 721, GCGGC at 667, GCGGC at 637, GCGGC at 583, GCGGC at 499, GCGGG at 490, GCCGG at 484, CCGGG at 477, GCCGG at 476, CCGGG at 443, GCGGG at 407, CCCGC at 393, CCGGC at 376, CCCGG at 375, GCGGC at 354, GCGGC at 332.
  2. Positive strand, positive direction: CCGGG at 3500, CCCGG at 3499, GCCGC at 1918, CCCGC at 1900, GCCGG at 1848, GCCGG at 1795, CCCGC at 1792, GCGGG at 1770, CCCGC at 1767, GCGGC at 1758, CCGGC at 1755, CCCGG at 1754, CCGGG at 1739, CCCGC at 1717, GCGGG at 1673, GCGGG at 1657, CCGGC at 1647, CCCGG at 1646, GCCGC at 1583, CCGGG at 1570, CCCGC at 1562, GCCGC at 1548, GCGGG at 1499, CCGGC at 1486, GCGGG at 1399, CCGGC at 1386, GCCGC at 1296, GCGGG at 1247, CCCGC at 1226, GCCGC at 1212, GCCGG at 1172, GCGGC at 1163, CCGGG at 1150, GCCGG at 1149, GCGGC at 1079, GCCGC at 1044, GCCGG at 981, CCCGC at 974, CCGGC at 950, GCCGG at 881, CCCGC at 874, CCGGC at 850, CCGGC at 765, GCCGG at 764, GCCGG at 722, GCCGC at 638, GCCGC at 540, GCGGG at 453, CCCGC at 445, CCGGG at 421, CCCGG at 420, CCCGC at 405, CCGGG at 390, CCCGG at 389, CCGGG at 372, GCCGC at 355, CCCGC at 352, GCCGG at 326, CCCGG at 283, CCCGG at 248, CCGGG at 200, CCGGG at 93.
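The sorting of hits into UTRs, core, proximal and distal promoters follows the coordinate ranges given in the headings above, and can be sketched as a lookup. The ranges are copied from those headings; treating a shared boundary coordinate (for example 2811) as belonging to the first region listed, and labelling positive-direction positions above 4445 "unassigned", are assumptions made only for this illustration.

```python
# (region name, low, high) per direction, from the section headings above.
REGIONS = {
    "negative": [
        ("UTR", 2846, 4560),
        ("core", 2811, 2846),
        ("proximal", 2596, 2811),
        ("distal", 1, 2596),
    ],
    "positive": [
        ("core", 4265, 4445),
        ("proximal", 4050, 4265),
        ("distal", 1, 4050),
    ],
}


def classify(position, direction):
    """Return the region containing a hit position for the given direction."""
    for name, low, high in REGIONS[direction]:
        if low <= position <= high:  # first listed region wins on a boundary
            return name
    return "unassigned"


# Positions taken from the negative strand, negative direction list.
for position in (3929, 2726, 2012):
    print(position, classify(position, "negative"))
# Positions taken from the negative strand, positive direction list.
for position in (4440, 4245, 4003):
    print(position, classify(position, "positive"))
```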

Helper site random dataset samplings

  1. Helpr0: 42, CCGGG at 4333, GCCGG at 4332, CCCGC at 4329, GCCGG at 4259, CCGGC at 4250, CCCGG at 4249, CCGGC at 4078, GCCGG at 4077, CCGGC at 3920, CCCGG at 3919, CCCGG at 3910, GCGGG at 3899, CCCGC at 3889, CCGGG at 3829, CCCGG at 3828, GCGGC at 3547, GCGGC at 3444, CCCGC at 3432, GCCGC at 3406, GCCGC at 3241, GCCGG at 3215, GCGGC at 3112, CCGGG at 3044, GCCGG at 3043, CCGGC at 3004, CCCGG at 3003, CCGGG at 2765, CCCGG at 2631, CCCGC at 2404, GCCGC at 2379, GCGGG at 2325, GCGGG at 2070, CCCGC at 1891, CCCGG at 1684, CCGGC at 1603, CCCGG at 1602, GCCGC at 1383, GCGGG at 970, CCGGG at 822, CCCGG at 821, CCCGC at 624, GCCGC at 370.
  2. Helpr1: 45, CCGGC at 4520, GCCGG at 4480, CCGGC at 4471, CCCGG at 4470, CCGGC at 4305, CCGGC at 4296, CCCGG at 4295, CCCGC at 4280, CCGGC at 4135, CCCGG at 4134, GCCGC at 4114, GCGGC at 3801, CCGGC at 3709, GCCGG at 3708, GCGGC at 3104, GCGGG at 2792, GCGGG at 2774, CCGGG at 2721, GCCGG at 2720, CCCGG at 2667, CCGGG at 2085, CCGGG at 2048, GCCGG at 2047, GCGGC at 1913, CCGGG at 1900, CCCGG at 1899, GCGGG at 1812, CCCGG at 1786, CCCGG at 1418, CCCGC at 1353, GCCGC at 1060, CCCGG at 936, CCGGG at 818, CCGGC at 720, CCCGG at 719, CCGGC at 682, CCCGG at 681, GCGGG at 601, GCCGG at 507, CCCGC at 434, CCGGG at 395, CCCGC at 232, CCGGC at 122, CCCGG at 121, CCCGC at 80.
  3. Helpr2: 56, CCCGC at 4508, GCGGC at 4348, CCGGC at 4228, CCCGG at 4227, CCGGG at 4096, GCCGG at 4095, GCCGG at 4021, GCGGG at 3976, CCGGC at 3973, GCGGG at 3910, CCCGC at 3872, CCCGG at 3864, GCGGC at 3749, GCGGC at 3670, GCCGC at 3585, CCGGG at 3498, CCCGG at 3497, CCGGG at 3293, CCCGG at 3292, GCGGG at 2819, CCGGG at 2814, CCCGG at 2813, CCGGC at 2616, GCCGC at 2597, GCGGC at 2449, CCGGG at 2155, CCCGG at 2154, CCGGG at 2077, GCCGG at 2076, GCGGG at 2037, GCCGC at 1965, GCGGG at 1932, CCGGC at 1908, GCCGC at 1816, CCCGC at 1805, CCGGG at 1666, CCGGG at 1481, CCCGG at 1480, CCGGC at 1303, CCCGG at 1302, CCGGG at 1132, CCGGC at 962, CCCGG at 961, CCCGG at 813, CCGGG at 802, GCCGG at 801, CCGGC at 798, CCGGG at 762, GCCGG at 761, CCCGG at 632, GCGGG at 463, CCGGC at 331, GCCGG at 330, CCGGG at 326, CCCGG at 325, GCGGG at 154.
  4. Helpr3: 50, GCGGG at 4492, CCGGG at 4142, CCCGG at 4141, GCCGC at 4137, CCGGC at 3837, CCCGC at 3811, CCGGG at 3750, CCCGG at 3749, CCCGC at 3728, CCGGG at 3659, CCCGG at 3475, CCGGG at 3424, GCCGG at 3423, GCGGC at 3413, CCGGG at 3401, CCCGG at 3400, GCGGG at 2912, GCCGG at 2896, GCCGG at 2884, GCCGC at 2791, CCGGC at 2724, GCGGC at 2717, GCGGG at 2657, CCCGC at 2521, GCCGC at 2375, CCGGG at 2309, CCCGG at 2032, CCGGG at 1991, GCCGG at 1990, CCGGC at 1967, CCCGG at 1966, CCCGC at 1865, GCGGC at 1764, CCGGG at 1608, CCCGG at 1607, CCGGC at 1455, GCCGG at 1454, GCCGC at 1451, GCGGC at 1442, GCGGG at 1314, CCGGC at 868, GCCGG at 867, CCGGC at 864, GCCGG at 863, CCGGC at 852, CCCGG at 851, CCCGG at 758, GCGGG at 519, CCGGC at 83, GCGGG at 9.
  5. RDr4: 0.
  6. RDr5: 0.
  7. RDr6: 0.
  8. RDr7: 0.
  9. RDr8: 0.
  10. RDr9: 0.
  11. RDr0ci: 0.
  12. RDr1ci: 0.
  13. RDr2ci: 0.
  14. RDr3ci: 0.
  15. RDr4ci: 0.
  16. RDr5ci: 0.
  17. RDr6ci: 0.
  18. RDr7ci: 0.
  19. RDr8ci: 0.
  20. RDr9ci: 0.

Helpr arbitrary (evens) (4560-2846) UTRs

  1. Helpr0: CCGGG at 4333, GCCGG at 4332, CCCGC at 4329, GCCGG at 4259, CCGGC at 4250, CCCGG at 4249, CCGGC at 4078, GCCGG at 4077, CCGGC at 3920, CCCGG at 3919, CCCGG at 3910, GCGGG at 3899, CCCGC at 3889, CCGGG at 3829, CCCGG at 3828, GCGGC at 3547, GCGGC at 3444, CCCGC at 3432, GCCGC at 3406, GCCGC at 3241, GCCGG at 3215, GCGGC at 3112, CCGGG at 3044, GCCGG at 3043, CCGGC at 3004, CCCGG at 3003.
  2. Helpr2: CCCGC at 4508, GCGGC at 4348, CCGGC at 4228, CCCGG at 4227, CCGGG at 4096, GCCGG at 4095, GCCGG at 4021, GCGGG at 3976, CCGGC at 3973, GCGGG at 3910, CCCGC at 3872, CCCGG at 3864, GCGGC at 3749, GCGGC at 3670, GCCGC at 3585, CCGGG at 3498, CCCGG at 3497, CCGGG at 3293, CCCGG at 3292.

Helpr alternate (odds) (4560-2846) UTRs

  1. Helpr1: CCGGC at 4520, GCCGG at 4480, CCGGC at 4471, CCCGG at 4470, CCGGC at 4305, CCGGC at 4296, CCCGG at 4295, CCCGC at 4280, CCGGC at 4135, CCCGG at 4134, GCCGC at 4114, GCGGC at 3801, CCGGC at 3709, GCCGG at 3708, GCGGC at 3104.
  2. Helpr3: GCGGG at 4492, CCGGG at 4142, CCCGG at 4141, GCCGC at 4137, CCGGC at 3837, CCCGC at 3811, CCGGG at 3750, CCCGG at 3749, CCCGC at 3728, CCGGG at 3659, CCCGG at 3475, CCGGG at 3424, GCCGG at 3423, GCGGC at 3413, CCGGG at 3401, CCCGG at 3400, GCGGG at 2912, GCCGG at 2896, GCCGG at 2884.

Helpr arbitrary negative direction (evens) (2846-2811) core promoters

  1. Helpr2: GCGGG at 2819, CCGGG at 2814, CCCGG at 2813.

Helpr alternate negative direction (odds) (2846-2811) core promoters

Helpr arbitrary positive direction (odds) (4445-4265) core promoters

  1. Helpr1: CCGGC at 4305, CCGGC at 4296, CCCGG at 4295, CCCGC at 4280.

Helpr alternate positive direction (evens) (4445-4265) core promoters

  1. Helpr0: CCGGG at 4333, GCCGG at 4332, CCCGC at 4329.
  2. Helpr2: GCGGC at 4348.

Helpr arbitrary negative direction (evens) (2811-2596) proximal promoters

  1. Helpr0: CCGGG at 2765, CCCGG at 2631.
  2. Helpr2: CCGGC at 2616, GCCGC at 2597.

Helpr alternate negative direction (odds) (2811-2596) proximal promoters

  1. Helpr1: GCGGG at 2792, GCGGG at 2774, CCGGG at 2721, GCCGG at 2720, CCCGG at 2667.
  2. Helpr3: GCCGC at 2791, CCGGC at 2724, GCGGC at 2717, GCGGG at 2657.

Helpr arbitrary positive direction (odds) (4265-4050) proximal promoters

  1. Helpr1: CCGGC at 4135, CCCGG at 4134, GCCGC at 4114.
  2. Helpr3: CCGGG at 4142, CCCGG at 4141, GCCGC at 4137.

Helpr alternate positive direction (evens) (4265-4050) proximal promoters

  1. Helpr0: GCCGG at 4259, CCGGC at 4250, CCCGG at 4249, CCGGC at 4078, GCCGG at 4077.
  2. Helpr2: CCGGC at 4228, CCCGG at 4227, CCGGG at 4096, GCCGG at 4095.

Helpr arbitrary negative direction (evens) (2596-1) distal promoters

  1. Helpr0: CCCGC at 2404, GCCGC at 2379, GCGGG at 2325, GCGGG at 2070, CCCGC at 1891, CCCGG at 1684, CCGGC at 1603, CCCGG at 1602, GCCGC at 1383, GCGGG at 970, CCGGG at 822, CCCGG at 821, CCCGC at 624, GCCGC at 370.
  2. Helpr2: GCGGC at 2449, CCGGG at 2155, CCCGG at 2154, CCGGG at 2077, GCCGG at 2076, GCGGG at 2037, GCCGC at 1965, GCGGG at 1932, CCGGC at 1908, GCCGC at 1816, CCCGC at 1805, CCGGG at 1666, CCGGG at 1481, CCCGG at 1480, CCGGC at 1303, CCCGG at 1302, CCGGG at 1132, CCGGC at 962, CCCGG at 961, CCCGG at 813, CCGGG at 802, GCCGG at 801, CCGGC at 798, CCGGG at 762, GCCGG at 761, CCCGG at 632, GCGGG at 463, CCGGC at 331, GCCGG at 330, CCGGG at 326, CCCGG at 325, GCGGG at 154.

Helpr alternate negative direction (odds) (2596-1) distal promoters

  1. Helper1: CCGGG at 2085, CCGGG at 2048, GCCGG at 2047, GCGGC at 1913, CCGGG at 1900, CCCGG at 1899, GCGGG at 1812, CCCGG at 1786, CCCGG at 1418, CCCGC at 1353, GCCGC at 1060, CCCGG at 936, CCGGG at 818, CCGGC at 720, CCCGG at 719, CCGGC at 682, CCCGG at 681, GCGGG at 601, GCCGG at 507, CCCGC at 434, CCGGG at 395, CCCGC at 232, CCGGC at 122, CCCGG at 121, CCCGC at 80.
  2. Helpr3: CCCGC at 2521, GCCGC at 2375, CCGGG at 2309, CCCGG at 2032, CCGGG at 1991, GCCGG at 1990, CCGGC at 1967, CCCGG at 1966, CCCGC at 1865, GCGGC at 1764, CCGGG at 1608, CCCGG at 1607, CCGGC at 1455, GCCGG at 1454, GCCGC at 1451, GCGGC at 1442, GCGGG at 1314, CCGGC at 868, GCCGG at 867, CCGGC at 864, GCCGG at 863, CCGGC at 852, CCCGG at 851, CCCGG at 758, GCGGG at 519, CCGGC at 83, GCGGG at 9.

Helpr arbitrary positive direction (odds) (4050-1) distal promoters

  1. Helper1: GCGGC at 3801, CCGGC at 3709, GCCGG at 3708, GCGGC at 3104, GCGGG at 2792, GCGGG at 2774, CCGGG at 2721, GCCGG at 2720, CCCGG at 2667, CCGGG at 2085, CCGGG at 2048, GCCGG at 2047, GCGGC at 1913, CCGGG at 1900, CCCGG at 1899, GCGGG at 1812, CCCGG at 1786, CCCGG at 1418, CCCGC at 1353, GCCGC at 1060, CCCGG at 936, CCGGG at 818, CCGGC at 720, CCCGG at 719, CCGGC at 682, CCCGG at 681, GCGGG at 601, GCCGG at 507, CCCGC at 434, CCGGG at 395, CCCGC at 232, CCGGC at 122, CCCGG at 121, CCCGC at 80.
  2. Helpr3: CCGGC at 3837, CCCGC at 3811, CCGGG at 3750, CCCGG at 3749, CCCGC at 3728, CCGGG at 3659, CCCGG at 3475, CCGGG at 3424, GCCGG at 3423, GCGGC at 3413, CCGGG at 3401, CCCGG at 3400, GCGGG at 2912, GCCGG at 2896, GCCGG at 2884, GCCGC at 2791, CCGGC at 2724, GCGGC at 2717, GCGGG at 2657, CCCGC at 2521, GCCGC at 2375, CCGGG at 2309, CCCGG at 2032, CCGGG at 1991, GCCGG at 1990, CCGGC at 1967, CCCGG at 1966, CCCGC at 1865, GCGGC at 1764, CCGGG at 1608, CCCGG at 1607, CCGGC at 1455, GCCGG at 1454, GCCGC at 1451, GCGGC at 1442, GCGGG at 1314, CCGGC at 868, GCCGG at 867, CCGGC at 864, GCCGG at 863, CCGGC at 852, CCCGG at 851, CCCGG at 758, GCGGG at 519, CCGGC at 83, GCGGG at 9.

Helpr alternate positive direction (evens) (4050-1) distal promoters

  1. Helpr0: CCGGC at 3920, CCCGG at 3919, CCCGG at 3910, GCGGG at 3899, CCCGC at 3889, CCGGG at 3829, CCCGG at 3828, GCGGC at 3547, GCGGC at 3444, CCCGC at 3432, GCCGC at 3406, GCCGC at 3241, GCCGG at 3215, GCGGC at 3112, CCGGG at 3044, GCCGG at 3043, CCGGC at 3004, CCCGG at 3003, CCGGG at 2765, CCCGG at 2631, CCCGC at 2404, GCCGC at 2379, GCGGG at 2325, GCGGG at 2070, CCCGC at 1891, CCCGG at 1684, CCGGC at 1603, CCCGG at 1602, GCCGC at 1383, GCGGG at 970, CCGGG at 822, CCCGG at 821, CCCGC at 624, GCCGC at 370.
  2. Helpr2: GCCGG at 4021, GCGGG at 3976, CCGGC at 3973, GCGGG at 3910, CCCGC at 3872, CCCGG at 3864, GCGGC at 3749, GCGGC at 3670, GCCGC at 3585, CCGGG at 3498, CCCGG at 3497, CCGGG at 3293, CCCGG at 3292, GCGGG at 2819, CCGGG at 2814, CCCGG at 2813, CCGGC at 2616, GCCGC at 2597, GCGGC at 2449, CCGGG at 2155, CCCGG at 2154, CCGGG at 2077, GCCGG at 2076, GCGGG at 2037, GCCGC at 1965, GCGGG at 1932, CCGGC at 1908, GCCGC at 1816, CCCGC at 1805, CCGGG at 1666, CCGGG at 1481, CCCGG at 1480, CCGGC at 1303, CCCGG at 1302, CCGGG at 1132, CCGGC at 962, CCCGG at 961, CCCGG at 813, CCGGG at 802, GCCGG at 801, CCGGC at 798, CCGGG at 762, GCCGG at 761, CCCGG at 632, GCGGG at 463, CCGGC at 331, GCCGG at 330, CCGGG at 326, CCCGG at 325, GCGGG at 154.
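
Each listing above follows a fixed "MOTIF at POSITION" pattern, so the entries can be tallied mechanically. The following is a minimal sketch only: the function names `parse_listing` and `count_in_region` are hypothetical, and treating a region's boundaries as inclusive is an assumption, since the listings do not say how boundary positions are assigned.

```python
import re
from collections import Counter

# One listing entry looks like "CCGGG at 4333".
ENTRY = re.compile(r"([ACGT]+) at (\d+)")

def parse_listing(line):
    """Turn 'CCGGG at 4333, GCCGG at 4332.' into [('CCGGG', 4333), ('GCCGG', 4332)]."""
    return [(motif, int(pos)) for motif, pos in ENTRY.findall(line)]

def count_in_region(hits, low, high):
    """Count hits whose position lies in the inclusive range low..high (assumed convention)."""
    return sum(1 for _, pos in hits if low <= pos <= high)

# Entry copied from the Helpr0 listing for the positive-direction core promoter.
line = "Helpr0: CCGGG at 4333, GCCGG at 4332, CCCGC at 4329."
hits = parse_listing(line)
motif_counts = Counter(motif for motif, _ in hits)
core_positive = count_in_region(hits, 4265, 4445)
```

Here all three entries fall inside the 4445-4265 core promoter range, so `core_positive` is 3.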

Helper site analysis and results

"The C clamp carries specificity for a secondary, GC-rich sequence called a “Helper site” [(C/G)C(C/G)G(C/G)] that can occur with variable spacing and orientation relative to the Wnt response element (Atcha et al. 2007; Chang et al. 2008)."[5]

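The degenerate Helper site consensus (C/G)C(C/G)G(C/G) quoted above maps directly onto a regular expression. The sketch below is illustrative only: `find_helper_sites` is a hypothetical name, the toy sequence is not taken from the gene analysed here, and reporting 1-based start positions is an assumption about the numbering convention.

```python
import re

# (C/G)C(C/G)G(C/G) as a regular expression. The lookahead lets overlapping
# matches be reported, such as CCCGG and CCGGG at adjacent positions.
HELPER_SITE = re.compile(r"(?=([CG]C[CG]G[CG]))")

def find_helper_sites(sequence):
    """Return (motif, 1-based start position) pairs for every Helper site match."""
    seq = sequence.upper()
    return [(m.group(1), m.start() + 1) for m in HELPER_SITE.finditer(seq)]

# Hypothetical toy sequence used only to exercise the function.
example = "ATCCCGGGTTGCGGCAT"
hits = find_helper_sites(example)
```

For this toy sequence `hits` contains CCCGG at 3, CCGGG at 4 and GCGGC at 11. Adjacent overlapping matches of this kind also appear in the listings, for example CCGGG at 4333 next to GCCGG at 4332.
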
Reals or randoms | Promoters | Direction | Numbers | Strands | Occurrences | Averages (± 0.1)
Reals | UTR | negative | 8 | 2 | 4 | 4 ± 1 (--5,+-3)
Randoms | UTR | arbitrary negative | 0 | 10 | 0 | 0
Randoms | UTR | alternate negative | 0 | 10 | 0 | 0
Reals | Core | negative | 0 | 2 | 0 | 0
Randoms | Core | arbitrary negative | 0 | 10 | 0 | 0
Randoms | Core | alternate negative | 0 | 10 | 0 | 0
Reals | Core | positive | 6 | 2 | 3 | 3 ± 1 (-+2,++4)
Randoms | Core | arbitrary positive | 0 | 10 | 0 | 0
Randoms | Core | alternate positive | 0 | 10 | 0 | 0
Reals | Proximal | negative | 3 | 2 | 1.5 | 1.5 ± 0.5 (--2,+-1)
Randoms | Proximal | arbitrary negative | 0 | 10 | 0 | 0
Randoms | Proximal | alternate negative | 0 | 10 | 0 | 0
Reals | Proximal | positive | 4 | 2 | 2 | 2 ± 0 (-+2,++2)
Randoms | Proximal | arbitrary positive | 0 | 10 | 0 | 0
Randoms | Proximal | alternate positive | 0 | 10 | 0 | 0
Reals | Distal | negative | 22 | 2 | 11 | 11 ± 2 (--9,+-13)
Randoms | Distal | arbitrary negative | 0 | 10 | 0 | 0
Randoms | Distal | alternate negative | 0 | 10 | 0 | 0
Reals | Distal | positive | 122 | 2 | 61 | 61 ± 1 (-+60,++62)
Randoms | Distal | arbitrary positive | 0 | 10 | 0 | 0
Randoms | Distal | alternate positive | 0 | 10 | 0 | 0

Comparison:

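Real Helper sites occur in every region except the negative-direction core promoter, whereas the table records no occurrences for the random datasets. Wherever real Helper sites occur, their occurrences are therefore greater than the randoms. This suggests that the real Helper sites are likely active or activable.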
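
The averages in the table can be reproduced from the per-strand counts given in parentheses. The sketch below assumes that the ± value is half the difference between the two strand counts, which is consistent with every row of the real data but is not stated explicitly; `summarize` is a hypothetical helper name.

```python
def summarize(strand_counts):
    """Return (total, mean, half_range) for per-strand occurrence counts."""
    total = sum(strand_counts)
    mean = total / len(strand_counts)
    # Assumption: the table's ± value is half the spread between the strands.
    half_range = (max(strand_counts) - min(strand_counts)) / 2
    return total, mean, half_range

# Per-strand counts copied from the parenthesised values in the table,
# e.g. (--5,+-3) for the UTR row becomes (5, 3).
real_helper_counts = {
    "UTR negative": (5, 3),
    "Core positive": (2, 4),
    "Proximal negative": (2, 1),
    "Proximal positive": (2, 2),
    "Distal negative": (9, 13),
    "Distal positive": (60, 62),
}
summary = {region: summarize(counts) for region, counts in real_helper_counts.items()}
```

For the UTR row this gives a total of 8, a mean of 4 and a ± of 1, matching "8, 4, 4 ± 1" in the table.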

C clamp samplings

The Basic programs testing the consensus sequence AAAAAAAA (starting with SuccessablesAAA.bas) were written to compare nucleotide sequences with the sequences on either the template strand (-) or the coding strand (+) of the DNA, in the negative direction (-) or the positive direction (+). For each program, the list below gives the strand, the direction, the sequence searched for, and the number of occurrences found:

  1. negative strand, negative direction, looking for AAAAAAAA, 0.
  2. positive strand, negative direction, looking for AAAAAAAA, 0.
  3. negative strand, positive direction, looking for AAAAAAAA, 0.
  4. positive strand, positive direction, looking for AAAAAAAA, 0.
  5. inverse complement, negative strand, negative direction, looking for TTTTTTTT, 0.
  6. inverse complement, positive strand, negative direction, looking for TTTTTTTT, 0.
  7. inverse complement, negative strand, positive direction, looking for TTTTTTTT, 0.
  8. inverse complement, positive strand, positive direction, looking for TTTTTTTT, 0.
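
The eight searches listed above amount to counting a consensus sequence and its inverse complement on each strand. The following Python sketch is not the original Basic code. It assumes that "inverse complement" means the reverse complement and that the template strand is the base-by-base complement of the coding strand; it does not model the negative and positive directions, which would correspond to running the same search over different coordinate ranges. All function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def complement(seq):
    """Base-by-base complement, read in the same left-to-right order."""
    return seq.translate(COMPLEMENT)

def inverse_complement(seq):
    """Complement read in the opposite order (the reverse complement)."""
    return complement(seq)[::-1]

def count_overlapping(sequence, motif):
    """Count occurrences of motif in sequence, including overlapping ones."""
    count = 0
    start = sequence.find(motif)
    while start != -1:
        count += 1
        start = sequence.find(motif, start + 1)
    return count

def search_both_strands(coding_strand, motif):
    """Count the motif and its inverse complement on the (-) and (+) strands."""
    template_strand = complement(coding_strand)
    queries = {"consensus": motif, "inverse complement": inverse_complement(motif)}
    return {
        (label, strand_name): count_overlapping(strand, query)
        for label, query in queries.items()
        for strand_name, strand in (("-", template_strand), ("+", coding_strand))
    }

# Hypothetical toy sequence: nine consecutive A bases flanked by GG and CC.
toy = "GG" + "A" * 9 + "CC"
results = search_both_strands(toy, "AAAAAAAA")
```

With this toy input the consensus is found twice on the (+) strand and its inverse complement TTTTTTTT twice on the (-) strand; the other two counts are zero.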

AAA (4560-2846) UTRs

AAA negative direction (2846-2811) core promoters

AAA positive direction (4445-4265) core promoters

AAA negative direction (2811-2596) proximal promoters

AAA positive direction (4265-4050) proximal promoters

AAA negative direction (2596-1) distal promoters

AAA positive direction (4050-1) distal promoters

C clamp random dataset samplings

  1. RDr0: 0.
  2. RDr1: 0.
  3. RDr2: 0.
  4. RDr3: 0.
  5. RDr4: 0.
  6. RDr5: 0.
  7. RDr6: 0.
  8. RDr7: 0.
  9. RDr8: 0.
  10. RDr9: 0.
  11. RDr0ci: 0.
  12. RDr1ci: 0.
  13. RDr2ci: 0.
  14. RDr3ci: 0.
  15. RDr4ci: 0.
  16. RDr5ci: 0.
  17. RDr6ci: 0.
  18. RDr7ci: 0.
  19. RDr8ci: 0.
  20. RDr9ci: 0.
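
Zero counts are what chance alone would predict for sequences of this length. The sketch below estimates the expected number of matches under two assumptions that are not taken from the source: the random datasets have independent, equally likely bases, and each is 4560 nucleotides long (the largest coordinate used in the region ranges). The names `match_probability` and `expected_hits` are hypothetical.

```python
from fractions import Fraction

# Number of bases each symbol allows; S is the IUPAC code for C or G.
IUPAC_OPTIONS = {"A": 1, "C": 1, "G": 1, "T": 1, "S": 2}

def match_probability(motif):
    """Probability that one window matches, assuming uniform independent bases."""
    p = Fraction(1)
    for symbol in motif:
        p *= Fraction(IUPAC_OPTIONS[symbol], 4)
    return p

def expected_hits(motif, length):
    """Expected matches on one strand of a random sequence of the given length."""
    windows = max(length - len(motif) + 1, 0)
    return windows * match_probability(motif)

SEQUENCE_LENGTH = 4560  # assumption: taken from the largest region coordinate

expected_a8 = expected_hits("AAAAAAAA", SEQUENCE_LENGTH)
expected_c_clamp = expected_hits("SCTTTGATS", SEQUENCE_LENGTH)
```

Both expectations are below 0.1 matches per strand (4553/65536 and 4552/65536), so a random dataset returning 0 is the most likely outcome under these assumptions.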

RDr arbitrary (evens) (4560-2846) UTRs

RDr alternate (odds) (4560-2846) UTRs

RDr arbitrary negative direction (evens) (2846-2811) core promoters

RDr alternate negative direction (odds) (2846-2811) core promoters

RDr arbitrary positive direction (odds) (4445-4265) core promoters

RDr alternate positive direction (evens) (4445-4265) core promoters

RDr arbitrary negative direction (evens) (2811-2596) proximal promoters

RDr alternate negative direction (odds) (2811-2596) proximal promoters

RDr arbitrary positive direction (odds) (4265-4050) proximal promoters

RDr alternate positive direction (evens) (4265-4050) proximal promoters

RDr arbitrary negative direction (evens) (2596-1) distal promoters

RDr alternate negative direction (odds) (2596-1) distal promoters

RDr arbitrary positive direction (odds) (4050-1) distal promoters

RDr alternate positive direction (evens) (4050-1) distal promoters

C clamp analysis and results

"The domain [SCTTTGATS] is called the “C clamp” to highlight the absolute requirement for four cysteine residues in DNA binding (Atcha et al. 2007) [...]."[5]

Reals or randoms | Promoters | Direction | Numbers | Strands | Occurrences | Averages (± 0.1)
Reals | UTR | negative | 0 | 2 | 0 | 0
Randoms | UTR | arbitrary negative | 0 | 10 | 0 | 0
Randoms | UTR | alternate negative | 0 | 10 | 0 | 0
Reals | Core | negative | 0 | 2 | 0 | 0
Randoms | Core | arbitrary negative | 0 | 10 | 0 | 0
Randoms | Core | alternate negative | 0 | 10 | 0 | 0
Reals | Core | positive | 0 | 2 | 0 | 0
Randoms | Core | arbitrary positive | 0 | 10 | 0 | 0
Randoms | Core | alternate positive | 0 | 10 | 0 | 0
Reals | Proximal | negative | 0 | 2 | 0 | 0
Randoms | Proximal | arbitrary negative | 0 | 10 | 0 | 0
Randoms | Proximal | alternate negative | 0 | 10 | 0 | 0
Reals | Proximal | positive | 0 | 2 | 0 | 0
Randoms | Proximal | arbitrary positive | 0 | 10 | 0 | 0
Randoms | Proximal | alternate positive | 0 | 10 | 0 | 0
Reals | Distal | negative | 0 | 2 | 0 | 0
Randoms | Distal | arbitrary negative | 0 | 10 | 0 | 0
Randoms | Distal | alternate negative | 0 | 10 | 0 | 0
Reals | Distal | positive | 0 | 2 | 0 | 0
Randoms | Distal | arbitrary positive | 0 | 10 | 0 | 0
Randoms | Distal | alternate positive | 0 | 10 | 0 | 0

Comparison:

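No C clamp occurrences were found in either the real sequences or the random datasets, so the occurrences of real C clamps are not greater than the randoms. These results give no indication that the real C clamps are active or activable.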

References

  1. Gregory P. Copenhaver, Christopher D. Putnam, Michael L. Denton and Craig S. Pikaard (1994). "The RNA polymerase I transcription factor UBF is a sequence-tolerant HMG-box protein that can recognize structured nucleic acids" (PDF). Nucleic Acids Research. 22 (13): 2651–7. Retrieved 2017-04-05.
  2. Vincent Laudet, Dominique Stehelin and Hans Clevers (1993). "Ancestry and diversity of the HMG box superfamily" (PDF). Nucleic Acids Research. 21 (10): 2493–501. Retrieved 2017-04-05.
  3. Marc van de Wetering, Mariette Oosterwegel, Klaske van Norren and Hans Clevers (1993). "Sox-4, an Sry-like HMG box protein, is a transcriptional activator in lymphocytes" (PDF). The EMBO Journal. 12 (10): 3847–3854. Retrieved 2017-02-13.
  4. Tomas Valenta, Jan Lukas, Vladimir Korinek (2003). "HMG box transcription factor TCF‐4's interaction with CtBP1 controls the expression of the Wnt target Axin2/Conductin in human embryonic kidney cells". Nucleic Acids Research. 31 (9): 2369–80. doi:10.1093/nar/gkg346. Retrieved 2017-04-05.
  5. Ken M. Cadigan and Marian L. Waterman (November 2012). "TCF/LEFs and Wnt Signaling in the Nucleus". Cold Spring Harbor Perspectives in Biology. 4 (11): a007906. doi:10.1101/cshperspect.a007906. PMID 23024173. Retrieved 2023-05-05.
  6. RefSeq (October 2016). "TCF7 transcription factor 7 [ Homo sapiens (human) ]". 8600 Rockville Pike, Bethesda MD, 20894 USA: National Center for Biotechnology Information, U.S. National Library of Medicine. Retrieved 2020-04-30.
  7. RefSeq (8 February 2019). "TCF7L2 transcription factor 7 like 2 [ Homo sapiens (human) ]". 8600 Rockville Pike, Bethesda MD, 20894 USA: National Center for Biotechnology Information, U.S. National Library of Medicine. Retrieved 2020-04-30.
  8. RefSeq (October 2009). "LEF1 lymphoid enhancer binding factor 1 [ Homo sapiens (human) ]". 8600 Rockville Pike, Bethesda MD, 20894 USA: National Center for Biotechnology Information, U.S. National Library of Medicine. Retrieved 2020-04-05.

External links