HMG box gene transcriptions

Image (File:Large lymphocytes-9.JPG): a large lymphocyte. Credit: Guy Waterval.

"Upstream Binding Factor (UBF) is important for activation of ribosomal RNA transcription and belongs to a family of proteins containing nucleic acid binding domains, termed HMG-boxes, with similarity to High Mobility Group (HMG) chromosomal proteins."[1]

Chromosomal proteins

"Most HMG box proteins contain two or more HMG boxes and appear to bind DNA in a relatively sequence-aspecific manner (5, 13, 15, 16 and references therein). [...] they all appear to bind to the minor groove of the A/T A/T C A A A G-motif (10, 14, 18-20)."[2]

"Previous studies in lymphocytes have described two DNA-binding HMG box proteins, TCF-1 and LEF-1, with affinity for the A/TA/TCAAAG motif found in several T cell-specific enhancers."[3]

"The high mobility group-1 (HMG) box was originally identified by Tjian and co-workers in the transcription factor UBF as a region of homology to HMG-1 proteins (Jantzen et al., 1990). UBF reportedly contained four such regions of ~80 amino acids; one of these boxes was shown to mediate DNA binding."[3]

"Interestingly, the sequence-specific HMG boxes characterized to date display high affinity to the A/TA/TCAAAG motif despite a low level of amino acid homology (typically <25% identity)."[3]

"Human LEF-1 was originally identified as a T cell-specific protein binding to the TTCAAAG motif in the TCR-α enhancer (Waterman et al., 1991)."[3]

"As analysed by gel retardation, the Sox-4 HMG box indeed bound to the AACAAAG motif (probe MWε-1; Figure 2B, lane 1). As described for other HMG boxes, Sox-4 interacted with DNA bases within the minor groove: substitution of A/T pairs for I/C pairs, which leaves the surface of the minor groove intact (Star and Hawley, 1991), had no apparent effect on binding affinity (lanes 2 and 4)."[3]

Consensus sequences

"In mammals, the Tcf/Lef family consists of four genes: Tcf‐1, Lef‐1, Tcf‐3 and Tcf‐4. All TCF/LEF proteins display several common structural features (48,49). They contain a nearly identical DNA‐binding domain, the HMG box, recognizing the consensus sequence A/T A/T CAAA."[4]

"Both directed and random screen studies have identified a consensus recognition sequence for the HMG DBD; 5′-SCTTTGATS-3′ [...] (van de Wetering et al. 1997; van Beest et al. 2000; Hallikas and Taipale 2006; Atcha et al. 2007)."[5]

"The domain [a second DNA-binding domain of the protein, not the SCTTTGATS sequence itself] is called the “C clamp” to highlight the absolute requirement for four cysteine residues in DNA binding (Atcha et al. 2007) [...]."[5]

"The C clamp carries specificity for a secondary, GC-rich sequence called a “Helper site” [(C/G)C(C/G)G(C/G)] that can occur with variable spacing and orientation relative to the Wnt response element (Atcha et al. 2007; Chang et al. 2008)."[5]

High mobility group proteins

Gene ID: 6932 is TCF7 transcription factor 7 on 5q31.1: "This gene encodes a member of the T-cell factor/lymphoid enhancer-binding factor family of high mobility group (HMG) box transcriptional activators. This gene is expressed predominantly in T-cells and plays a critical role in natural killer cell and innate lymphoid cell development. The encoded protein forms a complex with beta-catenin and activates transcription through a Wnt/beta-catenin signaling pathway. Mice with a knockout of this gene are viable and fertile, but display a block in T-lymphocyte differentiation. Alternative splicing results in multiple transcript variants. Naturally-occurring isoforms lacking the N-terminal beta-catenin interaction domain may act as dominant negative regulators of Wnt signaling."[6]

  1. NP_001128323.2 transcription factor 7 isoform 3: "Transcript Variant: This variant (3, also known as A), differs in the 5' UTR, has multiple coding region differences, uses a downstream start codon, and differs in the 3' UTR, compared to variant 1. The resulting isoform (3) is shorter at the N-terminus and has a distinct C-terminus, compared to isoform 1."[6]
  2. NP_001333354.1 transcription factor 7 isoform 5: "Transcript Variant: This variant (8) contains an alternate exon in the coding region, compared to variant 1. The resulting isoform (5) is longer, compared to isoform 1."[6]
  3. NP_001333379.1 transcription factor 7 isoform 7: "Transcript Variant: This variant (9) differs in the 5' UTR, has multiple coding region differences, and uses a downstream start codon, compared to variant 1. The resulting isoform (7) is shorter at the N-terminus and has a distinct C-terminus, compared to isoform 1."[6]
  4. NP_001353431.1 transcription factor 7 isoform 8 [variant 10].[6]
  5. NP_003193.2 transcription factor 7 isoform 1: "Transcript Variant: This variant (1) encodes isoform (1)."[6]
  6. NP_963963.1 transcription factor 7 isoform 2: "Transcript Variant: This variant (2, also known as B), differs in the 5' UTR, has multiple coding region differences, and uses a downstream start codon, compared to variant 1. The resulting isoform (2) is shorter at the N-terminus, compared to isoform 1. Both variants 2 and 5 encode the same isoform."[6]
  7. NP_963965.1 transcription factor 7 isoform 4: "Transcript Variant: This variant (4, also known as C), differs in the 5' UTR, has multiple coding region differences, uses a downstream start codon, and differs in the 3' UTR, compared to variant 1. The resulting isoform (4) is shorter at the N-terminus and has a distinct C-terminus, compared to isoform 1."[6]
  8. NP_998813.1 transcription factor 7 isoform 2: "Transcript Variant: This variant (5) differs in the 5' UTR, has multiple coding region differences, and uses a downstream start codon, compared to variant 1. The resulting isoform (2) is shorter at the N-terminus, compared to isoform 1. Both variants 2 and 5 encode the same isoform."[6]

Gene ID: 6934 is TCF7L2 transcription factor 7 like 2 on 10q25.2-q25.3: "This gene encodes a high mobility group (HMG) box-containing transcription factor that plays a key role in the Wnt signaling pathway. The protein has been implicated in blood glucose homeostasis. Genetic variants of this gene are associated with increased risk of type 2 diabetes. Several transcript variants encoding multiple different isoforms have been found for this gene."[7]

  1. NP_001139746.1 transcription factor 7-like 2 isoform 1: "Transcript Variant: This variant (1) encodes the longest isoform."[7]
  2. NP_001139755.1 transcription factor 7-like 2 isoform 3: "Transcript Variant: This variant (3) has multiple differences in the coding region, compared to variant 1, one of which results in a translational frameshift. The resulting protein (isoform 3) has a distinct C-terminus and is shorter than isoform 1."[7]
  3. NP_001139756.1 transcription factor 7-like 2 isoform 4: "Transcript Variant: This variant (4) has multiple differences in the coding region, compared to variant 1, one of which results in a translational frameshift. The resulting protein (isoform 4) has a distinct C-terminus and is shorter than isoform 1."[7]
  4. NP_001139757.1 transcription factor 7-like 2 isoform 5: "Transcript Variant: This variant (5) has multiple differences in the coding region but maintains the reading frame, compared to variant 1. This variant encodes isoform 5, which is shorter than isoform 1."[7]
  5. NP_001139758.1 transcription factor 7-like 2 isoform 6: "Transcript Variant: This variant (6) has multiple differences in the coding region, compared to variant 1, one of which results in a translational frameshift. The resulting protein (isoform 6) has a distinct C-terminus and is shorter than isoform 1."[7]
  6. NP_001185454.1 transcription factor 7-like 2 isoform 7: "Transcript Variant: This variant (7) has multiple differences in the coding region, compared to variant 1, one of which results in a translational frameshift. The resulting protein (isoform 7) has a distinct C-terminus and is shorter than isoform 1."[7]
  7. NP_001185455.1 transcription factor 7-like 2 isoform 8: "Transcript Variant: This variant (8) has multiple differences in the coding region but maintains the reading frame, compared to variant 1. This variant encodes isoform 8, which is shorter than isoform 1."[7]
  8. NP_001185456.1 transcription factor 7-like 2 isoform 9: "Transcript Variant: This variant (9) has multiple differences in the coding region, compared to variant 1, one of which results in a translational frameshift. The resulting protein (isoform 9) has a distinct C-terminus and is shorter than isoform 1."[7]
  9. NP_001185457.1 transcription factor 7-like 2 isoform 10: "Transcript Variant: This variant (10) has multiple differences in the coding region, compared to variant 1, one of which results in a translational frameshift. The resulting protein (isoform 10) has a distinct C-terminus and is shorter than isoform 1."[7]
  10. NP_001185458.1 transcription factor 7-like 2 isoform 11: "Transcript Variant: This variant (11) has multiple differences in the coding region, compared to variant 1, one of which results in a translational frameshift. The resulting protein (isoform 11) has a distinct C-terminus and is shorter than isoform 1."[7]
  11. NP_001185459.1 transcription factor 7-like 2 isoform 12: "Transcript Variant: This variant (12) has multiple differences in the coding region, compared to variant 1, one of which results in a translational frameshift. The resulting protein (isoform 12) has a distinct C-terminus and is shorter than isoform 1."[7]
  12. NP_001185460.1 transcription factor 7-like 2 isoform 13: "Transcript Variant: This variant (13) has multiple differences in the coding region, compared to variant 1, one of which results in a translational frameshift. The resulting protein (isoform 13) has a distinct C-terminus and is shorter than isoform 1."[7]
  13. NP_001336799.1 transcription factor 7-like 2 isoform 14: "Transcript Variant: This variant (14) lacks alternate exons in the 5' UTR, lacks a portion of the 5' coding region, and initiates translation at an alternate start codon, compared to variant 1. The encoded isoform (14) has a distinct N-terminus and is shorter than isoform 1."[7]
  14. NP_001336800.1 transcription factor 7-like 2 isoform 15: "Transcript Variant: This variant (15) lacks alternate exons in the 5' UTR, lacks a portion of the 5' coding region, and initiates translation at an alternate start codon, compared to variant 1. The encoded isoform (15) has a distinct N-terminus and is shorter than isoform 1."[7]
  15. NP_001350430.1 transcription factor 7-like 2 isoform 16 [variant 16].[7]
  16. NP_001354872.1 transcription factor 7-like 2 isoform 17 [variant 17].[7]
  17. NP_110383.2 transcription factor 7-like 2 isoform 2: "Transcript Variant: This variant (2) has multiple differences in the coding region but maintains the reading frame, compared to variant 1. This variant encodes isoform 2, which is shorter than isoform 1."[7]

Gene ID: 51176 is LEF1 lymphoid enhancer binding factor 1: "This gene encodes a transcription factor belonging to a family of proteins that share homology with the high mobility group protein-1. The protein encoded by this gene can bind to a functionally important site in the T-cell receptor-alpha enhancer, thereby conferring maximal enhancer activity. This transcription factor is involved in the Wnt signaling pathway, and it may function in hair cell differentiation and follicle morphogenesis. Mutations in this gene have been found in somatic sebaceous tumors. This gene has also been linked to other cancers, including androgen-independent prostate cancer. Alternative splicing results in multiple transcript variants."[8]

  1. NP_001124185.1 lymphoid enhancer-binding factor 1 isoform 2: "Transcript Variant: This variant (2) lacks an alternate in-frame exon in the central coding region, compared to variant 1, resulting in an isoform (2) that is shorter than isoform 1. [...] SOX-TCF_HMG-box, class I member of the HMG-box superfamily of DNA-binding proteins. These proteins contain a single HMG box, and bind the minor groove of DNA in a highly sequence-specific manner. Members include SRY and its homologs in insects and vertebrates, and transcription factor-like proteins, TCF-1, -3, -4, and LEF-1. They appear to bind the minor groove of the A/T C A A A G/C-motif."[8]
  2. NP_001124186.1 lymphoid enhancer-binding factor 1 isoform 3: "Transcript Variant: This variant (3) lacks both an in-frame exon in the central coding region and an exon in the 3' coding region that causes a frameshift, compared to variant 1. The encoded isoform (3) has a distinct C-terminus and is shorter than isoform 1."[8]
  3. NP_001159591.1 lymphoid enhancer-binding factor 1 isoform 4: "Transcript Variant: This variant (4) differs in the 5' UTR and 5' coding region, and lacks an alternate in-frame exon in the central coding region, compared to variant 1. The encoded isoform (4) has a distinct N-terminus and is shorter than isoform 1."[8]
  4. NP_057353.1 lymphoid enhancer-binding factor 1 isoform 1: "Transcript Variant: This variant (1) represents the longest transcript and encodes the longest isoform (1)."[8]

HMG box samplings

Pasting the response element consensus sequence (A/T)(A/T)CAAAG into a "⌘F" page search finds no occurrences between ZNF497 and A1BG or between ZSCAN22 and A1BG, whereas the computer programs described below can find them.
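A literal text search cannot match (A/T)(A/T)CAAAG because each parenthesised group stands for a choice of bases, not for literal characters. The following is a minimal Python sketch of turning such consensus notation into a searchable regular expression. It assumes the standard IUPAC codes W = A/T and S = C/G for the SCTTTGATS form quoted under Consensus sequences; the function names `slashes_to_regex` and `iupac_to_regex` are illustrative and are not taken from the Basic programs.

```python
import re

# Subset of the standard IUPAC nucleotide codes (assumed sufficient for these motifs).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "[AT]", "S": "[CG]"}

def iupac_to_regex(motif):
    """Translate an IUPAC motif such as 'SCTTTGATS' into a regex string."""
    return "".join(IUPAC[base] for base in motif.upper())

def slashes_to_regex(motif):
    """Translate '(A/T)(A/T)CAAAG' style notation into a regex string."""
    # Each parenthesised group of slash-separated bases becomes a character class.
    return re.sub(r"\(([ACGT](?:/[ACGT])*)\)",
                  lambda m: "[" + m.group(1).replace("/", "") + "]",
                  motif.upper())

hmg = slashes_to_regex("(A/T)(A/T)CAAAG")       # HMG box consensus
helper = slashes_to_regex("(C/G)C(C/G)G(C/G)")  # Helper site consensus
wre = iupac_to_regex("SCTTTGATS")               # IUPAC form of the HMG DBD consensus

print(hmg, helper, wre)
```

Any of the resulting strings can then be passed to `re.finditer` to scan a nucleotide sequence.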

The Basic programs (starting with SuccessablesHMG.bas) test the consensus sequence (A/T)(A/T)CAAAG by comparing it with the nucleotide sequences on either the template strand (-) or the coding strand (+) of the DNA, in the negative direction (-) or the positive direction (+). Each entry below gives the strand and direction searched, the pattern looked for, the number of matches found, and each match with its position:

  1. negative strand, negative direction, looking for (A/T)(A/T)CAAAG, 0.
  2. negative strand, positive direction, looking for (A/T)(A/T)CAAAG, 0.
  3. positive strand, negative direction, looking for (A/T)(A/T)CAAAG, 1, ATCAAAG at 2891.
  4. positive strand, positive direction, looking for (A/T)(A/T)CAAAG, 0.
  5. complement, negative strand, negative direction, looking for (A/T)(A/T)GTTTC, 1, TAGTTTC at 2891.
  6. complement, negative strand, positive direction, looking for (A/T)(A/T)GTTTC, 0.
  7. complement, positive strand, negative direction, looking for (A/T)(A/T)GTTTC, 0.
  8. complement, positive strand, positive direction, looking for (A/T)(A/T)GTTTC, 0.
  9. inverse complement, negative strand, negative direction, looking for CTTTG(A/T)(A/T), 2, CTTTGTT at 1585, CTTTGTT at 229.
  10. inverse complement, negative strand, positive direction, looking for CTTTG(A/T)(A/T), 0.
  11. inverse complement, positive strand, negative direction, looking for CTTTG(A/T)(A/T), 0.
  12. inverse complement, positive strand, positive direction, looking for CTTTG(A/T)(A/T), 0.
  13. inverse negative strand, negative direction, looking for GAAAC(A/T)(A/T), 0.
  14. inverse negative strand, positive direction, looking for GAAAC(A/T)(A/T), 0.
  15. inverse positive strand, negative direction, looking for GAAAC(A/T)(A/T), 2, GAAACAA at 1585, GAAACAA at 229.
  16. inverse positive strand, positive direction, looking for GAAAC(A/T)(A/T), 0.
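The four patterns used in the list above (direct, complement, inverse, and inverse complement) can all be derived from the one consensus sequence. The Python sketch below illustrates this; it is not the Basic code itself. It assumes positions are 1-based and mark the first nucleotide of a match, which the text does not state, and the names `variants`, `to_regex`, and `find_all` are hypothetical.

```python
import re

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def variants(tokens):
    """tokens: allowed bases per position, e.g. ['AT', 'AT', 'C', 'A', 'A', 'A', 'G']."""
    comp = ["".join(sorted(COMPLEMENT[b] for b in t)) for t in tokens]
    return {
        "direct": tokens,
        "complement": comp,               # base-by-base complement, same order
        "inverse": tokens[::-1],          # same bases, reversed order
        "inverse complement": comp[::-1]  # complement in reversed order
    }

def to_regex(tokens):
    """Join per-position tokens into a regex; multi-base tokens become classes."""
    return "".join(t if len(t) == 1 else "[" + t + "]" for t in tokens)

def find_all(tokens, sequence):
    """Return (match, position) pairs; the lookahead also reports overlapping matches."""
    pattern = re.compile("(?=(" + to_regex(tokens) + "))")
    return [(m.group(1), m.start() + 1) for m in pattern.finditer(sequence)]

HMG = ["AT", "AT", "C", "A", "A", "A", "G"]
patterns = {name: to_regex(t) for name, t in variants(HMG).items()}
for name, pattern in patterns.items():
    print(name, pattern)

# Toy sequence for illustration only; it is not taken from the A1BG region.
print(find_all(HMG, "GGTTCAAAGCC"))
```

The derived patterns correspond to (A/T)(A/T)CAAAG, (A/T)(A/T)GTTTC, GAAAC(A/T)(A/T), and CTTTG(A/T)(A/T) as listed above.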

HMG UTRs

  1. Positive strand, negative direction: ATCAAAG at 2891.

HMG distal promoters

  1. Negative strand, negative direction: CTTTGTT at 1585, CTTTGTT at 229.

HMG random dataset samplings

  1. HMGr0: 3, TTCAAAG at 4166, AACAAAG at 3503, TTCAAAG at 3338.
  2. HMGr1: 2, TACAAAG at 3722, TACAAAG at 2071.
  3. HMGr2: 0.
  4. HMGr3: 1, AACAAAG at 278.
  5. HMGr4: 3, TACAAAG at 3777, AACAAAG at 3593, ATCAAAG at 672.
  6. HMGr5: 1, TACAAAG at 3734.
  7. HMGr6: 1, TACAAAG at 1499.
  8. HMGr7: 2, ATCAAAG at 2949, TTCAAAG at 252.
  9. HMGr8: 4, AACAAAG at 2658, ATCAAAG at 1142, TTCAAAG at 935, AACAAAG at 620.
  10. HMGr9: 0.
  11. HMGr0ci: 5, CTTTGTT at 4178, CTTTGAT at 3842, CTTTGAA at 3415, CTTTGTA at 986, CTTTGTT at 617.
  12. HMGr1ci: 1, CTTTGTT at 1640.
  13. HMGr2ci: 0.
  14. HMGr3ci: 0.
  15. HMGr4ci: 0.
  16. HMGr5ci: 1, CTTTGTT at 1983.
  17. HMGr6ci: 2, CTTTGAA at 1712, CTTTGAT at 257.
  18. HMGr7ci: 0.
  19. HMGr8ci: 3, CTTTGTA at 2944, CTTTGTT at 1167, CTTTGAT at 1149.
  20. HMGr9ci: 0.
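As a rough check on these random counts, the expected number of (A/T)(A/T)CAAAG matches in a random sequence can be estimated. The sketch below rests on two assumptions that the text does not state for the random datasets: equal frequencies of the four bases, and a sequence length of 4560 nucleotides (the upper bound of the UTR range given in the Helper site headings). The observed counts are copied from the list above.

```python
from fractions import Fraction

def match_probability(tokens):
    """Probability that a random window matches, assuming equal base frequencies."""
    p = Fraction(1)
    for allowed in tokens:
        p *= Fraction(len(allowed), 4)
    return p

def expected_matches(tokens, length):
    """Expected matches over all windows of a sequence of the given length."""
    windows = length - len(tokens) + 1
    return windows * match_probability(tokens)

HMG = ["AT", "AT", "C", "A", "A", "A", "G"]
p = match_probability(HMG)              # (2/4)^2 * (1/4)^5
expected = expected_matches(HMG, 4560)  # assumed length, see lead-in

# Observed counts for HMGr0 to HMGr9 and HMGr0ci to HMGr9ci.
direct_counts = [3, 2, 0, 1, 3, 1, 1, 2, 4, 0]
ci_counts = [5, 1, 0, 0, 0, 1, 2, 0, 3, 0]
mean_direct = sum(direct_counts) / len(direct_counts)
mean_ci = sum(ci_counts) / len(ci_counts)

print(p, float(expected), mean_direct, mean_ci)
```

Under these assumptions, roughly one match per pattern is expected in each random sequence, which is the same order of magnitude as the observed means.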

HMGr arbitrary UTRs

  1. HMGr0: TTCAAAG at 4166, AACAAAG at 3503, TTCAAAG at 3338.
  2. HMGr4: TACAAAG at 3777, AACAAAG at 3593.
  3. HMGr0ci: CTTTGTT at 4178, CTTTGAT at 3842, CTTTGAA at 3415.
  4. HMGr8ci: CTTTGTA at 2944.

HMGr alternate UTRs

  1. HMGr1: TACAAAG at 3722.
  2. HMGr5: TACAAAG at 3734.
  3. HMGr7: ATCAAAG at 2949.

HMGr arbitrary negative direction proximal promoters

  1. HMGr8: AACAAAG at 2658.

HMGr alternate positive direction proximal promoters

  1. HMGr0: TTCAAAG at 4166.
  2. HMGr0ci: CTTTGTT at 4178.

HMGr arbitrary negative direction distal promoters

  1. HMGr4: ATCAAAG at 672.
  2. HMGr6: TACAAAG at 1499.
  3. HMGr8: ATCAAAG at 1142, TTCAAAG at 935, AACAAAG at 620.
  4. HMGr0ci: CTTTGTA at 986, CTTTGTT at 617.
  5. HMGr6ci: CTTTGAA at 1712, CTTTGAT at 257.
  6. HMGr8ci: CTTTGTT at 1167, CTTTGAT at 1149.

HMGr alternate negative direction distal promoters

  1. HMGr1: TACAAAG at 2071.
  2. HMGr3: AACAAAG at 278.
  3. HMGr7: TTCAAAG at 252.
  4. HMGr1ci: CTTTGTT at 1640.
  5. HMGr5ci: CTTTGTT at 1983.

HMGr arbitrary positive direction distal promoters

  1. HMGr1: TACAAAG at 3722, TACAAAG at 2071.
  2. HMGr3: AACAAAG at 278.
  3. HMGr5: TACAAAG at 3734.
  4. HMGr7: ATCAAAG at 2949, TTCAAAG at 252.
  5. HMGr1ci: CTTTGTT at 1640.
  6. HMGr5ci: CTTTGTT at 1983.

HMGr alternate positive direction distal promoters

  1. HMGr0: AACAAAG at 3503, TTCAAAG at 3338.
  2. HMGr4: TACAAAG at 3777, AACAAAG at 3593, ATCAAAG at 672.
  3. HMGr6: TACAAAG at 1499.
  4. HMGr8: AACAAAG at 2658, ATCAAAG at 1142, TTCAAAG at 935, AACAAAG at 620.
  5. HMGr0ci: CTTTGAT at 3842, CTTTGAA at 3415, CTTTGTA at 986, CTTTGTT at 617.
  6. HMGr6ci: CTTTGAA at 1712, CTTTGAT at 257.
  7. HMGr8ci: CTTTGTA at 2944, CTTTGTT at 1167, CTTTGAT at 1149.

HMG box analysis and results

"Most HMG box proteins contain two or more HMG boxes and appear to bind DNA in a relatively sequence-aspecific manner (5, 13, 15, 16 and references therein). [...] they all appear to bind to the minor groove of the A/T A/T C A A A G-motif (10, 14, 18-20)."[2]

Reals or randoms | Promoters | Direction          | Numbers | Strands | Occurrences | Averages (± 0.1)
Reals            | UTR       | negative           | 1       | 2       | 0.5         | 0.5 ± 0.5 (--0,+-1)
Randoms          | UTR       | arbitrary negative | 9       | 10      | 0.9         | 0.6
Randoms          | UTR       | alternate negative | 3       | 10      | 0.3         | 0.6
Reals            | Core      | negative           | 0       | 2       | 0           | 0
Randoms          | Core      | arbitrary negative | 0       | 10      | 0           | 0
Randoms          | Core      | alternate negative | 0       | 10      | 0           | 0
Reals            | Core      | positive           | 0       | 2       | 0           | 0
Randoms          | Core      | arbitrary positive | 0       | 10      | 0           | 0
Randoms          | Core      | alternate positive | 0       | 10      | 0           | 0
Reals            | Proximal  | negative           | 0       | 2       | 0           | 0
Randoms          | Proximal  | arbitrary negative | 1       | 10      | 0.1         | 0.05
Randoms          | Proximal  | alternate negative | 0       | 10      | 0           | 0.05
Reals            | Proximal  | positive           | 0       | 2       | 0           | 0
Randoms          | Proximal  | arbitrary positive | 0       | 10      | 0           | 0.1
Randoms          | Proximal  | alternate positive | 2       | 10      | 0.2         | 0.1
Reals            | Distal    | negative           | 2       | 2       | 1           | 1 ± 1 (--2,+-0)
Randoms          | Distal    | arbitrary negative | 11      | 10      | 1.1         | 0.8
Randoms          | Distal    | alternate negative | 5       | 10      | 0.5         | 0.8
Reals            | Distal    | positive           | 0       | 2       | 0           | 0
Randoms          | Distal    | arbitrary positive | 8       | 10      | 0.8         | 1.35
Randoms          | Distal    | alternate positive | 19      | 10      | 1.9         | 1.35
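The Occurrences and Averages columns appear to follow from simple arithmetic: occurrences are the number of matches divided by the number of strands or datasets searched, and the random average is the mean of the arbitrary and alternate occurrences. This reading is inferred from the numbers rather than stated in the text. A short Python sketch, using the UTR and distal rows as input (the function names are illustrative):

```python
def occurrences(matches, searched):
    """Matches per strand (reals) or per random dataset (randoms)."""
    return matches / searched

def random_average(arbitrary, alternate):
    """Mean of the arbitrary and alternate random occurrences."""
    return (arbitrary + alternate) / 2

utr_real = occurrences(1, 2)
utr_random = random_average(occurrences(9, 10), occurrences(3, 10))

distal_neg_real = occurrences(2, 2)
distal_neg_random = random_average(occurrences(11, 10), occurrences(5, 10))

distal_pos_real = occurrences(0, 2)
distal_pos_random = random_average(occurrences(8, 10), occurrences(19, 10))

for label, real, rand in [
    ("UTR negative", utr_real, utr_random),
    ("Distal negative", distal_neg_real, distal_neg_random),
    ("Distal positive", distal_pos_real, distal_pos_random),
]:
    print(f"{label}: real {real:.2f}, random {rand:.2f}")
```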

Comparison:

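The occurrences of real HMG boxes exceed the random average only for the negative direction distal promoters (1 versus 0.8); for the UTR (0.5 versus 0.6) and the positive direction distal promoters (0 versus 1.35) they are lower. This suggests that the real HMG boxes in the negative direction distal promoter are likely active or activable.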

Helper site samplings

The Basic programs (starting with SuccessablesHelp.bas) test the consensus sequence (C/G)C(C/G)G(C/G) by comparing it with the nucleotide sequences on either the template strand (-) or the coding strand (+) of the DNA, in the negative direction (-) or the positive direction (+). Each entry below gives the strand and direction searched, the number of matches found, and each match with its position:

  1. Negative strand, negative direction: 16, CCGGG at 3929, CCGGC at 3874, CCGGG at 3576, CCCGG at 3567, CCCGC at 3044, GCCGC at 2726, CCCGC at 2723, CCCGC at 2012, CCCGC at 1894, CCCGC at 1808, CCCGC at 1759, CCCGC at 1241, GCGGC at 1154, CCGGC at 512, CCCGG at 511, CCGGC at 373.
  2. Positive strand, negative direction: 17, GCCGG at 4324, GCGGG at 4000, GCGGG at 3091, GCGGC at 2725, CCGGG at 2318, GCCGG at 2317, CCGGG at 2192, GCGGC at 1753, GCGGG at 1681, GCGGG at 1251, CCGGG at 1239, GCCGG at 1238, GCGGC at 957, CCGGG at 514, GCCGG at 513, CCGGG at 375, GCCGG at 374.
  3. Negative strand, positive direction: 64, GCGGG at 4440, GCGGG at 4430, CCGGG at 4245, CCCGC at 4237, CCGGC at 4003, CCCGG at 4002, GCGGG at 3671, CCGGG at 3558, CCCGG at 3557, CCCGC at 3325, GCCGC at 3226, GCGGG at 2486, GCCGC at 2355, GCGGC at 1902, CCGGC at 1847, GCGGC at 1794, GCGGG at 1765, GCCGG at 1759, GCCGC at 1756, GCGGG at 1707, GCGGG at 1681, GCCGC at 1648, GCGGC at 1637, GCGGG at 1591, GCGGC at 1582, CCGGC at 1547, GCGGC at 1438, GCGGC at 1423, GCGGC at 1338, GCGGC at 1323, CCGGC at 1295, GCGGC at 1255, CCGGC at 1211, GCGGC at 1171, GCGGC at 1148, CCGGC at 1043, GCGGC at 1034, GCGGG at 1026, GCGGC at 1003, GCGGG at 972, CCGGG at 911, GCCGG at 910, GCCGC at 903, GCGGG at 872, CCGGG at 811, GCCGG at 810, GCGGC at 751, CCCGG at 743, GCGGC at 721, GCGGC at 667, GCGGC at 637, GCGGC at 583, GCGGC at 499, GCGGG at 490, GCCGG at 484, CCGGG at 477, GCCGG at 476, CCGGG at 443, GCGGG at 407, CCCGC at 393, CCGGC at 376, CCCGG at 375, GCGGC at 354, GCGGC at 332.
  4. Positive strand, positive direction: 68, CCCGC at 4438, CCCGC at 4428, CCCGG at 4304, GCGGG at 4292, CCGGG at 4228, CCCGG at 4227, CCGGG at 3500, CCCGG at 3499, GCCGC at 1918, CCCGC at 1900, GCCGG at 1848, GCCGG at 1795, CCCGC at 1792, GCGGG at 1770, CCCGC at 1767, GCGGC at 1758, CCGGC at 1755, CCCGG at 1754, CCGGG at 1739, CCCGC at 1717, GCGGG at 1673, GCGGG at 1657, CCGGC at 1647, CCCGG at 1646, GCCGC at 1583, CCGGG at 1570, CCCGC at 1562, GCCGC at 1548, GCGGG at 1499, CCGGC at 1486, GCGGG at 1399, CCGGC at 1386, GCCGC at 1296, GCGGG at 1247, CCCGC at 1226, GCCGC at 1212, GCCGG at 1172, GCGGC at 1163, CCGGG at 1150, GCCGG at 1149, GCGGC at 1079, GCCGC at 1044, GCCGG at 981, CCCGC at 974, CCGGC at 950, GCCGG at 881, CCCGC at 874, CCGGC at 850, CCGGC at 765, GCCGG at 764, GCCGG at 722, GCCGC at 638, GCCGC at 540, GCGGG at 453, CCCGC at 445, CCGGG at 421, CCCGG at 420, CCCGC at 405, CCGGG at 390, CCCGG at 389, CCGGG at 372, GCCGC at 355, CCCGC at 352, GCCGG at 326, CCCGG at 283, CCCGG at 248, CCGGG at 200, CCGGG at 93.
  5. Helper sites: the complement inverse (ci) of (C/G)C(C/G)G(C/G) is identical to the direct pattern (C/G)C(C/G)G(C/G), so the four searches above cover both.
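The equality in item 5 can be checked directly: complementing each position of (C/G)C(C/G)G(C/G) and reversing the order gives back the same pattern, which is not true of the HMG consensus. A minimal Python sketch (the token representation is an illustrative choice, not the format used by the Basic programs):

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement_inverse(tokens):
    """Complement every position, then reverse the order of positions."""
    comp = ["".join(sorted(COMPLEMENT[b] for b in t)) for t in tokens]
    return comp[::-1]

HELPER = ["CG", "C", "CG", "G", "CG"]               # (C/G)C(C/G)G(C/G)
HMG = ["AT", "AT", "C", "A", "A", "A", "G"]         # (A/T)(A/T)CAAAG

helper_is_self_ci = complement_inverse(HELPER) == HELPER
hmg_is_self_ci = complement_inverse(HMG) == HMG

print(helper_is_self_ci, hmg_is_self_ci)
```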

Helper (4560-2846) UTRs

  1. Negative strand, negative direction: CCGGG at 3929, CCGGC at 3874, CCGGG at 3576, CCCGG at 3567, CCCGC at 3044.
  2. Positive strand, negative direction: GCCGG at 4324, GCGGG at 4000, GCGGG at 3091.

Helper positive direction (4445-4265) core promoters

  1. Negative strand, positive direction: GCGGG at 4440, GCGGG at 4430.
  2. Positive strand, positive direction: CCCGC at 4438, CCCGC at 4428, CCCGG at 4304, GCGGG at 4292.

Helper negative direction (2811-2596) proximal promoters

  1. Negative strand, negative direction: GCCGC at 2726, CCCGC at 2723.
  2. Positive strand, negative direction: GCGGC at 2725.

Helper positive direction (4265-4050) proximal promoters

  1. Negative strand, positive direction: CCGGG at 4245, CCCGC at 4237.
  2. Positive strand, positive direction: CCGGG at 4228, CCCGG at 4227.

Helper negative direction (2596-1) distal promoters

  1. Negative strand, negative direction: CCCGC at 2012, CCCGC at 1894, CCCGC at 1808, CCCGC at 1759, CCCGC at 1241, GCGGC at 1154, CCGGC at 512, CCCGG at 511, CCGGC at 373.
  2. Positive strand, negative direction: CCGGG at 2318, GCCGG at 2317, CCGGG at 2192, GCGGC at 1753, GCGGG at 1681, GCGGG at 1251, CCGGG at 1239, GCCGG at 1238, GCGGC at 957, CCGGG at 514, GCCGG at 513, CCGGG at 375, GCCGG at 374.

Helper positive direction (4050-1) distal promoters

  1. Negative strand, positive direction: CCGGC at 4003, CCCGG at 4002, GCGGG at 3671, CCGGG at 3558, CCCGG at 3557, CCCGC at 3325, GCCGC at 3226, GCGGG at 2486, GCCGC at 2355, GCGGC at 1902, CCGGC at 1847, GCGGC at 1794, GCGGG at 1765, GCCGG at 1759, GCCGC at 1756, GCGGG at 1707, GCGGG at 1681, GCCGC at 1648, GCGGC at 1637, GCGGG at 1591, GCGGC at 1582, CCGGC at 1547, GCGGC at 1438, GCGGC at 1423, GCGGC at 1338, GCGGC at 1323, CCGGC at 1295, GCGGC at 1255, CCGGC at 1211, GCGGC at 1171, GCGGC at 1148, CCGGC at 1043, GCGGC at 1034, GCGGG at 1026, GCGGC at 1003, GCGGG at 972, CCGGG at 911, GCCGG at 910, GCCGC at 903, GCGGG at 872, CCGGG at 811, GCCGG at 810, GCGGC at 751, CCCGG at 743, GCGGC at 721, GCGGC at 667, GCGGC at 637, GCGGC at 583, GCGGC at 499, GCGGG at 490, GCCGG at 484, CCGGG at 477, GCCGG at 476, CCGGG at 443, GCGGG at 407, CCCGC at 393, CCGGC at 376, CCCGG at 375, GCGGC at 354, GCGGC at 332.
  2. Positive strand, positive direction: CCGGG at 3500, CCCGG at 3499, GCCGC at 1918, CCCGC at 1900, GCCGG at 1848, GCCGG at 1795, CCCGC at 1792, GCGGG at 1770, CCCGC at 1767, GCGGC at 1758, CCGGC at 1755, CCCGG at 1754, CCGGG at 1739, CCCGC at 1717, GCGGG at 1673, GCGGG at 1657, CCGGC at 1647, CCCGG at 1646, GCCGC at 1583, CCGGG at 1570, CCCGC at 1562, GCCGC at 1548, GCGGG at 1499, CCGGC at 1486, GCGGG at 1399, CCGGC at 1386, GCCGC at 1296, GCGGG at 1247, CCCGC at 1226, GCCGC at 1212, GCCGG at 1172, GCGGC at 1163, CCGGG at 1150, GCCGG at 1149, GCGGC at 1079, GCCGC at 1044, GCCGG at 981, CCCGC at 974, CCGGC at 950, GCCGG at 881, CCCGC at 874, CCGGC at 850, CCGGC at 765, GCCGG at 764, GCCGG at 722, GCCGC at 638, GCCGC at 540, GCGGG at 453, CCCGC at 445, CCGGG at 421, CCCGG at 420, CCCGC at 405, CCGGG at 390, CCCGG at 389, CCGGG at 372, GCCGC at 355, CCCGC at 352, GCCGG at 326, CCCGG at 283, CCCGG at 248, CCGGG at 200, CCGGG at 93.
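The Helper headings give a nucleotide range for each region: UTR 4560-2846, core promoter 4445-4265 (positive direction), proximal promoter 2811-2596 (negative) and 4265-4050 (positive), and distal promoter 2596-1 (negative) and 4050-1 (positive). A minimal Python sketch of sorting match positions into those ranges follows. The text does not say how shared boundary values such as 4265 or 2596 are assigned, so the sketch assumes the first matching range in the order listed, and it reports positions outside the stated ranges as "unassigned". The names `REGIONS`, `classify`, and `group` are illustrative.

```python
# Ranges copied from the section headings; (low, high) inclusive, by direction.
REGIONS = {
    "negative": [("UTR", 2846, 4560), ("proximal", 2596, 2811), ("distal", 1, 2596)],
    "positive": [("core", 4265, 4445), ("proximal", 4050, 4265), ("distal", 1, 4050)],
}

def classify(position, direction):
    """Return the first listed region containing the position (assumed tie rule)."""
    for name, low, high in REGIONS[direction]:
        if low <= position <= high:
            return name
    return "unassigned"

def group(hits, direction):
    """Group (motif, position) hits by region for one direction."""
    grouped = {}
    for motif, position in hits:
        grouped.setdefault(classify(position, direction), []).append((motif, position))
    return grouped

# First six negative strand, negative direction Helper hits from the samplings list.
hits = [("CCGGG", 3929), ("CCGGC", 3874), ("CCGGG", 3576),
        ("CCCGG", 3567), ("CCCGC", 3044), ("GCCGC", 2726)]
by_region = group(hits, "negative")
print(by_region)
```

With these ranges the first five hits fall in the UTR and GCCGC at 2726 falls in the proximal promoter, in agreement with the lists above.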

Helper site random dataset samplings

  1. Helpr0: 42, CCGGG at 4333, GCCGG at 4332, CCCGC at 4329, GCCGG at 4259, CCGGC at 4250, CCCGG at 4249, CCGGC at 4078, GCCGG at 4077, CCGGC at 3920, CCCGG at 3919, CCCGG at 3910, GCGGG at 3899, CCCGC at 3889, CCGGG at 3829, CCCGG at 3828, GCGGC at 3547, GCGGC at 3444, CCCGC at 3432, GCCGC at 3406, GCCGC at 3241, GCCGG at 3215, GCGGC at 3112, CCGGG at 3044, GCCGG at 3043, CCGGC at 3004, CCCGG at 3003, CCGGG at 2765, CCCGG at 2631, CCCGC at 2404, GCCGC at 2379, GCGGG at 2325, GCGGG at 2070, CCCGC at 1891, CCCGG at 1684, CCGGC at 1603, CCCGG at 1602, GCCGC at 1383, GCGGG at 970, CCGGG at 822, CCCGG at 821, CCCGC at 624, GCCGC at 370.
  2. Helpr1: 45, CCGGC at 4520, GCCGG at 4480, CCGGC at 4471, CCCGG at 4470, CCGGC at 4305, CCGGC at 4296, CCCGG at 4295, CCCGC at 4280, CCGGC at 4135, CCCGG at 4134, GCCGC at 4114, GCGGC at 3801, CCGGC at 3709, GCCGG at 3708, GCGGC at 3104, GCGGG at 2792, GCGGG at 2774, CCGGG at 2721, GCCGG at 2720, CCCGG at 2667, CCGGG at 2085, CCGGG at 2048, GCCGG at 2047, GCGGC at 1913, CCGGG at 1900, CCCGG at 1899, GCGGG at 1812, CCCGG at 1786, CCCGG at 1418, CCCGC at 1353, GCCGC at 1060, CCCGG at 936, CCGGG at 818, CCGGC at 720, CCCGG at 719, CCGGC at 682, CCCGG at 681, GCGGG at 601, GCCGG at 507, CCCGC at 434, CCGGG at 395, CCCGC at 232, CCGGC at 122, CCCGG at 121, CCCGC at 80.
  3. Helpr2: 56, CCCGC at 4508, GCGGC at 4348, CCGGC at 4228, CCCGG at 4227, CCGGG at 4096, GCCGG at 4095, GCCGG at 4021, GCGGG at 3976, CCGGC at 3973, GCGGG at 3910, CCCGC at 3872, CCCGG at 3864, GCGGC at 3749, GCGGC at 3670, GCCGC at 3585, CCGGG at 3498, CCCGG at 3497, CCGGG at 3293, CCCGG at 3292, GCGGG at 2819, CCGGG at 2814, CCCGG at 2813, CCGGC at 2616, GCCGC at 2597, GCGGC at 2449, CCGGG at 2155, CCCGG at 2154, CCGGG at 2077, GCCGG at 2076, GCGGG at 2037, GCCGC at 1965, GCGGG at 1932, CCGGC at 1908, GCCGC at 1816, CCCGC at 1805, CCGGG at 1666, CCGGG at 1481, CCCGG at 1480, CCGGC at 1303, CCCGG at 1302, CCGGG at 1132, CCGGC at 962, CCCGG at 961, CCCGG at 813, CCGGG at 802, GCCGG at 801, CCGGC at 798, CCGGG at 762, GCCGG at 761, CCCGG at 632, GCGGG at 463, CCGGC at 331, GCCGG at 330, CCGGG at 326, CCCGG at 325, GCGGG at 154.
  4. Helpr3: 50, GCGGG at 4492, CCGGG at 4142, CCCGG at 4141, GCCGC at 4137, CCGGC at 3837, CCCGC at 3811, CCGGG at 3750, CCCGG at 3749, CCCGC at 3728, CCGGG at 3659, CCCGG at 3475, CCGGG at 3424, GCCGG at 3423, GCGGC at 3413, CCGGG at 3401, CCCGG at 3400, GCGGG at 2912, GCCGG at 2896, GCCGG at 2884, GCCGC at 2791, CCGGC at 2724, GCGGC at 2717, GCGGG at 2657, CCCGC at 2521, GCCGC at 2375, CCGGG at 2309, CCCGG at 2032, CCGGG at 1991, GCCGG at 1990, CCGGC at 1967, CCCGG at 1966, CCCGC at 1865, GCGGC at 1764, CCGGG at 1608, CCCGG at 1607, CCGGC at 1455, GCCGG at 1454, GCCGC at 1451, GCGGC at 1442, GCGGG at 1314, CCGGC at 868, GCCGG at 867, CCGGC at 864, GCCGG at 863, CCGGC at 852, CCCGG at 851, CCCGG at 758, GCGGG at 519, CCGGC at 83, GCGGG at 9.
  5. Helpr4: 54, CCGGG at 4238, CCGGG at 4210, CCCGG at 4209, GCGGC at 4109, GCGGG at 4087, CCGGC at 4011, GCCGG at 4010, GCGGC at 3556, CCGGC at 3459, GCCGG at 3393, GCGGC at 2614, GCCGG at 2523, GCGGG at 2431, GCGGG at 2411, GCCGC at 2344, CCGGG at 2157, CCCGG at 2156, GCGGG at 2088, GCGGG at 1875, CCGGG at 1816, GCGGG at 1753, GCGGG at 1649, CCGGG at 1624, GCGGC at 1582, GCGGG at 1510, CCGGG at 1465, CCCGG at 1464, CCGGG at 1220, CCCGC at 1198, GCCGC at 1091, GCCGC at 1088, CCGGC at 1025, GCCGG at 1024, GCCGC at 1021, CCCGG at 1013, GCCGG at 878, CCCGC at 875, CCGGG at 808, GCCGG at 807, CCCGG at 754, GCGGG at 712, GCCGC at 586, GCGGG at 566, GCCGG at 504, CCCGC at 501, GCCGG at 354, CCGGG at 349, CCCGG at 348, GCGGC at 218, CCCGC at 209, GCCGG at 142, CCCGC at 103, GCCGC at 79, GCCGC at 16.
  6. Helpr5: 53, CCGGG at 4447, GCCGG at 4446, GCCGC at 4438, CCGGG at 4426, GCCGG at 4425, GCCGC at 4352, GCGGG at 4277, CCGGG at 4231, GCCGG at 4230, CCCGC at 4098, CCGGG at 4048, GCGGC at 3752, CCGGG at 3453, CCCGG at 3335, GCCGC at 3323, CCCGG at 3028, GCCGG at 3022, CCGGG at 3016, CCCGG at 3015, GCGGG at 2972, GCGGC at 2932, CCCGC at 2814, GCCGC at 2758, CCGGG at 2691, GCCGG at 2690, CCCGC at 2265, GCGGG at 2231, GCGGC at 2166, GCGGG at 2035, CCGGG at 2029, CCCGG at 2028, CCCGC at 2020, CCCGC at 1972, GCGGG at 1754, CCGGG at 1741, CCCGG at 1740, GCGGC at 1700, CCGGC at 1433, CCGGG at 1224, CCGGC at 1081, GCCGG at 1080, CCCGC at 986, GCGGG at 966, CCCGC at 791, GCGGG at 735, GCGGC at 685, GCGGC at 678, CCGGC at 662, GCCGC at 596, GCGGC at 593, GCGGG at 581, CCCGC at 348, GCGGG at 178.
  7. Helpr6: 38, GCGGC at 4434, GCGGC at 4387, CCCGC at 4336, GCGGG at 4289, CCGGC at 4286, GCCGG at 4285, CCCGC at 4273, CCGGG at 3736, CCCGG at 3735, CCCGC at 3588, GCGGC at 3426, GCCGG at 3412, CCCGG at 3388, CCGGG at 3340, CCCGC at 3318, CCGGC at 3083, GCCGG at 3082, GCCGG at 2612, CCGGC at 2428, CCCGG at 2365, GCGGG at 2300, GCGGG at 2285, GCCGG at 2266, GCCGC at 2249, CCCGC at 2190, CCGGG at 1920, CCGGC at 1799, GCGGC at 1665, GCGGC at 1561, CCGGG at 1450, CCCGG at 1449, CCGGG at 1408, GCCGC at 1279, GCCGC at 1219, CCGGC at 1075, CCCGG at 1074, CCCGC at 377, CCCGC at 290.
  8. Helpr7: 49, GCCGC at 4386, GCCGC at 4213, GCGGG at 4159, CCGGC at 3982, GCCGG at 3981, GCCGC at 3954, CCGGG at 3943, GCCGG at 3942, CCCGG at 3863, GCGGC at 3615, CCGGC at 3498, GCCGG at 3497, GCCGG at 3301, CCGGG at 3088, CCCGG at 3087, GCCGG at 2994, GCGGG at 2748, CCCGC at 2745, CCGGC at 2733, CCCGG at 2732, CCGGG at 2486, GCCGG at 2485, GCGGG at 2425, CCCGC at 2160, CCGGG at 2103, GCCGG at 2102, GCCGC at 1910, CCGGC at 1907, CCGGG at 1843, CCCGG at 1842, GCGGG at 1778, GCCGC at 1769, CCGGG at 1374, CCCGG at 1373, GCCGG at 1349, CCGGG at 1094, CCCGG at 1093, GCGGG at 1065, GCGGG at 1045, GCGGC at 774, CCGGC at 688, GCCGC at 656, CCCGG at 620, CCGGG at 448, CCCGG at 447, CCGGC at 163, CCCGG at 162, CCGGG at 57, CCCGG at 56.
  9. Helpr8: 59, CCGGG at 4378, GCCGG at 4377, CCGGC at 4318, GCCGG at 4317, GCGGG at 4283, GCGGC at 4280, GCGGG at 4122, CCGGG at 4118, CCCGG at 4117, GCGGG at 4091, CCCGC at 4014, CCGGG at 3866, GCCGG at 3865, CCGGC at 3862, CCCGG at 3861, CCGGG at 3735, CCGGG at 3723, CCGGG at 3457, CCCGG at 3456, CCGGG at 3316, GCCGG at 3315, GCGGC at 3286, GCGGG at 3243, CCGGG at 3134, CCCGG at 3133, GCCGG at 2691, CCCGC at 2572, GCCGC at 2517, GCCGC at 2472, GCGGG at 2376, CCCGC at 2213, GCGGC at 2028, CCGGG at 1971, CCCGC at 1571, GCGGG at 1331, CCGGG at 1309, GCCGG at 1308, CCGGG at 1275, CCCGG at 1274, CCCGC at 1262, CCGGG at 1245, CCCGG at 1244, GCGGG at 1235, GCGGG at 1191, CCGGG at 1094, CCCGG at 1093, CCGGG at 1063, CCCGG at 1054, CCGGG at 838, GCCGG at 837, GCCGC at 760, GCGGC at 713, GCCGG at 682, CCCGC at 627, CCCGG at 512, GCGGG at 296, GCGGG at 242, GCCGC at 134, GCCGG at 49.
  10. Helpr9: 48, GCCGG at 4491, GCCGC at 4309, CCCGG at 4231, GCGGG at 3999, GCCGG at 3982, GCGGG at 3962, CCCGG at 3918, GCGGC at 3896, GCGGC at 3893, GCCGC at 3883, CCGGC at 3810, GCGGC at 3628, GCGGC at 3536, CCGGG at 3341, CCGGG at 3161, CCCGG at 3160, GCGGG at 3035, GCGGG at 2855, CCCGC at 2760, GCCGC at 2665, GCGGG at 2653, CCGGC at 2551, CCCGG at 2550, CCCGC at 2503, GCCGC at 2448, GCGGG at 2135, GCGGC at 2069, GCCGG at 1842, GCGGC at 1727, CCGGC at 1687, CCCGG at 1686, CCCGC at 1648, GCCGG at 1611, GCGGC at 1427, GCCGC at 1414, CCGGG at 1368, GCCGG at 1367, CCGGC at 1040, GCGGC at 887, CCCGC at 690, CCGGC at 633, CCCGG at 632, CCGGG at 597, GCCGC at 568, GCCGG at 544, CCCGC at 323, CCGGG at 168, GCCGG at 167.

Helpr arbitrary (evens) (4560-2846) UTRs

  1. Helpr0: CCGGG at 4333, GCCGG at 4332, CCCGC at 4329, GCCGG at 4259, CCGGC at 4250, CCCGG at 4249, CCGGC at 4078, GCCGG at 4077, CCGGC at 3920, CCCGG at 3919, CCCGG at 3910, GCGGG at 3899, CCCGC at 3889, CCGGG at 3829, CCCGG at 3828, GCGGC at 3547, GCGGC at 3444, CCCGC at 3432, GCCGC at 3406, GCCGC at 3241, GCCGG at 3215, GCGGC at 3112, CCGGG at 3044, GCCGG at 3043, CCGGC at 3004, CCCGG at 3003.
  2. Helpr2: CCCGC at 4508, GCGGC at 4348, CCGGC at 4228, CCCGG at 4227, CCGGG at 4096, GCCGG at 4095, GCCGG at 4021, GCGGG at 3976, CCGGC at 3973, GCGGG at 3910, CCCGC at 3872, CCCGG at 3864, GCGGC at 3749, GCGGC at 3670, GCCGC at 3585, CCGGG at 3498, CCCGG at 3497, CCGGG at 3293, CCCGG at 3292.
  3. Helpr4: CCGGG at 4238, CCGGG at 4210, CCCGG at 4209, GCGGC at 4109, GCGGG at 4087, CCGGC at 4011, GCCGG at 4010, GCGGC at 3556, CCGGC at 3459, GCCGG at 3393.
  4. Helpr6: GCGGC at 4434, GCGGC at 4387, CCCGC at 4336, GCGGG at 4289, CCGGC at 4286, GCCGG at 4285, CCCGC at 4273, CCGGG at 3736, CCCGG at 3735, CCCGC at 3588, GCGGC at 3426, GCCGG at 3412, CCCGG at 3388, CCGGG at 3340, CCCGC at 3318, CCGGC at 3083, GCCGG at 3082.
  5. Helpr8: CCGGG at 4378, GCCGG at 4377, CCGGC at 4318, GCCGG at 4317, GCGGG at 4283, GCGGC at 4280, GCGGG at 4122, CCGGG at 4118, CCCGG at 4117, GCGGG at 4091, CCCGC at 4014, CCGGG at 3866, GCCGG at 3865, CCGGC at 3862, CCCGG at 3861, CCGGG at 3735, CCGGG at 3723, CCGGG at 3457, CCCGG at 3456, CCGGG at 3316, GCCGG at 3315, GCGGC at 3286, GCGGG at 3243, CCGGG at 3134, CCCGG at 3133.

Helpr alternate (odds) (4560-2846) UTRs

  1. Helpr1: CCGGC at 4520, GCCGG at 4480, CCGGC at 4471, CCCGG at 4470, CCGGC at 4305, CCGGC at 4296, CCCGG at 4295, CCCGC at 4280, CCGGC at 4135, CCCGG at 4134, GCCGC at 4114, GCGGC at 3801, CCGGC at 3709, GCCGG at 3708, GCGGC at 3104.
  2. Helpr3: GCGGG at 4492, CCGGG at 4142, CCCGG at 4141, GCCGC at 4137, CCGGC at 3837, CCCGC at 3811, CCGGG at 3750, CCCGG at 3749, CCCGC at 3728, CCGGG at 3659, CCCGG at 3475, CCGGG at 3424, GCCGG at 3423, GCGGC at 3413, CCGGG at 3401, CCCGG at 3400, GCGGG at 2912, GCCGG at 2896, GCCGG at 2884.
  3. Helpr5: CCGGG at 4447, GCCGG at 4446, GCCGC at 4438, CCGGG at 4426, GCCGG at 4425, GCCGC at 4352, GCGGG at 4277, CCGGG at 4231, GCCGG at 4230, CCCGC at 4098, CCGGG at 4048, GCGGC at 3752, CCGGG at 3453, CCCGG at 3335, GCCGC at 3323, CCCGG at 3028, GCCGG at 3022, CCGGG at 3016, CCCGG at 3015, GCGGG at 2972, GCGGC at 2932.
  4. Helpr7: GCCGC at 4386, GCCGC at 4213, GCGGG at 4159, CCGGC at 3982, GCCGG at 3981, GCCGC at 3954, CCGGG at 3943, GCCGG at 3942, CCCGG at 3863, GCGGC at 3615, CCGGC at 3498, GCCGG at 3497, GCCGG at 3301, CCGGG at 3088, CCCGG at 3087, GCCGG at 2994.
  5. Helpr9: GCCGG at 4491, GCCGC at 4309, CCCGG at 4231, GCGGG at 3999, GCCGG at 3982, GCGGG at 3962, CCCGG at 3918, GCGGC at 3896, GCGGC at 3893, GCCGC at 3883, CCGGC at 3810, GCGGC at 3628, GCGGC at 3536, CCGGG at 3341, CCGGG at 3161, CCCGG at 3160, GCGGG at 3035, GCGGG at 2855.

Helpr arbitrary negative direction (evens) (2846-2811) core promoters

  1. Helpr2: GCGGG at 2819, CCGGG at 2814, CCCGG at 2813.

Helpr alternate negative direction (odds) (2846-2811) core promoters

  1. Helpr5: CCCGC at 2814.

Helpr arbitrary positive direction (odds) (4445-4265) core promoters

  1. Helpr1: CCGGC at 4305, CCGGC at 4296, CCCGG at 4295, CCCGC at 4280.
  2. Helpr5: GCCGC at 4438, CCGGG at 4426, GCCGG at 4425, GCCGC at 4352, GCGGG at 4277.
  3. Helpr7: GCCGC at 4386.
  4. Helpr9: GCCGC at 4309.

Helpr alternate positive direction (evens) (4445-4265) core promoters

  1. Helpr0: CCGGG at 4333, GCCGG at 4332, CCCGC at 4329.
  2. Helpr2: GCGGC at 4348.
  3. Helpr6: GCGGC at 4434, GCGGC at 4387, CCCGC at 4336, GCGGG at 4289, CCGGC at 4286, GCCGG at 4285, CCCGC at 4273.
  4. Helpr8: CCGGG at 4378, GCCGG at 4377, CCGGC at 4318, GCCGG at 4317, GCGGG at 4283, GCGGC at 4280.

Helpr arbitrary negative direction (evens) (2811-2596) proximal promoters

  1. Helpr0: CCGGG at 2765, CCCGG at 2631.
  2. Helpr2: CCGGC at 2616, GCCGC at 2597.
  3. Helpr4: GCGGC at 2614.
  4. Helpr6: GCCGG at 2612.
  5. Helpr8: GCCGG at 2691.

Helpr alternate negative direction (odds) (2811-2596) proximal promoters

  1. Helpr1: GCGGG at 2792, GCGGG at 2774, CCGGG at 2721, GCCGG at 2720, CCCGG at 2667.
  2. Helpr3: GCCGC at 2791, CCGGC at 2724, GCGGC at 2717, GCGGG at 2657.
  3. Helpr5: GCCGC at 2758, CCGGG at 2691, GCCGG at 2690.
  4. Helpr7: GCGGG at 2748, CCCGC at 2745, CCGGC at 2733, CCCGG at 2732.
  5. Helpr9: CCCGC at 2760, GCCGC at 2665, GCGGG at 2653.

Helpr arbitrary positive direction (odds) (4265-4050) proximal promoters

  1. Helpr1: CCGGC at 4135, CCCGG at 4134, GCCGC at 4114.
  2. Helpr3: CCGGG at 4142, CCCGG at 4141, GCCGC at 4137.
  3. Helpr5: CCGGG at 4231, GCCGG at 4230, CCCGC at 4098.
  4. Helpr7: GCCGC at 4213, GCGGG at 4159.
  5. Helpr9: CCCGG at 4231.

Helpr alternate positive direction (evens) (4265-4050) proximal promoters

  1. Helpr0: GCCGG at 4259, CCGGC at 4250, CCCGG at 4249, CCGGC at 4078, GCCGG at 4077.
  2. Helpr2: CCGGC at 4228, CCCGG at 4227, CCGGG at 4096, GCCGG at 4095.
  3. Helpr4: CCGGG at 4238, CCGGG at 4210, CCCGG at 4209, GCGGC at 4109, GCGGG at 4087.
  4. Helpr8: GCGGG at 4122, CCGGG at 4118, CCCGG at 4117, GCGGG at 4091.

Helpr arbitrary negative direction (evens) (2596-1) distal promoters

  1. Helpr0: CCCGC at 2404, GCCGC at 2379, GCGGG at 2325, GCGGG at 2070, CCCGC at 1891, CCCGG at 1684, CCGGC at 1603, CCCGG at 1602, GCCGC at 1383, GCGGG at 970, CCGGG at 822, CCCGG at 821, CCCGC at 624, GCCGC at 370.
  2. Helpr2: GCGGC at 2449, CCGGG at 2155, CCCGG at 2154, CCGGG at 2077, GCCGG at 2076, GCGGG at 2037, GCCGC at 1965, GCGGG at 1932, CCGGC at 1908, GCCGC at 1816, CCCGC at 1805, CCGGG at 1666, CCGGG at 1481, CCCGG at 1480, CCGGC at 1303, CCCGG at 1302, CCGGG at 1132, CCGGC at 962, CCCGG at 961, CCCGG at 813, CCGGG at 802, GCCGG at 801, CCGGC at 798, CCGGG at 762, GCCGG at 761, CCCGG at 632, GCGGG at 463, CCGGC at 331, GCCGG at 330, CCGGG at 326, CCCGG at 325, GCGGG at 154.
  3. Helpr4: GCCGG at 2523, GCGGG at 2431, GCGGG at 2411, GCCGC at 2344, CCGGG at 2157, CCCGG at 2156, GCGGG at 2088, GCGGG at 1875, CCGGG at 1816, GCGGG at 1753, GCGGG at 1649, CCGGG at 1624, GCGGC at 1582, GCGGG at 1510, CCGGG at 1465, CCCGG at 1464, CCGGG at 1220, CCCGC at 1198, GCCGC at 1091, GCCGC at 1088, CCGGC at 1025, GCCGG at 1024, GCCGC at 1021, CCCGG at 1013, GCCGG at 878, CCCGC at 875, CCGGG at 808, GCCGG at 807, CCCGG at 754, GCGGG at 712, GCCGC at 586, GCGGG at 566, GCCGG at 504, CCCGC at 501, GCCGG at 354, CCGGG at 349, CCCGG at 348, GCGGC at 218, CCCGC at 209, GCCGG at 142, CCCGC at 103, GCCGC at 79, GCCGC at 16.
  4. Helpr6: CCGGC at 2428, CCCGG at 2365, GCGGG at 2300, GCGGG at 2285, GCCGG at 2266, GCCGC at 2249, CCCGC at 2190, CCGGG at 1920, CCGGC at 1799, GCGGC at 1665, GCGGC at 1561, CCGGG at 1450, CCCGG at 1449, CCGGG at 1408, GCCGC at 1279, GCCGC at 1219, CCGGC at 1075, CCCGG at 1074, CCCGC at 377, CCCGC at 290.
  5. Helpr8: CCCGC at 2572, GCCGC at 2517, GCCGC at 2472, GCGGG at 2376, CCCGC at 2213, GCGGC at 2028, CCGGG at 1971, CCCGC at 1571, GCGGG at 1331, CCGGG at 1309, GCCGG at 1308, CCGGG at 1275, CCCGG at 1274, CCCGC at 1262, CCGGG at 1245, CCCGG at 1244, GCGGG at 1235, GCGGG at 1191, CCGGG at 1094, CCCGG at 1093, CCGGG at 1063, CCCGG at 1054, CCGGG at 838, GCCGG at 837, GCCGC at 760, GCGGC at 713, GCCGG at 682, CCCGC at 627, CCCGG at 512, GCGGG at 296, GCGGG at 242, GCCGC at 134, GCCGG at 49.

Helpr alternate negative direction (odds) (2596-1) distal promoters

  1. Helpr1: CCGGG at 2085, CCGGG at 2048, GCCGG at 2047, GCGGC at 1913, CCGGG at 1900, CCCGG at 1899, GCGGG at 1812, CCCGG at 1786, CCCGG at 1418, CCCGC at 1353, GCCGC at 1060, CCCGG at 936, CCGGG at 818, CCGGC at 720, CCCGG at 719, CCGGC at 682, CCCGG at 681, GCGGG at 601, GCCGG at 507, CCCGC at 434, CCGGG at 395, CCCGC at 232, CCGGC at 122, CCCGG at 121, CCCGC at 80.
  2. Helpr3: CCCGC at 2521, GCCGC at 2375, CCGGG at 2309, CCCGG at 2032, CCGGG at 1991, GCCGG at 1990, CCGGC at 1967, CCCGG at 1966, CCCGC at 1865, GCGGC at 1764, CCGGG at 1608, CCCGG at 1607, CCGGC at 1455, GCCGG at 1454, GCCGC at 1451, GCGGC at 1442, GCGGG at 1314, CCGGC at 868, GCCGG at 867, CCGGC at 864, GCCGG at 863, CCGGC at 852, CCCGG at 851, CCCGG at 758, GCGGG at 519, CCGGC at 83, GCGGG at 9.
  3. Helpr5: CCCGC at 2265, GCGGG at 2231, GCGGC at 2166, GCGGG at 2035, CCGGG at 2029, CCCGG at 2028, CCCGC at 2020, CCCGC at 1972, GCGGG at 1754, CCGGG at 1741, CCCGG at 1740, GCGGC at 1700, CCGGC at 1433, CCGGG at 1224, CCGGC at 1081, GCCGG at 1080, CCCGC at 986, GCGGG at 966, CCCGC at 791, GCGGG at 735, GCGGC at 685, GCGGC at 678, CCGGC at 662, GCCGC at 596, GCGGC at 593, GCGGG at 581, CCCGC at 348, GCGGG at 178.
  4. Helpr7: CCGGG at 2486, GCCGG at 2485, GCGGG at 2425, CCCGC at 2160, CCGGG at 2103, GCCGG at 2102, GCCGC at 1910, CCGGC at 1907, CCGGG at 1843, CCCGG at 1842, GCGGG at 1778, GCCGC at 1769, CCGGG at 1374, CCCGG at 1373, GCCGG at 1349, CCGGG at 1094, CCCGG at 1093, GCGGG at 1065, GCGGG at 1045, GCGGC at 774, CCGGC at 688, GCCGC at 656, CCCGG at 620, CCGGG at 448, CCCGG at 447, CCGGC at 163, CCCGG at 162, CCGGG at 57, CCCGG at 56.
  5. Helpr9: CCGGC at 2551, CCCGG at 2550, CCCGC at 2503, GCCGC at 2448, GCGGG at 2135, GCGGC at 2069, GCCGG at 1842, GCGGC at 1727, CCGGC at 1687, CCCGG at 1686, CCCGC at 1648, GCCGG at 1611, GCGGC at 1427, GCCGC at 1414, CCGGG at 1368, GCCGG at 1367, CCGGC at 1040, GCGGC at 887, CCCGC at 690, CCGGC at 633, CCCGG at 632, CCGGG at 597, GCCGC at 568, GCCGG at 544, CCCGC at 323, CCGGG at 168, GCCGG at 167.

Helpr arbitrary positive direction (odds) (4050-1) distal promoters

  1. Helpr1: GCGGC at 3801, CCGGC at 3709, GCCGG at 3708, GCGGC at 3104, GCGGG at 2792, GCGGG at 2774, CCGGG at 2721, GCCGG at 2720, CCCGG at 2667, CCGGG at 2085, CCGGG at 2048, GCCGG at 2047, GCGGC at 1913, CCGGG at 1900, CCCGG at 1899, GCGGG at 1812, CCCGG at 1786, CCCGG at 1418, CCCGC at 1353, GCCGC at 1060, CCCGG at 936, CCGGG at 818, CCGGC at 720, CCCGG at 719, CCGGC at 682, CCCGG at 681, GCGGG at 601, GCCGG at 507, CCCGC at 434, CCGGG at 395, CCCGC at 232, CCGGC at 122, CCCGG at 121, CCCGC at 80.
  2. Helpr3: CCGGC at 3837, CCCGC at 3811, CCGGG at 3750, CCCGG at 3749, CCCGC at 3728, CCGGG at 3659, CCCGG at 3475, CCGGG at 3424, GCCGG at 3423, GCGGC at 3413, CCGGG at 3401, CCCGG at 3400, GCGGG at 2912, GCCGG at 2896, GCCGG at 2884, GCCGC at 2791, CCGGC at 2724, GCGGC at 2717, GCGGG at 2657, CCCGC at 2521, GCCGC at 2375, CCGGG at 2309, CCCGG at 2032, CCGGG at 1991, GCCGG at 1990, CCGGC at 1967, CCCGG at 1966, CCCGC at 1865, GCGGC at 1764, CCGGG at 1608, CCCGG at 1607, CCGGC at 1455, GCCGG at 1454, GCCGC at 1451, GCGGC at 1442, GCGGG at 1314, CCGGC at 868, GCCGG at 867, CCGGC at 864, GCCGG at 863, CCGGC at 852, CCCGG at 851, CCCGG at 758, GCGGG at 519, CCGGC at 83, GCGGG at 9.
  3. Helpr5: CCGGG at 4048, GCGGC at 3752, CCGGG at 3453, CCCGG at 3335, GCCGC at 3323, CCCGG at 3028, GCCGG at 3022, CCGGG at 3016, CCCGG at 3015, GCGGG at 2972, GCGGC at 2932, CCCGC at 2814, GCCGC at 2758, CCGGG at 2691, GCCGG at 2690, CCCGC at 2265, GCGGG at 2231, GCGGC at 2166, GCGGG at 2035, CCGGG at 2029, CCCGG at 2028, CCCGC at 2020, CCCGC at 1972, GCGGG at 1754, CCGGG at 1741, CCCGG at 1740, GCGGC at 1700, CCGGC at 1433, CCGGG at 1224, CCGGC at 1081, GCCGG at 1080, CCCGC at 986, GCGGG at 966, CCCGC at 791, GCGGG at 735, GCGGC at 685, GCGGC at 678, CCGGC at 662, GCCGC at 596, GCGGC at 593, GCGGG at 581, CCCGC at 348, GCGGG at 178.
  4. Helpr7: CCGGC at 3982, GCCGG at 3981, GCCGC at 3954, CCGGG at 3943, GCCGG at 3942, CCCGG at 3863, GCGGC at 3615, CCGGC at 3498, GCCGG at 3497, GCCGG at 3301, CCGGG at 3088, CCCGG at 3087, GCCGG at 2994, GCGGG at 2748, CCCGC at 2745, CCGGC at 2733, CCCGG at 2732, CCGGG at 2486, GCCGG at 2485, GCGGG at 2425, CCCGC at 2160, CCGGG at 2103, GCCGG at 2102, GCCGC at 1910, CCGGC at 1907, CCGGG at 1843, CCCGG at 1842, GCGGG at 1778, GCCGC at 1769, CCGGG at 1374, CCCGG at 1373, GCCGG at 1349, CCGGG at 1094, CCCGG at 1093, GCGGG at 1065, GCGGG at 1045, GCGGC at 774, CCGGC at 688, GCCGC at 656, CCCGG at 620, CCGGG at 448, CCCGG at 447, CCGGC at 163, CCCGG at 162, CCGGG at 57, CCCGG at 56.
  5. Helpr9: GCGGG at 3999, GCCGG at 3982, GCGGG at 3962, CCCGG at 3918, GCGGC at 3896, GCGGC at 3893, GCCGC at 3883, CCGGC at 3810, GCGGC at 3628, GCGGC at 3536, CCGGG at 3341, CCGGG at 3161, CCCGG at 3160, GCGGG at 3035, GCGGG at 2855, CCCGC at 2760, GCCGC at 2665, GCGGG at 2653, CCGGC at 2551, CCCGG at 2550, CCCGC at 2503, GCCGC at 2448, GCGGG at 2135, GCGGC at 2069, GCCGG at 1842, GCGGC at 1727, CCGGC at 1687, CCCGG at 1686, CCCGC at 1648, GCCGG at 1611, GCGGC at 1427, GCCGC at 1414, CCGGG at 1368, GCCGG at 1367, CCGGC at 1040, GCGGC at 887, CCCGC at 690, CCGGC at 633, CCCGG at 632, CCGGG at 597, GCCGC at 568, GCCGG at 544, CCCGC at 323, CCGGG at 168, GCCGG at 167.

Helpr alternate positive direction (evens) (4050-1) distal promoters

  1. Helpr0: CCGGC at 3920, CCCGG at 3919, CCCGG at 3910, GCGGG at 3899, CCCGC at 3889, CCGGG at 3829, CCCGG at 3828, GCGGC at 3547, GCGGC at 3444, CCCGC at 3432, GCCGC at 3406, GCCGC at 3241, GCCGG at 3215, GCGGC at 3112, CCGGG at 3044, GCCGG at 3043, CCGGC at 3004, CCCGG at 3003, CCGGG at 2765, CCCGG at 2631, CCCGC at 2404, GCCGC at 2379, GCGGG at 2325, GCGGG at 2070, CCCGC at 1891, CCCGG at 1684, CCGGC at 1603, CCCGG at 1602, GCCGC at 1383, GCGGG at 970, CCGGG at 822, CCCGG at 821, CCCGC at 624, GCCGC at 370.
  2. Helpr2: GCCGG at 4021, GCGGG at 3976, CCGGC at 3973, GCGGG at 3910, CCCGC at 3872, CCCGG at 3864, GCGGC at 3749, GCGGC at 3670, GCCGC at 3585, CCGGG at 3498, CCCGG at 3497, CCGGG at 3293, CCCGG at 3292, GCGGG at 2819, CCGGG at 2814, CCCGG at 2813, CCGGC at 2616, GCCGC at 2597, GCGGC at 2449, CCGGG at 2155, CCCGG at 2154, CCGGG at 2077, GCCGG at 2076, GCGGG at 2037, GCCGC at 1965, GCGGG at 1932, CCGGC at 1908, GCCGC at 1816, CCCGC at 1805, CCGGG at 1666, CCGGG at 1481, CCCGG at 1480, CCGGC at 1303, CCCGG at 1302, CCGGG at 1132, CCGGC at 962, CCCGG at 961, CCCGG at 813, CCGGG at 802, GCCGG at 801, CCGGC at 798, CCGGG at 762, GCCGG at 761, CCCGG at 632, GCGGG at 463, CCGGC at 331, GCCGG at 330, CCGGG at 326, CCCGG at 325, GCGGG at 154.
  3. Helpr4: CCGGC at 4011, GCCGG at 4010, GCGGC at 3556, CCGGC at 3459, GCCGG at 3393, GCGGC at 2614, GCCGG at 2523, GCGGG at 2431, GCGGG at 2411, GCCGC at 2344, CCGGG at 2157, CCCGG at 2156, GCGGG at 2088, GCGGG at 1875, CCGGG at 1816, GCGGG at 1753, GCGGG at 1649, CCGGG at 1624, GCGGC at 1582, GCGGG at 1510, CCGGG at 1465, CCCGG at 1464, CCGGG at 1220, CCCGC at 1198, GCCGC at 1091, GCCGC at 1088, CCGGC at 1025, GCCGG at 1024, GCCGC at 1021, CCCGG at 1013, GCCGG at 878, CCCGC at 875, CCGGG at 808, GCCGG at 807, CCCGG at 754, GCGGG at 712, GCCGC at 586, GCGGG at 566, GCCGG at 504, CCCGC at 501, GCCGG at 354, CCGGG at 349, CCCGG at 348, GCGGC at 218, CCCGC at 209, GCCGG at 142, CCCGC at 103, GCCGC at 79, GCCGC at 16.
  4. Helpr6: CCGGG at 3736, CCCGG at 3735, CCCGC at 3588, GCGGC at 3426, GCCGG at 3412, CCCGG at 3388, CCGGG at 3340, CCCGC at 3318, CCGGC at 3083, GCCGG at 3082, GCCGG at 2612, CCGGC at 2428, CCCGG at 2365, GCGGG at 2300, GCGGG at 2285, GCCGG at 2266, GCCGC at 2249, CCCGC at 2190, CCGGG at 1920, CCGGC at 1799, GCGGC at 1665, GCGGC at 1561, CCGGG at 1450, CCCGG at 1449, CCGGG at 1408, GCCGC at 1279, GCCGC at 1219, CCGGC at 1075, CCCGG at 1074, CCCGC at 377, CCCGC at 290.
  5. Helpr8: CCCGC at 4014, CCGGG at 3866, GCCGG at 3865, CCGGC at 3862, CCCGG at 3861, CCGGG at 3735, CCGGG at 3723, CCGGG at 3457, CCCGG at 3456, CCGGG at 3316, GCCGG at 3315, GCGGC at 3286, GCGGG at 3243, CCGGG at 3134, CCCGG at 3133, GCCGG at 2691, CCCGC at 2572, GCCGC at 2517, GCCGC at 2472, GCGGG at 2376, CCCGC at 2213, GCGGC at 2028, CCGGG at 1971, CCCGC at 1571, GCGGG at 1331, CCGGG at 1309, GCCGG at 1308, CCGGG at 1275, CCCGG at 1274, CCCGC at 1262, CCGGG at 1245, CCCGG at 1244, GCGGG at 1235, GCGGG at 1191, CCGGG at 1094, CCCGG at 1093, CCGGG at 1063, CCCGG at 1054, CCGGG at 838, GCCGG at 837, GCCGC at 760, GCGGC at 713, GCCGG at 682, CCCGC at 627, CCCGG at 512, GCGGG at 296, GCGGG at 242, GCCGC at 134, GCCGG at 49.
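
The region-by-region lists above amount to sorting each hit by its position. A minimal Python sketch of that binning for the negative direction follows. The region limits are taken from the headings above, but the boundary handling (lower limit exclusive, upper limit inclusive) is an assumption, since the source does not state it, and the names `REGIONS` and `bin_hits` are illustrative only.

```python
# Negative-direction regions, with limits copied from the headings above.
# ASSUMPTION: a hit belongs to a region when low < position <= high.
REGIONS = [
    ("UTR", 2846, 4560),
    ("core promoter", 2811, 2846),
    ("proximal promoter", 2596, 2811),
    ("distal promoter", 0, 2596),
]

def bin_hits(hits):
    """Group (motif, position) hits by the region containing the position."""
    binned = {name: [] for name, _, _ in REGIONS}
    for motif, pos in hits:
        for name, low, high in REGIONS:
            if low < pos <= high:
                binned[name].append((motif, pos))
                break
    return binned

# A few Helpr2 hits copied from the lists above.
helpr2_sample = [
    ("CCCGG", 3292), ("GCGGG", 2819), ("CCGGG", 2814), ("CCCGG", 2813),
    ("CCGGC", 2616), ("GCCGC", 2597), ("GCGGC", 2449),
]

for region, found in bin_hits(helpr2_sample).items():
    print(region, len(found), found)
```

With this sample, the three hits at 2819, 2814 and 2813 fall in the core promoter and the two at 2616 and 2597 in the proximal promoter, matching the Helpr2 entries listed under those headings.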

Helper site analysis and results

"The C clamp carries specificity for a secondary, GC-rich sequence called a “Helper site” [(C/G)C(C/G)G(C/G)] that can occur with variable spacing and orientation relative to the Wnt response element (Atcha et al. 2007; Chang et al. 2008)."[5]

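The Helper site consensus (C/G)C(C/G)G(C/G) quoted above can be written as the regular expression `[CG]C[CG]G[CG]`. The sketch below is one way to scan a sequence for it in Python; it is not the program used to produce the lists on this page. The lookahead is there because sites can overlap (for example, the lists above report CCGGG at 4333 next to GCCGG at 4332). The 1-based position convention and the function name `find_helper_sites` are assumptions, and the demonstration sequence is made up.

```python
import re

# Helper site consensus (C/G)C(C/G)G(C/G) as a regular expression.
# The lookahead allows overlapping sites to be reported.
HELPER_SITE = re.compile(r"(?=([CG]C[CG]G[CG]))")

def find_helper_sites(sequence):
    """Return (motif, 1-based start) for every Helper-site match."""
    return [(m.group(1), m.start() + 1)
            for m in HELPER_SITE.finditer(sequence.upper())]

# Made-up sequence for illustration only; not taken from any gene.
demo = "ATGCCGGGTACCCGCAT"
print(find_helper_sites(demo))
```
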
Reals or randoms  Promoters  Direction           Numbers  Strands  Occurrences  Averages (± 0.1)
Reals             UTR        negative            8        2        4            4 ± 1 (--5,+-3)
Randoms           UTR        arbitrary negative  97       10       9.7          9.3 ± 0.4
Randoms           UTR        alternate negative  89       10       8.9          9.3 ± 0.4
Reals             Core       negative            0        2        0            0
Randoms           Core       arbitrary negative  3        10       0.3          0.2 ± 0.1
Randoms           Core       alternate negative  1        10       0.1          0.2 ± 0.1
Reals             Core       positive            6        2        3            3 ± 1 (-+2,++4)
Randoms           Core       arbitrary positive  11       10       1.1          1.4
Randoms           Core       alternate positive  17       10       1.7          1.4
Reals             Proximal   negative            3        2        1.5          1.5 ± 0.5 (--2,+-1)
Randoms           Proximal   arbitrary negative  7        10       0.7          1.3 ± 0.6
Randoms           Proximal   alternate negative  19       10       1.9          1.3 ± 0.6
Reals             Proximal   positive            4        2        2            2 ± 0 (-+2,++2)
Randoms           Proximal   arbitrary positive  12       10       1.2          1.5 ± 0.3
Randoms           Proximal   alternate positive  18       10       1.8          1.5 ± 0.3
Reals             Distal     negative            22       2        11           11 ± 2 (--9,+-13)
Randoms           Distal     arbitrary negative  142      10       14.2         13.9 ± 0.3
Randoms           Distal     alternate negative  136      10       13.6         13.9 ± 0.3
Reals             Distal     positive            122      2        61           61 ± 1 (-+60,++62)
Randoms           Distal     arbitrary positive  214      10       21.4         21.35 ± 0.05
Randoms           Distal     alternate positive  213      10       21.3         21.35 ± 0.05
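
The random "Averages" entries appear to be the mean of the arbitrary and alternate occurrence values, with the ± taken as half their difference; for the UTRs, (9.7 + 8.9)/2 = 9.3 and (9.7 − 8.9)/2 = 0.4. That reading is inferred from the numbers, not stated in the source. A short Python sketch under that assumption, with the illustrative name `mean_and_half_range`:

```python
def mean_and_half_range(arbitrary, alternate):
    """Mean of the two occurrence values and half their absolute difference."""
    mean = (arbitrary + alternate) / 2
    half_range = abs(arbitrary - alternate) / 2
    # Round to two decimals to hide floating-point noise.
    return round(mean, 2), round(half_range, 2)

# Occurrence pairs (arbitrary, alternate) copied from the table above.
random_rows = {
    "UTR negative": (9.7, 8.9),
    "Proximal negative": (0.7, 1.9),
    "Distal negative": (14.2, 13.6),
}
for label, pair in random_rows.items():
    print(label, mean_and_half_range(*pair))
```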

Comparison:

The real Helper occurrences in the UTRs are lower than the randoms. In the positive direction, the real occurrences in the core and proximal promoters are greater than the randoms. In the negative direction, the real proximal occurrences overlap the high end of the randoms, and the real distal occurrences are lower than the randoms. The real positive-direction distal occurrences are greater than the randoms. This suggests that the real Helpers are likely active or activable.
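
One way to make this comparison explicit is to treat each "value ± spread" entry as an interval and check whether the real and random intervals overlap. The Python sketch below does that for four rows of the table. Reading the ± values as closed intervals is an assumption, and the function name `compare` is illustrative.

```python
def compare(real, real_pm, rand, rand_pm):
    """Classify a real 'value ± spread' against a random one by interval overlap."""
    if real - real_pm > rand + rand_pm:
        return "greater"
    if real + real_pm < rand - rand_pm:
        return "less"
    return "overlap"

# (real, ±, random, ±) copied from the Averages column of the table above.
rows = {
    "UTR negative": (4, 1, 9.3, 0.4),
    "Proximal negative": (1.5, 0.5, 1.3, 0.6),
    "Proximal positive": (2, 0, 1.5, 0.3),
    "Distal negative": (11, 2, 13.9, 0.3),
}
for label, values in rows.items():
    print(label, compare(*values))
```

Under these assumptions the four rows come out as less, overlap, greater and less, in agreement with the comparison stated above.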

C clamp samplings

The Basic programs (starting with SuccessablesClamp.bas) test the consensus sequence (C/G)CTTTGAT(C/G) by comparing it with the nucleotide sequences on either the template strand (-) or the coding strand (+) of the DNA, in the negative direction (-) or the positive direction (+). The searches and the numbers of sites found are:

  1. Negative strand, negative direction: 0.
  2. Positive strand, negative direction: 0.
  3. Negative strand, positive direction: 0.
  4. Positive strand, positive direction: 0.
  5. Inverse complement, negative strand, negative direction: 0.
  6. Inverse complement, positive strand, negative direction: 0.
  7. Inverse complement, negative strand, positive direction: 0.
  8. Inverse complement, positive strand, positive direction: 0.
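
Each search above pairs the consensus with its inverse complement. The Python sketch below shows how such a pair of patterns could be built and counted. It is not the BASIC program SuccessablesClamp.bas, which is not reproduced here. Reading "inverse complement" as the reverse complement is an assumption, the helper names are illustrative, and the demonstration sequence is made up.

```python
import re

# Consensus tested above, with (C/G) written as the IUPAC code S.
CONSENSUS = "SCTTTGATS"

IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "S": "[CG]"}
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C", "S": "S"}

def inverse_complement(motif):
    """Reverse the motif and complement each base (S = C/G is its own complement)."""
    return "".join(COMPLEMENT[base] for base in reversed(motif))

def count_sites(sequence, motif):
    """Count possibly overlapping occurrences of an IUPAC motif in a sequence."""
    pattern = "".join(IUPAC[base] for base in motif)
    return len(re.findall(f"(?=({pattern}))", sequence.upper()))

# Made-up sequence for illustration only; not taken from any gene.
demo = "AAGCTTTGATCAAGATCAAAGCTT"
print(inverse_complement(CONSENSUS))
print(count_sites(demo, CONSENSUS), count_sites(demo, inverse_complement(CONSENSUS)))
```

The inverse complement of (C/G)CTTTGAT(C/G) is (C/G)ATCAAAG(C/G), so a scan of one strand for both patterns covers the same sites as scanning both strands for the consensus alone.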

References

  1. Gregory P. Copenhaver, Christopher D. Putnam, Michael L. Denton and Craig S. Pikaard (1994). "The RNA polymerase I transcription factor UBF is a sequence-tolerant HMG-box protein that can recognize structured nucleic acids" (PDF). Nucleic Acids Research. 22 (13): 2651–7. Retrieved 2017-04-05.
  2. Vincent Laudet, Dominique Stehelin and Hans Clevers (1993). "Ancestry and diversity of the HMG box superfamily" (PDF). Nucleic Acids Research. 21 (10): 2493–501. Retrieved 2017-04-05.
  3. Marc van de Wetering, Mariette Oosterwegel, Klaske van Norren and Hans Clevers (1993). "Sox-4, an Sry-like HMG box protein, is a transcriptional activator in lymphocytes" (PDF). The EMBO Journal. 12 (10): 3847–3854. Retrieved 2017-02-13.
  4. Tomas Valenta, Jan Lukas, Vladimir Korinek (2003). "HMG box transcription factor TCF‐4's interaction with CtBP1 controls the expression of the Wnt target Axin2/Conductin in human embryonic kidney cells". Nucleic Acids Research. 31 (9): 2369–80. doi:10.1093/nar/gkg346. Retrieved 2017-04-05.
  5. Ken M. Cadigan and Marian L. Waterman (November 2012). "TCF/LEFs and Wnt Signaling in the Nucleus". Cold Spring Harbor Perspectives in Biology. 4 (11): a007906. doi:10.1101/cshperspect.a007906. PMID 23024173. Retrieved 2023-05-05.
  6. RefSeq (October 2016). "TCF7 transcription factor 7 [ Homo sapiens (human) ]". 8600 Rockville Pike, Bethesda MD, 20894 USA: National Center for Biotechnology Information, U.S. National Library of Medicine. Retrieved 30 April 2020.
  7. RefSeq (8 February 2019). "TCF7L2 transcription factor 7 like 2 [ Homo sapiens (human) ]". 8600 Rockville Pike, Bethesda MD, 20894 USA: National Center for Biotechnology Information, U.S. National Library of Medicine. Retrieved 30 April 2020.
  8. RefSeq (October 2009). "LEF1 lymphoid enhancer binding factor 1 [ Homo sapiens (human) ]". 8600 Rockville Pike, Bethesda MD, 20894 USA: National Center for Biotechnology Information, U.S. National Library of Medicine. Retrieved 5 April 2020.

External links