D box gene transcriptions

Associate Editor(s)-in-Chief: Henry A. Hoff

Figure (RF00071.jpg): This example of a C/D box is small nucleolar RNA 73 (snoRNA U73). Credit: Rfam database (RF00071).

For "box C/D snoRNAs, boxes C and D and an adjoining stem form a vital structure, known as the box C/D motif."[1]

In snoRNA U73 on the right, from the right side, the D box is AGUCY. In 5' to 3' direction, the D box is YCUGA.

Degenerate nucleotides

For transcription, U (in RNA) is T, Y=(C or T) and R=(A or G).
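
The degenerate codes can be expanded mechanically when a consensus is turned into a search pattern. The following Python sketch is only an illustration and is not one of the Basic programs used for this resource; the names IUPAC_DNA and consensus_to_regex are hypothetical, N (any nucleotide) is included because it appears later in CGN(C/T)TNAAN, and the test sequence is made up.

```python
import re

# Degenerate codes used on this page, written for DNA (U in RNA is read as T).
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "U": "T",        # RNA uracil is searched for as thymine
    "Y": "[CT]",     # pyrimidine: C or T
    "R": "[AG]",     # purine: A or G
    "N": "[ACGT]",   # any nucleotide
}

def consensus_to_regex(consensus: str) -> str:
    """Translate a consensus such as YCUGA into a regular expression."""
    return "".join(IUPAC_DNA[base] for base in consensus.upper())

pattern = consensus_to_regex("YCUGA")        # the U73 D box read 5' to 3'
toy_sequence = "AACCTGATTCTGAG"              # made-up sequence for illustration
matches = [m.group() for m in re.finditer(pattern, toy_sequence)]
print(pattern, matches)                      # [CT]CTGA ['CCTGA', 'TCTGA']
```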

Consensus sequences

Figure (U14 snoRNA.png): This U14 snoRNA from Saccharomyces cerevisiae shows structure and genomic organization. Credit: Dmitry A. Samarsky, Maurille J. Fournier, Robert H. Singer and Edouard Bertrand.

Shown in the image on the right is the D box (3'-AGUCUG-5'). Substituting T for U yields D box = 3'-AGTCTG-5' in the transcription direction on the template strand.
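
The relationship between the RNA form of the Samarsky D box and the DNA strings searched for on this page can be checked with a few string operations. This is a minimal Python sketch with hypothetical helper names (rna_to_dna, inverse, inverse_complement), not the code used to produce the results here.

```python
# Watson-Crick pairing for DNA letters.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def rna_to_dna(seq: str) -> str:
    """Read an RNA sequence as DNA by substituting T for U."""
    return seq.upper().replace("U", "T")

def inverse(seq: str) -> str:
    """Read the sequence in the opposite direction."""
    return seq[::-1]

def inverse_complement(seq: str) -> str:
    """Complement every base, then reverse the result."""
    return seq.translate(COMPLEMENT)[::-1]

box_d = "GUCUGA"                           # box D written 5' to 3'
searched = rna_to_dna(inverse(box_d))      # AGUCUG with T substituted for U
searched_ic = inverse_complement(searched)
print(searched, searched_ic)               # AGTCTG CAGACT
```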

"Members of the box C/D snoRNA family, which are the subject of the present report, possess characteristic sequence elements known as box C (UGAUGA) and box D (GUCUGA)."[1]

D-box (TGAGTGG).[2]

Hypotheses

  1. The D boxes are not involved in the transcription of A1BG.
  2. The promoters of A1BG do not contain a Samarsky D box.

Dbox (Samarsky) samplings

The Basic programs (starting with SuccessablesDbox.bas) compare nucleotide sequences with the sequences on either the template strand (-) or the coding strand (+) of the DNA, in the negative direction (-) or the positive direction (+). The programs looked for AGTCTG and its inverse complement CAGACT, and found:

  1. Negative strand, negative direction: AGTCTG at 2947.
  2. Negative strand, positive direction: AGTCTG at 3923.
  3. Positive strand, negative direction: AGTCTG at 1355.
  4. Positive strand, positive direction: 0.
  5. inverse complement, negative strand, negative direction: 0.
  6. inverse complement, negative strand, positive direction: CAGACT at 1744, CAGACT at 2416.
  7. inverse complement, positive strand, negative direction: CAGACT at 15, CAGACT at 1616.
  8. inverse complement, positive strand, positive direction: CAGACT at 2943, CAGACT at 3006, CAGACT at 3924.
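
A search of this kind amounts to scanning a sequence for the consensus and for its inverse complement. The sketch below, in Python rather than Basic, uses a made-up toy sequence and a hypothetical function find_all; it reports 1-based start positions, which is an assumption and may not match how the positions above are counted.

```python
def find_all(sequence: str, motif: str) -> list[int]:
    """Return the 1-based start position of every match, overlaps included."""
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start + 1)
        # Resume one base further on so overlapping matches are not skipped.
        start = sequence.find(motif, start + 1)
    return positions

toy = "TTAGTCTGAACAGACTGG"   # made-up sequence, not the A1BG promoter
hits = {motif: find_all(toy, motif) for motif in ("AGTCTG", "CAGACT")}
print(hits)                  # {'AGTCTG': [3], 'CAGACT': [11]}
```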

DboxS (4560-2846) UTRs

  1. Negative strand, negative direction: AGTCTG at 2947.

DboxS negative direction (2596-1) distal promoters

  1. Positive strand, negative direction: CAGACT at 1616, AGTCTG at 1355, CAGACT at 15.

DboxS positive direction (4050-1) distal promoters

  1. Negative strand, positive direction: AGTCTG at 3923, CAGACT at 2416, CAGACT at 1744.
  2. Positive strand, positive direction: CAGACT at 3924, CAGACT at 3006, CAGACT at 2943.
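
The position ranges quoted in the headings above can be written down as a small lookup. This Python sketch uses only the ranges that appear in the headings on this page, treats both ends as inclusive (an assumption), and places the UTR range under the negative direction because that is where its entries are listed; REGIONS and regions_for are hypothetical names, and the headings may encode further rules that this sketch does not capture.

```python
# Inclusive (low, high) position ranges taken from the section headings.
REGIONS = {
    "negative": {"UTR": (2846, 4560), "proximal": (2596, 2811), "distal": (1, 2596)},
    "positive": {"core": (4265, 4445), "proximal": (4050, 4265), "distal": (1, 4050)},
}

def regions_for(position: int, direction: str) -> list[str]:
    """Return every named region whose range contains the position."""
    return [
        name
        for name, (low, high) in REGIONS[direction].items()
        if low <= position <= high
    ]

print(regions_for(2947, "negative"))   # ['UTR']
print(regions_for(3923, "positive"))   # ['distal']
print(regions_for(2596, "negative"))   # ['proximal', 'distal'] (shared boundary)
```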

Samarsky random dataset samplings

  1. Dboxr0: 1, AGTCTG at 4073.
  2. Dboxr1: 0.
  3. Dboxr2: 0.
  4. Dboxr3: 1, AGTCTG at 1984.
  5. Dboxr4: 0.
  6. Dboxr5: 1, AGTCTG at 2334.
  7. Dboxr6: 1, AGTCTG at 804.
  8. Dboxr7: 0.
  9. Dboxr8: 1, AGTCTG at 587.
  10. Dboxr9: 4, AGTCTG at 3816, AGTCTG at 1207, AGTCTG at 111, AGTCTG at 36.
  11. Dboxr0ci: 1, CAGACT at 1616.
  12. Dboxr1ci: 1, CAGACT at 1754.
  13. Dboxr2ci: 1, CAGACT at 355.
  14. Dboxr3ci: 0.
  15. Dboxr4ci: 0.
  16. Dboxr5ci: 0.
  17. Dboxr6ci: 0.
  18. Dboxr7ci: 0.
  19. Dboxr8ci: 0.
  20. Dboxr9ci: 0.
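
As a rough check on what random sequences should produce, the expected number of occurrences of a fixed motif of length k in a sequence of L independent, equally likely nucleotides is (L - k + 1) / 4^k. The Python sketch below assumes L = 4560 (the largest position quoted in the headings) and uniform base frequencies; how the random datasets on this page were actually generated is not stated, so random_dataset and its seed are hypothetical.

```python
import random

def random_dataset(length: int, seed: int) -> str:
    """Generate a reproducible sequence of uniformly random nucleotides."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(length))

def expected_count(length: int, k: int) -> float:
    """Expected occurrences of one fixed k-mer in a uniform random sequence."""
    return (length - k + 1) / 4 ** k

def count_overlapping(sequence: str, motif: str) -> int:
    """Count every start position at which the motif occurs."""
    last_start = len(sequence) - len(motif)
    return sum(sequence.startswith(motif, i) for i in range(last_start + 1))

dataset = random_dataset(4560, seed=0)
observed = count_overlapping(dataset, "AGTCTG")
expected = expected_count(4560, len("AGTCTG"))   # about 1.1 per dataset
```

Under these assumptions a six-nucleotide consensus is expected about once per dataset, which is of the same order as the nine direct occurrences listed above across Dboxr0 to Dboxr9.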

DboxSr arbitrary (evens) (4560-2846) UTRs

  1. Dboxr0: AGTCTG at 4073.

DboxSr alternate (odds) (4560-2846) UTRs

  1. Dboxr9: AGTCTG at 3816.

DboxSr alternate positive direction (evens) (4265-4050) proximal promoters

  1. Dboxr0: AGTCTG at 4073.

DboxSr arbitrary negative direction (evens) (2596-1) distal promoters

  1. Dboxr6: AGTCTG at 804.
  2. Dboxr8: AGTCTG at 587.
  3. Dboxr0ci: CAGACT at 1616.
  4. Dboxr2ci: CAGACT at 355.

DboxSr alternate negative direction (odds) (2596-1) distal promoters

  1. Dboxr3: AGTCTG at 1984.
  2. Dboxr5: AGTCTG at 2334.
  3. Dboxr9: AGTCTG at 1207, AGTCTG at 111, AGTCTG at 36.
  4. Dboxr1ci: CAGACT at 1754.

DboxSr arbitrary positive direction (odds) (4050-1) distal promoters

  1. Dboxr3: AGTCTG at 1984.
  2. Dboxr5: AGTCTG at 2334.
  3. Dboxr9: AGTCTG at 3816, AGTCTG at 1207, AGTCTG at 111, AGTCTG at 36.
  4. Dboxr1ci: CAGACT at 1754.

DboxSr alternate positive direction (evens) (4050-1) distal promoters

  1. Dboxr6: AGTCTG at 804.
  2. Dboxr8: AGTCTG at 587.
  3. Dboxr0ci: CAGACT at 1616.
  4. Dboxr2ci: CAGACT at 355.

Dbox (Samarsky) analysis and results

The D box (AGUCUG) is determined by substituting T for U to yield D box = AGTCTG in the transcription direction.[1]

Reals or randoms | Promoters | Direction | Numbers | Strands | Occurrences | Averages (± 0.1)
Reals | UTR | negative | 1 | 2 | 0.5 | 0.5 ± 0.5 (--1,+-0)
Randoms | UTR | arbitrary negative | 1 | 10 | 0.1 | 0.1 ± 0
Randoms | UTR | alternate negative | 1 | 10 | 0.1 | 0.1 ± 0
Reals | Core | negative | 0 | 2 | 0 | 0
Randoms | Core | arbitrary negative | 0 | 10 | 0 | 0
Randoms | Core | alternate negative | 0 | 10 | 0 | 0
Reals | Core | positive | 0 | 2 | 0 | 0
Randoms | Core | arbitrary positive | 0 | 10 | 0 | 0
Randoms | Core | alternate positive | 0 | 10 | 0 | 0
Reals | Proximal | negative | 0 | 2 | 0 | 0
Randoms | Proximal | arbitrary negative | 0 | 10 | 0 | 0
Randoms | Proximal | alternate negative | 0 | 10 | 0 | 0
Reals | Proximal | positive | 0 | 2 | 0 | 0
Randoms | Proximal | arbitrary positive | 0 | 10 | 0 | 0.05 ± 0.05
Randoms | Proximal | alternate positive | 1 | 10 | 0.1 | 0.05 ± 0.05
Reals | Distal | negative | 3 | 2 | 1.5 | 1.5 ± 1.5 (--0,+-3)
Randoms | Distal | arbitrary negative | 4 | 10 | 0.4 | 0.5 ± 0.1
Randoms | Distal | alternate negative | 6 | 10 | 0.6 | 0.5 ± 0.1
Reals | Distal | positive | 6 | 2 | 3 | 3 ± 0 (-+3,++3)
Randoms | Distal | arbitrary positive | 7 | 10 | 0.7 | 0.55 ± 0.15
Randoms | Distal | alternate positive | 4 | 10 | 0.4 | 0.55 ± 0.15
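
The Occurrences and Averages columns appear to follow simple arithmetic: occurrences are the number of hits divided by the number of strands (or random datasets), and the average is the midpoint of two values with half their difference as the ± term. The Python sketch below reproduces two rows under that reading; the function names are hypothetical, and the interpretation is inferred from the table rather than stated by the source.

```python
def occurrence(hits: int, strands: int) -> float:
    """Hits per strand (reals) or per random dataset (randoms)."""
    return hits / strands

def midpoint_and_spread(a: float, b: float) -> tuple[float, float]:
    """Midpoint of two values and half their difference, rounded to 3 places."""
    return round((a + b) / 2, 3), round(abs(a - b) / 2, 3)

# Reals, distal, negative direction: 0 hits on one strand and 3 on the other.
reals = midpoint_and_spread(0, 3)

# Randoms, distal, negative direction: 4 hits (arbitrary) and 6 hits (alternate),
# each over 10 datasets.
randoms = midpoint_and_spread(occurrence(4, 10), occurrence(6, 10))

print(reals, randoms)   # (1.5, 1.5) (0.5, 0.1)
```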

Comparison:

The occurrences of real DboxSs are greater than those of the randoms. This suggests that the real DboxSs are likely active or activable.

D boxes

The human ribosomal protein L11 gene (HRPL11) has potential snRNA-coding sequences in intron 4: a D box beginning at +4237 (TCCTG).[3]

D box (Voronina) samplings

The Basic programs testing the consensus sequence TCCTG (starting with SuccessablesAAA.bas) compare nucleotide sequences with the sequences on either the template strand (-) or the coding strand (+) of the DNA, in the negative direction (-) or the positive direction (+). The programs looked for, and found:

  1. negative strand, negative direction, looking for TCCTG, 4, TCCTG at 4467, TCCTG at 3755, TCCTG at 3639, TCCTG at 3388, and complements.
  2. negative strand, positive direction, looking for TCCTG, 10, TCCTG at 4408, TCCTG at 4185, TCCTG at 3621, TCCTG at 3295, TCCTG at 2519, TCCTG at 2500, TCCTG at 2210, TCCTG at 1775, TCCTG at 1117, TCCTG at 143, and complements.
  3. positive strand, negative direction, looking for TCCTG, 5, TCCTG at 4545, TCCTG at 3905, TCCTG at 1910, TCCTG at 1840, TCCTG at 595, and complements.
  4. positive strand, positive direction, looking for TCCTG, 4, TCCTG at 4251, TCCTG at 3130, TCCTG at 2459, TCCTG at 1669, and complements.
  5. inverse complement, negative strand, negative direction, looking for CAGGA, 0.
  6. inverse complement, negative strand, positive direction, looking for CAGGA, 7, CAGGA at 3869, CAGGA at 3572, CAGGA at 3129, CAGGA at 2746, CAGGA at 2621, CAGGA at 708, CAGGA at 425, and complements.
  7. inverse complement, positive strand, negative direction, looking for CAGGA, 23, CAGGA at 4437, CAGGA at 4283, CAGGA at 4171, CAGGA at 4139, CAGGA at 3250, CAGGA at 3218, CAGGA at 3111, CAGGA at 2690, CAGGA at 2588, CAGGA at 2368, CAGGA at 2251, CAGGA at 2135, CAGGA at 1942, CAGGA at 1824, CAGGA at 1289, CAGGA at 1276, CAGGA at 998, CAGGA at 985, CAGGA at 851, CAGGA at 832, CAGGA at 715, CAGGA at 579, CAGGA at 442, and complements.
  8. inverse complement, positive strand, positive direction, looking for CAGGA, 5, CAGGA at 3864, CAGGA at 3620, CAGGA at 2999, CAGGA at 758, CAGGA at 219, and complements.

DboxV (4560-2846) UTRs

  1. Negative strand, negative direction: TCCTG at 4467, TCCTG at 3755, TCCTG at 3639, TCCTG at 3388.
  2. Positive strand, negative direction: TCCTG at 4545, CAGGA at 4437, CAGGA at 4283, CAGGA at 4171, CAGGA at 4139, TCCTG at 3905, CAGGA at 3250, CAGGA at 3218, CAGGA at 3111.

DboxV positive direction (4445-4265) core promoters

  1. Negative strand, positive direction: TCCTG at 4408.
  2. Positive strand, positive direction: TCCTG at 4251, TCCTG at 3130, TCCTG at 2459, TCCTG at 1669.

DboxV negative direction (2811-2596) proximal promoters

  1. Positive strand, negative direction: CAGGA at 2690.

DboxV positive direction (4265-4050) proximal promoters

  1. Negative strand, positive direction: TCCTG at 4185.
  2. Positive strand, positive direction: TCCTG at 4251.

DboxV negative direction (2596-1) distal promoters

  1. Positive strand, negative direction: CAGGA at 2588, CAGGA at 2368, CAGGA at 2251, CAGGA at 2135, CAGGA at 1942, TCCTG at 1910, TCCTG at 1840, CAGGA at 1824, CAGGA at 1289, CAGGA at 1276, CAGGA at 998, CAGGA at 985, CAGGA at 851, CAGGA at 832, CAGGA at 715, TCCTG at 595, CAGGA at 579, CAGGA at 442.

DboxV positive direction (4050-1) distal promoters

  1. Negative strand, positive direction: TCCTG at 3621, TCCTG at 3295, TCCTG at 2519, TCCTG at 2500, TCCTG at 2210, TCCTG at 1775, TCCTG at 1117, TCCTG at 143.
  2. Negative strand, positive direction: CAGGA at 3869, CAGGA at 3572, CAGGA at 3129, CAGGA at 2746, CAGGA at 2621, CAGGA at 708, CAGGA at 425.
  3. Positive strand, positive direction: TCCTG at 3130, TCCTG at 2459, TCCTG at 1669.
  4. Positive strand, positive direction: CAGGA at 3864, CAGGA at 3620, CAGGA at 2999, CAGGA at 758, CAGGA at 219.

D box (Voronina) random dataset samplings

  1. DVor0: 8, TCCTG at 4018, TCCTG at 2252, TCCTG at 1914, TCCTG at 1550, TCCTG at 1008, TCCTG at 513, TCCTG at 348, TCCTG at 159.
  2. DVor1: 3, TCCTG at 1801, TCCTG at 1388, TCCTG at 1188.
  3. DVor2: 4, TCCTG at 2878, TCCTG at 1402, TCCTG at 1203, TCCTG at 724.
  4. DVor3: 4, TCCTG at 2931, TCCTG at 2508, TCCTG at 2127, TCCTG at 349.
  5. DVor4: 7, TCCTG at 3918, TCCTG at 3821, TCCTG at 3321, TCCTG at 2668, TCCTG at 2622, TCCTG at 1919, TCCTG at 800.
  6. DVor5: 3, TCCTG at 4160, TCCTG at 1116, TCCTG at 864.
  7. DVor6: 5, TCCTG at 4466, TCCTG at 4013, TCCTG at 3240, TCCTG at 3184, TCCTG at 946.
  8. DVor7: 10, TCCTG at 4133, TCCTG at 4128, TCCTG at 2878, TCCTG at 2785, TCCTG at 2098, TCCTG at 2053, TCCTG at 1578, TCCTG at 1215, TCCTG at 748, TCCTG at 627.
  9. DVor8: 3, TCCTG at 3448, TCCTG at 3429, TCCTG at 1014.
  10. DVor9: 2, TCCTG at 1587, TCCTG at 373.
  11. DVor0ci: 4, CAGGA at 4312, CAGGA at 3138, CAGGA at 1483, CAGGA at 1403.
  12. DVor1ci: 7, CAGGA at 4309, CAGGA at 3531, CAGGA at 3275, CAGGA at 3139, CAGGA at 2739, CAGGA at 645, CAGGA at 381.
  13. DVor2ci: 5, CAGGA at 2328, CAGGA at 1600, CAGGA at 985, CAGGA at 574, CAGGA at 492.
  14. DVor3ci: 5, CAGGA at 2408, CAGGA at 2253, CAGGA at 1525, CAGGA at 1344, CAGGA at 1272.
  15. DVor4ci: 1, CAGGA at 857.
  16. DVor5ci: 1, CAGGA at 784.
  17. DVor6ci: 8, CAGGA at 4256, CAGGA at 4168, CAGGA at 3987, CAGGA at 3260, CAGGA at 2705, CAGGA at 2593, CAGGA at 1223, CAGGA at 419.
  18. DVor7ci: 2, CAGGA at 4062, CAGGA at 2793.
  19. DVor8ci: 4, CAGGA at 2161, CAGGA at 868, CAGGA at 372, CAGGA at 22.
  20. DVor9ci: 2, CAGGA at 3716, CAGGA at 3370.

DboxVr arbitrary (evens) (4560-2846) UTRs

  1. DVor0: TCCTG at 4018.
  2. DVor2: TCCTG at 2878.
  3. DVor4: TCCTG at 3918, TCCTG at 3821, TCCTG at 3321.
  4. DVor6: TCCTG at 4466, TCCTG at 4013, TCCTG at 3240, TCCTG at 3184.
  5. DVor8: TCCTG at 3448, TCCTG at 3429.
  6. DVor0ci: CAGGA at 4312, CAGGA at 3138.
  7. DVor6ci: CAGGA at 4256, CAGGA at 4168, CAGGA at 3987, CAGGA at 3260.

DboxVr alternate (odds) (4560-2846) UTRs

  1. DVor3: TCCTG at 2931.
  2. DVor5: TCCTG at 4160.
  3. DVor7: TCCTG at 4133, TCCTG at 4128, TCCTG at 2878.
  4. DVor1ci: CAGGA at 4309, CAGGA at 3531, CAGGA at 3275, CAGGA at 3139.
  5. DVor7ci: CAGGA at 4062.
  6. DVor9ci: CAGGA at 3716, CAGGA at 3370.

DboxVr arbitrary positive direction (odds) (4445-4265) core promoters

  1. DVor1ci: CAGGA at 4309.

DboxVr alternate positive direction (evens) (4445-4265) core promoters

  1. DVor0ci: CAGGA at 4312.

DboxVr arbitrary negative direction (evens) (2811-2596) proximal promoters

  1. DVor4: TCCTG at 2668, TCCTG at 2622.
  2. DVor6ci: CAGGA at 2705.

DboxVr alternate negative direction (odds) (2811-2596) proximal promoters

  1. DVor7: TCCTG at 2785.
  2. DVor1ci: CAGGA at 2739.
  3. DVor7ci: CAGGA at 2793.

DboxVr arbitrary positive direction (odds) (4265-4050) proximal promoters

  1. DVor5: TCCTG at 4160.
  2. DVor7: TCCTG at 4133, TCCTG at 4128.
  3. DVor7ci: CAGGA at 4062.

DboxVr alternate positive direction (evens) (4265-4050) proximal promoters

  1. DVor6ci: CAGGA at 4256, CAGGA at 4168.

DboxVr arbitrary negative direction (evens) (2596-1) distal promoters

  1. DVor0: TCCTG at 2252, TCCTG at 1914, TCCTG at 1550, TCCTG at 1008, TCCTG at 513, TCCTG at 348, TCCTG at 159.
  2. DVor2: TCCTG at 1402, TCCTG at 1203, TCCTG at 724.
  3. DVor4: TCCTG at 1919, TCCTG at 800.
  4. DVor6: TCCTG at 946.
  5. DVor8: TCCTG at 1014.
  6. DVor0ci: CAGGA at 1483, CAGGA at 1403.
  7. DVor2ci: CAGGA at 2328, CAGGA at 1600, CAGGA at 985, CAGGA at 574, CAGGA at 492.
  8. DVor4ci: CAGGA at 857.
  9. DVor6ci: CAGGA at 2593, CAGGA at 1223, CAGGA at 419.
  10. DVor8ci: CAGGA at 2161, CAGGA at 868, CAGGA at 372, CAGGA at 22.

DboxVr alternate negative direction (odds) (2596-1) distal promoters

  1. DVor1: TCCTG at 1801, TCCTG at 1388, TCCTG at 1188.
  2. DVor3: TCCTG at 2508, TCCTG at 2127, TCCTG at 349.
  3. DVor5: TCCTG at 1116, TCCTG at 864.
  4. DVor7: TCCTG at 2098, TCCTG at 2053, TCCTG at 1578, TCCTG at 1215, TCCTG at 748, TCCTG at 627.
  5. DVor9: TCCTG at 1587, TCCTG at 373.
  6. DVor1ci: CAGGA at 645, CAGGA at 381.
  7. DVor3ci: CAGGA at 2408, CAGGA at 2253, CAGGA at 1525, CAGGA at 1344, CAGGA at 1272.
  8. DVor5ci: CAGGA at 784.

DboxVr arbitrary positive direction (odds) (4050-1) distal promoters

  1. DVor1: TCCTG at 1801, TCCTG at 1388, TCCTG at 1188.
  2. DVor3: TCCTG at 2931, TCCTG at 2508, TCCTG at 2127, TCCTG at 349.
  3. DVor5: TCCTG at 1116, TCCTG at 864.
  4. DVor7: TCCTG at 2878, TCCTG at 2785, TCCTG at 2098, TCCTG at 2053, TCCTG at 1578, TCCTG at 1215, TCCTG at 748, TCCTG at 627.
  5. DVor9: TCCTG at 1587, TCCTG at 373.
  6. DVor1ci: CAGGA at 3531, CAGGA at 3275, CAGGA at 3139, CAGGA at 2739, CAGGA at 645, CAGGA at 381.
  7. DVor3ci: CAGGA at 2408, CAGGA at 2253, CAGGA at 1525, CAGGA at 1344, CAGGA at 1272.
  8. DVor5ci: CAGGA at 784.
  9. DVor7ci: CAGGA at 2793.
  10. DVor9ci: CAGGA at 3716, CAGGA at 3370.

DboxVr alternate positive direction (evens) (4050-1) distal promoters

  1. DVor0: TCCTG at 4018, TCCTG at 2252, TCCTG at 1914, TCCTG at 1550, TCCTG at 1008, TCCTG at 513, TCCTG at 348, TCCTG at 159.
  2. DVor2: TCCTG at 2878, TCCTG at 1402, TCCTG at 1203, TCCTG at 724.
  3. DVor4: TCCTG at 3918, TCCTG at 3821, TCCTG at 3321, TCCTG at 2668, TCCTG at 2622, TCCTG at 1919, TCCTG at 800.
  4. DVor6: TCCTG at 4013, TCCTG at 3240, TCCTG at 3184, TCCTG at 946.
  5. DVor8: TCCTG at 3448, TCCTG at 3429, TCCTG at 1014.
  6. DVor0ci: CAGGA at 3138, CAGGA at 1483, CAGGA at 1403.
  7. DVor2ci: CAGGA at 2328, CAGGA at 1600, CAGGA at 985, CAGGA at 574, CAGGA at 492.
  8. DVor4ci: CAGGA at 857.
  9. DVor6ci: CAGGA at 3987, CAGGA at 3260, CAGGA at 2705, CAGGA at 2593, CAGGA at 1223, CAGGA at 419.
  10. DVor8ci: CAGGA at 2161, CAGGA at 868, CAGGA at 372, CAGGA at 22.

DboxV analysis and results

The human ribosomal protein L11 gene (HRPL11) has potential snRNA-coding sequences in intron 4: a D box beginning at +4237 (TCCTG).[3]

Reals or randoms | Promoters | Direction | Numbers | Strands | Occurrences | Averages (± 0.1)
Reals | UTR | negative | 13 | 2 | 6.5 | 6.5 ± 2.5 (--4,+-9)
Randoms | UTR | arbitrary negative | 17 | 10 | 1.7 | 1.45 ± 0.25
Randoms | UTR | alternate negative | 12 | 10 | 1.2 | 1.45 ± 0.25
Reals | Core | negative | 0 | 2 | 0 | 0
Randoms | Core | arbitrary negative | 0 | 10 | 0 | 0
Randoms | Core | alternate negative | 0 | 10 | 0 | 0
Reals | Core | positive | 5 | 2 | 2.5 | 2.5 ± 1.5 (-+1,++4)
Randoms | Core | arbitrary positive | 1 | 10 | 0.1 | 0.1 ± 0
Randoms | Core | alternate positive | 1 | 10 | 0.1 | 0.1 ± 0
Reals | Proximal | negative | 1 | 2 | 0.5 | 0.5 ± 0.5 (--0,+-1)
Randoms | Proximal | arbitrary negative | 3 | 10 | 0.3 | 0.3 ± 0
Randoms | Proximal | alternate negative | 3 | 10 | 0.3 | 0.3 ± 0
Reals | Proximal | positive | 2 | 2 | 1 | 1 ± 0 (-+1,++1)
Randoms | Proximal | arbitrary positive | 4 | 10 | 0.4 | 0.3 ± 0.1
Randoms | Proximal | alternate positive | 2 | 10 | 0.2 | 0.3 ± 0.1
Reals | Distal | negative | 18 | 2 | 9 | 9 ± 9 (--0,+-18)
Randoms | Distal | arbitrary negative | 29 | 10 | 2.9 | 2.65 ± 0.25
Randoms | Distal | alternate negative | 24 | 10 | 2.4 | 2.65 ± 0.25
Reals | Distal | positive | 23 | 2 | 11.5 | 11.5 ± 3.5 (-+15,++8)
Randoms | Distal | arbitrary positive | 34 | 10 | 3.4 | 3.95 ± 0.55
Randoms | Distal | alternate positive | 45 | 10 | 4.5 | 3.95 ± 0.55

Comparison:

The occurrences of real DboxVs are greater than those of the randoms. This suggests that the real DboxVs are likely active or activable.

(Johnson) samplings

TCTCACATT(A/C)AATAAGTCA is a D-box.[4]

The Basic programs testing the consensus sequence 5'-TCTCACATT(A/C)AATAAGTCA-3' (starting with SuccessablesAAA.bas) compare nucleotide sequences with the sequences on either the template strand (-) or the coding strand (+) of the DNA, in the negative direction (-) or the positive direction (+). The programs looked for, and found:

  1. negative strand, negative direction, looking for 5'-TCTCACATT(A/C)AATAAGTCA-3', 0.
  2. negative strand, positive direction, looking for 5'-TCTCACATT(A/C)AATAAGTCA-3', 0.
  3. positive strand, negative direction, looking for 5'-TCTCACATT(A/C)AATAAGTCA-3', 0.
  4. positive strand, positive direction, looking for 5'-TCTCACATT(A/C)AATAAGTCA-3', 0.
  5. complement, negative strand, negative direction, looking for 5'-AGAGTGTAA(G/T)TTATTCAGT-3', 0.
  6. complement, negative strand, positive direction, looking for 5'-AGAGTGTAA(G/T)TTATTCAGT-3', 0.
  7. complement, positive strand, negative direction, looking for 5'-AGAGTGTAA(G/T)TTATTCAGT-3', 0.
  8. complement, positive strand, positive direction, looking for 5'-AGAGTGTAA(G/T)TTATTCAGT-3', 0.
  9. inverse complement, negative strand, negative direction, looking for 5'-TGACTTATT(G/T)AATGTGAGA-3', 0.
  10. inverse complement, negative strand, positive direction, looking for 5'-TGACTTATT(G/T)AATGTGAGA-3', 0.
  11. inverse complement, positive strand, negative direction, looking for 5'-TGACTTATT(G/T)AATGTGAGA-3', 0.
  12. inverse complement, positive strand, positive direction, looking for 5'-TGACTTATT(G/T)AATGTGAGA-3', 0.
  13. inverse negative strand, negative direction, looking for 5'-ACTGAATAA(A/C)TTACACTCT-3', 0.
  14. inverse negative strand, positive direction, looking for 5'-ACTGAATAA(A/C)TTACACTCT-3', 0.
  15. inverse positive strand, negative direction, looking for 5'-ACTGAATAA(A/C)TTACACTCT-3', 0.
  16. inverse positive strand, positive direction, looking for 5'-ACTGAATAA(A/C)TTACACTCT-3', 0.
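
The four forms searched for above (the consensus, its complement, its inverse complement, and its inverse) can be derived from one another mechanically, including the alternative position written (A/C). The following Python sketch is one possible way to do so and is not the Basic code used here; all function names are hypothetical, and alternatives are written in alphabetical order to match the list above.

```python
import re

PAIR = {"A": "T", "C": "G", "G": "C", "T": "A"}
# One position is either a parenthesised set such as (A/C) or a single base.
TOKEN = re.compile(r"\(([ACGT](?:/[ACGT])*)\)|([ACGT])")

def tokens(consensus: str) -> list[tuple[str, ...]]:
    """Split a consensus into positions, each a sorted tuple of allowed bases."""
    return [
        tuple(sorted(group.split("/"))) if group else (single,)
        for group, single in TOKEN.findall(consensus)
    ]

def render(positions: list[tuple[str, ...]]) -> str:
    """Write positions back out, using (X/Y) for alternatives."""
    return "".join(
        p[0] if len(p) == 1 else "(" + "/".join(p) + ")" for p in positions
    )

def complement(consensus: str) -> str:
    return render([tuple(sorted(PAIR[b] for b in p)) for p in tokens(consensus)])

def inverse(consensus: str) -> str:
    return render(tokens(consensus)[::-1])

def inverse_complement(consensus: str) -> str:
    return inverse(complement(consensus))

d_box = "TCTCACATT(A/C)AATAAGTCA"
print(complement(d_box))           # AGAGTGTAA(G/T)TTATTCAGT
print(inverse_complement(d_box))   # TGACTTATT(G/T)AATGTGAGA
print(inverse(d_box))              # ACTGAATAA(A/C)TTACACTCT
```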

(Mracek) samplings

There is another promoter D box, or D-box: "Located in the region [...] is a single D-box element (5′-GTTGTATAAC-3′) with a distinct sequence from that of the functional D-box identified in the per2 promoter (5′-CTTATGTAAA-3′) [21]."[5]

(Mracek1) samplings

The Basic programs testing the consensus sequence 5'-GTTGTATAAC-3' (starting with SuccessablesMra1.bas) compare nucleotide sequences with the sequences on either the template strand (-) or the coding strand (+) of the DNA, in the negative direction (-) or the positive direction (+). The programs looked for, and found:

  1. negative strand, negative direction, looking for 5'-GTTGTATAAC-3', 0.
  2. negative strand, positive direction, looking for 5'-GTTGTATAAC-3', 0.
  3. positive strand, negative direction, looking for 5'-GTTGTATAAC-3', 0.
  4. positive strand, positive direction, looking for 5'-GTTGTATAAC-3', 0.
  5. complement, negative strand, negative direction, looking for 5'-CAACATATTG-3', 0.
  6. complement, negative strand, positive direction, looking for 5'-CAACATATTG-3', 0.
  7. complement, positive strand, negative direction, looking for 5'-CAACATATTG-3', 0.
  8. complement, positive strand, positive direction, looking for 5'-CAACATATTG-3', 0.
  9. inverse complement, negative strand, negative direction, looking for 5'-GTTATACAAC-3', 0.
  10. inverse complement, negative strand, positive direction, looking for 5'-GTTATACAAC-3', 0.
  11. inverse complement, positive strand, negative direction, looking for 5'-GTTATACAAC-3', 0.
  12. inverse complement, positive strand, positive direction, looking for 5'-GTTATACAAC-3', 0.
  13. inverse negative strand, negative direction, looking for 5'-CAATATGTTG-3', 0.
  14. inverse negative strand, positive direction, looking for 5'-CAATATGTTG-3', 0.
  15. inverse positive strand, negative direction, looking for 5'-CAATATGTTG-3', 0.
  16. inverse positive strand, positive direction, looking for 5'-CAATATGTTG-3', 0.

(Mracek2) samplings

The Basic programs testing the consensus sequence 5'-CTTATGTAAA-3' (starting with SuccessablesMra2.bas) compare nucleotide sequences with the sequences on either the template strand (-) or the coding strand (+) of the DNA, in the negative direction (-) or the positive direction (+). The programs looked for, and found:

  1. negative strand, negative direction, looking for 5'-CTTATGTAAA-3', 0.
  2. negative strand, positive direction, looking for 5'-CTTATGTAAA-3', 0.
  3. positive strand, negative direction, looking for 5'-CTTATGTAAA-3', 0.
  4. positive strand, positive direction, looking for 5'-CTTATGTAAA-3', 0.
  5. complement, negative strand, negative direction, looking for 5'-GAATACATTT-3', 0.
  6. complement, negative strand, positive direction, looking for 5'-GAATACATTT-3', 0.
  7. complement, positive strand, negative direction, looking for 5'-GAATACATTT-3', 0.
  8. complement, positive strand, positive direction, looking for 5'-GAATACATTT-3', 0.
  9. inverse complement, negative strand, negative direction, looking for 5'-TTTACATAAG-3', 0.
  10. inverse complement, negative strand, positive direction, looking for 5'-TTTACATAAG-3', 0.
  11. inverse complement, positive strand, negative direction, looking for 5'-TTTACATAAG-3', 0.
  12. inverse complement, positive strand, positive direction, looking for 5'-TTTACATAAG-3', 0.
  13. inverse negative strand, negative direction, looking for 5'-AAATGTATTC-3', 0.
  14. inverse negative strand, positive direction, looking for 5'-AAATGTATTC-3', 0.
  15. inverse positive strand, negative direction, looking for 5'-AAATGTATTC-3', 0.
  16. inverse positive strand, positive direction, looking for 5'-AAATGTATTC-3', 0.

Consensus sequence (Motojima)

D-box (TGAGTGG).[2]

(Motojima) samplings

Copying the D-box consensus TGAGTGG into the browser's find box ("⌘F") finds no locations between ZSCAN22 and A1BG and one between ZNF497 and A1BG, as the computer programs also find.

The Basic programs testing the consensus sequence TGAGTGG (starting with SuccessablesMOT.bas) compare nucleotide sequences with the sequences on either the template strand (-) or the coding strand (+) of the DNA, in the negative direction (-) or the positive direction (+). The programs looked for, and found:

  1. negative strand, negative direction, looking for TGAGTGG, 0.
  2. negative strand, positive direction, looking for TGAGTGG, 1, TGAGTGG at 3449.
  3. positive strand, negative direction, looking for TGAGTGG, 0.
  4. positive strand, positive direction, looking for TGAGTGG, 0.
  5. inverse complement, negative strand, negative direction, looking for CCACTCA, 1, CCACTCA at 3827.
  6. inverse complement, negative strand, positive direction, looking for CCACTCA, 0.
  7. inverse complement, positive strand, negative direction, looking for CCACTCA, 1, CCACTCA at 4487.
  8. inverse complement, positive strand, positive direction, looking for CCACTCA, 0.

DboxM (4560-2846) UTRs

  1. Negative strand, negative direction: CCACTCA at 3827.
  2. Positive strand, negative direction: CCACTCA at 4487.

DboxM positive direction (4050-1) distal promoters

  1. Negative strand, positive direction: TGAGTGG at 3449.

Motojima random dataset samplings

  1. MOTr0: 1, TGAGTGG at 4502.
  2. MOTr1: 1, TGAGTGG at 4148.
  3. MOTr2: 0.
  4. MOTr3: 0.
  5. MOTr4: 0.
  6. MOTr5: 0.
  7. MOTr6: 0.
  8. MOTr7: 0.
  9. MOTr8: 0.
  10. MOTr9: 0.
  11. MOTr0ci: 0.
  12. MOTr1ci: 0.
  13. MOTr2ci: 1, CCACTCA at 1365.
  14. MOTr3ci: 0.
  15. MOTr4ci: 0.
  16. MOTr5ci: 0.
  17. MOTr6ci: 1, CCACTCA at 1766.
  18. MOTr7ci: 0.
  19. MOTr8ci: 0.
  20. MOTr9ci: 0.

DboxMr arbitrary (evens) (4560-2846) UTRs

  1. MOTr0: TGAGTGG at 4502.

DboxMr alternate (odds) (4560-2846) UTRs

  1. MOTr1: TGAGTGG at 4148.

DboxMr arbitrary positive direction (odds) (4265-4050) proximal promoters

  1. MOTr1: TGAGTGG at 4148.

DboxMr arbitrary negative direction (evens) (2596-1) distal promoters

  1. MOTr2ci: CCACTCA at 1365.
  2. MOTr6ci: CCACTCA at 1766.

DboxMr alternate positive direction (evens) (4050-1) distal promoters

  1. MOTr2ci: CCACTCA at 1365.
  2. MOTr6ci: CCACTCA at 1766.

D-box (Motojima) analysis and results

D-box (TGAGTGG).[2]

Reals or randoms | Promoters | Direction | Numbers | Strands | Occurrences | Averages (± 0.1)
Reals | UTR | negative | 2 | 2 | 1 | 1 ± 0 (--1,+-1)
Randoms | UTR | arbitrary negative | 1 | 10 | 0.1 | 0.1 ± 0
Randoms | UTR | alternate negative | 1 | 10 | 0.1 | 0.1 ± 0
Reals | Core | negative | 0 | 2 | 0 | 0
Randoms | Core | arbitrary negative | 0 | 10 | 0 | 0
Randoms | Core | alternate negative | 0 | 10 | 0 | 0
Reals | Core | positive | 0 | 2 | 0 | 0
Randoms | Core | arbitrary positive | 0 | 10 | 0 | 0
Randoms | Core | alternate positive | 0 | 10 | 0 | 0
Reals | Proximal | negative | 0 | 2 | 0 | 0
Randoms | Proximal | arbitrary negative | 0 | 10 | 0 | 0
Randoms | Proximal | alternate negative | 0 | 10 | 0 | 0
Reals | Proximal | positive | 0 | 2 | 0 | 0
Randoms | Proximal | arbitrary positive | 1 | 10 | 0.1 | 0.05 ± 0.05
Randoms | Proximal | alternate positive | 0 | 10 | 0 | 0.05 ± 0.05
Reals | Distal | negative | 0 | 2 | 0 | 0
Randoms | Distal | arbitrary negative | 2 | 10 | 0.2 | 0.1 ± 0.1
Randoms | Distal | alternate negative | 0 | 10 | 0 | 0.1 ± 0.1
Reals | Distal | positive | 1 | 2 | 0.5 | 0.5 ± 0.5 (-+1,++0)
Randoms | Distal | arbitrary positive | 0 | 10 | 0 | 0.1 ± 0.1
Randoms | Distal | alternate positive | 2 | 10 | 0.2 | 0.1 ± 0.1

Comparison:

The occurrences of real DboxMs are greater than those of the randoms. This suggests that the real DboxMs are likely active or activable.

(Samarsky) D box analysis and results

For "box C/D snoRNAs, boxes C and D and an adjoining stem form a vital structure, known as the box C/D motif."[1] Box D (GUCUGA) adjoins Domain B and overlaps it by two nucleotides: the "GU" that begins Box D also ends Domain B. The inverse of GUCUGA is AGUCUG, and replacing U with T yields AGTCTG as a likely consensus sequence to search for.[1]

The real consensus sequences are AGTCTG at 2947 in the UTR between A1BG and ZSCAN22 with an occurrence of 0.5, three in the distal promoter also in the negative direction for an occurrence of 1.5, and six in the positive direction for an occurrence of 3.0.

The randoms had one in the UTR: AGTCTG at 4073 in the arbitrary negative direction for an occurrence of 0.1, four in the negative direction in the distal promoter for an occurrence of 0.4 and seven in the positive direction for an occurrence of 0.7.

By comparison, the occurrences are systematically higher for the reals than the randoms which suggests that the reals are likely active or activable.

(Voronina) D box analysis and results

The reals have four consensus sequences in the UTR for an occurrence of 2.0.

There is only one core promoter of eight promoters for an occurrence of 0.125.

Proximal promoters have two occurrences among eight possibilities for an occurrence of 0.25.

Distal promoters have twenty-eight consensus sequences in the negative direction for an occurrence of 3.5.

In the positive direction, the distal promoters have twenty-three consensus sequences for an occurrence of 2.875.

The randoms had seventeen UTR consensus sequences for an occurrence of 1.7.

The randoms had one core promoter from twenty opportunities for an occurrence of 0.05.

In the proximal promoters, the randoms had three in the arbitrary negative direction and four in the positive direction for occurrences of 0.3 and 0.4.

For the distal promoters, the negative direction had twenty-nine consensus sequences for an occurrence of 2.9.

In the positive direction, the randoms had thirty-four consensus sequences for an occurrence of 3.4.

In comparison, for the distal promoters the random sequences had approximately the same occurrences as the reals. For the proximal promoters, the randoms had slightly more occurrences than the reals. For the core promoters, the randoms had slightly fewer occurrences. For the UTR, the randoms had slightly fewer occurrences than the reals (1.7 vs. 2.0). Based on the UTR and core promoters, it appears that the reals are likely active or activable.

(Motojima) D-box analysis and results

D-box (TGAGTGG).[2]

The real promoters have two inverse complements in the UTR: CCACTCA at 4487 nucleotides from the end of gene ZSCAN22 on the positive strand, negative direction, and CCACTCA at 3827 on the negative strand, negative direction, for an occurrence of 0.5.

In the distal promoters, there is one consensus sequence between ZNF497 and A1BG on the negative strand, positive direction: TGAGTGG at 3449, for an occurrence of 0.25.

The random datasets had one UTR TGAGTGG at 4502 for an occurrence of 0.1.

They had one proximal promoter D-box consensus sequence: TGAGTGG at 4148 in the arbitrary positive direction for an occurrence of 0.05.

The distal promoters had two consensus sequence ics: CCACTCA at 1766 and CCACTCA at 1365 for an occurrence of 0.1.

Comparing the two results, the occurrences are higher for the real UTR and distal promoter consensus sequences than for the randoms, suggesting that the reals are likely active or activable.

Destruction box

"The ordered progression through the cell cycle depends on regulating the abundance of several proteins through ubiquitin-mediated proteolysis. Degradation is precisely timed and specific. One key component of the degradation system, the anaphase promoting complex (APC), is a ubiquitin protein ligase. It is activated both during mitosis and late in mitosis/G1, by the WD repeat proteins Cdc20 and Cdh1, respectively. These activators target distinct sets of substrates. Cdc20–APC requires a well-defined destruction box (D box), [...]."[6]

"The budding yeast homolog of Cdc20 contains two destruction boxes [...], but the vertebrate homologs lack any motif similar to the R-L-N of the D box."[6]

The destruction box R-L-N[6] back-translates to CGN-(C/T)TN-AAN: arginine is taken as CGN, leucine is TT(A/G) or CTN and so is written (C/T)TN, and asparagine is written AAN. This can be searched for using CGN(C/T)TNAAN.
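
The search pattern and its inverse complement can be written as regular expressions, and a match can be checked against the standard genetic code to see whether it really reads R-L-N. This is a minimal Python sketch under those assumptions; DIRECT, INVERSE_COMPLEMENT and encodes_rln are hypothetical names, and only the CGN arginine codons are considered because those are the ones the pattern covers.

```python
import re

# CGN(C/T)TNAAN, with N standing for any nucleotide.
DIRECT = re.compile(r"CG[ACGT][CT]T[ACGT]AA[ACGT]")
# Inverse complement of the pattern: NTTNA(A/G)NCG.
INVERSE_COMPLEMENT = re.compile(r"[ACGT]TT[ACGT]A[AG][ACGT]CG")

ARG = {"CGA", "CGC", "CGG", "CGT"}                    # the CGN arginine codons
LEU = {"CTA", "CTC", "CTG", "CTT", "TTA", "TTG"}      # leucine codons
ASN = {"AAC", "AAT"}                                  # asparagine codons

def encodes_rln(nine: str) -> bool:
    """True when three consecutive codons read Arg-Leu-Asn."""
    first, second, third = nine[0:3], nine[3:6], nine[6:9]
    return first in ARG and second in LEU and third in ASN

for hit in ("CGTCTGAAC", "CGTTTTAAG"):
    print(hit, bool(DIRECT.fullmatch(hit)), encodes_rln(hit))
# CGTCTGAAC True True
# CGTTTTAAG True False
```

Because N and (C/T) are broader than the codon sets, some matches to the search pattern do not translate to R-L-N; for example, CGTTTTAAG reads Arg-Phe-Lys in the standard genetic code.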

Destruction box samplings

The Basic programs testing the consensus sequence CGN(C/T)TNAAN (starting with SuccessablesDest.bas) compare nucleotide sequences with the sequences on either the template strand (-) or the coding strand (+) of the DNA, in the negative direction (-) or the positive direction (+). The programs looked for, and found:

  1. Negative strand, negative direction: 2, CGTTTTAAG at 3316, CGTCTGAAC at 1618.
  2. Positive strand, negative direction: 2, CGCTTGAAC at 2716, CGCTTGAAC at 845.
  3. Negative strand, positive direction: 1, CGCCTCAAC at 3290.
  4. Positive strand, positive direction: 1, CGATTAAAA at 2441.
  5. inverse complement, negative strand, negative direction: 8, TTTGAGCCG at 3597, TTTTAATCG at 3177, TTTTAGTCG at 2650, TTTTAATCG at 1888, TTTTAATCG at 1235, TTTTAATCG at 778, TTTTAACCG at 644, TTTTAATCG at 500.
  6. inverse complement, positive strand, negative direction: 1, GTTGAATCG at 2709.
  7. inverse complement, negative strand, positive direction: 0.
  8. inverse complement, positive strand, positive direction: 1, CTTGAGTCG at 4052.

Dest (4560-2846) UTRs

  1. Negative strand, negative direction: CGTTTTAAG at 3316.
  2. Negative strand, negative direction: TTTGAGCCG at 3597, TTTTAATCG at 3177.

Dest negative direction (2811-2596) proximal promoters

  1. Negative strand, negative direction: TTTTAGTCG at 2650.
  2. Positive strand, negative direction: CGCTTGAAC at 2716.
  3. Positive strand, negative direction: GTTGAATCG at 2709.

Dest positive direction (4265-4050) proximal promoters

  1. inverse complement, positive strand, positive direction: CTTGAGTCG at 4052.

Dest negative direction (2596-1) distal promoters

  1. Negative strand, negative direction: CGTCTGAAC at 1618.
  2. inverse complement, negative strand, negative direction: TTTTAATCG at 1888, TTTTAATCG at 1235, TTTTAATCG at 778, TTTTAACCG at 644, TTTTAATCG at 500.
  3. Positive strand, negative direction: CGCTTGAAC at 2716, CGCTTGAAC at 845.

Dest positive direction (4050-1) distal promoters

  1. Negative strand, positive direction: CGCCTCAAC at 3290.
  2. Positive strand, positive direction: CGATTAAAA at 2441.
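
The assignment of hits to UTRs and promoter regions follows the coordinate ranges given in the headings. A minimal sketch of that lookup is shown below; treating both ends of each range as inclusive is an assumption, so a position on a shared boundary would be reported in both neighbouring regions.

```python
# Ranges copied from the section headings as (name, low, high).
# Inclusive bounds are an assumption; the UTR range is placed with the
# negative direction, as in the results table.
REGIONS = {
    "negative": [
        ("UTR", 2846, 4560),
        ("core promoter", 2811, 2846),
        ("proximal promoter", 2596, 2811),
        ("distal promoter", 1, 2596),
    ],
    "positive": [
        ("core promoter", 4265, 4445),
        ("proximal promoter", 4050, 4265),
        ("distal promoter", 1, 4050),
    ],
}

def regions_for(position, direction):
    """Return the names of every region whose range contains the position."""
    return [name for name, low, high in REGIONS[direction]
            if low <= position <= high]

print(regions_for(3316, "negative"))  # CGTTTTAAG at 3316
print(regions_for(2650, "negative"))  # TTTTAGTCG at 2650
print(regions_for(4052, "positive"))  # CTTGAGTCG at 4052
```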

Destruction box random dataset samplings

  1. Destr0: 3, CGGCTAAAT at 3925, CGGTTTAAG at 3298, CGCTTAAAG at 630.
  2. Destr1: 4, CGTCTAAAG at 3124, CGACTAAAC at 1832, CGACTAAAT at 1619, CGACTCAAG at 622.
  3. Destr2: 4, CGTTTAAAG at 4291, CGCCTAAAA at 2779, CGTCTAAAT at 2730, CGTTTCAAT at 69.
  4. Destr3: 3, CGGCTAAAA at 3009, CGGCTGAAG at 1447, CGTCTTAAC at 594.
  5. Destr4: 2, CGTCTAAAA at 2319, CGACTCAAG at 329.
  6. Destr5: 6, CGACTTAAT at 4548, CGCCTAAAA at 3780, CGGTTCAAT at 2987, CGACTTAAC at 2142, CGGCTCAAC at 572, CGACTTAAC at 101.
  7. Destr6: 2, CGGTTTAAG at 4070, CGCTTTAAG at 2255.
  8. Destr7: 2, CGGCTCAAA at 1591, CGCCTCAAA at 807.
  9. Destr8: 4, CGGTTGAAC at 3796, CGTTTTAAA at 3607, CGTTTTAAA at 3573, CGCTTTAAA at 476.
  10. Destr9: 5, CGCCTTAAT at 3556, CGGTTTAAC at 3285, CGCCTCAAG at 1420, CGGCTAAAA at 892, CGACTTAAC at 516.
  11. Destr0ci: 2, TTTGAAACG at 3418, GTTCAAACG at 3105.
  12. Destr1ci: 3, GTTTAAACG at 3347, CTTCAAACG at 1957, CTTGAACCG at 393.
  13. Destr2ci: 1, ATTTAGCCG at 2524.
  14. Destr3ci: 4, GTTTAACCG at 2155, ATTTAACCG at 1092, GTTCAGCCG at 862, ATTTAGCCG at 363.
  15. Destr4ci: 1, CTTTAGGCG at 1951.
  16. Destr5ci: 4, TTTGAATCG at 3275, CTTCAAACG at 2453, CTTCAGTCG at 2194, GTTAAGCCG at 1079.
  17. Destr6ci: 2, ATTTAGGCG at 1148, CTTCAGACG at 1045.
  18. Destr7ci: 6, ATTTAAACG at 2982, TTTCAAACG at 2187, GTTTAGGCG at 1963, CTTTAGTCG at 1436, TTTCAAGCG at 772, ATTTAAGCG at 334.
  19. Destr8ci: 4, GTTGAACCG at 4456, TTTGAGTCG at 3402, GTTAAAGCG at 2271, TTTGAACCG at 2259.
  20. Destr9ci: 2, CTTAAATCG at 4302, GTTGAGACG at 2425.

Destr arbitrary (evens) (4560-2846) UTRs

  1. Destr0: CGGCTAAAT at 3925, CGGTTTAAG at 3298.
  2. Destr2: CGTTTAAAG at 4291.
  3. Destr6: CGGTTTAAG at 4070.
  4. Destr8: CGGTTGAAC at 3796, CGTTTTAAA at 3607, CGTTTTAAA at 3573.
  5. Destr0ci: TTTGAAACG at 3418, GTTCAAACG at 3105.
  6. Destr8ci: GTTGAACCG at 4456, TTTGAGTCG at 3402.

Destr alternate (odds) (4560-2846) UTRs

  1. Destr1: CGTCTAAAG at 3124.
  2. Destr3: CGGCTAAAA at 3009.
  3. Destr5: CGACTTAAT at 4548, CGCCTAAAA at 3780, CGGTTCAAT at 2987.
  4. Destr9: CGCCTTAAT at 3556, CGGTTTAAC at 3285.
  5. Destr1ci: GTTTAAACG at 3347.
  6. Destr5ci: TTTGAATCG at 3275.
  7. Destr7ci: ATTTAAACG at 2982.
  8. Destr9ci: CTTAAATCG at 4302.
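
The evens and odds groupings can be reproduced by splitting the random datasets on the digit in their names and counting the hits that fall inside a coordinate range. The sketch below copies the hit positions from the random dataset samplings list; the function names are hypothetical and the inclusive range bounds are an assumption.

```python
import re

# Hit positions per random dataset, copied from the samplings list.
HITS = {
    "Destr0": [3925, 3298, 630],
    "Destr1": [3124, 1832, 1619, 622],
    "Destr2": [4291, 2779, 2730, 69],
    "Destr3": [3009, 1447, 594],
    "Destr4": [2319, 329],
    "Destr5": [4548, 3780, 2987, 2142, 572, 101],
    "Destr6": [4070, 2255],
    "Destr7": [1591, 807],
    "Destr8": [3796, 3607, 3573, 476],
    "Destr9": [3556, 3285, 1420, 892, 516],
    "Destr0ci": [3418, 3105],
    "Destr1ci": [3347, 1957, 393],
    "Destr2ci": [2524],
    "Destr3ci": [2155, 1092, 862, 363],
    "Destr4ci": [1951],
    "Destr5ci": [3275, 2453, 2194, 1079],
    "Destr6ci": [1148, 1045],
    "Destr7ci": [2982, 2187, 1963, 1436, 772, 334],
    "Destr8ci": [4456, 3402, 2271, 2259],
    "Destr9ci": [4302, 2425],
}

def parity_group(name):
    """Return 'evens' or 'odds' from the digit in a name such as 'Destr8ci'."""
    digit = int(re.search(r"\d", name).group())
    return "evens" if digit % 2 == 0 else "odds"

def count_in_range(low, high):
    """Count hits with low <= position <= high, grouped by dataset parity."""
    totals = {"evens": 0, "odds": 0}
    for name, positions in HITS.items():
        totals[parity_group(name)] += sum(low <= p <= high for p in positions)
    return totals

print(count_in_range(2846, 4560))  # UTR range
print(count_in_range(1, 2596))     # negative-direction distal promoter range
```

For the UTR range this gives 11 hits for the evens and 11 for the odds, and for the negative-direction distal range 12 and 28, in agreement with the Numbers column of the results table.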

Destr arbitrary positive direction (odds) (4445-4265) core promoters

  1. Destr9ci: CTTAAATCG at 4302.

Destr alternate positive direction (evens) (4445-4265) core promoters

  1. Destr2: CGTTTAAAG at 4291.

Destr arbitrary negative direction (evens) (2811-2596) proximal promoters

  1. Destr2: CGCCTAAAA at 2779, CGTCTAAAT at 2730.

Destr alternate positive direction (evens) (4265-4050) proximal promoters

  1. Destr6: CGGTTTAAG at 4070.

Destr arbitrary negative direction (evens) (2596-1) distal promoters

  1. Destr0: CGCTTAAAG at 630.
  2. Destr2: CGTTTCAAT at 69.
  3. Destr4: CGTCTAAAA at 2319, CGACTCAAG at 329.
  4. Destr6: CGCTTTAAG at 2255.
  5. Destr8: CGCTTTAAA at 476.
  6. Destr2ci: ATTTAGCCG at 2524.
  7. Destr4ci: CTTTAGGCG at 1951.
  8. Destr6ci: ATTTAGGCG at 1148, CTTCAGACG at 1045.
  9. Destr8ci: GTTAAAGCG at 2271, TTTGAACCG at 2259.

Destr alternate negative direction (odds) (2596-1) distal promoters

  1. Destr1: CGACTAAAC at 1832, CGACTAAAT at 1619, CGACTCAAG at 622.
  2. Destr3: CGGCTGAAG at 1447, CGTCTTAAC at 594.
  3. Destr5: CGACTTAAC at 2142, CGGCTCAAC at 572, CGACTTAAC at 101.
  4. Destr7: CGGCTCAAA at 1591, CGCCTCAAA at 807.
  5. Destr9: CGCCTCAAG at 1420, CGGCTAAAA at 892, CGACTTAAC at 516.
  6. Destr1ci: CTTCAAACG at 1957, CTTGAACCG at 393.
  7. Destr3ci: GTTTAACCG at 2155, ATTTAACCG at 1092, GTTCAGCCG at 862, ATTTAGCCG at 363.
  8. Destr5ci: CTTCAAACG at 2453, CTTCAGTCG at 2194, GTTAAGCCG at 1079.
  9. Destr7ci: TTTCAAACG at 2187, GTTTAGGCG at 1963, CTTTAGTCG at 1436, TTTCAAGCG at 772, ATTTAAGCG at 334.
  10. Destr9ci: GTTGAGACG at 2425.

Destr arbitrary positive direction (odds) (4050-1) distal promoters

  1. Destr1: CGTCTAAAG at 3124, CGACTAAAC at 1832, CGACTAAAT at 1619, CGACTCAAG at 622.
  2. Destr3: CGGCTAAAA at 3009, CGGCTGAAG at 1447, CGTCTTAAC at 594.
  3. Destr5: CGCCTAAAA at 3780, CGGTTCAAT at 2987, CGACTTAAC at 2142, CGGCTCAAC at 572, CGACTTAAC at 101.
  4. Destr7: CGGCTCAAA at 1591, CGCCTCAAA at 807.
  5. Destr9: CGCCTTAAT at 3556, CGGTTTAAC at 3285, CGCCTCAAG at 1420, CGGCTAAAA at 892, CGACTTAAC at 516.
  6. Destr1ci: GTTTAAACG at 3347, CTTCAAACG at 1957, CTTGAACCG at 393.
  7. Destr3ci: GTTTAACCG at 2155, ATTTAACCG at 1092, GTTCAGCCG at 862, ATTTAGCCG at 363.
  8. Destr5ci: TTTGAATCG at 3275, CTTCAAACG at 2453, CTTCAGTCG at 2194, GTTAAGCCG at 1079.
  9. Destr7ci: ATTTAAACG at 2982, TTTCAAACG at 2187, GTTTAGGCG at 1963, CTTTAGTCG at 1436, TTTCAAGCG at 772, ATTTAAGCG at 334.
  10. Destr9ci: GTTGAGACG at 2425.

Destr alternate positive direction (evens) (4050-1) distal promoters

  1. Destr0: CGGCTAAAT at 3925, CGGTTTAAG at 3298, CGCTTAAAG at 630.
  2. Destr2: CGCCTAAAA at 2779, CGTCTAAAT at 2730, CGTTTCAAT at 69.
  3. Destr4: CGTCTAAAA at 2319, CGACTCAAG at 329.
  4. Destr6: CGCTTTAAG at 2255.
  5. Destr8: CGGTTGAAC at 3796, CGTTTTAAA at 3607, CGTTTTAAA at 3573, CGCTTTAAA at 476.
  6. Destr0ci: TTTGAAACG at 3418, GTTCAAACG at 3105.
  7. Destr2ci: ATTTAGCCG at 2524.
  8. Destr4ci: CTTTAGGCG at 1951.
  9. Destr6ci: ATTTAGGCG at 1148, CTTCAGACG at 1045.
  10. Destr8ci: TTTGAGTCG at 3402, GTTAAAGCG at 2271, TTTGAACCG at 2259.

Destruction box analysis and results

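The destruction box R-L-N[6] can be written at the codon level as CGN-(C/T)TN-AAN. Leucine is strictly encoded by TT(A/G) or CTN, so (C/T)TN also admits the phenylalanine codons TT(C/T).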

| Reals or randoms | Promoters | Direction | Numbers | Strands | Occurrences | Averages (± 0.1) |
|---|---|---|---|---|---|---|
| Reals | UTR | negative | 3 | 2 | 1.5 | 1.5 ± 0.5 (--3,+-0) |
| Randoms | UTR | arbitrary negative | 11 | 10 | 1.1 | 1.1 ± 0 |
| Randoms | UTR | alternate negative | 11 | 10 | 1.1 | 1.1 ± 0 |
| Reals | Core | negative | 0 | 2 | 0 | 0 |
| Randoms | Core | arbitrary negative | 0 | 10 | 0 | 0 |
| Randoms | Core | alternate negative | 0 | 10 | 0 | 0 |
| Reals | Core | positive | 0 | 2 | 0 | 0 |
| Randoms | Core | arbitrary positive | 1 | 10 | 0.1 | 0.1 ± 0 |
| Randoms | Core | alternate positive | 1 | 10 | 0.1 | 0.1 ± 0 |
| Reals | Proximal | negative | 3 | 2 | 1.5 | 1.5 ± 0.5 (--1,+-2) |
| Randoms | Proximal | arbitrary negative | 2 | 10 | 0.2 | 0.1 ± 0.1 |
| Randoms | Proximal | alternate negative | 0 | 10 | 0 | 0.1 ± 0.1 |
| Reals | Proximal | positive | 1 | 2 | 0.5 | 0.5 ± 0.5 (-+0,++1) |
| Randoms | Proximal | arbitrary positive | 0 | 10 | 0 | 0.05 ± 0.05 |
| Randoms | Proximal | alternate positive | 1 | 10 | 0.1 | 0.05 ± 0.05 |
| Reals | Distal | negative | 8 | 2 | 4 | 4 ± 2 (--6,+-2) |
| Randoms | Distal | arbitrary negative | 12 | 10 | 1.2 | 2.0 ± 0.8 |
| Randoms | Distal | alternate negative | 28 | 10 | 2.8 | 2.0 ± 0.8 |
| Reals | Distal | positive | 2 | 2 | 1 | 1 ± 0 (-+1,++1) |
| Randoms | Distal | arbitrary positive | 37 | 10 | 3.7 | 2.95 ± 0.75 |
| Randoms | Distal | alternate positive | 22 | 10 | 2.2 | 2.95 ± 0.75 |
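
The Occurrences column is the Numbers column divided by the Strands column. For the distal rows the Averages column is consistent with the midpoint of two values plus or minus half their difference; that reading is an assumption inferred from the figures, and the sketch below (with hypothetical function names) only illustrates the arithmetic.

```python
def occurrences(numbers, strands):
    """Hits per strand (reals) or per random dataset (randoms)."""
    return numbers / strands

def midpoint_and_half_range(first, second):
    """Average of two values and half of their absolute difference."""
    return (first + second) / 2, abs(first - second) / 2

# Randoms, negative-direction distal promoters: 12 and 28 hits, 10 datasets each.
arbitrary = occurrences(12, 10)
alternate = occurrences(28, 10)
mean, spread = midpoint_and_half_range(arbitrary, alternate)
print(f"{mean:.2f} ± {spread:.2f}")

# Reals, negative-direction distal promoters: 6 hits on one strand, 2 on the other.
mean, spread = midpoint_and_half_range(6, 2)
print(f"{mean:g} ± {spread:g}")
```

These two cases give 2.00 ± 0.80 and 4 ± 2, matching the corresponding distal rows.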

Comparison:

The occurrences of real Dests in the UTRs and proximal promoters are greater than those in the random datasets. In the negative-direction distal promoters they overlap the high end of the randoms, and in the positive-direction distal promoters they are lower than the randoms. This suggests that the real Dests are likely active or activable.

KEN box

The KEN box, lysine-glutamate-asparagine (K-E-N), written at the codon level as AA(A/G)GA(A/G)AA(C/T), serves as a general targeting signal for Cdh1–APC.[6]

"Selection of APC/C targets is controlled through recognition of short destruction motifs, predominantly the D box and KEN box."[7]

"The classical APC/C degron is the destruction box or D box, a nine-residue motif (RxxLxxI/VxN), first characterized in B-type cyclins as being sufficient for APC/C-mediated ubiquitylation [...], common to most, but not all APC/C substrates. Another APC/C degron, the KEN motif (KENxxxN/D), is often present in APC/C substrates usually in addition to the D box [...]."[7]
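
The residue-level motifs quoted above, RxxLxxI/VxN and KENxxxN/D, can be written as regular expressions over a protein sequence. The following is a minimal sketch; the function name and the peptide are invented for illustration and do not come from the cited papers.

```python
import re

# x is any residue; I/V and N/D are alternatives at a single position.
D_BOX = re.compile(r"R..L..[IV].N")
KEN_BOX = re.compile(r"KEN...[ND]")

def find_degrons(protein):
    """Return (motif name, 1-based start, matched residues) for each match."""
    hits = []
    for name, pattern in (("D box", D_BOX), ("KEN box", KEN_BOX)):
        for m in pattern.finditer(protein):
            hits.append((name, m.start() + 1, m.group()))
    return hits

# Hypothetical peptide containing one instance of each motif.
peptide = "MSRAALGDIGNKENAAPND"
print(find_degrons(peptide))
```

This operates on amino acid sequences, whereas the samplings on this page search nucleotide sequences with a shorter codon-level pattern.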

KEN box samplings

The Basic programs testing the consensus sequence AA(A/G)GA(A/G)AA(C/T) (starting with SuccessablesKEN.bas) compare nucleotide sequences with the sequences on either the template strand (-) or the coding strand (+) of the DNA, in the negative direction (-) or the positive direction (+). The programs looked for and found:

  1. Negative strand, negative direction: 0.
  2. Positive strand, negative direction: 3, AAAGAAAAC at 4396, AAGGAAAAC at 2969, AAAGAAAAC at 2840.
  3. Negative strand, positive direction: 0.
  4. Positive strand, positive direction: 1, AAAGAGAAC at 4388.
  5. inverse complement, negative strand, negative direction: 0.
  6. inverse complement, positive strand, negative direction: 0.
  7. inverse complement, negative strand, positive direction: 0.
  8. inverse complement, positive strand, positive direction: 0.
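
Because the KEN consensus has only three two-way choices, every sequence it admits can be listed explicitly. The sketch below expands the consensus with a hypothetical helper and checks the hits reported above against the result.

```python
from itertools import product

def expand(consensus):
    """List every sequence matching a consensus written with (X/Y) alternatives."""
    choices, i = [], 0
    while i < len(consensus):
        if consensus[i] == "(":
            close = consensus.index(")", i)
            choices.append(consensus[i + 1:close].split("/"))
            i = close + 1
        else:
            # N stands for any nucleotide; other letters are fixed.
            choices.append(list("ACGT") if consensus[i] == "N" else [consensus[i]])
            i += 1
    return ["".join(combo) for combo in product(*choices)]

ken = expand("AA(A/G)GA(A/G)AA(C/T)")
print(len(ken), ken)

# 9-mers reported in the KEN box samplings.
for hit in ("AAAGAAAAC", "AAGGAAAAC", "AAAGAGAAC"):
    print(hit, hit in ken)
```

The expansion contains eight 9-mers: two lysine codons, two glutamate codons and two asparagine codons in every combination.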

KEN (4560-2846) UTRs

  1. Positive strand, negative direction: AAAGAAAAC at 4396, AAGGAAAAC at 2969.

KEN negative direction (2846-2811) core promoters

  1. Positive strand, negative direction: AAAGAAAAC at 2840.

KEN positive direction (4445-4265) core promoters

  1. Positive strand, positive direction: AAAGAGAAC at 4388.

KEN random dataset samplings

  1. KENr0: 1, AAAGAAAAC at 635.
  2. KENr1: 0.
  3. KENr2: 1, AAAGAAAAC at 1850.
  4. KENr3: 0.
  5. KENr4: 1, AAGGAAAAC at 3905.
  6. KENr5: 0.
  7. KENr6: 0.
  8. KENr7: 0.
  9. KENr8: 1, AAAGAGAAT at 2117.
  10. KENr9: 1, AAGGAAAAC at 593.
  11. KENr0ci: 0.
  12. KENr1ci: 1, ATTTTGCTT at 4224.
  13. KENr2ci: 0.
  14. KENr3ci: 0.
  15. KENr4ci: 0.
  16. KENr5ci: 0.
  17. KENr6ci: 0.
  18. KENr7ci: 0.
  19. KENr8ci: 1, GTTTTGCTT at 3379.
  20. KENr9ci: 1, GTTTTGCTT at 710.

KENr arbitrary (evens) (4560-2846) UTRs

  1. KENr4: AAGGAAAAC at 3905.
  2. KENr8ci: GTTTTGCTT at 3379.

KENr alternate (odds) (4560-2846) UTRs

  1. KENr1ci: ATTTTGCTT at 4224.

KENr arbitrary positive direction (odds) (4265-4050) proximal promoters

  1. KENr1ci: ATTTTGCTT at 4224.

KENr arbitrary negative direction (evens) (2596-1) distal promoters

  1. KENr0: AAAGAAAAC at 635.
  2. KENr2: AAAGAAAAC at 1850.
  3. KENr8: AAAGAGAAT at 2117.

KENr alternate negative direction (odds) (2596-1) distal promoters

  1. KENr9: AAGGAAAAC at 593.
  2. KENr9ci: GTTTTGCTT at 710.

KENr arbitrary positive direction (odds) (4050-1) distal promoters

  1. KENr4: AAGGAAAAC at 3905.
  2. KENr9: AAGGAAAAC at 593.
  3. KENr9ci: GTTTTGCTT at 710.

KENr alternate positive direction (evens) (4050-1) distal promoters

  1. KENr0: AAAGAAAAC at 635.
  2. KENr2: AAAGAAAAC at 1850.
  3. KENr8: AAAGAGAAT at 2117.
  4. KENr8ci: GTTTTGCTT at 3379.

KEN analysis and results

The KEN box, lysine-glutamate-asparagine (K-E-N), written at the codon level as AA(A/G)GA(A/G)AA(C/T), serves as a general targeting signal for Cdh1–APC.[6]

| Reals or randoms | Promoters | Direction | Numbers | Strands | Occurrences | Averages (± 0.1) |
|---|---|---|---|---|---|---|
| Reals | UTR | negative | 2 | 2 | 1 | 1 ± 1 (--0,+-2) |
| Randoms | UTR | arbitrary negative | 2 | 10 | 0.2 | 0.15 ± 0.05 |
| Randoms | UTR | alternate negative | 1 | 10 | 0.1 | 0.15 ± 0.05 |
| Reals | Core | negative | 1 | 2 | 0.5 | 0.5 ± 0.5 (--0,+-1) |
| Randoms | Core | arbitrary negative | 0 | 10 | 0 | 0 |
| Randoms | Core | alternate negative | 0 | 10 | 0 | 0 |
| Reals | Core | positive | 1 | 2 | 0.5 | 0.5 ± 0.5 (-+0,++1) |
| Randoms | Core | arbitrary positive | 0 | 10 | 0 | 0 |
| Randoms | Core | alternate positive | 0 | 10 | 0 | 0 |
| Reals | Proximal | negative | 0 | 2 | 0 | 0 |
| Randoms | Proximal | arbitrary negative | 0 | 10 | 0 | 0 |
| Randoms | Proximal | alternate negative | 0 | 10 | 0 | 0 |
| Reals | Proximal | positive | 0 | 2 | 0 | 0 |
| Randoms | Proximal | arbitrary positive | 1 | 10 | 0.1 | 0.05 ± 0.05 |
| Randoms | Proximal | alternate positive | 0 | 10 | 0 | 0.05 ± 0.05 |
| Reals | Distal | negative | 0 | 2 | 0 | 0 |
| Randoms | Distal | arbitrary negative | 3 | 10 | 0.3 | 0.25 ± 0.05 |
| Randoms | Distal | alternate negative | 2 | 10 | 0.2 | 0.25 ± 0.05 |
| Reals | Distal | positive | 0 | 2 | 0 | 0 |
| Randoms | Distal | arbitrary positive | 3 | 10 | 0.3 | 0.35 ± 0.05 |
| Randoms | Distal | alternate positive | 4 | 10 | 0.4 | 0.35 ± 0.05 |

Comparison:

The occurrences of real KEN boxes in the UTRs and core promoters are greater than those in the random datasets. This suggests that the real KENs are likely active or activable.

Acknowledgements

The content on this page was first contributed by: Henry A. Hoff.

Initial content for this page in some instances came from Wikiversity.

References

  1. Dmitry A. Samarsky, Maurille J. Fournier, Robert H. Singer and Edouard Bertrand (1 July 1998). "The snoRNA box C/D motif directs nucleolar targeting and also couples snoRNA synthesis and localization" (PDF). The European Molecular Biology Organization (EMBO) Journal. 17 (13): 3747–3757. doi:10.1093/emboj/17.13.3747. PMID 9649444. Retrieved 2017-02-04.
  2. Masaru Motojima, Takao Ando and Toshimasa Yoshioka (10 July 2000). "Sp1-like activity mediates angiotensin-II-induced plasminogen-activator inhibitor type-1 (PAI-1) gene expression in mesangial cells" (PDF). Biochemical Journal. 349 (2): 435–441. doi:10.1042/0264-6021:3490435. PMID 10880342. Retrieved 13 August 2020.
  3. E. N. Voronina, T. D. Kolokol’tsova, E. A. Nechaeva, and M. L. Filipenko (2003). "Structural–Functional Analysis of the Human Gene for Ribosomal Protein L11" (PDF). Molecular Biology. 37 (3): 362–371. Retrieved 11 April 2019.
  4. PA Johnson, D Bunick, NB Hecht (1991). "Protein Binding Regions in the Mouse and Rat Protamine-2 Genes" (PDF). Biology of Reproduction. 44 (1): 127–134. Retrieved 6 April 2019.
  5. Philipp Mracek, Cristina Santoriello, M. Laura Idda, Cristina Pagano, Zohar Ben-Moshe, Yoav Gothilf, Daniela Vallone, Nicholas S. Foulkes (December 6, 2012). "Regulation of per and cry Genes Reveals a Central Role for the D-Box Enhancer in Light-Dependent Gene Expression". PLoS ONE. 7 (12): e51278. doi:10.1371/journal.pone.0051278. Retrieved 10 February 2019.
  6. Cathie M. Pfleger and Marc W. Kirschner (15 March 2000). "The KEN box: an APC recognition signal distinct from the D box targeted by Cdh1". Genes & Development. 14 (6): 655–665. PMID 10733526. Retrieved 10 May 2023.
  7. David Barford (27 December 2011). "Structural insights into anaphase-promoting complex function and mechanism". Philosophical Transactions of the Royal Society B: Biological Sciences. 366 (1584): 3605–3624. doi:10.1098/rstb.2011.0069. PMID 22084387. Retrieved 10 May 2023.

External links