
Editor-In-Chief: Henry A. Hoff

File:UCSC human chromosome colours.png
This is an image of the 46 chromosomes making up the diploid genome of a human male. (The mitochondrial chromosome is not shown.) Credit: HYanWong.

The genome is the entirety of an organism's hereditary information. In humans, it is encoded in DNA. The genome includes both the genes and the non-coding sequences of the DNA.[1]

Genetic information is encoded as a sequence of nucleobases: adenine (A), cytosine (C), guanine (G), and thymine (T).

There are "3.2 billion base pairs in the human genome."[2]
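Each position holds one of four bases, so a single base fits in 2 bits. That gives a quick estimate of how much raw storage the sequence needs. The Python sketch below assumes the mapping A=0, C=1, G=2, T=3 and stores only one strand, since the other is complementary. The function names are illustrative. Real file formats also have to handle ambiguous symbols such as N, and this version simply rejects them.

```python
# Illustrative 2-bit packing of a DNA sequence (assumed mapping A=0, C=1, G=2, T=3).
BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}

def pack(seq: str) -> tuple[int, int]:
    """Pack a sequence into one integer, 2 bits per base; return (value, length)."""
    value = 0
    for base in seq.upper():
        value = (value << 2) | BASE_TO_BITS[base]  # KeyError for non-ACGT symbols
    return value, len(seq)

def unpack(value: int, length: int) -> str:
    """Recover the sequence; the length is needed because leading A's are zero bits."""
    bases = []
    for _ in range(length):
        bases.append(BITS_TO_BASE[value & 0b11])  # lowest 2 bits = last base
        value >>= 2
    return "".join(reversed(bases))

def raw_size_bytes(base_pairs: int) -> int:
    """Bytes needed for one strand at 2 bits per base (4 bases per byte)."""
    return (base_pairs + 3) // 4

if __name__ == "__main__":
    packed, n = pack("GATTACA")
    print(unpack(packed, n))                    # GATTACA
    print(raw_size_bytes(3_200_000_000) / 1e6)  # 800.0 (megabytes)
```

At 2 bits per base, 3.2 billion base pairs come to roughly 800 MB before any compression.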

Associated with genomes are epigenomes.

Theoretical genomes

Def. the "complete genetic information ... of an organism"[3] is called a genome.


File:Map of the human mitochondrial genome.svg
Map of the human mitochondrial DNA genome, which has 16,569 bp. Credit: Emmanuel Douzery.

Genomics is a branch of molecular biology concerned with the structure, function, evolution, and mapping of genomes.

The H (heavy, outer circle) and L (light, inner circle) strands are given with their corresponding genes. There are 22 transfer RNA (TRN) genes for the following amino acids: F, V, L1 (codon UUA/G), I, Q, M, W, A, N, C, Y, S1 (UCN), D, K, G, R, H, S2 (AGC/U), L2 (CUN), E, T and P (white boxes). There are 2 ribosomal RNA (RRN) genes: S (small subunit, or 12S) and L (large subunit, or 16S) (blue boxes). There are 13 protein-coding genes: 7 for NADH dehydrogenase subunits (ND, yellow boxes), 3 for cytochrome c oxidase subunits (COX, orange boxes), 2 for ATPase subunits (ATP, red boxes), and one for cytochrome b (CYTB, coral box). Two gene overlaps are indicated (ATP8-ATP6, and ND4L-ND4, black boxes).

The control region (CR) is the longest non-coding sequence (grey box). Its three hyper-variable regions are indicated (HV, green boxes).
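The legend above can be checked for internal consistency. This Python sketch only does bookkeeping: it restates the legend's gene lists and counts and then tallies them. The result matches the 37 genes given for mtDNA in the section on mitochondrial DNA.

```python
# Gene lists and counts restated from the map legend of the human mitochondrial genome.
TRNA_GENES = ["F", "V", "L1", "I", "Q", "M", "W", "A", "N", "C", "Y",
              "S1", "D", "K", "G", "R", "H", "S2", "L2", "E", "T", "P"]
RRNA_GENES = ["12S", "16S"]  # small (S) and large (L) ribosomal subunit RNAs
PROTEIN_GENES = {
    "ND": 7,    # NADH dehydrogenase subunits
    "COX": 3,   # cytochrome c oxidase subunits
    "ATP": 2,   # ATPase subunits
    "CYTB": 1,  # cytochrome b
}

def gene_summary() -> dict[str, int]:
    """Tally the genes by class and in total."""
    n_protein = sum(PROTEIN_GENES.values())
    return {
        "tRNA": len(TRNA_GENES),
        "rRNA": len(RRNA_GENES),
        "protein": n_protein,
        "total": len(TRNA_GENES) + len(RRNA_GENES) + n_protein,
    }

if __name__ == "__main__":
    print(gene_summary())  # {'tRNA': 22, 'rRNA': 2, 'protein': 13, 'total': 37}
```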


File:Complete Histone with DNA.png
This is a schematic representation of a nucleosome. Credit: Zephyris.

Inside each eukaryotic nucleus is genetic material (DNA) surrounded by protective and regulatory proteins. These proteins, together with the dynamic changes made to them over the course of a eukaryote's life, make up the epigenome.

An epigenome consists of a record of the chemical changes to the DNA and histone proteins of an organism. These changes can be passed down to the organism's offspring via transgenerational epigenetic inheritance, and they can alter both the structure of chromatin and the function of the genome.[4]

Unlike the underlying genome which is largely static within an individual, the epigenome can be dynamically altered by environmental conditions.[5]

Deoxyribonucleic acids

Deoxyribonucleic acid (DNA) is composed of nucleobases (the sequence of which is the genome), deoxyribose (a sugar), and phosphate groups. Each nucleobase is attached to a deoxyribose molecule, which is in turn bonded to a (PO4) phosphate group. Linked together, these nucleotides (nucleobase + deoxyribose + phosphate) form the chains that carry a haploid genome. Nucleobases may also be linked to one another without the phosphate or the deoxyribose. The phosphate and the sugar are part of the epigenome.
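Base pairing is complementary: A pairs with T and C pairs with G. Because the two strands also run antiparallel, the sequence of one strand fixes the sequence of the other. The minimal Python sketch below derives the partner strand and writes it 5' to 3' like the input. The function name is illustrative.

```python
# Complementary base pairing: A pairs with T, C pairs with G.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(strand: str) -> str:
    """Return the partner strand, written 5'->3' like the input.

    The strands are antiparallel, so the complemented sequence is reversed.
    """
    if set(strand.upper()) - set("ACGT"):
        raise ValueError("sequence contains symbols other than A, C, G, T")
    return strand.translate(COMPLEMENT)[::-1]

if __name__ == "__main__":
    print(reverse_complement("ATGCGT"))  # ACGCAT
```

Applying the function twice returns the original strand, which reflects the fact that each strand fully specifies the other.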


A haploid genome contains non-repetitive DNA and repetitive DNA. Non-repetitive DNA consists mainly of coding DNA. In eukaryotes, the coding DNA consists of genes having exon-intron organization. The major part of mammalian genomes is repetitive DNA.[6]

In eukaryotes such as plants, protozoa and animals, "genome" typically refers only to the information in chromosomal DNA. Although these organisms contain chloroplasts and/or mitochondria that have their own DNA, the genetic information within these organelles is not usually considered part of the genome. Instead, mitochondria are sometimes said to have their own genome, often referred to as the "mitochondrial genome", and the DNA found within the chloroplast may be referred to as the "plastome".


Human DNA

File:DNA NoBB.png
This diagram of the structure of DNA shows the four bases; adenine, cytosine, guanine and thymine, and the location of the major and minor groove. Credit: Zephyris.

"[H]uman DNA has millions of on-off switches and complex networks that control the genes' activities. ... [A]t least 80% of the human genome is active, which opposed the previously held idea that most of the DNA are useless."[7]

"DNA contains genes, which hold the instructions for [life. But, these] take up only about 2 percent of the genome ... The human genome is made up of about 3 billion “letters” along strands that make up the familiar double helix structure of DNA. Particular sequences of these letters form genes, which tell cells how to make proteins. People have about 20,000 genes, but the vast majority of DNA lies outside of genes. ... [A]t least three-quarters of the genome is involved in making RNA [...] it appears to help regulate gene activity."[8]

Non-repetitive DNA

In eukaryotes, each protein-coding gene typically has exons interspersed with introns, alternating along the DNA template strand. By convention, the 5' untranslated region lies before the coding sequence and the 3' untranslated region after it. Between genes, usually before a gene in the direction of transcription, are nucleotide sequences that contain the gene promoters.
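Exon-intron organization can be modelled as a sequence plus a list of exon coordinates. Joining the exons in order (splicing) then yields the mature transcript, shown here in DNA letters. The `Gene` class and its toy sequence are hypothetical. The sketch ignores untranslated regions, promoters, and the splice signals that real cells use.

```python
from dataclasses import dataclass

@dataclass
class Gene:
    """Hypothetical toy gene: a sequence plus 0-based, half-open exon intervals."""
    sequence: str
    exons: list[tuple[int, int]]

    def introns(self) -> list[tuple[int, int]]:
        """The gaps between consecutive exons."""
        return [(a_end, b_start)
                for (_, a_end), (b_start, _) in zip(self.exons, self.exons[1:])]

    def spliced(self) -> str:
        """Join the exons in order, dropping the introns."""
        return "".join(self.sequence[start:end] for start, end in self.exons)

if __name__ == "__main__":
    # Uppercase = exons, lowercase = introns (toy sequence, not real DNA).
    toy = Gene("ATGAAgtaagtCCCTTTgtaagCAGTAA", exons=[(0, 5), (11, 17), (22, 28)])
    print(toy.introns())  # [(5, 11), (17, 22)]
    print(toy.spliced())  # ATGAACCCTTTCAGTAA
```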

Mitochondrial DNA

File:Mitochondrial DNA en.svg
The mitochondrial genome is a circular, double-stranded molecule. Credit: derivative work of Shanel.

Mitochondrial DNA (mtDNA) is a circular, double-stranded molecule, as in the image, with a length of 15-20 kilobases in animals. In most animal species it carries the same 37 genes, which encode 13 proteins, 2 ribosomal RNAs and 22 transfer RNAs.

The human mtDNA was the first mitochondrial genome to be sequenced. This first complete sequence is called the Cambridge Reference Sequence (CRS).

This mtDNA is composed of 16,569 base pairs (bp), with its genes distributed between the H (heavy) strand and the L (light) strand. Apart from the Control Region (D-loop), which has regulatory functions, and a 9 bp region called the V Region, the rest of the genome consists of coding DNA.
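On a circular genome a region can run past the last position and continue at position 1, so computing its length needs wrap-around arithmetic. This Python sketch assumes 1-based, inclusive coordinates on the 16,569 bp sequence. The control-region boundaries in the example (16024 and 576) are the commonly cited CRS positions and are an assumption here, since this text does not state them.

```python
RCRS_LENGTH = 16_569  # length of the human mtDNA reference sequence, in bp

def circular_span(start: int, end: int, genome_length: int = RCRS_LENGTH) -> int:
    """Length of an inclusive, 1-based region on a circular genome.

    If end < start, the region runs past the last position and wraps to position 1.
    """
    if not (1 <= start <= genome_length and 1 <= end <= genome_length):
        raise ValueError("positions must lie between 1 and genome_length")
    if start <= end:
        return end - start + 1
    return (genome_length - start + 1) + end

if __name__ == "__main__":
    # Assumed control-region boundaries (commonly cited CRS positions).
    print(circular_span(16_024, 576))  # 1122
```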

Over the last 30 years, mtDNA has been widely used in human evolution studies because of characteristics that make it a useful tool, notably the high mutation rate of its Control Region and its maternal inheritance.

Because it is non-coding, the Control Region exhibits the highest mutation rate in the mitochondrial genome. Several studies have demonstrated the strictly maternal inheritance of mtDNA. This is a major advantage because it allows related matrilineages to be traced through time without the complications inherent in nuclear DNA, such as recombination and biparental inheritance.[9]
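With strictly maternal inheritance, an mtDNA lineage follows a single chain of mothers and there is no recombination to untangle. The minimal Python sketch below works over a hypothetical pedigree that maps each child to their mother. It ignores new mutations, so "same matriline" stands in for "same mtDNA sequence".

```python
def matriline(person: str, mother_of: dict[str, str]) -> list[str]:
    """Follow mother links from `person` back to the earliest recorded ancestor."""
    line = [person]
    seen = {person}
    while line[-1] in mother_of:
        mother = mother_of[line[-1]]
        if mother in seen:
            raise ValueError("cycle in pedigree")
        seen.add(mother)
        line.append(mother)
    return line

def share_matriline(a: str, b: str, mother_of: dict[str, str]) -> bool:
    """True if the maternal lines of a and b meet.

    Under strictly maternal inheritance, and ignoring new mutations,
    such people would carry the same mtDNA sequence.
    """
    return bool(set(matriline(a, mother_of)) & set(matriline(b, mother_of)))

if __name__ == "__main__":
    # Hypothetical pedigree: child -> mother. Fathers do not matter for mtDNA.
    mother_of = {"Ana": "Beth", "Carl": "Beth", "Beth": "Dora", "Eve": "Fay"}
    print(matriline("Carl", mother_of))               # ['Carl', 'Beth', 'Dora']
    print(share_matriline("Ana", "Carl", mother_of))  # True
    print(share_matriline("Ana", "Eve", mother_of))   # False
```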

Viral genomes

Viral genomes can be composed of either RNA or DNA. The genomes of RNA viruses can be either single-stranded or double-stranded RNA, and may contain one or more separate RNA molecules. DNA viruses can have either single-stranded or double-stranded genomes. Most DNA virus genomes are composed of a single, linear molecule of DNA, but some are made up of a circular DNA molecule.[10]

Archaeal genomes

Archaea have a single circular chromosome.[11]

Prokaryotic genomes

Most bacteria also have a single circular chromosome; however, some bacterial species have linear chromosomes[12] or multiple chromosomes.[13] If the DNA is replicated faster than the bacterial cells divide, multiple copies of the chromosome can be present in a single cell. Most prokaryotes have very little repetitive DNA in their genomes.[14] However, some symbiotic bacteria (e.g. Serratia symbiotica) have reduced genomes and a high fraction of pseudogenes: only ~40% of their DNA encodes proteins.[15][16]

Some bacteria have auxiliary genetic material, which is carried in plasmids.

Eukaryotic genomes

File:Components of the human genome.png
Diagram of the components of the human genome, as estimated in 2014. Credit: NHS National Genetics and Genomics Education Centre.

Eukaryotic genomes are composed of one or more linear DNA chromosomes. The number of chromosomes varies widely, from Jack jumper ants and Diploscapter pachys, an asexual nematode,[17] which each have only one pair, to a fern species of the genus Ophioglossum that has 720 pairs.[18] A typical human cell has two copies of each of 22 autosomes, one inherited from each parent, plus two sex chromosomes, making it diploid. Gametes such as ova and sperm, as well as spores and pollen, are haploid, meaning they carry only one copy of each chromosome.
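The diploid and haploid counts follow from simple arithmetic. This Python sketch assumes a diploid species with one pair of sex chromosomes by default. The function name is illustrative.

```python
def chromosome_counts(autosome_pairs: int, sex_chromosomes: int = 2) -> dict[str, int]:
    """Diploid and haploid chromosome numbers for a diploid species."""
    diploid = 2 * autosome_pairs + sex_chromosomes
    return {"diploid": diploid, "haploid": diploid // 2}

if __name__ == "__main__":
    print(chromosome_counts(22))                      # {'diploid': 46, 'haploid': 23}
    print(chromosome_counts(720, sex_chromosomes=0))  # {'diploid': 1440, 'haploid': 720}
```

For the Ophioglossum fern, passing `sex_chromosomes=0` treats all 720 pairs alike.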

In addition to the chromosomes in the nucleus, organelles such as the chloroplasts and mitochondria have their own DNA. Like the bacteria they originated from, mitochondria and chloroplasts have a circular chromosome.

Unlike prokaryotes, eukaryotes have exon-intron organization of protein coding genes and variable amounts of repetitive DNA. In mammals and plants, the majority of the genome is composed of repetitive DNA.[19]

A larger genome does not necessarily contain more genes, and the proportion of non-repetitive DNA decreases along with increasing genome size in complex eukaryotes.[19]

Simple eukaryotes such as Caenorhabditis elegans and Drosophila melanogaster (fruit fly) have more non-repetitive DNA than repetitive DNA,[19][20] while the genomes of more complex eukaryotes tend to be composed largely of repetitive DNA.[21] In some plants and amphibians, the proportion of repetitive DNA is more than 80%.[19] In humans, only about 2% of the genome codes for proteins.

Non-coding sequences

Noncoding sequences include introns, sequences for non-coding RNAs, regulatory regions, and repetitive DNA. Noncoding sequences make up 98% of the human genome. There are two categories of repetitive DNA in the genome: tandem repeats and interspersed repeats.[22]

Tandem repeats

Short, non-coding sequences that are repeated head-to-tail are called tandem repeats. Microsatellites consist of 2-5 bp repeats, while minisatellite repeats are 30-35 bp. Tandem repeats make up about 4% of the human genome and 9% of the fruit fly genome.[23] Tandem repeats can be functional. For example, telomeres are composed of the tandem repeat TTAGGG in mammals, and they play an important role in protecting the ends of the chromosome.
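Tandem repeats can be detected by matching a motif repeated head-to-tail. The minimal Python sketch below uses the standard `re` module. The function name and the telomere-like toy sequence are illustrative. This is a toy counter, not a substitute for dedicated repeat-annotation tools.

```python
import re

def longest_tandem_run(seq: str, motif: str) -> int:
    """Largest number of head-to-tail copies of `motif` found anywhere in `seq`."""
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    runs = (len(m.group(0)) // len(motif) for m in pattern.finditer(seq))
    return max(runs, default=0)

if __name__ == "__main__":
    # Toy telomere-like end: 4 copies of TTAGGG, one altered copy, then 2 more.
    seq = "ACGT" + "TTAGGG" * 4 + "TTAGCG" + "TTAGGG" * 2
    print(longest_tandem_run(seq, "TTAGGG"))  # 4
```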

In other cases, expansions in the number of tandem repeats in exons or introns can cause disease.[24] For example, the human gene huntingtin typically contains 6-29 tandem repeats of the nucleotides CAG (encoding a polyglutamine tract). An expansion to over 36 repeats results in Huntington's disease, a neurodegenerative disease. Twenty human disorders are known to result from similar tandem repeat expansions in various genes. The mechanism by which proteins with expanded polyglutamine tracts cause the death of neurons is not fully understood. One possibility is that the proteins fail to fold properly and avoid degradation, instead accumulating in aggregates that also sequester important transcription factors, thereby altering gene expression.[24]
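Counting CAG units and comparing the count against the thresholds in this paragraph can be sketched in Python as follows. Only the numbers quoted here are used. Clinical interpretation relies on additional intermediate categories, which this sketch deliberately leaves out. The example fragment is hypothetical and is not real huntingtin sequence.

```python
import re

def cag_repeat_count(seq: str) -> int:
    """Length, in CAG units, of the longest uninterrupted CAG run."""
    runs = [len(m.group(0)) // 3 for m in re.finditer(r"(?:CAG)+", seq.upper())]
    return max(runs, default=0)

def classify(repeats: int) -> str:
    """Bucket a repeat count using only the thresholds quoted in the text."""
    if 6 <= repeats <= 29:
        return "typical range"
    if repeats > 36:
        return "expanded (associated with Huntington's disease)"
    return "outside the ranges quoted"

if __name__ == "__main__":
    allele = "ATG" + "CAG" * 40 + "CAACAG"  # hypothetical fragment
    n = cag_repeat_count(allele)
    print(n, classify(n))  # 40 expanded (associated with Huntington's disease)
```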

Tandem repeats are usually caused by slippage during replication, unequal crossing-over and gene conversion.[25]

Transposable elements

Transposable elements (TEs) are sequences of DNA with a defined structure that are able to change their location in the genome.[23][14][26] TEs are categorized as either class I TEs, which replicate by a copy-and-paste mechanism, or class II TEs, which can be excised from the genome and inserted at a new location.
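The difference between the two classes can be shown on a toy genome represented as a list of segments. In this hypothetical Python sketch, copy-and-paste (class I) increases the number of copies of the element, while cut-and-paste (class II) only moves it.

```python
def copy_and_paste(genome: list[str], te_index: int, target: int) -> list[str]:
    """Class I: a copy of the element is inserted at `target`; the original stays."""
    element = genome[te_index]
    return genome[:target] + [element] + genome[target:]

def cut_and_paste(genome: list[str], te_index: int, target: int) -> list[str]:
    """Class II: the element is excised and reinserted at `target`.

    `target` indexes the genome after the excision.
    """
    element = genome[te_index]
    remaining = genome[:te_index] + genome[te_index + 1:]
    return remaining[:target] + [element] + remaining[target:]

if __name__ == "__main__":
    # Toy genome as a list of segments; "TE" marks a transposable element.
    genome = ["geneA", "TE", "geneB", "geneC"]
    print(copy_and_paste(genome, 1, 3))  # ['geneA', 'TE', 'geneB', 'TE', 'geneC']
    print(cut_and_paste(genome, 1, 2))   # ['geneA', 'geneB', 'TE', 'geneC']
```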

The movement of TEs is a driving force of genome evolution in eukaryotes because their insertion can disrupt gene functions, homologous recombination between TEs can produce duplications, and TEs can shuffle exons and regulatory sequences to new locations.[27]


Retrotransposons

Retrotransposons (class I TEs) are transcribed into RNA, which is then reverse-transcribed and inserted at another site in the genome.[28] Retrotransposons can be divided into long terminal repeat (LTR) and non-long terminal repeat (non-LTR) elements.[27]

Long terminal repeat (LTR) retrotransposons are derived from ancient retroviral infections, so they encode proteins related to retroviral proteins, including gag (structural proteins of the virus), pol (reverse transcriptase and integrase), pro (protease), and in some cases env (envelope) genes.[26] These genes are flanked by long repeats at both the 5' and 3' ends. LTR retrotransposons have been reported to make up the largest fraction of most plant genomes and might account for the huge variation in genome size.[29]

Non-long terminal repeats (non-LTRs) are classified as long interspersed elements (LINEs), short interspersed elements (SINEs), and Penelope-like elements. Dictyostelium discoideum also has DIRS-like elements, which belong to the non-LTRs. Non-LTRs are widespread in eukaryotic genomes.[30]

Long interspersed elements (LINEs) encode reverse transcriptase and endonuclease, making them autonomous transposable elements. The human genome has around 500,000 LINEs, making up around 17% of the genome.[31]
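Combining these figures with the 3.2 billion base pairs quoted earlier gives the average length of a LINE copy. The arithmetic is only as precise as the rounded inputs. The result, about 1 kb, is well below the roughly 6 kb of a full-length LINE-1, which suggests that most copies are incomplete.

```python
# Average LINE copy length from the rounded figures in the text.
GENOME_BP = 3_200_000_000  # "3.2 billion base pairs"
LINE_COPIES = 500_000      # "around 500,000 LINEs"
LINE_PERCENT = 17          # "around 17% of the genome"

line_bp = GENOME_BP * LINE_PERCENT // 100  # integer arithmetic avoids float rounding
print(f"{line_bp:,} bp in LINEs, ~{line_bp // LINE_COPIES:,} bp per copy")
# 544,000,000 bp in LINEs, ~1,088 bp per copy
```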

Short interspersed elements (SINEs) are usually less than 500 base pairs and are non-autonomous, so they rely on the proteins encoded by LINEs for transposition.[32] The Alu element is the most common SINE found in primates. It is about 350 base pairs and occupies about 11% of the human genome with around 1,500,000 copies.[27]

DNA transposons

DNA transposons encode a transposase enzyme between inverted terminal repeats. When expressed, the transposase recognizes the terminal inverted repeats that flank the transposon and catalyzes its excision and reinsertion at a new site.[23] This cut-and-paste mechanism typically reinserts transposons near their original location (within 100 kb).[27] DNA transposons are found in bacteria and make up 3% of the human genome and 12% of the genome of the roundworm C. elegans.[27]


  1. The human genome may be less than 10 % human.


The content on this page was first contributed by: Henry A. Hoff.

Initial content for this page in some instances came from Wikiversity.

See also

References


  1. Ridley, M. (2006). Genome. New York, NY: Harper Perennial. ISBN 0-06-019497-9
  2. Heidi Chial (2008). "DNA sequencing technologies key to the Human Genome Project". Nature Education. 1 (1): 219. Retrieved 2017-06-20.
  3. Msh210 (20 March 2009). genome. San Francisco, California: Wikimedia Foundation, Inc. Retrieved 2012-10-30.
  4. Bradley E. Bernstein, Alexander Meissner, Eric S. Lander (2007). "The Mammalian Epigenome" (PDF). Cell. 128 (4): 669–81. doi:10.1016/j.cell.2007.01.033. Retrieved 19 December 2011.
  5. Conley, A.B., King Jordan, I. (2012). Endogenous Retroviruses and the Epigenome. In: Witzany, G. (ed). Viruses: Essential Agents of Life, Springer, Dordrecht, pp. 309-323.
  6. Benjamin Lewin (2004). Genes VIII (8th ed.). Upper Saddle River, NJ: Pearson/Prentice Hall. ISBN 0-13-143981-2.
  7. Bryan McBournie (6 September 2012). Human genome study could unlock the biology of disease. Sigma Xi. Retrieved 2012-09-06.
  8. Malcolm Ritter (6 September 2012). Far from being mostly junk, human DNA is ‘a jungle’ of complex activity, huge project shows. The Washington Post. Retrieved 2012-09-06.
  9. Pakendorf, B. and M. Stoneking (2005). "Mitochondrial DNA and human evolution." Annu Rev Genomics Hum Genet 6: 165-83.
  10. Gelderblom, Hans R. (1996). Medical Microbiology (4th ed.). Galveston, TX: The University of Texas Medical Branch at Galveston.
  11. Samson RY, Bell SD (2014). "Archaeal chromosome biology". Journal of Molecular Microbiology and Biotechnology. 24 (5–6): 420–7. doi:10.1159/000368854. PMC 5175462. PMID 25732343.
  12. Chaconas G, Chen CW (2005). "Replication of Linear Bacterial Chromosomes: No Longer Going Around in Circles". The Bacterial Chromosome: 525. doi:10.1128/9781555817640.ch29.
  13. Bacterial Chromosomes. 2002.
  14. Koonin EV, Wolf YI (July 2010). "Constraints and plasticity in genome and molecular-phenome evolution". Nature Reviews. Genetics. 11 (7): 487–98. doi:10.1038/nrg2810. PMC 3273317. PMID 20548290.
  15. McCutcheon JP, Moran NA (November 2011). "Extreme genome reduction in symbiotic bacteria". Nature Reviews. Microbiology. 10 (1): 13–26. doi:10.1038/nrmicro2670. PMID 22064560.
  16. Land M, Hauser L, Jun SR, Nookaew I, Leuze MR, Ahn TH, Karpinets T, Lund O, Kora G, Wassenaar T, Poudel S, Ussery DW (March 2015). "Insights from 20 years of bacterial genome sequencing". Functional & Integrative Genomics. 15 (2): 141–61. doi:10.1007/s10142-015-0433-4. PMID 25722247.
  17. Scientists sequence asexual tiny worm whose lineage stretches back 18 million years. Retrieved 7 November 2017.
  18. Khandelwal S (March 1990). "Chromosome evolution in the genus Ophioglossum L.". Botanical Journal of the Linnean Society. 102 (3): 205–217. doi:10.1111/j.1095-8339.1990.tb01876.x.
  19. Lewin, Benjamin (2004). Genes VIII (8th ed.). Upper Saddle River, NJ: Pearson/Prentice Hall. ISBN 978-0-13-143981-8.
  20. Naclerio G, Cangiano G, Coulson A, Levitt A, Ruvolo V, La Volpe A (July 1992). "Molecular and genomic organization of clusters of repetitive DNA sequences in Caenorhabditis elegans". Journal of Molecular Biology. 226 (1): 159–68. doi:10.1016/0022-2836(92)90131-3. PMID 1619649.
  21. Witzany G (2017). "Two genetic codes: Repetitive syntax for active non-coding RNAs; non-repetitive syntax for the DNA archives". Communicative & Integrative Biology. 10 (2): e1297352. doi:10.1080/19420889.2017.1297352. PMC 5398208. PMID 29149223.
  22. Nikola Stojanovic, ed. (2007). Computational genomics : current methods. Wymondham: Horizon Bioscience. ISBN 978-1-904933-30-4.
  23. Padeken J, Zeller P, Gasser SM (April 2015). "Repeat DNA in genome organization and stability". Current Opinion in Genetics & Development. 31: 12–9. doi:10.1016/j.gde.2015.03.009. PMID 25917896.
  24. Usdin K (July 2008). "The biological effects of simple tandem repeats: lessons from the repeat expansion diseases". Genome Research. 18 (7): 1011–9. doi:10.1101/gr.070409.107. PMC 3960014. PMID 18593815.
  25. Li YC, Korol AB, Fahima T, Beiles A, Nevo E (December 2002). "Microsatellites: genomic distribution, putative functions and mutational mechanisms: a review". Molecular Ecology. 11 (12): 2453–65. doi:10.1046/j.1365-294X.2002.01643.x. PMID 12453231.
  26. Wessler SR (November 2006). "Transposable elements and the evolution of eukaryotic genomes". Proceedings of the National Academy of Sciences of the United States of America. 103 (47): 17600–1. Bibcode:2006PNAS..10317600W. doi:10.1073/pnas.0607612103. PMC 1693792. PMID 17101965.
  27. Kazazian HH (March 2004). "Mobile elements: drivers of genome evolution". Science. 303 (5664): 1626–32. Bibcode:2004Sci...303.1626K. doi:10.1126/science.1089670. PMID 15016989.
  28. Deininger PL, Moran JV, Batzer MA, Kazazian HH (December 2003). "Mobile elements and mammalian genome evolution". Current Opinion in Genetics & Development. 13 (6): 651–8. doi:10.1016/j.gde.2003.10.013. PMID 14638329.
  29. Kidwell MG, Lisch DR (March 2000). "Transposable elements and host genome evolution". Trends in Ecology & Evolution. 15 (3): 95–99. doi:10.1016/S0169-5347(99)01817-0. PMID 10675923.
  30. Richard GF, Kerrest A, Dujon B (December 2008). "Comparative genomics and molecular dynamics of DNA repeats in eukaryotes". Microbiology and Molecular Biology Reviews. 72 (4): 686–727. doi:10.1128/MMBR.00011-08. PMC 2593564. PMID 19052325.
  31. Cordaux R, Batzer MA (October 2009). "The impact of retrotransposons on human genome evolution". Nature Reviews. Genetics. 10 (10): 691–703. doi:10.1038/nrg2640. PMC 2884099. PMID 19763152.
  32. Han JS, Boeke JD (August 2005). "LINE-1 retrotransposons: modulators of quantity and quality of mammalian gene expression?". BioEssays. 27 (8): 775–84. doi:10.1002/bies.20257. PMID 16015595.

External links