DNA sequencing: Difference between revisions

__NOTOC__
{{SI}}
{{CMG}}


==Overview==


{| style="float: right;"
| [[File:Radioactive Fluorescent Seq.jpg|thumbnail|An example of the results of automated chain-termination DNA sequencing.]]
|}


The term '''DNA sequencing''' encompasses [[biochemistry|biochemical]] methods for determining the order of the [[nucleotide]] bases, [[adenine]], [[guanine]], [[cytosine]], and [[thymine]], in a [[DNA]] [[oligonucleotide]]. The sequence of DNA constitutes the heritable genetic information in [[Cell nucleus|nuclei]], [[plasmids]], [[mitochondria]], and [[chloroplasts]] that forms the basis for the developmental programs of all living organisms. Determining the DNA sequence is therefore useful in basic research studying fundamental biological processes, as well as in applied fields such as diagnostic or [[forensic]] research. The advent of DNA sequencing has significantly accelerated biological research and discovery. The rapid speed of sequencing attainable with modern DNA sequencing technology has been instrumental in the large-scale sequencing of the [[human genome]], in the [[Human Genome Project]]. Related projects, often by scientific collaboration across continents, have generated the complete DNA sequences of many animal, plant, and microbial genomes.
'''DNA sequencing''' is the process of determining the precise order of [[nucleotides]] within a [[DNA]] molecule. It includes any method or technology that is used to determine the order of the four bases—[[adenine]], [[guanine]], [[cytosine]], and [[thymine]]—in a strand of DNA. The advent of rapid DNA sequencing methods has greatly accelerated biological and medical research and discovery.
[[Image:Mutation Surveyor Trace.jpg|thumb|500px|DNA Sequence Trace]]


== Early methods ==
Knowledge of DNA sequences has become indispensable for basic biological research and for numerous applied fields such as diagnostics, [[biotechnology]], [[forensic biology]], and biological [[systematics]]. The speed of sequencing attained with modern DNA sequencing technology has been instrumental in determining the complete DNA sequences, or [[genomes]], of numerous types and species of life, including the [[human genome]] and the genomes of many animal, plant, and [[microbe|microbial]] species.
For thirty years, a large proportion of DNA sequencing has been carried out with the chain-termination method developed by [[Frederick Sanger]] and coworkers between 1975 and 1977.<ref>Sanger F, Coulson AR. A rapid method for determining sequences in DNA by primed synthesis with DNA polymerase. J Mol Biol. 1975 May 25;94(3):441–448</ref><ref>F. Sanger, S. Nicklen, and A. R. Coulson, DNA sequencing with chain-terminating inhibitors, Proc Natl Acad Sci U S A. 1977 December; 74(12): 5463–5467</ref> Prior to the development of rapid DNA sequencing methods in the early 1970s by Sanger in England and [[Walter Gilbert]] and [[Allan Maxam]] at [[Harvard University|Harvard]],<ref>Maxam AM, Gilbert W., A new method for sequencing DNA, Proc Natl Acad Sci U S A. 1977 Feb;74(2):560-4</ref><ref>http://nobelprize.org/nobel_prizes/chemistry/laureates/1980/gilbert-lecture.pdf</ref> a number of laborious methods were used. For instance, in 1973 Gilbert and Maxam reported the sequence of 24 basepairs using a method known as wandering-spot analysis.<ref>Walter Gilbert and Allan Maxam, The Nucleotide Sequence of the lac Operator, Proc Natl Acad Sci U S A. 1973 December; 70(12 Pt 1-2): 3581–3584</ref>


RNA sequencing, which for technical reasons is easier to perform than DNA sequencing, was one of the earliest forms of nucleotide sequencing. The major landmark of RNA sequencing, dating from the pre-recombinant DNA era, is the sequence of the first complete gene and then the complete genome of [[Bacteriophage MS2]], identified and published by [[Walter Fiers]] and his coworkers at the [[University of Ghent]] ([[Ghent]], [[Belgium]]).<ref>Min Jou W, Haegeman G, Ysebaert M, Fiers W., Nucleotide sequence of the gene coding for the bacteriophage MS2 coat protein, Nature. 1972 May 12;237(5350):82-8</ref><ref>Complete nucleotide sequence of bacteriophage MS2 RNA: primary and secondary structure of the replicase gene. ''Nature''. 1976 Apr 8;260(5551):500-7.</ref>
The first DNA sequences were obtained in the early 1970s by academic researchers using laborious methods based on [[two-dimensional chromatography]]. Following the development of [[fluorescence]]-based sequencing methods with [[DNA sequencer|automated analysis]],<ref name=olsvik1993>{{cite journal
|last1=Olsvik |first1=Ørjan |last2=Wahlberg |first2=Johan |title=Use of automated sequencing of polymerase chain reaction-generated amplicons to identify three types of cholera toxin subunit B in Vibrio cholerae O1 strains |journal=[[J. Clin. Microbiol.]] |volume=31 |issue=1 |pages=22–25 |date=January 1993 |pmid=7678018 |pmc=262614 |url=http://jcm.asm.org/cgi/pmidlookup?view=long&pmid=7678018 |author-separator=, |author3=Petterson B |display-authors=3 |last4=Uhlén |first4=M |last5=Popovic |first5=T |last6=Wachsmuth |first6=IK |last7=Fields |first7=PI }}{{open access}}</ref> DNA sequencing has become easier and orders of magnitude faster.<ref name="pmid18992322">{{cite journal |author=Pettersson E, Lundeberg J, Ahmadian A |title=Generations of sequencing technologies |journal=Genomics |volume=93 |issue=2 |pages=105–11 |date=February 2009 |pmid=18992322 |doi=10.1016/j.ygeno.2008.10.003 |url=}}</ref>


== Maxam-Gilbert sequencing ==
== Use of Sequencing ==


In 1976-1977, [[Allan Maxam]] and [[Walter Gilbert]] developed a DNA sequencing method based on chemical modification of DNA and subsequent cleavage at specific bases [http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pubmed&pubmedid=265521].
DNA sequencing may be used to determine the sequence of individual [[gene]]s, larger genetic regions (i.e. clusters of genes or [[operons]]), full chromosomes or [[Whole genome sequencing|entire genomes]].  Sequencing provides the order of individual nucleotides in DNA or [[RNA]] (commonly represented as A, C, G, T, and U) isolated from cells of animals, plants, bacteria, [[archaea]], or virtually any other source of genetic information. This is useful for:
Although Maxam and Gilbert published their chemical sequencing method two years after the ground-breaking paper of Sanger and Coulson on plus-minus sequencing,<ref> Sanger, F. & Coulson, A. R. (1975) J. Mol. Biol. 94, 441-448</ref><ref>http://nobelprize.org/nobel_prizes/chemistry/laureates/1980/sanger-lecture.pdf</ref> Maxam-Gilbert sequencing rapidly became more popular, since purified DNA could be used directly, while the initial Sanger method required that each read start be cloned for production of single-stranded DNA. However, with the development and improvement of the chain-termination method (see below), Maxam-Gilbert sequencing has fallen out of favour due to its technical complexity, extensive use of hazardous chemicals, and difficulties with scale-up. In addition, unlike the chain-termination method, chemicals used in the Maxam-Gilbert method cannot easily be customized for use in a standard molecular biology kit.
* [[Molecular biology]] - studying the genome itself, how proteins are made, what proteins are made, identifying new genes and associations with diseases and phenotypes, and identifying potential drug targets
* [[Evolutionary biology]] - studying how different organisms are related and how they evolved
* [[Metagenomics]] - identifying species present in a body of water, [[sewage]], dirt, debris filtered from the air, or swab samples of organisms. Helpful in [[ecology]], [[epidemiology]], [[microbiome]] research, and other fields.


In brief, the method requires radioactive labelling at one end and purification of the DNA fragment to be sequenced. Chemical treatment generates breaks at a small proportion of one or two of the four nucleotide bases in each of four reactions (G, A+G, C, C+T). Thus a series of labelled fragments is generated, from the radiolabelled end to the first 'cut' site in each molecule. The fragments are then size-separated by [[gel electrophoresis]], with the four reactions arranged side by side. To visualize the fragments generated in each reaction, the gel is exposed to [[radiography|X-ray film]] for [[autoradiography]], yielding an image of a series of dark 'bands' corresponding to the [[Radioisotopic labelling|radiolabelled]] DNA fragments, from which the sequence may be inferred.
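The way a sequence is inferred from the four reactions (G, A+G, C, C+T) can be sketched in Python. This is a minimal illustration under stated assumptions, not a laboratory protocol: the function names <code>simulate_lanes</code> and <code>read_maxam_gilbert</code> and the representation of each lane as a set of band positions are hypothetical, and the sketch ignores the fact that the cleaved base itself is lost from the labelled fragment.

```python
def simulate_lanes(sequence):
    """Return the band positions each chemical reaction would show.

    Position 1 is the base nearest the radiolabelled end. This helper is
    hypothetical and exists only to produce test input for the reader below.
    """
    lanes = {"G": set(), "A+G": set(), "C": set(), "C+T": set()}
    for pos, base in enumerate(sequence, start=1):
        if base == "G":
            lanes["G"].add(pos)
            lanes["A+G"].add(pos)
        elif base == "A":
            lanes["A+G"].add(pos)
        elif base == "C":
            lanes["C"].add(pos)
            lanes["C+T"].add(pos)
        elif base == "T":
            lanes["C+T"].add(pos)
    return lanes


def read_maxam_gilbert(lanes):
    """Infer the sequence from band positions in the four lanes.

    A band in both G and A+G is called G; a band in A+G only is called A.
    A band in both C and C+T is called C; a band in C+T only is called T.
    Positions with no interpretable band are reported as N.
    """
    all_positions = set().union(*lanes.values())
    last = max(all_positions, default=0)
    calls = []
    for pos in range(1, last + 1):
        in_g, in_ag = pos in lanes["G"], pos in lanes["A+G"]
        in_c, in_ct = pos in lanes["C"], pos in lanes["C+T"]
        if in_g and in_ag:
            calls.append("G")
        elif in_ag:
            calls.append("A")
        elif in_c and in_ct:
            calls.append("C")
        elif in_ct:
            calls.append("T")
        else:
            calls.append("N")
    return "".join(calls)
```

Because two of the reactions cut at two bases each, A and T are identified by exclusion: the band appears in the combined lane but not in the single-base lane.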
Less-precise information is produced by non-sequencing techniques like [[DNA fingerprinting]]. This information may be easier to obtain and is useful for:
* Detecting the presence of known genes for medical purposes (see [[genetic testing]])
* [[Forensic identification]]
* [[Parental testing]]


Also sometimes known as 'chemical sequencing', this method originated in the study of DNA-protein interactions (footprinting), nucleic acid structure and epigenetic modifications to DNA, and within these it still has important applications.
== History ==
Though the structure of DNA was established as a [[DNA double helix|double helix]] in 1953,<ref name="pmid13168976">{{cite journal |author=Watson JD, Crick FH |title=The structure of DNA |journal=Cold Spring Harb. Symp. Quant. Biol. |volume=18 |issue= |pages=123–31 |year=1953 |pmid=13168976 |doi= 10.1101/SQB.1953.018.01.020|url=}}</ref> several decades would pass before fragments of DNA could be reliably analyzed for their sequence in the laboratory.
RNA sequencing was one of the earliest forms of nucleotide sequencing. The major landmark of RNA sequencing is the sequence of the first complete gene and the complete genome of [[Bacteriophage MS2]], identified and published by [[Walter Fiers]] and his coworkers at the [[University of Ghent]] ([[Ghent]], [[Belgium]]), in 1972<ref>{{cite journal |author=Min Jou W, Haegeman G, Ysebaert M, Fiers W |title=Nucleotide sequence of the gene coding for the bacteriophage MS2 coat protein |journal=Nature |volume=237 |issue=5350 |pages=82–8 |date=May 1972 |pmid=4555447 |doi=10.1038/237082a0 |bibcode = 1972Natur.237...82J |last2=Haegeman |last3=Ysebaert |last4=Fiers }}</ref> and 1976.<ref>{{cite journal |author=Fiers W |title=Complete nucleotide sequence of bacteriophage MS2 RNA: primary and secondary structure of the replicase gene |journal=Nature |volume=260 |issue=5551 |pages=500–7 |date=April 1976 |pmid=1264203 |doi=10.1038/260500a0 |bibcode=1976Natur.260..500F |author-separator=, |author2=Contreras R |author3=Duerinck F |display-authors=3 |last4=Haegeman |first4=G. |last5=Iserentant |first5=D. |last6=Merregaert |first6=J. |last7=Min Jou |first7=W. |last8=Molemans |first8=F. |last9=Raeymaekers |first9=A.|last10=Van Den Berghe |first10=A. |last11=Volckaert |first11=G. |last12=Ysebaert |first12=M. }}</ref>


== Chain-termination methods ==
The first method for determining DNA sequences involved a location-specific primer extension strategy established by [[Ray Wu]] at [[Cornell University]] in 1970.<ref>{{cite web|url=http://mbg.cornell.edu/faculty-staff/faculty/wu.cfm|publisher=Cornell University}}</ref> DNA polymerase catalysis and specific nucleotide labeling, both of which figure prominently in current sequencing schemes, were used to sequence the cohesive ends of lambda phage DNA<ref>{{cite journal|last=PADMANABHAN|first=R|author2=Ray Wu |author3=Ernest Jay |title=Chemical Synthesis of a Primer and Its Use in the Sequence Analysis of the Lysozyme Gene of Bacteriophage T4|journal=Proceedings of the National Academy of Sciences|date=June 1974|volume=71|issue=6|pages=2510–2514|accessdate=May 7, 2014|doi=10.1073/pnas.71.6.2510}}</ref><ref>{{cite journal|last=Onaga|first=Lisa|title=Ray Wu as Fifth Business: Demonstrating Collective Memory in the History of DNA Sequencing|journal=Studies in the History and Philosophy of Science|date=June 2014|volume=46|series=Part C|pages=1–14|doi=10.1016/j.shpsc.2013.12.006|url=http://www.sciencedirect.com/science/article/pii/S136984861400003X|accessdate=May 7, 2014}}</ref><ref>{{cite journal|last=Wu|first=Ray|title=Nucleotide Sequence Analysis of DNA|journal=Nature|date=19 April 1972|pages=198–200|doi=10.1038/newbio236198a0|url=http://www.nature.com/nature-newbio/journal/v236/n68/abs/newbio236198a0.html|accessdate=May 7, 2014}}</ref>  Between 1970 and 1973, Wu, R Padmanabhan and colleagues demonstrated that this method can be employed to determine any DNA sequence using synthetic location-specific primers.<ref>{{cite journal|last=Padmanabhan|first=R|author2=Ray Wu|title=Use of oligonucleotides of defined sequences as primers in DNA sequence analysis|journal=Biochemical and Biophysical Research|date=1972|volume=48|series=1295-1302|accessdate=May 7, 2014}}</ref><ref>{{cite web|url=http://mbg.cornell.edu/faculty-staff/faculty/wu.cfm|publisher=Cornell|accessdate=May 7, 
2014}}</ref><ref>{{cite journal|last=Wu|first=R|author2=Padmanabhan|title=R|journal=Biochemical and Biophysics Research|date=1973|volume=55|pages=1092–1098|accessdate=May 7, 2014}}</ref><ref>{{cite journal|last=Jay|first=Ernest|author2=Ray Wu |author3=R Padmanabhan |author4=Robert Bambara |title=DNA sequence analysis: a general, simple and rapid method for sequencing large oligodeoxyribonucleotide fragments by mapping|journal=Nucleic Acids Research|date=March 1974|volume=1|pages=331–353| pmc=344020 |accessdate=May 7, 2013 |doi=10.1093/nar/1.3.331}}</ref>  [[Frederick Sanger]] then adopted this primer-extension strategy to develop more rapid DNA sequencing methods at the [[Medical Research Council (United Kingdom)|MRC Centre]], [[Cambridge]], UK and published a method for "DNA sequencing with chain-terminating inhibitors" in 1977.<ref name="Sanger1977" /> [[Walter Gilbert]] and [[Allan Maxam]] at [[Harvard University|Harvard]] also developed sequencing methods, including one for "DNA sequencing by chemical degradation".<ref name=Maxam77/><ref>Gilbert, W. [http://nobelprize.org/nobel_prizes/chemistry/laureates/1980/gilbert-lecture.pdf DNA sequencing and gene structure]. Nobel lecture, 8 December 1980.</ref> In 1973, Gilbert and Maxam reported the sequence of 24 basepairs using a method known as wandering-spot analysis.<ref>{{cite journal |author=Gilbert W, Maxam A |title=The Nucleotide Sequence of the lac Operator |journal=Proc. Natl. Acad. Sci. U.S.A. |volume=70 |issue=12 |pages=3581–4 |date=December 1973 |pmid=4587255 |pmc=427284 |doi=10.1073/pnas.70.12.3581 |bibcode = 1973PNAS...70.3581G |last2=Maxam }}</ref> Advancements in sequencing were aided by the concurrent development of [[recombinant DNA]] technology, allowing DNA samples to be isolated from sources other than viruses.


[[Image:Sequencing.jpg|thumb|right|Part of a radioactively labelled sequencing gel]]
While the chemical sequencing method of Maxam and Gilbert and the plus-minus method of Sanger and Coulson were orders of magnitude faster than previous methods, the chain-terminator method developed by Sanger was even more efficient and rapidly became the method of choice. The Maxam-Gilbert technique requires the use of highly toxic chemicals and large amounts of [[Radioisotopic labelling|radiolabeled]] DNA, whereas the chain-terminator method uses fewer toxic chemicals and lower amounts of radioactivity. The key principle of the Sanger method was the use of [[dideoxynucleotides|dideoxynucleotide]] triphosphates (ddNTPs) as DNA chain terminators.
The first full DNA genome to be sequenced was that of [[bacteriophage φX174]] in 1977.<ref>{{cite journal |author=Sanger F |title=Nucleotide sequence of bacteriophage phi X174 DNA |journal=Nature |volume=265 |issue=5596 |pages=687–95 |date=February 1977 |pmid=870828 |doi=10.1038/265687a0|bibcode = 1977Natur.265..687S |author-separator=, |author2=Air GM |author3=Barrell BG |display-authors=3 |last4=Brown |first4=N. L. |last5=Coulson |first5=A. R. |last6=Fiddes|first6=J. C. |last7=Hutchison |first7=C. A. |last8=Slocombe |first8=P. M. |last9=Smith |first9=M. }}</ref> [[Medical Research Council (UK)|Medical Research Council]] scientists deciphered the complete DNA sequence of the [[Epstein-Barr virus]] in 1984, finding it to be 170 thousand base-pairs long.


The classical chain-termination or Sanger method requires a single-stranded DNA template, a DNA [[primer (molecular biology)|primer]], a [[DNA polymerase]], radioactively or [[fluorescence|fluorescent]]ly labeled nucleotides, and modified nucleotides that terminate DNA strand elongation. The DNA sample is divided into four separate sequencing reactions, each containing the four standard [[deoxynucleotides]] (dATP, dGTP, dCTP and dTTP) and the [[DNA polymerase]]. To each reaction is added only one of the four [[dideoxynucleotides]] (ddATP, ddGTP, ddCTP, or ddTTP). These dideoxynucleotides are the chain-terminating nucleotides, lacking the 3'-[[hydroxyl|OH]] group required for the formation of a [[phosphodiester bond]] between two nucleotides during DNA strand elongation. Incorporation of a dideoxynucleotide into the nascent (elongating) DNA strand therefore terminates DNA strand extension, resulting in DNA fragments of varying length. The dideoxynucleotides are added at a lower concentration than the standard deoxynucleotides to allow strand elongation sufficient for sequence analysis.
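The four-reaction scheme can be illustrated with a short Python sketch. It is a simplified model under stated assumptions: the function names are hypothetical, the primer length is ignored, and every possible termination position is assumed to be represented in its reaction, which in practice depends on the ddNTP to dNTP ratio.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def synthesized_strand(template):
    """Return the strand a polymerase would build on `template`.

    Both strands are written 5'->3', so the new strand is the reverse
    complement of the template.
    """
    return "".join(COMPLEMENT[base] for base in reversed(template))


def chain_termination_fragments(template):
    """Map each ddNTP reaction to the fragment lengths it can produce.

    A fragment of length n ends in the dideoxynucleotide incorporated at
    position n of the newly synthesized strand.
    """
    reactions = {"ddATP": [], "ddCTP": [], "ddGTP": [], "ddTTP": []}
    for length, base in enumerate(synthesized_strand(template), start=1):
        reactions["dd" + base + "TP"].append(length)
    return reactions
```

For the template <code>ATGC</code> the synthesized strand is <code>GCAT</code>, so the ddGTP reaction yields a fragment of length 1, ddCTP of length 2, ddATP of length 3, and ddTTP of length 4.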
A non-radioactive method for transferring the DNA molecules of sequencing reaction mixtures onto an immobilizing matrix during electrophoresis was developed by Pohl and co-workers in the early 1980s.<ref>Beck S and Pohl F M, 1984, DNA sequencing with direct blotting electrophoresis. EMBO J., 3(12): 2905 - 2909. [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC557787/ PMC557787]</ref><ref>United States Patent 4,631,122 (1986)</ref> This was followed by the commercialization of the DNA sequencer “Direct-Blotting-Electrophoresis-System GATC 1500” by [[GATC Biotech]], which was used intensively in the EU genome-sequencing programme, including the determination of the complete DNA sequence of chromosome II of the yeast ''[[Saccharomyces cerevisiae]]''.<ref>Feldmann, H et al., 1994, Complete DNA sequence of yeast chromosome II. EMBO J., 1994; 13(24): 5795–5809. [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC395553/ PMC395553]</ref> [[Leroy E. Hood]]'s laboratory at the [[California Institute of Technology]] announced the first semi-automated DNA sequencing machine in 1986.<ref>{{cite journal|last=Smith|first=LM|coauthors=Sanders, JZ; Kaiser, RJ; Hughes, P; Dodd, C; Connell, CR; Heiner, C; Kent, SBH; Hood, LE|title=Fluorescence Detection in Automated DNA Sequence Analysis.|journal=Nature|date=June 12, 1986|volume=321|issue=6071|pages=674–79|pmid=3713851|doi=10.1038/321674a0}}</ref> This was followed by [[Applied Biosystems]]' marketing of the first fully automated sequencing machine, the ABI 370, in 1987 and by Dupont's Genesis 2000,<ref>{{cite journal|last=Prober|first=JM|coauthors=Trainor, GL; Dam, RJ; Hobbs, FW; Robertson, CW; Zagursky, RJ; Cocuzza, AJ; Jensen, MA; Baumeister, K|title=A system for rapid DNA sequencing with fluorescent chain-terminating dideoxynucleotides.|journal=Science (New York, N.Y.)|date=Oct 16, 1987|volume=238|issue=4825|pages=336–41|pmid=2443975|doi=10.1126/science.2443975}}</ref> which used a novel fluorescent labeling technique enabling all four dideoxynucleotides to be
identified in a single lane. By 1990, the U.S. [[National Institutes of Health]] (NIH) had begun large-scale sequencing trials on ''[[Mycoplasma capricolum]]'', ''[[Escherichia coli]]'', ''[[Caenorhabditis elegans]]'', and ''[[Saccharomyces cerevisiae]]'' at a cost of US$0.75 per base. Meanwhile, sequencing of human [[cDNA]] sequences called [[expressed sequence tag]]s began in [[Craig Venter]]'s lab, an attempt to capture the coding fraction of the [[human genome]].<ref name="pmid2047873">{{cite journal |author=Adams MD |title=Complementary DNA sequencing: expressed sequence tags and human genome project|journal=Science|volume=252 |issue=5013 |pages=1651–6 |date=June 1991 |pmid=2047873 |doi= 10.1126/science.2047873|url=|bibcode = 1991Sci...252.1651A|author-separator=,|author2=Kelley JM |author3=Gocayne JD |display-authors=3 |last4=Dubnick |first4=M |last5=Polymeropoulos |first5=M. |last6=Xiao |first6=H|last7=Merril |first7=C.|last8=Wu |first8=A |last9=Olde |first9=B |last10=Moreno|first10=Ruben F.|last11=Kerlavage|first11=Anthony R.|last12=McCombie|first12=W. Richard|last13=Venter|first13=J. Craig}}</ref> In 1995, Venter, [[Hamilton O. Smith|Hamilton Smith]], and colleagues at [[The Institute for Genomic Research]] (TIGR) published the first complete genome of a free-living organism, the bacterium ''[[Haemophilus influenzae]]''. The circular chromosome contains 1,830,137 bases and its publication in the journal Science<ref>{{cite journal |author=Fleischmann RD |title=Whole-genome random sequencing and assembly of ''Haemophilus influenzae Rd''|journal=Science|volume=269 |issue=5223 |pages=496–512 |date=July 1995 |pmid=7542800 |url=http://www.sciencemag.org/cgi/pmidlookup?view=long&pmid=7542800|doi=10.1126/science.7542800|bibcode = 1995Sci...269..496F |author-separator=, |author2=Adams MD |author3=White O |display-authors=3|last4=Clayton |first4=R.|last5=Kirkness |first5=E. |last6=Kerlavage |first6=A. |last7=Bult |first7=C. |last8=Tomb |first8=J. 
|last9=Dougherty |first9=B. |last10=Merrick|first10=Joseph M.|last11=McKenney|first11=Keith|last12=Sutton|first12=Granger|last13=Fitzhugh|first13=Will|last14=Fields|first14=Chris|last15=Gocyne|first15=Jeannie D.|last16=Scott|first16=John|last17=Shirley|first17=Robert|last18=Liu|first18=Li-Ing|last19=Glodek|first19=Anna|last20=Kelley|first20=Jenny M.|last21=Weidman|first21=Janice F.|last22=Phillips|first22=Cheryl A.|last23=Spriggs|first23=Tracy|last24=Hedblom|first24=Eva|last25=Cotton|first25=Matthew D.|last26=Utterback|first26=Teresa R.|last27=Hanna|first27=Michael C.|last28=Nguyen|first28=David T.|last29=Saudek|first29=Deborah M.|last30=Brandon|first30=Rhonda C.}}</ref> marked the first published use of whole-genome shotgun sequencing, eliminating the need for initial mapping efforts. By 2001, shotgun sequencing methods had been used to produce a draft sequence of the human genome.<ref name="pmid11237011">{{cite journal |author=Lander ES |title=Initial sequencing and analysis of the human genome |journal=Nature |volume=409 |issue=6822 |pages=860–921 |date=February 2001 |pmid=11237011 |doi=10.1038/35057062|url=|author-separator=, |author2=Linton LM |author3=Birren B |display-authors=3 |last4=Nusbaum |first4=Chad |last5=Zody |first5=Michael C. |last6=Baldwin|first6=Jennifer|last7=Devon |first7=Keri |last8=Dewar |first8=Ken |last9=Doyle |first9=Michael}}</ref><ref name="pmid11181995">{{cite journal |author=Venter JC|title=The sequence of the human genome |journal=Science |volume=291 |issue=5507 |pages=1304–51 |date=February 2001 |pmid=11181995 |doi=10.1126/science.1058040|url=|bibcode = 2001Sci...291.1304V |author-separator=, |author2=Adams MD |author3=Myers EW |display-authors=3 |last4=Li |first4=PW |last5=Mural |first5=RJ|last6=Sutton |first6=GG|last7=Smith |first7=HO |last8=Yandell |first8=M |last9=Evans |first9=CA |last10=Holt |first10=Robert A. |last11=Gocayne |first11=Jeannine D. |last12=Amanatides |first12=Peter |last13=Ballew |first13=Richard M. 
|last14=Huson |first14=Daniel H. |last15=Wortman |first15=Jennifer Russo |last16=Zhang |first16=Qing |last17=Kodira |first17=Chinnappa D. |last18=Zheng |first18=Xiangqun H. |last19=Chen |first19=Lin |last20=Skupski |first20=Marian |last21=Subramanian |first21=Gangadharan |last22=Thomas |first22=Paul D. |last23=Zhang |first23=Jinghui |last24=Gabor Miklos |first24=George L. |last25=Nelson |first25=Catherine |last26=Broder |first26=Samuel |last27=Clark |first27=Andrew G. |last28=Nadeau |first28=Joe |last29=McKusick |first29=Victor A. |last30=Zinder |first30=Norton }}</ref>


The newly synthesized and labeled DNA fragments are heat-[[melting temperature|denatured]] and separated by size (with a resolution of just one nucleotide) by [[gel electrophoresis]] on a denaturing [[Polyacrylamide gel|polyacrylamide]]-urea gel. Each of the four DNA synthesis reactions is run in one of four individual lanes (lanes A, T, G, C); the DNA bands are then visualized by [[autoradiography]] or UV light, and the DNA sequence can be directly read off the [[radiography|X-ray film]] or gel image. In the image on the right, X-ray film was exposed to the gel, and the dark bands correspond to DNA fragments of different lengths. A dark band in a lane indicates a DNA fragment that is the result of chain termination after incorporation of a dideoxynucleotide (ddATP, ddGTP, ddCTP, or ddTTP). The terminal nucleotide base can be identified according to which dideoxynucleotide was added in the reaction giving that band. The relative positions of the different bands among the four lanes are then used to read (from bottom to top) the DNA sequence as indicated.
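Reading the gel from bottom to top amounts to sorting the bands by fragment length and noting the lane of each band. The following Python sketch shows that step; the function name <code>read_gel</code> and the input layout (one list of fragment lengths per lane) are assumptions made for illustration, and real gels require judgment about faint, compressed, or missing bands that this model does not capture.

```python
def read_gel(lanes):
    """Read a sequence from four lanes of fragment lengths.

    `lanes` maps a base ("A", "T", "G", "C") to the fragment lengths seen
    in that lane. The shortest fragments migrate furthest, so reading from
    the bottom of the gel upward means reading in order of increasing length.
    """
    band_at = {}
    for base, lengths in lanes.items():
        for length in lengths:
            if length in band_at:
                # Two lanes showing a band at the same height is ambiguous.
                raise ValueError(f"bands in two lanes at length {length}")
            band_at[length] = base
    return "".join(band_at[length] for length in sorted(band_at))
```

The string returned is the newly synthesized strand, which is complementary to the template that was sequenced.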
Several new methods for DNA sequencing were developed in the mid to late 1990s. These techniques comprise the first of the "next-generation" sequencing methods. In 1996, [[Pål Nyrén]] and his student [[Mostafa Ronaghi]] at the Royal Institute of Technology in [[Stockholm]] published their method of [[pyrosequencing]].<ref name=Ronaghi>{{cite journal| title=Real-time DNA sequencing using detection of pyrophosphate release| author=M. Ronaghi, S. Karamohamed, B. Pettersson, M. Uhlen, and P. Nyren| journal=Analytical Biochemistry| volume=242| pages=84–9| year=1996| doi=10.1006/abio.1996.0432| pmid=8923969| issue=1}}</ref> A year later, Pascal Mayer and Laurent Farinelli submitted patents to the World Intellectual Property Organization describing DNA colony sequencing.<ref name=DNA_colony_patents>{{Cite web| last = Kawashima
| first = Eric H.
|author2=Laurent Farinelli|author3=Pascal Mayer
| title = Patent: Method of nucleic acid amplification
| accessdate = 2012-12-22
| date = 2005-05-12
| url = http://www.patentlens.net/patentlens/patent/WO_1998_044151_A1/en/
}}</ref> Lynx Therapeutics published and marketed "[[Massively parallel signature sequencing]]", or MPSS, in 2000. This method incorporated a parallelized, adapter/ligation-mediated, bead-based sequencing technology and served as the first commercially available "next-generation" sequencing method, though no [[DNA sequencers]] were sold to independent laboratories.<ref>{{cite journal | title=Gene expression analysis by massively parallel signature sequencing (MPSS) on microbead arrays| author=Brenner S | year=2000| publisher=[[Nature Biotechnology]]| volume=18| pages=630–634| doi=10.1038/76469| pmid=10835600|issue=6| journal=Nature Biotechnology | author-separator=, | display-authors=1 | last2=Johnson | first2=Maria | last3=Bridgham | first3=John | last4=Golda|first4=George | last5=Lloyd | first5=David H. | last6=Johnson | first6=Davida | last7=Luo | first7=Shujun | last8=McCurdy | first8=Sarah | last9=Foy|first9=Michael}}</ref> In 2004, [[454 Life Sciences]] marketed a parallelized version of pyrosequencing.<ref name=Stein2008>{{cite journal|url=http://www.genengnews.com/gen-articles/next-generation-sequencing-update/2584/| title=Next-Generation Sequencing Update| author=Stein RA| journal=Genetic Engineering & Biotechnology News | date=1 September 2008  |volume=28 |issue=15}}</ref><ref name="pmid16056220">{{cite journal |author=Margulies M |title=Genome Sequencing in Open Microfabricated High Density Picoliter Reactors |journal=Nature |volume=437 |issue=7057 |pages=376–80 |date=September 2005|pmid=16056220|pmc=1464427 |doi=10.1038/nature03959 |url=|bibcode = 2005Natur.437..376M |author-separator=, |author2=Egholm M |author3=Altman WE |display-authors=3|last4=Attiya|first4=Said |last5=Bader |first5=Joel S. |last6=Bemben |first6=Lisa A. |last7=Berka |first7=Jan |last8=Braverman |first8=Michael S. |last9=Chen|first9=Yi-Ju |last10=Chen |first10=Zhoutao |last11=Dewell |first11=Scott B. |last12=Du |first12=Lei |last13=Fierro |first13=Joseph M. 
|last14=Gomes |first14=Xavier V. |last15=Godwin |first15=Brian C. |last16=He |first16=Wen |last17=Helgesen |first17=Scott |last18=Ho |first18=Chun He |last19=Irzyk |first19=Gerard P. |last20=Jando |first20=Szilveszter C. |last21=Alenquer |first21=Maria L. I. |last22=Jarvie |first22=Thomas P. |last23=Jirage |first23=Kshama B. |last24=Kim |first24=Jong-Bum |last25=Knight |first25=James R. |last26=Lanza |first26=Janna R. |last27=Leamon |first27=John H. |last28=Lefkowitz |first28=Steven M. |last29=Lei |first29=Ming |last30=Li |first30=Jing }}</ref> The first version of their machine reduced sequencing costs 6-fold compared to automated Sanger sequencing, and was the second of the new generation of sequencing technologies, after MPSS.<ref name="pmid18165802">{{cite journal |author=Schuster Stephan C. |title=Next-generation sequencing transforms today's biology |journal=Nat. Methods |volume=5 |issue=1 |pages=16–8 |date=January 2008 |pmid=18165802 |doi=10.1038/nmeth1156 |url=}}</ref>


[[Image:DNA Sequencin 3 labeling methods.jpg|thumb|left|DNA fragments can be labeled by using a radioactive or fluorescent tag on the primer (1), in the new DNA strand with a labeled dNTP, or with a labeled ddNTP.]]
There are several technical variations of chain-termination sequencing. In one method, the DNA fragments are tagged with nucleotides containing radioactive phosphorus for [[Radioisotopic labelling|radiolabel]]ling. Alternatively, a primer labeled at the 5' end with a [[fluorescence|fluorescent]] dye is used for the tagging. Four separate reactions are still required, but DNA fragments with dye labels can be read using an optical system, facilitating faster and more economical analysis and automation. This approach is known as 'dye-primer sequencing'. The later development by [[Leroy E. Hood|Leroy Hood]] and coworkers<ref>Fluorescence detection in automated DNA sequence analysis. Nature. 1986 Jun 12-18;321(6071):674-9. The paper describes partial automation of DNA sequence analysis: a fluorophore is covalently attached to the oligonucleotide primer, with a different coloured fluorophore for each of the reactions specific for the bases A, C, G and T. The reaction mixtures are combined and co-electrophoresed down a single polyacrylamide gel tube, the separated fluorescent bands of DNA are detected near the bottom of the tube, and the sequence information is acquired directly by computer.</ref><ref>The synthesis of oligonucleotides containing an aliphatic amino group at the 5' terminus: synthesis of fluorescent DNA primers for use in DNA sequence analysis. Nucleic Acids Res. 1985 Apr 11;13(7):2399-412.</ref> of fluorescently labeled ddNTPs and primers set the stage for automated, high-throughput DNA sequencing.
[[Image:Radioactive Fluorescent Seq.jpg|thumb|right|Sequence ladder by radioactive sequencing compared to fluorescent peaks]]
The large quantities of data produced by DNA sequencing have also required development of new methods and programs for sequence analysis. Phil Green and Brent Ewing of the University of Washington described their [[phred quality score]] for sequencer data analysis in 1998.<ref>{{cite journal |author=Ewing B, Green P |title=Base-calling of automated sequencer traces using phred. II. Error probabilities |journal=Genome Res.|volume=8 |issue=3 |pages=186–94 |date=March 1998 |pmid=9521922 |url=http://www.genome.org/cgi/pmidlookup?view=long&pmid=9521922|doi=10.1101/gr.8.3.186|doi_brokendate=2014-03-23}}</ref>
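The text above names the phred quality score without giving its definition. As commonly defined, a score Q relates to the estimated probability P that a base call is wrong by Q = −10 log<sub>10</sub> P. The short Python sketch below converts between the two; the function names are hypothetical and the code is only an illustration of the formula, not part of the phred program itself.

```python
import math


def phred_quality(error_probability):
    """Convert a base-call error probability into a phred quality score."""
    if not 0 < error_probability <= 1:
        raise ValueError("error probability must be in the interval (0, 1]")
    return -10 * math.log10(error_probability)


def error_probability(quality):
    """Convert a phred quality score back into an error probability."""
    return 10 ** (-quality / 10)
```

On this scale a score of 20 corresponds to an estimated 1 in 100 chance that the call is wrong, and a score of 30 to 1 in 1,000.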


The different chain-termination methods have greatly reduced the amount of work and planning needed for DNA sequencing. For example, the chain-termination-based "Sequenase" kit from USB Biochemicals contains most of the reagents needed for sequencing, prealiquoted and ready to use. Some problems can occur with the Sanger method, such as non-specific binding of the primer to the DNA, which impairs accurate readout of the DNA sequence. In addition, secondary structures within the DNA template, or contaminating RNA randomly priming at the DNA template, can also affect the fidelity of the obtained sequence. Other contaminants affecting the reaction may consist of extraneous DNA or inhibitors of the DNA polymerase.
==Basic Methods==


=== Dye-terminator sequencing ===
===Maxam-Gilbert sequencing===


[[Image:CE Basic.jpg|thumb|left|Capillary electrophoresis (click to expand)]]
[[Allan Maxam]] and [[Walter Gilbert]] published a DNA sequencing method in 1977 based on chemical modification of DNA and subsequent cleavage at specific bases.<ref name=Maxam77>{{cite journal |author=Maxam AM, Gilbert W |title=A new method for sequencing DNA |journal=Proc. Natl. Acad. Sci. U.S.A. |volume=74 |issue=2 |pages=560–4 |date=February 1977 |pmid=265521 |pmc=392330 |doi=10.1073/pnas.74.2.560 |bibcode = 1977PNAS...74..560M |last2=Gilbert }}</ref>  Also known as chemical sequencing, this method allowed purified samples of double-stranded DNA to be used without further cloning.  This method's use of radioactive labeling and its technical complexity discouraged extensive use after refinements in the Sanger methods had been made.


An alternative to primer labelling is labelling of the chain terminators, a method commonly called 'dye-terminator sequencing'. The major advantage of this method is that the sequencing can be performed in a single reaction, rather than four reactions as in the labelled-primer method. In dye-terminator sequencing, each of the four dideoxynucleotide chain terminators is labelled with a different fluorescent dye, each fluorescing at a different [[wavelength]]. This method is attractive because of its greater expediency and speed and is now the mainstay in automated sequencing with computer-controlled sequence analyzers (see below). Its potential limitations include dye effects due to differences in the incorporation of the dye-labelled chain terminators into the DNA fragment, resulting in unequal peak heights and shapes in the electronic DNA sequence trace [[chromatogram]] after [[capillary electrophoresis]] (see figure to the right). This problem has largely been overcome with the introduction of new DNA polymerase enzyme systems and dyes that minimize incorporation variability, as well as methods for eliminating "dye blobs", caused by certain chemical characteristics of the dyes that can result in artifacts in DNA sequence traces. The dye-terminator sequencing method, along with automated high-throughput DNA sequence analyzers, is now being used for the vast majority of sequencing projects, as it is both easier to perform and lower in cost than most previous sequencing methods.
Maxam-Gilbert sequencing requires radioactive labeling at one 5' end of the DNA and purification of the DNA fragment to be sequenced. Chemical treatment then generates breaks at a small proportion of one or two of the four nucleotide bases in each of four reactions (G, A+G, C, C+T). The concentration of the modifying chemicals is controlled to introduce on average one modification per DNA molecule. Thus a series of labeled fragments is generated, from the radiolabeled end to the first "cut" site in each molecule. The fragments in the four reactions are electrophoresed side by side in denaturing acrylamide gels for size separation. To visualize the fragments, the gel is exposed to X-ray film for autoradiography, yielding a series of dark bands each corresponding to a radiolabeled DNA fragment, from which the sequence may be inferred.<ref name=Maxam77 />
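The way a sequence is inferred from the four lanes (G, A+G, C, C+T) can be sketched in a few lines of Python. This is an illustrative simplification, not a description of any published software: the function name and the input format are hypothetical, and each band is assumed to be already assigned to the position of the cleaved base counted from the labeled end.

```python
def read_maxam_gilbert(lanes, length):
    """Infer a sequence from band positions in the four reaction lanes.

    lanes maps the reaction names "G", "A+G", "C" and "C+T" to sets of
    band positions (1-based, counted from the radiolabeled end).
    """
    sequence = []
    for pos in range(1, length + 1):
        if pos in lanes["G"]:
            base = "G"        # band in G (and normally also in A+G)
        elif pos in lanes["A+G"]:
            base = "A"        # band in A+G only
        elif pos in lanes["C"]:
            base = "C"        # band in C (and normally also in C+T)
        elif pos in lanes["C+T"]:
            base = "T"        # band in C+T only
        else:
            base = "N"        # no band observed: base cannot be called
        sequence.append(base)
    return "".join(sequence)

example_lanes = {"G": {1}, "A+G": {1, 2}, "C": {4}, "C+T": {3, 4}}
print(read_maxam_gilbert(example_lanes, 4))
```

A band that appears in both the G and A+G lanes is read as G, while a band in the A+G lane alone is read as A; the C and C+T lanes are interpreted in the same way.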


=== Automation and sample preparation ===
===Chain-termination Methods===


[[Image:Sanger sequencing read display.gif|thumb|right|View of the start of an example dye-terminator read (click to expand)]]Modern automated DNA sequencing instruments ([[DNA sequencers]]) can sequence up to 384 fluorescently labelled samples in a single batch (run) and perform as many as 24 runs a day. However, automated DNA sequencers carry out only DNA size separation by [[capillary electrophoresis]], detection and recording of dye fluorescence, and data output as fluorescent peak trace [[chromatogram]]s. Sequencing reactions by [[thermocycler|thermocycling]], cleanup and re-suspension in a [[buffer solution]] before loading onto the sequencer are performed separately. In the past, an operator had to trim the low-quality ends (see image on the right) of every sequence manually in order to remove sequencing errors. Today, software such as [http://www.dnabaser.com/download/Chromatogram-viewer/index.html Fast Chromatogram Viewer] can trim the ends automatically in batch.
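Automatic end trimming can be illustrated with a simple threshold rule. The Python sketch below is an assumption for illustration only and is not the algorithm of any particular program: it presumes that each base call carries a numeric quality value and removes bases from both ends until a base at or above the threshold is reached.

```python
def trim_low_quality_ends(bases, qualities, threshold=20):
    """Remove low-quality bases from both ends of a read.

    bases is a string of base calls and qualities a list of per-base
    quality values of the same length. The threshold of 20 is an
    arbitrary illustrative default.
    """
    start = 0
    end = len(bases)
    # Advance past low-quality bases at the start of the read.
    while start < end and qualities[start] < threshold:
        start += 1
    # Step back past low-quality bases at the end of the read.
    while end > start and qualities[end - 1] < threshold:
        end -= 1
    return bases[start:end], qualities[start:end]

trimmed, kept = trim_low_quality_ends("NNACGTNN", [5, 8, 30, 35, 40, 32, 9, 4])
print(trimmed, kept)
```

Low-quality bases in the interior of the read are left untouched by this rule; production tools typically use more elaborate windowed criteria.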
The [[Sanger sequencing|chain-termination method]] developed by [[Frederick Sanger]] and coworkers in 1977 soon became the method of choice, owing to its relative ease and reliability.<ref name="Sanger1977">{{cite journal |author=Sanger F, Nicklen S, Coulson AR |title=DNA sequencing with chain-terminating inhibitors |journal=Proc. Natl. Acad. Sci. U.S.A.|volume=74 |issue=12 |pages=5463–7 |date=December 1977 |pmid=271968 |pmc=431765 |doi=10.1073/pnas.74.12.5463 |bibcode = 1977PNAS...74.5463S |last2=Nicklen |last3=Coulson }}</ref><ref name=Sanger75>{{cite journal |author=Sanger F, Coulson AR |title=A rapid method for determining sequences in DNA by primed synthesis with DNA polymerase |journal=J. Mol. Biol. |volume=94 |issue=3 |pages=441–8 |date=May 1975 |pmid=1100841 |doi=10.1016/0022-2836(75)90213-2 }}</ref> When invented, the chain-terminator method used fewer toxic chemicals and lower amounts of radioactivity than the Maxam and Gilbert method. Because of its comparative ease, the Sanger method was soon automated and was the method used in the first generation of [[DNA sequencer]]s.


== Large-scale sequencing strategies ==
Sanger sequencing is the method which prevailed from the 1980s until the mid-2000s. Over that period, great advances were made in the technique, such as fluorescent labelling, capillary electrophoresis, and general automation. These developments allowed much more efficient sequencing, leading to lower costs. The Sanger method, in mass production form, is the technology which produced the [[Human Genome Project|first human genome]] in 2001, ushering in the age of [[genomics]]. However, later in the decade, radically different approaches reached the market, bringing the cost per genome down from $100 million in 2001 to $10,000 in 2011.<ref>{{cite web |last=Wetterstrand |first=Kris |title=DNA Sequencing Costs: Data from the NHGRI Genome Sequencing Program (GSP) |publisher=[[National Human Genome Research Institute]] |accessdate=30 May 2013 |url=https://www.genome.gov/sequencingcosts }}</ref>


Current methods can directly sequence only relatively short (300-1000 [[nucleotides]] long) DNA fragments in a single reaction [http://www.appliedbiosystems.com/catalog/myab/StoreCatalog/products/CategoryDetails.jsp?hierarchyID=102&category1st=a50&category2nd=a51&category3rd=111907]. The main obstacle to sequencing DNA fragments above this size limit is insufficient power of separation for resolving large DNA fragments that differ in length by only one nucleotide. Limitations on ddNTP incorporation were largely solved by Tabor at Harvard Medical, Carl Fuller at USB Biochemicals, and their coworkers.<ref name='Reeve,Fuller'/>
==Advanced Methods and ''de novo'' Sequencing==


[[Image:DNA Sequencing gDNA libraries.jpg|thumb|left|Genomic DNA is fragmented into random pieces and cloned as a bacterial library. DNA from individual bacterial clones is sequenced and the sequence is assembled by using overlapping regions.(click to expand)]]Large-scale sequencing aims at sequencing very long DNA fragments. Even relatively small bacterial [[genome]]s contain millions of nucleotides, and the [[chromosome 1 (human)|human chromosome 1]] alone contains about 246 million [[nucleotide|bases]]. Therefore, some approaches consist of cutting (with [[restriction enzyme]]s) or shearing (with mechanical forces) large DNA fragments into shorter DNA fragments. The fragmented DNA is [[clone (genetics)|cloned]] into a [[Vector DNA|DNA vector]], usually a bacterial [[plasmid]], and amplified in ''[[Escherichia coli]]''. The amplified DNA can then be purified from the bacterial cells (a disadvantage of bacterial clones for sequencing is that some DNA sequences may be inherently ''un-clonable'' in some or all available bacterial strains, due to deleterious effect of the cloned sequence on the host bacterium or other effects). These short DNA fragments purified from individual bacterial colonies are then individually and completely sequenced and [[sequence assembly|assembled electronically]] into one long, contiguous sequence by identifying 100%-identical overlapping sequences between them ([[shotgun sequencing]]). This method does not require any pre-existing information about the sequence of the DNA and is often referred to as ''de novo'' sequencing. Gaps in the assembled sequence may be filled by [[Primer walking]], often with sub-cloning steps (or [[transposon]]-based sequencing depending on the size of the remaining region to be sequenced). These strategies all involve taking many small ''reads'' of the DNA by one of the above methods and subsequently assembling them into a contiguous sequence. 
The different strategies have different tradeoffs in speed and accuracy; the shotgun method is the most practical for sequencing large genomes, but its assembly process is complex and potentially error-prone - particularly in the presence of [[microsatellites|sequence repeat]]s. Because of this, the assembly of the human genome is not literally complete &mdash; the repetitive sequences of the centromeres, telomeres, and some other parts of chromosomes result in gaps in the genome assembly. Despite having only 93% of the full genome assembled, the [[Human Genome Project]] was declared complete because their definition of human genome sequencing was limited to euchromatic sequence (99% complete at the time), excluding these intractable repetitive regions.<ref>{{cite journal | author = International Human Genome Sequencing Consortium| title=Finishing the euchromatic sequence of the human genome.| journal=Nature|volume=431 |issue=7011| pages=931-45 |year = 2004 |id=PMID 15496913}} [http://www.nature.com/nature/journal/v431/n7011/full/nature03001.html paper available online]</ref>
Large-scale sequencing often aims at sequencing very long DNA pieces, such as whole [[chromosome]]s, although large-scale sequencing can also be used to generate very large numbers of short sequences, such as found in [[phage display]]. For longer targets such as chromosomes, common approaches consist of cutting (with [[restriction enzyme]]s) or shearing (with mechanical forces) large DNA fragments into shorter DNA fragments. The fragmented DNA may then be [[clone (genetics)|cloned]] into a [[Vector DNA|DNA vector]] and amplified in a bacterial host such as ''[[Escherichia coli]]''. Short DNA fragments purified from individual bacterial colonies are individually sequenced and [[sequence assembly|assembled electronically]] into one long, contiguous sequence. Studies have shown that adding a size selection step to collect DNA fragments of uniform size can improve sequencing efficiency and accuracy of the genome assembly. In these studies, automated sizing has proven to be more reproducible and precise than manual gel sizing.<ref>http://onlinelibrary.wiley.com/doi/10.1002/elps.201200128/abstract</ref><ref>http://onlinelibrary.wiley.com/doi/10.1111/j.1462-2920.2012.02791.x/abstract;jsessionid=C705EAD430A7C16FE74774C8B4B6F814.f02t01</ref><ref>http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0037135</ref>


[[Image:Sequencing workflow.jpg|thumb|right|Resequencing steps. Sample prep: Extraction of nucleic acid. Template prep: Amplification and preparation of a small region of the target region. Sequencing steps. (click to expand)]]
The term "''de novo'' sequencing" specifically refers to methods used to determine the sequence of DNA with no previously known sequence. ''De novo'' translates from Latin as "from the beginning". Gaps in the assembled sequence may be filled by [[primer walking]]. The different strategies have different tradeoffs in speed and accuracy; [[shotgun sequencing|shotgun methods]] are often used for sequencing large genomes, but their assembly is complex and difficult, particularly because [[Microsatellite (genetics)|sequence repeat]]s often cause gaps in the genome assembly.


The human genome is about 3 billion (3,000,000,000) bp long;<ref>[http://www.ornl.gov/sci/techresources/Human_Genome/home.shtml Human Genome Project Information<!-- Bot generated title -->]</ref> if the average fragment length is 500 bases, it would take a minimum of six million fragments (3 billion/500) to sequence the human genome (not allowing for overlap, i.e. 1-fold coverage). Keeping track of such a high number of sequences presents significant challenges, which can be met only by developing and coordinating several procedural and computational [[algorithm]]s, such as efficient database development and management.
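The arithmetic behind this estimate can be written out as a short Python sketch. The function name and the <code>coverage</code> parameter are hypothetical; the 8-fold figure in the second call is only an example value and is not taken from the text.

```python
import math

def reads_needed(genome_length, read_length, coverage=1.0):
    """Minimum number of reads of a given length for a target fold coverage."""
    return math.ceil(genome_length * coverage / read_length)

# 3 billion bp genome, 500-base reads, no overlap (1-fold coverage).
print(reads_needed(3_000_000_000, 500))

# The same genome at an illustrative 8-fold coverage.
print(reads_needed(3_000_000_000, 500, coverage=8))
```

At 1-fold coverage the result is 6,000,000 reads, matching the figure above; the number scales linearly with the desired coverage.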
Most sequencing approaches use an ''in vitro'' cloning step to amplify individual DNA molecules, because their molecular detection methods are not sensitive enough for single molecule sequencing. Emulsion PCR<ref name=Williams2006ePCR>{{cite journal |author=Richard Williams, Sergio G Peisajovich, Oliver J Miller, Shlomo Magdassi, Dan S Tawfik, Andrew D Griffiths |title=Amplification of complex gene libraries by emulsion PCR |journal=Nature methods |volume=3 |pages=545–550|year=2006|issue=7 |doi=10.1038/nmeth896 |pmid=16791213}}</ref> isolates individual DNA molecules along with primer-coated beads in aqueous droplets within an oil phase. A [[polymerase chain reaction]] (PCR) then coats each bead with clonal copies of the DNA molecule followed by immobilization for later sequencing. Emulsion PCR is used in the methods developed by Margulies et al. (commercialized by [[454 Life Sciences]]), Shendure and Porreca et al. (also known as "[[Polony (biology)|Polony sequencing]]") and [[ABI Solid Sequencing|SOLiD sequencing]] (developed by [[Agencourt]], later [[Applied Biosystems]], now [[Life Technologies (Thermo Fisher Scientific)|Life Technologies]]).<ref name=Margulies>{{cite journal|author=Margulies M |title=Genome Sequencing in Open Microfabricated High Density Picoliter Reactors |journal=Nature |volume=437 |issue=7057 |pages=376–80 |date=September 2005 |pmid=16056220 |pmc=1464427 |doi=10.1038/nature03959 |bibcode = 2005Natur.437..376M |author-separator=, |author2=Egholm M |author3=Altman WE|display-authors=3 |last4=Attiya |first4=Said |last5=Bader |first5=Joel S. |last6=Bemben |first6=Lisa A. |last7=Berka |first7=Jan |last8=Braverman |first8=Michael S.|last9=Chen |first9=Yi-Ju |last10=Chen |first10=Zhoutao |last11=Dewell |first11=Scott B. |last12=Du |first12=Lei |last13=Fierro |first13=Joseph M. |last14=Gomes |first14=Xavier V. |last15=Godwin |first15=Brian C. 
|last16=He |first16=Wen |last17=Helgesen |first17=Scott |last18=Ho |first18=Chun He |last19=Irzyk |first19=Gerard P. |last20=Jando |first20=Szilveszter C. |last21=Alenquer |first21=Maria L. I. |last22=Jarvie |first22=Thomas P. |last23=Jirage |first23=Kshama B. |last24=Kim |first24=Jong-Bum |last25=Knight |first25=James R. |last26=Lanza |first26=Janna R. |last27=Leamon |first27=John H. |last28=Lefkowitz |first28=Steven M. |last29=Lei |first29=Ming |last30=Li |first30=Jing }}</ref><ref name=polony_sequencing>{{Cite journal
| doi = 10.1126/science.1117389 | title = Accurate Multiplex Polony Sequencing of an Evolved Bacterial Genome | year = 2005 | author = Shendure, J. | journal = Science
| volume = 309 | pmid = 16081699
| last2 = Porreca
| first2 = GJ
| last3 = Reppas
| first3 = NB
| last4 = Lin
| first4 = X
| last5 = McCutcheon
| first5 = JP
| last6 = Rosenbaum
| first6 = AM
| last7 = Wang
| first7 = MD
| last8 = Zhang
| first8 = K
| last9 = Mitra
| first9 = RD
| last10 = Church | first10 = G. M. | issue = 5741 | bibcode=2005Sci...309.1728S
| pages = 1728–32| display-authors = 8 }}</ref><ref name=solid_sequencing>[http://solid.appliedbiosystems.com/ Applied Biosystems' SOLiD technology]</ref>


''Resequencing'' or ''targeted sequencing'' is used to detect changes in a DNA sequence relative to a "reference" sequence. It is often performed using PCR to amplify the region of interest (pre-existing DNA sequence is required to design the PCR primers). Resequencing involves three steps: extraction of DNA or RNA from biological tissue; amplification of the RNA or DNA (often by PCR); and sequencing. The resulting sequence is compared to a reference or a normal sample to detect mutations.
===Shotgun Sequencing===


==New sequencing methods==
Shotgun sequencing is a sequencing method designed for analysis of DNA sequences longer than 1000 base pairs, up to and including entire chromosomes.  This method requires the target DNA to be broken into random fragments.  After sequencing individual fragments, the sequences can be reassembled on the basis of their overlapping regions.<ref>{{cite journal|last=Staden|first=R|title=A strategy of DNA sequencing employing computer programs.|journal=Nucleic Acids Research|date=Jun 11, 1979|volume=6|issue=7|pages=2601–10|pmid=461197|doi=10.1093/nar/6.7.2601|pmc=327874}}</ref>
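Reassembly on the basis of overlapping regions can be illustrated with a toy greedy merger in Python. This is a deliberately simplified sketch under stated assumptions (error-free reads, a single strand, no repeats, no read fully contained in another); it is not the algorithm of any particular assembler, and the function names are hypothetical.

```python
def overlap(a, b, min_length=3):
    """Length of the longest suffix of a that equals a prefix of b."""
    max_len = min(len(a), len(b))
    for length in range(max_len, min_length - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0

def greedy_assemble(reads, min_length=3):
    """Repeatedly merge the pair of reads with the longest overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best = (0, None, None)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b, min_length)
                    if olen > best[0]:
                        best = (olen, i, j)
        olen, i, j = best
        if olen == 0:
            break  # no remaining overlaps: leave separate contigs
        merged = reads[i] + reads[j][olen:]
        reads = [r for k, r in enumerate(reads) if k not in (i, j)] + [merged]
    return reads

print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTGA"]))
```

With the three example fragments, which overlap by four bases each, the sketch returns a single contig, <code>ATGGCGTACCTGA</code>. Real genomes contain repeats and sequencing errors, which is why practical assembly is considerably harder than this example suggests.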
===High-throughput sequencing===
The high demand for low-cost sequencing has given rise to a number of high-throughput sequencing technologies.<ref>{{cite journal| title=Advanced sequencing technologies and their wider impact in microbiology| author=Neil Hall| journal=The Journal of Experimental Biology| volume=209| pages=1518-1525| year=2007}}</ref><ref>{{cite journal| title=Genomes for ALL| author=G.M. Church| journal=Scientific American| year=2006| volume=294| issue=1| pages=47-54| pmid=16468433}}</ref> These efforts have been funded by public and private institutions as well as privately researched and commercialized by biotechnology companies. High-throughput sequencing technologies are intended to lower the cost of sequencing DNA libraries beyond what is possible with the current dye-terminator method based on DNA separation by capillary electrophoresis. Many of the new high-throughput technologies parallelize the sequencing process, producing thousands or millions of sequences at once.


;In vitro clonal amplification
===Bridge PCR===
As molecular detection methods are often not sensitive enough for single molecule sequencing, most approaches use an ''in vitro'' cloning step to generate many copies of each individual molecule. [[Emulsion PCR]] is one method, isolating individual DNA molecules along with primer-coated beads in aqueous bubbles within an oil phase. A [[polymerase chain reaction]] (PCR) then coats each bead with clonal copies of the isolated library molecule and these beads are subsequently immobilized for later sequencing. Emulsion PCR is used in the methods published by Margulies et al. (commercialized by [[454 Life Sciences]], recently acquired by Roche) and Shendure and Porreca et al. (also known as "polony sequencing", commercialized by Agencourt and recently acquired by [[Applied Biosystems]]).<ref name=Margulies>{{cite journal| title=Genome sequencing in microfabricated high-density picolitre reactors| author=M. Margulies, et al.| journal=Nature| volume=437| pages=376-380| year=2005}}</ref><ref name=polony_sequencing>{{cite journal| title=Accurate Multiplex Polony Sequencing of an Evolved Bacterial Genome| author=J. Shendure, G.J. Porreca, N.B. Reppas, X. Lin, J.P. McCutcheon, A.M. Rosenbaum, M.D. Wang, K. Zhang, R.D. Mitra and G.M. Church| journal=Science| volume=309| issue=5741| pages=1728-1732}}</ref> Another method for ''in vitro'' clonal amplification is "bridge PCR", where fragments are amplified upon primers attached to a solid surface, developed and used by Solexa (now owned by [[Illumina (company)|Illumina]]). Both methods produce many physically isolated locations, each containing many copies of a single fragment. The single-molecule method developed by Stephen Quake's laboratory (later commercialized by Helicos) skips this amplification step, directly fixing DNA molecules to a surface.<ref>{{cite journal| author=Braslavsky, I., Hebert, H., Kartalov, E. 
and Quake, S.R.| year=2003| title=Sequence information can be obtained from single DNA molecules| journal=Proceedings of the National Academy of Sciences of the United States of America| volume=100| pages=3960–3964}}[http://www.pnas.org/cgi/content/abstract/100/7/3960 full text available online]</ref>


;Parallelized sequencing
Another method for ''[[in vitro]]'' clonal amplification is bridge PCR, in which fragments are amplified upon primers attached to a solid surface<ref name=DNA_colony_patents /><ref name=DNA_colony_presentation>P. Mayer,L. Farinelli, G. Matton, C. Adessi, G. Turcatti, J. J. Mermod, E. Kawashima.[http://www.slideshare.net/pascalmayer/dna-colony-massively-parrallel-sequencing-ams98-presentation DNA colony massively parallel sequencing ams98 presentation]</ref><ref name=Mosaic_patent>{{US patent|5641658}}</ref> and form "[[#subsection Illumina (Solexa) sequencing|DNA colonies]]" or "DNA clusters". This method is used in the [[Illumina (company)|Illumina]] Genome Analyzer [[#subsection Illumina (Solexa) sequencing|sequencers]]. Single-molecule methods, such as that developed by [[Stephen Quake]]'s laboratory (later commercialized by [[Helicos Biosciences|Helicos]]) are an exception:  they use bright fluorophores and laser excitation to detect base addition events from individual DNA molecules fixed to a surface, eliminating the need for molecular amplification.<ref>{{cite journal |author=Braslavsky I, Hebert B, Kartalov E, Quake SR |title=Sequence information can be obtained from single DNA molecules |journal=Proc. Natl. Acad. Sci. U.S.A. |volume=100 |issue=7 |pages=3960–4 |date=April 2003 |pmid=12651960 |pmc=153030|doi=10.1073/pnas.0230489100 |bibcode = 2003PNAS..100.3960B |last2=Hebert |last3=Kartalov |last4=Quake }}</ref>
Once clonal DNA sequences are physically localized to separate positions on a surface, various sequencing approaches may be used to determine the DNA sequences of all locations, in parallel. "Sequencing by synthesis", like the popular dye-termination electrophoretic sequencing, uses the process of DNA synthesis by [[DNA polymerase]] to identify the bases present in the complementary DNA molecule. Reversible terminator methods (used by Illumina and Helicos) use reversible versions of dye-terminators, adding one nucleotide at a time, detecting fluorescence corresponding to that position, then removing the blocking group to allow the polymerization of another nucleotide. [[Pyrosequencing]] (used by 454) also uses DNA polymerization to add nucleotides, adding one type of nucleotide at a time, then detecting and quantifying the number of nucleotides added to a given location through the light emitted by the release of attached pyrophosphates.<ref name=Margulies /><ref>{{cite journal| title=Real-time DNA sequencing using detection of pyrophosphate release| author=M. Ronaghi, S. Karamohamed, B. Pettersson, M. Uhlen, and P. Nyren| journal=Analytical Biochemistry| volume=242| pages=84-89| year=1996}}</ref>  
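The principle of adding one type of nucleotide at a time and quantifying how many were incorporated can be sketched as a small Python simulation. This is an idealized illustration only: the function names are hypothetical, the cyclic flow order <code>TACG</code> is an assumption, and the signal is taken to be exactly proportional to the number of bases incorporated, which real instruments only approximate.

```python
def simulate_flowgram(synthesized, flow_order="TACG", max_flows=1000):
    """Simulate idealized flow signals for a strand being synthesized.

    For each nucleotide flow, the signal is the number of consecutive
    bases of that type incorporated (0 if the next base differs).
    """
    signals = []
    pos = 0
    flows = 0
    while pos < len(synthesized) and flows < max_flows:
        nucleotide = flow_order[flows % len(flow_order)]
        count = 0
        while pos < len(synthesized) and synthesized[pos] == nucleotide:
            count += 1
            pos += 1
        signals.append((nucleotide, count))
        flows += 1
    return signals

def decode_flowgram(signals):
    """Reconstruct the synthesized sequence from (nucleotide, count) signals."""
    return "".join(nucleotide * count for nucleotide, count in signals)

flows = simulate_flowgram("TTAGGC")
print(flows)
print(decode_flowgram(flows))
```

A run of identical bases, such as the two T's at the start of the example, is incorporated in a single flow and must be read from the signal intensity rather than from the number of flows.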


"[[Sequencing by ligation]]" is another enzymatic method of sequencing, using a [[DNA ligase]] enzyme rather than polymerase to identify the target sequence.<ref name=polony_sequencing /><ref>http://solid.appliedbiosystems.com/ - Applied Biosystems' SOLiD technology</ref> Used in the polony method (and in the SOLiD technology offered by Applied Biosystems), this method uses a pool of random oligonucleotides labeled according to the sequenced position. Oligonucleotides are annealed and ligated; the preferential ligation by DNA ligase for matching sequences results in a signal corresponding to the complementary sequence at that position.
==Next-Generation Methods==


===Other sequencing technologies===
The high demand for low-cost sequencing has driven the development of high-throughput sequencing (or next-generation sequencing) technologies that [[multiplex (assay)|parallelize]] the sequencing process, producing thousands or millions of sequences concurrently.<ref name=hall2007>{{cite journal
Other methods of DNA sequencing may have advantages in terms of efficiency or accuracy. Like traditional dye-terminator sequencing, they are limited to sequencing single isolated DNA fragments. "[[Sequencing by hybridization]]" is a non-enzymatic method that uses a [[DNA microarray]]. In this method, a single pool of unknown DNA is fluorescently labeled and hybridized to an array of known sequences. If the unknown DNA hybridizes strongly to a given spot on the array, causing it to "light up", then that sequence is inferred to exist within the unknown DNA being sequenced.<ref>{{cite journal| title=Comparison of sequencing by hybridization and cycle sequencing for genotyping of human immunodeficiency virus type 1 reverse transcriptase| author=G.J. Hanna, V.A. Johnson, D.R. Kuritzkes, D.D. Richman, J. Martinez-Picado, L. Sutton, J.D. Hazelwood, R.T. D'Aquila| journal=Journal of Clinical Microbiology| year=2000| volume=38| issue=7| pages=2715-2721| pmid=10878069}}</ref> [[Mass spectrometry]] can also be used to sequence DNA molecules; conventional chain-termination reactions produce DNA molecules of different lengths and the length of these fragments is then determined by the mass differences between them (rather than using gel separation).<ref>{{cite journal| title=Mass-spectrometry DNA sequencing| author= J.R. Edwards, H. Ruparel, and J. Ju| journal=Mutation Research| volume=573| issue=1-2| pages=3-12| year=2005}}</ref>
|last=Hall |first=Neil |title=Advanced sequencing technologies and their wider impact in microbiology |journal=[[J. Exp. Biol.]] |volume=209 |issue=Pt 9 |pages=1518&ndash;1525 |date=May 2007 |pmid=17449817 |doi=10.1242/jeb.001370 }}{{open access}}</ref><ref name=church2006>{{cite journal
|last1=Church |first1=George M. |authorlink1=George M. Church |title=Genomes for all |journal=[[Sci. Am.]] |volume=294 |issue=1 |pages=46&ndash;54 |date=January 2006 |pmid=16468433 |doi=10.1038/scientificamerican0106-46 }}{{subscription required}}</ref> High-throughput sequencing technologies are intended to lower the cost of DNA sequencing beyond what is possible with standard dye-terminator methods.<ref name=pmid18165802/>  In ultra-high-throughput sequencing as many as 500,000 sequencing-by-synthesis operations may be run in parallel.<ref name=kalb1992>{{cite book
|title=Massively Parallel, Optical, and Neural Computing in the United States |first1=Gilbert |last1=Kalb |first2=Robert |last2=Moxley |publisher=[[IOS Press]] |year=1992 |isbn=90-5199-097-9 }}{{Page needed|date=June 2013}}</ref><ref name=tenBosch2008>{{Cite pmid
|18832462}}{{open access}}</ref><ref name=Tucker2009>{{Cite pmid
|19679224}}{{open access}}</ref>


There are new proposals for DNA sequencing, which are in development, but remain to be proven. These include labeling the DNA polymerase,<ref>[http://visigenbio.com/technology_overview.html VisiGen Biotechnologies Inc. - Technology Overview<!-- Bot generated title -->]</ref> reading the sequence as a DNA strand transits through [[nanopore sequencing|nanopores]],<ref>[http://mcb.harvard.edu/branton/index.htm The Harvard Nanopore Group<!-- Bot generated title -->]</ref> and microscopy-based techniques, such as [[Atomic force microscope|AFM]] or [[Electron Microscope|electron microscopy]] that are used to identify the positions of individual nucleotides within long DNA fragments by nucleotide labeling with heavier elements (e.g., halogens) for visual detection and recording.<ref> USPTO application # 20060029957 assigned to ZS genetics http://www.freepatentsonline.com/20060029957.html</ref> In October 2006 the [[NIH]] issued a news release describing novel sequencing techniques and announcing several grant awards.<ref>[http://www.genome.gov/19518500 NHGRI Aims to Make DNA Sequencing Faster, More Cost Effective], NIH News Release, 4 October 2006</ref>
{| class="wikitable" border="1"
|+ Comparison of next-generation sequencing methods<ref name=quail2012>{{cite journal |last1=Quail |first1=Michael |last2=Smith |first2=Miriam E |last3=Coupland |first3=Paul |last4=Otto |first4=Thomas D |last5=Harris |first5=Simon R |last6=Connor |first6=Thomas R |last7=Bertoni |first7=Anna |last8=Swerdlow |first8=Harold P |last9=Gu |first9=Yong |display-authors=3 |title=A tale of three next generation sequencing platforms: comparison of Ion torrent, pacific biosciences and illumina MiSeq sequencers |journal=[[BMC Genomics]] |date=1 January 2012 |volume=13 |issue=1 |pages=341 |doi=10.1186/1471-2164-13-341 |pmid=22827831 |pmc=3431227 }}{{open access}}</ref><ref name=lin2012>{{cite journal |last1=Liu |first1=Lin |last2=Li |first2=Yinhu |last3=Li |first3=Siliang |last4=Hu |first4=Ni |last5=He |first5=Yimin |last6=Pong |first6=Ray |last7=Lin |first7=Danni |last8=Lu |first8=Lihua |last9=Law |first9=Maggie |display-authors=3 |title=Comparison of Next-Generation Sequencing Systems |journal=Journal of Biomedicine and Biotechnology |publisher=[[Hindawi Publishing Corporation]] |date=1 January 2012 |volume=2012 |pages=1&ndash;11 |doi=10.1155/2012/251364 }}{{open access}}</ref>
! Method !! Single-molecule real-time sequencing (Pacific Bio) !! Ion semiconductor (Ion Torrent sequencing) !!Pyrosequencing (454) !! Sequencing by synthesis (Illumina) !! Sequencing by ligation (SOLiD sequencing) !! Chain termination (Sanger sequencing)
|-
| '''Read length''' ||5,500 bp to 8,500 bp avg (10,000 bp [[N50 statistic|N50]]); maximum read length >30,000 bases<ref>[http://www.genomeweb.com/sequencing/new-products-pacbios-rs-ii-cufflinks New Products: PacBio's RS II; Cufflinks | In Sequence | Sequencing | GenomeWeb<!-- Bot generated title -->]</ref><ref name=autogenerated1>{{cite web |url=http://www.genomeweb.com/sequencing/after-year-testing-two-early-pacbio-customers-expect-more-routine-use-rs-sequenc |title=After a Year of Testing, Two Early PacBio Customers Expect More Routine Use of RS Sequencer in 2012 |author=<!--Staff writer(s); no by-line.--> |date=10 January 2012 |publisher=[[GenomeWeb]] }}{{registration required}}</ref><ref>[http://globenewswire.com/news-release/2013/10/03/577891/10051072/en/Pacific-Biosciences-Introduces-New-Chemistry-With-Longer-Read-Lengths-to-Detect-Novel-Features-in-DNA-Sequence-and-Advance-Genome-Studies-of-Large-Organisms.html Pacific Biosciences Introduces New Chemistry With Longer Read Lengths]</ref>|| up to 400 bp|| 700 bp || 50 to 300 bp ||50+35 or 50+50 bp || 400 to 900 bp
|-
| '''Accuracy''' ||99.999% consensus accuracy; 87% single-read accuracy<ref>http://www.nature.com/nmeth/journal/v10/n6/full/nmeth.2474.html</ref>|| 98%|| 99.9% || 98% || 99.9% || 99.9%


In October 2006, the [[X Prize Foundation]] established the [[Archon X Prize]], intending to award $10 million to "the first Team that can build a device and use it to sequence 100 human genomes within 10 days or less, with an accuracy of no more than one error in every 100,000 bases sequenced, with sequences accurately covering at least 98% of the genome, and at a recurring cost of no more than $10,000 (US) per genome."<ref>[http://genomics.xprize.org/axp/prize_overview.html "PRIZE Overview: Archon X PRIZE for Genomics"]</ref>
|-
| '''Reads per run'''||50,000 per SMRT cell, or ~400 megabases<ref name="flxlexblog.wordpress.com">[http://flxlexblog.wordpress.com/2013/07/05/de-novo-bacterial-genome-assembly-a-solved-problem/ De novo bacterial genome assembly: a solved problem? | In between lines of code<!-- Bot generated title -->]</ref><ref name=rasko2011>{{cite journal |last1=Rasko |first1=David A. |last2=Webster |first2=Dale R. |last3=Sahl |first3=Jason W. |last4=Bashir |first4=Ali |last5=Boisen |first5=Nadia |last6=Scheutz |first6=Flemming |last7=Paxinos |first7=Ellen E. |last8=Sebra |first8=Robert |last9=Chin |first9=Chen-Shan |last10=Iliopoulos |first10=Dimitris |last11=Klammer |first11=Aaron |last12=Peluso |first12=Paul |last13=Lee |first13=Lawrence |last14=Kislyuk |first14=Andrey O. |last15=Bullard |first15=James |last16=Kasarskis |first16=Andrew |last17=Wang |first17=Susanna |last18=Eid |first18=John |last19=Rank |first19=David |last20=Redman |first20=Julia C. |last21=Steyert |first21=Susan R. |last22=Frimodt-Møller |first22=Jakob |last23=Struve |first23=Carsten |last24=Petersen |first24=Andreas M. |last25=Krogfelt |first25=Karen A. |last26=Nataro |first26=James P. |last27=Schadt |first27=Eric E. |last28=Waldor |first28=Matthew K. |display-authors=3 |title=Origins of the Strain Causing an Outbreak of Hemolytic–Uremic Syndrome in Germany |journal=[[N Engl J Med]] |date=25 August 2011 |volume=365 |issue=8 |pages=709&ndash;717 |doi=10.1056/NEJMoa1106920 }}{{open access}}</ref> || up to 80 million|| 1 million || up to 3 billion || 1.2 to 1.4 billion || N/A


== Major landmarks in DNA sequencing ==
|-
| '''Time per run''' ||30 minutes to 2 hours<ref name=tran2012>{{cite journal |last1=Tran |first1=Ben |last2=Brown |first2=Andrew M.K. |last3=Bedard |first3=Philippe L. |last4=Winquist |first4=Eric |last5=Goss |first5=Glenwood D. |last6=Hotte |first6=Sebastien J. |last7=Welch |first7=Stephen A. |last8=Hirte |first8=Hal W. |last9=Zhang |first9=Tong |last10=Stein |first10=Lincoln D. |authorlink10=Lincoln Stein |last11=Ferretti |first11=Vincent |last12=Watt |first12=Stuart |last13=Jiao |first13=Wei |last14=Ng |first14=Karen |last15=Ghai |first15=Sangeet |last16=Shaw |first16=Patricia |last17=Petrocelli |first17=Teresa |last18=Hudson |first18=Thomas J. |authorlink18=Thomas J. Hudson |last19=Neel |first19=Benjamin G. |last20=Onetto |first20=Nicole |last21=Siu |first21=Lillian L. |last22=McPherson |first22=John D. |last23=Kamel-Reid |first23=Suzanne |last24=Dancey |first24=Janet E. |display-authors=19 |title=Feasibility of real time next generation sequencing of cancer genes linked to drug response: Results from a clinical trial |journal=[[Int. J. Cancer]] |date=1 January 2012 |pages=1547&ndash;1555 |doi=10.1002/ijc.27817 }}{{subscription required}}</ref>|| 2 hours || 24 hours || 1 to 10 days, depending upon sequencer and specified read length<ref name=vliet2010>{{cite journal |last=van Vliet |first=Arnoud H.M. |title=Next generation sequencing of microbial transcriptomes: challenges and opportunities |journal=[[FEMS Microbiology Letters]] |date=1 January 2010 |volume=302 |issue=1 |pages=1&ndash;7 |doi=10.1111/j.1574-6968.2009.01767.x }}{{open access}}</ref>||1 to 2 weeks || 20 minutes to 3 hours
|-
| '''Cost per 1 million bases (in US$)''' ||$0.33-$1.00|| $1 || $10 || $0.05 to $0.15 || $0.13 || $2400
|-
| '''Advantages''' ||Longest read length. Fast. Detects 4mC, 5mC, 6mA.<ref>{{cite journal|last=Murray|first=I. A.|coauthors=Clark, T. A.; Morgan, R. D.; Boitano, M.; Anton, B. P.; Luong, K.; Fomenkov, A.; Turner, S. W.; Korlach, J.; Roberts, R. J.|title=The methylomes of six bacteria|journal=Nucleic Acids Research|date=2 October 2012|doi=10.1093/nar/gks891|pmid=23034806|pmc=3526280|volume=40|issue=22|pages=11450–62}}</ref> ||Less expensive equipment. Fast.|| Long read size. Fast. ||Potential for high sequence yield, depending upon sequencer model and desired application.||Low cost per base. || Long individual reads. Useful for many applications.


*[[1953 in science|1953]] Discovery of the structure of the [[DNA double helix]].
|-
| '''Disadvantages''' ||Moderate throughput. Equipment can be very expensive.||Homopolymer errors.|| Runs are expensive. Homopolymer errors. ||Equipment can be very expensive. Requires high concentrations of DNA. ||Slower than other methods. Has difficulty sequencing palindromic sequences.<ref name="Yu-Feng Huang, Sheng-Chung Chen, Yih-Shien Chiang, Tzu-Han Chen & Kuo-Ping Chiu 2012 S10">{{Cite journal
| author = [[Yu-Feng Huang]], [[Sheng-Chung Chen]], [[Yih-Shien Chiang]], [[Tzu-Han Chen]] & [[Kuo-Ping Chiu]]
| title = Palindromic sequence impedes sequencing-by-ligation mechanism
| journal = [[BMC systems biology]]
| volume = 6 Suppl 2
| pages = S10
| year = 2012
| doi = 10.1186/1752-0509-6-S2-S10
| pmid = 23281822
}}</ref>|| More expensive and impractical for larger sequencing projects.
|}


*[[1972 in science|1972]] Development of [[recombinant DNA]] technology, which permits isolation of defined fragments of DNA; prior to this, the only accessible samples for sequencing were from bacteriophage or virus DNA.
===Massively Parallel Signature Sequencing (MPSS)===


*[[1977 in science|1977]] The first complete DNA genome to be sequenced is that of [[bacteriophage φX174]].
The first of the next-generation sequencing technologies, [[massively parallel signature sequencing]] (or MPSS), was developed in the 1990s at Lynx Therapeutics, a company founded in 1992 by [[Sydney Brenner]] and [[Applied Biosystems#History|Sam Eletr]]. MPSS was a bead-based method that used a complex approach of adapter ligation followed by adapter decoding, reading the sequence in increments of four nucleotides. This method made it susceptible to sequence-specific bias or loss of specific sequences. Because the technology was so complex, MPSS was only performed 'in-house' by Lynx Therapeutics and no DNA sequencing machines were sold to independent laboratories. Lynx Therapeutics merged with Solexa (later acquired by [[Illumina (company)|Illumina]]) in 2004, leading to the development of sequencing-by-synthesis, a simpler approach acquired from [[Manteia Predictive Medicine]], which rendered MPSS obsolete. However, the essential properties of the MPSS output were typical of later "next-generation" data types, including hundreds of thousands of short DNA sequences. In the case of MPSS, these were typically used for sequencing [[cDNA]] for measurements of [[gene expression]] levels.<ref>{{cite journal
| title=Gene expression analysis by massively parallel signature sequencing (MPSS) on microbead arrays
| first9=M
| last9=Foy
| first8=S
| last8=McCurdy
| first7=S
| last7=Luo
| first6=D
| last6=Johnson
| first5=DH
| last5=Lloyd
| first4=G
| last4=Golda
| first3=J
| last3=Bridgham
| first2=M
| last2=Johnson
| first=Sydney
| last=Brenner
| year=2000
| publisher=[[Nature Biotechnology]]
| volume=18
| pages=630–634
| doi=10.1038/76469
| pmid=10835600
| issue=6
| journal=Nature Biotechnology
}}</ref>


*[[1977 in science|1977]] [[Allan Maxam]] and [[Walter Gilbert]] publish "DNA sequencing by chemical degradation" [http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pubmed&pubmedid=265521]. [[Fred Sanger]], independently, publishes "DNA sequencing by enzymatic synthesis".
===Polony Sequencing===


*[[1980 in science|1980]] Fred Sanger and Wally Gilbert receive the [[Nobel Prize in Chemistry]]
The [[Polony sequencing]] method, developed in the laboratory of [[George M. Church]] at Harvard, was among the first next-generation sequencing systems and was used to sequence a full genome in 2005.  It combined an in vitro paired-tag library with emulsion PCR, an automated microscope, and ligation-based sequencing chemistry to sequence an ''E. coli'' genome at an accuracy of >99.9999% and a cost approximately 1/9 that of Sanger sequencing.<ref>{{cite journal|last=Shendure|first=J|coauthors=Porreca, GJ; Reppas, NB; Lin, X; McCutcheon, JP; Rosenbaum, AM; Wang, MD; Zhang, K; Mitra, RD; Church, GM|title=Accurate multiplex polony sequencing of an evolved bacterial genome.|journal=Science|date=Sep 9, 2005|volume=309|issue=5741|pages=1728–32|pmid=16081699|doi=10.1126/science.1117389|bibcode = 2005Sci...309.1728S }}</ref>  The technology was licensed to Agencourt Biosciences, subsequently spun out into Agencourt Personal Genomics, and eventually incorporated into the Applied Biosystems SOLiD platform, which is now owned by [[Life Technologies (Thermo Fisher Scientific)|Life Technologies]], which was recently bought by Thermo Fisher Scientific.


*[[1982 in science|1982]] [[Genbank]] starts as a public repository of DNA sequences.
===454 Pyrosequencing===
**[[Andre Marion]] and [[Sam Eletr]] from [[Hewlett Packard]] start [[Applied Biosystems]] in May, which comes to dominate automated sequencing.
**[[Akiyoshi Wada]] proposes [[automated sequencing]] and gets support to build robots with help from [[Hitachi]].


*[[1984 in science|1984]] [[Medical Research Council (UK)|Medical Research Council]] scientists decipher the complete DNA sequence of the [[Epstein-Barr virus]], 170 kb.
A parallelized version of [[pyrosequencing]] was developed by [[454 Life Sciences]], which has since been acquired by [[Roche Diagnostics]]. The method amplifies DNA inside water droplets in an oil solution (emulsion PCR), with each droplet containing a single DNA template attached to a single primer-coated bead that then forms a clonal colony. The sequencing machine contains many [[picoliter]]-volume wells each containing a single bead and sequencing enzymes. Pyrosequencing uses [[luciferase]] to generate light for detection of the individual nucleotides added to the nascent DNA, and the combined data are used to generate sequence read-outs.<ref name="Margulies" /> This technology provides intermediate read length and price per base compared to Sanger sequencing on one end and Solexa and SOLiD on the other.<ref name="pmid18165802"/>


*[[1985 in science|1985]] [[Kary Mullis]] and colleagues develop the [[polymerase chain reaction]], a technique to replicate small fragments of DNA
===Illumina (Solexa) Sequencing===


*[[1986 in science|1986]] [[Leroy E. Hood]]'s laboratory at the [[California Institute of Technology]] and Smith announce the first semi-automated DNA sequencing machine.
[[Solexa]], now part of [[Illumina (company)|Illumina]], was founded by Shankar Balasubramanian and David Klenerman in 1998, and developed a sequencing method based on reversible dye-terminator technology and engineered polymerases.<ref>{{cite pmid|18987734}}</ref> The terminator chemistry was developed internally at Solexa and the concept of the Solexa system was invented by Balasubramanian and Klenerman from Cambridge University's chemistry department. In 2004, Solexa acquired the company [[Manteia Predictive Medicine]] in order to gain a massively parallel sequencing technology based on "DNA Clusters", which involves the clonal amplification of DNA on a surface. The cluster technology was co-acquired with Lynx Therapeutics of California. Solexa Ltd. later merged with Lynx to form Solexa Inc.


*[[1987 in science|1987]] Applied Biosystems markets first automated sequencing machine, the model ABI 370.
In this method, DNA molecules and primers are first attached on a slide and amplified with [[polymerase]] so that local clonal DNA colonies, later coined "DNA clusters", are formed. To determine the sequence, four types of reversible terminator bases (RT-bases) are added and non-incorporated nucleotides are washed away. A camera takes images of the [[Fluorescent labeling|fluorescently labeled]] nucleotides, then the dye, along with the terminal 3' blocker, is chemically removed from the DNA, allowing for the next cycle to begin. Unlike pyrosequencing, the DNA chains are extended one nucleotide at a time and image acquisition can be performed at a delayed moment, allowing for very large arrays of DNA colonies to be captured by sequential images taken from a single camera.
**Walter Gilbert leaves the [[U.S. National Research Council]] genome panel to start [[Genome Corp.]], with the goal of sequencing and commercializing the data.


*[[1990 in science|1990]] The U.S. [[National Institutes of Health]] (NIH) begins large-scale sequencing trials on ''[[Mycoplasma capricolum]]'', ''[[Escherichia coli]]'', ''[[Caenorhabditis elegans]]'', and ''[[Saccharomyces cerevisiae]]'' (at 75 cents (US)/base).
Decoupling the enzymatic reaction and the image capture allows for optimal throughput and theoretically unlimited sequencing capacity. With an optimal configuration, the ultimately reachable instrument throughput is thus dictated solely by the analog-to-digital conversion rate of the camera, multiplied by the number of cameras and divided by the number of pixels per DNA colony required for visualizing them optimally (approximately 10 pixels/colony). In 2012, with cameras operating at more than 10&nbsp;MHz A/D conversion rates and available optics, fluidics and enzymatics, throughput can be multiples of 1 million nucleotides/second, corresponding roughly to 1 human genome equivalent at 1x [[Read depth|coverage]] per hour per instrument, and 1 human genome re-sequenced (at approx. 30x) per day per instrument (equipped with a single camera).<ref name="pmid18576944">{{cite journal |author=Mardis ER |title=Next-generation DNA sequencing methods |journal=Annu Rev Genomics Hum Genet |volume=9 |issue= |pages=387–402 |year=2008 |pmid=18576944 |doi=10.1146/annurev.genom.9.081307.164359 |url=}}</ref>
**Lipman, Myers publish the [[BLAST]] [[algorithm]] for aligning sequences.
**Barry Karger (January<ref> {{cite journal|title=Analytical and micropreparative ultrahigh resolution of oligonucleotides by polyacrylamide gel high-performance capillary electrophoresis|journal=Analytical Chemistry|date=1990-01-15|first=Barry L.|last=Karger|coauthors=A. Guttman, A. S. Cohen, D. N. Heiger|volume=62|issue=2|pages=137 - 141|id= {{doi|10.1021/ac00201a010}}|url=|format=|accessdate=2007-10-08 }}</ref>), Lloyd Smith (August<ref> {{cite journal|title=High speed DNA sequencing by capillary electrophoresis.|journal=Nucleic Acids Research|date=1990-08-11|first=Lloyd M.|last=Smith|coauthors=Luckey JA, Drossman H, Kostichka AJ, Mead DA, D'Cunha J, Norris TB|volume=18|issue=|pages=4417-4421|id=PMID 2388826 {{doi|10.1093/nar/18.15.4417}}|url=http://nar.oxfordjournals.org/cgi/content/abstract/18/15/4417|format=|accessdate=2007-10-08 }}</ref>), and Norman Dovichi (September<ref>{{cite journal|title=Capillary gel electrophoresis for DNA sequencing: laser-induced fluorescence detection with the sheath flow cuvette|journal=Journal of Chromatography|date=1990-09-07|first=Norman J.|last=Dovichi|coauthors=H.P. Swerdlow, S. Wu , H.R. Harke|volume=516|issue=|pages=61-67|id=PMID 2286629 |url=|format=|accessdate=2007-10-08 }}</ref>) publish on [[capillary electrophoresis]].
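
The Illumina throughput estimate above (camera analog-to-digital conversion rate, multiplied by the number of cameras and divided by the pixels needed per DNA colony) can be checked with a few lines of arithmetic. The following is a minimal sketch, not vendor code: the function names are hypothetical, and the 3-billion-base genome size is an assumed round figure.

```python
# Assumed round figure for the size of one human genome, in bases.
HUMAN_GENOME_BASES = 3_000_000_000


def throughput_nt_per_s(adc_rate_hz, cameras=1, pixels_per_colony=10):
    """Upper bound on bases read per second, following the formula in the text:
    A/D conversion rate x number of cameras / pixels per DNA colony."""
    return adc_rate_hz * cameras / pixels_per_colony


def hours_per_genome(rate_nt_per_s, coverage=1, genome_bases=HUMAN_GENOME_BASES):
    """Hours needed to read one genome at the given coverage depth."""
    return genome_bases * coverage / rate_nt_per_s / 3600


rate = throughput_nt_per_s(10e6)       # 10 MHz camera, ~10 pixels per colony
print(rate)                            # bases per second
print(hours_per_genome(rate))          # 1x coverage
print(hours_per_genome(rate, 30))      # 30x re-sequencing
```

With a single 10&nbsp;MHz camera this gives about 1 million nucleotides per second, a little under one hour for 1x coverage and about 25 hours for 30x, which is consistent with the "per hour" and "per day" figures quoted in the text.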


*[[1991 in science|1991]] [[Craig Venter]] develops strategy to find expressed genes with ESTs (Expressed Sequence Tags).
===SOLiD Sequencing===
**Uberbacher develops GRAIL, a gene-prediction program.


*[[1992 in science|1992]] Craig Venter leaves [[NIH]] to set up The Institute for Genomic Research ([[TIGR]]).  
[[Applied Biosystems]] (now a [[Life Technologies (Thermo Fisher Scientific)|Life Technologies]] brand) SOLiD technology employs [[sequencing by ligation]]. Here, a pool of all possible oligonucleotides of a fixed length is labeled according to the sequenced position. Oligonucleotides are annealed and ligated; the preferential ligation by [[DNA ligase]] for matching sequences results in a signal informative of the nucleotide at that position. Before sequencing, the DNA is amplified by emulsion PCR. The resulting beads, each containing single copies of the same DNA molecule, are deposited on a glass slide.<ref name="pmid18477713">{{cite journal |author=Valouev A |title=A high-resolution, nucleosome position map of C. elegans reveals a lack of universal sequence-dictated positioning |journal=Genome Res. |volume=18 |issue=7 |pages=1051–63 |date=July 2008 |pmid=18477713 |pmc=2493394 |doi=10.1101/gr.076463.108 |author-separator=, |author2=Ichikawa J |author3=Tonthat T |display-authors=3 |last4=Stuart |first4=J. |last5=Ranade |first5=S. |last6=Peckham |first6=H. |last7=Zeng |first7=K. |last8=Malek |first8=J. A. |last9=Costa |first9=G.}}</ref> The result is sequences of quantities and lengths comparable to Illumina sequencing.<ref name="pmid18165802"/> This [[sequencing by ligation]] method has been reported to have difficulty sequencing palindromic sequences.<ref name="Yu-Feng Huang, Sheng-Chung Chen, Yih-Shien Chiang, Tzu-Han Chen & Kuo-Ping Chiu 2012 S10"/>
**William Haseltine heads Human Genome Sciences, to commercialize TIGR products.
**[[Wellcome Trust]] begins participation in the [[Human Genome Project]].  
**Simon et al. develop BACs ([[Bacterial Artificial Chromosomes]]) for cloning.
**First chromosome physical maps published:
***Page et al. - Y chromosome<ref> {{cite journal|title=The human Y chromosome: overlapping DNA clones spanning the euchromatic region.|journal=Science|date=1992-10-02|first=DC|last=Page|coauthors=Foote S, Vollrath D, Hilton A|volume=258|issue=5079|pages=60-66|id=PMID 1359640 |url=|format=|accessdate=2007-10-08 }}</ref>;
***Cohen et al. chromosome 21<ref> {{cite journal|title=Continuum of overlapping clones spanning the entire human chromosome 21q|journal=Nature|date=1992-10-01|first=Daniel|last=Cohen|coauthors=Ilya Chumakov, Philippe Rigault, Sophie Guillou, Pierre Ougen, Alain Billaut, Ghislaine Guasconi, Patricia Gervy, Isabelle LeGall, Pascal Soularue, Laurent Grinas, Lydie Bougueleret, Christine Bellanné-Chantelot, Bruno Lacroix, Emmanuel Barillot, Philippe Gesnouin, Stuart Pook, Guy Vaysseix, Gerard Frelat, Annette Schmitz, Jean-Luc Sambucy, Assumpcio Bosch, Xavier Estivill, Jean Weissenbachparallel, Alain Vignal, Harold Riethman, David Cox, David Patterson, Kathleen Gardiner, Masahira Hattori, Yoshiyuki Sakaki, Hitoshi Ichikawa, Misao Ohki, Denis Le Paslier, Roland Heilig, Stylianos Antonarakis|volume=359|issue=6394|pages=380-387|id=PMID 1406950 {{doi|10.1038/359380a0}}|url=|format=|accessdate=2007-10-08 }}</ref>.  
***Lander - complete mouse genetic map<ref> {{cite journal|title=A Genetic Map of the Mouse Suitable for Typing Intraspecific Crosses|journal=Genetics|date=1992-06|first=E. S.|last=Lander|coauthors=Dietrich W, Katz H, Lincoln SE, Shin HS, Friedman J, Dracopoli NC|volume=131|issue=|pages=423-447|id=PMID 1353738 |url=http://www.genetics.org/cgi/content/abstract/131/2/423|format=|accessdate=2007-10-08 }}</ref>;
***Weissenbach - complete human genetic map<ref> {{cite journal|title=A second-generation linkage map of the human genome|journal=Nature|date=1992-10-29|first=Jean|last=Weissenbach|coauthors=Gyapay G, Dib C, Vignal A, Morissette J, Millasseau P, Vaysseix G, Lathrop M.|volume=359|issue=|pages=794 - 801|id=PMID 1436057 {{doi|10.1038/359794a0}}|url=|format=|accessdate=2007-10-08 }}</ref>.


*[[1993 in science|1993]] Wellcome Trust and MRC open [[Sanger Centre]], near Cambridge, UK.
===Ion Torrent Semiconductor Sequencing===
**The GenBank database migrates from Los Alamos (DOE) to [[NCBI]] (NIH).


*[[1995 in science|1995]] Venter, Fraser and Smith publish first sequence of free-living organism, ''[[Haemophilus influenzae]]'' (genome size of 1.8 Mb).
Ion Torrent Systems Inc. (now owned by [[Life Technologies (Thermo Fisher Scientific)|Life Technologies]]) developed a system that uses standard sequencing chemistry with a novel, semiconductor-based detection system. This method of sequencing is based on the detection of [[hydrogen ion]]s that are released during the [[DNA polymerase|polymerisation]] of [[DNA]], as opposed to the optical methods used in other sequencing systems. A microwell containing a template DNA strand to be sequenced is flooded with a single type of [[nucleotide]]. If the introduced nucleotide is [[complementarity (molecular biology)|complementary]] to the leading template nucleotide, it is incorporated into the growing complementary strand. This causes the release of a hydrogen ion that triggers a hypersensitive ion sensor, which indicates that a reaction has occurred. If [[homopolymer]] repeats are present in the template sequence, multiple nucleotides will be incorporated in a single cycle. This leads to a corresponding number of released hydrogen ions and a proportionally higher electronic signal.<ref name="rusk">{{cite journal | author = Rusk N | year = 2011 | title = Torrents of sequence | url = | journal = Nat Meth | volume = 8 | issue = 1| pages = 44–44 | doi=10.1038/nmeth.f.330}}</ref>
**[[Richard Mathies]] et al. publish on sequencing dyes (PNAS, May)<ref> {{cite journal|title=Fluorescence Energy Transfer Dye-Labeled Primers for DNA Sequencing and Analysis|journal=PNAS|date=1995-05-09|first=R. A.|last=Mathies|coauthors=Ju J, Ruan C, Fuller CW, Glazer AN|volume=92|issue=|pages=4347-4351|id=PMID 7753809 |url=http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pubmed&pubmedid=7753809|format=|accessdate=2007-10-08 }}</ref>.
**Michael Reeve and Carl Fuller, thermostable polymerase for sequencing<ref name='Reeve,Fuller'> {{cite journal|title=A novel thermostable polymerase for DNA sequencing.|journal=Nature|date=1995-08-31|first=|last=|coauthors=Reeve MA, Fuller CW|volume=376|issue=6543|pages=796-797|id=PMID 7651542 |url=|format=|accessdate=2007-10-08 }}</ref>.
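
The flow-by-flow behaviour described for Ion Torrent sequencing above (one nucleotide type per flow, with a signal proportional to the length of any homopolymer run) can be illustrated with a toy simulation. This is a simplified sketch under stated assumptions: the flow order <code>TACG</code>, the function names, and the noise-free integer signal are all illustrative and do not come from the instrument's software.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def flow_signals(template, flow_order="TACG", max_flows=40):
    """Simulate idealized flows over a template strand.

    Returns one (nucleotide, count) pair per flow, where count is the number
    of bases incorporated, i.e. the number of hydrogen ions released.
    The flow order is an assumption chosen for illustration.
    """
    pos = 0
    signals = []
    for i in range(max_flows):
        if pos >= len(template):
            break
        nt = flow_order[i % len(flow_order)]
        count = 0
        # A homopolymer run in the template is extended within a single flow.
        while pos < len(template) and COMPLEMENT[template[pos]] == nt:
            count += 1
            pos += 1
        signals.append((nt, count))
    return signals


def call_bases(signals):
    """Rebuild the synthesized strand from the per-flow signal strengths."""
    return "".join(nt * n for nt, n in signals)


signals = flow_signals("ATTTGC")
print(signals)
print(call_bases(signals))
```

For the template <code>ATTTGC</code>, the second flow incorporates three A bases at once, so its signal is three times that of a single incorporation. In a real instrument the signal is noisy, which is why long homopolymers are a known source of errors.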


*[[1996 in science|1996]] International HGP partners agree to release sequence data into public databases within 24 hours.
===DNA Nanoball Sequencing===
**International consortium releases genome sequence of yeast ''S. cerevisiae'' (genome size of 12.1 Mb).
**Yoshihide Hayashizaki's group at RIKEN completes the first set of full-length mouse cDNAs.
**ABI introduces a capillary electrophoresis system, the ABI310 sequence analyzer.


*[[1997 in science|1997]] Blattner, Plunkett et al. publish the sequence of E. coli (genome size of 5 Mb)<ref> {{cite journal|title=The Complete Genome Sequence of Escherichia coli K-12|journal=Science|date=1997-09-05|first=|last=|coauthors=Frederick R. Blattner, Guy Plunkett III, Craig A. Bloch, Nicole T. Perna, Valerie Burland, Monica Riley, Julio Collado-Vides, Jeremy D. Glasner, Christopher K. Rode, George F. Mayhew, Jason Gregor, Nelson Wayne Davis, Heather A. Kirkpatrick, Michael A. Goeden, Debra J. Rose, Bob Mau, Ying Shao|volume=277|issue=5331|pages=1453-1462|id=PMID 9278503 {{doi|10.1126/science.277.5331.1453}}|url=|format=|accessdate=2007-10-08 }}</ref>
[[DNA nanoball sequencing]] is a type of [[high throughput sequencing]] technology used to determine the entire [[genomic sequence]] of an organism.  The company [[Complete Genomics]] uses this technology to sequence samples submitted by independent researchers. The method uses [[rolling circle replication]] to amplify small fragments of genomic DNA into DNA nanoballs.  Unchained sequencing by ligation is then used to determine the nucleotide sequence.<ref name=orig>{{cite journal | author = Drmanac R. et al. | year = 2010 | title = Human Genome Sequencing Using Unchained Base Reads in Self-Assembling DNA Nanoarrays | url = | journal = Science | volume = 327 | issue = 5961| pages = 78–81 |bibcode = 2010Sci...327...78D |doi = 10.1126/science.1181498 | pmid=19892942}}</ref>  This method of DNA sequencing allows large numbers of DNA nanoballs to be sequenced per run and at low [[reagent]] costs compared to other next generation sequencing platforms.<ref>{{cite journal | author = Porreca JG | year = 2010 | title = Genome Sequencing on Nanoballs | url = | journal = Nature Biotechnology | volume = 28 | issue = 1| pages = 43–44 | pmid = 20062041 | doi=10.1038/nbt0110-43}}</ref>  However, only short sequences of DNA are determined from each DNA nanoball which makes mapping the short reads to a [[reference genome]] difficult.<ref>{{cite journal | author = Drmanac R. et al. 
| year = 2010 | title = Human Genome Sequencing Using Unchained Base Reads in Self-Assembling DNA Nanoarrays, Supplementary Material | url = http://www.ncbi.nlm.nih.gov/pubmed?term=Human%20Genome%20Sequencing%20Using%20Unchained%20Base%20Reads%20in%20Self-Assembling%20DNA%20Nanoarrays | journal = Science | volume = 327 | issue = 5961| pages = 78–81 |bibcode = 2010Sci...327...78D |doi = 10.1126/science.1181498 | pmid=19892942}}</ref> This technology has been used for multiple genome sequencing projects and is scheduled to be used for more.<ref>[http://www.completegenomics.com/news-events/press-releases/ Complete Genomics] Press release, 2010</ref>
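
The difficulty of mapping short reads to a reference genome, noted above for DNA nanoball sequencing, comes down to ambiguity: a short read may match several places equally well. The sketch below illustrates this with exact string matching only. The reference string and the function name are made up for illustration, and real aligners also handle mismatches, gaps, and both strands.

```python
def map_read(read, reference):
    """Return every 0-based position where the read matches the reference exactly."""
    hits = []
    start = reference.find(read)
    while start != -1:
        hits.append(start)
        # Continue searching one position further to catch overlapping matches.
        start = reference.find(read, start + 1)
    return hits


# Hypothetical reference containing the repeat "GATTACA" twice.
reference = "GATTACAGGCTTGATTACATCC"

print(map_read("GATTACA", reference))    # short read: two candidate positions
print(map_read("GATTACAGG", reference))  # longer read: one position
```

The 7-base read maps to two positions, while extending it by two bases resolves the ambiguity. Longer reads, or paired reads separated by a known distance, reduce this problem in the same way.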


*[[1998 in science|1998]] Phil Green and Brent Ewing of Washington University publish <code>“phred”</code> for interpreting sequencer data (in use since ‘95)<ref> {{cite journal|title=Base-calling of automated sequencer traces using phred. II. Error probabilities.|journal=Genome Research|date=1998-03|first=|last=|coauthors=Phil Green and Brent Ewing|volume=8|issue=3|pages=186-94|id=PMID 9521922 |url=|format=|accessdate=2007-10-08 }}</ref>.
===Heliscope Single Molecule Sequencing===
**Venter starts new company “Celera”; “will sequence HG in 3 yrs for $300m.”
**Applied Biosystems introduces the 3700 capillary sequencing machine.
**Wellcome Trust doubles support for the HGP to $330 million for 1/3 of the sequencing.
**NIH & DOE goal: "working draft" of the human genome by 2001.
**Sulston, Waterston et al finish sequence of ''C. elegans'' (genome size of 97Mb)<ref> {{cite journal|title=Genome Sequence of the Nematode C. elegans: A Platform for Investigating Biology|journal=Science|date=1998-12-11|first=|last=|coauthors=The C. elegans Sequencing Consortium|volume=282|issue=5396|pages=2012-2018|id=PMID 9851916 {{doi|10.1126/science.282.5396.2012}}|url=|format=|accessdate=2007-10-08 }}</ref>.


*[[1999 in science|1999]] NIH moves up completion date for rough draft, to spring 2000.  
Heliscope sequencing is a method of single-molecule sequencing developed by [[Helicos Biosciences]]. It uses DNA fragments with added poly-A tail adapters which are attached to the flow cell surface. The next steps involve extension-based sequencing with cyclic washes of the flow cell with fluorescently labeled nucleotides (one nucleotide type at a time, as with the Sanger method). The reads are performed by the Heliscope sequencer. The reads are short, up to 55 bases per run, but recent improvements allow for more accurate reads of stretches of one type of nucleotides.<ref>[http://www.helicosbio.com/Products/HelicosregGeneticAnalysisSystem/HeliScopetradeSequencer/tabid/87/Default.aspx HeliScope Gene Sequencing / Genetic Analyzer System : Helicos BioSciences<!-- Bot generated title -->]</ref><ref>{{cite journal|last=Thompson|first=JF|author2=Steinmann, KE|title=Single molecule sequencing with a HeliScope genetic analysis system.|journal=Current Protocols in Molecular Biology|date=October 2010|volume=Chapter 7|pages=Unit7.10|pmid=20890904|doi=10.1002/0471142727.mb0710s92|pmc=2954431}}</ref>
**NIH launches the mouse genome sequencing project.  
**First sequence of human chromosome 22 published<ref> {{cite journal|title=The DNA sequence of human chromosome 22|journal=Nature|date=1999-12-02|first=|last=|coauthors=Dunham I, Shimizu N, Roe BA, Chissoe S, Hunt AR, Collins JE, Bruskiewich R, Beare DM, Clamp M, Smink LJ, Ainscough R, Almeida JP, Babbage A, Bagguley C, Bailey J, Barlow K, Bates KN, Beasley O, Bird CP, Blakey S, Bridgeman AM, Buck D, Burgess J, Burrill WD, O'Brien KP, et al.|volume=402|issue=6761|pages=489-495|id=PMID 10591208 |url=|format=|accessdate=2007-10-08 }}</ref>.


*[[2000 in science|2000]] Celera and collaborators sequence fruit fly ''Drosophila melanogaster'' (genome size of 180Mb) - validation of Venter's shotgun method. HGP and Celera debate issues related to data release.
This sequencing method and equipment were used to sequence the genome of the [[M13 bacteriophage]].<ref>{{cite journal|last=Harris|first=TD|coauthors=Buzby, PR; Babcock, H; Beer, E; Bowers, J; Braslavsky, I; Causey, M; Colonell, J; Dimeo, J; Efcavitch, JW; Giladi, E; Gill, J; Healy, J; Jarosz, M; Lapen, D; Moulton, K; Quake, SR; Steinmann, K; Thayer, E; Tyurina, A; Ward, R; Weiss, H; Xie, Z|title=Single-molecule DNA sequencing of a viral genome.|journal=Science|date=Apr 4, 2008|volume=320|issue=5872|pages=106–9|pmid=18388294|doi=10.1126/science.1150427|bibcode = 2008Sci...320..106H }}</ref>
**HGP consortium publishes sequence of chromosome 21.<ref> Hattori, M.,  A. Fujiyama, T. D. Taylor, H. Watanabe, T. Yada, H.-S. Park, A. Toyoda, K. Ishii, Y. Totoki, D.-K. Choi, E. Soeda, M. Ohki, T. Takagi, Y. Sakaki; S. Taudienk, K. Blechschmidtk, A. Polleyk, U. Menzelk, J. Delabar, K. Kumpfk, R. Lehmannk, D. Patterson, K. Reichwaldk, A. Rumpk, M. Schillhabelk, A. Schudyk, W. Zimmermannk, A. Rosenthalk; J. KudohI, K. ShibuyaI, K. KawasakiI, S. AsakawaI, A. ShintaniI, T. SasakiI, K. NagamineI, S. MitsuyamaI, S. E. Antonarakis, S. MinoshimaI, N. ShimizuI, G. Nordsiek, K. Hornischer, P. Brandt, M. Scharfe, O. SchoÈn, A. Desario, J. Reichelt, G. Kauer, H. Bloecker; J. Ramser, A. Beck, S. Klages, S. Hennig, L. Riesselmann, E. Dagand, T. Haaf, S. Wehrmeyer, K. Borzym, K. Gardiner, D. Nizetickk, F. Francis, H. Lehrach, R. Reinhardt, and M.-L. Yaspo, (2000). The DNA sequence of human chromosome 21. Nature 405: 311-319.</ref>
**HGP & Celera jointly announce working drafts of HG sequence, promise joint publication.
**Estimates for the number of genes in the human genome range from 35,000 to 120,000. International consortium completes first plant sequence, ''Arabidopsis thaliana'' (genome size of 125 Mb).


*[[2001 in science|2001]] HGP consortium publishes Human Genome Sequence draft in Nature (15 Feb)<ref> {{cite journal|title=Initial sequencing and analysis of the human genome|journal=Nature|date=2001-02-15|first=|last=|coauthors=HGP consortium|volume=409|issue=6822|pages=860-921|id=PMID 11237011 |url=|format=|accessdate=2007-10-08 }}</ref>.
===Single Molecule Real Time (SMRT) sequencing===
**Celera publishes the Human Genome sequence<ref> {{cite journal|title=The sequence of the human genome.|journal=Science|date=2001-02-16|first=J. C.|last=Venter|coauthors=et al|volume=291|issue=5507|pages=1304-51|id=PMID 11181995 {{doi|10.1126/science.1058040}}|url=|format=|accessdate=2007-10-08 }}</ref>.


*[[2005 in science|2005]] 420,000 VariantSEQr human resequencing primer sequences published on new NCBI Probe database.
SMRT sequencing is based on the sequencing by synthesis approach. The DNA is synthesized in zero-mode wave-guides (ZMWs)&nbsp;– small well-like containers with the capturing tools located at the bottom of the well. The sequencing is performed with use of unmodified polymerase (attached to the ZMW bottom) and fluorescently labelled nucleotides flowing freely in the solution. The wells are constructed in a way that only the fluorescence occurring by the bottom of the well is detected. The fluorescent label is detached from the nucleotide at its incorporation into the DNA strand, leaving an unmodified DNA strand. According to [[Pacific Biosciences]], the SMRT technology developer, this methodology allows detection of nucleotide modifications (such as cytosine methylation). This happens through the observation of polymerase kinetics. This approach allows reads of 20,000 nucleotides or more, with average read lengths of 5 kilobases.<ref name="flxlexblog.wordpress.com"/><ref>[http://www.genomeweb.com/sequencing/pacbio-sales-start-pick-company-delivers-product-enhancements PacBio Sales Start to Pick Up as Company Delivers on Product Enhancements | In Sequence | Sequencing | GenomeWeb<!-- Bot generated title -->]</ref>


*[[2007 in science|2007]] For the first time, a set of closely related species (12 Drosophilidae) are sequenced, launching the era of phylogenomics.
** Craig Venter publishes his full diploid genome: the first human genome to be sequenced completely.

== See also ==
* [[Sequencing]]
* [[Genome project]] - how entire genomes are assembled from these short sequences.
* [[Applied Biosystems]] - provided most of the chemistry and equipment for the genome projects. Next-generation technology for very high data generation rates.
* [[454 Life Sciences]] - company specializing in high-throughput DNA sequencing using a sequencing-by-synthesis approach.
* [[Illumina (company)]] - Advancing genetic analysis one billion bases at a time; whole genome sequencing.
* [[Joint Genome Institute]] - sequencing center from the [[United States Department of Energy|US Department of Energy]] whose mission is to provide integrated high-throughput sequencing and computational analysis to enable genomic-scale/systems-based scientific approaches to DOE-relevant challenges in energy and the environment.

==Methods in Development==

DNA sequencing methods currently under development include labeling the DNA polymerase,<ref>{{cite web|url=http://visigenbio.com/technology_overview.html|title=VisiGen Biotechnologies Inc. – Technology Overview |publisher=Visigenbio.com |accessdate=2009-11-15}}</ref> reading the sequence as a DNA strand transits through [[nanopore sequencing|nanopores]],<ref>{{cite web|url=http://mcb.harvard.edu/branton/index.htm |title=The Harvard Nanopore Group |publisher=Mcb.harvard.edu|accessdate=2009-11-15}}</ref><ref name="Physorg">{{cite web |url=http://www.physorg.com/news157378086.html |title=Nanopore Sequencing Could Slash DNA Analysis Costs |work= |accessdate=}}</ref> and microscopy-based techniques, such as [[Atomic force microscope|atomic force microscopy]] or [[Transmission electron microscopy DNA sequencing|transmission electron microscopy]] that are used to identify the positions of individual nucleotides within long DNA fragments (&gt;5,000 bp) by nucleotide labeling with heavier elements (e.g., halogens) for visual detection and recording.<ref>{{US patent reference
|number=20060029957
|y=2005
|m=07
|d=14 |inventor=ZS Genetics
|title=Systems and methods of analyzing nucleic acid polymers and related components
}}</ref><ref>{{cite journal |author=Xu M, Fujita D, Hanagata N |title=Perspectives and challenges of emerging single-molecule DNA sequencing technologies|journal=Small |volume=5 |issue=23 |pages=2638–49 |date=December 2009 |pmid=19904762 |doi=10.1002/smll.200900976 |url=}}</ref>
Third generation technologies aim to increase throughput and decrease the time to result and cost by eliminating the need for excessive reagents and harnessing the processivity of DNA polymerase.<ref>{{cite journal|last=Schadt|first=E.E.|author2=S. Turner|author3=A. Kasarskis|title=A window into third-generation sequencing|journal=Human Molecular Genetics|year=2010|volume=19|issue=R2|pages=R227–40|doi=10.1093/hmg/ddq416|pmid=20858600}}</ref>
 
===Nanopore DNA Sequencing===
 
This method is based on the readout of electrical signals produced as nucleotides pass through alpha-[[hemolysin]] pores covalently bound with [[cyclodextrin]]. DNA passing through the nanopore changes the ion current through the pore. This change depends on the shape, size and length of the DNA sequence. Each type of nucleotide blocks the ion flow through the pore for a different period of time. The method has potential for further development because it does not require modified nucleotides; however, single-nucleotide resolution is not yet available.<ref>{{cite journal|last=Stoddart|first=D|coauthors=Heron, AJ; Mikhailova, E; Maglia, G; Bayley, H|title=Single-nucleotide discrimination in immobilized DNA oligonucleotides with a biological nanopore.|journal=Proceedings of the National Academy of Sciences of the United States of America|date=May 12, 2009|volume=106|issue=19|pages=7702–7|pmid=19380741|doi=10.1073/pnas.0901054106|pmc=2683137|bibcode = 2009PNAS..106.7702S }}</ref>
 
Two main areas of nanopore sequencing in development are solid-state nanopore sequencing and protein-based nanopore sequencing. Protein nanopore sequencing utilizes the membrane protein complexes α-[[Hemolysin]] and MspA (''Mycobacterium smegmatis'' porin A), which show great promise given their ability to distinguish between individual and groups of nucleotides.<ref name="Torre 2012">{{Cite pmid|22948520}}</ref> Solid-state nanopore sequencing, by contrast, utilizes synthetic materials such as silicon nitride and aluminum oxide, and is preferred for its superior mechanical ability and thermal and chemical stability.<ref name="Pathak 2012">Pathak, B., Lofas, H., Prasongkit, J., Grigoriev, A., Ahuja, R., & Scheicher, R. H. (January 09, 2012). Double-functionalized nanopore-embedded gold electrodes for rapid DNA sequencing. Applied Physics Letters, 100, 2.</ref> The fabrication method is essential for this type of sequencing given that the nanopore array can contain hundreds of pores with diameters smaller than eight nanometers.<ref name="Torre 2012"/>
 
The concept originated from the idea that single-stranded DNA or RNA molecules can be electrophoretically driven in a strict linear sequence through a biological pore that can be less than eight nanometers in diameter, and can be detected because the molecules alter the ionic current while moving through the pore. The pore contains a detection region capable of recognizing different bases, with each base generating a time-specific signal; the signals, which correspond to the sequence of bases crossing the pore, are then evaluated.<ref name="Pathak 2012"/> Precise control over DNA transport through the pore is crucial for success. Various enzymes such as exonucleases and polymerases have been positioned near the pore's entrance to moderate this process.<ref name="Korlach 2008">{{Cite pmid|18216253}}</ref>
 
===Tunnelling Currents DNA Sequencing===
 
Another approach uses measurements of the electrical tunnelling currents across single-strand DNA as it moves through a channel. Depending on its electronic structure each base affects the tunnelling current differently, allowing differentiation between different bases.<ref>Massimiliano Di Ventra (2013). [http://iopscience.iop.org/0957-4484/24/34/342501/ "Fast DNA sequencing by electrical means inches closer"]. ''Nanotechnology'' 24 342501</ref>
 
The use of tunnelling currents has the potential to sequence orders of magnitude faster than ionic current methods and the sequencing of several DNA oligomers and micro-RNA has already been achieved.<ref>Ohshiro T et al 2012 Sci. Rep. 2 501–7</ref>
 
===Sequencing by hybridization===
 
''[[Sequencing by hybridization]]'' is a non-enzymatic method that uses a [[DNA microarray]]. A single pool of DNA whose sequence is to be determined is fluorescently labeled and hybridized to an array containing known sequences. Strong hybridization signals from a given spot on the array identify its sequence in the DNA being sequenced.<ref>{{cite journal |author=Hanna GJ |title=Comparison of Sequencing by Hybridization and Cycle Sequencing for Genotyping of Human Immunodeficiency Virus Type 1 Reverse Transcriptase |journal=J. Clin. Microbiol. |volume=38 |issue=7 |pages=2715–21 |date=1 July 2000|pmid=10878069 |pmc=87006 |url=http://jcm.asm.org/cgi/pmidlookup?view=long&pmid=10878069 |author-separator=, |author2=Johnson VA |author3=Kuritzkes DR |display-authors=3 |last4=Richman |first4=DD |last5=Martinez-Picado |first5=J |last6=Sutton |first6=L |last7=Hazelwood |first7=JD |last8=d'Aquila |first8=RT }}</ref>
 
This method of sequencing utilizes the binding characteristics of a library of short single-stranded DNA molecules (oligonucleotides), also called DNA probes, to reconstruct a target DNA sequence. Non-specific hybrids are removed by washing and the target DNA is eluted.<ref name="Morey">{{Cite pmid|23742747}}</ref> Hybrids are re-arranged such that the DNA sequence can be reconstructed. The benefit of this sequencing type is its ability to capture a large number of targets with homogeneous coverage.<ref name="Qin">{{Cite pmid|22574124}}</ref> A large quantity of chemicals and starting DNA is usually required, although with the advent of solution-based hybridization much less equipment and fewer chemicals are necessary.<ref name="Morey"/>
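
The reconstruction step can be illustrated with a minimal sketch. It assumes an idealized, error-free experiment in which every probe of length ''k'' that matches the target gives a signal, and in which no (''k''−1)-base overlap is repeated; real arrays are noisier, and repeats make the problem ambiguous. The function names <code>spectrum</code> and <code>reconstruct</code> are hypothetical and chosen only for illustration.

```python
def spectrum(target, k):
    """Return the set of all length-k substrings (idealized positive probes)."""
    return {target[i:i + k] for i in range(len(target) - k + 1)}


def reconstruct(probes):
    """Chain probes that overlap by k-1 bases into one sequence.

    Assumes an error-free spectrum with no repeated (k-1)-mers, so that
    every extension step has exactly one candidate.
    """
    probes = set(probes)
    k = len(next(iter(probes)))
    suffixes = {p[1:] for p in probes}
    # The first probe is the one whose prefix is not the suffix of any probe.
    starts = [p for p in probes if p[:-1] not in suffixes]
    if len(starts) != 1:
        raise ValueError("cannot identify a unique starting probe")
    sequence = starts[0]
    remaining = probes - {sequence}
    while remaining:
        tail = sequence[-(k - 1):]
        candidates = [p for p in remaining if p[:-1] == tail]
        if len(candidates) != 1:
            raise ValueError("ambiguous or incomplete probe set")
        sequence += candidates[0][-1]
        remaining.remove(candidates[0])
    return sequence


probes = spectrum("ATGGCGTGCA", 4)
print(reconstruct(probes))
```

With shorter probes (for example ''k'' = 3 on the same target) some overlaps repeat and the sketch raises an error, which is one reason longer probes or additional information are needed in practice.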
 
=== Sequencing with mass spectrometry ===
[[Mass spectrometry]] may be used to determine DNA sequences. Matrix-assisted laser desorption ionization time-of-flight mass spectrometry, or [[Matrix-assisted laser desorption/ionization|MALDI-TOF MS]], has specifically been investigated as an alternative method to gel electrophoresis for visualizing DNA fragments. With this method, DNA fragments generated by chain-termination sequencing reactions are compared by mass rather than by size. The mass of each nucleotide is different from the others and this difference is detectable by mass spectrometry. Single-nucleotide mutations in a fragment can be more easily detected with MS than by gel electrophoresis alone. MALDI-TOF MS can more easily detect differences between RNA fragments, so researchers may indirectly sequence DNA with MS-based methods by converting it to RNA first.<ref>{{cite journal| title=Mass-spectrometry DNA sequencing| author= J.R. Edwards, H.Ruparel, and J. Ju| journal=Mutation Research| volume=573| issue=1–2| pages=3–12|year=2005| pmid=15829234| doi=10.1016/j.mrfmmm.2004.07.021}}</ref>
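
As a rough illustration of how mass differences can encode sequence, the sketch below reads a base from the mass gap between each pair of consecutive fragments in a ladder. The residue masses are approximate average values assumed for illustration, the function name <code>read_ladder</code> is hypothetical, and the model ignores terminator chemistry, salt adducts, and instrument noise.

```python
# Approximate average masses (Da) of nucleotide residues within a DNA chain.
# These values are assumptions for illustration, not instrument calibrations.
RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}


def read_ladder(fragment_masses, tolerance=0.5):
    """Infer bases from mass differences between consecutive fragments.

    The lightest mass is treated as the reference (for example the primer);
    each later fragment is assumed to be exactly one nucleotide longer.
    """
    ordered = sorted(fragment_masses)
    bases = []
    for lighter, heavier in zip(ordered, ordered[1:]):
        delta = heavier - lighter
        # Pick the residue whose mass is closest to the observed difference.
        base = min(RESIDUE_MASS, key=lambda b: abs(delta - RESIDUE_MASS[b]))
        if abs(delta - RESIDUE_MASS[base]) > tolerance:
            raise ValueError(f"mass step {delta:.2f} Da matches no nucleotide")
        bases.append(base)
    return "".join(bases)


# Hypothetical ladder: a 5000.00 Da primer extended by G, A, T, C in turn.
ladder = [5000.00, 5329.21, 5642.42, 5946.62, 6235.80]
print(read_ladder(ladder))
```

The smallest gap between two residue masses in this table is about 9 Da (A versus T), which gives a sense of the mass accuracy such a readout would need at a given fragment size.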
 
The higher resolution of DNA fragments permitted by MS-based methods is of special interest to researchers in forensic science, as they may wish to find [[single-nucleotide polymorphisms]] in human DNA samples to identify individuals. These samples may be highly degraded so forensic researchers often prefer [[mitochondrial DNA]] for its higher stability and applications for lineage studies. MS-based sequencing methods have been used to compare the sequences of human mitochondrial DNA from samples in a [[Federal Bureau of Investigation]] database<ref>{{cite journal|last=Hall|first=Thomas A.|coauthors=Budowle, Bruce; Jiang, Yun; Blyn, Lawrence; Eshoo, Mark; Sannes-Lowery, Kristin A.; Sampath, Rangarajan; Drader, Jared J.; Hannis, James C.; Harrell, Patina; Samant, Vivek; White, Neill; Ecker, David J.; Hofstadler, Steven A.|title=Base composition analysis of human mitochondrial DNA using electrospray ionization mass spectrometry: A novel tool for the identification and differentiation of humans|journal=Analytical Biochemistry|year=2005|volume=344|issue=1|pages=53–69|doi=10.1016/j.ab.2005.05.028|pmid=16054106}}</ref> and from bones found in mass graves of World War I soldiers.<ref>{{cite journal|last=Howard|first=R|coauthors=Encheva, V; Thomson, J; Bache, K; Chan, YT; Cowen, S; Debenham, P; Dixon, A; Krause, JU; Krishan, E; Moore, D; Moore, V; Ojo, M; Rodrigues, S; Stokes, P; Walker, J; Zimmermann, W; Barallon, R|title=Comparative analysis of human mitochondrial DNA from World War I bone samples by DNA sequencing and ESI-TOF mass spectrometry.|journal=Forensic science international. Genetics|date=Jun 15, 2011|pmid=21683667|doi=10.1016/j.fsigen.2011.05.009|volume=7|issue=1|pages=1–9}}</ref>
 
Early chain-termination and TOF MS methods demonstrated read lengths of up to 100 base pairs.<ref>{{cite journal|last=Monforte|first=Joseph A.|author2=Becker, Christopher H.|title=High-throughput DNA analysis by time-of-flight mass spectrometry|journal=Nature Medicine|date=1 March 1997|volume=3|issue=3|pages=360–362|doi=10.1038/nm0397-360|pmid=9055869}}</ref> Researchers have been unable to exceed this average read size; like chain-termination sequencing alone, MS-based DNA sequencing may not be suitable for large ''de novo'' sequencing projects. Even so, a recent study did use the short sequence reads and mass spectroscopy to compare single-nucleotide polymorphisms in pathogenic ''[[Streptococcus]]'' strains.<ref>{{cite journal|last=Beres|first=S. B.|coauthors=Carroll, R. K.; Shea, P. R.; Sitkiewicz, I.; Martinez-Gutierrez, J. C.; Low, D. E.; McGeer, A.; Willey, B. M.; Green, K.; Tyrrell, G. J.; Goldman, T. D.; Feldgarden, M.; Birren, B. W.; Fofanov, Y.; Boos, J.; Wheaton, W. D.; Honisch, C.; Musser, J. M.|title=Molecular complexity of successive bacterial epidemics deconvoluted by comparative pathogenomics|journal=Proceedings of the National Academy of Sciences|date=8 February 2010|volume=107|issue=9|pages=4371–4376|doi=10.1073/pnas.0911295107|bibcode = 2010PNAS..107.4371B }}</ref>
 
===Microfluidic Sanger Sequencing ===
 
In microfluidic Sanger sequencing the entire thermocycling amplification of DNA fragments as well as their separation by electrophoresis is done on a single glass wafer (approximately 10&nbsp;cm in diameter) thus reducing the reagent usage as well as cost.<ref>{{cite journal|last=Kan|first=Cheuk-Wai|coauthors=Fredlake, Christopher P.; Doherty, Erin A. S.; Barron, Annelise E.|title=DNA sequencing and genotyping in miniaturized electrophoresis systems|journal=Electrophoresis|date=1 November 2004|volume=25|issue=21–22|pages=3564–3588|doi=10.1002/elps.200406161|pmid=15565709}}</ref> In some instances researchers have shown that they can increase the throughput of conventional sequencing through the use of microchips.<ref>{{cite journal |author=Ying-Ja Chen, Eric E. Roller and Xiaohua Huang |title=DNA sequencing by denaturation: experimental proof of concept with an integrated fluidic device |journal=Lab on Chip |volume= 10|issue=10 |pages=1153–1159 |year=2010  |doi= 10.1039/b921417h |url=}}</ref> Research will still need to be done in order to make this use of technology effective.
 
===Microscopy-Based Techniques===
 
This approach directly visualizes the sequence of DNA molecules using electron microscopy. The first identification of DNA base pairs within intact DNA molecules has been demonstrated by enzymatically incorporating modified bases, which contain atoms of increased atomic number, allowing direct visualization and identification of individually labeled bases within a synthetic 3,272 base-pair DNA molecule and a 7,249 base-pair viral genome.<ref>{{cite journal|last=Bell|first=DC|coauthors=Thomas, WK; Murtagh, KM; Dionne, CA; Graham, AC; Anderson, JE; Glover, WR|title=DNA Base Identification by Electron Microscopy.|journal=Microscopy and microanalysis : the official journal of Microscopy Society of America, Microbeam Analysis Society, Microscopical Society of Canada|date=Oct 9, 2012|pages=1–5|pmid=23046798|doi=10.1017/S1431927612012615|volume=18|issue=5|bibcode = 2012MiMic..18.1049B }}</ref>
 
===RNAP Sequencing===
 
This method is based on the use of [[RNA polymerase]] (RNAP), which is attached to a [[polystyrene]] bead. One end of the DNA to be sequenced is attached to another bead, and both beads are placed in optical traps. RNAP motion during transcription brings the beads closer together, and the change in their relative distance can be recorded at single-nucleotide resolution. The sequence is deduced from four readouts, each taken with a lowered concentration of one of the four nucleotide types, similarly to the Sanger method.<ref>{{cite journal|last=Pareek|first=CS|coauthors=Smoczynski, R; Tretyn, A|title=Sequencing technologies and genome sequencing.|journal=Journal of applied genetics|date=November 2011|volume=52|issue=4|pages=413–35|pmid=21698376|doi=10.1007/s13353-011-0057-x|pmc=3189340}}</ref>
 
RNA polymerase is attached to one polystyrene bead, and the distal end of a DNA fragment is attached to a second bead. Each bead is then placed in an optical trap that levitates it. The interactions between the RNAP and the DNA result in a change in the length of the DNA between the two beads. This change is then measured with a precision that gives single-base resolution on a single DNA molecule. The measurement is repeated four times, each time with a lower concentration of one of the four nucleotides, which shares some similarity with the Sanger sequencing method. A comparison is made between regions, and sequence information is deduced by comparing the known sequence regions to the unknown sequence regions.<ref name="Pareek CS">{{Cite pmid|21698376}}</ref>
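
The way four limited-nucleotide runs can be combined may be easier to see in a toy model. The sketch below assumes that, in each run, the polymerase pauses exactly at the positions that require the scarce nucleotide, and that pause positions have already been extracted from the bead-distance trace and aligned between runs. Both are strong simplifications, and the function name <code>deduce_transcript</code> and the data layout are hypothetical.

```python
def deduce_transcript(pauses, length):
    """Combine pause positions from four runs into one transcript sequence.

    pauses maps each scarce nucleotide ("A", "C", "G", "U") to the list of
    0-based transcript positions where pausing was observed in that run.
    Positions with no pause in any run are reported as "N".
    """
    calls = ["N"] * length
    for base, positions in pauses.items():
        for pos in positions:
            if calls[pos] != "N":
                raise ValueError(f"conflicting calls at position {pos}")
            calls[pos] = base
    return "".join(calls)


# Hypothetical pause positions from four runs on the same template.
runs = {"A": [0, 3], "C": [1], "G": [2, 5], "U": [4]}
print(deduce_transcript(runs, 6))
```

Because the readout is a transcript, the result is an RNA sequence; the DNA template strand would be its reverse complement.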
 
===''In vitro'' Virus High-Throughput Sequencing===
 
A method has been developed to analyze full sets of [[Interactome|protein interactions]] using a combination of 454 pyrosequencing and an  ''in vitro'' virus [[mRNA display]] method. Specifically, this method covalently links proteins of interest to the mRNAs encoding them, then detects the mRNA pieces using reverse transcription [[Polymerase chain reaction|PCR]]s. The mRNA may then be amplified and sequenced. The combined method was titled IVV-HiTSeq and can be performed under cell-free conditions, though its results may not be representative of ''in vivo'' conditions.<ref>{{cite journal|last=Fujimori|first=S|coauthors=Hirai, N; Ohashi, H; Masuoka, K; Nishikimi, A; Fukui, Y; Washio, T; Oshikubo, T; Yamashita, T; Miyamoto-Sato, E|title=Next-generation sequencing coupled with a cell-free display technology for high-throughput production of reliable interactome data.|journal=Scientific reports|year=2012|volume=2|pages=691|pmid=23056904|doi=10.1038/srep00691|pmc=3466446|bibcode = 2012NatSR...2E.691F }}</ref>
 
==Development Initiatives==
 
In October 2006, the [[X Prize Foundation]] established an initiative to promote the development of [[full genome sequencing]] technologies, called the [[Archon X Prize]], intending to award $10 million to "the first Team that can build a device and use it to sequence 100 human genomes within 10 days or less, with an accuracy of no more than one error in every 100,000 bases sequenced, with sequences accurately covering at least 98% of the genome, and at a recurring cost of no more than $10,000 (US) per genome."<ref>[http://genomics.xprize.org/ "PRIZE Overview: Archon X PRIZE for Genomics"]</ref>
 
Each year the [[National Human Genome Research Institute]], or NHGRI, promotes grants for new research and developments in [[genomics]]. 2010 grants and 2011 candidates include continuing work in microfluidic, polony and base-heavy sequencing methodologies.<ref>[http://www.dnasequencing.org/future-outlook The Future of DNA Sequencing]</ref>
 
==Computational Challenges==
 
The sequencing technologies described here produce raw data that need to be assembled into longer sequences such as complete genomes ([[sequence assembly]]). Achieving this poses many computational challenges, such as the evaluation of the raw sequence data, which is done by programs and algorithms such as [[Phred base calling|Phred]] and [[Phrap]]. Other challenges involve [[Repetitive DNA|repetitive]] sequences, which often prevent complete genome assemblies because they occur in many places in the genome. As a consequence, many sequences may not be assigned to particular [[chromosome]]s. The production of raw sequence data is only the beginning of its detailed [[Bioinformatics|bioinformatical]] analysis.<ref>Jessica Severin, Marina Lizio, Jayson Harshbarger, Hideya Kawaji, Carsten O Daub, Yoshihide Hayashizaki, the FANTOM consortium, Nicolas Bertin, and Alistair RR Forrest. Interactive visualization and analysis of large-scale NGS data-sets using ZENBU. Nature Biotechnology, March 2014 {{DOI|10.1038/nbt.2840}}</ref> New methods for sequencing and correcting sequencing errors have also been developed.<ref>{{cite paper |url= http://www.eng.tau.ac.il/~bengal/VOM_EST.pdf|title= Using a VOM Model for Reconstructing Potential Coding Regions in EST Sequences, |author= Shmilovici A. and Ben-Gal I.|publisher= Journal of Computational Statistics, vol. 22, no. 1, 49 - 69.|year=2007}}</ref>
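
For orientation, the quality values used when evaluating raw reads are conventionally expressed on the Phred scale, where a score ''Q'' corresponds to an estimated base-calling error probability ''P'' through ''Q'' = −10 log<sub>10</sub> ''P''. The sketch below only shows this conversion; it is not the Phred program itself, and the ASCII offset of 33 used for decoding FASTQ quality strings is an assumption that holds for the common Sanger-style encoding but not for every file.

```python
import math


def phred_quality(error_probability):
    """Convert an error probability into a Phred-scaled quality score."""
    return -10 * math.log10(error_probability)


def error_probability(quality):
    """Convert a Phred-scaled quality score back into an error probability."""
    return 10 ** (-quality / 10)


def decode_fastq_quality(quality_string, offset=33):
    """Decode a FASTQ quality string, assuming the given ASCII offset."""
    return [ord(char) - offset for char in quality_string]


print(phred_quality(0.001))          # one expected error in 1,000 calls
print(error_probability(20))         # probability implied by Q20
print(decode_fastq_quality("I5!"))   # per-base scores for a 3-base read
```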
 
===Read Trimming===
 
Sometimes, the raw reads produced by the sequencer are correct and precise only over a fraction of their length. Using the entire read may introduce artifacts in downstream analyses such as genome assembly, SNP calling, or gene expression estimation. Two classes of trimming programs have been introduced, based on window-based and running-sum algorithms.<ref>{{cite journal |author=Del Fabbro C, Scalabrin S, Morgante M and Giorgi FM |title=An Extensive Evaluation of Read Trimming Effects on Illumina NGS Data Analysis |journal=PLoS ONE |volume=8 |issue=12 |pages=e85024 |year=2013 |pmid=24376861  |pmc=3871669 |doi=10.1371/journal.pone.0085024 |url=}}</ref> This is a partial list of the trimming algorithms currently available, specifying the algorithm class they belong to:
* [https://code.google.com/p/cutadapt/ Cutadapt] Running sum
* [https://code.google.com/p/condetri/ ConDeTri] Window based
* [http://erne.sourceforge.net/ ERNE-FILTER] Running sum
* [http://hannonlab.cshl.edu/fastx_toolkit/download.html FASTX quality trimmer] Window based
* [http://sourceforge.net/projects/prinseq/files/ PRINSEQ] Window based
* [http://www.usadellab.org/cms/index.php?page=trimmomatic Trimmomatic] Window based
* [http://sourceforge.net/projects/solexaqa/files/ SolexaQA] Window based
* [http://sourceforge.net/projects/solexaqa/files/ SolexaQA-BWA] Running sum
* [https://github.com/najoshi/sickle Sickle] Window based
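
The two algorithm classes can be contrasted with a minimal sketch. Both functions below take a list of per-base quality scores and return the number of bases to keep from the 5' end. They are written in the spirit of the window-based and running-sum approaches and are not the exact implementations of any tool in the list; the function names, the default window size, and the default threshold are assumptions for illustration.

```python
def trim_window(qualities, window=4, threshold=20):
    """Window-based trimming: cut at the first window whose mean quality
    falls below the threshold. Reads shorter than the window are kept whole."""
    for start in range(len(qualities) - window + 1):
        mean_quality = sum(qualities[start:start + window]) / window
        if mean_quality < threshold:
            return start
    return len(qualities)


def trim_running_sum(qualities, threshold=20):
    """Running-sum trimming: walk from the 3' end, accumulate
    (quality - threshold), and cut where the running sum is lowest.
    The walk stops once the sum becomes positive."""
    keep = len(qualities)
    running = 0
    lowest = 0
    for index in range(len(qualities) - 1, -1, -1):
        running += qualities[index] - threshold
        if running > 0:
            break
        if running < lowest:
            lowest = running
            keep = index
    return keep


read_qualities = [42, 40, 26, 27, 8, 7, 11, 4, 2, 3]
print(trim_window(read_qualities, window=4, threshold=10))
print(trim_running_sum(read_qualities, threshold=10))
```

On this example both approaches keep the first four bases, but they can disagree: a window reacts to a local dip in quality anywhere in the read, whereas the running sum only removes a low-quality tail.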
 
==See Also==
 
{{col-begin}}   
{{col-2}}
* [[Cancer genome sequencing]]
* [[DNA field-effect transistor]]
* For a description of the basic technology for tagging DNA with high Z atoms for direct imaging using transmission electron microscopy (TEM) and sequencing strands of >10,000 bp per image captured, see [http://zsgenetics.com/application/GenSeq/fundamentals.html High Z tagging technology].
* [[DNA sequencing theory]]
* [[DNA sequencer]]
* [[Genome project]]
* [[Jumping library]]
{{col-2}}
* [[Multiplex ligation-dependent probe amplification]]
* [[Sequence mining]]
* [[Sequence profiling tool]]
* [[Single molecule real time sequencing]]
* [[Transmission electron microscopy DNA sequencing]]
{{col-end}}
 
==References==


==Citations==
{{reflist|2}}


==External Links==


* [http://www.jgi.doe.gov/education/how/how30minflash.html DNA Sequencing: Dye Terminator Animation]
* A [http://en.wikibooks.org/wiki/Next_Generation_Sequencing_%28NGS%29 wikibook on next generation sequencing]
* [http://www.genomics.xprize.org Archon Genomics X PRIZE] - $10 million competition for fast and inexpensive sequencing technology
* A [http://www.selectscience.net/next_generation_sequencing_buying_guide.aspx guide on next generation sequencing technologies]
* A [http://omictools.com/ free didactic directory for DNA sequencing analysis.]


[[Category:Molecular biology]]
 
[[Category:Molecular biology techniques]]
[[de:DNA-Sequenzierung]]
[[Category:Biotechnology]]
[[es:Secuenciación de ADN]]
[[eo:DNA-vicrivelado]]
[[fr:Séquençage de l'ADN]]
[[id:Sekuensing]]
[[he:ריצוף DNA]]
[[nl:Sequenering]]
[[ja:DNAシークエンシング]]
[[pl:Sekwencjonowanie DNA]]
[[pt:Sequenciamento de DNA]]
[[ru:Секвенирование]]
[[sv:DNA-sekvensering]]
[[vi:Phương pháp Dideoxy]]
[[zh:測序]]
 
{{jb1}}
{{WikiDoc Help Menu}}
{{WikiDoc Sources}}

Latest revision as of 22:51, 5 June 2014

Editor-In-Chief: C. Michael Gibson, M.S., M.D. [1]

Overview

An example of the results of automated chain-termination DNA sequencing.

DNA sequencing is the process of determining the precise order of nucleotides within a DNA molecule. It includes any method or technology that is used to determine the order of the four bases—adenine, guanine, cytosine, and thymine—in a strand of DNA. The advent of rapid DNA sequencing methods has greatly accelerated biological and medical research and discovery.

Knowledge of DNA sequences has become indispensable for basic biological research, and in numerous applied fields such as diagnostic, biotechnology, forensic biology, and biological systematics. The rapid speed of sequencing attained with modern DNA sequencing technology has been instrumental in the sequencing of complete DNA sequences, or genomes of numerous types and species of life, including the human genome and other complete DNA sequences of many animal, plant, and microbial species.

The first DNA sequences were obtained in the early 1970s by academic researchers using laborious methods based on two-dimensional chromatography. Following the development of fluorescence-based sequencing methods with automated analysis,[1] DNA sequencing has become easier and orders of magnitude faster.[2]

Use of Sequencing

DNA sequencing may be used to determine the sequence of individual genes, larger genetic regions (i.e. clusters of genes or operons), full chromosomes or entire genomes. Sequencing provides the order of individual nucleotides in DNA or RNA (commonly represented as A, C, G, T, and U) isolated from cells of animals, plants, bacteria, archaea, or virtually any other source of genetic information. This is useful for:

  • Molecular biology - studying the genome itself, how proteins are made, what proteins are made, identifying new genes and associations with diseases and phenotypes, and identifying potential drug targets
  • Evolutionary biology - studying how different organisms are related and how they evolved
  • Metagenomics - identifying species present in a body of water, sewage, dirt, debris filtered from the air, or swab samples of organisms. Helpful in ecology, epidemiology, microbiome research, and other fields.

Less-precise information is produced by non-sequencing techniques like DNA fingerprinting. This information may be easier to obtain and is useful for:

History

Though the structure of DNA was established as a double helix in 1953,[3] several decades would pass before fragments of DNA could be reliably analyzed for their sequence in the laboratory. RNA sequencing was one of the earliest forms of nucleotide sequencing. The major landmark of RNA sequencing is the sequence of the first complete gene and the complete genome of Bacteriophage MS2, identified and published by Walter Fiers and his coworkers at the University of Ghent (Ghent, Belgium), in 1972[4] and 1976.[5]

The first method for determining DNA sequences involved a location-specific primer extension strategy established by Ray Wu at Cornell University in 1970.[6] DNA polymerase catalysis and specific nucleotide labeling, both of which figure prominently in current sequencing schemes, were used to sequence the cohesive ends of lambda phage DNA.[7][8][9] Between 1970 and 1973, Wu, R Padmanabhan and colleagues demonstrated that this method can be employed to determine any DNA sequence using synthetic location-specific primers.[10][11][12][13] Frederick Sanger then adopted this primer-extension strategy to develop more rapid DNA sequencing methods at the MRC Centre, Cambridge, UK, and published a method for "DNA sequencing with chain-terminating inhibitors" in 1977.[14] Walter Gilbert and Allan Maxam at Harvard also developed sequencing methods, including one for "DNA sequencing by chemical degradation".[15][16] In 1973, Gilbert and Maxam reported the sequence of 24 basepairs using a method known as wandering-spot analysis.[17] Advancements in sequencing were aided by the concurrent development of recombinant DNA technology, allowing DNA samples to be isolated from sources other than viruses.

The first full DNA genome to be sequenced was that of bacteriophage φX174 in 1977.[18] Medical Research Council scientists deciphered the complete DNA sequence of the Epstein-Barr virus in 1984, finding it to be 170 thousand base-pairs long.

A non-radioactive method for transferring the DNA molecules of sequencing reaction mixtures onto an immobilizing matrix during electrophoresis was developed by Pohl and co-workers in the early 1980s.[19][20] This was followed by the commercialization of the DNA sequencer "Direct-Blotting-Electrophoresis-System GATC 1500" by GATC Biotech, which was used intensively in the framework of the EU genome-sequencing programme to determine the complete DNA sequence of the yeast Saccharomyces cerevisiae chromosome II.[21] Leroy E. Hood's laboratory at the California Institute of Technology announced the first semi-automated DNA sequencing machine in 1986.[22] This was followed by Applied Biosystems' marketing of the first fully automated sequencing machine, the ABI 370, in 1987 and by Dupont's Genesis 2000,[23] which used a novel fluorescent labeling technique enabling all four dideoxynucleotides to be identified in a single lane. By 1990, the U.S. National Institutes of Health (NIH) had begun large-scale sequencing trials on Mycoplasma capricolum, Escherichia coli, Caenorhabditis elegans, and Saccharomyces cerevisiae at a cost of US$0.75 per base. Meanwhile, sequencing of human cDNA sequences called expressed sequence tags began in Craig Venter's lab, an attempt to capture the coding fraction of the human genome.[24] In 1995, Venter, Hamilton Smith, and colleagues at The Institute for Genomic Research (TIGR) published the first complete genome of a free-living organism, the bacterium Haemophilus influenzae. The circular chromosome contains 1,830,137 bases and its publication in the journal Science[25] marked the first published use of whole-genome shotgun sequencing, eliminating the need for initial mapping efforts. By 2001, shotgun sequencing methods had been used to produce a draft sequence of the human genome.[26][27]

Several new methods for DNA sequencing were developed in the mid to late 1990s. These techniques comprise the first of the "next-generation" sequencing methods. In 1996, Pål Nyrén and his student Mostafa Ronaghi at the Royal Institute of Technology in Stockholm published their method of pyrosequencing.[28] A year later, Pascal Mayer and Laurent Farinelli submitted patents to the World Intellectual Property Organization describing DNA colony sequencing.[29] Lynx Therapeutics published and marketed "Massively parallel signature sequencing", or MPSS, in 2000. This method incorporated a parallelized, adapter/ligation-mediated, bead-based sequencing technology and served as the first commercially available "next-generation" sequencing method, though no DNA sequencers were sold to independent laboratories.[30] In 2004, 454 Life Sciences marketed a parallelized version of pyrosequencing.[31][32] The first version of their machine reduced sequencing costs 6-fold compared to automated Sanger sequencing, and was the second of the new generation of sequencing technologies, after MPSS.[33]

The large quantities of data produced by DNA sequencing have also required development of new methods and programs for sequence analysis. Phil Green and Brent Ewing of the University of Washington described their phred quality score for sequencer data analysis in 1998.[34]

Basic Methods

Maxam-Gilbert sequencing

Allan Maxam and Walter Gilbert published a DNA sequencing method in 1977 based on chemical modification of DNA and subsequent cleavage at specific bases.[15] Also known as chemical sequencing, this method allowed purified samples of double-stranded DNA to be used without further cloning. This method's use of radioactive labeling and its technical complexity discouraged extensive use after refinements in the Sanger methods had been made.

Maxam-Gilbert sequencing requires radioactive labeling at one 5' end of the DNA and purification of the DNA fragment to be sequenced. Chemical treatment then generates breaks at a small proportion of one or two of the four nucleotide bases in each of four reactions (G, A+G, C, C+T). The concentration of the modifying chemicals is controlled to introduce on average one modification per DNA molecule. Thus a series of labeled fragments is generated, from the radiolabeled end to the first "cut" site in each molecule. The fragments in the four reactions are electrophoresed side by side in denaturing acrylamide gels for size separation. To visualize the fragments, the gel is exposed to X-ray film for autoradiography, yielding a series of dark bands each corresponding to a radiolabeled DNA fragment, from which the sequence may be inferred.[15]
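
The inference step from the four lanes can be sketched in code. The example assumes an idealized gel in which each band has already been converted to the position of the cleaved base counted from the labeled end; real gels need offset corrections and manual judgment for weak or compressed bands. The function name read_gel and the input layout are hypothetical.

```python
def read_gel(lanes):
    """Infer a sequence from band positions in the G, A+G, C and C+T lanes.

    lanes maps each lane name to the set of positions (counted from the
    labeled end) at which a band is seen. A band in both G and A+G is read
    as G, a band in A+G only as A, a band in both C and C+T as C, and a
    band in C+T only as T.
    """
    positions = sorted(set().union(*lanes.values()))
    sequence = []
    for pos in positions:
        in_g, in_ag = pos in lanes["G"], pos in lanes["A+G"]
        in_c, in_ct = pos in lanes["C"], pos in lanes["C+T"]
        if in_g and in_ag:
            sequence.append("G")
        elif in_ag:
            sequence.append("A")
        elif in_c and in_ct:
            sequence.append("C")
        elif in_ct:
            sequence.append("T")
        else:
            raise ValueError(f"band pattern at position {pos} is not interpretable")
    return "".join(sequence)


# Hypothetical band positions for a four-base fragment.
gel = {"G": {1}, "A+G": {1, 2}, "C": {4}, "C+T": {3, 4}}
print(read_gel(gel))
```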

Chain-termination Methods

The chain-termination method developed by Frederick Sanger and coworkers in 1977 soon became the method of choice, owing to its relative ease and reliability.[14][35] When invented, the chain-terminator method used fewer toxic chemicals and lower amounts of radioactivity than the Maxam and Gilbert method. Because of its comparative ease, the Sanger method was soon automated and was the method used in the first generation of DNA sequencers.

Sanger sequencing was the prevailing method from the 1980s until the mid-2000s. Over that period, great advances were made in the technique, such as fluorescent labelling, capillary electrophoresis, and general automation. These developments allowed much more efficient sequencing, leading to lower costs. The Sanger method, in mass production form, is the technology which produced the first human genome in 2001, ushering in the age of genomics. However, later in the decade, radically different approaches reached the market, bringing the cost per genome down from $100 million in 2001 to $10,000 in 2011.[36]

Advanced Methods and de novo Sequencing

Large-scale sequencing often aims at sequencing very long DNA pieces, such as whole chromosomes, although large-scale sequencing can also be used to generate very large numbers of short sequences, such as found in phage display. For longer targets such as chromosomes, common approaches consist of cutting (with restriction enzymes) or shearing (with mechanical forces) large DNA fragments into shorter DNA fragments. The fragmented DNA may then be cloned into a DNA vector and amplified in a bacterial host such as Escherichia coli. Short DNA fragments purified from individual bacterial colonies are individually sequenced and assembled electronically into one long, contiguous sequence. Studies have shown that adding a size selection step to collect DNA fragments of uniform size can improve sequencing efficiency and accuracy of the genome assembly. In these studies, automated sizing has proven to be more reproducible and precise than manual gel sizing.[37][38][39]

The term "de novo sequencing" specifically refers to methods used to determine the sequence of DNA with no previously known sequence. De novo translates from Latin as "from the beginning". Gaps in the assembled sequence may be filled by primer walking. The different strategies have different tradeoffs in speed and accuracy; shotgun methods are often used for sequencing large genomes, but their assembly is complex and difficult, particularly because sequence repeats often cause gaps in the genome assembly.

Most sequencing approaches use an in vitro cloning step to amplify individual DNA molecules, because their molecular detection methods are not sensitive enough for single molecule sequencing. Emulsion PCR[40] isolates individual DNA molecules along with primer-coated beads in aqueous droplets within an oil phase. A polymerase chain reaction (PCR) then coats each bead with clonal copies of the DNA molecule, followed by immobilization for later sequencing. Emulsion PCR is used in the methods developed by Margulies et al. (commercialized by 454 Life Sciences), Shendure and Porreca et al. (also known as "Polony sequencing") and SOLiD sequencing (developed by Agencourt, later Applied Biosystems, now Life Technologies).[41][42][43]

Shotgun Sequencing

Shotgun sequencing is a sequencing method designed for analysis of DNA sequences longer than 1000 base pairs, up to and including entire chromosomes. This method requires the target DNA to be broken into random fragments. After sequencing individual fragments, the sequences can be reassembled on the basis of their overlapping regions.[44]
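The reassembly step, in which fragments are joined on the basis of their overlapping regions, can be sketched with a small greedy merge. This is a toy illustration under strong assumptions (error-free reads, all from the same strand, no repeats); production assemblers are far more elaborate, and the function names and the `min_len` parameter are hypothetical.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b` (0 if shorter than min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len: int = 3):
    """Repeatedly merge the pair of sequences with the largest overlap until none remain."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads that are wholly contained in another read.
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


if __name__ == "__main__":
    reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]
    print(greedy_assemble(reads))
```

With repeated sequence, several merges can look equally good, so a greedy scheme like this one may join the wrong fragments or leave separate contigs.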

Bridge PCR

Another method for in vitro clonal amplification is bridge PCR, in which fragments are amplified upon primers attached to a solid surface[29][45][46] and form "DNA colonies" or "DNA clusters". This method is used in the Illumina Genome Analyzer sequencers. Single-molecule methods, such as that developed by Stephen Quake's laboratory (later commercialized by Helicos) are an exception: they use bright fluorophores and laser excitation to detect base addition events from individual DNA molecules fixed to a surface, eliminating the need for molecular amplification.[47]

Next-Generation Methods

The high demand for low-cost sequencing has driven the development of high-throughput sequencing (or next-generation sequencing) technologies that parallelize the sequencing process, producing thousands or millions of sequences concurrently.[48][49] High-throughput sequencing technologies are intended to lower the cost of DNA sequencing beyond what is possible with standard dye-terminator methods.[33] In ultra-high-throughput sequencing as many as 500,000 sequencing-by-synthesis operations may be run in parallel.[50][51][52]

Comparison of next-generation sequencing methods[53][54]
| Method | Single-molecule real-time sequencing (Pacific Bio) | Ion semiconductor (Ion Torrent sequencing) | Pyrosequencing (454) | Sequencing by synthesis (Illumina) | Sequencing by ligation (SOLiD sequencing) | Chain termination (Sanger sequencing) |
|---|---|---|---|---|---|---|
| Read length | 5,500 bp to 8,500 bp avg (10,000 bp N50); maximum read length >30,000 bases[55][56][57] | up to 400 bp | 700 bp | 50 to 300 bp | 50+35 or 50+50 bp | 400 to 900 bp |
| Accuracy | 99.999% consensus accuracy; 87% single-read accuracy[58] | 98% | 99.9% | 98% | 99.9% | 99.9% |
| Reads per run | 50,000 per SMRT cell, or ~400 megabases[59][60] | up to 80 million | 1 million | up to 3 billion | 1.2 to 1.4 billion | N/A |
| Time per run | 30 minutes to 2 hours[61] | 2 hours | 24 hours | 1 to 10 days, depending upon sequencer and specified read length[62] | 1 to 2 weeks | 20 minutes to 3 hours |
| Cost per 1 million bases (in US$) | $0.33-$1.00 | $1 | $10 | $0.05 to $0.15 | $0.13 | $2400 |
| Advantages | Longest read length. Fast. Detects 4mC, 5mC, 6mA.[63] | Less expensive equipment. Fast. | Long read size. Fast. | Potential for high sequence yield, depending upon sequencer model and desired application. | Low cost per base. | Long individual reads. Useful for many applications. |
| Disadvantages | Moderate throughput. Equipment can be very expensive. | Homopolymer errors. | Runs are expensive. Homopolymer errors. | Equipment can be very expensive. Requires high concentrations of DNA. | Slower than other methods. Has issues sequencing palindromic sequences.[64] | More expensive and impractical for larger sequencing projects. |

Massively Parallel Signature Sequencing (MPSS)

The first of the next-generation sequencing technologies, massively parallel signature sequencing (or MPSS), was developed in the 1990s at Lynx Therapeutics, a company founded in 1992 by Sydney Brenner and Sam Eletr. MPSS was a bead-based method that used a complex approach of adapter ligation followed by adapter decoding, reading the sequence in increments of four nucleotides. This method made it susceptible to sequence-specific bias or loss of specific sequences. Because the technology was so complex, MPSS was only performed 'in-house' by Lynx Therapeutics and no DNA sequencing machines were sold to independent laboratories. Lynx Therapeutics merged with Solexa (later acquired by Illumina) in 2004, leading to the development of sequencing-by-synthesis, a simpler approach acquired from Manteia Predictive Medicine, which rendered MPSS obsolete. However, the essential properties of the MPSS output were typical of later "next-generation" data types, including hundreds of thousands of short DNA sequences. In the case of MPSS, these were typically used for sequencing cDNA for measurements of gene expression levels.[65]

Polony Sequencing

The Polony sequencing method, developed in the laboratory of George M. Church at Harvard, was among the first next-generation sequencing systems and was used to sequence a full genome in 2005. It combined an in vitro paired-tag library with emulsion PCR, an automated microscope, and ligation-based sequencing chemistry to sequence an E. coli genome at an accuracy of >99.9999% and a cost approximately 1/9 that of Sanger sequencing.[66] The technology was licensed to Agencourt Biosciences, subsequently spun out into Agencourt Personal Genomics, and eventually incorporated into the Applied Biosystems SOLiD platform, which is now owned by Life Technologies, which was recently bought by Thermo Fisher Scientific.

454 Pyrosequencing

A parallelized version of pyrosequencing was developed by 454 Life Sciences, which has since been acquired by Roche Diagnostics. The method amplifies DNA inside water droplets in an oil solution (emulsion PCR), with each droplet containing a single DNA template attached to a single primer-coated bead that then forms a clonal colony. The sequencing machine contains many picoliter-volume wells each containing a single bead and sequencing enzymes. Pyrosequencing uses luciferase to generate light for detection of the individual nucleotides added to the nascent DNA, and the combined data are used to generate sequence read-outs.[41] This technology provides intermediate read length and price per base compared to Sanger sequencing on one end and Solexa and SOLiD on the other.[33]

Illumina (Solexa) Sequencing

Solexa, now part of Illumina, was founded by Shankar Balasubramanian and David Klenerman in 1998, and developed a sequencing method based on reversible dye-terminator technology and engineered polymerases.[67] The reversible terminator chemistry was developed internally at Solexa, and the concept of the Solexa system was invented by Balasubramanian and Klenerman from Cambridge University's chemistry department. In 2004, Solexa acquired the company Manteia Predictive Medicine in order to gain a massively parallel sequencing technology based on "DNA clusters", which involves the clonal amplification of DNA on a surface. The cluster technology was co-acquired with Lynx Therapeutics of California. Solexa Ltd. later merged with Lynx to form Solexa Inc.

In this method, DNA molecules and primers are first attached on a slide and amplified with polymerase so that local clonal DNA colonies, later coined "DNA clusters", are formed. To determine the sequence, four types of reversible terminator bases (RT-bases) are added and non-incorporated nucleotides are washed away. A camera takes images of the fluorescently labeled nucleotides, then the dye, along with the terminal 3' blocker, is chemically removed from the DNA, allowing for the next cycle to begin. Unlike pyrosequencing, the DNA chains are extended one nucleotide at a time and image acquisition can be performed at a delayed moment, allowing for very large arrays of DNA colonies to be captured by sequential images taken from a single camera.

Decoupling the enzymatic reaction and the image capture allows for optimal throughput and theoretically unlimited sequencing capacity. With an optimal configuration, the ultimately reachable instrument throughput is thus dictated solely by the analog-to-digital conversion rate of the camera, multiplied by the number of cameras and divided by the number of pixels per DNA colony required for visualizing them optimally (approximately 10 pixels/colony). In 2012, with cameras operating at more than 10 MHz A/D conversion rates and available optics, fluidics and enzymatics, throughput can be multiples of 1 million nucleotides/second, corresponding roughly to 1 human genome equivalent at 1x coverage per hour per instrument, and 1 human genome re-sequenced (at approx. 30x) per day per instrument (equipped with a single camera).[68]
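The throughput estimate above can be checked with a few lines of arithmetic. The sketch below applies the stated relation (A/D conversion rate times number of cameras, divided by pixels per colony) to the quoted figures; the human genome size of roughly 3.2 billion bases is an assumption introduced here and is not given in the text.

```python
HUMAN_GENOME_BASES = 3.2e9  # assumed approximate haploid genome size, not from the text


def instrument_throughput(adc_rate_hz: float, cameras: int, pixels_per_colony: float) -> float:
    """Upper bound on nucleotides read per second, per the relation described above."""
    return adc_rate_hz * cameras / pixels_per_colony


if __name__ == "__main__":
    rate = instrument_throughput(adc_rate_hz=10e6, cameras=1, pixels_per_colony=10)
    genomes_per_hour = rate * 3600 / HUMAN_GENOME_BASES
    coverage_per_day = rate * 86400 / HUMAN_GENOME_BASES
    print(f"{rate:,.0f} nucleotides/second")
    print(f"about {genomes_per_hour:.1f} genome equivalents (1x) per hour")
    print(f"about {coverage_per_day:.0f}x coverage of one genome per day")
```

With a 10 MHz camera and 10 pixels per colony, the bound is 1 million nucleotides per second, which under the assumed genome size is consistent with roughly one genome equivalent per hour and close to 30x coverage per day.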

SOLiD Sequencing

Applied Biosystems (now a Life Technologies brand) SOLiD technology employs sequencing by ligation. Here, a pool of all possible oligonucleotides of a fixed length is labeled according to the sequenced position. Oligonucleotides are annealed and ligated; the preferential ligation by DNA ligase for matching sequences results in a signal informative of the nucleotide at that position. Before sequencing, the DNA is amplified by emulsion PCR. The resulting beads, each carrying clonal copies of the same DNA molecule, are deposited on a glass slide.[69] The result is sequences of quantities and lengths comparable to Illumina sequencing.[33] This sequencing by ligation method has been reported to have difficulty sequencing palindromic sequences.[64]

Ion Torrent Semiconductor Sequencing

Ion Torrent Systems Inc. (now owned by Life Technologies) developed a system based on using standard sequencing chemistry, but with a novel, semiconductor based detection system. This method of sequencing is based on the detection of hydrogen ions that are released during the polymerisation of DNA, as opposed to the optical methods used in other sequencing systems. A microwell containing a template DNA strand to be sequenced is flooded with a single type of nucleotide. If the introduced nucleotide is complementary to the leading template nucleotide it is incorporated into the growing complementary strand. This causes the release of a hydrogen ion that triggers a hypersensitive ion sensor, which indicates that a reaction has occurred. If homopolymer repeats are present in the template sequence multiple nucleotides will be incorporated in a single cycle. This leads to a corresponding number of released hydrogens and a proportionally higher electronic signal.[70]
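The flow-by-flow logic described above, including the proportionally larger signal for homopolymer runs, can be sketched as a small simulation. It is an idealized model with noise-free signals; the flow order "TACG" and the function names are assumptions made for illustration, and the input is the strand being synthesized (the complement of the template).

```python
from itertools import cycle


def flow_signals(synthesized: str, flow_order: str = "TACG"):
    """Return (nucleotide, signal) per flow; signal counts bases incorporated in that flow."""
    if any(base not in flow_order for base in synthesized):
        raise ValueError("sequence contains a base that is never flowed")
    signals = []
    position = 0
    flows = cycle(flow_order)
    while position < len(synthesized):
        nucleotide = next(flows)
        incorporated = 0
        # A homopolymer run is extended completely within a single flow.
        while position < len(synthesized) and synthesized[position] == nucleotide:
            incorporated += 1
            position += 1
        signals.append((nucleotide, incorporated))
    return signals


def decode(signals) -> str:
    """Rebuild the synthesized strand from idealized integer signals."""
    return "".join(nucleotide * count for nucleotide, count in signals)


if __name__ == "__main__":
    signals = flow_signals("TTAG")
    print(signals)
    print(decode(signals))
```

In practice the signal for a long homopolymer is not perfectly proportional to its length, which is one way to understand the homopolymer errors listed for this method in the comparison table.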

DNA Nanoball Sequencing

DNA nanoball sequencing is a type of high throughput sequencing technology used to determine the entire genomic sequence of an organism. The company Complete Genomics uses this technology to sequence samples submitted by independent researchers. The method uses rolling circle replication to amplify small fragments of genomic DNA into DNA nanoballs. Unchained sequencing by ligation is then used to determine the nucleotide sequence.[71] This method of DNA sequencing allows large numbers of DNA nanoballs to be sequenced per run and at low reagent costs compared to other next generation sequencing platforms.[72] However, only short sequences of DNA are determined from each DNA nanoball which makes mapping the short reads to a reference genome difficult.[73] This technology has been used for multiple genome sequencing projects and is scheduled to be used for more.[74]

Heliscope Single Molecule Sequencing

Heliscope sequencing is a method of single-molecule sequencing developed by Helicos Biosciences. It uses DNA fragments with added poly-A tail adapters which are attached to the flow cell surface. The next steps involve extension-based sequencing with cyclic washes of the flow cell with fluorescently labeled nucleotides (one nucleotide type at a time, as with the Sanger method). The reads are performed by the Heliscope sequencer. The reads are short, up to 55 bases per run, but recent improvements allow for more accurate reads of stretches of one type of nucleotides.[75][76]

This sequencing method and equipment were used to sequence the genome of the M13 bacteriophage.[77]

Single Molecule Real Time (SMRT) sequencing

SMRT sequencing is based on the sequencing by synthesis approach. The DNA is synthesized in zero-mode waveguides (ZMWs), small well-like containers with the capturing tools located at the bottom of the well. The sequencing is performed with the use of an unmodified polymerase (attached to the ZMW bottom) and fluorescently labelled nucleotides flowing freely in the solution. The wells are constructed so that only the fluorescence occurring at the bottom of the well is detected. The fluorescent label is detached from the nucleotide upon its incorporation into the DNA strand, leaving an unmodified DNA strand. According to Pacific Biosciences, the SMRT technology developer, this methodology allows detection of nucleotide modifications (such as cytosine methylation) through the observation of polymerase kinetics. This approach allows reads of 20,000 nucleotides or more, with average read lengths of 5 kilobases.[59][78]

Methods in Development

DNA sequencing methods currently under development include labeling the DNA polymerase,[79] reading the sequence as a DNA strand transits through nanopores,[80][81] and microscopy-based techniques, such as atomic force microscopy or transmission electron microscopy that are used to identify the positions of individual nucleotides within long DNA fragments (>5,000 bp) by nucleotide labeling with heavier elements (e.g., halogens) for visual detection and recording.[82][83] Third generation technologies aim to increase throughput and decrease the time to result and cost by eliminating the need for excessive reagents and harnessing the processivity of DNA polymerase.[84]

Nanopore DNA Sequencing

This method is based on the readout of electrical signals occurring as nucleotides pass through alpha-hemolysin pores covalently bound with cyclodextrin. The DNA passing through the nanopore changes its ion current. This change is dependent on the shape, size and length of the DNA sequence. Each type of nucleotide blocks the ion flow through the pore for a different period of time. The method has potential for further development because it does not require modified nucleotides; however, single nucleotide resolution is not yet available.[85]

Two main areas of nanopore sequencing in development are solid-state nanopore sequencing and protein-based nanopore sequencing. Protein nanopore sequencing utilizes the membrane protein complexes α-hemolysin and MspA (Mycobacterium smegmatis porin A), which show great promise given their ability to distinguish between individual nucleotides and groups of nucleotides.[86] Solid-state nanopore sequencing, in contrast, utilizes synthetic materials such as silicon nitride and aluminum oxide, and is preferred for its superior mechanical ability and its thermal and chemical stability.[87] The fabrication method is essential for this type of sequencing, given that the nanopore array can contain hundreds of pores with diameters smaller than eight nanometers.[86]

The concept originated from the idea that single-stranded DNA or RNA molecules can be electrophoretically driven in a strict linear sequence through a biological pore that can be less than eight nanometers wide, and can be detected because the molecules alter the ionic current while moving through the pore. The pore contains a detection region capable of recognizing different bases, with each base generating a distinct time-specific signal; the signals, which correspond to the sequence of bases as they cross the pore, are then evaluated.[87] Precise control over DNA transport through the pore is crucial for the success of this process. Various enzymes such as exonucleases and polymerases have been used to moderate the transport by positioning them near the pore's entrance.[88]

Tunnelling Currents DNA Sequencing

Another approach uses measurements of the electrical tunnelling currents across single-strand DNA as it moves through a channel. Depending on its electronic structure each base affects the tunnelling current differently, allowing differentiation between different bases.[89]

The use of tunnelling currents has the potential to sequence orders of magnitude faster than ionic current methods and the sequencing of several DNA oligomers and micro-RNA has already been achieved.[90]

Sequencing by hybridization

Sequencing by hybridization is a non-enzymatic method that uses a DNA microarray. A single pool of DNA whose sequence is to be determined is fluorescently labeled and hybridized to an array containing known sequences. Strong hybridization signals from a given spot on the array identify its sequence in the DNA being sequenced.[91]
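The reconstruction idea can be sketched by treating each array spot with a strong signal as a known k-mer present in the target, then chaining k-mers that overlap by k-1 bases. This is an idealized illustration (perfect hybridization, no repeated (k-1)-mers in the target); the function names are hypothetical, and the code raises an error rather than guessing when the chain is ambiguous.

```python
def spectrum(target: str, k: int) -> set:
    """The set of k-mers in the target, standing in for the spots that light up on the array."""
    return {target[i:i + k] for i in range(len(target) - k + 1)}


def reconstruct(kmers) -> str:
    """Chain k-mers by (k-1)-base overlaps; fail if the order is not uniquely determined."""
    kmers = set(kmers)
    suffixes = {kmer[1:] for kmer in kmers}
    # The first k-mer is the only one whose prefix is not the suffix of another k-mer.
    starts = [kmer for kmer in kmers if kmer[:-1] not in suffixes]
    if len(starts) != 1:
        raise ValueError("cannot determine a unique starting k-mer")
    current = starts[0]
    sequence = current
    remaining = kmers - {current}
    while remaining:
        successors = [kmer for kmer in remaining if kmer[:-1] == current[1:]]
        if len(successors) != 1:
            raise ValueError("ambiguous or broken k-mer chain")
        current = successors[0]
        sequence += current[-1]
        remaining.remove(current)
    return sequence


if __name__ == "__main__":
    probes_with_signal = spectrum("ATGCGTAC", 3)
    print(reconstruct(probes_with_signal))
```

Repeats longer than k-1 bases make the chain ambiguous, which limits the length of sequence that can be resolved for a given probe length.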

This method of sequencing utilizes the binding characteristics of a library of short single-stranded DNA molecules (oligonucleotides), also called DNA probes, to reconstruct a target DNA sequence. Non-specific hybrids are removed by washing and the target DNA is eluted.[92] Hybrids are re-arranged such that the DNA sequence can be reconstructed. The benefit of this sequencing type is its ability to capture a large number of targets with homogeneous coverage.[93] A large quantity of chemicals and starting DNA is usually required; however, with the advent of solution-based hybridization, much less equipment and fewer chemicals are necessary.[92]

Sequencing with mass spectrometry

Mass spectrometry may be used to determine DNA sequences. Matrix-assisted laser desorption ionization time-of-flight mass spectrometry, or MALDI-TOF MS, has specifically been investigated as an alternative method to gel electrophoresis for visualizing DNA fragments. With this method, DNA fragments generated by chain-termination sequencing reactions are compared by mass rather than by size. The mass of each nucleotide is different from the others and this difference is detectable by mass spectrometry. Single-nucleotide mutations in a fragment can be more easily detected with MS than by gel electrophoresis alone. MALDI-TOF MS can more easily detect differences between RNA fragments, so researchers may indirectly sequence DNA with MS-based methods by converting it to RNA first.[94]

The higher resolution of DNA fragments permitted by MS-based methods is of special interest to researchers in forensic science, as they may wish to find single-nucleotide polymorphisms in human DNA samples to identify individuals. These samples may be highly degraded so forensic researchers often prefer mitochondrial DNA for its higher stability and applications for lineage studies. MS-based sequencing methods have been used to compare the sequences of human mitochondrial DNA from samples in a Federal Bureau of Investigation database[95] and from bones found in mass graves of World War I soldiers.[96]

Early chain-termination and TOF MS methods demonstrated read lengths of up to 100 base pairs.[97] Researchers have been unable to exceed this average read size; like chain-termination sequencing alone, MS-based DNA sequencing may not be suitable for large de novo sequencing projects. Even so, a recent study did use the short sequence reads and mass spectrometry to compare single-nucleotide polymorphisms in pathogenic Streptococcus strains.[98]

Microfluidic Sanger Sequencing

In microfluidic Sanger sequencing the entire thermocycling amplification of DNA fragments as well as their separation by electrophoresis is done on a single glass wafer (approximately 10 cm in diameter) thus reducing the reagent usage as well as cost.[99] In some instances researchers have shown that they can increase the throughput of conventional sequencing through the use of microchips.[100] Research will still need to be done in order to make this use of technology effective.

Microscopy-Based Techniques

This approach directly visualizes the sequence of DNA molecules using electron microscopy. The first identification of DNA base pairs within intact DNA molecules was achieved by enzymatically incorporating modified bases, which contain atoms of increased atomic number. Direct visualization and identification of individually labeled bases have been demonstrated within a synthetic 3,272 base-pair DNA molecule and a 7,249 base-pair viral genome.[101]

RNAP Sequencing

This method is based on use of RNA polymerase (RNAP), which is attached to a polystyrene bead. One end of DNA to be sequenced is attached to another bead, with both beads being placed in optical traps. RNAP motion during transcription brings the beads in closer and their relative distance changes, which can then be recorded at a single nucleotide resolution. The sequence is deduced based on the four readouts with lowered concentrations of each of the four nucleotide types, similarly to the Sanger method.[102]

RNA polymerase is attached to one polystyrene bead, and the distal end of a DNA fragment is attached to another. Each bead is then placed in an optical trap that levitates it. The interactions between the RNAP and the DNA result in a change in the length of the DNA between the two beads. This change is then measured with a precision that gives single-base resolution on a single DNA molecule. The measurement is repeated four times, each time with a lower concentration of one of the four nucleotides, which shares some similarity with the Sanger sequencing method. Sequence information is deduced by comparing the known sequence regions to the unknown sequence regions.[103]

In vitro Virus High-Throughput Sequencing

A method has been developed to analyze full sets of protein interactions using a combination of 454 pyrosequencing and an in vitro virus mRNA display method. Specifically, this method covalently links proteins of interest to the mRNAs encoding them, then detects the mRNA pieces using reverse transcription PCRs. The mRNA may then be amplified and sequenced. The combined method was titled IVV-HiTSeq and can be performed under cell-free conditions, though its results may not be representative of in vivo conditions.[104]

Development Initiatives

In October 2006, the X Prize Foundation established an initiative to promote the development of full genome sequencing technologies, called the Archon X Prize, intending to award $10 million to "the first Team that can build a device and use it to sequence 100 human genomes within 10 days or less, with an accuracy of no more than one error in every 100,000 bases sequenced, with sequences accurately covering at least 98% of the genome, and at a recurring cost of no more than $10,000 (US) per genome."[105]

Each year the National Human Genome Research Institute, or NHGRI, promotes grants for new research and developments in genomics. 2010 grants and 2011 candidates include continuing work in microfluidic, polony and base-heavy sequencing methodologies.[106]

Computational Challenges

The sequencing technologies described here produce raw data that needs to be assembled into longer sequences such as complete genomes (sequence assembly). Achieving this poses many computational challenges, such as the evaluation of the raw sequence data, which is done by programs and algorithms such as Phred and Phrap. Other challenges concern repetitive sequences, which often prevent complete genome assemblies because they occur in many places in the genome. As a consequence, many sequences may not be assigned to particular chromosomes. The production of raw sequence data is only the beginning of its detailed bioinformatical analysis.[107] New methods for sequencing and for correcting sequencing errors continue to be developed.[108][2]

Read Trimming

Sometimes, the raw reads produced by the sequencer are correct and precise only in a fraction of their length. Using the entire read may introduce artifacts in downstream analyses such as genome assembly, SNP calling, or gene expression estimation. Two classes of trimming programs have been introduced, based on the window-based or the running-sum classes of algorithms.[109] This is a partial list of the trimming algorithms currently available, specifying the algorithm class they belong to:
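To illustrate the two algorithm classes named above (this sketch is not an entry from the list of existing tools), the following functions trim a read given its per-base quality scores. Both are minimal versions written for this example: the window size, the threshold of 20, and the function names are assumptions, and real trimming programs differ in their details.

```python
def trim_window(qualities, window: int = 4, threshold: float = 20.0) -> int:
    """Window-based: return the cut index at the first window whose mean quality is below threshold."""
    if not qualities:
        return 0
    last_start = max(len(qualities) - window + 1, 1)
    for start in range(last_start):
        scores = qualities[start:start + window]
        if sum(scores) / len(scores) < threshold:
            return start  # keep bases [0, start)
    return len(qualities)


def trim_running_sum(qualities, threshold: float = 20.0) -> int:
    """Running-sum: scan from the 3' end, summing (threshold - q); cut where the sum peaks."""
    running = 0.0
    best = 0.0
    cut = len(qualities)
    for i in range(len(qualities) - 1, -1, -1):
        running += threshold - qualities[i]
        if running < 0:
            break  # the good-quality region has been reached
        if running > best:
            best = running
            cut = i
    return cut  # keep bases [0, cut)


if __name__ == "__main__":
    quals = [30, 32, 31, 30, 28, 25, 12, 10, 8, 5]
    print("window-based keeps", trim_window(quals), "bases")
    print("running-sum keeps", trim_running_sum(quals), "bases")
```

On the example read, the window-based version cuts earlier because a window is rejected as soon as it includes enough low-quality bases, while the running-sum version removes only the low-quality tail.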

References

  1. Olsvik, Ørjan; Wahlberg, Johan; Petterson B; et al. (January 1993). "Use of automated sequencing of polymerase chain reaction-generated amplicons to identify three types of cholera toxin subunit B in Vibrio cholerae O1 strains". J. Clin. Microbiol. 31 (1): 22–25. PMC 262614. PMID 7678018.
  2. Pettersson E, Lundeberg J, Ahmadian A (February 2009). "Generations of sequencing technologies". Genomics. 93 (2): 105–11. doi:10.1016/j.ygeno.2008.10.003. PMID 18992322.
  3. Watson JD, Crick FH (1953). "The structure of DNA". Cold Spring Harb. Symp. Quant. Biol. 18: 123–31. doi:10.1101/SQB.1953.018.01.020. PMID 13168976.
  4. Min Jou W, Haegeman G, Ysebaert M, Fiers W (May 1972). "Nucleotide sequence of the gene coding for the bacteriophage MS2 coat protein". Nature. 237 (5350): 82–8. Bibcode:1972Natur.237...82J. doi:10.1038/237082a0. PMID 4555447.
  5. Fiers W; Contreras R; Duerinck F; et al. (April 1976). "Complete nucleotide sequence of bacteriophage MS2 RNA: primary and secondary structure of the replicase gene". Nature. 260 (5551): 500–7. Bibcode:1976Natur.260..500F. doi:10.1038/260500a0. PMID 1264203.
  6. Cornell University. http://mbg.cornell.edu/faculty-staff/faculty/wu.cfm.
  7. Padmanabhan, R; Ray Wu; Ernest Jay (June 1974). "Chemical Synthesis of a Primer and Its Use in the Sequence Analysis of the Lysozyme Gene of Bacteriophage T4". Proceedings of the National Academy of Sciences. 71 (6): 2510–2514. doi:10.1073/pnas.71.6.2510.
  8. Onaga, Lisa (June 2014). "Ray Wu as Fifth Business: Demonstrating Collective Memory in the History of DNA Sequencing". Studies in the History and Philosophy of Science. Part C. 46: 1–14. doi:10.1016/j.shpsc.2013.12.006. Retrieved May 7, 2014.
  9. Wu, Ray (19 April 1972). "Nucleotide Sequence Analysis of DNA". Nature: 198–200. doi:10.1038/newbio236198a0. Retrieved May 7, 2014.
  10. Padmanabhan, R; Ray Wu (1972). "Use of oligonucleotides of defined sequences as primers in DNA sequence analysis". Biochemical and Biophysical Research. 48: 1295–1302.
  11. Cornell. http://mbg.cornell.edu/faculty-staff/faculty/wu.cfm. Retrieved May 7, 2014.
  12. Wu, R; Padmanabhan, R (1973). Biochemical and Biophysics Research. 55: 1092–1098.
  13. Jay, Ernest; Ray Wu; R Padmanabhan; Robert Bambara (March 1974). "DNA sequence analysis: a general, simple and rapid method for sequencing large oligodeoxyribonucleotide fragments by mapping". Nucleic Acids Research. 1: 331–353. doi:10.1093/nar/1.3.331. PMC 344020.
  14. Sanger F, Nicklen S, Coulson AR (December 1977). "DNA sequencing with chain-terminating inhibitors". Proc. Natl. Acad. Sci. U.S.A. 74 (12): 5463–7. Bibcode:1977PNAS...74.5463S. doi:10.1073/pnas.74.12.5463. PMC 431765. PMID 271968.
  15. Maxam AM, Gilbert W (February 1977). "A new method for sequencing DNA". Proc. Natl. Acad. Sci. U.S.A. 74 (2): 560–4. Bibcode:1977PNAS...74..560M. doi:10.1073/pnas.74.2.560. PMC 392330. PMID 265521.
  16. Gilbert, W. DNA sequencing and gene structure. Nobel lecture, 8 December 1980.
  17. Gilbert W, Maxam A (December 1973). "The Nucleotide Sequence of the lac Operator". Proc. Natl. Acad. Sci. U.S.A. 70 (12): 3581–4. Bibcode:1973PNAS...70.3581G. doi:10.1073/pnas.70.12.3581. PMC 427284. PMID 4587255.
  18. Sanger F; Air GM; Barrell BG; et al. (February 1977). "Nucleotide sequence of bacteriophage phi X174 DNA". Nature. 265 (5596): 687–95. Bibcode:1977Natur.265..687S. doi:10.1038/265687a0. PMID 870828.
  19. Beck S and Pohl F M, 1984, DNA sequencing with direct blotting electrophoresis. EMBO J., 3(12): 2905 - 2909. PMC557787
  20. United States Patent 4,631,122 (1986)
  21. Feldmann, H et al., 1994, Complete DNA sequence of yeast chromosome II. EMBO J., 1994; 13(24): 5795–5809. PMC395553
  22. Smith, LM (June 12, 1986). "Fluorescence Detection in Automated DNA Sequence Analysis". Nature. 321 (6071): 674–79. doi:10.1038/321674a0. PMID 3713851.
  23. Prober, JM (Oct 16, 1987). "A system for rapid DNA sequencing with fluorescent chain-terminating dideoxynucleotides". Science (New York, N.Y.). 238 (4825): 336–41. doi:10.1126/science.2443975. PMID 2443975.
  24. Adams MD; Kelley JM; Gocayne JD; et al. (June 1991). "Complementary DNA sequencing: expressed sequence tags and human genome project". Science. 252 (5013): 1651–6. Bibcode:1991Sci...252.1651A. doi:10.1126/science.2047873. PMID 2047873.
  25. Fleischmann RD; Adams MD; White O; et al. (July 1995). "Whole-genome random sequencing and assembly of Haemophilus influenzae Rd". Science. 269 (5223): 496–512. Bibcode:1995Sci...269..496F. doi:10.1126/science.7542800. PMID 7542800.
  26. Lander ES; Linton LM; Birren B; et al. (February 2001). "Initial sequencing and analysis of the human genome". Nature. 409 (6822): 860–921. doi:10.1038/35057062. PMID 11237011.
  27. Venter JC; Adams MD; Myers EW; et al. (February 2001). "The sequence of the human genome". Science. 291 (5507): 1304–51. Bibcode:2001Sci...291.1304V. doi:10.1126/science.1058040. PMID 11181995.
  28. M. Ronaghi, S. Karamohamed, B. Pettersson, M. Uhlen, and P. Nyren (1996). "Real-time DNA sequencing using detection of pyrophosphate release". Analytical Biochemistry. 242 (1): 84–9. doi:10.1006/abio.1996.0432. PMID 8923969.
  29. Kawashima, Eric H.; Laurent Farinelli; Pascal Mayer (2005-05-12). "Patent: Method of nucleic acid amplification". Retrieved 2012-12-22.
  30. Brenner S; et al. (2000). "Gene expression analysis by massively parallel signature sequencing (MPSS) on microbead arrays". Nature Biotechnology. 18 (6): 630–634. doi:10.1038/76469. PMID 10835600.
  31. Stein RA (1 September 2008). "Next-Generation Sequencing Update". Genetic Engineering & Biotechnology News. 28 (15).
  32. Margulies M; Egholm M; Altman WE; et al. (September 2005). "Genome Sequencing in Open Microfabricated High Density Picoliter Reactors". Nature. 437 (7057): 376–80. Bibcode:2005Natur.437..376M. doi:10.1038/nature03959. PMC 1464427. PMID 16056220.
  33. Schuster Stephan C. (January 2008). "Next-generation sequencing transforms today's biology". Nat. Methods. 5 (1): 16–8. doi:10.1038/nmeth1156. PMID 18165802.
  34. Ewing B, Green P (March 1998). "Base-calling of automated sequencer traces using phred. II. Error probabilities". Genome Res. 8 (3): 186–94. doi:10.1101/gr.8.3.186. PMID 9521922.
  35. Sanger F, Coulson AR (May 1975). "A rapid method for determining sequences in DNA by primed synthesis with DNA polymerase". J. Mol. Biol. 94 (3): 441–8. doi:10.1016/0022-2836(75)90213-2. PMID 1100841.
  36. Wetterstrand, Kris. "DNA Sequencing Costs: Data from the NHGRI Genome Sequencing Program (GSP)". National Human Genome Research Institute. Retrieved 30 May 2013.
  37. http://onlinelibrary.wiley.com/doi/10.1002/elps.201200128/abstract
  38. http://onlinelibrary.wiley.com/doi/10.1111/j.1462-2920.2012.02791.x/abstract;jsessionid=C705EAD430A7C16FE74774C8B4B6F814.f02t01
  39. http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0037135
  40. Richard Williams, Sergio G Peisajovich, Oliver J Miller, Shlomo Magdassi, Dan S Tawfik, Andrew D Griffiths (2006). "Amplification of complex gene libraries by emulsion PCR". Nature methods. 3 (7): 545–550. doi:10.1038/nmeth896. PMID 16791213.
  41. Margulies M; Egholm M; Altman WE; et al. (September 2005). "Genome Sequencing in Open Microfabricated High Density Picoliter Reactors". Nature. 437 (7057): 376–80. Bibcode:2005Natur.437..376M. doi:10.1038/nature03959. PMC 1464427. PMID 16056220.
  42. Shendure, J.; Porreca, GJ; Reppas, NB; Lin, X; McCutcheon, JP; Rosenbaum, AM; Wang, MD; Zhang, K; et al. (2005). "Accurate Multiplex Polony Sequencing of an Evolved Bacterial Genome". Science. 309 (5741): 1728–32. Bibcode:2005Sci...309.1728S. doi:10.1126/science.1117389. PMID 16081699.
  43. Applied Biosystems' SOLiD technology
  44. Staden, R (Jun 11, 1979). "A strategy of DNA sequencing employing computer programs". Nucleic Acids Research. 6 (7): 2601–10. doi:10.1093/nar/6.7.2601. PMC 327874. PMID 461197.
  45. P. Mayer,L. Farinelli, G. Matton, C. Adessi, G. Turcatti, J. J. Mermod, E. Kawashima.DNA colony massively parallel sequencing ams98 presentation
  46. U.S. Patent 5,641,658
  47. Braslavsky I, Hebert B, Kartalov E, Quake SR (April 2003). "Sequence information can be obtained from single DNA molecules". Proc. Natl. Acad. Sci. U.S.A. 100 (7): 3960–4. Bibcode:2003PNAS..100.3960B. doi:10.1073/pnas.0230489100. PMC 153030. PMID 12651960.
  48. Hall, Neil (May 2007). "Advanced sequencing technologies and their wider impact in microbiology". J. Exp. Biol. 209 (Pt 9): 1518–1525. doi:10.1242/jeb.001370. PMID 17449817.
  49. Church, George M. (January 2006). "Genomes for all". Sci. Am. 294 (1): 46–54. doi:10.1038/scientificamerican0106-46. PMID 16468433.
  50. Kalb, Gilbert; Moxley, Robert (1992). Massively Parallel, Optical, and Neural Computing in the United States. IOS Press. ISBN 90-5199-097-9.
  51. PMID 18832462
  52. PMID 19679224
  53. Quail, Michael; Smith, Miriam E; Coupland, Paul; et al. (1 January 2012). "A tale of three next generation sequencing platforms: comparison of Ion torrent, pacific biosciences and illumina MiSeq sequencers". BMC Genomics. 13 (1): 341. doi:10.1186/1471-2164-13-341. PMC 3431227. PMID 22827831.
  54. Liu, Lin; Li, Yinhu; Li, Siliang; et al. (1 January 2012). "Comparison of Next-Generation Sequencing Systems". Journal of Biomedicine and Biotechnology. Hindawi Publishing Corporation. 2012: 1–11. doi:10.1155/2012/251364.
  55. New Products: PacBio's RS II; Cufflinks | In Sequence | Sequencing | GenomeWeb
  56. "After a Year of Testing, Two Early PacBio Customers Expect More Routine Use of RS Sequencer in 2012". GenomeWeb. 10 January 2012.
  57. Pacific Biosciences Introduces New Chemistry With Longer Read Lengths
  58. http://www.nature.com/nmeth/journal/v10/n6/full/nmeth.2474.html
  59. De novo bacterial genome assembly: a solved problem? | In between lines of code
  60. Rasko, David A.; Webster, Dale R.; Sahl, Jason W.; et al. (25 August 2011). "Origins of the Strain Causing an Outbreak of Hemolytic–Uremic Syndrome in Germany". N Engl J Med. 365 (8): 709–717. doi:10.1056/NEJMoa1106920.
  61. Tran, Ben; Brown, Andrew M.K.; Bedard, Philippe L.; Winquist, Eric; Goss, Glenwood D.; Hotte, Sebastien J.; Welch, Stephen A.; Hirte, Hal W.; Zhang, Tong; Stein, Lincoln D.; Ferretti, Vincent; Watt, Stuart; Jiao, Wei; Ng, Karen; Ghai, Sangeet; Shaw, Patricia; Petrocelli, Teresa; Hudson, Thomas J.; Neel, Benjamin G.; et al. (1 January 2012). "Feasibility of real time next generation sequencing of cancer genes linked to drug response: Results from a clinical trial". Int. J. Cancer: 1547–1555. doi:10.1002/ijc.27817.
  62. van Vliet, Arnoud H.M. (1 January 2010). "Next generation sequencing of microbial transcriptomes: challenges and opportunities". FEMS Microbiology Letters. 302 (1): 1–7. doi:10.1111/j.1574-6968.2009.01767.x.
  63. Murray, I. A. (2 October 2012). "The methylomes of six bacteria". Nucleic Acids Research. 40 (22): 11450–62. doi:10.1093/nar/gks891. PMC 3526280. PMID 23034806.
  64. Yu-Feng Huang, Sheng-Chung Chen, Yih-Shien Chiang, Tzu-Han Chen & Kuo-Ping Chiu (2012). "Palindromic sequence impedes sequencing-by-ligation mechanism". BMC systems biology. 6 Suppl 2: S10. doi:10.1186/1752-0509-6-S2-S10. PMID 23281822.
  65. Brenner, Sidney; Johnson, M; Bridgham, J; Golda, G; Lloyd, DH; Johnson, D; Luo, S; McCurdy, S; Foy, M (2000). "Gene expression analysis by massively parallel signature sequencing (MPSS) on microbead arrays". Nature Biotechnology. 18 (6): 630–634. doi:10.1038/76469. PMID 10835600.
  66. Shendure, J (Sep 9, 2005). "Accurate multiplex polony sequencing of an evolved bacterial genome". Science. 309 (5741): 1728–32. Bibcode:2005Sci...309.1728S. doi:10.1126/science.1117389. PMID 16081699.
  67. PMID 18987734
  68. Mardis ER (2008). "Next-generation DNA sequencing methods". Annu Rev Genomics Hum Genet. 9: 387–402. doi:10.1146/annurev.genom.9.081307.164359. PMID 18576944.
  69. Valouev A; Ichikawa J; Tonthat T; et al. (July 2008). "A high-resolution, nucleosome position map of C. elegans reveals a lack of universal sequence-dictated positioning". Genome Res. 18 (7): 1051–63. doi:10.1101/gr.076463.108. PMC 2493394. PMID 18477713.
  70. Rusk N (2011). "Torrents of sequence". Nat Meth. 8 (1): 44–44. doi:10.1038/nmeth.f.330.
  71. Drmanac R.; et al. (2010). "Human Genome Sequencing Using Unchained Base Reads in Self-Assembling DNA Nanoarrays". Science. 327 (5961): 78–81. Bibcode:2010Sci...327...78D. doi:10.1126/science.1181498. PMID 19892942.
  72. Porreca JG (2010). "Genome Sequencing on Nanoballs". Nature Biotechnology. 28 (1): 43–44. doi:10.1038/nbt0110-43. PMID 20062041.
  73. Drmanac R.; et al. (2010). "Human Genome Sequencing Using Unchained Base Reads in Self-Assembling DNA Nanoarrays, Supplementary Material". Science. 327 (5961): 78–81. Bibcode:2010Sci...327...78D. doi:10.1126/science.1181498. PMID 19892942.
  74. Complete Genomics Press release, 2010
  75. HeliScope Gene Sequencing / Genetic Analyzer System : Helicos BioSciences
  76. Thompson, JF; Steinmann, KE (October 2010). "Single molecule sequencing with a HeliScope genetic analysis system". Current Protocols in Molecular Biology. Chapter 7: Unit7.10. doi:10.1002/0471142727.mb0710s92. PMC 2954431. PMID 20890904.
  77. Harris, TD (Apr 4, 2008). "Single-molecule DNA sequencing of a viral genome". Science. 320 (5872): 106–9. Bibcode:2008Sci...320..106H. doi:10.1126/science.1150427. PMID 18388294.
  78. PacBio Sales Start to Pick Up as Company Delivers on Product Enhancements | In Sequence | Sequencing | GenomeWeb
  79. "VisiGen Biotechnologies Inc. – Technology Overview". Visigenbio.com. Retrieved 2009-11-15.
  80. "The Harvard Nanopore Group". Mcb.harvard.edu. Retrieved 2009-11-15.
  81. "Nanopore Sequencing Could Slash DNA Analysis Costs".
  82. US patent 20060029957, ZS Genetics, "Systems and methods of analyzing nucleic acid polymers and related components", issued 2005-07-14 
  83. Xu M, Fujita D, Hanagata N (December 2009). "Perspectives and challenges of emerging single-molecule DNA sequencing technologies". Small. 5 (23): 2638–49. doi:10.1002/smll.200900976. PMID 19904762.
  84. Schadt, E.E.; S. Turner; A. Kasarskis (2010). "A window into third-generation sequencing". Human Molecular Genetics. 19 (R2): R227–40. doi:10.1093/hmg/ddq416. PMID 20858600.
  85. Stoddart, D (May 12, 2009). "Single-nucleotide discrimination in immobilized DNA oligonucleotides with a biological nanopore". Proceedings of the National Academy of Sciences of the United States of America. 106 (19): 7702–7. Bibcode:2009PNAS..106.7702S. doi:10.1073/pnas.0901054106. PMC 2683137. PMID 19380741.
  86. PMID 22948520
  87. Pathak, B., Lofas, H., Prasongkit, J., Grigoriev, A., Ahuja, R., & Scheicher, R. H. (January 09, 2012). Double-functionalized nanopore-embedded gold electrodes for rapid DNA sequencing. Applied Physics Letters, 100, 2.
  88. PMID 18216253
  89. Massimiliano Di Ventra (2013). "Fast DNA sequencing by electrical means inches closer". Nanotechnology 24 342501
  90. Ohshiro T et al 2012 Sci. Rep. 2 501–7
  91. Hanna GJ; Johnson VA; Kuritzkes DR; et al. (1 July 2000). "Comparison of Sequencing by Hybridization and Cycle Sequencing for Genotyping of Human Immunodeficiency Virus Type 1 Reverse Transcriptase". J. Clin. Microbiol. 38 (7): 2715–21. PMC 87006. PMID 10878069.
  92. PMID 23742747
  93. PMID 22574124
  94. J.R. Edwards, H.Ruparel, and J. Ju (2005). "Mass-spectrometry DNA sequencing". Mutation Research. 573 (1–2): 3–12. doi:10.1016/j.mrfmmm.2004.07.021. PMID 15829234.
  95. Hall, Thomas A. (2005). "Base composition analysis of human mitochondrial DNA using electrospray ionization mass spectrometry: A novel tool for the identification and differentiation of humans". Analytical Biochemistry. 344 (1): 53–69. doi:10.1016/j.ab.2005.05.028. PMID 16054106.
  96. Howard, R (Jun 15, 2011). "Comparative analysis of human mitochondrial DNA from World War I bone samples by DNA sequencing and ESI-TOF mass spectrometry". Forensic science international. Genetics. 7 (1): 1–9. doi:10.1016/j.fsigen.2011.05.009. PMID 21683667.
  97. Monforte, Joseph A.; Becker, Christopher H. (1 March 1997). "High-throughput DNA analysis by time-of-flight mass spectrometry". Nature Medicine. 3 (3): 360–362. doi:10.1038/nm0397-360. PMID 9055869.
  98. Beres, S. B. (8 February 2010). "Molecular complexity of successive bacterial epidemics deconvoluted by comparative pathogenomics". Proceedings of the National Academy of Sciences. 107 (9): 4371–4376. Bibcode:2010PNAS..107.4371B. doi:10.1073/pnas.0911295107.
  99. Kan, Cheuk-Wai (1 November 2004). "DNA sequencing and genotyping in miniaturized electrophoresis systems". Electrophoresis. 25 (21–22): 3564–3588. doi:10.1002/elps.200406161. PMID 15565709.
  100. Ying-Ja Chen, Eric E. Roller and Xiaohua Huang (2010). "DNA sequencing by denaturation: experimental proof of concept with an integrated fluidic device". Lab on Chip. 10 (10): 1153–1159. doi:10.1039/b921417h.
  101. Bell, DC (Oct 9, 2012). "DNA Base Identification by Electron Microscopy". Microscopy and microanalysis : the official journal of Microscopy Society of America, Microbeam Analysis Society, Microscopical Society of Canada. 18 (5): 1–5. Bibcode:2012MiMic..18.1049B. doi:10.1017/S1431927612012615. PMID 23046798.
  102. Pareek, CS (November 2011). "Sequencing technologies and genome sequencing". Journal of applied genetics. 52 (4): 413–35. doi:10.1007/s13353-011-0057-x. PMC 3189340. PMID 21698376.
  103. PMID 21698376
  104. Fujimori, S (2012). "Next-generation sequencing coupled with a cell-free display technology for high-throughput production of reliable interactome data". Scientific reports. 2: 691. Bibcode:2012NatSR...2E.691F. doi:10.1038/srep00691. PMC 3466446. PMID 23056904.
  105. "PRIZE Overview: Archon X PRIZE for Genomics"
  106. The Future of DNA Sequencing
  107. Jessica Severin, Marina Lizio, Jayson Harshbarger, Hideya Kawaji, Carsten O Daub, Yoshihide Hayashizaki, the FANTOM consortium, Nicolas Bertin, and Alistair RR Forrest. Interactive visualization and analysis of large-scale NGS data-sets using ZENBU. Nature Biotechnology, March 2014 doi:10.1038/nbt.2840
  108. Template:Cite paper
  109. Del Fabbro C, Scalabrin S, Morgante M and Giorgi FM (2013). "An Extensive Evaluation of Read Trimming Effects on Illumina NGS Data Analysis". PLoS ONE. 8 (12): e85024. doi:10.1371/journal.pone.0085024. PMC 3871669. PMID 24376861.

External Links