7.13A: Strategies Used in Sequencing Projects - Biology

The strategies used for sequencing genomes include the Sanger method, shotgun sequencing, pairwise-end sequencing, and next-generation sequencing.


Compare the different strategies used for whole-genome sequencing: Sanger method, shotgun sequencing, pairwise-end sequencing, and next-generation sequencing

Key Points

  • The Sanger method is a basic sequencing technique that uses fluorescently-labeled dideoxynucleotides (ddNTPs) during DNA replication which results in multiple short strands of replicated DNA that terminate at different points, based on where the ddNTP was incorporated.
  • Shotgun sequencing is a method that randomly cuts DNA fragments into smaller pieces and then, with the help of a computer, takes the DNA fragments, analyzes them for overlapping sequences, and reassembles the entire DNA sequence.
  • Pairwise-end sequencing is a type of shotgun sequencing which is used for larger genomes and analyzes both ends of the DNA fragments for overlap.
  • Next-generation sequencing is a type of sequencing which is automated and relies on sophisticated software for rapid DNA sequencing.

Key Terms

  • fluorophore: a molecule or functional group which is capable of fluorescence
  • contig: a set of overlapping DNA segments, derived from a single source of genetic material, from which the complete sequence may be deduced
  • dideoxynucleotide: any nucleotide formed from a deoxynucleotide by loss of a second hydroxyl group from the deoxyribose group

Strategies Used in Sequencing Projects

The basic sequencing technique underlying the strategies described here is the chain termination method (also known as the dideoxy method), which was developed by Fred Sanger in the 1970s. The chain termination method involves DNA replication of a single-stranded template with the use of a primer and regular deoxynucleotides (dNTPs), which are the monomers, or single units, of DNA. The primer and dNTPs are mixed with a small proportion of fluorescently-labeled dideoxynucleotides (ddNTPs). The ddNTPs are monomers that are missing a hydroxyl group (–OH) at the site at which another nucleotide usually attaches to form a chain. Each ddNTP is labeled with a different color of fluorophore. Every time a ddNTP is incorporated in the growing complementary strand, it terminates the process of DNA replication, which results in multiple short strands of replicated DNA that are each terminated at a different point during replication. When the reaction mixture is processed by gel electrophoresis after being separated into single strands, the multiple, newly-replicated DNA strands form a ladder due to their differing sizes. Because the ddNTPs are fluorescently labeled, each band on the gel reflects the size of the DNA strand and the ddNTP that terminated the reaction. The different colors of the fluorophore-labeled ddNTPs help identify the ddNTP incorporated at that position. Reading the gel on the basis of the color of each band on the ladder produces the sequence of the newly synthesized strand, from which the complementary sequence of the template strand can be deduced.

Early Strategies: Shotgun Sequencing and Pairwise-End Sequencing

In the shotgun sequencing method, several copies of a DNA fragment are cut randomly into many smaller pieces (somewhat like the way the shot in a shotgun cartridge scatters when fired). All of the segments are then sequenced using the chain termination method. Then, with the help of a computer, the fragments are analyzed to see where their sequences overlap. By matching overlapping sequences at the ends of the fragments, the entire DNA sequence can be reconstructed. A larger sequence that is assembled from overlapping shorter sequences is called a contig.

As an analogy, consider that someone has four copies of a landscape photograph that you have never seen before and know nothing about. The person rips up each photograph by hand, so that pieces of different sizes come from each copy, then mixes all of the pieces together and asks you to reconstruct the photograph. In one of the smaller pieces you see a mountain. In a larger piece, you see that the same mountain is behind a lake. A third fragment shows only the lake, but it reveals that there is a cabin on the shore of the lake. Therefore, from the overlapping information in these three fragments, you know that the picture contains a mountain behind a lake that has a cabin on its shore. This is the principle behind reconstructing entire DNA sequences using shotgun sequencing.
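The overlap-matching step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the software used in real sequencing projects: the function names `overlap` and `greedy_assemble`, the minimum overlap of three bases, and the three example reads are all hypothetical, and the reads are assumed to be error-free and to come from the same strand.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if no overlap of at least min_len bases exists.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of fragments with the largest overlap."""
    contigs = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_n, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # nothing overlaps any more; the pieces stay as separate contigs
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Hypothetical reads cut from a made-up 16-base sequence
reads = ["ATGGCGTA", "CGTACCTG", "CCTGAAAT"]
print(greedy_assemble(reads))  # ['ATGGCGTACCTGAAAT']
```

Like the mountain, lake, and cabin in the photograph analogy, each read shares a few bases with its neighbor, and those shared bases are what allow the pieces to be joined into one contig. A greedy merge like this one can be misled by repeated sequences, which is one reason real assemblers are considerably more elaborate.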

Originally, shotgun sequencing only analyzed one end of each fragment for overlaps. This was sufficient for sequencing small genomes. However, the desire to sequence larger genomes, such as that of a human, led to the development of double-barrel shotgun sequencing, more formally known as pairwise-end sequencing. In pairwise-end sequencing, both ends of each fragment are analyzed for overlap. Pairwise-end sequencing is, therefore, more cumbersome than shotgun sequencing, but it is easier to reconstruct the sequence because there is more available information.

Next-generation Sequencing

Since 2005, the automated sequencing techniques used by laboratories have fallen under the umbrella of next-generation sequencing, a group of automated techniques used for rapid DNA sequencing. These automated, low-cost sequencers can generate sequences of hundreds of thousands or millions of short fragments (25 to 500 base pairs) in the span of one day. Sophisticated software is used to manage the cumbersome process of putting all the fragments in order.

Chapter 15 clickers

b. Their construction is based on genetic recombination.

c. They are limited because their construction is dependent upon single-locus traits.

a. recombination frequency

d. sequence-tagged site (STS) mapping

a. The shotgun approach does not require the use of plasmids.

b. The shotgun approach is highly automated.

c. Because of the efficiency of the shotgun approach, the genome is sequenced only one time.

a. The human genome project has led to development of more efficient and less-expensive sequencing technologies.

b. The human genome project has helped identify many genes related to diseases.

c. The human genome project has helped scientists understand basic biological processes.

d. The human genome project has led to widespread use of genetic tests by health insurers to assess people's risk of developing disease.

a. they are rare among persons of the same ethnic group.

b. they are inherited as allelic variants.

c. they are useful in linkage studies to locate disease-causing alleles.

d. two or more SNPs located close together on a chromosome will exhibit linkage disequilibrium.

a. Bioinformatics combines the fields of molecular biology and computer science.

b. Bioinformatics develops DNA and protein sequence databases.

c. Bioinformatics is essential in managing the vast amount of sequencing data that has been generated.

a. allelic variants homologs

b. the use of reporter sequences

a. should be smaller (in number of base pairs).

b. should have approximately the same number of genes for basic housekeeping functions.

c. should have fewer genes overall.

a. chromosome architecture

a. more phenotypically complex eukaryotes have larger genomes.

b. more phenotypically complex eukaryotes have more genes.

c. eukaryotic genomes contain multiple copies of many genes.

a. Bioinformatics: combines molecular biology and computer science to develop databases of DNA and protein sequence and the tools for their analysis.

b. Functional Genomics: develops and uses methods that allow gene function and expression to be determined from DNA sequence alone.

c. Metagenomics: examines genomes of communities of organisms that inhabit a common environment.

d. Synthetic Biology: creates from scratch novel genomes and organisms that are going to be released into natural environments.

How new is DNA sequencing?

Since the completion of the Human Genome Project, technological improvements and automation have increased speed and lowered costs to the point where individual genes can be sequenced routinely. Some labs can sequence well over 100,000 billion bases per year, and an entire genome can be sequenced for just a few thousand dollars.

Many of these new technologies were developed with support from the National Human Genome Research Institute (NHGRI) Genome Technology Program and its Advanced DNA Sequencing Technology awards. One of NHGRI's goals is to promote new technologies that could eventually reduce the cost of sequencing a human genome of even higher quality than is possible today and for less than $1,000.


Language Arts

Story maps provide one way to help students organize the events from a story.

Helping students learn transition or signal words that indicate a sequence (first, second, last) will also help them learn about sequence.

Sequence sticks, story chains, story retelling ropes, and story sequence crafts all help students practice ordering events within a story. See these resources for ideas:

Most math curricula include worksheets on ordinal numbers (first, second, third, etc.). Patterns are also a form of sequencing. Encourage the use of ordinal vocabulary with questions such as "What bead goes first? Then which bead? Which bead is third?" Encouraging students to write out the steps for solving addition and subtraction problems that include regrouping is an excellent way to have them think through the steps in order. Teachers can use a simple sheet of paper folded into four squares. Ask students to write the steps in order in the squares.


Helping children sequence also develops their scientific inquiry skills. In order to study or observe changes in something, students must follow along and record changes. The changes happen in a particular order, which kids can document by writing or drawing pictures.

Social Studies

Timelines are a great way to teach sequence in social studies. Kids may enjoy making a timeline of their own life, and include important milestones such as when they learned to walk, talk, ride a bike and go to school. Once students understand the process of charting important milestones on a timeline, topics from the social studies curricula can be used.

This simple example of an explorers timeline illustrates how the spacing between dates indicates the passage of time.

Other ideas for sequencing

  • Arts & crafts activities. Consider quilt-making or other arts and crafts activities with children. These activities may reinforce the idea of sequencing and may introduce math concepts (measurement, addition and subtraction, basic computation, etc.). Alex Anderson's Kids Start Quilting with Alex Anderson: 7 Fun & Easy Projects, Quilts for Kids by Kids, Tips for Quilting with Children provides easy instructions for adults quilting with children.
  • Cooking with kids. Cookbooks for children can reinforce stories read, math concepts (measurement, etc), as well as sequencing. The Little House Cookbook: Frontier Foods from Laura Ingalls Wilder's Classic Stories by Barbara Walker (HarperCollins 0064460908) presents recipes for foods mentioned in the Little House series by Laura Ingalls Wilder.
  • Wordless books. There are many wordless books that can be used with younger children and with English language learners (or students who may have limited English proficiency). For younger children, consider Pancakes for Breakfast by Tomie dePaola, which humorously details a woman making pancakes from scratch, or Mark Newgarden's wordless adventures of a small dog named Bow-Wow (e.g., Bow-Wow Bugs a Bug). For older or more sophisticated readers, books by Barbara Lehman and David Wiesner may be considered.
  • Everyday activities. Create a sequence page for a simple activity around the house or at school. Use any blank sheet of paper. Fold the paper into squares. Start with 4 large squares, for older students create more squares. Ask kids to draw the steps they know in the order in which the steps occur. For example, draw each step it takes to make a peanut butter and jelly sandwich or to brush their teeth.
  • Calendar time. Cut or tear out the pages from an old calendar. Mix up the months and hand out the stack of pages. Ask the kids to order the months from January to December by laying the pages out on the floor. Which month goes first? Then which one? Which month is last?

Talk Overview

Single cell sequencing, as the name implies, allows researchers to examine the genomic information for individual cells. This provides an opportunity to study cell-to-cell differences and identify cell subtypes, which offers insight into how specific cells function within and respond to their environment. Dr. Eric Chow begins his talk with an overview of single cell sequencing with a focus on RNA. He then goes on to outline the predominant approaches, including plate-based, microfluidic-based, and combinatorial indexing methods. He finishes by addressing approaches to single cell analysis that don’t rely on RNA, including methods that use DNA, proteins, and antibodies. He also reviews some of the benefits and limitations of analysis at the level of individual cells.


Vol 326, Issue 5950
09 October 2009

By P. S. G. Chain, D. V. Grafham, R. S. Fulton, M. G. FitzGerald, J. Hostetler, D. Muzny, J. Ali, B. Birren, D. C. Bruce, C. Buhay, J. R. Cole, Y. Ding, S. Dugan, D. Field, G. M. Garrity, R. Gibbs, T. Graves, C. S. Han, S. H. Harrison, S. Highlander, P. Hugenholtz, H. M. Khouri, C. D. Kodira, E. Kolker, N. C. Kyrpides, D. Lang, A. Lapidus, S. A. Malfatti, V. Markowitz, T. Metha, K. E. Nelson, J. Parkhill, S. Pitluck, X. Qin, T. D. Read, J. Schmutz, S. Sozhamannan, P. Sterk, R. L. Strausberg, G. Sutton, N. R. Thomson, J. M. Tiedje, G. Weinstock, A. Wollam, Genomic Standards Consortium Human Microbiome Project Jumpstart Consortium, J. C. Detter

Science 09 Oct 2009 : 236-237

More detailed sequence standards that keep up with revolutionary sequencing technologies will aid the research community in evaluating data.

What is It?

As defined in the PMP certification course, Sequence Activities is the process of identifying and documenting relationships among the project activities. The main purpose of the Sequence Activities process is to finalize the interrelationships among the activities needed to complete the project scope and reach the project goals.

The key result of the Sequence Activities process is the network diagram. The network diagram of a project shows the project activities as boxes labeled with an activity ID and shows the interrelationships of the activities with arrows. Now let's briefly review this output of the Sequence Activities process with a sample.

A Sample Network Diagram for Sequence Activities Process

This figure shows a sample network diagram as a result of sequence activities process.

As you see, after the start of the project,

  • Activity #1 must start first.
  • After Activity #1 finishes, Activity #2 and Activity #3 will begin.
  • Activity #4 can start only after Activity #2 finishes.
  • Activity #5 depends on Activity #2 and Activity #3, therefore, it will start only after these two activities are completed.
  • The last activity, Activity #6, can start only after Activity #4 and Activity #5 are completed.
  • After Activity #6 is completed, the project will end.
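The dependencies in this list can also be written down as data and checked mechanically. The sketch below is only an illustration: it encodes the six sample activities as a dictionary named `predecessors` (a hypothetical name) and uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later) to produce one valid order in which the activities could be carried out.

```python
from graphlib import TopologicalSorter

# Each activity maps to the set of activities that must finish before it starts,
# as listed for the sample network diagram.
predecessors = {
    1: set(),
    2: {1},
    3: {1},
    4: {2},
    5: {2, 3},
    6: {4, 5},
}

# static_order() yields the activities so that every predecessor comes first.
order = list(TopologicalSorter(predecessors).static_order())
print(order)
```

Several orders can satisfy the same dependencies (Activity #2 and Activity #3 may be swapped, for example), so the exact output may vary. What is guaranteed is that Activity #1 comes first, Activity #6 comes last, and no activity appears before its predecessors.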

Note that this is just a simple sample meant to show what a network diagram looks like. Real-life projects have many activities, so the network diagram and the Sequence Activities process will be much more complex than this.

How to Create a Good Network Diagram in Sequence Activities process?

If activity durations are added to the activity boxes of the network diagram during the Sequence Activities process, the critical path of the project can be seen as well. A network diagram is therefore a critical input for determining the critical path of the project.
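As a rough sketch of how durations reveal the critical path, the code below reuses the dependencies of the six-activity sample and attaches made-up durations to them. The durations (in days), the function name `critical_path`, and the variable names are assumptions for illustration only. The idea is that each activity can finish no earlier than its slowest predecessor plus its own duration, and the critical path is the chain of slowest predecessors leading to the last finish.

```python
from graphlib import TopologicalSorter

# Dependencies of the sample diagram: activity -> activities that must finish first
predecessors = {1: set(), 2: {1}, 3: {1}, 4: {2}, 5: {2, 3}, 6: {4, 5}}
# Hypothetical durations in days (not given in the text)
durations = {1: 2, 2: 4, 3: 3, 4: 1, 5: 5, 6: 2}


def critical_path(predecessors, durations):
    """Return (longest path through the network, total project duration)."""
    finish = {}  # earliest finish time of each activity
    prev = {}    # the predecessor that determines each activity's start
    for act in TopologicalSorter(predecessors).static_order():
        latest = max(predecessors[act], key=lambda p: finish[p], default=None)
        start = finish[latest] if latest is not None else 0
        finish[act] = start + durations[act]
        prev[act] = latest
    node = max(finish, key=finish.get)  # activity that finishes last
    total = finish[node]
    path = []
    while node is not None:
        path.append(node)
        node = prev[node]
    return path[::-1], total


print(critical_path(predecessors, durations))  # ([1, 2, 5, 6], 13)
```

With these assumed durations, Activities #1, #2, #5, and #6 form the critical path, so a delay in any of them delays the whole project, while Activities #3 and #4 have some slack.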

The Precedence Diagramming Method, abbreviated PDM, is the most common method for drawing network diagrams. PDM is also referred to as Activity-On-Node (AON), because the activities are represented as boxes, or nodes, in the network diagram.

In Precedence Diagramming Method boxes represent the activities of the project and arrows represent the dependencies of these activities.

For instance, in the figure above, Activity A is connected to Activity B with a forward arrow. This means that Activity B depends on Activity A and can start only after Activity A is finished. In other words, Activity A is the predecessor of Activity B.

Application of Human Genome Project

Some of the different fields where human genome project application is used are: (a) Molecular Medicine (b) Waste Control and Environmental Cleanup (c) Biotechnology (d) Energy Sources (e) Risk Assessment (f) DNA Forensics (Identification).

(a) Molecular Medicine:

Genetic screening will enable rapid and specific diagnostic tests making it possible to treat countless maladies.

DNA- based tests clarify diagnosis quickly and enable geneticists to detect carriers within families. Genomic information can indicate the future likelihood of some diseases. The diseases where susceptibility may be determined include heart disease, cancer, and diabetes.

(b) Waste Control and Environmental Cleanup:

In 1994, the Microbial Genome Initiative was formulated to sequence the genomes of bacteria useful in the areas of energy production, environmental remediation, toxic waste reduction, and industrial processing. Resulting from that project, six microbes that live under extreme temperature and pressure conditions have been sequenced. By learning the unique protein structure of these microbes, it may be possible to use the organisms and their enzymes for such practical purposes as waste control and environmental cleanup.

(c) Biotechnology:

Sales of biotechnology products are projected to be very high in U.S.A. by the year 2000. The HGP has stimulated significant investment by large corporations and promoted the development of new biotechnology.

(d) Energy Sources:

Biotechnology will be important in improving the use of fossil-based resources. Increased energy demands require strategies to circumvent the many problems with today’s dominant energy technologies. Biotechnology will help address these needs by providing a cleaner means for the bioconversion of raw materials to refined products. Additionally, there is the possibility of developing entirely new biomass-based energy sources. Having the genomic sequence of the methane-producing microorganism.

(e) Risk Assessment:

Scientists know that genetic differences cause some people to be more susceptible than others to such agents. More work must be done to determine the genetic basis of such variability, but this knowledge will directly address the DOE’s long-term mission to understand the effects of low-level exposures to radiation and other energy-related agents, especially in terms of cancer risk.

(f) DNA Forensics (Identification):

To identify individuals, forensic scientists scan 13 DNA regions, or loci, that vary from person to person and use the data to create a DNA profile of that individual (sometimes called a DNA fingerprint).

Classroom Management and Teacher Survival

These strategies deal with establishing a framework for positive teacher and student experiences.

Set Clear Learning Expectations: After you’ve set them, communicate them, clearly. Tell students on day one or two what your goals are for them. Since they’re probably not listening, tell them again. Write classroom expectations on the course syllabus. Tell them specific expectations before every assignment. Write down assignment expectations in the form of a rubric. Write them down in instructions.

Establish Clear Behavioral Expectations: After you’ve established them, communicate them, clearly. Behavioral expectations are best established through your actions. If you expect kids to be on time, for example, you better be ready to crack down on that first tardy kid. Both your learning and behavioral expectations should be clearly stated in your course expectations.

Establish a Routine: Ordinary tasks, such as collecting papers, moving into groups, or getting a book off the shelf, can quickly become chaotic if there’s not an established routine. Take the time the first few weeks of school to go over in detail how things are done. Adjust, if necessary.

Document Everything: This is especially important early in your career when administrators, students and parents may not find you as credible as your more experienced colleagues. Items to document include lesson plans, student conferences, parent contacts, student misbehavior, meetings attended and parent/teacher conferences.

Find a Mentor: If you’re new, find a mentor. There’s another teacher at your school who understands what you’re going through and, more importantly, knows how to help. If it’s someone who teaches the same subject, great. If not, that’s OK. You’re better off learning the ropes from someone competent who teaches a different subject than an idiot who has similar interests.


In the early days of genetics, scientists did not have the resources to look at more than a few genes at a time. This made the process of understanding the influence of genetics on an organism slow and arduous. Scientists were faced with the enormous task of attempting to understand genetic influence with little information to complete the task. The understanding of genes would have been very helpful in solving this problem.

The year 1995 saw the completion of the first two complete non-viral genomes, Haemophilus influenzae [1] and Mycoplasma genitalium [1], two bacteria that can cause human disease. Since then, over 100 genomes have been fully sequenced, including those of higher organisms like baker's yeast, the fruit fly, and the nematode [2]. With the announcement in June of 2000 that the first draft of the human genome had been completed [3], scientists' approach to biology completely changed. The entire set of human genes was now available. This represented an irresistible amount of data that bridged the bioinformatic gap that lay between biologists and their understanding of genetics.

To begin to see the significance of such an historical event, it is necessary to look at why uncovering a genome is an important biological task.

The genome refers to all DNA present in an organism.

DNA is the “genetic blueprint” that determines the genotypic make-up of each organism. In its barest form, DNA consists of two strings of nucleotides, or bases (abbreviated A, C, G, and T), wound around each other. The bases composing DNA have specific binding capabilities: A always binds to T, and C always binds to G. These binding capabilities are useful for scientists to understand since, if the nucleotide sequence of one DNA strand is determined, complementary binding allows the sequence of the other strand to be deduced.
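Because the pairing rules are fixed, deducing the other strand is a simple lookup. The following minimal sketch illustrates this. The function names and the example sequence are made up, and the second function reverses the result because the two strands of a DNA molecule run in opposite directions.

```python
# Pairing rules: A binds T, C binds G
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Base-by-base partner of each position on the given strand."""
    return "".join(COMPLEMENT[base] for base in strand)


def reverse_complement(strand: str) -> str:
    """The partner strand written in its own reading direction."""
    return complement(strand)[::-1]


known = "ATGCCG"  # hypothetical sequenced strand
print(complement(known))          # TACGGC
print(reverse_complement(known))  # CGGCAT
```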

In the case of humans, DNA is organized into 24 structural units called chromosomes. Each chromosome consists of compacted coils of DNA. While much of this DNA has no known function (these stretches of DNA are conveniently referred to as spacer DNA or junk DNA), a significant portion of the DNA codes for genes. Each gene provides the information necessary to produce a protein, which is responsible for carrying out cellular functions. The complement of proteins in an organism is very important, with diseases often manifesting when a protein does not function properly.

Why Sequence Genomes, Especially Non-Human Genomes?

One of the interesting things about biological organisms is their remarkable similarity at the molecular level, despite their obvious outward differences. For instance, many genes are found in morphologically different organisms despite the phylogenetic distance between them [4]. Not only are these genes very similar in their DNA sequence composition, they also tend to perform the same functions. Thus, by understanding the function of a gene in one organism, scientists can get an idea of what function that gene may perform in a more complex organism such as humans. The knowledge gained can then be applied to various fields such as medicine, biological engineering and forensics.

The Sequencing Reaction: How the Nucleotide Composition of DNA is Determined

To understand how DNA is sequenced, one must first know a little about the structure of DNA:

  • A segment of DNA, which is ordinarily double stranded, has a specific orientation, as it has a 5′ (read as “5 prime”) and a 3′ (“3 prime”) end. This can be simply thought of as a front and tail end to the DNA segment.
  • When DNA is synthesized in the lab, the two strands are separated and new bases are added to the 3′ end; thus DNA is assembled from the 5′ to 3′ end.
  • DNA cannot be synthesized from scratch. A short piece of DNA, called a primer, is required for the reaction to begin.
  • Primers are designed such that they are able to bind to the target DNA, the binding of which is the initiator for DNA synthesis.

DNA sequencing is accomplished by the method of Frederick Sanger (see Figure 1), for which he won his second Nobel Prize in 1980.

Figure 1. The Sanger sequencing reaction. Single stranded DNA is amplified in the presence of fluorescently labelled ddNTPs that serve to terminate the reaction and label all the fragments of DNA produced. The fragments of DNA are then separated via polyacrylamide gel electrophoresis and the sequence read using a laser beam and computer.

This method essentially involves amplifying a single-stranded piece of DNA many times [5]. Normally, when DNA is amplified, new deoxy-nucleotides (dNTPs) are added as the strand of DNA grows. The Sanger method employs special bases called dideoxy-nucleotides (ddNTPs). These are similar to dNTPs, except for two important differences: they have fluorescent tags attached to them (a different tag for each of the 4 ddNTPs), and they are missing the hydroxyl group to which the next base would normally be attached, so no new bases can be added to a DNA strand after a ddNTP has been added. Thus, once a ddNTP is inserted into a growing DNA strand, synthesis of that strand is stopped. After many repeated cycles of amplification, this will result in all the possible lengths of DNA being represented and every piece of synthesized DNA containing a fluorescent label at its terminus.

Amplified DNA can then be separated according to size via gel electrophoresis. As the fluorescent DNA reaches the bottom of the gel (now separated from smallest to largest), a laser can pick up the fluorescence of each piece of DNA. The trick to the Sanger method lies in the fact that each ddNTP emits a different fluorescent signal, so that the presence of a ddNTP at the terminus can be recorded on a computer (see Figure 2). The reaction is set up so that a fluorescent ddNTP is present at every position in the DNA strand (i.e. every possible size of DNA strand is present) so that every nucleotide in the strand can be determined. A computer program can then compile the data into a coloured graph showing the determined sequence.
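The logic of reading a sequence from terminated fragments can be imitated with a toy model. The sketch below is an idealized illustration, not a simulation of the chemistry: it assumes exactly one fragment of every possible length, represents each fragment only by its length and the label of its terminal ddNTP, and uses a made-up strand and hypothetical function names.

```python
import random


def terminated_fragments(new_strand: str) -> list[tuple[int, str]]:
    """One (length, terminal base label) pair for every possible stopping point."""
    return [(n, new_strand[n - 1]) for n in range(1, len(new_strand) + 1)]


def read_sequence(fragments: list[tuple[int, str]]) -> str:
    """Order fragments by size, as electrophoresis does, and read each label."""
    return "".join(label for _, label in sorted(fragments))


new_strand = "GATTACA"          # hypothetical newly synthesized strand
fragments = terminated_fragments(new_strand)
random.shuffle(fragments)       # in the reaction tube the fragments are mixed
print(read_sequence(fragments)) # GATTACA
```

Sorting by length stands in for the gel, and the label stands in for the color the laser detects. Reading the labels from the shortest fragment to the longest recovers the strand one base at a time.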

In the past, the separation of the DNA strands by electrophoresis was a time consuming step, requiring the use of radioisotopes for labelling ddNTPs. This was less than trivial, as four different sequencing reactions were required (one for each ddNTP) and the resulting sequencing gel needed to be analyzed manually. Today, fluorescent labels and new advances in gel electrophoresis have made DNA sequencing not only fast and far more accurate, but also almost fully automated, including the read out of the final sequence.

Figure 2. An electropherogram of a finished sequencing reaction. As the fragments from the sequencing reaction are resolved via electrophoresis, a laser reads the fluorescence of each fragment (blue, green, red or yellow) and compiles the data into an image. Each colour, or fluorescence intensity, represents a different nucleotide (e.g. blue for C) and reveals where that nucleotide is in the sequence.

While the Sanger method is the accepted method for sequencing DNA, one cannot sequence a complete genome using this method alone. The main reason for this is that as the pieces of DNA get larger, resolving two pieces by one base becomes virtually impossible [6]. In fact, only about 1000 bases can be sequenced accurately, a far cry from the 50 to 250 million bases that comprise a human chromosome. Furthermore, as stated above, a primer of known sequence is required for each sequencing reaction. Thus, one cannot take any piece of DNA and “just sequence it.” A known starting point, and thus some knowledge of the sequence, is required to begin the reaction. To circumvent this problem, DNA is usually cut up into smaller, more manageable chunks and then placed into a small circular piece of DNA known as a plasmid or cloning vector (a process generally referred to as cloning). The cloning vector’s sequence is known and therefore allows any piece of DNA introduced into it to be sequenced.

With these ideas in mind, scientists set out to design methods to make possible the sequencing of an entire genome. No small task when you consider that the human genome contains approximately three billion bases that needed to be sequenced.
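A quick calculation gives a sense of the scale. The sketch below uses the two figures quoted above (about three billion bases in the genome and about 1000 bases per Sanger read). The function name and the coverage factor are assumptions added for illustration: coverage of 1 means every base is read exactly once with no overlap, and the 8-fold value is an arbitrary example of the redundancy needed for reads to overlap.

```python
import math

GENOME_SIZE = 3_000_000_000  # approximate bases in the human genome (from the text)
READ_LENGTH = 1_000          # approximate bases per Sanger read (from the text)


def reads_needed(genome_size: int, read_length: int, coverage: float = 1.0) -> int:
    """Minimum number of reads to cover the genome `coverage` times over."""
    return math.ceil(genome_size * coverage / read_length)


print(reads_needed(GENOME_SIZE, READ_LENGTH))       # 3000000
print(reads_needed(GENOME_SIZE, READ_LENGTH, 8.0))  # 24000000 (8-fold is hypothetical)
```

Even in the ideal case of perfectly placed, non-overlapping reads, millions of separate sequencing reactions are required, which is why the choice of overall strategy matters so much.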

The first method of sequencing a genome, employed by the publicly funded Human Genome Project, involves cloning a large piece of DNA into smaller pieces called sub-clones. With the use of known genetic markers (i.e. physical characteristics that have been attributed to specific areas of a chromosome) a simple and poorly resolved map of where the sub-clones would be located on a chromosome is prepared. This allows the sub-clones to be placed in an order based on the structure of the chromosome. Each individual sub-clone is then sequenced. The resulting sequence is used to create a new primer to sequence flanking regions of the DNA that could not be sequenced in the first round of reactions. This process is continued until the sequences overlap (are contiguous). These contiguous sequences can then be assembled into a group of overlapping sequences, termed a contig. As this method progresses, larger and larger contigs will be produced, until a single ordered contig of the genome is achieved.

A common name for the above method is a ‘top-down’ approach (see Figure 3). If you look at a jigsaw puzzle as an analogy, a top-down approach is similar to starting the puzzle from one corner and working your way down and across in an ordered manner, always building on the last piece that was added. The advantages of this method are that each individual clone can be sent to different people for sequencing and that each stretch of DNA only needs to be sequenced once, as the DNA has already been mapped. However, a large disadvantage of this method is the slow process of sub-cloning and mapping the clones, which requires significant human manipulation.

Figure 3. The top-down sequencing method. In this approach, a large source clone is first physically mapped before it is broken up into smaller sub-clones. This is done by taking the fragmented source clone and sequentially ordering the sub-clones, based on their original order in the source clone. This requires a physical map of the source clone to work, meaning you need to know that #1 (blue) comes before #2 (yellow) in the source clone. Once the clones have been ordered, each sub-clone is sequenced, and using the overlapping sequences of neighbouring sub-clones, the whole piece is put together.
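The step in which each new stretch of sequence is used to design the next primer can be sketched in a few lines. This is a toy simulation, not a description of any laboratory protocol: the names `sanger_read` and `primer_walk` are hypothetical, the template is a random string, reads are assumed to be error-free, and each primer is assumed to occur only once in the template (true here with overwhelming probability because the primers are 20 bases long).

```python
import random


def sanger_read(template, primer, read_len):
    """Simulate one reaction: return up to read_len bases following the primer site."""
    site = template.find(primer)
    if site == -1:
        return ""  # the primer does not anneal anywhere
    start = site + len(primer)
    return template[start:start + read_len]


def primer_walk(template, first_primer, read_len=100, primer_len=20, max_rounds=1000):
    """Extend a known starting sequence one read at a time."""
    sequence = first_primer
    primer = first_primer
    for _ in range(max_rounds):  # guard against looping on a repeated primer
        read = sanger_read(template, primer, read_len)
        if not read:
            break  # reached the end of the template
        sequence += read
        # Design the next primer from the end of what is now known.
        primer = sequence[-primer_len:]
    return sequence


random.seed(1)
template = "".join(random.choice("ACGT") for _ in range(1000))

# The first 20 bases stand in for the known vector sequence next to the insert.
walked = primer_walk(template, template[:20])
print(len(walked), walked == template)
```

Each round depends on the result of the previous one, which is why this style of sequencing is orderly but slow.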

A second method is the so-called ‘shotgun’ method of sequencing (see Figure 4), which was employed by the privately funded company Celera Genomics to sequence the human genome. This method was the subject of a good deal of debate, as it is relatively crude in comparison to the method employed by the Human Genome Project. It involves each contig being sub-cloned into smaller fragments in the same way as the top-down approach, with the exception that a physical genetic map is not created. Instead, each clone is sequenced first, and then overlapping sequences are joined together to create the contig. In other words, random clones are sequenced (as they are not ordered) in the hopes that overlapping sequences will be found to piece together the contiguous sequence.

Figure 4. Shotgun Sequencing. A relatively crude method of sequencing, shotgun sequencing does not produce a physical map of the source clone first. Instead, the source clone is fragmented, producing a random mixture, and a random sub-clone (i.e. an unordered sequencing clone of blue, yellow, black, red or green) is selected for sequencing by the Sanger method. To ensure that the whole source clone has been sequenced, this stretch of DNA must be sequenced numerous times (represented by multiples of a single coloured sub-clone) to produce an ordered overlapping sequence. Gaps in this process will occur where a sub-clone is not fully sequenced (blue coloured sub-clone).

Using the jigsaw puzzle analogy again, the shotgun method is similar to starting with random pieces of the puzzle and looking for pieces that fit them, regardless of where in the puzzle each piece originated. One major problem with this method is uncertainty. You lack an initial map to guide you, making it difficult to be sure that the entire contig is represented. To get around this problem, the same contig needs to be sequenced many times to ensure that the probability of missing a sub-clone is less than 1%. After that, the gaps between contigs must still be filled in, usually through the use of a technique called chromosome walking. The shotgun method is advantageous in that the laborious process of mapping and sub-cloning, which requires human hands, is eliminated. So, while this method requires much more sequencing than the first, it proves to be much faster and more economical because the sequencing reactions are almost fully automated and the sequences are assembled by computer programs.
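The computer-assembly step described above can be illustrated with a minimal greedy overlap assembler. This is a teaching sketch under strong assumptions, not the software used by any genome project: reads are error-free, all come from the same strand, and the sequence has no repeats long enough to create false overlaps. The function names and the example sequence are invented for illustration.

```python
from itertools import permutations


def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that matches a prefix of b (0 if below min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(reads)
    while True:
        # Drop exact duplicates and sequences wholly contained in another.
        contigs = list(dict.fromkeys(contigs))
        contigs = [c for c in contigs
                   if not any(c != other and c in other for other in contigs)]
        if len(contigs) < 2:
            break
        best_k, best_pair = 0, None
        for a, b in permutations(contigs, 2):
            k = overlap(a, b, min_len)
            if k > best_k:
                best_k, best_pair = k, (a, b)
        if best_pair is None:
            break  # no overlaps left: the remaining contigs are separated by gaps
        a, b = best_pair
        contigs.remove(a)
        contigs.remove(b)
        contigs.append(a + b[best_k:])
    return contigs


source = "ATGGCGTACGTTAGCCTAGGATCCA"
# Four overlapping reads, listed out of order as they would come off a sequencer.
reads = ["TTAGCCTAGG", "ATGGCGTACG", "CTAGGATCCA", "GTACGTTAGC"]

print(greedy_assemble(reads))
```

If one of the reads is removed from the list, the function returns more than one contig, which corresponds to the gaps that must later be closed by chromosome walking.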

When is a Genome Sequence Finished?

When it was announced that the first draft of the human genome was completed [3], many media outlets misreported that the human genome had been fully sequenced. In fact, much more sequencing needs to be done to finish the job. This is because the genome sequence is still in the ‘draft’ stage, meaning that the genome has been sequenced about 4 to 5 times and the data organized into fragments that are approximately 10,000 bases in size.

To prepare a high quality sequence of the human genome, potential errors in the sequence must still be statistically removed. This is done primarily by closing the gaps between contigs with additional sequencing, ultimately reducing ambiguity and ensuring that there is at most 1 error in every 10,000 bases. The finished version will require that a chromosome be sequenced about 9 to 10 times. Furthermore, not all regions of the chromosome can be cloned, resulting in them being unavailable for sequencing. Luckily, these regions, called heterochromatin, consist of telomeres and centromeres (the tips and centre of the chromosome, respectively), which are rich in repeating sequences (making cloning very difficult) and low in genes. Most of the genes reside in euchromatin, the part of the chromosome that can be sequenced. Therefore, a complete genome sequence actually refers to a high quality sequence of an organism’s euchromatin.
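The coverage figures quoted above (about 4 to 5 times for a draft, 9 to 10 times for a finished sequence) can be related to the chance of missing part of the sequence with a simple idealized model. The sketch below assumes that reads land uniformly at random and that the number of reads covering any one base follows a Poisson distribution, so the expected fraction of bases covered by no read at coverage c is e^(-c). This is a standard textbook simplification, not a calculation taken from the source, and it works per base rather than per sub-clone; the function names are invented for illustration.

```python
import math


def fraction_missed(coverage):
    """Expected fraction of bases covered by no read, under the Poisson model."""
    return math.exp(-coverage)


def coverage_needed(max_fraction_missed):
    """Coverage at which the expected missed fraction drops to the given value."""
    return -math.log(max_fraction_missed)


def reads_needed(genome_length, coverage, read_length=1000):
    """Number of reads of a given length that add up to the requested coverage."""
    return math.ceil(coverage * genome_length / read_length)


print(f"coverage for a 1% miss rate: {coverage_needed(0.01):.2f}x")
for c in (5, 10):
    print(f"{c}x coverage: fraction missed = {fraction_missed(c):.6f}, "
          f"reads for a 3-billion-base genome = {reads_needed(3_000_000_000, c):,}")
```

Under this model, a miss rate of 1% corresponds to a coverage of roughly 4.6, which falls within the 4 to 5 times quoted for the draft sequence. Real projects fall short of the model because some regions clone poorly or not at all.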

Benefits of Sequencing Projects

Why do we want to determine the A’s, T’s, C’s, and G’s of an organism?

When you get right down to it, a genome is the blueprint of how an organism functions. If we are interested in understanding the complexity of life (and every biologist and doctor is), having a genome to study is a big step forward.

Scientists are revving up their computers to study genomes, and the benefits are already being seen. Take the field of medicine as an example. As the population becomes increasingly health conscious, more attention is being paid to ongoing research in the medical sciences. As chromosome maps have become more detailed, genes associated with diseases such as Alzheimer’s disease [7] and familial breast cancer [8] have been identified. This has led to the hope that these diseases can be identified early and that new drugs and treatments can be discovered.

Genome projects also give us insight into other organisms, which has many applications in the industrial sector [9]. Increasing knowledge about domesticated plants and animals can reduce costs in agriculture, for example, by reducing the need for pesticides. Microbes are also an important resource. It has already been shown that bacteria can be used to clean up toxic chemical and oil spills and aid in the clean-up of sewage and waste. Bacteria have also been used to replace many industrial processes that require large amounts of toxic reagents or harsh conditions, making many workplaces, and their surrounding environment, much safer.

Final Words: Where is Genome Science Taking Us?

Even though the number of completed genomes is ever increasing, the real work is just beginning. New advances in technology must accommodate the increasing amount of data, as the information available to researchers can be overwhelming. New fields of science have already been created by the sequencing of genomes. An example is functional genomics, which aims to look at the practical aspects of sequenced genomes by examining genome-wide responses to various elements.

Finally, a whole host of ethical issues has been raised as researchers have begun patenting genes in the hope of financial reward. Is it right to patent genes that are present in all humans? Who controls the genetic information? Can genetic information be used to oppress and control people, as in the movie Gattaca? Only education, debate and time will produce these answers.

Texts Consulted and Additional Reading

1. Dale JW, von Schantz M. 2002. From Genes to Genomes: Concepts and Applications of DNA Technology. West Sussex, England / New York: Wiley. 360p.

2. Town C, ed. 2002. Functional Genomics. Dordrecht/Boston: Kluwer Academic. 200p.

3. Caporale LH. 2003. Darwin in the Genome: Molecular Strategies in Biological Evolution. New York: McGraw-Hill. 245p.

4. Rangel P, Giovannetti J. 2002. Genomes and databases on the Internet: A Practical Guide to Functions and Applications. Wymondham: Horizon Scientific. 223p.

5. Primrose SB, Twyman RM. 2003. Principles of genome analysis and genomics. Malden, MA: Blackwell Pub. 263p.

1. Two Bacterial Genomes Sequenced. 1995. Human Genome News, May-June 7(1).

2. Genome-Scale Science. National Centre for Biotechnology Information:

3. The Genome International Sequencing Consortium. 2001. Initial sequencing and analysis of the human genome. Nature 409: 860-921.

4. Griffiths et al, eds. 2002. Modern Genetic Analysis: Integrating Genes and Genomes. New York: W.H. Freeman and Co. 736p.

6. Alphey L. 1997. DNA Sequencing: From Experimental Methods to Bioinformatics. New York: Springer. 206p.

7. Lahiri DK, et al. 2003. A Critical Analysis of New Molecular Targets and Strategies for Drug Developments in Alzheimer’s Disease. Curr Drug Targets 4(2): 97-112.

8. Marsh D, Zori R. 2002. Genetic Insights into Familial Cancers — Update and Recent Discoveries. Cancer Lett 181(2): 125-64.

9. Goujon P. 2001. From Biotechnology to Genomes: The Meaning of the Double Helix. NJ: World Scientific. 728p.