Cell-free Protein Synthesis

In the Central Dogma of molecular biology, DNA encodes blueprints, RNA transcribes the blueprints and carries them outside the nucleus, and proteins are built from the blueprints. This general format is found in all life. For this process to occur, a few more molecules are needed. Specifically, RNA polymerase is needed to convert DNA to RNA, ribosomes are needed to convert RNA to protein, 4 ribonucleotides are needed as the building blocks of RNA, 20 amino acids are needed as the building blocks of protein, tRNAs are needed to translate RNA to protein, and all these machines need ATP as an energy source. Aside from those components, no other part of a cell is needed to create protein from DNA. Removing the rest of the cell creates a simplified environment ideal for studying transcription (DNA to RNA) and translation (RNA to protein). In the image, DNA is being transcribed by a polymerase. The resulting mRNA is being translated by a ribosome into protein (green chain). This is all happening in vitro.

Cell-free systems were first used in the 1960s by Alfred Tissieres to study ribosomes and by Marshall Nirenberg and Heinrich Matthaei to break the amino acid code.  These systems were raw extracts.  Cells (usually E. coli) are broken open, lysed, by high pressure, chemicals, or sonication.  The resulting slurry is centrifuged to remove large pieces of the membrane and chromosome, and the extract is ready.  

The benefits of CFPS are greater control over the system, quicker design cycle times, and the ability to produce chemicals that may be toxic to cells.  Now, CFPS research focuses on improving its weaknesses including increasing reaction times (up to 100 hours), increasing protein yields (up to 2.3 mg/mL), increasing reaction scale (up to 100L), lowering the cost of the energy source (14 times cheaper), and studying fundamental questions like incorporating unnatural amino acids (Michael Jewett Lab), testing genetic circuit designs (Richard Murray Lab), and studying noise in transcription and translation (my lab!).  

Noisy Genes

Gene expression is inherently noisy.  This means there is variability in the gene's response to an input.  Say a population of genetically identical cells all receive the same input signal (e.g. concentration of an inducer like IPTG, light, or a stressor like heat).  If you measure the response of each cell to the input, their responses will vary across the population.  Some will respond too much, some too little.  This variability is called noise.  Some of the noise is Poissonian and is a result of the discrete nature of reactions in cells.  

Cells typically only have a single copy of a gene.  So, when an input signal arrives, the gene will start making mRNA which will be translated into protein.  Because there is only one gene, mRNA must be made one at a time (discretely).  The Poissonian description of this noise means these independent, uncorrelated, discrete events are spaced in time according to an exponential distribution.  
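To make the Poissonian picture concrete, here is a minimal sketch that simulates a single gene producing mRNA one molecule at a time. The production rate, the time window, and the function names are my own illustrative assumptions, not measured values. Waiting times between events are drawn from an exponential distribution, so the number of mRNA made per cell should come out Poisson distributed, with a variance roughly equal to the mean.

```python
import random


def simulate_transcription(rate_per_min, duration_min, seed=0):
    """Return the times (in minutes) at which one gene produces an mRNA.

    rate_per_min and duration_min are illustrative parameters (assumptions).
    """
    rng = random.Random(seed)
    t = 0.0
    event_times = []
    while True:
        # Independent, uncorrelated events: exponential waiting times.
        t += rng.expovariate(rate_per_min)
        if t > duration_min:
            break
        event_times.append(t)
    return event_times


def count_stats(rate_per_min, duration_min, n_cells=2000):
    """Mean and variance of mRNA counts across a simulated population."""
    counts = [
        len(simulate_transcription(rate_per_min, duration_min, seed=i))
        for i in range(n_cells)
    ]
    mean = sum(counts) / n_cells
    variance = sum((c - mean) ** 2 for c in counts) / n_cells
    return mean, variance


if __name__ == "__main__":
    mean, variance = count_stats(rate_per_min=0.5, duration_min=20)
    print(f"mean = {mean:.2f}, variance = {variance:.2f}")
```

For a Poisson process the ratio of variance to mean (the Fano factor) is close to 1, which is one quick way to check whether measured noise is purely Poissonian.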

This variation was explored by Michael Elowitz et al. in the 2002 paper Stochastic Gene Expression in a Single Cell. This important paper was one of the first to study noise in gene expression. Elowitz et al. wanted to understand how much variation in gene expression was intrinsic (specific to a particular gene) and how much was extrinsic (variations in global assets like polymerase and ribosomes). So, the authors created a plasmid (injectable sequence of DNA) with cyan fluorescent protein (CFP, on the green channel) on one side and yellow fluorescent protein (YFP, on the red channel) on the other. These genes are nearly identical, and in this plasmid they are controlled by the same promoter. Any variation that shows up in both colors together indicates a global variation, or extrinsic noise. Any variation in one color relative to the other indicates noise particular to that gene, or intrinsic noise.

As can be seen in the image, many cells are yellow (low intrinsic noise and high extrinsic noise), but some are more red or more green (high intrinsic noise and low extrinsic noise). The authors tune the level of expression and find that noise increases as expression decreases. That is, as an input gets smaller the response is less finely tuned because proteins are made in discrete numbers. Additionally, notice how much the response of an individual cell can differ from the average of the population. Even though these cells are growing right next to each other in the same media with the same inputs, their protein expression varies widely.
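The two-color idea can be turned into numbers. Below is a small sketch of the decomposition used in Elowitz et al. (2002): intrinsic noise comes from the difference between the two reporters within each cell, and extrinsic noise comes from how the two reporters vary together across cells. The function name and the example values are hypothetical, and the sketch assumes both channels have already been normalized to comparable means.

```python
def noise_decomposition(cfp, yfp):
    """Return (intrinsic^2, extrinsic^2, total^2) noise for paired readings.

    cfp[i] and yfp[i] are the two reporter intensities measured in cell i.
    Assumes the channels were normalized beforehand (an assumption of
    this sketch).
    """
    n = len(cfp)
    mean_c = sum(cfp) / n
    mean_y = sum(yfp) / n
    norm = mean_c * mean_y

    # Intrinsic: reporters in the same cell disagree with each other.
    intrinsic_sq = sum((c - y) ** 2 for c, y in zip(cfp, yfp)) / n / (2 * norm)
    # Extrinsic: reporters rise and fall together from cell to cell.
    extrinsic_sq = (sum(c * y for c, y in zip(cfp, yfp)) / n - norm) / norm
    return intrinsic_sq, extrinsic_sq, intrinsic_sq + extrinsic_sq


if __name__ == "__main__":
    # Hypothetical cells in which both colors always match:
    # all of the noise is extrinsic.
    print(noise_decomposition([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
```

In this toy case every cell is equally "yellow", so the intrinsic term is zero even though brightness differs from cell to cell.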

Green Fluorescent Protein

Green Fluorescent Protein (GFP) is one of the most important proteins discovered, and it earned Osamu Shimomura, Martin Chalfie, and Roger Y. Tsien the 2008 Nobel Prize in Chemistry. GFP is a fluorescent protein naturally found in the jellyfish Aequorea victoria. The scientists needed to collect enough protein to crystallize (100 mg) and thus had to extract protein from 50,000 jellyfish.

Fluorescent means the electrons in the protein can be excited to a higher energy state than normal by ultraviolet light, 395 nm.  The electrons then fall back down to their low energy state and release a photon at the 509 nm wavelength, or green!  We use GFP as an easy way to track the production of protein.  The more the solution fluoresces the more protein has been produced.  The picture to the right is an Eppendorf tube containing a cell-free reaction producing GFP.  The tube is resting on a UV table.  

Cell-free Storage

Cell-free reactions are generally accepted to be more fragile than cellular systems. They require numerous components in strict concentrations for optimal protein production. For this reason, they are not generally considered for field applications. In April 2014, Smith et al. (from the Bundy lab at Brigham Young University) reported that cell-free systems can be lyophilized (freeze dried) and stored in a refrigerator with minimal loss of activity. The plot to the right, from their paper, shows the relative activity, measured by GFP production in µg/mL, for aqueous extract (blue) and two different lyophilized cell-free systems. Compared to the standard of storing solutions at -80C, the lyophilized systems lose only about 25% of their activity.

Genome of Babel

There are many concerns about scientists or hobbyists accidentally creating dangerous pathogens.  While engineering pathogens is a serious concern there is little worry about creating a random sequence of DNA that is dangerous or even useful.  

The Library of Babel is a short story published in 1941 by Jorge Luis Borges.  Borges imagines a universe composed entirely of a library.  Every book in the library has the same format, 1,312,000 characters from a 25 character set.  The library contains a book with every combination of characters possible, 25^1,312,000 books.  If the library were built it would be much larger than the known universe.  Because the library contains every combination of letters it contains all knowledge.  It contains everyone's biography, a record of future events, every great work of literature, and any other information.  Of course, it also contains all incorrect knowledge too.  Many of the inhabitants of the Library go insane looking for books with real knowledge.  

In this analogy the books are genomes and the Library is the complete state space of genomes.  The average genome size is ~4 Mbp or 4 million base pairs.  Each base pair can be one of four bases, adenine, thymine, guanine, or cytosine, so the number of unique genomes in the state space is 4^4,000,000 or 10^2,408,240.  

The state space of genomes is so large that every living thing since the beginning of life on earth has collectively explored only a tiny part of it. DNA polymerase typically has a 1 in 10,000,000 error rate, so assume every organism has only a single mutation in its 4,000,000 bp genome. There are ~1.7x10^30 new prokaryotic cells every year, and life has existed for ~4x10^9 years. So, at most about 6.8x10^39 unique genomes have been explored in the state space. This is a lot. However, 6.8x10^39 << 10^2,408,240: the explored fraction is only about 1 in 10^2,408,200, so essentially all 10^2,408,240 unique genomes remain to be explored.
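These numbers are far too large to compute directly, but the arithmetic is easy to check with logarithms. The sketch below simply redoes the estimates from the text; the variable names are mine, and the inputs (genome length, cells per year, age of life) are the rough figures quoted above rather than precise measurements.

```python
import math

# Rough figures taken from the text above.
GENOME_LENGTH_BP = 4_000_000
BASES = 4
CELLS_PER_YEAR = 1.7e30
YEARS_OF_LIFE = 4e9

# Borges' library: 1,312,000 characters per book, 25 possible characters.
BOOK_LENGTH = 1_312_000
ALPHABET = 25

# log10(4^4,000,000) = 4,000,000 * log10(4)
log10_state_space = GENOME_LENGTH_BP * math.log10(BASES)

# One new genome per cell is a generous upper bound.
genomes_explored = CELLS_PER_YEAR * YEARS_OF_LIFE
log10_explored = math.log10(genomes_explored)

# Fraction explored = explored / state space, computed in log space.
log10_fraction = log10_explored - log10_state_space

log10_library = BOOK_LENGTH * math.log10(ALPHABET)

if __name__ == "__main__":
    print(f"genome state space ~ 10^{log10_state_space:,.0f}")
    print(f"genomes explored   ~ 10^{log10_explored:.1f}")
    print(f"fraction explored  ~ 10^{log10_fraction:,.0f}")
    print(f"Library of Babel   ~ 10^{log10_library:,.0f} books")
```

Working in log space avoids overflow and makes it obvious that subtracting the explored genomes from the total barely changes the total.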

While the argument for security through endless combinations works well in fields like cryptography where every sequence is truly independent, it is less guaranteed in biology because, like horseshoes and hand grenades, close counts in biology. A good example is the Influenza virus. Some strains are more deadly than others, like the 1918 Spanish Flu or H1N1, but all are infectious and at least mildly harmful. So, exploring the unique genomes around the flu is more dangerous than exploring the unique genomes around S. cerevisiae (yeast).

(The print to the left is by Érik Desmazières)

Vesicle Formation

This is an easy procedure to produce millions of liposomes, artificial vesicles, around 5 µm in diameter. First, lipids are dissolved in paraffin oil to create an oil saturated phase. Second, an inner solution is prepared. The inner solution will end up inside the vesicles. In our case, the inner solution is a cell-free reaction to produce green fluorescent protein. Next, the inner solution is added to the paraffin oil, as shown in the top right picture. Then, vortex for about one minute until the large drop of inner solution is broken up. The oil should be cloudy now, like in the bottom right image. At this point, the hydrophilic heads of the lipids are in the water droplets and the hydrophobic tails are in the oil phase. In this configuration the droplets are called micelles because of the lipid monolayer. To make liposomes, which have a lipid bilayer, the paraffin oil phase, with the micelles, can be layered above another water phase. A second lipid monolayer will be formed at the oil-water interface. The micelles can be pulled through the interface by centrifugation. The resulting liposomes will be pelleted at the bottom of the tube. Below is a fluorescent image of GFP produced in liposomes. Liposomes are being investigated to create a minimal cell.

Vegetable DNA Extraction Party

I recently hosted a vegetable DNA extraction party for my friends to create discussion around DIYBio.  We followed a few different protocols but found this one from the University of Utah to be the best.  The basic procedure is to puree the flesh of a fruit or vegetable.  Filter the puree to remove large pieces.  Add detergent/soap/surfactant to the filtered liquid to break up the lipid membrane.  DNA is compacted by coiling around proteins (histones), so add meat tenderizer (a protease) to break up the proteins and free the DNA.  Add salt to the puree to interact with the negatively charged DNA backbone and make the DNA less hydrophilic, water loving.  The final step is to layer cold, 95% ethanol above the puree.  Because the DNA backbone and salt ions want to be close together, and water molecules prevent that, the DNA precipitates out of the water phase and into the ethanol phase.  This can be seen in the image.  The white strands are DNA!  Why would you want to extract DNA from fruits and vegetables?  You could test the DNA for genetic modifications or species fraud by collecting the DNA and sending off for sequencing.  You could PCR amplify specific genes for further cloning.  Or any application you can think of; be creative!  

Reading SNP Data

Last year I tested my genome with 23andMe (luckily, just before the FDA freeze on the health results of 23andMe). 23andMe has a wonderful interface for displaying mutations and the associated studies linking mutations to phenotype. Unfortunately this display is only available for tests completed prior to November 22, 2013. Tests run after this date have access to ancestry and their raw data. The raw data is important because it still allows you to look into your health report without using 23andMe. Today, I'll show you how to navigate a SNP database (Single Nucleotide Polymorphism, pronounced snip). Let's say we read somewhere that people with the recessive genotype at SNP rs601338 are resistant to some strains of Norovirus. We'll check the validity of this claim and see if the recessive genotype is in our 23andMe raw data.

The National Center for Biotechnology Information (NCBI) manages many biotechnology databases including BLAST for nucleotide and protein sequences, PubMed for medical journal articles, dbSNP for SNPs, and many others. We'll use dbSNP. Search for rs601338 in dbSNP.

From the search results we can see the SNP is on chromosome 19, the alleles are either A or G (resulting in AA, AG, and GG genotypes), the gene is FUT2, and there is a link to the Online Mendelian Inheritance in Man (OMIM) database. Follow this link. (In addition, the Varview link provides a great view of the gene neighborhood.)


We see this SNP is on the FUT2 gene on chromosome 19 again (a quick check), and finally we see this gene has some association with Norovirus (Norwalk Virus) resistance. If we scroll further down we can find information about the polymorphisms. In 1995, Kelly et al. showed the AA genotype produces proteins with Tryptophan changed to a terminating codon (TAG), resulting in a non-secreting phenotype. In 2003, Lindesmith et al. showed that the non-secreting phenotype confers resistance to Norwalk Virus infection. Unfortunately, Lindesmith et al.'s article is behind a paywall, which is an issue for another day.

Finally, we need to check our raw data from 23andMe to see which genotype we have. 23andMe organizes the raw data into a tab-separated file with the SNP id, chromosome, position, and genotype. As shown below, I have the AA genotype, which gives the non-secretor phenotype. This does not mean I am immune to every flu or flu-like virus. It only means I am resistant (not immune) to the most common strain (not all strains) of Norovirus. We must be very careful and reserved when interpreting genome data.
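Looking up a single SNP by hand gets old quickly, so here is a minimal sketch of the same lookup in Python. It assumes the raw data layout described above (tab-separated SNP id, chromosome, position, and genotype, with comment lines starting with #). The function name, the sample rows, and the file name in the usage note are all hypothetical, and the positions in the sample are placeholders rather than real coordinates.

```python
def find_genotype(raw_data_lines, rsid):
    """Return the record for one SNP id, or None if it is not in the data."""
    for line in raw_data_lines:
        # Skip header comments and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # ignore malformed rows
        snp_id, chromosome, position, genotype = fields[:4]
        if snp_id == rsid:
            return {
                "rsid": snp_id,
                "chromosome": chromosome,
                "position": int(position),
                "genotype": genotype,
            }
    return None


# Hypothetical sample rows; positions are placeholders, not real coordinates.
SAMPLE_RAW_DATA = (
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs0000001\t1\t1000\tCT\n"
    "rs601338\t19\t2000\tAA\n"
)

if __name__ == "__main__":
    record = find_genotype(SAMPLE_RAW_DATA.splitlines(), "rs601338")
    print(record)
```

To run it against a real download, pass an open file instead of the sample, for example `with open("my_raw_data.txt") as f: find_genotype(f, "rs601338")`, where the file name is whatever your export happens to be called.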

I find this process of digging through biotech databases and articles fun. However, the tab-separated txt file for every SNP tested by 23andMe is 24.8MB (huge!), and it would take forever to go through this process for every SNP. Fortunately, there are many SNP services to help you comb through your data and understand it readily. (Promethease is well regarded, but I have not used it.)


Microfluidics

We often use microfluidics for reaction chambers, protein purification, and controlled environments for bacterial growth. Microfluidics is defined as working with small volumes (µL, nL, pL, fL). Small volumes cut reagent costs and allow manipulation of small samples. This is a master, positive photoresist on a silicon wafer, of two serpentine channels. The photoresist is ~60 microns high. PDMS can easily be poured over the master, cured, removed, and bound to a glass coverslip to create a sealed microfluidic channel.

Many open source DIYBio microfluidic designs have been tested, including Jello, Sharpie, adhesive tape, and others. These efforts can help amateur labs test microfluidic designs and conserve reagents.

Polymerase Chain Reaction

Polymerase Chain Reaction (PCR) is a key technique in molecular biology labs.  It allows DNA to be copied many times over.  Theoretically, each cycle doubles the number of DNA strands, and a typical procedure will have 30 cycles.  So each strand of template DNA creates ~1 billion copies!
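The doubling arithmetic is quick to check. In this small sketch the function name and the efficiency parameter are my own additions: an efficiency of 1.0 is the theoretical perfect doubling described above, and anything lower is a stand-in for the fact that real reactions fall short of it.

```python
def pcr_copies(template_strands, cycles, efficiency=1.0):
    """Estimate copies after PCR, assuming the same efficiency every cycle.

    efficiency=1.0 means every strand is copied each cycle (perfect
    doubling). The efficiency parameter is an illustrative assumption,
    not part of the original description.
    """
    return template_strands * (1 + efficiency) ** cycles


if __name__ == "__main__":
    ideal = pcr_copies(template_strands=1, cycles=30)
    print(f"ideal copies after 30 cycles: {ideal:,.0f}")
```

With perfect doubling, one template becomes 2^30 = 1,073,741,824 copies after 30 cycles, which is where the "~1 billion" figure comes from.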

PCR exploits three concepts: DNA melting, primer annealing, and DNA polymerase 5' to 3' elongation. First the temperature is raised to about 95C, causing the DNA double helix to reversibly separate from double stranded to single stranded. This gives the primers access to the bases. The temperature is then lowered just enough for primers to bind to their target sequences. The sequence of the primers allows researchers to select which DNA sequences are copied. Once the primers are bound, the temperature is increased to allow DNA polymerase to start at the primer and copy DNA in the 5' to 3' direction.

All these temperature changes are done on a thermocycler. Typically these use the thermoelectric effect to achieve rapid temperature transitions and hold stable temperatures. Thermoelectric devices and thermocyclers are generally expensive. To bring biotech to the public, OpenPCR offers an open source thermocycler.

Liquid Nitrogen

This truck refills our liquid nitrogen tanks. Liquid nitrogen is important for storing delicate samples like chemically competent cells and cell extract. The most damaging part of freezing samples for storage is the formation of ice crystals. This growth occurs in the intermediate zone of ~-15C to -60C. Flash freezing samples helps them quickly pass through the intermediate zone. It also helps the water form amorphous ice, vitrified ice, instead of crystal ice. Once at -80C the samples are relatively stable. Ideally, samples could be stored at -130C because at this temperature no liquid water remains, and at -196C, the temperature of liquid nitrogen, no thermally driven reactions can occur. (Mazur 1984)

-80C Storage

We store many reagents and samples at -80C.  Clearly, this box of aliquots in our freezer has not been touched in a while!  Keeping samples at such a low temperature limits the activity of proteins and chemicals.  This is also the standard way to preserve isolated cell strains (in our case, E. coli carrying different plasmids and genes).  Cells are preserved in glycerol stocks to prevent the formation of ice crystals that would rupture cell walls.  Cells can be kept for months and years in this condition, always ready to be unfrozen and continue dividing.

Similar storage capabilities can be achieved, at least for cell-free protein synthesis reactions, for a much smaller investment (freezers are expensive!). Bundy et al. recently demonstrated that lyophilization (freeze drying) can preserve cell extracts in a 4C fridge at ~60% of the viability of aqueous extract stored in a -80C freezer. Their setup could be purchased for around $3k while a reliable -80C freezer is around $10k. They also found lyophilization reduces bacterial cell contamination in cell extracts.

Plasma Cleaner

After our PDMS microfluidic devices have been molded and cured they need to be bound to a glass coverslip. (Coverslips are used because they are thin; the focal point of our 100x oil immersion microscope objective would be inside a regular microscope slide.) Our device and coverslip are placed in a vacuum chamber. Plasma is generated by radio frequency induction. The plasma cleans the surface of the glass and PDMS and makes them both hydrophilic. Once they have been plasma treated and pressed together, the PDMS is irreversibly stuck to the glass.

Water Purification

These are the heating elements of our distilled water (dH2O) supply. Water is heated to vapor and condensed into a 12 gallon glass jar. We use the purified water for making solutions that do not require absolute purity or will be sterilized later, like TAE Buffer or LB medium. TAE Buffer is used when separating DNA with gel electrophoresis. It contains EDTA (Ethylenediaminetetraacetic acid, what a tedious name!) which chelates (binds/sequesters) metal ions that are required by enzymes that degrade DNA. So, it protects DNA from degradation in solution. TAE is not used for long term storage of DNA, so it is not vital that the water be nuclease free. LB is a well documented growth medium for bacteria, especially E. coli. We use it for growing E. coli cultures. Once the medium is mixed into solution it will be sterilized in the autoclave, so the sterility of the dH2O is not an issue.