Expression Vector Optimization
Many recombinant proteins can be produced by cloning the corresponding genes into standard expression vectors. The main benefits of this approach are increased speed and reduced costs. However, it is unlikely to lead to optimal levels of recombinant protein expression. The goal of vector optimization is to maximize the yield of a recombinant protein manufacturing process.
Several control parameters (transcription, translation, and plasmid copy number) can be leveraged to optimize protein expression in E. coli. Increasing all three at the same time is unlikely to lead to the desired level of gene expression. An expression vector design strategy that relies on a high plasmid copy number, a strong promoter, and an efficient ribosome binding site tends to backfire. The most likely outcomes of this naïve strategy are a fitness penalty for the host cell, misfolding of the recombinant protein, which then accumulates in inclusion bodies, and ultimately a poor protein yield.
It is necessary to carefully balance transcription, translation, and plasmid copy number to maximize the production of a recombinant protein. Unfortunately, it is not possible to predict which combination of genetic parts will make the best expression vector.
Instead, recombinant DNA technologies can be used to produce a family of related plasmids by combining different genetic parts. This plasmid library makes it possible to collect enough data to understand how the different parameters interact with each other. These data can be captured in a mathematical model that is then used to predict the optimal combination of genetic parts.
The more expression vectors the experiment includes, the larger the dataset and the stronger the predictions. In this context, the ability to design and test a library of related expression vectors determines the overall success of the experiment.
Using the GenoFAB platform, we helped the client develop a process encompassing all the phases of the project.
Design: combinatorial library of 48 expression vectors
We helped the team design a minimal expression vector. This plasmid was designed to facilitate assembly operations. It also included various tools to finely adjust different expression control parameters.
To perform a proof-of-concept experiment, the team used the plasmid design services to generate a library of 48 plasmids.
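A combinatorial library of this kind can be enumerated programmatically. The sketch below is only an illustration: the part names and the 3 x 4 x 4 split are assumptions chosen so that the product equals 48, since the actual parts used in the project are not listed here.

```python
from itertools import product

# Hypothetical part names; the real parts used in the project are not given.
ORIGINS = ["ori_low", "ori_medium", "ori_high"]        # plasmid copy number
PROMOTERS = ["prom_1", "prom_2", "prom_3", "prom_4"]   # transcription
RBS_SITES = ["rbs_1", "rbs_2", "rbs_3", "rbs_4"]       # translation


def build_library(origins, promoters, rbs_sites):
    """Return one design record per combination of parts."""
    library = []
    combos = product(origins, promoters, rbs_sites)
    for index, (ori, prom, rbs) in enumerate(combos, start=1):
        library.append({
            "plasmid_id": f"pLIB{index:03d}",  # hypothetical naming scheme
            "origin": ori,
            "promoter": prom,
            "rbs": rbs,
        })
    return library


library = build_library(ORIGINS, PROMOTERS, RBS_SITES)
print(len(library))  # 3 * 4 * 4 = 48 designs
```

A full factorial layout like this one keeps the design balanced, which simplifies the downstream statistical analysis.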
Build: combinatorial assembly of the plasmid library
Minimizing the cost of assembling the plasmid library was a priority. To this end, the team designed a set of reusable DNA fragments that could be combined to assemble the 48 plasmids included in the experiment. Using build services, they produced 96 PCR primers to generate overlapping amplification products suitable for Gibson assembly and other standard molecular biology techniques.
We designed a plasmid assembly process optimized for speed, stability, and reproducibility. To facilitate preparation of assembly reactions, the process generated pipetting instructions for their liquid handling system. Pipetting schemes to prepare PCR and Gibson assembly reactions were directly derived from LIMS data.
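Deriving pipetting instructions from sample records can be sketched as follows. This is a minimal illustration, not the process used in the project: the fragment locations, the assembly plan, the fixed transfer volume, and the CSV column names are all hypothetical, and a real liquid handler would expect its own vendor-specific worklist format.

```python
import csv
import io

# Hypothetical LIMS export: source-plate well holding each reusable fragment.
FRAGMENT_WELLS = {
    "ori_low": "A1",
    "prom_1": "B1",
    "prom_2": "B2",
    "rbs_1": "C1",
}

# Hypothetical assembly plan: fragments combined in each Gibson reaction.
ASSEMBLIES = {
    "pLIB001": ["ori_low", "prom_1", "rbs_1"],
    "pLIB002": ["ori_low", "prom_2", "rbs_1"],
}

FIELDNAMES = ["plasmid_id", "fragment", "source_well", "destination_well", "volume_ul"]


def well_name(index, columns=12, rows="ABCDEFGH"):
    """Convert a 0-based index to a plate well name, filling row by row."""
    row, col = divmod(index, columns)
    if row >= len(rows):
        raise ValueError("more reactions than wells on the destination plate")
    return f"{rows[row]}{col + 1}"


def make_worklist(assemblies, fragment_wells, volume_ul=1.0):
    """Return one transfer step per fragment per assembly reaction."""
    steps = []
    for index, plasmid_id in enumerate(sorted(assemblies)):
        destination = well_name(index)
        for fragment in assemblies[plasmid_id]:
            if fragment not in fragment_wells:
                raise KeyError(f"no source well recorded for fragment {fragment!r}")
            steps.append({
                "plasmid_id": plasmid_id,
                "fragment": fragment,
                "source_well": fragment_wells[fragment],
                "destination_well": destination,
                "volume_ul": volume_ul,  # assumed fixed volume for illustration
            })
    return steps


def worklist_to_csv(steps):
    """Serialize the transfer steps as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(steps)
    return buffer.getvalue()


worklist = make_worklist(ASSEMBLIES, FRAGMENT_WELLS)
print(worklist_to_csv(worklist))
```

Generating the worklist from the same records that describe the samples avoids manual transcription, which is one plausible way to support the reproducibility goal stated above.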
The Laboratory Information Management System (LIMS) supported the process by allowing users to generate barcoded labels to track samples throughout. The LIMS data model was customized to capture all the samples generated by the vector assembly process.
Finally, the plasmid assembly workflow ended with a sequence verification step. Short sequencing reads were assembled using a de novo assembly strategy. The contig produced by the assembly step was compared with the theoretical reference sequence of the plasmid.
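Comparing a contig with a plasmid reference has one subtlety: plasmids are circular, so an assembler may report the sequence starting at any position and on either strand. The sketch below shows one simple way to make an exact comparison insensitive to both. It is an assumption about how such a check could be written, not the verification method used in the project, and it does not handle mismatches, indels, or contigs whose ends overlap.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def matches_circular(contig, reference):
    """Check whether contig equals the circular reference.

    The comparison ignores the start position (rotation) and the strand.
    """
    contig = contig.upper()
    reference = reference.upper()
    if len(contig) != len(reference):
        return False
    # Every rotation of a circular sequence is a substring of the
    # sequence concatenated with itself.
    doubled = reference + reference
    return contig in doubled or reverse_complement(contig) in doubled


reference = "ATGGCCTTAAGC"                 # toy sequence for illustration
rotated = reference[5:] + reference[:5]    # same plasmid, different start
print(matches_circular(rotated, reference))
print(matches_circular(reverse_complement(reference), reference))
```

In practice a sequence aligner would be needed to locate and report individual differences; this exact-match test only answers whether the assembled plasmid is identical to the design.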
Test: automating the estimation of gene expression data
One of the potential bottlenecks of this project was the testing phase. It can be challenging to collect recombinant protein expression data on a large number of expression vectors. It would have been too slow and expensive to grow each cell line in a fermentor and purify the protein.
Instead, the team decided to use the expression of selection markers as a proxy for recombinant protein expression. They reasoned that they could measure the fitness of each cell line under different growth conditions.
They used a plate imager as the primary instrument to measure gene expression. Raw data were processed and imported into a database for future analysis. A data service was developed to analyze the data. The application combined gene expression data with LIMS data to estimate the protein expression level achieved by each of the 48 plasmids.
A statistical model made it possible to describe how the vector architecture influenced gene expression level. This helped determine how the different parameters controlling gene expression influenced the yield of recombinant protein production. The model made it possible to tune the plasmid copy number, the transcription of the gene of interest, and its translation to maximize gene expression.
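The form of the statistical model is not specified here. As a minimal sketch of the general idea, the example below fits an additive main-effects model to a balanced factorial design, where each level's effect is its mean expression minus the grand mean. This is a deliberate simplification: it ignores interactions between parameters, which a real analysis of this problem would likely need to include. The factor levels and expression values are made up for illustration and are not results from the project.

```python
from collections import defaultdict
from itertools import product
from statistics import mean

FACTORS = ("origin", "promoter", "rbs")


def fit_additive_model(records):
    """Estimate the grand mean and per-level main effects (balanced design)."""
    grand_mean = mean(r["expression"] for r in records)
    effects = {}
    for factor in FACTORS:
        by_level = defaultdict(list)
        for r in records:
            by_level[r[factor]].append(r["expression"])
        effects[factor] = {
            level: mean(values) - grand_mean for level, values in by_level.items()
        }
    return grand_mean, effects


def predict(grand_mean, effects, design):
    """Predict expression for a design as grand mean plus main effects."""
    return grand_mean + sum(effects[f][design[f]] for f in FACTORS)


def best_design(grand_mean, effects):
    """Return the combination of levels with the highest predicted expression."""
    levels = [list(effects[f]) for f in FACTORS]
    candidates = (dict(zip(FACTORS, combo)) for combo in product(*levels))
    return max(candidates, key=lambda d: predict(grand_mean, effects, d))


# Synthetic, made-up effects for a 2 x 2 x 2 design (not project data).
MADE_UP = {
    "origin": {"ori_low": 1.0, "ori_high": -1.0},
    "promoter": {"prom_weak": -2.0, "prom_strong": 2.0},
    "rbs": {"rbs_a": -0.5, "rbs_b": 0.5},
}
records = []
for combo in product(*(MADE_UP[f] for f in FACTORS)):
    design = dict(zip(FACTORS, combo))
    design["expression"] = 10.0 + sum(MADE_UP[f][design[f]] for f in FACTORS)
    records.append(design)

grand_mean, effects = fit_additive_model(records)
print(best_design(grand_mean, effects))
```

In the made-up data the low-copy origin performs best, which illustrates how the estimated effects, rather than the nominal strength of each part, drive the predicted optimum.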
Results from this experiment led to a follow-up project to design a next-generation expression vector. The client is considering patenting these expression vectors to gain a competitive advantage in the marketplace.