DIY: Gene synthesis codon optimization

Do not use the free codon optimization algorithms provided by some
"leading" companies, such as GeneArt, Genscript, IDT, etc. A
company recently synthesized about 100 genes using GeneArt software or
sequences provided by Genscript and IDT. 90% of the synthetic genes
could not be expressed in E. coli, or were expressed at an extremely low
level (compared to wt).

Save money by doing your own optimization for expression of your gene in
E. coli:

1. Analyse the wild-type DNA sequence using this website:
http://nihserver.mbi.ucla.edu/RACC/

Red = rare Arg codons AGG, AGA, CGA

Green = rare Leu codon CTA

Blue = rare Ile codon ATA

Orange = rare Pro codon CCC

for the following input sequence:

gga cca aac aca gaa ttt gca CTA tcc ctg
tta agg aaa aac ATA atg act ATA aca acc
tca aag gga gag ttc aca ggg tta ggc ATA
cat gat cgt gtc tgt gtg ATA CCC aca cac
gca cag cct ggt gat gat gta CTA gtg aat
ggt cag aaa att aga gtt aag gat aag tac
aaa tta gta gat cca gag aac att aat CTA
gag ctt aca gtg ttg act tta gat aga aat
gaa aaa ttc aga gat atc agg gga ttt ATA
tca gaa gat CTA gaa ggt gtg gat gcc act
ttg gta gta cat tca aat aac ttt acc aac
act atc tta gaa gtt ggc cct gta aca atg
gca gga ctt att aat ttg agt agc acc CCC
act aac aga atg att cgt tat gat tat gca
aca aaa act ggg cag tgt gga ggt gtg ctg
tgt gct act ggt aag atc ttt ggt att cat
gtt ggc ggt aat gga aga caa gga ttt tca
gct caa ctt aaa aaa caa tat ttt gta gag
aaa caa

The length is: 546 nucleotides


Number of total single rare Arg codons: 7

Occurring at codons: 12, 55, 79, 84, 87, 133, 166

Number of tandem rare Arg codon double repeats: 0

Number of tandem rare Arg codon triple repeats: 0


The site's newer, unformatted output lists the results for Arginine,
Leucine, Isoleucine, and Proline, in that order. Tidied up:

Single rare codons at positions:
Arg (agg, aga, cga): 12 55 79 84 87 133 166
Leu (cta): 8 48 70 94
Ile (ata): 15 18 30 37 90
Pro (ccc): 38 130

Double rare codons at positions: none

Triple rare codons at positions: none
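The same scan is easy to reproduce locally. Below is a minimal Python sketch, not the website's actual code: the names `RARE_CODONS`, `split_codons` and `find_rare_codons` are made up for illustration, and the codon sets are simply the ones from the colour legend above. Positions are 1-based codon numbers, as in the tool's output.

```python
# Rare codon sets copied from the colour legend above (an assumption about
# what the website checks, not its source code).
RARE_CODONS = {
    "Arg": {"AGG", "AGA", "CGA"},
    "Leu": {"CTA"},
    "Ile": {"ATA"},
    "Pro": {"CCC"},
}


def split_codons(seq):
    """Strip whitespace, upper-case, and cut the sequence into triplets."""
    dna = "".join(seq.split()).upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")
    return [dna[i:i + 3] for i in range(0, len(dna), 3)]


def find_rare_codons(seq):
    """Return {amino acid: [1-based codon positions]} for the rare codons."""
    hits = {aa: [] for aa in RARE_CODONS}
    for pos, codon in enumerate(split_codons(seq), start=1):
        for aa, rare in RARE_CODONS.items():
            if codon in rare:
                hits[aa].append(pos)
    return hits


# First 15 codons of the input sequence above.
demo = "gga cca aac aca gaa ttt gca CTA tcc ctg tta agg aaa aac ATA"
print(find_rare_codons(demo))
# {'Arg': [12], 'Leu': [8], 'Ile': [15], 'Pro': []}
```

Two neighbouring positions in one list (for example 12 and 13) would correspond to a tandem double repeat.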

 

2. Change the rare codons
Ile: ATA
Leu: CTA
Pro: CCC
Arg: AGG, AGA, CGG, and CGA
based on this E. coli codon usage table:

http://www.my-whiteboard.com/e-coli-codon-usage-table-2/

For example:

L:
cta --> CTG

I:
ata --> ATC or ATT

R:
aga --> CGC or CGT
agg --> CGC or CGT
cga --> CGC or CGT
cgg --> CGC or CGT

P:
ccc --> CCG

T:
aca --> ACC, ACT, or ACG
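The substitutions above can be applied in one pass. This is a rough Python sketch under a few assumptions: the names `REPLACEMENTS` and `replace_rare_codons` are hypothetical, the sequence is read in frame from its first base, and wherever more than one replacement is listed the first one is always used.

```python
# Rare codon -> replacement, following the examples above.
# Where several options are listed (e.g. CGC or CGT), the first is used.
REPLACEMENTS = {
    "CTA": "CTG",  # Leu
    "ATA": "ATC",  # Ile
    "AGA": "CGC",  # Arg
    "AGG": "CGC",  # Arg
    "CGA": "CGC",  # Arg
    "CGG": "CGC",  # Arg
    "CCC": "CCG",  # Pro
    "ACA": "ACC",  # Thr
}


def replace_rare_codons(seq):
    """Swap each listed codon for its replacement, reading in frame from base 1."""
    dna = "".join(seq.split()).upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    return "".join(REPLACEMENTS.get(codon, codon) for codon in codons)


print(replace_rare_codons("tta agg aaa aac ATA"))  # TTACGCAAAAACATC
```

Always taking the first option turns two neighbouring rare Arg codons into CGCCGC. The AGAAGA to CGCCGT example further down alternates the replacements instead, so treat the output as a starting point to edit by hand.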

3. Change the second amino acid (the codon right after the start codon) to A (gct, gca), K (aaa), or S (agc, tcc, tct)

4. Change G and C to A and T at the 5'-end.
A high GC content in the 5'-end of the gene of interest leads to the formation of secondary structure in the mRNA, which makes translation inefficient and results in low expression.
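A quick way to see whether the 5'-end needs attention is to compute its GC fraction. The sketch below is only illustrative: `gc_fraction` and `five_prime_gc` are hypothetical helper names, and the default window of 10 codons is an arbitrary assumption, since no window size or cutoff is specified here.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a DNA string (whitespace ignored)."""
    dna = "".join(seq.split()).upper()
    if not dna:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in dna) / len(dna)


def five_prime_gc(seq, n_codons=10):
    """GC fraction of the first n_codons codons (window size is an assumption)."""
    dna = "".join(seq.split()).upper()
    return gc_fraction(dna[:3 * n_codons])


# First 10 codons of the input sequence from step 1: 14 of 30 bases are G or C.
print(five_prime_gc("gga cca aac aca gaa ttt gca CTA tcc ctg"))  # about 0.47
```

Comparing the value before and after editing the first few codons shows whether a change actually moved the 5'-end towards A and T.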

5. Remove cis-acting DNA sequences such as internal TATA-boxes, chi-sites, and ribosomal entry sites; AT-rich or GC-rich sequence stretches; repeat sequences; and RNA secondary structures. Remove internal Shine-Dalgarno sequences such as AAGGAG(nnnnn)ATG, GAAGGAGA(nnnnn)ATG, AAGGAGG(nnnnn)ATG, AAGGAGGT(nnnnn)ATG, GGAG, GAGG, and AGGA.

For example, you should change ATAATA to ATCATC; AGAAGA to CGCCGT; ATAAGA to ATCCGT; AGAATA to CGCATT; ATAAGG to ATTCGT; and AGAAGG to CGTCGC.
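Hunting for the Shine-Dalgarno motifs listed in step 5 by eye is error-prone, so here is a small Python sketch that flags them. It rests on assumptions: `SD_MOTIFS`, `motif_to_regex` and `find_sd_motifs` are names invented for this example, "(nnnnn)" is read as exactly five arbitrary bases, and only the given strand is searched. It covers the Shine-Dalgarno list only, not TATA-boxes, chi-sites, repeats or secondary structure.

```python
import re

# Motifs copied from step 5; "(nnnnn)" is taken to mean any five bases.
SD_MOTIFS = [
    "AAGGAG(nnnnn)ATG",
    "GAAGGAGA(nnnnn)ATG",
    "AAGGAGG(nnnnn)ATG",
    "AAGGAGGT(nnnnn)ATG",
    "GGAG",
    "GAGG",
    "AGGA",
]


def motif_to_regex(motif):
    """Turn the notation above into a regex; '(nnnnn)' becomes five of any base."""
    return motif.replace("(nnnnn)", "[ACGT]{5}")


def find_sd_motifs(seq):
    """Return sorted (0-based start, motif) pairs, including overlapping hits."""
    dna = "".join(seq.split()).upper()
    hits = []
    for motif in SD_MOTIFS:
        # A zero-width lookahead lets finditer report overlapping matches.
        pattern = re.compile("(?=" + motif_to_regex(motif) + ")")
        hits.extend((m.start(), motif) for m in pattern.finditer(dna))
    return sorted(hits)


print(find_sd_motifs("atg aag gag gtt"))
# [(4, 'AGGA'), (5, 'GGAG'), (6, 'GAGG')]
```

The short motifs will match often, so each hit still needs a manual check of whether a synonymous codon change can remove it.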

 

6. Add two stop codons: TAATAA

7. Append another strong terminator to the end of your DNA sequence
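Step 6 can be sketched in a few lines of Python. The function name `add_double_stop` is hypothetical, and stripping any stop codon already present at the 3'-end before appending TAATAA is my own assumption rather than something the steps above spell out. Step 7 is left out because no terminator sequence is given.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def add_double_stop(seq):
    """Drop any stop codons already at the 3'-end, then append TAATAA."""
    dna = "".join(seq.split()).upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")
    while dna[-3:] in STOP_CODONS:
        dna = dna[:-3]
    return dna + "TAATAA"


print(add_double_stop("gag aaa caa"))      # GAGAAACAATAATAA
print(add_double_stop("gag aaa caa tga"))  # GAGAAACAATAATAA
```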

 

References:

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC139613/

December 22, 2012 at 3:59 pm
