If you've eaten vegan burgers that taste like meat or used synthetic collagen in your beauty routine -- both products that are "grown" in the lab -- then you've benefited from synthetic biology. It's a field rife with potential, as it allows scientists to design biological systems to specification, such as engineering a microbe to produce a cancer-fighting agent. Yet conventional methods of bioengineering are slow and laborious, relying mainly on trial and error.

Now scientists at the Department of Energy's Lawrence Berkeley National Laboratory (Berkeley Lab) have developed a new tool that adapts machine learning algorithms to the needs of synthetic biology to guide development systematically. The innovation means scientists will not have to spend years developing a meticulous understanding of each part of a cell and what it does in order to manipulate it; instead, with a limited set of training data, the algorithms are able to predict how changes in a cell's DNA or biochemistry will affect its behavior, then make recommendations for the next engineering cycle along with probabilistic predictions for attaining the desired goal.

"The possibilities are revolutionary," said Hector Garcia Martin, a researcher in Berkeley Lab's Biological Systems and Engineering (BSE) Division who led the research. "Right now, bioengineering is a very slow process. It took 150 person-years to create the anti-malarial drug, artemisinin. If you're able to create new cells to specification in a couple weeks or months instead of years, you could really revolutionize what you can do with bioengineering."

The project was led by Jie Zhang and Soren Petersen of the Novo Nordisk Foundation Center for Biosustainability at the Technical University of Denmark, in collaboration with scientists at Berkeley Lab and Teselagen, a San Francisco-based startup company.

To conduct the experiment, which aimed to boost production of the amino acid tryptophan, they selected five genes, each controlled by different gene promoters and other mechanisms within the cell and representing, in total, nearly 8,000 potential combinations of biological pathways. The researchers in Denmark then obtained experimental data on 250 of those pathways, representing just 3% of all possible combinations, and those data were used to train the algorithm. In other words, the tool, called the Automated Recommendation Tool (ART), learned which output (amino acid production) is associated with which input (gene expression).
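To get a feel for the size of that design space, here is a rough back-of-the-envelope sketch. The article gives only the totals, so the figure of six expression options per gene is an assumption chosen because it yields a number close to 8,000; the variable names are likewise illustrative.

```python
from itertools import product

# Assumption for illustration only: each of the 5 genes can be paired
# with one of 6 promoter options. The article states only "nearly 8,000".
N_GENES = 5
N_OPTIONS = 6  # hypothetical value, chosen because 6**5 is close to 8,000

# Every design is one choice of option per gene.
designs = list(product(range(N_OPTIONS), repeat=N_GENES))

n_total = len(designs)       # size of the full design space
n_measured = 250             # designs with experimental data (from the article)
fraction = n_measured / n_total

print(n_total)                   # 7776
print(f"{fraction:.1%}")         # 3.2%
print(n_total - n_measured)      # 7526 designs left to predict
```

Under that assumption, 250 measured designs cover about 3% of the space, leaving more than 7,000 combinations that have never been built or tested.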

Then, using statistical inference, the tool was able to extrapolate how each of the remaining 7,000-plus combinations would affect tryptophan production. The design it ultimately recommended increased tryptophan production by 106% over the state-of-the-art reference strain and by 17% over the best designs used for training the model.
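The train-on-a-few, predict-the-rest loop can be sketched in a few lines. This is not the tool's actual method: it swaps in a much simpler stand-in, a bootstrap ensemble of additive models, and runs on made-up data. The design space of six options per gene, the simulated measurements, and all function names are assumptions for illustration.

```python
import random
import statistics
from itertools import product

random.seed(0)  # fixed seed so the sketch is repeatable

N_GENES, N_LEVELS = 5, 6  # hypothetical: 6 expression options per gene
designs = list(product(range(N_LEVELS), repeat=N_GENES))

# Made-up "ground truth" standing in for real lab measurements.
true_effect = [[random.gauss(0, 1) for _ in range(N_LEVELS)]
               for _ in range(N_GENES)]

def measure(design):
    """Simulated noisy production measurement for one design."""
    signal = sum(true_effect[g][lvl] for g, lvl in enumerate(design))
    return signal + random.gauss(0, 0.3)

# "Build and test" 250 designs, matching the article's training-set size.
train = random.sample(designs, 250)
data = [(d, measure(d)) for d in train]

def fit_additive(samples):
    """Overall mean plus a per-gene, per-option offset (marginal averages)."""
    mean = statistics.fmean(y for _, y in samples)
    offsets = []
    for g in range(N_GENES):
        row = []
        for lvl in range(N_LEVELS):
            ys = [y for d, y in samples if d[g] == lvl]
            row.append(statistics.fmean(ys) - mean if ys else 0.0)
        offsets.append(row)
    return mean, offsets

def predict(model, design):
    mean, offsets = model
    return mean + sum(offsets[g][lvl] for g, lvl in enumerate(design))

# Bootstrap ensemble: refit on resampled data to get a spread of predictions.
ensemble = [fit_additive(random.choices(data, k=len(data))) for _ in range(30)]

measured = set(train)
candidates = [d for d in designs if d not in measured]

def ensemble_predictions(design):
    return [predict(model, design) for model in ensemble]

# Recommend the untested design with the highest average predicted output.
recommended = max(candidates,
                  key=lambda d: statistics.fmean(ensemble_predictions(d)))
preds = ensemble_predictions(recommended)
mean_pred = statistics.fmean(preds)
std_pred = statistics.stdev(preds)

# Crude probabilistic statement: share of ensemble members that predict
# the recommendation beats the best design measured so far.
best_seen = max(y for _, y in data)
prob_improve = sum(p > best_seen for p in preds) / len(preds)

print("recommended design:", recommended)
print(f"predicted production: {mean_pred:.2f} +/- {std_pred:.2f}")
print(f"best measured so far: {best_seen:.2f}")
print(f"share of ensemble predicting an improvement: {prob_improve:.0%}")
```

The spread across the ensemble is what turns a single guess into a rough probabilistic prediction: a recommendation that most ensemble members agree on is a safer bet for the next engineering cycle than one with a high average but wide disagreement.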

More data needed

The researchers say they were surprised by how little data was needed to obtain results. Yet to truly realize synthetic biology's potential, they say the algorithms will need to be trained with much more data. Garcia Martin describes synthetic biology as being only in its infancy -- the equivalent of where the Industrial Revolution was in the 1790s. "It's only by investing in automation and high-throughput technologies that you'll be able to leverage the data needed to really revolutionize bioengineering," he said.

The unique capabilities of national labs

Besides the dearth of experimental data, Garcia Martin says the other limitation is human capital, specifically machine learning experts. Given the explosion of data in our world today, many fields and companies are competing for a limited number of experts in machine learning and artificial intelligence.

"The national labs provide the environment where specialization and standardization can prosper and combine in the large multidisciplinary teams that are their hallmark," Garcia Martin said.

Synthetic biology has the potential to make significant impacts in almost every sector: food, medicine, agriculture, climate, energy, and materials. The global synthetic biology market is currently estimated at around $4 billion and has been forecast to grow to more than $20 billion by 2025, according to various market reports.

(A detailed version of this story is available at https://www.sciencedaily.com/releases/2020/09/200925113447.htm)