DNAI - The Artificial Intelligence / Artificial Life convergence
Part 1: When AI Bots do Genetic Engineering
Happy new year. I hope folks found rest and reasons for hope in the turning of the calendar. I’m going to continue occasionally writing here at scanthehorizon.org through 2024, while paying close attention to the controversy over whether this platform, Substack, is crossing an ethical line by profiting from hate speech. (For now, Substack makes no money from Scan the Horizon, so I don’t feel too complicit.)
This longer post below is the first in what will likely be a two-part essay on how the worlds of artificial life (synthetic biology) and artificial intelligence are rapidly merging. It’s a topic I’ve been trying to puzzle out as a member of the UN’s Multidisciplinary Ad Hoc Technical Expert Group (mAHTEG) on Synthetic Biology. I’ll explain a bit about both Syn Bio and AI below, but I’ve come to see that there are broadly three ways in which developments in these fields are converging - into what my friend Pat Mooney snappily calls “DNAI” - and I think the significance of that convergence is huge.
First, there’s the way that artificial intelligence tools (such as generative AI) are being used to design synthetic organisms or synthetic biological ‘parts’ such as DNA and proteins, including so-called ‘alt-proteins’. I call this ‘When AI bots do genetic engineering’ - and that’s the ‘part 1’ essay shared below.
Second, synthetic biology is increasingly being used to design organisms as components of bigger cyber-physical systems driven by AI and algorithmic decision-making. The example I will write about from digital agriculture is what I call ‘robot-ready crops’: crops genetically engineered to emit different signals that can be interpreted by AI systems to direct autonomous agricultural machinery.
Third, we are also seeing synthetically engineered biological organisms and parts become the substrate for AI computation itself - so-called ‘biocomputing’. Instead of using silicon chips to compute, DNA or other cellular processes can carry out computation, promoted as a ‘low energy’ computing option. One arresting example from the past year is the emergence of organoid computing, where lab-grown brain cells are now being cultured as AI computers.
I’ll deal with the second and third areas in a post yet to come.
The big questions, of course, cross all three areas: what does this mean for nature? For our economies and people’s livelihoods? For safety and governance? For who we are in the world and how we relate to each other?
Part 1. When AI Bots do Genetic Engineering: Text-to-organism platforms, the engineered protein explosion and AI biopiracy.
“Limits are now being breached. We are approaching an inflection point with the arrival of these higher order technologies, the most profound in history. The coming wave of technology is built primarily on two general-purpose technologies capable of operating at the grandest and most granular levels alike: artificial intelligence and synthetic biology”
– Mustafa Suleyman, co-founder of DeepMind (now owned by Google) and Inflection AI, The Coming Wave: Technology, Power and the Twenty-first Century’s Greatest Dilemma, p. 55
I was at a supper recently with a graphic designer who was explaining how she uses generative AI tools (like Midjourney or DALL-E) to design images. Many readers are probably familiar with these text-to-art AI generators, and I mistakenly assumed generative AI tools had become pretty commonplace. But her explanation of typing a descriptive text prompt into a programme to auto-magically generate a full AI image amazed my fellow dinner guests (civil society activists), as if the whole ChatGPT hoopla hadn’t happened yet. I found myself wondering how they would react if I shared what I’d just been researching about text-to-organism software. I thought better of it, kept quiet and sipped my soup. I think it might have freaked them out a bit. It freaks me out a bit.
“Text-to-organism” is my shorthand for an imagined system where you type (in text) the characteristics you want from a genetically engineered organism and then an AI agent custom-designs the DNA for you, maybe even printing out the modified DNA ready to be engineered into a living organism.
Unlike Midjourney or ChatGPT, text-to-organism capability doesn’t exist yet - but it’s not as far off as you might assume. Like other developments along the rapidly fusing frontiers of artificial life and artificial intelligence, things that three years ago would have been shelved in the ‘Science Fiction/Fantasy’ section are now marked up as ‘Business/Venture Capital’. They will soon enough come round to dinner-table conversation (and urgent action points on policy memos).
Speaking of policy-making, I am part of the UN’s Multidisciplinary Ad Hoc Technical Expert Group (mAHTEG) on Synthetic Biology, under the Biodiversity Convention. We are a small group charged by governments with ‘horizon-scanning, assessment and monitoring’ of new developments in cutting-edge genetic engineering (or ‘synthetic biology’). I was glad when the mAHTEG took up this topic of AI-synthetic biology integration. However, when I started to look more carefully I was astonished by how far the field had moved. In particular, developments in so-called ‘generative AI’ - the generation of new texts, pictures, videos and more, exemplified by OpenAI’s ChatGPT - are now bringing a parallel shift in how life can be redesigned, manipulated and engineered. They have also brought a jolt of additional capital, corporate enthusiasm and hype into the already overheated commercial space known as ‘Syn Bio’. The biggest corporate players in the AI revolution, many with no institutional background in biology, are now eagerly starting to play around with the littlest parts of living things. As I recently wrote to my fellow mAHTEG members: “The current speed, investments and shifts in technical and corporate developments in this arena should have us all sitting up in our seats with great attention and focus to address this topic.”
‘Synthetic biology’ (Syn Bio) is better known in tech, investment and science policy circles. It is presented as the building or “programming” of artificial lifeforms (a sort of “GMO 2.0” or “extreme genetic engineering”). As a field, Syn Bio was founded two decades ago upon the use of large sets of genomic data (big data) and new tools with which genetic engineers could rationally redesign biological parts, organisms and systems. Using AI to redesign genetics was always there in some form: synthetic biology firms such as Amyris Biotechnologies, Zymergen and Ginkgo Bioworks (only the last still exists) have used forms of artificial intelligence for over a decade, leveraging algorithms to sort through genomic data and select viable DNA sequences as part of their genome-design processes. However, in the past year the release of massive AI ‘foundation models’ (such as OpenAI’s ChatGPT, Google’s Bard, Meta’s LLaMA or Stability AI’s Stable Diffusion) has not only effected a historic shift in the AI space; it is now driving a parallel shift in the Syn Bio space. This shift is described as a move from “discriminative AI to generative AI”, and in biotech it accompanies a move from genomic analysis to a more generative synthetic biology itself.
To put it roughly: whereas the ‘discriminative AI’ of the past decade was about sorting data to identify clusters and outliers (high-throughput automated analysis), ‘generative AI’ directly generates novel forms or rearrangements of that data (high-throughput automated de novo design). Today’s large language models (LLMs) such as ChatGPT respond to a human request or “prompt” by predicting which data elements will most likely satisfy the user’s request. The AI then automatically stitches together texts, images or videos that appear to us as entirely new and creative but are actually just a series of predictive guesses. Generative AI programmes output these novel forms into the world as synthetic images, synthetic text, synthetic video and so on.
The mechanism by which generative AI works may look a bit like magic, but really it’s just statistics. First these systems undertake massive computational “machine learning” or “training”, which sorts and maps patterns of relationship between elements within billions of digitally ingested texts and images to create algorithmic rules. For example, the training may record how much more often the word ‘pizza’ is associated with the word ‘cheese’ than with the word ‘elephant’ - and make a rule out of that. AI developers then “fine-tune” these general models of relationships on more specialized training datasets (to refine the rules). The system then generates outputs coherent with the rules the trained model has deduced - for example, by choosing the statistically most likely next word in a paragraph. This creates seemingly new variations of text or image that satisfy the user’s request for novel but coherent “synthetic media”.
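To make the statistics concrete, here is a deliberately toy sketch (my own illustration, not how any production LLM is actually built) of that ‘predict the next word’ core: count which words follow which in a small corpus, then generate by repeatedly sampling a likely successor.

```python
import random
from collections import defaultdict

# Toy "training corpus" - real systems ingest billions of documents.
corpus = ("we ate pizza with cheese . we ate pizza with olives . "
          "the elephant ate leaves . we saw the elephant .").split()

# "Training": count how often each word follows each other word.
follows = defaultdict(lambda: defaultdict(int))
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

print(dict(follows["pizza"]))  # {'with': 2} - 'pizza' is never followed by 'elephant'

# "Generation": starting from a word, repeatedly sample a statistically
# likely next word, weighted by the counts learned above.
def generate(start, length=8):
    word, output = start, [start]
    for _ in range(length):
        options = follows.get(word)
        if not options:
            break
        words, weights = zip(*options.items())
        word = random.choices(words, weights=weights)[0]
        output.append(word)
    return " ".join(output)

print(generate("we"))
```

Real LLMs condition on thousands of preceding tokens using neural networks rather than single-word counts, but the underlying move - emit whatever most plausibly comes next - is the same.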
If we believe the self-reporting of OpenAI and others, then generative AI tools represent the fastest commercial take-up of a new technology in history. For this reason they are attracting astronomic sums of speculative money. At a time when investment money is tighter than it used to be, the high expectation of commercial returns is cracking the whip, driving the direction, nature and speed of development in AI. These tools, arriving so fast into commercial use, accompanied by magical claims and an urgent need for financial returns, stretch and break our governance mechanisms across many domains.
The next big tech investment? A rush to AI-enabled Syn Bio
Having seemingly mastered synthetic text, image, sound and video generation, the leading AI firms (which include the world’s very largest and richest companies by market cap) are now focusing on other commercially valuable forms of ‘language’ on which they might train their large language models. In particular there is incredible excitement for AI to master the biological ‘languages’ of genomics and proteomics (and by extension the field of synthetic biology). Having spectacularly ‘disrupted’ graphic design, scriptwriting, advertising, movies, the legal profession, journalism and more, AI firms looking for their next ‘killer app’ or payout now have the ‘life’ industries (agriculture, food, conservation, healthcare), as well as the biological production of materials, firmly in their crosshairs. This is seen as the next multi-trillion-dollar commercial frontier for the generative AI revolution to ‘disrupt’ and deliver big investor payouts.
The Senior Director of AI Research at NVIDIA (the leading producer of AI chips), Anima Anandkumar, describes one example of how an AI corporate leader is making this move into applying generative AI capabilities to synthetic biology. She describes NVIDIA’s new “GenSLM” large language model this way:
“Rather than generating English language or any other natural language, why not think about the language of the genomes? You know, we took all the DNA data that is available - both DNA and RNA data, for viruses and bacteria that are known to us - about a hundred and ten million such genomes. We learnt a language model over that and then we can now ask it to generate new genomes.” (source: NVIDIA, The AI Podcast, ‘Anima Anandkumar on Using Generative AI to Tackle Global Challenges’, ep. 203)
If NVIDIA’s airy talk about just ‘taking’ all that DNA data sounds like a big colonial grab (biopiracy), that’s because it is. I’ll come back to that.
As a demonstration of GenSLM, NVIDIA recently ‘fine-tuned’ its genomic language model (based on those 110 million ‘taken’ genomes) on a further dataset of 1.5 million COVID viral sequences in order to generate DNA sequences for novel coronavirus variants. They did this, so they said, to aid pandemic prediction and vaccine design - although what they didn’t acknowledge out loud is that this also raises dual-use security concerns. If you or I decided to generate blueprints for infectious strains of biological viruses on our home computers, we should expect security agencies to come knocking at our doors very quickly. Anyway, I’ll come back to that too.
NVIDIA proudly announced that amongst the synthetic COVID variants that GenSLM generated digitally were strains that closely match actual new biological variants that have emerged in nature since the original training set was compiled. That is: they correctly predicted how COVID might mutate. This matters: it helps with surveillance efforts and vaccine design. But this work is not just about COVID. NVIDIA emphasizes that its underlying genomic large language model could now be fine-tuned to create de novo genomes of other viruses or bacteria, enabling new Syn Bio microbes to be automatically generated much as ChatGPT generates texts. Once they have those new sequences, the DNA can either be ‘printed out’ on a DNA synthesizer or tweaked into a living organism using CRISPR-type gene-editing tools. It’s a reasonably trivial step these days to go from designing a new genome blueprint on a computer to implementing it in biological wetware - the hard part is getting the design right. In effect NVIDIA has created a first-pass ‘ChatGPT for virus and microbe design’.
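To give a rough feel for what ‘learning a language model over genomes’ means, here is a minimal sketch. It is a toy of my own devising, not GenSLM’s actual architecture (which is a large transformer, reportedly treating genomes as sequences of codons). The point is just that DNA can be tokenized and modelled like text, and that ‘fine-tuning’ is simply further training on a narrower dataset.

```python
from collections import defaultdict

def codon_tokens(dna):
    """Tokenize a DNA string into 3-letter codon 'words' for a language model."""
    return [dna[i:i+3] for i in range(0, len(dna) - 2, 3)]

print(codon_tokens("ATGGCTGCAACTGGT"))
# -> ['ATG', 'GCT', 'GCA', 'ACT', 'GGT']

# A stand-in for model training: accumulate next-token statistics.
counts = defaultdict(lambda: defaultdict(int))

def train(sequences):
    for seq in sequences:
        tokens = codon_tokens(seq)
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1

# "Pretraining" on a broad set of (invented) genome fragments...
train(["ATGGCTGCAACTGGT", "ATGGCAACTGCTGGT"])

# ...and "fine-tuning" is just continued training on a specialized set
# (for GenSLM: 1.5 million SARS-CoV-2 sequences). Generation then samples
# new token sequences from the learned statistics, as in the text sketch above.
train(["ATGGCTACTACTGGT"])
```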
It should be noted that NVIDIA - a traditional chipmaker and the world’s sixth-largest corporation by market cap - was not previously seen as a life-sciences company. As such it may not have the right in-house instincts for dealing with microbiological risk or the complex politics of engineering biology. Even biotech companies aren’t too good at this - witness Bayer’s (formerly Monsanto’s) trail of global protests, lawsuits, genetic pollution and more.
NVIDIA is far from the only trillion-dollar silicon tech giant applying generative AI to Syn Bio organisms and parts. Meta (Facebook), Microsoft (which now effectively controls OpenAI) and Alphabet (Google), along with Stability AI, are all investing heavily in developing generative AI tools for synthetic biology. The first three of these, like NVIDIA, are also among the seven richest corporations in the world. The established corporate giants of the biotech world (e.g. Bayer, Syngenta, Corteva) are also using this generative AI approach or contracting with smaller firms that employ it on their behalf. One recent report by a UK AI think tank, the Ada Lovelace Institute, suggests that the market for AI-driven genomics technologies could reach more than £19.5 billion (about USD 25 billion) by 2030, up from just half a billion in 2021 - but that prediction may already be out of date given the pace of developments.
The visceral impact of merely digital generative AI is already being felt across many economic sectors (e.g. entertainment, law, education, advertising). However, biotech and AI leaders are touting that applying generative AI to biology will be a much more explosive act of “disruption” than anything we have seen to date - even predicting a ‘splitting the atom’ moment for AI (see below). In his recent bestseller The Coming Wave, Mustafa Suleyman, co-founder of DeepMind (now owned by Google), talks up the current coming-together of generative AI with Syn Bio as the most significant “superwave” that technologists have ever seen. Another AI-Syn Bio proselytiser is Jason Kelly, CEO of Syn Bio ‘unicorn’ Ginkgo Bioworks. His firm recently inked a five-year partnership with Google to train large language models for synthetic biology. He describes the exceptional commercial opportunity of applying generative AI to Syn Bio like this:
“Here’s why ‘bio’ is particularly interesting for folks who are interested in AI. The idea of a foundation model plus fine-tuning with specialized data - people in AI understand that. Let’s try that with one of the categories of English - let’s say ‘legal’. That thing has to compete with a lawyer at Robeson Grey trained for 15 years, taught by other humans, writing contracts designed to be understood by human brains in English (a language that co-evolved with our brains). That also gives us leverage from how our brains work - and so we are asking these computer brains - neural nets - to compete with us on our turf. It’s a pretty high bar that it’s got to compete with.
Now let’s go over into biology. I remind you it runs on code (sequential letters), feels a lot like language - but it ain’t our language. We did not invent it, we do not speak it, we do not read it or write it, and so I feel like these computer brains are going to kick our ass a lot faster in this domain than they do in English ... If you are looking to understand where AI is really going to flip the script - not be a low-level Clay Christensen disruption (which is what’s happening in English) but rather be like splitting the atom - it’s bio.”
(Jason Kelly speaking on the ‘No Priors’ podcast, ep. 34, at 12:50)
Black box biology?
In short: Kelly is pointing out that AI bots will be more at ease navigating the logic of engineering biology than human brains are - and so we humans will increasingly struggle to understand what AI genetic-engineering bots are doing. Reassuring, eh? One of the foundational issues in AI ethics is already something called ‘the black box problem’. Effectively this describes how AI systems evolve complex sets of rules for themselves that are neither obvious nor easily understood by humans. These self-evolved rules then cause AI systems to make decisions that have real-world impacts. The black box problem becomes thorny when, for example, an AI-trained self-driving car fails to recognise a bicycle and drives into it, or decides to accelerate into the side of a truck. Because the decision-making process of an AI is a ‘black box’ to us humans, we can’t understand why it made that potentially fatal decision, and so it’s almost impossible to guard against.
Related to this is the phenomenon of ‘hallucinations’ by generative AI systems. Large language models such as ChatGPT routinely incorporate elements in their output that appear compelling but are factually inaccurate or bizarre: living people are described as deceased, dates are given wrongly, generative AI images of people develop additional body parts, mangled unreadable signage is created, and so on. While such hallucinations and black-box failures can be problematic enough in the two-dimensional, electronic domains of text, image, video or sound, they could be highly problematic if incorporated into the genomic design of four-dimensional living organisms, or of active biological proteins released into the body or the biosphere.
Genetic engineers already commonly face problems of unexpected and emergent effects from small genomic changes (even as small as a single base pair). If an AI-designed genome were to begin to behave unpredictably or to have significant side effects, it may be impossible to understand why those changes have happened, or to locate the cause, until long after the organism or protein has entered the biosphere. It is not clear how biosafety assessment of AI-designed organisms or proteins can proceed when the system is not even able to explain its own design rationale in ways that we can comprehend. In response to the wicked problems of the AI black box, the European Union is now prioritizing development of ‘explainable AI’. Governments may also wish to insist that any generative AI system used to design organisms, proteins or other biological components must be able to provide robust, human-understandable explanations of its design decisions.
Generating novel nano-machines - Making AI ‘speak protein’
To go back to Ginkgo Bioworks: their AI collaboration with Google is initially focusing on using generative AI to design proteins, drawing on their in-house ‘codebase’ of 2 billion protein sequences. Ginkgo’s Jason Kelly explains that “The new idea is ‘Can I make a foundation model that … speaks protein just like GPT-4 speaks English?’” (source).
It’s worth focusing for a moment on why proteins matter as a commercial target. While the word popularly evokes a type of food ingredient (meat, dairy, beans and the like), a protein is actually a very particular kind of biological form: a chain of amino acids folded into a 3D structure, which in turn often has bioactive properties. High-school biology reminds us of the so-called ‘central dogma’ of genetics, whereby the DNA in the cell is transcribed into particular RNA strands which code the order in which amino acids link into long chains (polypeptides). These chains then fold into intricate proteins: nano-sized structures which carry out life processes. They include enzymes and other catalysts that enable, speed up or slow down key biochemical reactions, as well as proteins that build up into living and non-living materials. Proteins have been described as nature’s nano-machines, carrying out much of the work of the living world down at the molecular scale. How the amino acids are coded by DNA and RNA and then folded into different structures determines how these machines are ‘programmed’. So being able to ‘speak protein’ is about having the language to programme biological nano-machines.
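This ‘language’ is literal enough to demonstrate in a few lines. Below is a minimal sketch of the central dogma’s coding step, using a small excerpt of the standard genetic code (the full table maps all 64 codons); the DNA sequence is invented for the example.

```python
# A small excerpt of the standard genetic code: codon -> amino acid.
CODON_TABLE = {
    "AUG": "Met",  # also the 'start' codon
    "GCU": "Ala", "GGU": "Gly", "AAA": "Lys",
    "UUC": "Phe", "UAA": "STOP",
}

dna = "ATGGCTGGTAAATTCTAA"        # an invented coding sequence
mrna = dna.replace("T", "U")      # transcription (simplified: coding strand, T -> U)

# Translation: read the mRNA three letters (one codon) at a time.
peptide = []
for i in range(0, len(mrna), 3):
    amino_acid = CODON_TABLE[mrna[i:i+3]]
    if amino_acid == "STOP":
        break
    peptide.append(amino_acid)

print("-".join(peptide))          # Met-Ala-Gly-Lys-Phe
```

The hard part - the part AlphaFold addressed and that generative design now runs in reverse - is everything after this lookup: how that chain of amino acids folds into a functional 3D shape.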
For years the crux of protein research was trying to understand how amino acid sequences fold themselves into shapes with unique and important biological and structural roles. One by one, protein scientists tried to predict how a given linear amino acid sequence would fold into a particular 3D shape with specific physical and biological properties. It was slow-going work. Then in 2018 the artificial intelligence firm DeepMind (owned by Google) seemingly solved the protein-folding puzzle. DeepMind’s AlphaFold programme was an AI model trained on over 170,000 protein structures from a public repository of protein sequences and structures, and it seemed to successfully predict the shape of a folded protein from its initial linear code almost every time. In 2022 DeepMind released a database of protein-folding predictions covering almost every protein known to science - almost 200 million different structures. The AlphaFold breakthrough is often held up as one of the strongest examples of an AI ‘success’ in science.
But working with only known proteins is not enough. Following the success of AlphaFold, several of the first wave of generative AI models in synthetic biology are focusing on generating entirely new proteins never before seen in nature (“generative protein design”), as well as altering and ‘optimising’ existing natural proteins.
There are in fact now several generative AI tools for protein engineering, with names like ProtGPT2, ProteinDT and Chroma - but there are also a number of startups (beyond Ginkgo) focused entirely on using generative AI to create a range of novel proteins for commercial markets, including enzymes, catalysts, food ingredients, pharmaceuticals, biomaterials, coatings, gene therapy and more. In another example of how AI is bringing unusual tech entrants into Syn Bio, the global cloud data company Salesforce has developed ProGen, yet another AI large language model for generating novel proteins. The model was trained by feeding the amino acid sequences of 280 million different proteins into a machine learning model. Salesforce then fine-tuned the model by priming it with 56,000 sequences from just one class of protein - lysozymes - in order to generate functional novel lysozymes (used, amongst other things, as food ingredients). A report on this work in Science Daily emphasises just how huge the protein design space is for novel variation within this one class of proteins alone:
“With proteins, the design choices were almost limitless. Lysozymes are small as proteins go, with up to about 300 amino acids. But with 20 possible amino acids, there are an enormous number (20 to the power of 300) of possible combinations. That's greater than taking all the humans who lived throughout time, multiplied by the number of grains of sand on Earth, multiplied by the number of atoms in the universe. Given the limitless possibilities, it's remarkable that the model can so easily generate working enzymes.”
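That comparison holds up on the back of an envelope. A quick check, using rough standard estimates of my own for the comparison quantities:

```python
import math

# The design space for a 300-residue protein over 20 amino acids:
print(round(300 * math.log10(20)))   # ~390, i.e. 20**300 is roughly 10**390

# Rough standard estimates for the quote's comparison:
humans_ever    = 1e11    # ~100 billion humans have ever lived
sand_grains    = 7.5e18  # a common estimate for grains of sand on Earth
atoms_universe = 1e80    # a common estimate for atoms in the observable universe

product = math.log10(humans_ever) + math.log10(sand_grains) + math.log10(atoms_universe)
print(round(product))                # ~110 - vastly smaller than ~390
```

The protein design space exceeds that already absurd product by roughly 280 orders of magnitude; evolution has only ever sampled a vanishing sliver of it.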
As noted, lysozymes are an example of food ingredients (Salesforce began with egg proteins), and using AI to design novel synthetic ‘alt proteins’ for processed-food markets fits perfectly with the ambitions of foodtech investors who claim that high-tech artificial proteins can be a profitable ‘climate fix’ (by displacing livestock proteins) or a biodiversity fix (by replacing environmentally damaging harvesting of palm oil or fish oil). There are numerous problems with these simplistic green claims, but it’s worth noting that one of the factors really driving these new “alt protein markets” is the commercial opportunity opened up by such protein-engineering tools - including AI. Alt-protein enthusiasts such as George Monbiot of the Guardian like to refer to this sort of synthetic bioproduction as “precision fermentation”. The “precision” bit of that buzz term is mostly just hype, but it also increasingly refers to AI-driven design.
Food ingredients are just one slice of the future engineered-protein market that companies like Salesforce or Ginkgo are chasing (Ginkgo has its own alt-protein food ingredients company called Motif FoodWorks). Syn Bio companies are also developing engineered alt-proteins as coatings, sweeteners, pesticides, packaging and more - including several uses that will involve environmental release, or human and animal ingestion, of these novel protein entities.
Planetary boundaries for novel entities
By itself, this new AI-powered ability to generate an ever wider range of novel proteins, ever faster, for industrial use should be regarded as a potentially significant shift in industrial production patterns. It’s a shift that may have huge impacts on health, economies and biodiversity in the longer term, once a greater variety of engineered proteins makes it into the market, our bodies and the biosphere. This should place a high requirement on monitoring, assessment and reporting, as well as on developing systems of recall, clean-up and liability should problems emerge. None of that infrastructure is being put in place right now. A historical point of comparison might be the advent of synthetic chemistry techniques and the establishment of the accompanying petrochemical-based synthetic chemical industry in the late 19th and early 20th centuries, which flowed from the new techniques to ‘crack’ hydrocarbons. The generation of a range of commercially valuable novel chemical molecules before proper oversight and regulation were in place led to the rapid dispersal of many thousands of different synthetic chemicals into the biosphere long before meaningful toxics laws were framed. Many of these synthetic chemicals are now subject to complicated and difficult global efforts at clean-up or mitigation (or attempts at industrial replacement) because of the unexpected biological and health effects of synthetic compounds interacting with the natural world. It is estimated that there are currently between 140,000 and 350,000 different types of manufactured chemicals being released into the biosphere at approximately 220 billion tonnes per year, and that the USA alone adds approximately 15,000 new synthetic chemicals to the inventory every year. Most of these are new-to-nature and many are toxic at some concentration. In early 2022 scientists reported that humans had breached the safe ‘planetary boundary’ for novel chemical entities in the biosphere.
Now consider the prospect of unleashing a new generative synthetic protein industry, undergirded with massive speculative capital and intended to artificially generate an array of never-before-seen proteins for quick venture capital returns. Once again an industry is gunning to realise profits in the marketplace ahead of deliberate international discussion and rule-setting. This should raise significant red flags. That this unleashing is supercharged by the current investment hype around AI is doubly worrying. Recall that proteins have been described as intricate nanomachines whose interactions govern most life processes at the molecular level. Synthetic proteins as a class of complex molecules may therefore be more likely to be biologically active (and disruptive) than simple synthetic chemical compounds - indeed they may be deliberately designed, for industrial reasons, to speed up, slow down, transform or otherwise alter the molecular biological processes at the basis of life, thereby requiring more complex safety assessment. Observers have noted, for example, that synthetically engineered proteins appear to be more stable than naturally evolved proteins - which may invite comparison with the persistence problems of certain classes of synthetic chemicals (e.g. persistent organic pollutants, or POPs).
It was from the enormous challenge of trying to deal with the negative effects of unassessed, poorly understood synthetic chemicals that the precautionary principle was first established in environmental governance. This principle roughly states that it is appropriate and prudent to take early action to prevent, regulate and control an emerging threat even before we have all the data to conclude on its exact nature. The precautionary approach is enshrined in the preamble to the Convention on Biological Diversity as well as in the first objective (Article 1) of the Cartagena Protocol on Biosafety, and is also Principle 15 of the Rio Declaration that emerged from the 1992 Earth Summit. The precautionary principle was crafted exactly to prevent highly disruptive technological developments from running into widespread application ahead of proper oversight and governance. This time we have the chance to apply it before the number of novel protein entities entering the biosphere starts to mimic the toxic trajectory of synthetic chemicals. If synthetic engineered proteins become a rapidly expanding, structurally diverse and widely distributed class of novel synthetic entities entering the biosphere, they will require new forms of biosafety assessment and oversight. Left unchecked, this would further worsen the overshoot of the planetary boundary on novel entities, adding both novel protein entities and novel genome-engineered entities to the biospheric load.
“Text-to-protein” may mean greater distribution, dual-use risks and livelihood impacts
Even more concerning is that the industrial generation of de novo proteins through AI-generated Syn Bio may become widely distributed, automated and difficult to manage far more quickly. The spread of the industrial chemistry industry was somewhat slowed by the need for large, expensive production facilities; a much wider distribution of Syn Bio entities could come about as a result of new protein-engineering AI tools. Just as ChatGPT swiftly enabled millions of ordinary users with just a web browser to enter natural-language text descriptions in order to generate synthetic media, so new foundation models are being developed for natural-language “text-to-protein” design. In a system like ProteinDT, which describes itself as a ‘text-guided protein design framework’, a user can write in natural language (such as English) the broad characteristics they want to see in a synthetic protein (e.g. high thermal stability or luminescence). The generative AI model will then generate multiple viable synthetic protein sequences. These can be selected and created from synthetic RNA strands (e.g. expressed by an engineered microbe or in a cell-free system). The equipment to turn these designs into reality is itself becoming more distributed.
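To illustrate the shape of such a pipeline, here is an invented sketch. To be clear: this is not ProteinDT’s actual API or algorithm; every function below is a hypothetical stand-in for a trained model component.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes

def embed_prompt(prompt):
    """Hypothetical stand-in for a text encoder that maps a natural-language
    description into a model's representation space."""
    return hash(prompt)  # placeholder; a real system returns a learned vector

def generate_candidates(prompt_embedding, n=3, length=40):
    """Hypothetical stand-in for the generative model. A real system samples
    sequences conditioned on the prompt; this placeholder samples at random."""
    rng = random.Random(prompt_embedding)
    return ["".join(rng.choices(AMINO_ACIDS, k=length)) for _ in range(n)]

def screen(candidates):
    """Hypothetical stand-in for in-silico filtering (predicted stability,
    foldability, toxicity screening) before any wet-lab synthesis."""
    return [seq for seq in candidates if seq.count("C") < 4]  # placeholder rule

prompt = "a small enzyme with high thermal stability"
for seq in screen(generate_candidates(embed_prompt(prompt))):
    print(seq)  # candidate sequences a lab could then attempt to express
```

The salient point is the division of labour: every step up to the last runs on an ordinary computer, and only wet-lab synthesis of a chosen sequence requires anything more - which is exactly why dual-use concerns now attach to the software itself.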
This distributed text-to-protein model could make oversight even more difficult. For example, one paper on text-to-protein generation acknowledges that “Although text-based protein design has many potential positive applications in agriculture, bioengineering, and therapeutics, it can be considered a dual-use technology. Much like generative models for small molecules (Urbina et al., 2022), ProteinDT could be applied to generate toxic or otherwise harmful protein sequences. Even though acting on these designs would require a wet lab, synthesizing custom amino acid sequences is typically straightforward compared to synthesizing novel small molecules.” The paper further notes that the authors’ own model allows generation of venomous and dangerous proteins, and that “Future efforts to expand the training dataset and modeling improvements could increase the dual-use risk.”
It’s not necessary to biosynthesize snake venom for the products of generative protein engineering to harm people’s well-being. Synthetic biology firms such as Ginkgo argue that by allowing rapid design of novel materials it will be possible to replace existing petrochemical-based production with fast and lighter biological production methods. Replacing petroleum-derived chemicals may indeed be one outcome, but it will be only one amongst many commercial drivers of the technology. Other commercial entities may attempt to replace valuable natural products currently grown by small farmers, or to displace forest- or marine-derived commodities - changing land- and ocean-use patterns and impacting the livelihoods of farmers and fisherfolk.
We have already seen that the first commercial targets for synthetic biology production have been exactly these high-value natural flavours, fragrances, cosmetic ingredients, oils, spices and textiles that are grown, gathered and stewarded by small farmers and indigenous peoples. While I worked for ETC Group, my former colleagues and I spent years cataloguing examples of synthetic biology firms biosynthesizing (or, as some now say, “precision fermenting”) natural products. These included vanilla, stevia, silk, saffron, artemisinin, coconut oil, orange oil and many more important, culturally sensitive commodities that the world’s most vulnerable people depend upon for livelihoods and culture. Here is a report we published in 2016 that highlights 13 case studies of natural products that could be disrupted by synthetic bioproduction - disruption that may also have serious impacts on biodiversity conservation, since small farmers and indigenous communities are often the basis of in-situ community conservation efforts. We found that of the 200 to 250 different botanical crops used by the flavor and fragrance industry, 95% of production comes from small-scale farmers and agricultural workers, mostly in the global South. In total, an estimated 20 million small-scale farmers and agricultural workers depend on botanical crops grown for natural flavors and fragrances (and this is a low estimate). Flavor and fragrance industry trade groups themselves acknowledge that these botanicals are “highly important in terms of their socio-economic impact on rural populations and may also have important environmental benefits within agricultural systems.”
Back in 2016, when I was researching these Syn Bio applications, I increasingly noticed that the disruption to traditional natural products was not coming simply from the vats of fermented microbes that synthesized alternative compounds. It was also coming from new bioengineered enzymes that could convert one low-value product into something else with a higher value. For example, a key target of synthetic biology companies at the time was trying to coax genetically engineered yeast, fed on sugar, to produce large quantities of sweetener compounds called rebaudiosides, found in stevia. Most commercially harvested stevia is high in a rebaudioside called Reb A, which gives natural stevia its characteristic metallic taste as a sweetener ingredient. However, natural stevia leaf also contains much smaller quantities of Reb D and Reb M, which are far sweeter. While some companies such as Cargill and Evolva were actively engineering yeast to produce quantities of Reb D and Reb M in fermentation vats (directly challenging botanical stevia), others were producing a genetically engineered enzyme that could convert the large quantities of Reb A already grown in fields into Reb M. On the face of it, this enzymatic “bioconversion” might seem a good thing: stevia farmers could keep growing stevia leaf, and consumers would get a sweeter product at the end. That was certainly the argument that leading stevia company PureCircle was making about the engineered-enzyme approach.
But the same enzymatic approach can also be used to design engineered enzymes and proteins which bioconvert quite unrelated feedstocks into high-value ingredients. For example, Ambrosia Bio, a synthetic biology company, is working with Ginkgo Bioworks and its AI protein-design tools to create engineered enzymes that convert low-cost sugars and starches into allulose, a low-calorie sweetener that competes with stevia and is found naturally in small quantities in figs, raisins, jackfruit and maple syrup. Allulose has the special property of not spiking blood glucose and insulin levels while still tasting very much like sugar. One analyst report provocatively claims that the allulose industry is likely to be “the fastest growing industry in 2031”. In fact Ambrosia Bio is just copying what most allulose producers - including sugar giant Tate & Lyle - are already doing: using genetically engineered enzymes to convert corn and wheat starch into this new low-calorie super-ingredient. In another reality, allulose might have been an opportunity for raisin, fig, jackfruit and maple producers to build new low-calorie sweetener markets. Instead, Syn Bio protein production means it has been captured entirely by big ag and biotech interests.
AI Biopiracy
But the loss of such economic opportunities is not just about innate AI and genomic cleverness. It is also predicated on a grand theft of genomic data. Remember those hundreds of millions of genomes and millions of protein sequences that NVIDIA, Salesforce, Ginkgo and others trained their large language models on? They weren’t collected through mutually agreed use agreements with farmers, peasants, patients and so on. They are the result of decades of underhand bioprospecting and biopiracy: taking samples and sequencing genomic material from many thousands of locations and communities and uploading those digital sequences into databases.
Biological materials are at once the common heritage of humankind and the legacy of specific communities who have stewarded, developed, protected, bred and co-evolved with biological diversity, and who may be seen to have inherent rights in, and relationships to, how those genetic resources are used. For the past quarter-century, indigenous peoples’ movements, farmers and Southern governments have waged ongoing battles through the corridors of the Convention on Biological Diversity, the Food and Agriculture Organization, the World Intellectual Property Organization, the World Health Organization and other bodies to get their inherent rights in genomic material recognized and to staunch the theft of genetic material by biotech interests. It’s a battle that has birthed a global ‘Seed Treaty’ on plant genetic resources and the Nagoya Protocol on access and benefit-sharing of genetic resources. In the past few years, the existence of large digital genomic databases has elevated that fight into a North-versus-South battle over fair and equitable governance of digital sequence information (DSI) on genetic resources - that is, the very digital DNA sequences on which the AI titans are now basing their business plans.
It’s a fair bet that at no point, when DNA sequences were taken from Southern communities by bioprospectors or university biotech researchers, did the local community get asked how they felt about their genetic heritage being transformed into digital code, uploaded into large language models and used by generative AI agents to construct synthetic alternatives that could be sold for private profit. That act of free, prior and informed consent simply never happened. Yet it is the common resources taken from those communities which are now being used to power the expected mega-profits of AI tech companies. The emergence of automated, AI-driven genetic engineering as an industrial opportunity helps explain why biotech corporations and industrial states have fought so hard against giving communities rights over their own traditional biological resources.
Incredibly, there is an almost exact parallel in the legal and moral fights now breaking out around commercial AI art, text and video being based on the stolen, unpaid-for labour of real artists, authors, actors and others. When DALL-E or Midjourney “generates” a new piece of AI artwork, what it really does is remix elements from the millions of images of existing artworks that have been ingested into its training data. Yet no attribution or payment is made to those whose artistic labour is leveraged, nor is there even acknowledgement of the original artists whose work is stolen, trained upon, remixed and incorporated into the new AI “artwork”. The same is true for synthetically constructed texts, which remix the actual written work of real authors and journalists.
Several lawsuits have now been launched by artists, graphic designers, authors and others, while Hollywood actors, scriptwriters and animators staged a successful strike to force Hollywood studios to recognise their inherent rights in their works and not have them unfairly stripped away by AI processes. There are now online tools such as haveibeentrained.com which let artists search major AI datasets to find out if their works are included within them, and insist on having them removed. However, no similar process exists for communities to explore DNA datasets to see if their stolen genetic material is being used by generative AI to create corporate Syn Bio organisms, proteins and other commercial products. The existing processes within the Convention on Biological Diversity and the Seed Treaty urgently need to reckon with the implications of generative AI for the already bitter politics of digital sequence information - in order to defend the rights and economies of the world’s poorest and most vulnerable people. For example, this month a process is getting underway under the Biodiversity Convention to create a multilateral mechanism for benefit-sharing from the use of digital sequence information (DSI) on genetic resources. Unless that mechanism addresses the aggregation and use of digital DNA by generative AI models (and makes the DNAI companies and the AI protein-generators pay), it will have completely missed the point.
—
For all the daily welter of stories about AI, media reporting and analysis on AI/Syn Bio convergence is still fairly thin. Where it happens, it’s usually in the trade and investment press, focused on the economic promises and hopes of startup founders and their clever algorithmic science. Ultimately, however, most folks are looking in the wrong place. It’s elsewhere - in the peasant fields of the global South, or even the agricultural producing regions of the North - that the real impact of the convergence of artificial life with artificial intelligence will be felt. If it gets to the point where synthetic biology firms or their industrial clients can credibly type ‘Design me a protein like silk’ into a text-to-protein command line as easily as graphic designers can now type ‘paint me a picture of silk’ into Midjourney, then the world’s production economies may shift - harmfully, and even further away from justice.