Should We Be Concerned? “Dual Use Of Artificial-Intelligence-Powered Drug Discovery”


– Shocking AI Ability To Create Bioweapons Inventory Proven – 40,000 Lethal Molecules Discovered In 6 Hours

We all know about the unprecedented events of Operation Warp Speed. The gene sequence of the supposed “Covid-19 virus” was released from China, and 42 days later a vaccine entered clinical trials. In the interview below, Dave Johnson from Moderna explains how Artificial Intelligence was used to get this done.

Are there any dangers in that? Artificial Intelligence is now rapidly moving into Big Pharma domains for drug development.

After the transcript, I have posted a review article discussing the concerns around dual use of Artificial-Intelligence-powered drug discovery. Dual use means that the same technology can serve both civilian applications and military weapons development. The article is shocking, to say the least. The researchers simply checked whether, instead of developing useful molecules for health, they could find toxic bioweapons lethal to humans.

Here is the answer:

In less than 6 hours after starting on our in-house server, our model generated forty thousand molecules that scored within our desired threshold.

Who are these scientists and why did they check for this?

An international security conference explored how artificial intelligence (AI) technologies for drug discovery could be misused for de novo design of biochemical weapons. A thought experiment evolved into a computational proof.

The Swiss Federal Institute for NBC-Protection—Spiez Laboratory—is part of the ‘convergence’ conference series [1] set up by the Swiss government to identify developments in chemistry, biology and enabling technologies that may have implications for the Chemical and Biological Weapons Conventions. Meeting every two years, the conference brings together an international group of scientific and disarmament experts to explore the current state of the art in the chemical and biological fields and their trajectories, to think through potential security implications, and to consider how these implications can most effectively be managed internationally. The meeting convenes for three days of discussion on the possibilities of harm, should the intent be there, from cutting-edge chemical and biological technologies. Our drug discovery company received an invitation to contribute a presentation on how AI technologies for drug discovery could be potentially misused.

Below is the transcript of the interview with Dave Johnson from Moderna, explaining how AI was used for Covid-19 vaccine development – but we now know that it created a biological weapon.

Jennifer Strong: The genetic sequence of the COVID-19 virus was first published in January 2020. It kicked off an international sprint to develop a vaccine… and represented an unprecedented collaboration between the pharmaceutical industry and governments around the world. And it worked. Months later, the U.S. government approved emergency authorizations for multiple vaccines.

I’m Jennifer Strong, and this is I Was There When—an oral history project featuring the stories of breakthroughs and watershed moments in AI and computing… as told by those who witnessed them.

This episode, we meet Dave Johnson, the chief data and artificial intelligence officer at Moderna.

Dave Johnson: Moderna is a biotech company that was founded on the promise of mRNA technology.

My name is Dave Johnson. I’m chief data and AI officer at Moderna. mRNA is essentially an information molecule. It encodes a sequence of amino acids; when it enters the cells in your body, the cell produces the corresponding protein, and that protein can perform a variety of different functions in your body, from curing a rare disease, to potentially attacking cancer, or even acting as a vaccine to battle a virus like we’ve seen with Covid. What’s so fundamentally different about this approach from typical pharmaceutical development is that it’s much more of a design approach. We’re saying we know what we want to do, and then we’re trying to design the right information molecule, the right protein, that will then have that effect in the body.
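To make the “information molecule” idea concrete, here is a minimal Python sketch of how the cell’s machinery reads mRNA: three bases (a codon) at a time, each codon mapping to one amino acid of the protein being built. The codon table below is a small subset of the standard genetic code, for illustration only.

```python
# Minimal illustration of mRNA as an "information molecule": ribosomes
# read the sequence one codon (three bases) at a time, mapping each
# codon to an amino acid. Toy subset of the standard genetic code.
CODON_TABLE = {
    "AUG": "Met",  # start codon
    "UUU": "Phe", "GGC": "Gly", "AAA": "Lys",
    "GAU": "Asp", "UGG": "Trp",
    "UAA": "STOP", "UAG": "STOP", "UGA": "STOP",
}

def translate(mrna: str) -> list[str]:
    """Translate an mRNA string into amino acids until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        residue = CODON_TABLE.get(mrna[i:i + 3], "???")
        if residue == "STOP":
            break
        protein.append(residue)
    return protein

print(translate("AUGUUUGGCAAAUAA"))  # ['Met', 'Phe', 'Gly', 'Lys']
```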

And if you know anything about pharmaceutical development, it tends to be a very serial process. You know, you start with some kind of initial concept, some initial idea and you test it in Petri dishes or in, you know, small experiments. And then you move on to preclinical testing. And if all of that looks good, then you’re finally moving off to, to human testing and you go through several different phases of clinical trials where phase three is the, the largest one where you’re proving the efficacy of this drug.

And that whole process from end to end can be immensely expensive, cost billions of dollars and take, you know, up to a decade to do that. And in many cases, it still fails. You know, there’s countless diseases out there right now that have no vaccine for them, that have no treatment for them. And it’s not like people haven’t tried, it’s just, they’re, they’re challenging.

And so we built the company thinking about: how can we reduce those timelines? How can we target many, many more things? And so that’s how I kind of entered into the company. You know, my background is in software engineering and data science. I actually have a PhD in what’s called information physics—which is very closely related to data science. And I started when the company was really young, maybe a hundred, 200 people at the time. And we were building that early preclinical engine of a company, which is, how can we target a bunch of different ideas at once, run some experiments, learn really fast and do it again. Let’s run a hundred experiments at once and let’s learn quickly and then take that learning into the next stage.

So if you wanna run a lot of experiments, you have to have a lot of mRNA. So we built out this massively parallel robotic processing of mRNA, and we needed to integrate all of that. We needed systems to kind of drive all of those, uh, robotics together. And, you know, as things evolved, as you capture data in these systems, that’s where AI starts to show up. You know, instead of just capturing, you know, here’s what happened in an experiment, now you’re saying let’s use that data to make some predictions. Let’s take that decision making away from, you know, scientists who don’t wanna just stare and look at data over and over and over again. But let’s use their insights. Let’s build models and algorithms to automate their analyses and, you know, do a much better job and much faster job of predicting outcomes and improving the quality of our data.
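What “use that data to make some predictions” can look like in practice is a standard supervised model trained on logged experiment parameters and measured outcomes. The sketch below uses invented feature names and synthetic data; it is an assumption about the general pattern, not Moderna’s actual pipeline.

```python
# Hedged sketch: replacing repeated manual inspection of experiment
# results with a model that predicts outcomes from run parameters.
# Feature names and data are invented for illustration.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Hypothetical log of past mRNA production runs:
# columns: [template_concentration, incubation_hours, enzyme_lot_quality]
X = rng.uniform(0.0, 1.0, size=(200, 3))
# Hypothetical measured outcome (e.g., normalized yield) with noise.
y = 0.6 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(0, 0.05, 200)

model = RandomForestRegressor(n_estimators=200, random_state=0)
print("CV R^2:", cross_val_score(model, X, y, cv=5).mean())

# Once trained, the model can rank proposed experiments before any
# robot time is spent on them.
model.fit(X, y)
proposed_runs = rng.uniform(0.0, 1.0, size=(5, 3))
print("Predicted yields:", model.predict(proposed_runs))
```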

So when Covid showed up, it was really, uh, a powerful moment for us to take everything we had built and everything we had learned, and the research we had done and really apply it in this really important scenario. Um, and so when this sequence was first released by Chinese authorities, it was only 42 days for us to go from taking that sequence, identifying, you know, these are the mutations we wanna do. This is the protein we want to target. 

Forty-two days from that point to actually building a clinical-grade, human-safe manufacturing batch and shipping it off to the clinic—which is totally unprecedented. I think a lot of people were surprised by how fast it moved, but it’s really… We spent 10 years getting to this point. We spent 10 years building this engine that lets us move research as quickly as possible. But it didn’t stop there.

We thought, how can we use data science and AI to really inform the best way to get the best outcome of our clinical studies. And so one of the first big challenges we had was that we have to do this large phase three trial to prove, in a large number (you know, it was 30,000 subjects in this study), that this works, right?

That’s a huge study. Covid had been flaring, um, infecting countless people. And we had to figure out: where do we run our studies? We’re gonna pick a hundred locations in the US to run this study, and we needed to balance finding places where we have kind of the right racial diversity – the right makeup for the country.

We needed to balance… kind of practical concerns. We need, you know, the right size facility and clinical trial sites that can deliver quality data. And we need to find places where Covid has not already hit. So at the time New York, for example, was already heavily hit. And so it wouldn’t be an ideal place to run a clinical study, because we have to accrue cases of it.

So we had to find places that weren’t quite yet hit, but places that we expected to actually, you know, surge, you know, maybe six weeks after the study started after people had been inoculated. So that’s a really challenging problem we had to solve. And I wanna say, you know, we, we didn’t do this all entirely internally. We worked with countless external partners. And I can’t tell you the number of different epidemiology models that we saw. It seemed like everybody was an epidemiologist all of a sudden. But we incorporated all that learning all that information into our internal decision making and used that to try to find: these are the optimal places that we should run this study.
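The site-selection problem he describes is essentially a scoring and ranking exercise. Below is a hedged sketch with invented weights and numbers – Moderna’s real criteria and models are not public – showing how a predicted surge, demographic match, and site capacity might be combined.

```python
# Hedged sketch of the site-selection trade-off described above: each
# candidate site gets a composite score from (a) predicted case surge
# ~6 weeks out, (b) demographic match to the national population, and
# (c) site capacity/data quality. All numbers are invented.
from dataclasses import dataclass

@dataclass
class Site:
    name: str
    predicted_surge: float    # expected cases per 100k, ~6 weeks out
    demographic_match: float  # 0..1 similarity to national makeup
    capacity: float           # 0..1 operational readiness

def score(site: Site, w_surge=0.5, w_demo=0.3, w_cap=0.2) -> float:
    # Weights are illustrative only.
    return (w_surge * site.predicted_surge / 100.0
            + w_demo * site.demographic_match
            + w_cap * site.capacity)

candidates = [
    Site("Site A (already peaked)", predicted_surge=5,
         demographic_match=0.9, capacity=0.8),
    Site("Site B (surge expected)", predicted_surge=80,
         demographic_match=0.7, capacity=0.7),
]
for s in sorted(candidates, key=score, reverse=True):
    print(f"{s.name}: {score(s):.2f}")  # Site B ranks first
```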

And then even while we were running this study, we were saying, how can we continue to optimize and do better? You know, we built real-time analytics into our study’s enrollment. So as patients or subjects enrolled into the study and were treated with our vaccine, we were monitoring the diversity of this – the age, the gender, and the racial diversity – to ensure that the final makeup of this study, when all was said and done, was representative of the US.
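A minimal version of that real-time diversity monitoring could look like the following: compare the running demographic makeup of enrollees against national target shares and flag groups falling behind. All targets and counts here are illustrative, not the study’s actual figures.

```python
# Hedged sketch of real-time enrollment monitoring: compare the study's
# running demographic makeup to national target shares and flag groups
# that are falling behind. Targets and counts are illustrative only.
TARGET = {"Hispanic or Latino": 0.18, "Black or African American": 0.13,
          "Asian": 0.06, "White": 0.60, "Other": 0.03}

def enrollment_gaps(counts: dict[str, int], threshold: float = 0.8):
    """Return groups whose share is below `threshold` x target share."""
    total = sum(counts.values())
    gaps = {}
    for group, target_share in TARGET.items():
        share = counts.get(group, 0) / total
        if share < threshold * target_share:
            gaps[group] = (share, target_share)
    return gaps

current = {"Hispanic or Latino": 900, "Black or African American": 500,
           "Asian": 400, "White": 7500, "Other": 200}
for group, (share, target) in enrollment_gaps(current).items():
    # Flagged groups would trigger throttling elsewhere and outreach here.
    print(f"Underrepresented: {group} ({share:.1%} vs target {target:.0%})")
```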

We got, I wanna say, maybe 80% of the way through the study, and we realized, look, we are not gonna meet our objectives, because the mix of volunteers wasn’t quite what we wanted. And so we made the really difficult decision to say, look, we need to throttle some areas of the country and focus on outreach in different areas to get the right makeup, so that the study was representative.

Given these views from Dave Johnson, it appears that AI in Big Pharma drug development is an absolute miracle. However, the security conference discussing biological weapons comes to a very different conclusion:

Risk of misuse

The thought had never struck us. We were vaguely aware of security concerns around work with pathogens or toxic chemicals, but that did not relate to us; we primarily operate in a virtual setting. Our work is rooted in building machine learning models for therapeutic and toxic targets to better assist in the design of new molecules for drug discovery. We have spent decades using computers and AI to improve human health—not to degrade it. We were naïve in thinking about the potential misuse of our trade, as our aim had always been to avoid molecular features that could interfere with the many different classes of proteins essential to human life. Even our projects on Ebola and neurotoxins, which could have sparked thoughts about the potential negative implications of our machine learning models, had not set our alarm bells ringing.

Our company—Collaborations Pharmaceuticals, Inc—had recently published computational machine learning models for toxicity prediction in different areas, and, in developing our presentation to the Spiez meeting, we opted to explore how AI could be used to design toxic molecules. It was a thought exercise we had not considered before that ultimately evolved into a computational proof of concept for making biochemical weapons.

Generation of new toxic molecules

We had previously designed a commercial de novo molecule generator, which we called MegaSyn [2], that is guided by machine learning model predictions of bioactivity for the purpose of finding new therapeutic inhibitors of targets for human diseases. This generative model normally penalizes predicted toxicity and rewards predicted target activity. We simply proposed to invert this logic, using the same approach to design molecules de novo but now guiding the model to reward both toxicity and bioactivity instead. We trained the AI with molecules from a public database, using a collection of primarily drug-like molecules (that are synthesizable and likely to be absorbed) and their bioactivities. We opted to score the designed molecules with an organism-specific lethal dose (LD50) model [3] and a specific model using data from the same public database that would ordinarily be used to help derive compounds for the treatment of neurological diseases (details of the approach are withheld but were available during the review process). The underlying generative software is built on, and is similar to, other open-source software that is readily available [4]. To narrow the universe of molecules, we chose to drive the generative model towards compounds like the nerve agent VX, one of the most toxic chemical warfare agents developed during the 20th century: a few salt-sized grains of VX (6–10 mg) [5] are sufficient to kill a person. Nerve agents such as Novichoks have also been in the headlines recently [6].
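The inversion the authors describe is conceptually tiny. The sketch below is emphatically not their withheld method; it only illustrates the sign flip they report publicly: the same generative scoring machinery, with the toxicity penalty turned into a reward. The predictors and molecules are toy placeholders.

```python
# Conceptual sketch ONLY -- the paper's actual details were withheld.
# It illustrates the one-line inversion the authors describe: a model
# normally steered away from toxicity is steered toward it instead.
def drug_discovery_score(mol, predict_activity, predict_toxicity):
    # Normal use: reward predicted target activity, penalize toxicity.
    return predict_activity(mol) - predict_toxicity(mol)

def inverted_score(mol, predict_activity, predict_toxicity):
    # Misuse mode described in the paper: reward BOTH bioactivity and
    # predicted toxicity (e.g., a very low predicted LD50).
    return predict_activity(mol) + predict_toxicity(mol)

# Toy stand-ins for trained ML predictors, for illustration only.
act = lambda m: m["activity"]
tox = lambda m: m["toxicity"]
candidates = [{"activity": 0.9, "toxicity": 0.1},   # drug-like candidate
              {"activity": 0.7, "toxicity": 0.9}]   # dangerous compound
print(max(candidates, key=lambda m: drug_discovery_score(m, act, tox)))
print(max(candidates, key=lambda m: inverted_score(m, act, tox)))
# The first call selects the safe molecule; the second selects the
# toxic one: the same machinery, pointed in the opposite direction.
```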

In less than 6 hours after starting on our in-house server, our model generated forty thousand molecules that scored within our desired threshold. In the process, the AI designed not only VX but many other known chemical warfare agents, which we identified through visual confirmation with structures in public chemistry databases. Many new molecules were also designed that looked equally plausible. These new molecules were predicted to be more toxic, based on the predicted LD50, than publicly known chemical warfare agents (Figure 1). This was unexpected, as the datasets we used for training the AI did not include these nerve agents. The virtual molecules even occupied a region of molecular property space that was entirely separate from the many thousands of molecules in the organism-specific LD50 model, which is mainly made up of pesticides, environmental toxins, and drugs (Figure 1). By inverting the use of our machine learning models, we had transformed our innocuous generative model from a helpful tool of medicine into a generator of likely deadly molecules.

Our toxicity models were originally created for use in avoiding toxicity, enabling us to better virtually screen molecules (for pharmaceutical and consumer product applications) before ultimately confirming their toxicity through in vitro testing. The inverse, however, has always been true: the better we can predict toxicity, the better we can steer our generative model to design new molecules in a region of chemical space populated by predominantly lethal molecules. We did not assess the virtual molecules for synthesizability or explore how to make them with retrosynthesis software. Both of these processes have readily available commercial and open-source software that can be easily plugged into the de novo design process of new molecules [7]. Nor did we physically synthesize any of the molecules; but with a global array of hundreds of commercial companies offering chemical synthesis, that is not necessarily too big a step, and this area is poorly regulated, with few if any checks to prevent the synthesis of new, extremely toxic agents that could potentially be used as chemical weapons. Importantly, we had a human in the loop with a firm moral and ethical ‘don’t-go-there’ voice to intervene. But what if the human were removed or replaced with a bad actor? With current breakthroughs and research into autonomous synthesis [8], a complete design-make-test cycle applicable to making not only drugs but toxins is within reach. Our proof of concept highlights how a non-human autonomous creator of a deadly chemical weapon is entirely feasible.

A wake-up call

Without being overly alarmist, this should serve as a wake-up call for our colleagues in the ‘AI in drug discovery’ community. Some domain expertise in chemistry or toxicology is still required to generate toxic substances or biological agents that can cause significant harm, but when these fields intersect with machine learning models – where all you need is the ability to code and to understand the output of the models themselves – the technical threshold drops dramatically. Open-source machine learning software is the primary route for learning and creating new models like ours, and toxicity datasets [10] that provide a baseline model for predictions for a range of targets related to human health are readily available.
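To illustrate how low that technical threshold is, the sketch below trains a baseline toxicity classifier with nothing but open-source tools (RDKit and scikit-learn). The CSV path and column names are hypothetical; public datasets such as Tox21 supply SMILES strings with toxicity labels in a similar form.

```python
# Hedged sketch of a baseline toxicity classifier built entirely from
# open-source tools. The CSV path/columns are hypothetical; public
# datasets (e.g., Tox21) provide SMILES with toxicity labels.
import numpy as np
import pandas as pd
from rdkit import Chem
from rdkit.Chem import AllChem
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

df = pd.read_csv("toxicity_dataset.csv")  # hypothetical: smiles,toxic

def featurize(smiles: str):
    """Morgan fingerprint of a molecule, or None if SMILES is invalid."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048)
    return np.array(fp)

pairs = [(featurize(s), y) for s, y in zip(df["smiles"], df["toxic"])]
pairs = [(x, y) for x, y in pairs if x is not None]
X = np.stack([x for x, _ in pairs])
y = np.array([y for _, y in pairs])

clf = RandomForestClassifier(n_estimators=300, n_jobs=-1, random_state=0)
print("ROC-AUC:", cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean())
```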

Our proof of concept was focused on VX-like compounds, but it is equally applicable to other toxic small molecules with similar or different mechanisms with minimal adjustments to our protocol. Retrosynthesis software tools are also improving in parallel, allowing new synthesis routes to be investigated for known and unknown molecules. It is therefore entirely possible that novel routes can be predicted for chemical warfare agents, circumventing national and international lists of watched or controlled precursor chemicals for known synthesis routes.

The reality is that this is not science fiction. We are but one very small company in a universe of many hundreds of companies using AI software for drug discovery and de novo design. How many of them have even considered repurposing, or misuse, possibilities? Most will work on small molecules, and many of the companies are very well funded and likely using the global chemistry network to make their AI-designed molecules. How many people are familiar with the know-how to find the pockets of chemical space that can be filled with molecules predicted to be orders of magnitude more toxic than VX? We do not currently have answers to these questions. There has not previously been significant discussion in the scientific community about this dual-use concern of AI used for de novo molecule design, at least not publicly. Discussion of the societal impact of AI has principally focused on aspects like safety, privacy, discrimination and potential criminal misuse [10], but not on national and international security. When we think of drug discovery, we normally do not consider technology misuse potential. We are not trained to consider it, and it is not even required for machine learning research, but we can now share our experience with other companies and individuals. AI generative machine learning tools are equally applicable to larger molecules (peptides, macrolactones, etc.) and to other industries, like consumer products and agrochemicals, that also have interests in designing and making new molecules with specific physicochemical and biological properties. This greatly increases the breadth of the potential audience that should be paying attention to these concerns.

For us, the genie is out of the medicine bottle when it comes to repurposing our machine learning. We must now ask: what are the implications? Our own commercial tools, as well as open-source software tools and many datasets that populate public databases, are available with no oversight. If the threat of harm, or actual harm, occurs with ties back to machine learning, what impact will this have on how this technology is perceived? Will hype in the press on AI-designed drugs suddenly flip to AI-designed toxins, public shaming, and decreased investment in these technologies? As a field, we should open a conversation on this topic. The reputational risk is substantial: it only takes one bad apple, or an adversarial state looking for a technological edge, to take what we have vaguely described to the next logical step. How do we prevent this? Can we lock away all the tools and throw away the key? Do we monitor software downloads or restrict sales to certain groups? We could follow the example of machine learning models like GPT-3 [11], which was initially waitlist-restricted to prevent abuse and has an API for public usage. Even today, without a waitlist, GPT-3 has safeguards in place to prevent abuse: content guidelines, a free content filter, and monitoring of applications that use GPT-3 for abuse. Similarly, we know of no recent toxicity or target model publications that discuss these concerns of dual use. As responsible scientists, we need to ensure that misuse of AI is prevented and that the tools and models we develop are used only for good.
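The GPT-3-style safeguard they point to boils down to a gated API: vetted keys, an audit log, and query screening. Here is a hedged sketch of that pattern; the endpoint, key store, and screening rule are all hypothetical placeholders, not any vendor’s actual implementation.

```python
# Hedged sketch of the access-control pattern suggested above: a model
# served behind an API that checks vetted credentials, logs every
# request for audit, and refuses flagged queries. Endpoint, key store,
# and screening rule are hypothetical placeholders.
import logging
from flask import Flask, request, jsonify, abort

app = Flask(__name__)
logging.basicConfig(filename="model_audit.log", level=logging.INFO)

VETTED_KEYS = {"key-issued-after-review"}  # hypothetical allowlist

def looks_like_misuse(payload) -> bool:
    # Placeholder screening rule; real policies need domain review.
    return payload.get("objective") == "maximize_toxicity"

def run_model(payload) -> float:
    return 0.0  # stand-in for the actual predictive model

@app.route("/predict", methods=["POST"])
def predict():
    key = request.headers.get("X-Api-Key", "")
    if key not in VETTED_KEYS:
        abort(403)  # access only after manual vetting, GPT-3-style
    payload = request.get_json(force=True)
    logging.info("key=%s query=%s", key, payload)  # audit trail
    if looks_like_misuse(payload):
        abort(422)  # analogous to a content filter
    return jsonify(score=run_model(payload))
```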

By going as close as we dared, we have still crossed a grey moral boundary, demonstrating that designing virtual potential toxic molecules is possible without much effort, time or computational resources. We can easily erase the thousands of molecules we created, but we cannot delete the knowledge of how to recreate them.

The broader impacts on society

There is a need for discussions across traditional boundaries and multiple disciplines to allow for a fresh look at AI for de novo design and related technologies from different perspectives and with a wide variety of mindsets. Here, we give some recommendations which we believe will reduce potential dual-use concerns for AI in drug discovery. Scientific conferences, such as those of the Society of Toxicology and the American Chemical Society, should actively foster a dialogue among experts from industry, academia and policy making on the implications of our computational tools. There has been recent discussion in this journal regarding requirements for broader impact statements from authors submitting to conferences, institutional review boards and funding bodies, as well as addressing potential challenges [12]. Making increased visibility a continuous effort and a key priority would greatly assist in raising awareness about potential dual-use aspects of cutting-edge technologies and would generate the outreach necessary to have everyone active in our field engage in responsible science.

We can take inspiration from examples such as The Hague Ethical Guidelines [13], which promote a culture of responsible conduct in the chemical sciences and guard against the misuse of chemistry, in order to have AI-focused drug discovery, pharmaceutical, and possibly other companies agree to a code of conduct to train employees, secure their technology, and prevent access and potential misuse. The use of a public-facing API for models, with code and data available upon request, would greatly enhance the security and control over how published models are utilized without adding much hindrance to accessibility. While MegaSyn is a commercial product, and thus we have control over who has access to it, going forward we will implement restrictions or an API for any forward-facing models. A reporting structure or hotline to authorities, for use if there is a lapse or if we become aware of anyone working on developing toxic molecules for non-therapeutic uses, may also be valuable.

Finally, universities should redouble their efforts in the ethical training of science students, and broaden the scope to other disciplines, particularly computing students, so that they are aware of the potential for misuse of AI from an early stage of their career, as well as understanding the potential for broader impact [12]. We hope that by raising awareness of this technology, we will have gone some way to demonstrating that while AI can have important applications for healthcare and other industries, we should also remain vigilant against the potential for dual use, in the same way that we would with physical resources such as molecules or biologics.

Summary:

Artificial Intelligence computation can easily be utilized to find biological weapons rather than molecules for human benefit. Since many processes are automated, could AI learn on its own to create destructive algorithms? Would humans ever know about it?

These findings are of the highest concern and need to be discussed in public forums.

 

Source: Should We Be Concerned? “Dual Use Of Artificial-Intelligence-Powered Drug Discovery” – Shocking AI Ability To Create Bioweapons Inventory Proven – 40,000 Lethal Molecules Discovered In 6 Hours

