Artificial Intelligence (AI) is taking the old concept of “Rational Drug Design” to a new dimension. The time has not yet come when robots replace medicinal chemists, but small-molecule drug discovery is on the brink of a radical transformation.
The economist Joseph DiMasi (Director of the Tufts Center for the Study of Drug Development) published a study highlighting a roughly sixfold increase, between 1991 and 2013, in the research and development costs incurred for a single molecule to reach the market: from ~€450m in 1991 to ~€2,560m in 2013.
According to Deloitte’s annual report “Measuring the return from pharmaceutical innovation”, the return on investment of the big pharmaceutical companies is steadily decreasing (10% in 2010 versus 3.2% in 2017), pointing to a possible shift to negative ROI by the early 2020s. This efficiency crisis is shaking the traditional pharmaceutical R&D model to such an extent that adapting it has become urgent.
The long and expensive way leading to drug discovery…
The research and discovery of small molecules (as opposed to biologics, which are bigger molecules, more complex, and less stable) can be seen as a succession of steps of molecules’ identification and selection. It is usually segmented into four major steps:
Figure 1 – Drug Research and Discovery steps
Lead optimisation remains a puzzle for chemists, comparable to a Rubik’s cube: maximise one parameter and you degrade another. This step alone accounts for 20% of total research and development costs.
The next steps are more widely known to the general public: in preclinical research, the best molecules are tested on in vivo models to assess their toxicity, pharmacology and pharmacokinetics. Finally, one (or more) drug candidate enters the human clinical trial phase. This is the “development” part, which usually lasts several years (10 years on average, according to Bernstein Research) and accounts for 60% of total R&D costs.
Why is pharma R&D productivity declining?
The reasons for this decline in productivity are numerous and have been intensively discussed. Among them, two stand out, in our opinion.
First: pharmaceutical R&D addresses more complex pathological processes today than in the past. In other words, we have found medicines for all the “easy” diseases, and we are now facing the “difficult” ones.
The second reason is based on a technological bias: the scientific and technical progress of the 1980s and 1990s allowed the industrialisation of certain research stages. For instance, High Throughput Screening (HTS) or, more recently, DNA-encoded chemical libraries have artificially inflated the number of leads generated, by increasing the capacity of the filtration stages, without increasing the quality of those leads in similar proportion. Thus, more molecules, not better ones, have been pushed through later stages of development, generating more spending without better results in the end.
As a result, the pharmaceutical industry is now seeking to reduce operational costs and improve cycle time within research and development, and AI’s ability to reduce drug development times is increasingly established.
The potential of AI in drug discovery
To improve productivity in small molecule discovery, the key challenge is to find a molecule (the identification part of the process) that maximizes a large number of very diverse criteria, which will be tested sequentially, one after the other (the selection part). Artificial Intelligence (AI) makes it possible to build holistic models for the design of new drugs where these tests can be performed simultaneously, in silico.
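The contrast between sequential selection and simultaneous in silico evaluation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three criteria, their thresholds, and the candidate values are made up for the example and do not come from any real project.

```python
# Hypothetical criteria: each maps a property name to a pass/fail rule.
CRITERIA = {
    "potency": lambda v: v >= 7.0,
    "solubility": lambda v: v >= 50.0,
    "toxicity": lambda v: v <= 0.2,
}


def sequential_filter(candidates):
    """Classic funnel: a candidate is discarded at the first test it fails."""
    survivors = candidates
    for name, passes in CRITERIA.items():
        survivors = [c for c in survivors if passes(c[name])]
    return survivors


def criteria_met(candidate):
    """Simultaneous view: count how many criteria a candidate satisfies."""
    return sum(passes(candidate[name]) for name, passes in CRITERIA.items())


# Made-up property values, for illustration only.
candidates = [
    {"id": "mol-1", "potency": 7.5, "solubility": 60.0, "toxicity": 0.1},
    {"id": "mol-2", "potency": 8.2, "solubility": 40.0, "toxicity": 0.1},
    {"id": "mol-3", "potency": 6.1, "solubility": 80.0, "toxicity": 0.5},
]

# Rank all candidates by the number of criteria met at once.
ranked = sorted(candidates, key=criteria_met, reverse=True)
```

The funnel only returns the candidates that pass everything, whereas the simultaneous score keeps near misses visible (here, a molecule meeting two criteria out of three), which is the kind of information an optimisation loop can exploit.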
The use of deep learning algorithms in drug discovery became widespread in 2012, after George Dahl won the Merck Molecular Activity Challenge by demonstrating that deep neural networks, even with little training, can effectively predict the activity of a molecule from its structure. This has automated a discipline well known to chemists: QSAR (Quantitative Structure-Activity Relationship).
In 2016, in an article entitled “Automatic chemical design using a data-driven continuous representation of molecules”, Alan Aspuru-Guzik et al. described a method of continuous and multidimensional representation of the chemical space using deep neural networks. This method allows a simpler, faster and more comprehensive exploration of the chemical space (estimated at 10^60 molecules potentially usable as a drug), and ultimately, the generation of virtual molecules previously inaccessible even via the largest databases (containing about 10^8 molecules).
A case study…
One of our clients, IKTOS, a French start-up founded in 2016, has developed an AI technology capable of generating molecules under the constraint of a set of physicochemical and biological characteristics, according to in silico predictive models of such characteristics.
IKTOS technology is based on the interweaving of three algorithms, which are orchestrated to explore the chemical space efficiently and iteratively, and to identify optimal in silico compounds within a few hours of computation.
The first is a generative model: trained on databases containing several million chemical compounds, it can “build” virtual molecules located anywhere in the chemical space (implementing a principle close to that proposed by Gómez-Bombarelli et al.).
The second is a predictive algorithm: trained on a customer database of molecules that have already been made and tested, these models can predict the physicochemical and pharmacological properties of a molecule from its chemical structure alone.
The third is the reinforcement algorithm: it uses the information (scores) provided by the predictive models on the previous sets of generated molecules to modify the weights of the generative model, in order to steer molecule generation in the right direction.
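The interplay of the three components can be illustrated with a deliberately simplified sketch. It is not the IKTOS implementation. It assumes a toy “molecule” made of five fragments drawn from a hypothetical alphabet, a stand-in scoring function in place of trained predictive models, and a simple reward-weighted update of sampling weights in the spirit of policy-gradient methods.

```python
import random

# Hypothetical building blocks; real systems generate full chemical structures.
FRAGMENTS = ["A", "B", "C", "D"]
LENGTH = 5  # each toy "molecule" is a sequence of 5 fragments


def generate(weights, rng):
    """Generative model: sample one fragment per position from the current weights."""
    return [rng.choices(FRAGMENTS, weights=w)[0] for w in weights]


def predict_score(molecule):
    """Stand-in for the predictive models: fraction of positions holding 'A'.
    A real predictor would be trained on measured compound data."""
    return sum(f == "A" for f in molecule) / len(molecule)


def reinforce(weights, batch, scores, lr=0.5):
    """Reinforcement step: raise the weights of fragments seen in above-average
    molecules and lower those seen in below-average ones."""
    baseline = sum(scores) / len(scores)
    for molecule, score in zip(batch, scores):
        advantage = score - baseline
        for pos, frag in enumerate(molecule):
            idx = FRAGMENTS.index(frag)
            # Keep weights strictly positive so sampling stays valid.
            weights[pos][idx] = max(0.01, weights[pos][idx] + lr * advantage)


def run(iterations=30, batch_size=50, seed=0):
    """Iterate generate -> predict -> reinforce; return the mean score per round."""
    rng = random.Random(seed)
    weights = [[1.0] * len(FRAGMENTS) for _ in range(LENGTH)]
    history = []
    for _ in range(iterations):
        batch = [generate(weights, rng) for _ in range(batch_size)]
        scores = [predict_score(m) for m in batch]
        reinforce(weights, batch, scores)
        history.append(sum(scores) / len(scores))
    return history


history = run()
```

In this toy setting the mean predicted score of each generated batch rises over the iterations, because the generator is progressively biased towards the fragments that the scoring function rewards. In a real system the predictor's accuracy bounds how far such a loop can be trusted, which is why generated molecules still have to be synthesized and tested.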
This technology has demonstrated its effectiveness through a collaboration with a major pharmaceutical company. For 10 years, the chemistry team had tried to make compounds maximising a set of 11 biological activity criteria. Among the 900 compounds that they had synthesized and tested, they were not able to find any molecule meeting more than 9 of the 11 success criteria. In just a few days, IKTOS technology generated 150 virtual molecules which were predicted, in silico, to meet all 11 criteria. Out of those 150 molecules, 11 were selected (based on their synthetic accessibility and originality), synthesized by the chemists, and tested on all 11 criteria. 9 molecules were found to maximize 9 criteria, 3 to maximize 10 criteria, and 1 molecule was found to be good on all 11 success criteria. It took only a few days, and 11 molecules, for an AI to achieve better results than a team of expert chemists had achieved over 10 years of benchtop research and 900 tested molecules.
An emerging field attracting massive investments
The use of artificial intelligence in small molecule research is quite new, and it is still difficult to figure out how far it will go. We have identified (and often met) more than a hundred companies, mainly start-ups, at the crossroads of pharmaceutical R&D and AI. Many start-ups are flaunting attractive technologies, but those that are really able to deliver high value-added and actionable results are still few in number. In addition, the still limited number of success stories and the confidentiality that surrounds most of them contribute to a lack of clarity among medicinal chemists about the applications of AI in their profession. In fact, big pharma is still seeking to understand the possible applications of AI, and to evaluate existing technologies and stakeholders.
Numerous partnerships have been signed recently, demonstrating the growing interest in the area (Sanofi with Exscientia and Recursion, Merck with Atomwise, GSK with Cloud Pharmaceuticals, Insilico Medicine and Exscientia, Iktos with Janssen, Iktos with Merck). The increasing number of scientific publications in recent years also reflects the enthusiasm of the scientific community and the industry. In addition, private investments are accelerating (~$30m invested in 2012 versus ~$500m and ~$800m in 2014 and 2016 respectively). The enthusiasm of investors is all the greater when certain start-ups, initially R&D service providers for the pharmaceutical industry, develop their own pipeline of molecules and thus compete head-on with traditional biotech start-ups. BenevolentAI, whose first clinical trials on Parkinson’s disease began in 2018, holds 20 molecules in the preclinical phase, and has recently raised $115m. Today, it is valued at $2bn.
Towards automated drug discovery?
Certain companies (like SRI or Medicines Discovery Catapult) aim to develop a fully automated research workflow, from automated design and retrosynthesis to robotised synthesis and testing. We are not among the few utopians who believe in the total automation of pharmaceutical drug discovery. Nevertheless, AI will certainly contribute to deeply transforming pharmaceutical research. As some say, “AI will not replace medicinal chemists, but medicinal chemists who use AI will replace those who don’t”. Pharmaceutical companies’ investments in AI are eating into the budgets of benchtop research and computational chemistry, progressively driving research activities towards more in silico and automated methods. Today, most of the major names in the industry are developing partnerships with companies that master these technologies. The question remains whether they will try to internalise these technologies or not. If not, we expect to see in the next few years the emergence of a new model of AI-based biotech start-ups, with highly automated discovery processes and sufficient funding to develop their own pipeline.
DiMasi JA, Grabowski HG, Hansen RW. “Innovation in the pharmaceutical industry: new estimates of R&D costs”. Journal of Health Economics 2016
J.W. Scannell et al. “Diagnosing the decline in pharmaceutical R&D efficiency”. Nature Reviews Drug Discovery 2012
George E. Dahl et al. “Deep Neural Nets as a Method for Quantitative Structure-Activity Relationships”. Journal of Chemical Information and Modeling 2015
Alan Aspuru-Guzik et al. “Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules”. ACS Central Science 2018
“Deep Learning For Ligand-Based De Novo Design In Lead Optimization: A Real Life Case Study”. Iktos and Servier Poster 2018