Deep Learning and Biology: We’ve only just begun

This is the first post in a three-part series about deep learning applications in biology, and it serves as an introduction to the topic. Future posts will go deeper into the science and technology of the subject.

Deep learning is all the rage. Data scientists everywhere want to use it regardless of their industry, and while deep learning is far from a panacea, its reputation as the groundbreaking advance in computing of the last decade is well deserved. Besides smashing image classification records for several consecutive years, it has found a place in many other problem areas, including speech recognition, natural language processing, reinforcement learning and more. It is used in autonomous vehicles, finance, building automation and, of course, in outperforming the best human players of highly complex games. But among all the fields being revolutionized by deep learning, the potential impact on biology is hard to ignore.

Biology is ripe for a machine learning revolution, with interesting and complex problems ranging from ADMET (absorption, distribution, metabolism, excretion, and toxicity) data modeling in the pharmaceutical space to gene expression modeling from microarray data, and from estimating the impact of specific genetic variants to histopathology tissue segmentation for disease diagnostics. And, of course, I can’t forget to mention the impact of deep learning on drug discovery.

Drug discovery is a slow process that is becoming increasingly expensive. Getting a drug to market takes longer and costs more than ever before, and unless something changes, it’ll only get worse. Recursion Pharmaceuticals is leveraging deep learning to combat that trend. Rather than spending a decade trying to bring a single drug to market for a single disease, Recursion couples deep learning with wet-lab biology to model hundreds of genetic diseases in parallel, pairing them with thousands of diverse small molecules in order to rapidly discover potential drug candidates.

Recursion generates close to 10 TB of fluorescence microscopy image data a week, an unprecedented rate that is only increasing.

These images are fed into our deep learning models to extract the information needed to identify a disease signature and to determine which drugs rescue that signature. The flexibility provided by a deep learning framework allows the models to learn abstract representations of our biological image data, which is what makes it possible to work on hundreds of drug programs in parallel.
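To make the idea of a disease signature and its rescue a little more concrete, here is a toy sketch in Python. It is an illustration under heavy assumptions, not a description of Recursion's actual pipeline: the feature vectors are random numbers standing in for the representations a deep learning model might extract from images, and the function names `disease_signature` and `rescue_score` are hypothetical. The signature is taken to be the average shift in feature space between healthy and diseased cells, and a drug is scored by how much of that shift it reverses.

```python
import numpy as np


def disease_signature(healthy, diseased):
    """Average shift in feature space from healthy to diseased cells.

    Both inputs are (n_cells, n_features) arrays. In this sketch they are
    hypothetical stand-ins for features learned from microscopy images.
    """
    return diseased.mean(axis=0) - healthy.mean(axis=0)


def rescue_score(healthy, diseased, treated):
    """Fraction of the disease signature reversed by a treatment.

    A score near 1.0 means treated cells look like healthy cells on average
    along the signature direction; a score near 0.0 means no change.
    """
    signature = disease_signature(healthy, diseased)
    shift_back = diseased.mean(axis=0) - treated.mean(axis=0)
    # Project the treatment effect onto the signature direction.
    return float(shift_back @ signature / (signature @ signature))


# Synthetic data: 200 cells per condition, 8 made-up features.
rng = np.random.default_rng(0)
healthy = rng.normal(0.0, 1.0, size=(200, 8))
disease_shift = np.array([2.0, -1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
diseased = rng.normal(0.0, 1.0, size=(200, 8)) + disease_shift

# A pretend drug that moves diseased cells halfway back toward healthy.
partial = diseased - 0.5 * disease_signature(healthy, diseased)

print(rescue_score(healthy, diseased, partial))   # roughly 0.5
print(rescue_score(healthy, diseased, diseased))  # no treatment effect, so 0.0
```

A real system would have to deal with batch effects, many more features and far noisier data, but the shape of the question is the same: find the direction in which disease moves the cells, then look for compounds that move them back.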

While deep learning might not be the panacea some wish it were, it is already starting to produce promising results in biology. Whether those results come in the form of diagnostic tools to aid radiologists or better prediction of target binding for new small molecules, it’s clear that the impact will be huge. Recursion, for one, is excited to continue to push the bounds of deep learning in drug discovery in its mission to treat 100 diseases by 2025.