21 Startups Building Biotech & Life Science Foundation Models

This post is part of a series covering Domain-Specific Foundation Models (DSFMs), which are foundation models built for a specific industry or use case. You can view the full interactive map with more than 70 startups here.

This landscape highlights the startups building domain-specific foundation models (DSFMs) across industries. These are companies whose core product is a large-scale, pre-trained model built for a specific domain (biology, brainwaves, robotics, law, etc.). They commercialize their models either “as a service” to third-party applications or directly to end users by building the application on top of their own DSFM.

Biotech & Life Science Foundation Models

What is this category about?
  • Startups in this category build foundation models trained on biological data such as DNA, RNA, proteins, cell states, and tissue images.
  • These models are designed to generalize across a range of biomedical tasks, including structure prediction, function annotation, drug target discovery, and therapeutic design.
  • They are typically multi-modal, combining genomic, transcriptomic, proteomic, and sometimes clinical or pathology data.
  • Unlike narrow models used for specific drugs or targets, these foundation models act as core infrastructure for biotech R&D pipelines.
  • The goal is to enable faster, more scalable drug discovery by pretraining on massive biological datasets, reducing the need for hand-crafted models per project.
  • Commercialization strategies include internal drug programs, platform licensing, and research collaborations with pharma and biotech firms.
How is data generated and accessed?
  • The data used to train these models is generated through wet lab experiments (hands-on biological experiments in a lab setting), high-throughput screening (automated testing of thousands of biological or chemical samples), sequencing technologies (such as RNA-seq or DNA sequencing, which read genetic information), mass spectrometry (used to identify and quantify molecules such as proteins or metabolites), and microscopy (capturing high-resolution images of cells and tissues).
  • Some startups operate in-house labs to generate proprietary, high resolution datasets specifically optimized for model training (e.g. 3D RNA structures, single cell omics, spatial tissue maps).
  • Others form partnerships with biopharma companies, academic labs, or hospital systems to access clinical, multi-omics, or pathology data under exclusive or semi exclusive agreements.
  • Public datasets from initiatives like ENCODE (gene regulation data), TCGA (cancer genomics), AlphaFold DB (predicted protein structures), or PDB (experimentally determined protein structures) are often used to pre-train models.
  • Some companies also purchase access to specialized commercial databases, such as proprietary compound libraries, protein interaction networks, or expression datasets.
  • Successful DSFM efforts typically blend public data (for scale and breadth) with proprietary or licensed datasets (for quality, depth, and differentiation) to build performant, defensible models.

21 Startups Building Biotech & Life Science Foundation Models

🇪🇺 Europe – 🇫🇷 Fra – 💸 Series B+

What they do:

  • Bioptimus is building a universal foundation model for biology that learns from data spanning molecules, tissues and whole organisms.
  • Their models integrate several biological modalities such as imaging and molecular profiles to capture relationships that normally stay hidden when each dataset is treated in isolation.
  • Early releases focus on large scale pathology models while the broader roadmap aims at multi modal systems that can support a wide range of biological and clinical tasks.

Use cases and customers:

  • The models help extract rich representations from biomedical images and molecular data which supports tasks such as disease classification, biomarker discovery and outcome prediction.
  • Researchers use the platform to speed up therapeutic discovery and to gain more precise insights into complex biological mechanisms.
  • Typical users include biotech teams, pharmaceutical R&D groups and research labs that need powerful models to interpret high dimensional biological data.

🇺🇸 US – 💰 Series A

What they do:

  • Atomic AI is building a foundation model for RNA biology called ATOM-1 which learns from large scale chemical mapping data to predict RNA structure and function with high precision.
  • The model captures relationships between RNA sequence, folding patterns and biological activity so researchers can understand and design RNA with far greater accuracy.
  • Their platform blends this model with in house datasets and wet lab validation to support the design of advanced RNA targeted and RNA based therapeutics.

Use cases and customers:

  • The technology helps identify and model RNA structures that open new paths for treating diseases considered difficult or impossible to target.
  • Teams use the model to improve RNA therapeutic modalities including small molecules, mRNA constructs and other oligonucleotide based approaches.
  • Typical users include biotech research groups and pharmaceutical teams exploring next generation medicines rooted in RNA structure and function.

🇨🇳 Chi – 🇺🇸 US – 💸 Series B+

What they do:

  • BioMap develops large scale AI foundation models for life sciences that aim to capture the fundamental patterns of biological systems and support high quality protein generation.
  • Their main model family, xTrimo, is trained on extensive protein sequence and interaction data so it can understand and predict complex biological behavior.
  • The platform combines these models with proprietary datasets and high performance computing so researchers can analyse and design biological components with greater reliability.

Use cases and customers:

  • The technology supports drug discovery by helping teams identify new protein targets, design therapeutic candidates and predict key biological interactions.
  • Research groups use the models in areas such as immunology, oncology, synthetic biology and cell and gene therapy where protein engineering plays a central role.
  • BioMap’s customers include pharmaceutical companies, biotech firms and academic labs that need advanced computational tools to speed up biological research and development.

🇺🇸 US – 💰 Series A

What they do:

  • Chai Discovery builds AI foundation models that learn the rules governing interactions between proteins, small molecules and nucleic acids so molecular behavior can be predicted and redesigned.
  • Their model family includes Chai-1, a multimodal system that delivers high quality structure prediction across a broad range of biochemical entities without relying on heavy alignment procedures.
  • The company also develops Chai-2, a model focused on de novo antibody design that generates new antibody sequences directly from target information.

Use cases and customers:

  • The models help teams accelerate drug discovery by enabling accurate molecular structure prediction and generative design tasks that usually require long experimental cycles.
  • Research groups apply the technology to antibody engineering, protein ligand modeling and other molecular design challenges where traditional computational tools struggle.
  • Typical users include biotech companies, pharmaceutical R&D teams and multidisciplinary research labs that integrate AI into their therapeutic development workflows.

🇨🇦 Can – 💸 Series B+

What they do:

  • Deep Genomics builds AI foundation models for RNA biology that learn how genes are regulated and how RNA mechanisms respond to sequence changes.
  • Their flagship model, BigRNA, predicts tissue specific regulatory activity, variant effects and therapeutic potential directly from DNA and RNA inputs.
  • The platform also includes models such as REPRESS which focus on microRNA binding and mRNA stability to give a more complete view of RNA mediated control.

Use cases and customers:

  • The models help researchers uncover hidden regulatory mechanisms, map disease related variants and identify therapeutic interventions grounded in RNA biology.
  • Teams use these tools to design antisense oligonucleotides, siRNA constructs and mRNA based therapies with more confidence in how sequence choices affect function.
  • Customers include biotech research groups, pharmaceutical organisations working on genetic diseases and academic labs exploring RNA driven therapeutic strategies.

🇺🇸 US – 💵 Seed

What they do:

  • Anthrogen is building Odyssey, a family of AI foundation models for protein design that learn from protein sequences, three dimensional structures and functional data.
  • The Odyssey models are trained to capture the relationships between sequence, structure and activity so new proteins can be proposed that meet specified goals.
  • Their platform combines these models with experimental feedback so designs can be tested and refined in real world biological settings.

Use cases and customers:

  • The models support discovery of novel proteins that satisfy functional criteria such as binding strength, stability or catalytic activity.
  • Research teams use the platform to accelerate therapeutic protein engineering, biomolecule design and industrial enzyme development.
  • Typical customers include biotech companies, pharmaceutical discovery teams and laboratories using AI guided design to explore new areas of protein science.

🇪🇺 Europe – 🇱🇺 Lux – 💵 Seed

What they do:

  • Helical is building multi-modal foundation models for biology, trained on massive, proprietary biological datasets spanning molecules, cells, tissues, and clinical data.
  • The system provides access to pre-trained models in genomics, transcriptomics and mRNA design and lets teams adapt these models with their own datasets.
  • Their platform is designed to enable biologists and drug developers to query biology directly, using AI to predict biological behavior, disease mechanisms, and therapeutic responses.

Use cases and customers:

  • Research groups use the platform to accelerate tasks such as target identification, biomarker discovery and mRNA sequence optimisation while reducing wet lab workloads.
  • The technology supports precision medicine efforts by helping teams stratify patient groups from molecular features and explore likely responses before clinical testing.
  • Customers include pharmaceutical research organisations, biotech companies and translational labs adopting AI driven in silico workflows to speed up early R&D.

🇺🇸 US – 💰 Series A

What they do:

  • Noetik is building a foundation model for immunology that learns how immune cells behave in health and disease using single cell and multi modal biological data.
  • The model captures patterns across cell states, signaling pathways and tissue environments so researchers can understand immune mechanisms with more clarity.
  • Their platform pairs this model with experimental feedback loops to guide the discovery of immune based therapies and next generation drug targets.

Use cases and customers:

  • The technology supports discovery of novel targets in immuno oncology, autoimmune disorders and other areas where mapping immune cell behavior is essential.
  • Research teams apply the model to stratify patient populations, predict treatment responses and generate hypotheses that can be tested in the lab.
  • Typical users include biotech companies, pharmaceutical R&D groups and research labs focused on immune driven diseases and therapeutic development.

🇪🇺 Europe – 🇫🇷 Fra – 💵 Seed

What they do:

  • Arca Science develops foundation models for scientific discovery that learn from multimodal experimental data so researchers can model complex biological systems with greater depth.
  • Their models integrate data from sources such as microscopy, molecular assays and phenotypic measurements to capture relationships that span scales from molecules to cells.
  • The platform is designed to scale analysis across heterogeneous datasets so insights can be derived more rapidly than with traditional methods.

Use cases and customers:

  • Research teams use the technology to accelerate hypothesis generation, identify patterns in experimental results and guide next steps in discovery workflows.
  • The models help scientists prioritise experiments, understand mechanisms and explore complex biological phenomena that are difficult to interpret manually.
  • Customers include biotech research groups, academic labs and pharmaceutical discovery teams that need AI driven support to make sense of rich scientific data.

🇺🇸 US – 💵 Seed

What they do:

  • Achira AI is building atomistic foundation simulation models that blend machine learning with physics, quantum chemistry and statistical mechanics to power next generation drug discovery.
  • Their models learn to simulate molecular dynamics with high fidelity so synthetic datasets can be generated at scale without relying solely on costly experimental data.
  • The platform aims to transform drug discovery into a computation-centric inverse design process where predictive simulations guide molecular design and accelerate therapeutic candidate identification.

Use cases and customers:

  • Teams use the technology to shorten the drug discovery cycle by predicting molecular behaviour, interactions and stability before committing to laboratory experiments. 
  • The models support design of novel therapeutic compounds and exploration of biomolecular systems that are difficult to analyse with traditional force field and physics based simulation tools.
  • Potential customers include pharmaceutical researchers, biotech discovery groups and organisations seeking to couple advanced computational simulation with experimental validation to accelerate R&D.

🇪🇺 Europe – 🇬🇧 UK – 💵 Seed

What they do:

  • Prima Mente is a frontier biology AI lab building general purpose biological foundation models with a focus on understanding, protecting and enhancing the human brain.
  • Their flagship family of models, Pleiades, is trained on very large human epigenomic and molecular datasets so it can capture patterns in DNA methylation and other regulatory layers of biology.
  • The platform combines these models with proprietary data generation and experimental workflows so insights and predictions can be translated into research, diagnostics and potentially therapeutic applications.

Use cases and customers:

  • Scientists use the models to decode complex biological signals such as epigenetic changes and cell type-specific molecular patterns that are linked to neurological diseases.
  • The technology supports clinical research and discovery efforts including early detection of neurodegeneration, biomarker discovery and in silico exploration of disease mechanisms.
  • Customers and collaborators include research organisations, academic labs and health science teams focused on Alzheimer’s, dementia and other complex diseases where standard tools struggle to reveal underlying biology. 

🇪🇺 Europe – 🇵🇱 Pol

What they do:

  • Ingenix.ai builds multimodal, multiscale foundation models that simulate clinical trials from molecular interactions up through population level outcomes so predictions about drug efficacy and safety can be generated.
  • Their core models are trained on diverse biological data including genomic, transcriptomic, proteomic, imaging and phenotypic inputs and combine low level predictions such as molecular pathway behaviour with high level forecasts of endpoints and adverse events.
  • The platform offers a co-pilot interface where researchers can interact with the model in natural language to explore predicted clinical results and underlying biological reasoning.

Use cases and customers:

  • The technology supports pharmaceutical and biotech teams by accelerating drug development planning, enabling more predictable trial design and reducing reliance on costly and slow real world experiments.
  • Clinical research groups use the models to forecast outcomes, triage candidate compounds and identify potential risks before human testing.
  • Potential customers include pharmaceutical companies, biotech innovators and clinical development organisations seeking AI-driven tools to improve decision making across the drug R&D pipeline.

🇺🇸 US – 💸 Series B+

What they do:

  • ConcertAI builds large AI models that learn from extensive real world clinical data, especially in oncology, so insights and predictions can be generated across research and care workflows.
  • Their core AI technology, often referred to collectively as CARAai and the Precision Suite, fuses generative, predictive and agentic capabilities with multimodal patient and trial data.
  • The platform integrates real world evidence, clinical records, imaging and longitudinal patient data so AI can support advanced analytics, forecasting, cohort matching and other domain-specific tasks for healthcare and life sciences users.

Use cases and customers:

  • Life science organisations use the models to accelerate clinical development by optimising trial design, speeding patient matching, analysing real world evidence and informing regulatory planning.
  • Oncology teams apply the technology to derive insights about treatment patterns, patient outcomes and disease progression which can support precision medicine and clinical decision augmentation.
  • Customers include pharmaceutical sponsors, biotechnology R&D groups, healthcare providers and research networks that need AI driven analytics and predictive tools grounded in deep, real world clinical datasets.

🇨🇦 Can – 💵 Seed

What they do:

  • Variational AI builds Enki, a generative foundation model for small molecule drug discovery that learns from large sets of chemical and biological data to generate entirely new, synthesizable compounds that meet multiple therapeutic criteria.
  • Their models are trained to optimise across many properties simultaneously such as potency, selectivity, toxicity and pharmacokinetics which lets them explore chemical space more broadly than traditional screening methods.
  • The platform integrates this core model into discovery workflows so researchers can define a target profile and receive novel candidate molecules quickly rather than relying on conventional library screening or manual design.

Use cases and customers:

  • The technology accelerates early stage drug discovery by enabling rapid generation and optimisation of candidate compounds that match defined therapeutic priorities.
  • Biopharma teams and research partners use the platform for de novo design of molecules in areas such as oncology, immunology and inflammation where novel chemotypes are needed.
  • Potential customers include pharmaceutical companies and biotechnology innovators seeking to shorten development timelines and reduce the cost and risk associated with small molecule discovery.

🇺🇸 US – 💰 Series A

What they do:

  • Zephyr AI builds multimodal foundation models trained on very large real world clinical and clinicogenomic datasets so complex biological and health signals can be understood with depth.
  • Their models combine clinical, molecular and phenotypic information to support predictive analytics, biomarker discovery and patient stratification across research and care.
  • The platform integrates these models into tools used throughout the therapeutic lifecycle so insights can guide decisions in drug development, diagnostics and precision medicine.

Use cases and customers:

  • Biopharma teams use the technology to optimise clinical trial design, improve patient matching and identify new biomarkers during early research.
  • Precision medicine groups rely on the models to stratify populations, forecast treatment response and refine diagnostic pathways with richer evidence.
  • Customers include pharmaceutical companies, biotech developers, diagnostics innovators and research organisations that need AI driven insight from complex clinical datasets.

🇺🇸 US – Acquired by Biohub

What they do:

  • EvolutionaryScale builds large generative biological foundation models such as ESM3 that learn from billions of protein sequences so AI can understand and design new biological molecules.
  • Their models treat protein sequences as the language of life and can propose entirely new proteins with desired properties by reasoning about sequence, structure and function together.
  • The platform provides access to these models through tools and APIs so research teams can fine tune them with proprietary data and integrate them into discovery workflows.

Use cases and customers:

  • Biotech and pharmaceutical researchers use the technology to accelerate drug discovery by generating candidate proteins, optimising functions and exploring vast biological design spaces.
  • Teams in synthetic biology and materials science rely on the models to design enzymes, industrial proteins and novel molecules for therapeutic or environmental applications.
  • Customers include life science companies, academic labs and research organisations that need advanced AI for understanding and engineering biological systems at scale.

🇺🇸 US – 💸 Series B+

What they do:

  • Recursion develops foundation models that learn from vast, multimodal experimental data including high-content cellular imaging, genetic perturbations and chemical screening so complex biological patterns can be understood by AI.
  • Their models integrate visual phenotypes with molecular and perturbation data to predict mechanisms of action, suggest novel targets and prioritise therapeutic hypotheses.
  • The platform embeds these foundation models into a high throughput experimental pipeline where AI model outputs directly inform wet lab experiments and iterative learning cycles.

Use cases and customers:

  • Drug discovery teams use the technology to accelerate target identification, compound prioritisation and mechanism prediction across large chemical libraries.
  • Research scientists rely on the models to interpret phenotypic effects of perturbations, detect subtle signatures of activity and propose biologically plausible hypotheses.
  • Customers include pharmaceutical companies, biotech innovators and internal research organisations that leverage AI-driven insight to reduce discovery timelines and improve R&D efficiency.

🇮🇱 Isr – 💵 Seed

What they do:

  • Synaptiflora builds a foundation model called SynaptiCore that learns from microbiome profiles, clinical variables, genomic data and drug information to map relationships between microbial ecosystems, host biology and therapeutic outcomes.
  • The model is trained on a high resolution proprietary microbiome database that captures bacteria, fungi, viruses and other microbial entities so predictive patterns can be identified with precision.
  • Their platform provides structured and explainable insights that link microbial features, biochemical pathways and clinical context to support drug development and precision health applications.

Use cases and customers:

  • Pharmaceutical and biotech teams use the technology to improve drug development outcomes by predicting patient response, enhancing stratification and identifying biomarkers tied to efficacy or safety.
  • The model supports clinical decision making and responder analysis by turning complex microbiome data into actionable indicators that relate to disease dynamics.
  • Customers include pharma sponsors, precision health providers, research groups in immunology and oncology and organisations in nutrition or agriculture that benefit from microbiome driven insights.

🇺🇸 US – 💰 Series A

What they do:

  • Somite AI builds large multimodal foundation models that learn from diverse data sources including genetic, imaging and experimental datasets so biological systems can be modelled holistically.
  • Their models are trained to extract deep insights about mechanisms, interactions and phenotypes which enables predictive tasks across discovery workflows.
  • The platform integrates these foundation models into tools for hypothesis generation, target prioritisation and simulation of biological effects so researchers can explore complex questions quickly.

Use cases and customers:

  • Drug discovery teams use the technology to surface potential targets, forecast responses and prioritise compounds early in the development pipeline.
  • Researchers rely on the models to decode multimodal signals from experiments and identify patterns that might be hidden to conventional methods.
  • Customers include biotech companies, pharmaceutical R&D groups and academic labs that need advanced AI support to accelerate biological research and development.

🇪🇺 Europe – 🇬🇧 UK – 💸 Series B+

What they do:

  • Isomorphic Labs builds large AI models that learn the underlying principles of biology and chemistry so molecular interactions and behaviours can be predicted with high accuracy.
  • Their models extend the capabilities of AlphaFold to understand proteins, DNA, RNA, ligands and other biomolecules in a unified framework that supports drug design.
  • The platform integrates these models into an end to end AI driven discovery engine that guides target selection, molecule design and candidate optimisation.

Use cases and customers:

  • Pharmaceutical research teams use the technology to accelerate target identification, predict molecular interactions and reduce reliance on slow experimental cycles.
  • Biotech organisations rely on the models to explore chemical space, assess potential drug candidates and prioritise compounds across many therapeutic areas.
  • Customers include large pharma partners and internal R&D groups that need scalable AI systems to support discovery workflows and advance AI designed molecules toward development.

🇨🇦 Can – Acquired

What they do:

  • Valence Labs is the AI research engine of Recursion focused on creating predictive and mechanistic models of cellular biology that can simulate how cells respond to genetic and chemical perturbations at scale.
  • Their work builds on multimodal foundation models trained on high-dimensional biological data (including phenomics, transcriptomics and other interventional datasets) so models can go beyond simple pattern recognition to infer causal mechanisms of cellular function.
  • The platform’s vision is to construct “virtual cell” models that predict, explain and guide experimental hypotheses about cellular response, effectively serving as a general modelling layer for biological discovery.

Use cases and customers:

  • Biotechnology and pharmaceutical research teams use the models to simulate cellular functional responses, prioritise therapeutic hypotheses and reduce the need for costly experimental cycles.
  • Research groups rely on these foundation models to uncover mechanistic insights about how perturbations impact cellular systems across conditions and contexts.
  • Customers include internal R&D units at Recursion and potentially partner organisations in drug discovery and translational biology seeking predictive, mechanistically-informed AI to accelerate discovery. 

