AI in Biology: From Smarter CRISPR to Designing Proteins Nature Never Tried
- Vasili Balios
- Nov 11, 2025
- 7 min read

Artificial intelligence is quickly becoming a standard part of the biologist’s toolkit. Over the past decade, routine wet-lab workflows have been joined by models that learn from large amounts of sequencing, imaging, and biochemical data, spotting patterns that no human could scan by eye and proposing experiments that would otherwise have taken months of trial and error. Just as statistics, PCR, and cloning moved from specialist skills to everyday methods, AI is now making the same shift.
At a high level, AI helps in two complementary ways. Predictive models read biology’s existing information: inferring structure from sequence, predicting the effects of mutations, ranking guide RNAs and primers, and flagging off-targets or toxic designs in specific organisms. Generative models go a step further and write new options, designing sequences, motifs, or entire proteins that satisfy constraints we set (stability, specificity, activity, deliverability), even when evolution never explored those solutions. Wrapped inside lab automation and design-build-test-learn (DBTL) loops, these models don’t replace experiments; they make each experiment more targeted and impactful.
You’ve already seen AI’s fingerprints across the life sciences: single-cell atlases denoised and clustered by machine learning, image-based phenotyping accelerated by computer vision, structure prediction transforming structural biology, and statistical models that calibrate risk in clinical genomics. In protein science, the impact is especially tangible. AI now improves the proteins we already rely on. In this post I focus mainly on recent research on CRISPR editors and their guides, where it helps by boosting on-target activity, reducing off-targets, mapping PAMs in the right cellular context, and suggesting smarter variants to test. In parallel, AI increasingly designs proteins with properties and functions that don’t occur in nature, unlocking bespoke enzymes, binders, and genome engineering tools tailored to real-world constraints.
This post walks through both fronts:
how AI sharpens today’s genome editing tools (with recent CRISPR papers as examples), and
how AI enables de novo and AI guided protein design for novel functions.
Along the way, I will highlight practical entry points for biologists: what to expect from common models, how to interpret their outputs, and where cell-context assays and DBTL loops fit. The bottom line is that you don’t need to be a machine learning expert to benefit. A basic understanding will soon be as routine as running a PCR, and AI will sit alongside cloning, sequencing, and microscopy as a standard method in the working biologist’s toolbox.
1) AI makes CRISPR better, faster, and safer
Smarter guide design and outcome prediction. Machine learning models now predict which gRNAs will cut well, where off-targets are most likely, and even which repair genotypes your experiment will produce, all before you step into the lab. A recent review summarizes how models such as Rule Set 3, DeepSpCas9/DeepCpf1, inDelphi, FORECasT, SPROUT, and others boost on-target activity, anticipate off-targets, and forecast indel patterns across cell types. This turns gRNA picking and edit planning from guesswork into data-guided design, and it’s already improving reproducibility and safety in nuclease, base editing, and prime editing experiments (1).
Discovering and engineering the editors themselves. AI isn’t only for gRNAs; it increasingly helps find and optimize the proteins. A 2025 review details how deep models accelerate discovery of new CRISPR systems, guide editor miniaturization, and inform base/prime editing design by learning from large, heterogeneous datasets (and even by generating new sequences) (1). In parallel, structure prediction breakthroughs (AlphaFold2/3, RoseTTAFold) have reshaped how we reason about editor architecture and interactions.
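To make the guide-design step concrete, here is a minimal Python sketch of what happens before any model is involved: enumerating candidate 20-nt protospacers that sit next to an NGG PAM (the SpCas9 convention) on the forward strand, then ranking them. The GC-based `toy_score` is a hypothetical placeholder, not Rule Set 3 or any published predictor, and the function names are my own; in practice you would hand these candidates to one of the trained models named above.

```python
import re

def find_candidates(seq, guide_len=20):
    """Return (start, protospacer, pam) for every NGG PAM on the forward strand."""
    seq = seq.upper()
    # A lookahead is used so that overlapping sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % guide_len)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]

def gc_fraction(s):
    """Fraction of G and C bases in a sequence."""
    return (s.count("G") + s.count("C")) / len(s)

def toy_score(protospacer):
    """Hypothetical placeholder: prefers GC content near 50%.
    This is NOT a published on-target model."""
    return 1.0 - 2.0 * abs(gc_fraction(protospacer) - 0.5)

def rank_guides(seq):
    """Sort candidates from highest to lowest placeholder score."""
    return sorted(find_candidates(seq), key=lambda c: toy_score(c[1]), reverse=True)

def mismatches(a, b):
    """Count mismatched positions between two equal-length sequences,
    the simplest ingredient of an off-target check."""
    return sum(x != y for x, y in zip(a, b))
```

A real workflow would also scan the reverse strand and search the whole genome for near matches; this sketch deliberately leaves both out to stay short.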
Direct, cell context characterization of CRISPR PAMs with GenomePAM.
One of the practical bottlenecks in CRISPR engineering is PAM mapping: determining which sequences an enzyme can actually target in mammalian cells. GenomePAM addresses this by using naturally repetitive sequences across the human genome as a built-in, high-diversity PAM library. This enables accurate PAM profiling for type II and type V nucleases, comparisons of activity and fidelity across variants with a single gRNA, and even insights into chromatin accessibility, with no protein purification or synthetic libraries required. That means faster, truer-to-cell results for nuclease engineering and selection (2).
Take-home for working biologists: with AI-assisted design and cell-native readouts like GenomePAM, you can plan edits more confidently, pick better editors and guides, and quantify trade-offs (activity vs. fidelity) before committing resources. Together, these approaches are already changing how CRISPR experiments are designed and executed.
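As a rough illustration of what a PAM profile looks like as data, the sketch below builds a per-position nucleotide frequency table from the sequences flanking edited sites and calls a consensus. The function names, the 0.75 threshold, and the example flanks are all assumptions made up for illustration; this is not the GenomePAM analysis pipeline, only the kind of summary such measurements are commonly reduced to.

```python
from collections import Counter

def pam_frequency_table(flanks, pam_len=3):
    """Per-position base frequencies over the first pam_len bases of each flank."""
    usable = [f.upper()[:pam_len] for f in flanks if len(f) >= pam_len]
    if not usable:
        raise ValueError("no flanking sequence is long enough")
    table = []
    for pos in range(pam_len):
        counts = Counter(f[pos] for f in usable)
        total = sum(counts.values())
        table.append({base: counts.get(base, 0) / total for base in "ACGT"})
    return table

def consensus(table, threshold=0.75):
    """Call a base where one nucleotide dominates, otherwise 'N'.
    The threshold is an arbitrary illustrative choice."""
    called = []
    for column in table:
        base, freq = max(column.items(), key=lambda kv: kv[1])
        called.append(base if freq >= threshold else "N")
    return "".join(called)

# Made-up flanking sequences, for illustration only.
example_flanks = ["AGGT", "TGGA", "CGGC", "GGGT"]
print(consensus(pam_frequency_table(example_flanks)))  # prints NGG
```

In real data each flank would be weighted by editing frequency rather than counted once, which is one reason to treat this as a sketch and not as an analysis recipe.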
2) AI as a co-pilot for improving enzymes you already use
Autonomous enzyme engineering platforms. A generalized “self-driving lab” loop (protein language models to seed libraries, robotics to build and test variants, and machine learning to pick the next designs) recently delivered large, practical gains with only a few hundred variants per enzyme. In four weekly rounds, one platform improved the substrate preference of Arabidopsis thaliana halide methyltransferase approximately 90-fold and its ethyltransferase activity approximately 16-fold, and produced a Yersinia mollaretii phytase variant with 26-fold higher activity at neutral pH. Sequencing confirmed approximately 95% mutagenesis accuracy across hundreds of constructs, and each full build-test-measure cycle ran in about a week. That’s industrial-scale optimization, accessible with just a sequence and a measurable fitness assay (3).
Why this matters: you no longer need bespoke heuristics for each protein. With low-N machine learning, modular automation, and language-model-guided library design, you can iteratively push stability, rate, and specificity in weeks rather than months, with fewer wet-lab cycles.
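The shape of such a loop is easier to see in code. The following is a toy simulation under strong assumptions: the "assay" is a made-up additive fitness landscape, the "model" is a simple per-mutation average in place of a protein language model or low-N regressor, and there is no robotics. None of the names or numbers come from the platform in (3); the point is only to show how design, build, test, and learn feed into each other.

```python
import random
from collections import defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def make_toy_assay(wild_type, seed=0):
    """Hypothetical additive landscape standing in for a wet-lab assay.
    Wild type is the zero reference, so fitness equals change vs. wild type."""
    rng = random.Random(seed)
    effects = {(i, aa): rng.gauss(0, 1)
               for i in range(len(wild_type)) for aa in AMINO_ACIDS}
    for i, aa in enumerate(wild_type):
        effects[(i, aa)] = 0.0
    return lambda variant: sum(effects[(i, aa)] for i, aa in enumerate(variant))

def random_single_mutants(wild_type, n, rng):
    """Seed library: n distinct random single-substitution variants."""
    n = min(n, 19 * len(wild_type))  # cannot exceed the number of single mutants
    out = set()
    while len(out) < n:
        i, aa = rng.randrange(len(wild_type)), rng.choice(AMINO_ACIDS)
        if aa != wild_type[i]:
            out.add(wild_type[:i] + aa + wild_type[i + 1:])
    return sorted(out)

def learn_site_effects(measured, wild_type):
    """Learn: average fitness attributed to each (position, residue) mutation,
    splitting a variant's fitness equally among its mutations."""
    sums, counts = defaultdict(float), defaultdict(int)
    for variant, fitness in measured.items():
        muts = [(i, aa) for i, aa in enumerate(variant) if aa != wild_type[i]]
        for m in muts:
            sums[m] += fitness / len(muts)
            counts[m] += 1
    return {m: sums[m] / counts[m] for m in sums}

def design_next(effects, best, n):
    """Design: stack the top predicted beneficial mutations onto the best variant."""
    designs = []
    for (i, aa), eff in sorted(effects.items(), key=lambda kv: kv[1], reverse=True):
        if eff <= 0 or len(designs) >= n:
            break
        candidate = best[:i] + aa + best[i + 1:]
        if candidate != best:
            designs.append(candidate)
    return designs

def run_dbtl(wild_type, rounds=4, batch=24, seed=1):
    """Run a few design-build-test-learn rounds and return (best variant, fitness)."""
    rng = random.Random(seed)
    assay = make_toy_assay(wild_type)
    measured = {wild_type: assay(wild_type)}
    library = random_single_mutants(wild_type, batch, rng)
    for _ in range(rounds):
        for variant in library:                      # build + test
            measured[variant] = assay(variant)
        effects = learn_site_effects(measured, wild_type)   # learn
        best = max(measured, key=measured.get)
        library = design_next(effects, best, batch)  # design
    best = max(measured, key=measured.get)
    return best, measured[best]
```

Calling `run_dbtl("MKTAYIAKQRQI")` with any short peptide returns the best variant found and its simulated fitness. Swapping `make_toy_assay` for real measurements and `learn_site_effects` for a proper model is where the actual work lies.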
3) Designing proteins that nature never made
Language-model-guided genome engineering tools. Mining eukaryotic genomes at scale and fine-tuning protein language models can expand enzyme families and then push them beyond natural sequence space. Recent work on PiggyBac transposases uncovered approximately two orders of magnitude more diversity, validated multiple highly divergent active orthologs, and used a fine-tuned pLLM (e.g., ProGen2) to generate “mega-active” synthetic variants compatible with T-cell engineering and Cas9-directed targeted integration. In some cases, the AI-designed sequences improved targeted integration twofold, illustrating how generative models add function that bioprospecting alone might miss (4).
Fully computational enzymes that rival natural catalysts. A 2025 Nature study describes the fully computational design of TIM-barrel (triosephosphate isomerase barrel) Kemp eliminases, without large library screening. The initial designs already showed respectable enzyme performance, and a single, rationally designed substitution lifted them to natural-like levels (catalytic efficiency above 10⁵ M⁻¹ s⁻¹ and turnover around 30 s⁻¹). Notably, these proteins carry over 140 substitutions relative to natural sequences and expose design rules emphasizing stability, active-site preorganization, and backbone diversity that challenge long-held assumptions about what an active site “must” contain. This is a concrete milestone: high-efficiency, new-to-nature catalysis, achieved largely in silico (5).
What this means for the field (and your lab)
AI is becoming routine.
From gRNA ranking to editor choice to enzyme optimization, AI reduces trial and error and exposes options you wouldn’t otherwise test. For CRISPR specifically, that spans gRNA/on-target prediction, off-target risk, edit outcome modeling, and even AI assisted discovery of new editors, all summarized in recent overviews.
Cell relevant measurements matter.
Methods like GenomePAM align CRISPR engineering with the mammalian context where you’ll actually deploy it, making PAM maps, activity/fidelity comparisons, and chromatin effects directly actionable. The caveat is that these results are specific to mammalian cells: in a different context, such as plants, PAM preferences and activity may differ and should be measured again there.
Design beyond nature is here.
Between autonomous engineering (closed loop DBTL) and de novo design that reaches natural like kinetics, the “search space” for proteins is no longer bounded by evolution’s sampling. That means bespoke tools for editing, delivery, metabolism, sensing, designed to spec.
A practical on ramp for biologists
Get comfortable with the vocabulary. Learn how to read feature importances for gRNA models, interpret edit outcome predictions, and understand calibration and uncertainty in ML outputs. (Treat model outputs like any instrument: with controls.)
Start “AI assisted, not AI only.” Pair a small, hypothesis driven library with an ML guided expansion. Validate early; let the model learn from your assay rather than replacing it.
Use cell native assays where possible. If you’re exploring a new nuclease or variant, methods like GenomePAM can save months by giving you true context specific PAMs and comparative fidelity data in one go.
Document and version your designs. Keep sequences, prompts, training data, and assay conditions under version control to ensure you (and collaborators) can reproduce or roll back.
Plan for scale. If you have a measurable fitness assay, consider partnering with an automation core or biofoundry to run an iterative DBTL loop; the recent demonstrations show it can pay off quickly.
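The advice above about calibration is easy to try on your own data. The sketch below bins predicted probabilities, compares each bin's mean prediction with the observed success rate, and summarizes the gap as an expected calibration error. The function names, the bin count, and the idea of scoring edits as binary successes are assumptions for illustration; substitute your model's predictions and your assay's outcomes.

```python
def calibration_table(probs, outcomes, n_bins=5):
    """Return (mean predicted probability, observed rate, count) per non-empty bin.

    probs: predicted probabilities in [0, 1]
    outcomes: matching observed results, 1 for success and 0 for failure
    """
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must have the same length")
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[index].append((p, y))
    rows = []
    for members in bins:
        if members:
            mean_p = sum(p for p, _ in members) / len(members)
            rate = sum(y for _, y in members) / len(members)
            rows.append((mean_p, rate, len(members)))
    return rows

def expected_calibration_error(rows):
    """Count-weighted average gap between predicted and observed rates."""
    total = sum(n for _, _, n in rows)
    return sum(n * abs(mean_p - rate) for mean_p, rate, n in rows) / total
```

A well-calibrated model gives rows where the first two numbers roughly agree. A model that says 0.9 for guides that work only half the time will show a large gap in its top bin, which is exactly the kind of thing a positive and negative control would reveal for an instrument.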
The bottom line
AI is not replacing biologists; it’s empowering them. In CRISPR, it’s already standard practice to use models for gRNA selection, off-target analysis, and outcome prediction, and cell-native platforms like GenomePAM are accelerating nuclease engineering in the exact context that matters. At the same time, autonomous engineering and de novo design are making it realistic to create proteins for tasks evolution never explored. These approaches are likely to become as common as PCR or standard cloning. Investing now in a basic understanding, enough to choose the right model, interpret its outputs, and design a sound validation plan, will pay off as AI becomes another method in the biological toolbox.
Want to learn more? Check out the references:
1. Kim M-G, Go M-J, Kang S-H, Jeong S-H, Lim K. Revolutionizing CRISPR technology with artificial intelligence. Experimental & Molecular Medicine. 2025;57(7):1419-31.
2. Yu M, Ai L, Wang B, Lian S, Ip L, Liu J, et al. GenomePAM directs PAM characterization and engineering of CRISPR-Cas nucleases using mammalian genome repeats. Nature Biomedical Engineering. 2025.
3. Singh N, Lane S, Yu T, Lu J, Ramos A, Cui H, Zhao H. A generalized platform for artificial intelligence-powered autonomous enzyme engineering. Nature Communications. 2025;16(1).
4. Ivančić D, Agudelo A, Lindstrom-Vautrin J, Jaraba-Wallace J, Gallo M, Das R, et al. Discovery and protein language model-guided design of hyperactive transposases. Nature Biotechnology. 2025.
5. Listov D, Vos E, Hoffka G, Hoch SY, Berg A, Hamer-Rogotner S, et al. Complete computational design of high-efficiency Kemp elimination enzymes. Nature. 2025;643(8074):1421-7.


