
BoltzGen: Stark et al.: Oct. 26, 2025
Abstract: We introduce BoltzGen, an all-atom generative model for designing proteins and peptides across all modalities to bind a wide range of biomolecular targets. BoltzGen builds strong structural reasoning capabilities about target-binder interactions into its generative design process. This is achieved by unifying design and structure prediction, resulting in a single model that also reaches state-of-the-art folding performance. BoltzGen’s generation process can be controlled with a flexible design specification language over covalent bonds, structure constraints, binding sites, and more. We experimentally validate these capabilities in a total of eight diverse wetlab design campaigns with functional and affinity readouts across 26 targets. The experiments span binder modalities from nanobodies to disulfide-bonded peptides and include targets ranging from disordered proteins to small molecules. For instance, we test 15 nanobody and protein binder designs against each of nine novel targets with low similarity to any protein with a known bound structure. For both binder modalities, this yields nanomolar binders for 66% of targets. We release model weights, data, and both inference and training code at: https://github.com/HannesStark/boltzgen.
Predicting protein-protein interactions in the human proteome: Zhang et al.: Sep. 25, 2025
Abstract: Protein-protein interactions (PPIs) are essential for biological function. Coevolutionary analysis and deep-learning (DL)–based protein structure prediction have enabled comprehensive PPI identification in bacteria and yeast, but these approaches have had limited success for the more complex human proteome. We overcame this challenge by enhancing the coevolutionary signals with sevenfold-deeper multiple sequence alignments harvested from 30 petabytes of unassembled genomic data and developing a new DL network trained on augmented datasets of domain-domain interactions from 200 million predicted protein structures. We systematically screened 200 million human protein pairs and predicted 17,849 interactions with an expected precision of 90%, of which 3631 interactions were not identified in previous experimental screens. Three-dimensional models of these predicted interactions provide numerous hypotheses about protein function and mechanisms of human diseases.
Calibrating Generative Models: Smith et al.: Oct. 11, 2025
Abstract: Generative models frequently suffer miscalibration, wherein class probabilities and other statistics of the sampling distribution deviate from desired values. We frame calibration as a constrained optimization problem and seek the closest model in Kullback-Leibler divergence satisfying calibration constraints. To address the intractability of imposing these constraints exactly, we introduce two surrogate objectives for fine-tuning: (1) the relax loss, which replaces the constraint with a miscalibration penalty, and (2) the reward loss, which converts calibration into a reward fine-tuning problem. We demonstrate that these approaches substantially reduce calibration error across hundreds of simultaneous constraints and models with up to one billion parameters, spanning applications in protein design, image generation, and language modeling.
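The abstract's framing can be written out explicitly. The following is a sketch consistent with the stated setup, using my own notation (p_0, f_k, c_k, lambda), not necessarily the paper's:

```latex
% Calibration as a constrained KL projection: find the model p closest to the
% pretrained model p_0 whose statistics f_k match the target values c_k.
\min_{p} \; \mathrm{KL}(p \,\|\, p_0)
\quad \text{s.t.} \quad
\mathbb{E}_{x \sim p}[f_k(x)] = c_k, \qquad k = 1, \dots, K.

% The "relax" surrogate replaces the hard constraints with a
% miscalibration penalty of weight \lambda:
\mathcal{L}_{\text{relax}}(p)
= \mathrm{KL}(p \,\|\, p_0)
+ \lambda \sum_{k=1}^{K}
\big( \mathbb{E}_{x \sim p}[f_k(x)] - c_k \big)^{2}.
```

The "reward" surrogate instead treats constraint satisfaction as a reward signal and applies standard reward fine-tuning machinery.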
ODesign: A World Model for Biomolecular Interaction Design: Zhang et al.
Abstract: Biomolecular interactions underpin almost all biological processes, and their rational design is central to programming new biological functions. Generative AI models have emerged as powerful tools for molecular design, yet most remain specialized for individual molecular types and lack fine-grained control over interaction details. Here we present ODesign, an all-atom generative world model for all-to-all biomolecular interaction design. ODesign allows scientists to specify epitopes on arbitrary targets and generate diverse classes of binding partners with fine-grained control. Across entity-, token-, and atom-level benchmarks in the protein modality, ODesign demonstrates superior controllability and performance to modality-specific baselines. Extending beyond proteins, it generalizes to nucleic acid and small-molecule design, enabling interaction types such as protein-binding RNA/DNA and RNA/DNA-binding ligands that were previously inaccessible. By unifying multimodal biomolecular interactions within a single generative framework, ODesign moves toward a general-purpose molecular world model capable of programmable design.
Which pLM to choose?: Senoner et al.: Oct. 31, 2025
Abstract: Protein-language models (pLMs) provide a novel means for mapping the protein space. Which of these new maps best advances specific biological analyses, however, is not obvious. To elucidate the principles of model selection, we benchmarked fourteen pLMs, spanning several orders of magnitude in parameter count, across a hundred million protein pairs to assess how well they capture sequence, structure, and function similarity. For each model, we distinguish inherent information, i.e. signal recoverable from raw-embedding distances, and extractable information, i.e. signal revealed through additional supervised training.
Three key results emerge. First, the pLM protein representation space is inherently different from the space of biological protein representations, i.e. sequences or structures. Here, a size-performance paradox is salient: mid-scale foundation models reflect all tested biological properties as well as much larger ones. Second, pLM representations compress and store biological information in proportion to model size; that is, a lightweight feed-forward network can be trained on embedding pairs to predict these biological properties well, a capacity dividend. Finally, we observe that task-specific learning radically reshapes the embedding space: the model gains inherent understanding of the task but garbles any further extraction of other properties.
In other words, smaller pLMs can provide efficient and compute-light general insight. Larger models are advantageous only when fine-tuning is planned to accomplish a specific task. Furthermore, representations generated by “specialist” models are not immediately generalizable throughout protein biology. Thus, for pLMs, bigger isn’t always better.
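The inherent-vs-extractable distinction above can be made concrete with a toy construction. This is my own illustration, not the paper's benchmark: a synthetic property is stored in one low-variance embedding dimension, so raw cosine distances between embeddings barely reflect it ("inherent" signal is weak), while a small supervised probe trained on pair features recovers it almost perfectly ("extractable" signal is strong).

```python
# Toy illustration (my own construction, not the paper's benchmark) of
# "inherent" vs "extractable" information in embedding pairs.
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 32

# Embeddings: 31 high-variance "nuisance" dimensions plus one low-variance
# dimension encoding a synthetic biological property.
prop = rng.normal(size=n)
emb = rng.normal(size=(n, d))
emb[:, 0] = 0.1 * prop  # property hidden in a weak direction

# Pair labels: magnitude of the property difference for random pairs.
ia = rng.integers(0, n, size=4000)
ib = rng.integers(0, n, size=4000)
y = np.abs(prop[ia] - prop[ib])

# "Inherent" information: correlation of raw cosine distance with the label.
an = emb[ia] / np.linalg.norm(emb[ia], axis=1, keepdims=True)
bn = emb[ib] / np.linalg.norm(emb[ib], axis=1, keepdims=True)
cos_dist = 1.0 - np.sum(an * bn, axis=1)
r_inherent = np.corrcoef(cos_dist, y)[0, 1]

# "Extractable" information: a least-squares probe trained on simple pair
# features (elementwise absolute embedding differences).
X = np.abs(emb[ia] - emb[ib])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
r_probe = np.corrcoef(X @ coef, y)[0, 1]

print(f"inherent r = {r_inherent:.2f}, probe r = {r_probe:.2f}")
```

The probe's near-perfect correlation against the near-zero raw-distance correlation mirrors the paper's "capacity dividend": information a model stores is not necessarily visible in its embedding geometry without supervision.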
Virtual Cell Challenge: Advances in single-cell RNA-seq technologies now enable large-scale measurements of cellular responses to genetic and chemical perturbations, fueling this exciting era of predictive cellular modeling. Virtual Cell Challenge is a recurring, open, community-driven challenge aimed at evaluating and improving computational models that predict cellular responses to genetic or chemical perturbations. In 2025, the challenge will focus on context generalization: participants must predict the effects of perturbations in a held-out cell type, the H1 human embryonic stem cell line. Using new experimental data we have generated for the Challenge, you will build a model that predicts these effects and submit the results to the Challenge leaderboard. The top three models will win prizes valued at $100,000, $50,000, and $25,000.
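A minimal baseline sketch for the context-generalization task, on synthetic toy data of my own (this is an illustration of the task shape, not the Challenge's reference code or data): predict a perturbation's effect in the held-out cell type as that cell type's control profile plus the mean expression shift ("delta") the same perturbation produced in the training cell types.

```python
# Mean-delta baseline sketch (my own illustration, not Challenge code):
# transfer a perturbation's average training-cell-type shift to a
# held-out cell type's control expression profile.
import numpy as np

rng = np.random.default_rng(1)
n_genes = 100

# Assumed toy data: mean expression profiles per (cell type, condition).
train_ctrl = {"A": rng.poisson(5, n_genes).astype(float),
              "B": rng.poisson(5, n_genes).astype(float)}
shift = rng.normal(0, 1, n_genes)  # true per-gene effect of perturbation P
train_pert = {ct: ctrl + shift + rng.normal(0, 0.3, n_genes)
              for ct, ctrl in train_ctrl.items()}

def predict_held_out(held_out_ctrl, train_ctrl, train_pert):
    """Held-out control profile plus the average training delta."""
    deltas = [train_pert[ct] - train_ctrl[ct] for ct in train_ctrl]
    return held_out_ctrl + np.mean(deltas, axis=0)

h1_ctrl = rng.poisson(5, n_genes).astype(float)  # held-out cell type control
pred = predict_held_out(h1_ctrl, train_ctrl, train_pert)
truth = h1_ctrl + shift  # ground-truth perturbed profile in the toy model
print("mean abs error:", np.mean(np.abs(pred - truth)))
```

In this toy model the perturbation effect is identical across cell types, so the baseline is nearly exact; the Challenge is hard precisely because real effects are context-dependent and such deltas do not transfer cleanly.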
C2S: The Open Cellular Context Model: A biological LLM that learns the language of cells. Cell2Sentence (C2S) is a family of large language models (LLMs) designed to understand, predict, and simulate biological systems. By learning the language of cells, C2S can analyze cellular responses, predict drug effects, discover new therapeutic pathways, and advance our understanding of biological processes at the molecular level.




