Hot Topics #3 (June 13, 2022)
Tumor evolution, protein design and prediction, neurodegeneration modelling, and tips for training large NNs.
Tumor-immune metaphenotypes orchestrate an evolutionary bottleneck that promotes metabolic transformation; West et al.; June 4, 2022
Abstract: Metabolism plays a complex role in the evolution of cancerous tumors, including inducing a multifaceted effect on the immune system to aid immune escape. Immune escape is, by definition, a collective phenomenon by requiring the presence of two cell types interacting in close proximity: tumor and immune. The microenvironmental context of these interactions is influenced by the dynamic process of blood vessel growth and remodelling, creating heterogeneous patches of well-vascularized tumor or acidic niches. Here, we present a multiscale mathematical model that captures the phenotypic, vascular, microenvironmental, and spatial heterogeneity which shape acid-mediated invasion and immune escape over a biologically-realistic time scale. The model explores several immune escape mechanisms such as i) acid inactivation of immune cells, ii) competition for glucose, and iii) inhibitory immune checkpoint receptor expression (PD-L1). We also explore the efficacy of anti-PD-L1 and sodium bicarbonate buffer agents for treatment. To aid in understanding immune escape as a collective cellular phenomenon, we define immune escape in the context of six collective phenotypes (termed "meta-phenotypes"): self-acidify, mooch acid, PD-L1 attack, mooch PD-L1, proliferate fast, and starve glucose. Fomenting a stronger immune response leads to initial benefits (additional cytotoxicity), but this advantage is offset by increased cell turnover that leads to accelerated evolution and the emergence of aggressive phenotypes. This creates a bimodal therapy landscape: either the immune system should be maximized for complete cure, or kept in check to avoid rapid evolution of invasive cells. These constraints are dependent on heterogeneity in vascular context, microenvironmental acidification, and the strength of immune response. 
This model helps to untangle the key constraints on evolutionary costs and benefits of three key phenotypic axes on tumor invasion and treatment: acid-resistance, glycolysis, and PD-L1 expression. Concomitant anti-PD-L1 and buffer treatment is a promising strategy to limit the adverse effects of immune escape.
A deep unsupervised language model for protein design; Ferruz et al.; March 12, 2022
Abstract: Protein design aims to build new proteins from scratch thereby holding the potential to tackle many environmental and biomedical problems. Recent progress in the field of natural language processing (NLP) has enabled the implementation of ever-growing language models capable of understanding and generating text with human-like capabilities. Given the many similarities between human languages and protein sequences, the use of NLP models offers itself for predictive tasks in protein research. Motivated by the evident success of generative Transformer-based language models such as the GPT-x series, we developed ProtGPT2, a language model trained on protein space that generates de novo protein sequences that follow the principles of natural ones. In particular, the generated proteins display amino acid propensities which resemble natural proteins. Disorder and secondary structure prediction indicate that 88% of ProtGPT2-generated proteins are globular, in line with natural sequences. Sensitive sequence searches in protein databases show that ProtGPT2 sequences are distantly related to natural ones, and similarity networks further demonstrate that ProtGPT2 is sampling unexplored regions of protein space. AlphaFold prediction of ProtGPT2-sequences yielded well-folded non-idealized structures with embodiments as well as large loops and revealed new topologies not captured in current structure databases. ProtGPT2 has learned to speak the protein language. It has the potential to generate de novo proteins in a high throughput fashion in a matter of seconds. The model is easy-to-use and freely available.
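The abstract's claim that generated proteins show amino acid propensities resembling natural ones can be checked, in spirit, by comparing residue frequencies between two sets of sequences. The following is a minimal sketch of such a comparison, not the authors' analysis: the function names and the choice of total variation distance are assumptions made for illustration, and any sequences passed in would be the reader's own.

```python
from collections import Counter

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequences):
    """Return the relative frequency of each standard amino acid
    across a collection of sequences (non-standard letters are ignored)."""
    counts = Counter(aa for seq in sequences for aa in seq if aa in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard amino acids found")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def total_variation(p, q):
    """Total variation distance between two compositions:
    0.0 means identical frequencies, 1.0 means no overlap at all.
    (Assumed metric for illustration; the paper may use another.)"""
    return 0.5 * sum(abs(p[aa] - q[aa]) for aa in AMINO_ACIDS)
```

A small distance between `composition(generated)` and `composition(natural)` would be consistent with the reported resemblance, although it says nothing about fold or function.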
Protein complex prediction with AlphaFold-Multimer; Evans et al.; October 4, 2021
Abstract: While the vast majority of well-structured single protein chains can now be predicted to high accuracy due to the recent AlphaFold [1] model, the prediction of multi-chain protein complexes remains a challenge in many cases. In this work, we demonstrate that an AlphaFold model trained specifically for multimeric inputs of known stoichiometry, which we call AlphaFold-Multimer, significantly increases accuracy of predicted multimeric interfaces over input-adapted single-chain AlphaFold while maintaining high intra-chain accuracy. On a benchmark dataset of 17 heterodimer proteins without templates (introduced in [2]) we achieve at least medium accuracy (DockQ [3] ≥ 0.49) on 14 targets and high accuracy (DockQ ≥ 0.8) on 6 targets, compared to 9 targets of at least medium accuracy and 4 of high accuracy for the previous state of the art system (an AlphaFold-based system from [2]). We also predict structures for a large dataset of 4,433 recent protein complexes, from which we score all non-redundant interfaces with low template identity. For heteromeric interfaces we successfully predict the interface (DockQ ≥ 0.23) in 67% of cases, and produce high accuracy predictions (DockQ ≥ 0.8) in 23% of cases, an improvement of +25 and +11 percentage points over the flexible linker modification of AlphaFold [4] respectively. For homomeric interfaces we successfully predict the interface in 69% of cases, and produce high accuracy predictions in 34% of cases, an improvement of +5 percentage points in both instances.
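The DockQ thresholds quoted in the abstract (0.23, 0.49, 0.8) define quality bands that are easy to lose track of when reading the percentages. A small sketch of how scores map onto those bands follows. The thresholds come from the abstract; the function names and the label "acceptable" for the 0.23 to 0.49 band are naming assumptions for this example.

```python
def dockq_category(score: float) -> str:
    """Map a DockQ score in [0, 1] to a quality band,
    using the thresholds quoted in the abstract."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("DockQ scores lie in [0, 1]")
    if score >= 0.8:
        return "high"
    if score >= 0.49:
        return "medium"
    if score >= 0.23:
        return "acceptable"  # counts as a successfully predicted interface
    return "incorrect"


def summarize(scores):
    """Return (fraction with DockQ >= 0.23, fraction with DockQ >= 0.8),
    the two statistics the abstract reports per interface type."""
    if not scores:
        raise ValueError("need at least one score")
    labels = [dockq_category(s) for s in scores]
    success = sum(label != "incorrect" for label in labels) / len(labels)
    high = sum(label == "high" for label in labels) / len(labels)
    return success, high
```

Note that the bands are nested: every "high" prediction also counts toward "at least medium" and toward the success rate, which is why the 67% and 23% figures in the abstract are not disjoint.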
Modeling Neurodegeneration in silico With Deep Learning; Tuladhar et al.; November 19, 2021
Abstract: Deep neural networks, inspired by information processing in the brain, can achieve human-like performance for various tasks. However, research efforts to use these networks as models of the brain have primarily focused on modeling healthy brain function so far. In this work, we propose a paradigm for modeling neural diseases in silico with deep learning and demonstrate its use in modeling posterior cortical atrophy (PCA), an atypical form of Alzheimer’s disease affecting the visual cortex. We simulated PCA in deep convolutional neural networks (DCNNs) trained for visual object recognition by randomly injuring connections between artificial neurons. Results showed that injured networks progressively lost their object recognition capability. Simulated PCA impacted learned representations hierarchically, as networks lost object-level representations before category-level representations. Incorporating this paradigm in computational neuroscience will be essential for developing in silico models of the brain and neurological diseases. The paradigm can be expanded to incorporate elements of neural plasticity and to other cognitive domains such as motor control, auditory cognition, language processing, and decision making.
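The central operation here, randomly injuring connections between artificial neurons, can be illustrated on a single weight matrix. This is a minimal sketch under the assumption that "injury" means setting a randomly chosen fraction of weights to zero. The function name, the toy matrix, and the severity schedule are hypothetical, and the authors' actual procedure on trained DCNNs may differ.

```python
import random


def injure_weights(weights, fraction, rng):
    """Return a copy of a weight matrix (list of rows) in which a random
    `fraction` of the connections has been set to zero.
    Assumption: 'injury' is modelled as zeroing a weight."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    positions = [(i, j) for i, row in enumerate(weights) for j in range(len(row))]
    n_injured = round(fraction * len(positions))
    injured = [row[:] for row in weights]  # leave the original untouched
    for i, j in rng.sample(positions, n_injured):
        injured[i][j] = 0.0
    return injured


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy 4x5 layer with strictly non-zero weights (hypothetical values).
    layer = [[rng.uniform(0.5, 1.0) for _ in range(5)] for _ in range(4)]
    for severity in (0.0, 0.25, 0.5, 1.0):
        damaged = injure_weights(layer, severity, rng)
        zeros = sum(w == 0.0 for row in damaged for w in row)
        print(f"severity={severity:.2f} zeroed={zeros}/20")
```

In a full experiment one would evaluate recognition accuracy after each severity level. That part is omitted here because it depends on the trained network and dataset.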
Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model; Wang et al.; January 5, 2017
Abstract: Protein contacts contain key information for the understanding of protein structure and function, and thus contact prediction from sequence is an important problem. Recently, exciting progress has been made on this problem, but the predicted contacts for proteins without many sequence homologs are still of low quality and not very useful for de novo structure prediction.
Model-based reinforcement learning for biological sequence design; Angermueller et al.; May 10, 2020
Abstract: The ability to design biological structures such as DNA or proteins would have considerable medical and industrial impact. Doing so presents a challenging black-box optimization problem characterized by the large-batch, low-round setting due to the need for labor-intensive wet lab evaluations. In response, we propose using reinforcement learning (RL) based on proximal-policy optimization (PPO) for biological sequence design. RL provides a flexible framework for optimizing generative sequence models to achieve specific criteria, such as diversity among the high-quality sequences discovered. We propose a model-based variant of PPO, DyNA-PPO, to improve sample efficiency, where the policy for a new round is trained offline using a simulator fit on functional measurements from prior rounds. To accommodate the growing number of observations across rounds, the simulator model is automatically selected at each round from a pool of diverse models of varying capacity. On the tasks of designing DNA transcription factor binding sites, designing antimicrobial proteins, and optimizing the energy of Ising models based on protein structure, we find that DyNA-PPO performs significantly better than existing methods in settings in which modeling is feasible, while still not performing worse in situations in which a reliable model cannot be learned.
Techniques for Training Large Neural Networks; Wang & Brockman; June 9, 2022
Abstract: Large neural networks are at the core of many recent advances in AI, but training them is a difficult engineering and research challenge which requires orchestrating a cluster of GPUs to perform a single synchronized calculation. As cluster and model sizes have grown, machine learning practitioners have developed an increasing variety of techniques to parallelize model training over many GPUs. At first glance, understanding these parallelism techniques may seem daunting, but with only a few assumptions about the structure of the computation these techniques become much more clear—at that point, you’re just shuttling around opaque bits from A to B like a network switch shuttles around packets.
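To make the idea of a "single synchronized calculation" concrete, here is a toy simulation of data parallelism, one common way to spread training over many devices. It is a sketch under simplifying assumptions: the "workers" are plain Python function calls rather than GPUs, the model is a scalar linear model, and `all_reduce_mean` is a hypothetical stand-in for a real collective communication primitive.

```python
def grad_mse(w, xs, ys):
    """Gradient of mean squared error for the scalar model y_hat = w * x."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def all_reduce_mean(values):
    """Stand-in for an all-reduce: every worker ends up with the mean."""
    return sum(values) / len(values)


def data_parallel_grad(w, xs, ys, n_workers):
    """Split the batch into equal shards, compute one local gradient per
    worker, then average the local gradients across workers."""
    if len(xs) % n_workers != 0:
        raise ValueError("batch size must be divisible by n_workers")
    shard = len(xs) // n_workers
    local_grads = [
        grad_mse(w, xs[k * shard:(k + 1) * shard], ys[k * shard:(k + 1) * shard])
        for k in range(n_workers)
    ]
    return all_reduce_mean(local_grads)


if __name__ == "__main__":
    xs, ys, w = [1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0], 0.5
    print("full batch :", grad_mse(w, xs, ys))
    print("2 workers  :", data_parallel_grad(w, xs, ys, n_workers=2))
```

With equal-sized shards the averaged gradient matches the full-batch gradient, so each worker can apply the same update and the model copies stay in sync. Only the gradients need to be communicated.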
Accurate prediction of protein structures and interactions using a three-track neural network; Baek et al.; August 19, 2021
Abstract: DeepMind presented notably accurate predictions at the recent 14th Critical Assessment of Structure Prediction (CASP14) conference. We explored network architectures that incorporate related ideas and obtained the best performance with a three-track network in which information at the one-dimensional (1D) sequence level, the 2D distance map level, and the 3D coordinate level is successively transformed and integrated. The three-track network produces structure predictions with accuracies approaching those of DeepMind in CASP14, enables the rapid solution of challenging x-ray crystallography and cryo–electron microscopy structure modeling problems, and provides insights into the functions of proteins of currently unknown structure. The network also enables rapid generation of accurate protein-protein complex models from sequence information alone, short-circuiting traditional approaches that require modeling of individual subunits followed by docking. We make the method available to the scientific community to speed biological research.
Artificial intelligence reveals nuclear pore complexity; Mosalaganti et al.; November 2, 2021
Abstract: Nuclear pore complexes (NPCs) mediate nucleocytoplasmic transport. Their intricate 120 MDa architecture remains incompletely understood. Here, we report a near-complete structural model of the human NPC scaffold with explicit membrane and in multiple conformational states. We combined AI-based structure prediction with in situ and in cellulo cryo-electron tomography and integrative modeling. We show that linker Nups spatially organize the scaffold within and across subcomplexes to establish the higher-order structure. Microsecond-long molecular dynamics simulations suggest that the scaffold is not required to stabilize the inner and outer nuclear membrane fusion, but rather widens the central pore. Our work exemplifies how AI-based modeling can be integrated with in situ structural biology to understand subcellular architecture across spatial organization levels.
Structure of cytoplasmic ring of nuclear pore complex by integrative cryo-EM and AlphaFold; Fontana et al.; June 10, 2022
Abstract: The nuclear pore complex (NPC) is the conduit for bidirectional cargo traffic between the cytoplasm and the nucleus. We determined a near-complete structure of the cytoplasmic ring of the NPC from Xenopus oocytes using single-particle cryo–electron microscopy and AlphaFold prediction. Structures of nucleoporins were predicted with AlphaFold and fit into the medium-resolution map by using the prominent secondary structural density as a guide. Certain molecular interactions were further built or confirmed by complex prediction by using AlphaFold. We identified the binding modes of five copies of Nup358, the largest NPC subunit with Phe-Gly repeats for cargo transport, and predicted it to contain a coiled-coil domain that may provide avidity to assist its role as a nucleation center for NPC formation under certain conditions.
Visualizing Molecular Structure with Weights & Biases; View COVID-19 protein complexes in the browser.
De Novo Molecule Generation with GCPNs using TorchDrug; A tutorial using TorchDrug to generate molecules with some neat visualizations.
TorchDrug: A powerful and flexible machine learning platform for drug discovery