Data-driven protein engineering: learning the sequence-function mapping from experimental data
Dr. Philip Romero
University of California, San Francisco
Proteins are amazingly diverse molecules that are capable of performing a wide variety of chemical and biological tasks. Such versatility presents tremendous opportunities for solving challenging human problems that range from medicine and agriculture to environmental protection and industrial chemistry. Despite this great potential, our ability to design proteins with tailor-made functions has been impeded by our limited understanding of these complex molecules.
Rational protein engineering relies on accurate models that relate a protein's sequence to its function. However, many molecular properties are extremely difficult to model because they are poorly understood or involve subtle, possibly dynamic, structural changes. In this talk, I will present an alternative modeling approach in which statistical models learn the relationship between protein sequence and function from experimental data. These data-driven methods implicitly capture the numerous and possibly unknown factors that shape the sequence-function mapping. I will then describe an adaptive protein design algorithm that uses these models to efficiently identify optimized protein sequences. I will finish by describing my current work in high-throughput experimentation and how new technologies are being used to generate protein sequence-function data sets of unprecedented scale.
Phil Romero is currently a postdoctoral fellow in Adam Abate's lab at UCSF, where he is developing microfluidic technologies for protein engineering. He obtained his B.S.E. and M.S. degrees from Tulane University in Biomedical Engineering and Molecular Biology, respectively. As a graduate student at Caltech, he worked in Frances Arnold's laboratory, where he engineered proteins for a variety of applications, including medical imaging, cancer therapeutics, and biofuel production. His thesis research focused on developing new statistical methods that learn the relationship between protein sequence and function from experimental data.