Biochemistry and Molecular Biology
Penn State Science
Arthur Lesk


  • Professor of Biochemistry and Molecular Biology
203 South Frear Laboratory
University Park, PA 16802
Phone: (814) 865-4743

Research Interests

Protein structure, function and evolution

Research Summary

Major topics of my current research include (1) genomics, (2) protein structure, folding and evolution, and (3) the structures and functions of biological networks.

Genomes encode the proteins that organisms synthesise, along with much else.  By comparing proteins encoded by the genomes of related species, for instance mammoth and elephant, it is possible to understand how proteins evolve, in the short term at least, and to study how changes in amino acid sequence reflect selective pressure or genetic drift.  In this work I collaborate with the groups of Profs. W. Miller and S. Schuster.

Although the detection of protein-coding genes in newly-sequenced genomes has received much attention, variable splicing in eukaryotic genes is a very serious complication.  I am examining specific families to try to understand the structural and functional consequences of alternative splicing.

Protein structure, function, and evolution
A major unsolved problem of molecular biology is how the amino acid sequences of proteins dictate their three-dimensional structures, and thereby their functions.  For a number of protein families, including the globins (Figure 1), cytochromes c, immunoglobulins, and chymotrypsin-family serine proteases, my colleagues and I have studied changes in amino acid sequence and analyzed how these mutations are reflected in changes in protein structure.  In general, evolution keeps at least a large portion of the protein topology, or folding pattern, intact.

On a larger scale, what accounts for the repertoire of protein structures that we see?  The difficulty is that the answer is partly physics and partly historical accident.  I have developed a “tableau” representation of protein folding patterns that may help to map out the varieties of folding pattern that are possible, and that has applications to protein fold recognition and classification.

Analysis of structures of networks
Systems biology studies biological networks, for instance transcription regulatory networks (Figure 2) and protein-protein interaction networks.  Small recurring subnetworks, or “motifs”, are known to be prevalent within regulatory networks.  My research is aimed at the next step in the analysis of network structure: how are these small motifs connected to one another?  What are the functional implications of larger and larger combinations of subnetworks?
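As a small illustration of what counting a motif involves, the sketch below counts one commonly discussed three-node motif, the feed-forward loop (A regulates B, B regulates C, and A also regulates C directly), in a directed network.  This is a minimal sketch and not the method used in the research described here; the function name `count_feed_forward_loops` and the toy network are hypothetical.

```python
def count_feed_forward_loops(edges):
    """Count feed-forward loops: node triples (a, b, c) with a->b, b->c and a->c.

    `edges` is an iterable of (regulator, target) pairs. Hypothetical helper
    for illustration only.
    """
    # Map each regulator to the set of its direct targets.
    targets = {}
    for src, dst in set(edges):
        targets.setdefault(src, set()).add(dst)

    count = 0
    for a, a_targets in targets.items():
        for b in a_targets:
            if b == a:
                continue  # ignore self-regulation
            for c in targets.get(b, ()):
                # c must be a third, distinct node that a also regulates directly.
                if c != a and c != b and c in a_targets:
                    count += 1
    return count


# Toy network (made-up data): X regulates Y, Z and W; Y and W each regulate Z.
toy_edges = [("X", "Y"), ("Y", "Z"), ("X", "Z"), ("X", "W"), ("W", "Z")]
ffl_count = count_feed_forward_loops(toy_edges)
print(ffl_count)
```

In the toy network both X, Y, Z and X, W, Z form feed-forward loops.  The question of how such motifs are connected to one another goes beyond a count of this kind, since it concerns which nodes and edges neighbouring motifs share.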

There is some analogy between the studies of proteins and of networks, if both are regarded as formed of hierarchical associations of smaller units.  In proteins the hierarchy involves primary structure (amino acid sequence), secondary structure (helices and sheets), tertiary structure (folding pattern of compact units), and quaternary structure (assembly of proteins from separate polypeptide chains).  However, globular proteins are constrained both by the linear assembly of the polypeptide chain and by the limit on the number of atoms that can pack closely in space to form a compact structure.  The assemblies of networks are, as far as we know, unfettered by any such constraints.

Lesk fig 1


Figure 1.  Comparison of the folding patterns of sperm whale myoglobin, the first protein for which the structure was determined at high resolution, and a plant globin from yellow lupin.  Both contain a similar assembly of helices.  Can you spot differences between these structures?  Look at the movie version at

Lesk fig 2

Figure 2.  Simplified sketch illustrating features of a typical segment of the pathways in the yeast regulatory network. Transcriptional regulators appear as circles. Target genes appear as squares. A transcriptional regulator typically has direct influence over about 50 genes, indicated by multiple connections from the filled black circle to the circles on the line below it. Roughly one in 10 of the neighbours of any node is connected to another neighbour, indicated by the horizontal arrow on the second row. The ultimate receptor of the signal lies at the end of a pathway typically containing about five intermediate nodes (shown in black). This ultimate target gene receives on average about two inputs. This diagram shows only a small fragment of a network that is very large and dense. (From Lesk, A.M. Introduction to Genomics.  Oxford University Press, 2007.)
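The "one in 10" statistic in the caption can be made concrete with a short calculation.  The sketch below computes, for one node, the fraction of its neighbours that are linked to at least one other neighbour of the same node.  It is a minimal sketch under stated assumptions: edge direction is ignored, the function name `neighbour_link_fraction` is hypothetical, and the toy network is made up and is not the yeast data.

```python
def neighbour_link_fraction(edges, node):
    """Fraction of `node`'s neighbours linked to at least one other neighbour.

    Edge direction is ignored (an assumption made for this illustration).
    """
    # Build an undirected adjacency map, skipping self-loops.
    adjacency = {}
    for a, b in edges:
        if a == b:
            continue
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    neighbours = adjacency.get(node, set())
    if not neighbours:
        return 0.0

    # A neighbour counts if it shares an edge with any other neighbour of `node`.
    linked = sum(1 for n in neighbours if adjacency[n] & (neighbours - {n}))
    return linked / len(neighbours)


# Toy network (made-up data): regulator R has four targets, two of them linked.
toy_edges = [("R", "g1"), ("R", "g2"), ("R", "g3"), ("R", "g4"), ("g1", "g2")]
fraction = neighbour_link_fraction(toy_edges, "R")
print(fraction)
```

Here two of the four neighbours of R are linked to each other, so the fraction is one half; a value near 0.1, averaged over nodes, would correspond to the figure quoted in the caption.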

Selected Publications

Introduction to Protein Science, second edition
Arthur M. Lesk
Oxford University Press, 2010

Introduction to Bioinformatics, third edition
Arthur M. Lesk
Oxford University Press, 2008

Introduction to Genomics
Arthur M. Lesk
Oxford University Press, 2007


  • Mustang-MR structural sieving server: applications in protein structural analysis and crystallography. Arun S. Konagurthu, Cyril F. Reboul, Jason W. Schmidberger, James A. Irving, Arthur M. Lesk, Peter J. Stuckey, James C. Whisstock and Ashley M. Buckle PLoS One 5, e10048 (2010).
  • Cataloging topologies of protein folding patterns. Arun S. Konagurthu and Arthur M. Lesk J. Mol. Recognition 23, 253–257 (2010).
  • The mitochondrial genome sequence of the Tasmanian tiger (Thylacinus cynocephalus). Webb Miller, Daniela I. Drautz, Jan E. Janecka, Arthur M. Lesk, Aakrosh Ratan, Lynn P. Tomsho, Mike Packard, Yeting Zhang, Lindsay R. McClellan, Ji Qi, Fangqing Zhao, M. Thomas P. Gilbert, Love Dalén, Juan Luis Arsuaga, Per G. P. Ericson, Daniel H. Huson, Kristofer M. Helgen, William J. Murphy, Anders Götherström and Stephan C. Schuster  Genome Res. 19, 213–220 (2009).
  • Sequencing the nuclear genome of the extinct woolly mammoth. Webb Miller, Daniela I. Drautz, Aakrosh Ratan, Barbara Pusey, Ji Qi, Arthur M. Lesk, Lynn P. Tomsho, Michael D. Packard, Fangqing Zhao, Andrei Sher, Alexei Tikhonov, Brian Raney, Nick Patterson, Kerstin Lindblad-Toh, Eric S. Lander, James R. Knight, Gerard P. Irzyk, Karin M. Fredrikson, Timothy T. Harkins, Sharon Sheridan, Tom Pringle, and Stephan C. Schuster Nature, 456, 387-390 (2008).
  • Intraspecific phylogenetic analysis of Siberian woolly mammoths using complete mitochondrial genomes. M. Thomas P. Gilbert, Daniela I. Drautz, Arthur M. Lesk, Simon Y. W. Ho, Ji Qi, Aakrosh Ratan, Chih-Hao Hsu, Andrei Sher, Love Dalén, Anders Götherström, Lynn P. Tomsho, Snjezana Rendulic, Michael Packard, Paula F. Campos, Tatyana V. Kuznetsova, Fyodor Shidlovskiy, Alexei Tikhonov, Eske Willerslev, Paola Iacumin, Bernard Buigues, Per G. P. Ericson, Mietje Germonpré, Pavel Kosintsev, Vladimir Nikolaev, Malgosia Nowak-Kemp, James R. Knight, Gerard P. Irzyk, Clotilde S. Perbost, Karin M. Fredrikson, Timothy T. Harkins, Sharon Sheridan, Webb Miller, and Stephan C. Schuster Proc. Nat'l. Acad. Sci. (U.S.A.) 105, 8327-8332 (2008).
  • On the origin of distribution patterns of motifs in biological networks. Arun S. Konagurthu and Arthur M. Lesk BMC Systems Biology 2, 73-74 (2008).
  • Single and multiple input modules in regulatory networks. Arun S. Konagurthu and Arthur M. Lesk Proteins: Structure, Function, Bioinformatics. 73, 320-324 (2008).
  • Bioinformatics of protein function. Arthur M. Lesk, Vineet Sangar, Helen Parkinson and James C. Whisstock In: Structural Proteomics and its Impact on the Life Sciences, J.L. Sussman and I. Silman, eds. World Scientific Publishing, Singapore (2008), pp. 79-119.
  • Molecular graphics in structural biology. Arthur M. Lesk, Herbert J. Bernstein and Frances C. Bernstein In: Computational Structural Biology, Methods and applications, M. Peitsch, & T. Schwede, eds. World Scientific Publishing, Singapore (2008), pp. 729-770.
  • Correspondences between low-energy modes in enzymes: Dynamics-based alignment of enzymatic functional families. Andrea Zen, Vincenzo Carnevale, Arthur M. Lesk, and Cristian Micheletti. Protein Science 17, 918-929 (2008).
  • Structural search and retrieval using a tableau representation of protein folding patterns. Arun S. Konagurthu, Peter J. Stuckey, and Arthur M. Lesk. Bioinformatics 24, 645-651 (2008).
  • On the use of overlapping lattices for screening to find pairs of nearby points in two and three dimensions Vineet Sangar, Victor I. Lesk, and Arthur M. Lesk Computational Biology and Chemistry 32, 212-214 (2008).
  • 28-Way vertebrate alignment and conservation track in the UCSC Genome Browser. (26 authors, including Arthur M. Lesk) Genome Res. 17, 1797-1808 (2007).
  • Evolutionary and biomedical insights from the rhesus macaque genome. Rhesus Macaque Genome Sequencing and Analysis Consortium (175 authors, including Arthur M. Lesk) Science 316, 222-234 (2007).
  • Protein structure, classification, and prediction Arthur M. Lesk in: Bioinformatics P.H. Dear, ed. Scion Publishing Ltd, Bloxham, U.K., 2007, pp. 169-194.
  • Quantitative sequence-function relationships in proteins based on Gene Ontology Vineet Sangar, Daniel J. Blankenberg, Naomi Altman and Arthur M. Lesk BMC Bioinformatics 8, 294 (2007).
  • Serpin conformations Mary C. Pearce, Robert N. Pike, Arthur M. Lesk and Stephen P. Bottomley In: Molecular and cellular aspects of the serpinopathies and disorders in serpin activity, G.A. Silverman and D.A. Lomas, eds. World Scientific, Singapore, 2007, pp. 35-66.
  • Serpins in prokaryotes Qingwei Zhang, Ruby Law, Ashley M. Buckle, Lisa Cabrita, Sheena McGowan,  James A. Irving, Noel G. Faux, Arthur M. Lesk, Stephen P. Bottomley and  James C. Whisstock In: Molecular and cellular aspects of the serpinopathies and disorders in serpin activity, G.A. Silverman and D.A. Lomas, eds. World Scientific, Singapore, 2007, pp. 131-162.
  • Contact patterns between helices and strands of sheet define protein folding patterns Akhil Kamat and Arthur M. Lesk Proteins: Structure, Function and Bioinformatics 66, 869-876 (2007).
  • Computational study of the fibril organization of polyglutamine repeats reveals a common motif identified in β-helices David Zanuy, Kannan Gunasekaran, Arthur M. Lesk and Ruth Nussinov  Journal of Molecular Biology 358, 330-345 (2006).
  • MUSTANG: A MUltiple STructural AligNment AlGorithm. Arun S. Konagurthu, Peter J. Stuckey,  James C. Whisstock, and Arthur M. Lesk Proteins: Structure, Function and Bioinformatics 64, 559-574 (2006).
  • What determines the spectrum of protein native state structures? Timothy Lezon, Jayanth R. Banavar, Arthur M. Lesk and Amos Maritan Proteins: Structure, Function and Bioinformatics 63, 273-277 (2006).
  • The evolution of the globins: We thought we understood it. Arthur M. Lesk In: Structural Approaches to Sequence Evolution, U. Bastolla, M. Porto, H.E. Roman, and Vendruscolo, eds. Berlin: Springer-Verlag (2006).