Major topics of my current research include (1) genomics, (2) protein structure, folding and evolution, and (3) the structures and functions of biological networks.
Genomics
Genomes encode the proteins that organisms synthesise, and contain much else besides. By comparing proteins encoded by genomes from related species – for instance mammoth and elephant – it is possible to understand how, in the short term at least, proteins evolve, and to study how changes in amino acid sequence reflect selective pressure or genetic drift. In this work I collaborate with the groups of Profs. W. Miller and S. Schuster.
Although the detection of protein-coding genes in newly sequenced genomes has received much attention, alternative splicing in eukaryotic genes remains a very serious complication. I am examining specific families to try to understand the structural and functional consequences of alternative splicing.
Protein structure, function, and evolution
A major unsolved problem of molecular biology is how the amino acid sequences of proteins dictate their three-dimensional structures, and thereby their functions. For a number of protein families, including the globins (Figure 1), cytochromes c, immunoglobulins, and chymotrypsin-family serine proteases, my colleagues and I have studied changes in amino acid sequence and analyzed how these mutations are reflected in changes in protein structure. In general, evolution keeps at least a large portion of the protein topology, or folding pattern, intact.
On a larger scale, what accounts for the repertoire of protein structures that we see? The difficulty is that the answer is partly physics and partly historical accident. I have developed a “tableau” representation of protein folding patterns that may help to illuminate the range of folding patterns that are possible, and that has applications to protein fold recognition and classification.
Analysis of structures of networks
Systems biology studies biological networks, for instance transcription regulatory networks (Figure 2) and protein-protein interaction networks. Small recurring subnetworks, or “motifs”, are known to be prevalent within regulatory networks. My research is aimed at the next step in the analysis of network structure: how are these small motifs connected to one another? What are the functional implications of larger and larger combinations of subnetworks?
There is some analogy between the studies of proteins and of networks, if both are regarded as formed of hierarchical associations of smaller units. In proteins the hierarchy involves primary structure (amino acid sequence), secondary structure (helices and sheets), tertiary structure (folding pattern of compact units), and quaternary structure (assembly of proteins from separate polypeptide chains). However, globular proteins are constrained by the linear assembly of the polypeptide chain, and by the limit on the number of atoms that can pack closely in space to form a compact structure. The assemblies of networks are, as far as we know, unfettered by any such constraints.
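As a concrete illustration of what identifying a motif involves, the sketch below enumerates one motif that is well documented in transcription regulatory networks, the feed-forward loop, in which a regulator X controls Y and both X and Y control a target Z. The choice of this motif, the function name find_feed_forward_loops, and the gene labels in the example are assumptions made for illustration; this is not the method used in the research described here.

```python
def find_feed_forward_loops(edges):
    """Return sorted (x, y, z) triples with directed edges x->y, y->z and x->z.

    `edges` is an iterable of (regulator, target) pairs. Self-regulation
    edges are ignored, so x, y and z are always three distinct nodes.
    """
    # Build an adjacency map: regulator -> set of direct targets.
    targets = {}
    for src, dst in edges:
        if src != dst:
            targets.setdefault(src, set()).add(dst)

    loops = []
    for x, x_targets in targets.items():
        for y in x_targets:                    # edge x -> y
            for z in targets.get(y, ()):       # edge y -> z
                if z != x and z in x_targets:  # edge x -> z closes the loop
                    loops.append((x, y, z))
    return sorted(loops)


# Hypothetical toy network: geneA regulates geneB and geneC,
# geneB regulates geneC, and geneC regulates geneD.
toy_edges = [
    ("geneA", "geneB"),
    ("geneB", "geneC"),
    ("geneA", "geneC"),
    ("geneC", "geneD"),
]
print(find_feed_forward_loops(toy_edges))
```

Listing the motif instances in this way is only the starting point; the questions posed above concern how such instances share nodes and combine into larger subnetworks.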
Figure 1. Comparison of the folding patterns of sperm whale myoglobin, the first protein for which the structure was determined at high resolution, and a plant globin from yellow lupin. Both contain a similar assembly of helices. Can you spot differences between these structures? Look at the movie version at http://www.bx.psu.edu/~aml2/swm2lh7mv.gif
Figure 2. Simplified sketch illustrating features of a typical segment of the pathways in the yeast regulatory network. Transcriptional regulators appear as circles. Target genes appear as squares. A transcriptional regulator typically has direct influence over about 50 genes, indicated by multiple connections from the filled black circle to the circles on the line below it. Roughly one in 10 of the neighbours of any node is connected to another neighbour, indicated by the horizontal arrow on the second row. The ultimate receptor of the signal lies at the end of a pathway typically containing about five intermediate nodes (shown in black). This ultimate target gene receives on average about two inputs. This diagram shows only a small fragment of a network that is very large and dense. (From Lesk, A.M. Introduction to Genomics. Oxford University Press, 2007.)