Annotated Bibliography

profilePMilan95
GreenbergEtal2008_2022.pdf

Evolutionary Constraint and Adaptation in the Metabolic Network of Drosophila

Anthony J. Greenberg, Sarah R. Stockwell, and Andrew G. Clark Department of Molecular Biology and Genetics, Cornell University

Organisms must carefully control their metabolism in order to survive. On the other hand, enzymes must adapt in response to evolutionary pressures on the pathways in which they are imbedded. Taking advantage of the newly available whole-genome sequences of 12 Drosophila species, we examined how protein function and metabolic network architecture influence rates of enzyme evolution. We found that despite high overall constraint, there were significant differences in rates of amino acid substitution among functional classes of enzymes. This heterogeneity arises because proteins involved in the metabolism of foreign compounds evolve relatively rapidly, whereas enzymes that act in ‘‘core’’ metabolism exhibit much slower rates of amino acid replacement, suggesting strong selective constraint. Network architecture also influences enzymes’ rates of amino acid replacement. In particular, enzymes that share metabolites with many other enzymes are relatively constrained, although apparently not because they are more likely to be essential. Our analyses suggest that this pattern is driven by strong constraint of enzymes acting at branch points in metabolic pathways. We conclude that metabolic network architecture and enzyme function separately affect enzyme evolution rates.

Introduction

Metabolism lies at the core of organismal survival. It is required for fundamental and highly conserved processes, such as energy generation. Yet, changes in metabolic func- tion accompany adaptation to novel environments (e.g., Berenbaum et al. 1996; Jones 2005). The genome sequen- ces of 12 Drosophila species provide an ideal opportunity to tease apart the opposing forces of conservation and adaptability in metabolic evolution.

Enzymes do not act in isolation. They are organized into a network, where enzymes are connected by the com- pounds they metabolize (Jeong et al. 2000; Wagner and Fell 2001; Stelling et al. 2002; Tanaka 2005). This network con- sists of modules, which loosely correspond to the tradition- ally recognized metabolic pathways (Schuster et al. 2000, 2002; Ravasz et al. 2002; Holme et al. 2003). An enzyme’s position in the network helps determine how dramatically a change in its activity will affect flux (rate of metabolite production) through pathways (Kacser and Burns 1973; Stephanopoulos et al. 1998; Stelling et al. 2002). This in- fluence is measured by control coefficients, which are high for enzymes with strong influence over flux (Kacser and Burns 1973). Metabolic control theory predicts that en- zymes at branch points have higher control coefficients than those in linear pathways and that enzymes catalyzing irre- versible reactions will have more influence on flux than re- versible reaction enzymes (Kacser and Burns 1973; Heinrich and Rapoport 1974).

Changes in the activity of enzymes with low control coefficients should be nearly neutral, especially in pathways that perform nonessential functions. Thus, if purifying selection is the dominant force, genes coding for such en- zymes should tolerate more amino acid-changing mutations over the course of evolution than genes coding for enzymes with high control coefficients (Wilson et al. 1977; Rausher et al. 1999). Consistent with this prediction, highly con- nected enzymes evolve slowly in yeast (Vitkup et al.

Key words: metabolic network, molecular evolution, codon substitution.

E-mail: [email protected].

Mol. Biol. Evol. 25(12):2537–2546. 2008 doi:10.1093/molbev/msn205 Advance Access publication September 17, 2008

� The Author 2008. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: [email protected]

2006), although perhaps not in bacteria (Hahn et al. 2004). The relative importance of positive and purifying selection is still unclear, however. Recent estimates for Drosophila suggest that at least a third, and possibly most, amino acid differences between closely related species are the result of the action of positive selection (Smith and Eyre-Walker 2002; Sawyer et al. 2003; Shapiro et al. 2007). Any analysis of the influence of metabolic network architecture on enzyme evolution patterns must therefore discriminate between genes that show signs of adaptive evolution and those that do not.

Other aspects of network architecture may also mod- ulate an enzyme’s influence over metabolic function. The number of connections an enzyme has to other enzymes via shared metabolites (‘‘degree’’), the number of pathways it participates in, the number of reactions it catalyzes, and the extent to which it serves as a conduit of information be- tween modules/pathways (‘‘betweenness’’) may all affect how sensitive an organism’s physiology is to changes in a given enzyme’s activity.

We set out to test comprehensively the effects of enzyme function and metabolic network architecture on enzyme evolution using the newly available genomic se- quence from 12 Drosophila species (Drosophila 12 Genomes Consortium 2007). We found that enzymes involved in metabolizing xenobiotic (foreign) compounds evolve sig- nificantly faster than average at the amino acid level. More- over, almost all enzymes involved in this process also participate in other pathways and significantly affect me- dian evolution rates for those pathways. Of the network ar- chitecture parameters, only enzyme degree is significantly correlated with rates of protein evolution: highly connected enzymes are relatively constrained, regardless of whether the connections are between or within pathways. Such en- zymes are not more likely to be essential, however, and have average rates of adaptive evolution. We conclude that metabolic network architecture has a measurable impact on enzyme evolution rates that is independent of the influence of enzyme function.

Methods

An expanded version of this section is available Sup- plementary Material online.

D ow

nloaded from https://academ

ic.oup.com /m

be/article/25/12/2537/1110609 by guest on 24 D ecem

ber 2021

2538 Greenberg et al.

Genome Sequence and Evolutionary Rate Estimation

We used maximum likelihood estimates of rates of amino acid (dN) and silent (dS) substitutions as well as the rate of amino acid change corrected for silent site diver- gence (x 5 dN/dS) for each gene. These estimates were calculated for six species most closely related to Drosophila melanogaster (D. melanogaster, Drosophila simulans, Drosophila sechellia, Drosophila yakuba, Drosophila erecta, and Drosophila ananassae) by Larracuente et al. (2008), based on the genome assemblies and gene models provided by the 12 genomes sequencing group (Drosophila 12 Genomes Consortium 2007). In addition to parameter estimation, Larracuente et al. (2008) performed tests of pos- itive selection (model M8 against model M7). We classified genes as evolving under positive selection if they satisfied the q value (Storey and Tibshirani 2003) cutoff of 0.1 (i.e., the expected fraction of true positives is 90%). For statis- tical tests, we coded positively selected genes as ‘‘1,’’ and the rest as ‘‘0.’’ To estimate rates of gene duplication, we took advantage of the fuzzy reciprocal BLAST of all D. melanogaster genes against all the remaining 11 genomes (the five mentioned above plus Drosophila pseudoobscura, Drosophila persimilis, Drosophila willistoni, Drosophila mojavensis, Drosophila virilis, and Drosophila grimshawi; Drosophila 12 Genomes Consortium 2007). Some of the genes had homologous clusters within species that could not be assigned an unambiguous ortholog in other species (for details, see Drosophila 12 Genomes Consortium 2007). If an enzyme was coded by any such genes, it was marked as having evidence of duplication and was coded as ‘‘1’’ for statistical tests. Otherwise, it was coded as ‘‘0.’’ Unless other- wise indicated, we considered a gene to be duplicated if there was evidence of multiple copies in any of the 12 species.

Metabolic Network Data

We downloaded data on the D. melanogaster metabolic network and pathway assignments from the KEGG database (Kanehisa and Goto 2000). We extracted information on D. melanogaster genes coding for enzymes and related it to gene information from FlyBase (http://flybase.net) and the data from the 12 genomes project via FBgn numbers. We were able to calculate evolutionary rates for 447 genes (between 23 and 128 genes in each ‘‘pathway group’’) that coded for enzymes found in the KEGG database.

We assume the network is undirected. Although a ma- jority (73%) of the reactions in our data set are effectively irreversible, mechanisms such as feedback regulation let the information travel both ways (Wagner and Fell 2001).

We also collected information on the phenotypes of mutations from FlyBase. Details of the classification of genes based on alleles listed in FlyBase are presented in Larracuente et al. (2008) and online. For partial correlation analysis, we only considered the essential (coded ‘‘1’’) and viable (coded ‘‘0’’) genes.

Statistical Tests

All statistical tests were performed using R (R Devel- opment Core Team 2006). To estimate the effect of each

network parameter on rates of protein evolution, we used partial correlations, which are defined as correlations be- tween pairs of variables calculated conditional on all other parameters (Whittaker 1990). To estimate the partial corre- lations, we calculated the pseudoinverse of correlation ma- trices, as implemented in the R package corpcor (Schäfer et al. 2006). Distributions of all the variables are highly ir- regular. Therefore, we used nonparametric and permutation tests to estimate P values (Davison and Hinkley 1997). For permutations, we used the boot package (Canty and Ripley 2006). All P values were two-tailed, except for the analyses of variance (ANOVAs) and Kruskal–Wallis tests, where they were right tailed. We corrected for multiple tests by controlling the false discovery rate (FDR; Benjamini and Hochberg 1995) at 5% unless noted otherwise.

A number of enzymes belong to several pathway groups and thus statistics calculated for each category are not independent. To calculate the P values in such cases (e.g., for the deviation of median x for each functional group from the data set-wide median), we modified our permutation test as follows. We randomly assigned the x values to each gene. We then recalculated the deviation for each group, keeping the relationship between genes and pathway groups constant for every permutation. Similarly, some enzymes are encoded by multiple genes either be- cause they are composed of multiple subunits or due to gene duplication. Because some variables (such as x) are as- signed to genes, whereas others (such as degree) pertain to enzymes, we had to modify the standard permutation tests to account for this (for details, see supplementary methods, Supplementary Material online). We did 9,999 permutations for all tests.

Results Genes Coding for Enzymes Evolve Slowly

Metabolic function is essential for survival, and many metabolic processes are highly conserved from bacteria to mammals. We thus wanted to test whether genes coding for enzymes are more constrained than other genes. To accom- plish this, we compared the distributions of the number of amino acid changes per silent substitution (x 5 dN/dS, estimated for the whole phylogeny of six species of the D. melanogaster family, see Methods; Larracuente et al. 2008) for the two groups of genes. Low values of x indicate that few amino acid-altering mutations have been fixed dur- ing evolution, compared with the number of silent (non– amino acid changing and thus nearly neutral) mutations fixed in the same gene. A gene with a low value of x has been constrained from changing its amino acid se- quence and thus codes for a protein that is not free to evolve.

Enzymes are indeed relatively constrained: the median x of enzymes is 0.045, whereas it is 0.066 for nonenzymes (Wilcoxon test P 5 5.7 � 10�24). However, genes in- volved in metabolic function are also slightly more likely to be essential (17% vs. 12% for the rest; Fisher’s exact test P 5 0.0157). We wanted to know whether enzymes are constrained simply because they are more essential. We di- vided genes into four classes: ‘‘essential,’’ ‘‘viable,’’ ‘‘no information,’’ and ‘‘no alleles’’ (for details, see Methods

D ow

nloaded from https://academ

ic.oup.com /m

be/article/25/12/2537/1110609 by guest on 24 D ecem

ber 2021

Evolution in the Metabolic Network 2539

FIG. 1.—Metabolic genes are more constrained than average.

and Larracuente et al. [2008]). We then compared enzyme- coding to nonenzyme-coding genes within each class. We found that metabolic genes were still more constrained than other genes (fig. 1). The ‘‘enzyme’’ effect was highly sig- nificant (two-way ANOVA: F 5 67.0, degree of freedom [df] 5 1, P 5 3.1 � 10�16; permutation P � 0.0001), whereas no interaction between the ‘‘enzyme’’ and ‘‘essen- tiality’’ terms was detectable (F 5 1.3, df 5 3, P 5 0.27). Furthermore, the difference between enzymes and nonen- zymes within each class was highly significant (Wilcoxon test P values ranging from 5.4 � 10�9 to 7.5 � 10�5).

Smaller x can result from either a decrease in amino acid changes (dN) or an increase in dS. In this case, it is due to a decrease in dN: median dN for enzymes is 0.077 versus 0.114 for nonenzymes (Wilcoxon test P 5 3.5 � 10�20), whereas the values for dS were 1.796 and 1.777 (Wilcoxon test P 5 0.36).

The decision to classify a gene as coding for a partic- ular enzyme is at least partially based on its homology to enzyme-coding genes in other species. Apparent high con- straint of enzymes could thus be due to ascertainment bias. To control for this, we repeated the analysis with the subset of Drosophila genes that have a human homolog (Blast E value � 10�10), thereby eliminating from the nonenzyme group the fast-evolving genes that could inflate the group’s score. We still found that enzymes are more constrained than nonenzymes. Median x for enzyme-encoding genes was 0.043, in contrast with 0.052 for the rest of the genes (Wilcoxon test P 5 9.9 � 10�5; two-way ANOVA en- zyme effect F 5 15.4, df 5 1, P 5 9.0 � 10�5; permuta- tion P 5 0.0005). We conclude that ascertainment bias is unlikely to fully explain our observations.

Xenobiotic-Detoxifying Enzymes Evolve Relatively Quickly

Despite overall high levels of constraint, enzyme- encoding genes vary considerably in their ability to accom- modate amino acid substitutions (x ranges from 0.0001 to 0.2833). To test if enzyme function affects evolutionary rates, we grouped enzymes into 11 functional categories (‘‘pathway groups’’) according to the KEGG classification (Kanehisa and Goto 2000). Each category encompasses a number of pathways with similar functions. Because some pathways contain few enzymes and assignment of enzymes to functionally related pathways is prone to error, grouping pathways should increase statistical power and reduce an- notation errors.

We compared median rates of amino acid change of genes in different pathway groups. As expected (fig. 2), me- dian x for enzymes involved in metabolism of xenobiotic compounds is the highest of all the groups. This is true for the phylogeny-wide estimate of dN/dS as well as for individual

D ow

nloaded from https://academ

ic.oup.com /m

be/article/25/12/2537/1110609 by guest on 24 D ecem

ber 2021

FIG. 2.—Rates of amino acid substitution for metabolic pathway groups. Pathway groups with nominally significant deviations of median x from the overall median (dashed line) are colored in gray. Pathway groups are as follows: 1, protein synthesis; 2, amino acid metabolism; 3, glycan biosynthesis; 4, nucleotide metabolism; 5, carbohydrate metabolism; 6, metabolism of cofactors and vitamins; 7, energy metabolism; 8, metabolism of other amino acids; 9, lipid metabolism; 10, secondary metabolites; and 11, metabolism of xenobiotics.

2540 Greenberg et al.

species (see supplementary fig. S1, Supplementary Material online).

Because about one-third of all enzymes belong to more than one category, the distributions of x for each group are not independent, and normal statistical tests, such as Kruskal–Wallis, are not applicable. We therefore devel- oped a permutation test that accounts for such nonindepen- dence (see Methods). Using this approach, we determined that amino acid substitution rate heterogeneity among path- way groups is indeed significant (Kruskal–Wallis v 2 5 22.3, permutation P 5 0.0078). This is at least partly due to rel- atively fast evolution of xenobiotic detoxification genes (median x 5 0.05 for the xenobiotic group vs. x 5 0.04 overall, two-tailed permutation P 5 0.0110), and these re- sults are robust to moderate levels of random annotation error (see Methods; supplementary fig. S2, Supplementary Material online). This prompted us to further investigate the effect of this group.

Almost all the proteins in the fast-evolving xenobiotics pathway group act in other processes. How strongly are other groups’ median amino acid substitution rates influ- enced by genes that detoxify foreign compounds? We found that there is a strong correlation between the median x for a pathway group and the fraction of the category’s genes that it shares with the xenobiotics group (Spearman’s q 5 0.76, P 5 0.0108). When we excluded the xenobiot- ics group from consideration but retained its genes in other pathways, we still saw significant heterogeneity in x among pathway groups (Kruskal–Wallis v 2 5 17.1654, permuta- tion P 5 0.0045). In contrast, when we eliminated all the genes that belong to this category, we found no significant differences among groups (Kruskal–Wallis v 2 5 11.0, per- mutation P 5 0.2167). Heterogeneity in amino acid substi- tution rates among pathway groups is thus entirely due to the genes that detoxify foreign compounds.

These observations suggest that interdependence among pathways may hinder optimization of metabolic function achievable through amino acid change. One way to alleviate this problem would be through gene dupli- cation (Lynch and Force 2000). We see some evidence that this occurs. No pathway group significantly deviates from the data set average in the fraction of its enzymes coded for by genes duplicated in at least one of the 12 Drosophila genomes we analyzed (for details, see Methods; fig. 3B). Nevertheless, there is a significant positive correlation be- tween the fraction of enzymes in a pathway group that be- long to more than one category and the fraction encoded by duplicated genes (Spearman’s q 5 0.673, P 5 0.0268). This result suggests that although some paralogs may act only in one pathway, all members of a cluster are annotated as performing each function.

Network Parameters and Constraint

The results presented so far suggest that enzyme func- tion affects how quickly enzyme-coding genes evolve. We wished to know if enzyme evolution is also influenced by metabolic network architecture. To investigate this question comprehensively, we examined several characteristics of the metabolic network topology. In this model of the net-

FIG. 3.—Rates of nonindependence and gene duplication by pathway groups. Significant deviations (after FDR correction) from data set-wide fractions (dashed lines) are marked with gray. (A) Fraction of enzymes that act in multiple pathway group. (B) Fraction of enzymes that are encoded by duplicated genes. Duplicated genes are those that were not resolved by the fuzzy reciprocal Blast into orthologous sets.

work, each node represents an enzyme, and the nodes are connected by the metabolites with which the enzymes in- teract (fig. 4B). This ‘‘enzyme-centered’’ representation ensures that each enzyme appears only once in the network, although metabolites may appear multiple times. We calcu- lated partial correlation coefficients to estimate the associ- ations between each pair of network parameters, while controlling for all others. Although this approach has some shortcomings (Drummond et al. 2006), it is the only method that allows us to determine relationships among all the var- iables (for discussion, see supplementary methods, Supple- mentary Material online). We used a permutation test to estimate all P values (see supplementary methods, Supple- mentary Material online) and corrected for multiple tests by controlling the FDR (Benjamini and Hochberg 1995) at 5%.

Reversibility of Reactions

Metabolic control theory (Kacser and Burns 1973; Heinrich and Rapoport 1974) suggests that enzymes cata- lyzing reversible reactions should exert little control over flux. We might therefore expect genes coding for such

D ow

nloaded from https://academ

ic.oup.com /m

be/article/25/12/2537/1110609 by guest on 24 D ecem

ber 2021

Evolution in the Metabolic Network 2541

FIG. 4.—Illustration of the network representations we used in this study. (A) A connectivity pattern showing both enzymes and metabolites. (B) The enzyme-centered network derived from (A). To analyze the enzymes’ topological parameters and relate them to x, we redraw the network so that nodes represent enzymes. An edge between enzymes indicates that they share a metabolite in (A). Each enzyme appears only once. Note that high betweenness does not necessarily imply high degree or vice versa. Enzyme E4#s role as the sole conduit between the branching parts of the network gives it high betweenness, but its degree is only 2. (C) The metabolite-centered network derived from (A). To measure the mean metabolite degree of each enzyme, we transform the network in (A) so that nodes represent metabolites. Each metabolite appears only once in this version of the network, and its degree is the number of other metabolites from which it is only one reaction away. (D and E) Enzymes with the same mean metabolite degree but different distributions of connections.

enzymes to be less essential and hence more permissive of amino acid changes. In our correlation framework, this would produce a negative relationship between essentiality and reversibility and a positive one between x and revers- ibility. We do indeed see a negative correlation between es- sentiality and reversibility (Spearman’s partial q 5 �0.212, permutation P 5 0.0104; fig. 5B). However, this does not translate into a relationship between x and reversibility (Spearman’s partial q 5 �0.077, permutation P 5 0.2704).

Measures of Pleiotropy: Number of Pathways and Reactions Catalyzed

Enzymes that participate in many pathways or catalyze many reactions may be under increased constraint because any change could affect several metabolic functions. We first examined the number of pathways in which a gene’s product participates and found no correlation with x (Spearman’s partial q 5 0.018, permutation P 5 0.7840). Xenobiotic detoxification enzymes could obscure a correla- tion because they are especially likely to appear in many pathways and also tend to have high values of x. To test this, we repeated our analysis without these enzymes but

obtained the same result (Spearman’s partial q 5 0.026, permutation P 5 0.7360).

We then examined the effect of the number of reac- tions an enzyme catalyzes. We saw a slight positive corre- lation between the number of reactions catalyzed and x, but it was not quite significant even without a multiple test correction (Spearman’s partial q 5 0.126, permutation P 5 0.0560; fig. 5A). Our measures of pleiotropy thus do not appear to influence levels of evolutionary constraint.

Network Topology Measures: Betweenness and Degree

Engineering principles suggest that evolving modules would change internally but maintain consistent interfaces with the rest of the network so that communication between modules would be uninterrupted (Csete and Doyle 2002). If this is the case for the metabolic network, we should expect enzymes that form between-module bridges to be under in- creased constraint. Such nodes are considered to have high betweenness (Freeman 1977; Girvan and Newman 2002). The shortest-path betweenness (Newman and Girvan 2004) of an enzyme i is defined as the number of the shortest paths between all other pairs of enzymes in the network that pass

D ow

nloaded from https://academ

ic.oup.com /m

be/article/25/12/2537/1110609 by guest on 24 D ecem

ber 2021

2542 Greenberg et al.

FIG. 5.—Graphs of partial correlations among network parameters and x 5 dN/dS (A), essentiality (B), and duplication rate (C). Blue lines indicate positive correlations, red—negative. Bold lines denote relationships significant at 5% FDR; thin lines—those with nominal significance at 5%, but not after multiple test correction; dashed lines—correlations with 5% , P � 10%. Numbers over the lines show the partial correlation coefficients with permutation P values in parentheses. Partial correlations between x and either enzyme or mean compound degree shown in (A) are only significant when one and not the other of these variables is present in the analysis. This is indicated by the double arrow.

through i. High-betweenness enzymes play an important role in the flow of biomass through the network because they often act as ‘‘bottlenecks’’ (Girvan and Newman 2002; Wunderlich and Mirny 2006; Liu et al. 2007; Yu et al. 2007).

Enzymes that connect pathways do have relatively high betweenness (mean betweenness for pathway-connecting en- zymes is 1027.1, whereas for enzymes with no between- pathway connections, it is 290.2; Wilcoxon test P 5 1.2 � 10�7), supporting the claims that the traditionally recognized pathways appear to correspond to modules in the metabolic network (Schuster et al. 2000, 2002; Ravasz et al. 2002; Holme et al. 2003). However, we see no cor- relation between x and betweenness (Spearman’s partial q 5 0.028, permutation P 5 0.6974; fig. 5A).

Another indication of an enzyme’s importance in the network topology is its degree, defined as the number of other enzymes that share an edge with it. In the enzyme-

centered network used here, where edges represent metab- olites, an enzyme’s degree is the number of other enzymes to which it is connected through common metabolites (fig. 4B). Note that this is not necessarily equal to the num- ber of reactions catalyzed (see supplementary methods, Supplementary Material online). We found a significant negative correlation between x and enzyme degree (fig. 5A) after correcting for multiple tests (Spearman’s par- tial q 5 �0.240, permutation P 5 0.0020). This result ap- pears to be consistent across multiple Drosophila species and is robust to alternate statistical methods (see supplemen- tary fig. S1 and methods, Supplementary Material online), and the correlation persists if we exclude xenobiotic- detoxifying enzymes from the data set (Spearman’s partial q 5 �0.212, permutation P 5 0.0070).

A potential confounding factor in interpreting this re- sult is gene expression. Highly expressed genes are

D ow

nloaded from https://academ

ic.oup.com /m

be/article/25/12/2537/1110609 by guest on 24 D ecem

ber 2021

Evolution in the Metabolic Network 2543

relatively constrained, and relationships between x and various variables often disappear once expression levels are accounted for (Drummond et al. 2006 and Larracuente et al. 2008). As a measure of gene expression, we used a principal component that includes whole-fly mRNA levels from FlyAtlas (Chintapalli et al. 2007) and a measure of codon bias (frequency of preferred codons; for details, see Larracuente et al. [2008]). The association between degree and x remained unchanged (Spearman’s partial q 5 �0.210, permutation P 5 0.0100).

The relationship between x and degree is due to a def- icit of amino acid changes in high-degree enzymes: the cor- relation of dN with degree is significant (Spearman’s partial q 5 �0.235, permutation P 5 0.0034), whereas that for dS is not (q 5 �0.041, permutation P 5 0.5542). This result also strongly indicates that selection on silent sites does not compromise the estimates of x enough to affect our results.

In protein–protein interaction networks, highly con- nected proteins are relatively constrained only if the con- nections are inside modules (Fraser 2005). If the same pattern holds for metabolic networks, we would expect ex- cess constraint only for enzymes with many connections within their own pathways. We therefore partitioned the connections into within-pathway and between-pathway links and repeated the partial correlation analysis to assess their independent association with x. We found that both kinds of connections constrain enzyme evolution: Spearman’s partial q 5 �0.182 (permutation P 5 0.0090) for connections within pathways and q 5 �0.184 (permu- tation P 5 0.0100) for links between them.

Why are highly connected enzymes relatively con- strained? It is not because of their central position in the network, because betweenness is a measure of centrality (Freeman 1977) and we find that it has no measurable effect on constraint. Perhaps, high-degree enzymes interact with a few metabolites that are each involved in many reactions (branch point metabolites), providing many neighbors in the enzyme-centered network. Alternatively, a high-degree enzyme may interact with many metabolites that participate in a few reactions each. To distinguish between these topol- ogies, consider a network representation where nodes are metabolites instead of enzymes (and edges represent enzymes/reactions) (fig. 4C). Here, nodes with many edges (high ‘‘metabolite degree’’) are branch point metabolites, and the enzymes on those edges are branch point enzymes. We can calculate the degree of each metabolite in this ‘‘metabolite-centered’’ network and average the degrees of all substrates and products for a given enzyme. We call this measure ‘‘mean metabolite degree,’’ and it should be high for branch point enzymes.

If we substitute mean metabolite degree for enzyme degree in our partial correlation analysis, the relationship with x remains essentially the same (Spearman’s partial q 5 �0.224, permutation P 5 0.0020, fig. 5A). This pat- tern persists if products or substrates are considered sepa- rately, and if the skewness (illustrated in fig. 4D and E) of the metabolite degree distribution for each enzyme is ac- counted for (see ‘‘Network statistics’’ in supplementary methods, Supplementary Material online; examples of skewed topologies for enzymes with high mean metabolite degree can be found in supplementary fig. S5A and B, Sup-

plementary Material online). Enzyme degree and mean me- tabolite degree are highly correlated (Spearman’s partial q 5 0.749, permutation P � 0.0002), although only one protein is among the top five enzymes for both measures (supplementary figs. S4 and S5, Supplementary Material online). After including both the enzymes’ number of con- nections and their mean metabolite degree in the analysis, we see no independent relationship between dN/dS and ei- ther variable (Spearman’s partial q 5 �0.108, permutation P 5 0.1376 between x and degree; Spearman’s partial q 5 �0.066, permutation P 5 0.3584 between x and mean metabolite degree), suggesting that x is correlated with a parameter that is equally well described by either measure. To test this, we constructed principal components (Jolliffe 1986) from mean metabolite degree and enzyme degree and used them in our partial correlation analysis. As expected, we find that the PC that includes positive con- tributions from both parameters is negatively correlated with dN/dS (Spearman’s partial q 5 �0.164, permutation P 5 0.0356), whereas the PC that represents a contrast be- tween mean metabolite degree and enzyme degree is not (Spearman’s partial q 5 0.030, permutation P 5 0.6800). We conclude that high-degree enzymes are those interact- ing with high-degree metabolites and thus are the branch point enzymes predicted to have high control coefficients (Kacser and Burns 1973).

Relatively high constraint of enzymes with many con- nections should reflect their importance to network func- tion. Therefore, one might expect that such enzymes are more likely than average to be encoded by essential genes. This does not appear to be the case (fig 5B). Although es- sentiality is negatively correlated with reversibility, there is no significant relationship with either enzyme degree (Spearman’s partial q 5 0.129, permutation P 5 0.1198) or mean metabolite degree (Spearman’s partial q 5 �0.066, permutation P 5 0.4322; the same results are obtained if each measure is analyzed separately). This result could be explained if genes coding for highly connected en- zymes were preferentially duplicated, and duplicated genes were in turn less likely to be essential due to func- tional redundancy. Neither proposition appears to be true for this data set. Enzyme-coding genes with duplicates in D. melanogaster (for details, see Methods) are, if any- thing, slightly more likely to be essential than nondupli- cated genes (50.0% of duplicated genes are essential vs. 47.9% of nonduplicated genes, Fisher’s exact test P 5 1). Moreover, highly connected enzymes are less likely to be duplicated than average (fig 5C; although the results shown are for duplications in any of the species, they are virtually identical if only D. melanogaster duplications are considered).

Enzymes under Positive Selection

To identify enzymes that have evolved under positive selection in the Drosophila melanogaster group, we used the likelihood ratio test of the codeml model M8 versus M7 (see Methods; Yang 1997; Larracuente et al. 2008). Be- cause only a small fraction of all genes examined shows signs of positive selection, we used the relatively liberal

D ow

nloaded from https://academ

ic.oup.com /m

be/article/25/12/2537/1110609 by guest on 24 D ecem

ber 2021

2544 Greenberg et al.

q value cutoff of 0.1 to classify genes as adaptively evolv- ing. Choosing this threshold means that an expected 10% of the genes are false positives.

We identified 35 metabolic genes that appear to be evolving adaptively (supplementary table S1, Supplemen- tary Material online). As expected, the list includes genes coding for glutathione-S-tranferase and cytochrome P450, which are known to evolve rapidly (Low et al. 2007; Drosophila 12 Genomes Consortium 2007). Interestingly, we also found a number of enzymes implicated in female germ line development.

Overall, despite their relative constraint, enzymes are only slightly (and not significantly) less likely to evolve un- der positive selection than nonenzymes (7.8% and 10.5%, respectively, Fisher’s exact test P 5 0.0787). All pathway groups include similar fractions of adaptively evolving genes (v 2 5 7.16, permutation P 5 0.5726). For example, despite its relatively high rate of amino acid substitution, the xenobiotic detoxification pathway group only ranks fourth in the fraction of genes under positive selection, be- hind groups involved in metabolism of nucleotides, cofac- tors and vitamins, and carbohydrates. Because several pathway groups contain enzymes that also detoxify foreign compounds, the uniformity in fractions of adaptively evolv- ing genes could arise because the same positively selected genes contribute to several groups. To test this possibility, we eliminated xenobiotic-detoxifying genes from all other pathways. After this manipulation, we still saw no signif- icant heterogeneity in the fractions of adaptively evolving genes among pathway groups (v 2 5 5.8, permutation P 5 0.7856). Furthermore, the correlation between the fraction of genes a group has in common with the xenobi- otics group and the fraction under positive selection is not significant (Spearman’s q 5 0.28, P 5 0.4339).

The probability of positive selection does not correlate significantly with any of the network parameters. In partic- ular, although high-degree enzymes exhibit relatively low amino acid substitution rates, they are not less likely to be under positive selection (Spearman’s partial q 5 �0.092, permutation P 5 0.0810 for degree; Spearman’s partial q 5 0.013, permutation P 5 0.8116 for mean metabolite degree). Moreover, enzymes encoded by adaptively evolv- ing genes have no fewer connections, on average, than other enzymes (e.g., mean degree: 7.59 and 7.56 connections, re- spectively, Wilcoxon test P 5 0.86).

Discussion

In this study, we investigated the effects of network architecture on rates of amino acid substitution, duplication, and adaptation. We found that despite overall high conser- vation, enzymes vary appreciably in their evolutionary rates. This variation is shaped both by enzymes’ functional properties (xenobiotic-detoxifying enzymes evolve rela- tively rapidly) and by their position in the network (highly connected enzymes are more constrained). Although the variables we examined explain only a small fraction of the variance in protein evolution rates, the results show that even with the noisy data at our disposal, we can detect the effects of metabolic network architecture on enzymes’ rates of amino acid change.

We find that a large proportion of enzymes (36%) act in more than one pathway group. This raises the possibility that modifying one pathway during evolution could lead to changes in other metabolic functions, limiting the precision with which a species can adapt to the array of challenges facing it. Indeed, all the heterogeneity in amino acid sub- stitution rates among pathway groups is due to the presence of genes that are also responsible for detoxification of for- eign compounds. This interference could, in principle, be resolved through gene duplication with subsequent sub- functionalization (Lynch and Force 2000). Indeed, we see some evidence that this is happening: the proportion of a pathway group’s enzymes that it shares with other groups correlates strongly with the proportion of its genes that have undergone duplication in at least one of the 12 Drosophila species.

It is unclear how effectively subfunctionalization of duplicate genes frees pathways from interfering with each others’ evolution in our data set. When an enzyme-coding gene has been duplicated and the copies have acquired new functions, the duplicates still code for the same enzyme, and each may be mistakenly annotated as belonging to several pathways. For example, in our data set, each cytochrome P450 is listed as belonging to four pathway groups, al- though some of these genes may perform specialized tasks in development (Warren et al. 2002). Such misannotation would hamper our ability to detect subfunctionalization. Although random annotation errors do not appear to influ- ence our conclusions (supplementary fig. S2, Supplemen- tary Material online), only experimental analyses can eliminate more systematic biases.

Whatever their influence on the other pathways, it is clear that xenobiotic detoxification genes evolve faster than those in any other group. Are novel foreign compounds driving adaptive amino acid changes and causing this pat- tern? With the caveat that our methods may not have ade- quate power to detect positive selection, the answer appears to be, ‘‘No.’’ First, at 0.054, median dN/dS for this group is well below 1 (neutrality); second, the proportion of adap- tively evolving genes in the xenobiotics group is not signif- icantly above the data set average; third, removing the adaptively evolving genes from consideration does not change the rank of the xenobiotics group (median x actu- ally slightly increases to 0.055). Instead, it appears that genes involved in detoxifying foreign compounds are under reduced constraint. This could be because they are less crit- ical to survival than the core metabolism is. However, it is also possible that because most organisms face a wide va- riety of toxic compounds in nature, amino acid variants that are deleterious in one context can be neutral or beneficial in another (Berenbaum et al. 1996). Purifying selection would be less effective in such a scenario, leading to elevated rates of amino acid substitution.

Purifying selection seems to act more strongly on genes coding for highly connected enzymes, which is con- sistent with previous observations in yeast (Vitkup et al. 2006). Investigating this further, we find that the correlation is actually driven by a parameter that is best described by the intersection of the degree of an enzyme and the mean degree of its products and substrates. We propose that en- zymes with high values of this parameter act at branch

D ow

nloaded from https://academ

ic.oup.com /m

be/article/25/12/2537/1110609 by guest on 24 D ecem

ber 2021

Evolution in the Metabolic Network 2545

points. Metabolic control theory predicts that such enzymes should have relatively high control over flux in metabolic pathways (Kacser and Burns 1973), so changing their amino acid sequence may cause large phenotypic effects. Our results do indeed indicate that branch point enzymes are more constrained than average. This is consistent with earlier analyses of patterns of polymorphism at enzyme loci (Eanes 1999; Flowers et al. 2007), which found evidence of purifying selection primarily in branch point enzymes. However, we find that such enzymes are not more likely to be coded for by essential genes. Metabolic control theory also predicts high control coefficients for enzymes catalyz- ing irreversible reactions. In our data, such enzymes are more likely to be encoded by essential genes, although this does not appear to translate into stronger evolutionary con- straint. This distinction between essentiality and evolution- ary constraint is intriguing, but given the noisy data, it may be due to an ascertainment or statistical artifact.

Despite its impact on purifying selection, enzyme de- gree does not appear to affect rates of adaptive evolution (although, because we have estimates of degree for only 18 adaptively evolving genes, we may not have the power to detect its effect) nor does it influence the rate of gene duplication. High mean metabolite degree does seem to dampen the rate of gene duplication, although the effect is only marginally significant.

Studies of protein interaction networks also found that proteins with high degree are relatively constrained in amino acid sequence evolution (Fraser et al. 2002; Hahn et al. 2004; Fraser 2005; Hahn and Kern 2005). However, the underlying reason for this association is probably quite different (Fraser et al. 2002; Vitkup et al. 2006; Yu et al. 2007): High-degree proteins devote a larger proportion of their amino acid sequence to protein-binding sites, which are often under constraint.

Our finding that betweenness is uncorrelated with con- straint was somewhat surprising, given that it is a significant predictor of the probability that an enzyme is present in a large number of distantly related species (Guimerà and Nunes Amaral 2005; Liu et al. 2007). Perhaps, amino acid substitution rates among closely related species and gene retention across a deep phylogeny are governed by different forces. We also find no correlation between gene essential- ity and betweenness (Spearman’s partial q 5 �0.065, per- mutation P 5 0.4498). This is consistent with recent results from yeast (see supplementary table S2 [Supplementary Material online] of Yu et al. [2007]).

In several protein–protein interaction networks, the opposite patterns appear: betweenness is negatively corre- lated with amino acid substitution rates (Hahn and Kern 2005) and positively associated with essentiality (Yu et al. 2007). Thus, despite some striking similarities in structure and its effect on evolution among different networks, details of their construction are likely to play an important role and should not be overlooked (Doyle et al. 2005).

Taking advantage of whole-genome sequencing and published metabolic network information, we dissected the influences of network architecture and individual pro- tein function on rates of genic evolution. As the data be- come more refined through experimental analyses, this approach will increase in power and specificity.

Supplementary Material

Supplementary methods, figures S1–S5, and tables S1 and S2 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments

We thank A. Larracuente and T. Sackton for the PAML analysis results and K. Montooth and L. Harshman for helpful discussion. The associate editor and two anon- ymous reviewers provided helpful comments that improved the manuscript. This research was supported by a National Institutes of Health grant to A.G.C. and a National Science Foundation Graduate Research Fellowship to S.R.S.

Literature Cited

Benjamini Y, Hochberg Y. 1995. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B Stat Methodol. 57:289–300.

Berenbaum MR, Favret C, Schuler MA. 1996. On defining ‘‘key innovations’’ in an adaptive radiation: cytochrome P450s and Papilionidae. Am Nat. 148:S139–S155.

Canty A, Ripley B. 2006. R-boot: bootstrap R (S-Plus) Functions. Version 1.2–34.

Chintapalli VR, Wang J, Dow JAT. 2007. Using FlyAtlas to identify better Drosophila melanogaster models of human disease. Nat Genet. 39:715–720.

Csete ME, Doyle JC. 2002. Reverse engineering of biological complexity. Science. 295:1664–1669.

Davison AC, Hinkley DV. 1997. Bootstrap methods and their application. Cambridge: Cambridge University Press.

Doyle JC, Alderson DL, Li L, Low S, Roughan M, Shalunov S, Tanaka R, Willinger W. 2005. The ‘‘robust yet fragile’’ nature of the Internet. Proc Natl Acad Sci USA. 102:14497–14502.

Drosophila 12 Genomes Consortium. 2007. Evolution of genes and genomes on the Drosophila phylogeny. Nature. 450:203–218.

Drummond DA, Raval A, Wilke CO. 2006. A single determinant dominates the rate of yeast protein evolution. Mol Biol Evol. 23:327–337.

Eanes WF. 1999. Analysis of selection on enzyme polymor- phisms. Annu Rev Ecol Syst. 30:301–326.

Flowers J, Sezgin E, Kumagai S, Duvernell D, Matzkin L, Schmidt P, Eanes W. 2007. Adaptive evolution of metabolic pathways in Drosophila. Mol Biol Evol. 24:1347–1354.

Fraser HB. 2005. Modularity and evolutionary constraint on proteins. Nat Genet. 37:351–352.

Fraser HB, Hirsh AE, Steinmetz LM, Scharfe C, Feldman MW. 2002. Evolutionary rate in the protein interaction network. Science. 296:750–752.

Freeman LC. 1977. A set of measures of centrality based on betweenness. Sociometry. 40:35–41.

Girvan M, Newman MEJ. 2002. Community structure in social and biological networks. Proc Natl Acad Sci USA. 99: 7821–7826.

Guimerà R, Nunes Amaral LA. 2005. Functional cartography of complex metabolic networks. Nature. 433:895–900.

Hahn MW, Conant GC, Wagner A. 2004. Molecular evolution in large genetic networks: does connectivity equal constraint? J Mol Evol. 58:203–211.

Hahn MW, Kern AD. 2005. Comparative genomics of centrality and essentiality in three eukaryotic protein-interaction networks. Mol Biol Evol. 22:803–806.

D ow

nloaded from https://academ

ic.oup.com /m

be/article/25/12/2537/1110609 by guest on 24 D ecem

ber 2021

2546 Greenberg et al.

Heinrich R, Rapoport TA. 1974. A linear steady-state treatment of enzymatic chains. General properties, control and effector strength. Eur J Biochem. 42:89–95.

Holme P, Huss M, Jeong H. 2003. Subnetwork hierarchies of biochemical pathways. Bioinformatics. 19:532–538.

Jeong H, Tombor B, Albert R, Oltvai ZN, Barabási A-L. 2000. The large-scale organization of metabolic networks. Nature. 407:651–654.

Jolliffe IT. 1986. Principal component analysis. New York: Springer.

Jones CD. 2005. The genetics of adaptation in Drosophila sechellia. Genetica. 123:137–145.

Kacser H, Burns JA. 1973. The control of flux. Symp Soc Exp Biol. 27:65–104.

Kanehisa M, Goto S. 2000. KEGG: kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 28:27–30.

Larracuente AM, Sackton TB, Greenberg AJ, Wong A, Singh ND, Sturgill D, Zhang Y, Oliver B, Clark AG. 2008. Evolution of protein-coding genes in Drosophila. Trends Genet. 24:114–123.

Liu W-C, Lin W-H, Davis A, Jordán F, Yang H-T, Hwang M-J. 2007. A network perspective on the topological importance of enzymes and their phylogenetic conservation. BMC Bio- informatics. 8:121.

Low WY, Ng HL, Morton CJ, Parker MW, Batterham P, Robin C. 2007. Positive selection of glutathione s-transferases in the genus Drosophila. Genetics. 177:1363–1375.

Lynch M, Force A. 2000. The probability of duplicate gene preservation by subfunctionalization. Genetics. 154:459–473.

Newman MEJ, Girvan M. 2004. Finding and evaluating community structure in networks. Phys Rev E Stat Nonlin Soft Matter Phys. 69(2):026113.

Rausher M, Miller R, Tiffin P. 1999. Patterns of evolutionary rate variation among genes of the anthocyanin biosynthetic pathway. Mol Biol Evol. 16:266–274.

R Development Core Team. 2006. R: a language and environment for statistical computing. Vienna (Austria): R Foundation for Statistical Computing.

Ravasz E, Somera AL, Mongru DA, Oltvai ZN, Barabási A-L. 2002. Hierarchical organization of modularity in metabolic networks. Science. 297:1551–1555.

Sawyer SA, Kulathinal RJ, Bustamante CD, Hartl DL. 2003. Bayesian analysis suggests that most amino acid replacements in Drosophila are driven by positive selection. J Mol Evol. 57:S154–S164.

Schäfer J, Opgen-Rhein R, Strimmer K. 2006. corpcor: efficient estimation of covariance and (partial) correlation [Internet]. Available from: http://www.strimmerlab.org/software/corpcor/.

Schuster S, Fell D, Dandekar T. 2000. A general definition of metabolic pathways useful for systematic organization and

analysis of complex metabolic networks. Nat Biotechnol. 18:326–332.

Schuster S, Pfeiffer T, Moldenhauer F, Koch I, Dandekar T. 2002. Exploring the pathway structure of metabolism: decomposition into subnetworks and application to Myco- plasma pneumoniae. Bioinformatics. 18:351–361.

Shapiro JA, Huang W, Zhang C, et al. (12 co-authors). 2007. Adaptive genic evolution in the Drosophila genomes. Proc Natl Acad Sci USA. 104:2271–2276.

Smith NGC, Eyre-Walker A. 2002. Adaptive protein evolution in Drosophila. Nature. 415:1022–1024.

Stelling J, Klamt S, Bettenbrock K, Schuster S, Gilles ED. 2002. Metabolic network structure determines key aspects of functionality and regulation. Nature. 420:190–193.

Stephanopoulos GN, Aristidou AA, Nielsen J. 1998. Metabolic engineering. London: Academic Press.

Storey JD, Tibshirani R. 2003. Statistical significance for genome- wide studies. Proc Natl Acad Sci USA. 100:9440–9445.

Tanaka R. 2005. Scale-rich metabolic networks. Phys Rev Lett. 94(16):168101.

Vitkup D, Kharchenko P, Wagner A. 2006. Influence of metabolic network structure and function on enzyme evolution. Genome Biol. 7:R39.

Wagner A, Fell DA. 2001. The small world inside large metabolic networks. Proc R Soc Lond B Biol Sci. 268: 1803–1810.

Warren JT, Petryk A, Marqués G, Jarcho M, Parvy J-P, Dauphin- Villemant C, O’Connor MB, Gilbert LI. 2002. Molecular and biochemical characterization of two P450 enzymes in the ecdysteroidogenic pathway of Drosophila melanogaster. Proc Natl Acad Sci USA. 99:11043–11048.

Whittaker J. 1990. Graphical models in applied multivariate statistics. Chichester (UK): John Wiley.

Wilson AC, Carlson SS, White TJ. 1977. Biochemical evolution. Annu Rev Biochem. 46:573–639.

Wunderlich Z, Mirny LA. 2006. Using the topology of metabolic networks to predict viability of mutant strains. Biophys J. 91:2304–2311.

Yang Z. 1997. PAML: a program package for phylogenetic analysis by maximum likelihood. Comput Appl Biosci. 13:555–556.

Yu H, Kim PM, Sprecher E, Trifonov V, Gerstein M. 2007. The importance of bottlenecks in protein networks: correlation with gene essentiality and expression dynamics. PLoS Comput Biol. 3:e59.

Douglas Crawford, Associate Editor

Accepted August 16, 2008

D ow

nloaded from https://academ

ic.oup.com /m

be/article/25/12/2537/1110609 by guest on 24 D ecem

ber 2021