Milo R, Phillips R (2016). Cell Biology by the Numbers (Garland Science).
Photosynthetic carbon assimilation enables energy storage in the living world and produces most of the biomass in the biosphere. Rubisco (D-ribulose 1,5-bisphosphate carboxylase/oxygenase) is responsible for the vast majority of global carbon fixation and has been claimed to be the most abundant protein on Earth. Here we provide an updated and rigorous estimate for the total mass of Rubisco on Earth, concluding it is ≈0.7 Gt, more than an order of magnitude higher than previously thought. We find that >90% of Rubisco enzymes are found in the ≈2 × 10¹⁴ m² of leaves of terrestrial plants, and that Rubisco accounts for ≈3% of the total mass of leaves, which we estimate at ≈30 Gt dry weight. We use our estimate for the total mass of Rubisco to derive the effective time-averaged catalytic rate of Rubisco of ≈0.03 s⁻¹ on land and ≈0.6 s⁻¹ in the ocean. Compared with the maximal catalytic rate observed in vitro at 25 °C, the effective rate in the wild is ≈100-fold slower on land and sevenfold slower in the ocean. The lower ambient temperature, and Rubisco not working at night, can explain most of the difference from laboratory conditions in the ocean but not on land, where quantification of many more factors on a global scale is needed. Our analysis helps sharpen the dramatic difference between laboratory and wild environments and between the terrestrial and marine environments.
While protein tags are ubiquitously utilized in molecular biology, they harbor the potential to interfere with functional traits of their fusion counterparts. Systematic evaluation of the effect of protein tags on function would promote accurate use of tags in experimental setups. Here we examine the effect of green fluorescent protein tagging at either the N or C terminus of budding yeast proteins on subcellular localization and functionality. We use a competition-based approach to decipher the relative fitness of two strains tagged on the same protein but on opposite termini and from that infer the correct, physiological localization for each protein and the optimal position for tagging. Our study provides a first of a kind systematic assessment of the effect of tags on the functionality of proteins and provides a step toward broad investigation of protein fusion libraries.
Food production dominates land, water and fertilizer use and is a greenhouse gas source. In the United States, beef production is the main agricultural resource user overall, as well as per kcal or g of protein. Here, we offer a possible, non-unique, definition of 'sustainable' beef as that subsisting exclusively on grass and by-products, and quantify its expected US production as a function of pastureland use. Assuming today's pastureland characteristics, all of the pastureland that US beef currently uses can sustainably deliver ≈45% of current production. Rewilding this pastureland's less productive half (≈135 million ha) can still deliver ≈43% of current beef production. In all considered scenarios, the ≈32 million ha of high-quality cropland that beef currently uses are reallocated for plant-based food production. These plant items deliver 2- to 20-fold more calories and protein than the replaced beef and increase the delivery of protective nutrients, but deliver no B12. Increased deployment of rapid rotational grazing or grassland multi-purposing may increase beef production capacity.
Enzymes catalyze a vast range of reactions. Their catalytic performances, mechanisms, global folds, and active-site architectures are also highly diverse, suggesting that enzymes are shaped by an entire range of physiological demands and evolutionary constraints, as well as by chemical and physicochemical constraints. We have attempted to identify signatures of these shaping demands and constraints. To this end, we describe a bird's-eye view of the enzyme space from two angles: evolution and chemistry. We examine various chemical reaction parameters that may have shaped the catalytic performances and active-site architectures of enzymes. We test and weigh these considerations against physiological and evolutionary factors. Although the catalytic properties of the "average" enzyme correlate with cellular metabolic demands and enzyme expression levels, at the level of individual enzymes, a multitude of physiological demands and constraints, combined with the coincidental nature of evolutionary processes, result in a complex picture. Indeed, neither reaction type (a chemical constraint) nor evolutionary origin alone can explain enzyme rates. Nevertheless, chemical constraints are apparent in the convergence of active site architectures in independently evolved enzymes, although significant variations within an architecture are common.
A census of the biomass on Earth is key for understanding the structure and dynamics of the biosphere. However, a global, quantitative view of how the biomass of different taxa compare with one another is still lacking. Here, we assemble the overall biomass composition of the biosphere, establishing a census of the ≈550 gigatons of carbon (Gt C) of biomass distributed among all of the kingdoms of life. We find that the kingdoms of life concentrate at different locations on the planet; plants (≈450 Gt C, the dominant kingdom) are primarily terrestrial, whereas animals (≈2 Gt C) are mainly marine, and bacteria (≈70 Gt C) and archaea (≈7 Gt C) are predominantly located in deep subsurface environments. We show that terrestrial biomass is about two orders of magnitude higher than marine biomass and estimate a total of ≈6 Gt C of marine biota, doubling the previous estimated quantity. Our analysis reveals that the global marine biomass pyramid contains more consumers than producers, thus increasing the scope of previous observations on inverse food pyramids. Finally, we highlight that the mass of humans is an order of magnitude higher than that of all wild mammals combined and report the historical impact of humanity on the global biomass of prominent taxa, including mammals, fish, and plants.
Food loss is widely recognized as undermining food security and environmental sustainability. However, consumption of resource-intensive food items instead of more efficient, equally nutritious alternatives can also be considered as an effective food loss. Here we define and quantify these opportunity food losses as the food loss associated with consuming resource-intensive animal-based items instead of plant-based alternatives which are nutritionally comparable, e.g., in terms of protein content. We consider replacements that minimize cropland use for each of the main US animal-based food categories. We find that although the characteristic conventional retail-to-consumer food losses are ≈30% for plant and animal products, the opportunity food losses of beef, pork, dairy, poultry, and eggs are 96%, 90%, 75%, 50%, and 40%, respectively. This arises because plant-based replacement diets can produce 20-fold and twofold more nutritionally similar food per cropland than beef and eggs, the most and least resource-intensive animal categories, respectively. Although conventional and opportunity food losses are both targets for improvement, the high opportunity food losses highlight the large potential savings beyond conventionally defined food losses. Concurrently replacing all animal-based items in the US diet with plant-based alternatives will add enough food to feed, in full, 350 million additional people, well above the expected benefits of eliminating all supply chain food waste. These results highlight the importance of dietary shifts to improving food availability and security.
Enzyme kinetics are fundamental to an understanding of cellular metabolism and for crafting synthetic biology applications. For decades, enzyme characterization has been based on in vitro enzyme assays. However, kinetic parameters are only available for
Viruses are incapable of autonomous energy production. Although many experimental studies make it clear that viruses are parasitic entities that hijack the molecular resources of the host, a detailed estimate for the energetic cost of viral synthesis is largely lacking. To quantify the energetic cost of viruses to their hosts, we enumerated the costs associated with two very distinct but representative DNA and RNA viruses, namely, T4 and influenza. We found that, for these viruses, translation of viral proteins is the most energetically expensive process. Interestingly, the costs of building a T4 phage and a single influenza virus are nearly the same. Due to influenza's higher burst size, however, the overall cost of a T4 phage infection is only 2-3% of the cost of an influenza infection. The costs of these infections relative to their host's estimated energy budget during the infection reveal that a T4 infection consumes about a third of its host's energy budget, whereas an influenza infection consumes only ≈1%. Building on our estimates for T4, we show how the energetic costs of double-stranded DNA phages scale with the capsid size, revealing that the dominant cost of building a virus can switch from translation to genome replication above a critical size. Last, using our predictions for the energetic cost of viruses, we provide estimates for the strengths of selection and genetic drift acting on newly incorporated genetic elements in viral genomes, under conditions of energy limitation.
Carbon fixation is the gateway of inorganic carbon into the biosphere. Our ability to engineer carbon fixation pathways in living organisms is expected to play a crucial role in the quest towards agricultural and energetic sustainability. Recent successes in introducing non-native carbon fixation pathways into heterotrophic hosts offer novel platforms for manipulating these pathways in genetically malleable organisms. Here, we focus on past efforts and future directions for engineering the dominant carbon fixation pathway in the biosphere, the Calvin-Benson cycle, into the well-known model organism Escherichia coli. We describe how central carbon metabolism of this heterotrophic bacterium can be manipulated to allow directed evolution of carbon fixing enzymes. Finally, we highlight future directions towards synthetic autotrophy.
A set of chemical reactions that require a metabolite to synthesize more of that metabolite is an autocatalytic cycle. Here, we show that most of the reactions in the core of central carbon metabolism are part of compact autocatalytic cycles. Such metabolic designs must meet specific conditions to support stable fluxes, hence avoiding depletion of intermediate metabolites. As such, they are subjected to constraints that may seem counter-intuitive: the enzymes of branch reactions out of the cycle must be overexpressed and the affinity of these enzymes to their substrates must be relatively weak. We use recent quantitative proteomics and fluxomics measurements to show that the above conditions hold for functioning cycles in central carbon metabolism of E. coli. This work demonstrates that the topology of a metabolic network can shape kinetic parameters of enzymes and lead to seemingly wasteful enzyme usage.
Understanding the evolution of a new metabolic capability in full mechanistic detail is challenging, as causative mutations may be masked by non-essential "hitchhiking" mutations accumulated during the evolutionary trajectory. We have previously used adaptive laboratory evolution of a rationally engineered ancestor to generate an Escherichia coli strain able to utilize CO2 fixation for sugar synthesis. Here, we reveal the genetic basis underlying this metabolic transition. Five mutations are sufficient to enable robust growth when a non-native Calvin-Benson-Bassham cycle provides all the sugar-derived metabolic building blocks. These mutations are found either in enzymes that affect the efflux of intermediates from the autocatalytic CO2 fixation cycle toward biomass (prs, serA, and pgi), or in key regulators of carbon metabolism (crp and ppsR). Using suppressor analysis, we show that a decrease in catalytic capacity is a common feature of all mutations found in enzymes. These findings highlight the enzymatic constraints that are essential to the metabolic stability of autocatalytic cycles and are relevant to future efforts in constructing non-native carbon fixation pathways.
Bacterial growth depends crucially on metabolic fluxes, which are limited by the cell's capacity to maintain metabolic enzymes. The necessary enzyme amount per unit flux is a major determinant of metabolic strategies both in evolution and bioengineering. It depends on enzyme parameters (such as kcat and KM constants), but also on metabolite concentrations. Moreover, similar amounts of different enzymes might incur different costs for the cell, depending on enzyme-specific properties such as protein size and half-life. Here, we developed enzyme cost minimization (ECM), a scalable method for computing enzyme amounts that support a given metabolic flux at a minimal protein cost. The complex interplay of enzyme and metabolite concentrations, e.g. through thermodynamic driving forces and enzyme saturation, would make it hard to solve this optimization problem directly. By treating enzyme cost as a function of metabolite levels, we formulated ECM as a numerically tractable, convex optimization problem. Its tiered approach allows for building models at different levels of detail, depending on the amount of available data. Validating our method with measured metabolite and protein levels in E. coli central metabolism, we found typical prediction fold errors of 4.1 and 2.6, respectively, for the two kinds of data. This result from the cost-optimized metabolic state is significantly better than randomly sampled metabolite profiles, supporting the hypothesis that enzyme cost is important for the fitness of E. coli. ECM can be used to predict enzyme levels and protein cost in natural and engineered pathways, and could be a valuable computational tool to assist metabolic engineering projects. Furthermore, it establishes a direct connection between protein cost and thermodynamics, and provides a physically plausible and computationally tractable way to include enzyme kinetics into constraint-based metabolic models, where kinetics have usually been ignored or oversimplified.
Feeding a growing population while minimizing environmental degradation is a global challenge requiring thoroughly rethinking food production and consumption. Dietary choices control food availability and natural resource demands. In particular, reducing or avoiding consumption of low production efficiency animal-based products can spare resources that can then yield more food. In quantifying the potential food gains of specific dietary shifts, most earlier research focused on calories, with less attention to other important nutrients, notably protein. Moreover, despite the well-known environmental burdens of livestock, only a handful of national level feed-to-food conversion efficiency estimates of dairy, beef, poultry, pork, and eggs exist. Yet such high level estimates are essential for reducing diet related environmental impacts and identifying optimal food gain paths. Here we quantify caloric and protein conversion efficiencies for US livestock categories. We then use these efficiencies to calculate the food availability gains expected from replacing beef in the US diet with poultry, a more efficient meat, and a plant-based alternative. Averaged over all categories, caloric and protein efficiencies are 7%-8%. At 3% in both metrics, beef is by far the least efficient. We find that reallocating the agricultural land used for beef feed to poultry feed production can meet the caloric and protein demands of ≈120 and ≈140 million additional people consuming the mean American diet, respectively, roughly 40% of current US population.
Many carbon-fixing bacteria rely on a CO2 concentrating mechanism (CCM) to elevate the CO2 concentration around the carboxylating enzyme ribulose bisphosphate carboxylase/oxygenase (RuBisCO). The CCM is postulated to simultaneously enhance the rate of carboxylation and minimize oxygenation, a competitive reaction with O2 also catalyzed by RuBisCO. To achieve this effect, the CCM combines two features: active transport of inorganic carbon into the cell and colocalization of carbonic anhydrase and RuBisCO inside proteinaceous microcompartments called carboxysomes. Understanding the significance of the various CCM components requires reconciling biochemical intuition with a quantitative description of the system. To this end, we have developed a mathematical model of the CCM to analyze its energetic costs and the inherent intertwining of physiology and pH. We find that intracellular pH greatly affects the cost of inorganic carbon accumulation. At low pH the inorganic carbon pool contains more of the highly cell-permeable H2CO3, necessitating a substantial expenditure of energy on transport to maintain internal inorganic carbon levels. An intracellular pH ≈8 reduces leakage, making the CCM significantly more energetically efficient. This pH prediction coincides well with our measurement of intracellular pH in a model cyanobacterium. We also demonstrate that CO2 retention in the carboxysome is necessary, whereas selective uptake of HCO3- into the carboxysome would not appreciably enhance energetic efficiency. Altogether, integration of pH produces a model that is quantitatively consistent with cyanobacterial physiology, emphasizing that pH cannot be neglected when describing biological systems interacting with inorganic carbon pools.
Reported values in the literature on the number of cells in the body differ by orders of magnitude and are very seldom supported by any measurements or calculations. Here, we integrate the most up-to-date information on the number of human and bacterial cells in the body. We estimate the total number of bacteria in the 70 kg "reference man" to be 3.8 × 10¹³. For human cells, we identify the dominant role of the hematopoietic lineage to the total count (90%) and revise past estimates to 3.0 × 10¹³ human cells. Our analysis also updates the widely-cited 10:1 ratio, showing that the number of bacteria in the body is actually of the same order as the number of human cells, and their total mass is about 0.2 kg.
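As a quick check on the arithmetic behind this abstract, the short Python sketch below recomputes the bacteria-to-human cell ratio from the two counts quoted above. The variable names are illustrative only, and the hematopoietic count is simply 90% of the quoted human total, not a figure stated separately in the abstract.

```python
# Cell counts quoted in the abstract for the 70 kg "reference man".
BACTERIAL_CELLS = 3.8e13
HUMAN_CELLS = 3.0e13
HEMATOPOIETIC_FRACTION = 0.90  # share of human cells in the hematopoietic lineage

# Ratio of bacterial to human cells (the abstract contrasts this with 10:1).
ratio = BACTERIAL_CELLS / HUMAN_CELLS

# Derived here for illustration: cells in the hematopoietic lineage.
hematopoietic_cells = HEMATOPOIETIC_FRACTION * HUMAN_CELLS

print(f"bacteria : human cells = {ratio:.2f} : 1")
print(f"hematopoietic lineage cells = {hematopoietic_cells:.1e}")
```

The ratio comes out a little above 1, which is what "of the same order" means here, in contrast to the widely cited 10:1.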
Data of gene expression levels across individuals, cell types, and disease states is expanding, yet our understanding of how expression levels impact phenotype is limited. Here, we present a massively parallel system for assaying the effect of gene expression levels on fitness in Saccharomyces cerevisiae by systematically altering the expression level of 100 genes at 100 distinct levels spanning a 500-fold range at high resolution. We show that the relationship between expression levels and growth is gene and environment specific and provides information on the function, stoichiometry, and interactions of genes. Wild-type expression levels in some conditions are not optimal for growth, and genes whose fitness is greatly affected by small changes in expression level tend to exhibit lower cell-to-cell variability in expression. Our study addresses a fundamental gap in understanding the functional significance of gene expression regulation and offers a framework for evaluating the phenotypic effects of expression variation.
Livestock farming incurs large and varied environmental burdens, dominated by beef. Replacing beef with resource efficient alternatives is thus potentially beneficial, but may conflict with nutritional considerations. Here we show that protein-equivalent plant based alternatives to the beef portion of the mean American diet are readily devisible, and offer mostly improved nutritional profile considering the full lipid profile, key vitamins, minerals, and micronutrients. We then show that replacement diets require on average only 10% of land, 4% of greenhouse gas (GHG) emissions, and 6% of reactive nitrogen (Nr) compared to what the replaced beef diet requires. Applied to 320 million Americans, the beef-to-plant shift can save 91 million cropland acres (and 770 million rangeland acres), 278 million metric ton CO2e, and 3.7 million metric ton Nr annually. These nationwide savings are 27%, 4%, and 32% of the respective national environmental burdens.
Can a heterotrophic organism be evolved to synthesize biomass from CO2 directly? So far, non-native carbon fixation in which biomass precursors are synthesized solely from CO2 has remained an elusive grand challenge. Here, we demonstrate how a combination of rational metabolic rewiring, recombinant expression, and laboratory evolution has led to the biosynthesis of sugars and other major biomass constituents by a fully functional Calvin-Benson-Bassham (CBB) cycle in E. coli. In the evolved bacteria, carbon fixation is performed via a non-native CBB cycle, while reducing power and energy are obtained by oxidizing a supplied organic compound (e.g., pyruvate). Genome sequencing reveals that mutations in flux branchpoints, connecting the non-native CBB cycle to biosynthetic pathways, are essential for this phenotype. The successful evolution of a non-native carbon fixation pathway, though not yet resulting in net carbon gain, strikingly demonstrates the capacity for rapid trophic-mode evolution of metabolism applicable to biotechnology.
Pyruvate formate-lyase (PFL) is a ubiquitous enzyme that supports increased ATP yield during sugar fermentation. While the PFL reaction is known to be reversible in vitro, the ability of PFL to support microbial growth by condensing acetyl-CoA and formate in vivo has never been directly tested. Here, we employ Escherichia coli mutant strains that cannot assimilate acetate via the glyoxylate shunt and use carbon labeling experiments to unequivocally demonstrate PFL-dependent co-assimilation of acetate and formate. Moreover, PFL-dependent growth is faster than growth on acetate using the glyoxylate shunt. Hence, growth via the reverse activity of PFL could have substantial ecological and biotechnological significance.
Most proteins show changes in level across growth conditions. Many of these changes seem to be coordinated with the specific growth rate rather than the growth environment or the protein function. Although cellular growth rates, gene expression levels and gene regulation have been at the center of biological research for decades, there are only a few models giving a base line prediction of the dependence of the proteome fraction occupied by a gene with the specific growth rate. We present a simple model that predicts a widely coordinated increase in the fraction of many proteins out of the proteome, proportionally with the growth rate. The model reveals how passive redistribution of resources, due to active regulation of only a few proteins, can have proteome wide effects that are quantitatively predictable. Our model provides a potential explanation for why and how such a coordinated response of a large fraction of the proteome to the specific growth rate arises under different environmental conditions. The simplicity of our model can also be useful by serving as a baseline null hypothesis in the search for active regulation. We exemplify the usage of the model by analyzing the relationship between growth rate and proteome composition for the model microorganism E. coli as reflected in recent proteomics data sets spanning various growth conditions. We find that the fraction out of the proteome of a large number of proteins, and from different cellular processes, increases proportionally with the growth rate. Notably, ribosomal proteins, which have been previously reported to increase in fraction with growth rate, are only a small part of this group of proteins. We suggest that, although the fractions of many proteins change with the growth rate, such changes may be partially driven by a global effect, not necessarily requiring specific cellular control mechanisms.
Turnover numbers, also known as kcat values, are fundamental properties of enzymes. However, kcat data are scarce and measured in vitro, thus may not faithfully represent the in vivo situation. A basic question that awaits elucidation is: how representative are kcat values for the maximal catalytic rates of enzymes in vivo? Here, we harness omics data to calculate k_max^vivo, the observed maximal catalytic rate of an enzyme inside cells. Comparison with kcat values from Escherichia coli yields a correlation of r² = 0.62 in log scale (p
It is often presented as common knowledge that, in the human body, bacteria outnumber human cells by a ratio of at least 10:1. Revisiting the question, we find that the ratio is much closer to 1:1.
Genetically identical cells exposed to the same environment display variability in gene expression (noise), with important consequences for the fidelity of cellular regulation and biological function. Although population average gene expression is tightly coupled to growth rate, the effects of changes in environmental conditions on expression variability are not known. Here, we measure the single-cell expression distributions of approximately 900 Saccharomyces cerevisiae promoters across four environmental conditions using flow cytometry, and find that gene expression noise is tightly coupled to the environment and is generally higher at lower growth rates. Nutrient-poor conditions, which support lower growth rates, display elevated levels of noise for most promoters, regardless of their specific expression values. We present a simple model of noise in expression that results from having an asynchronous population, with cells at different cell-cycle stages, and with different partitioning of the cells between the stages at different growth rates. This model predicts non-monotonic global changes in noise at different growth rates as well as overall higher variability in expression for cell-cycle-regulated genes in all conditions. The consistency between this model and our data, as well as with noise measurements of cells growing in a chemostat at well-defined growth rates, suggests that cell-cycle heterogeneity is a major contributor to gene expression noise. Finally, we identify gene and promoter features that play a role in gene expression noise across conditions. Our results show the existence of growth-related global changes in gene expression noise and suggest their potential phenotypic implications.
The high environmental costs of raising livestock are now widely appreciated, yet consumption of animal-based food items continues and is expanding throughout the world. Consumers' ability to distinguish among, and rank, various interchangeable animal-based items is crucial to reducing environmental costs of diets. However, the individual environmental burdens exerted by the five dominant livestock categories - beef, dairy, poultry, pork and eggs - are not fully known. Quantifying those burdens requires splitting livestock's relatively well-known total environmental costs (e.g. land and fertilizer use for feed production) into partial categorical costs. Because such partitioning quantifies the relative environmental desirability of various animal-based food items, it is essential for environmental impact minimization efforts to be made. Yet to date, no such partitioning method exists. The present paper presents such a partitioning method for feed production-related environmental burdens. This approach treated each of the main feed classes individually - concentrates (grain, soy, by-products; supporting production of all livestock), processed roughage (mostly hay and silage) and pasture - which is key given these classes' widely disparate environmental costs. It was found that for the current US food system and national diet, concentrates are partitioned as follows: beef 0.21 +/- 0.112, poultry 0.27 +/- 0.046, dairy 0.24 +/- 0.041, pork 0.23 +/- 0.093 and eggs 0.04 +/- 0.018. Pasture and processed roughage, consumed only by cattle, are 0.92 +/- 0.034 and 0.87 +/- 0.031 due to beef, with the remainder due to dairy. In a follow-up paper, the devised methodology will be employed to partition total land, irrigated water, greenhouse gases and reactive nitrogen burdens incurred by feed production among the five edible livestock categories.
Apart from addressing humanity's growing demand for fuels, pharmaceuticals, plastics and other value added chemicals, metabolic engineering of microbes can serve as a powerful tool to address questions concerning the characteristics of cellular metabolism. Along these lines, we developed an in vivo metabolic strategy that conclusively identifies the product specificity of glycerate kinase. By deleting E. coli's phosphoglycerate mutases, we divide its central metabolism into an 'upper' and 'lower' metabolism, each requiring its own carbon source for the bacterium to grow. Glycerate can serve to replace the upper or lower carbon source depending on the product of glycerate kinase. Using this strategy we show that while glycerate kinase from Arabidopsis thaliana produces 3-phosphoglycerate, both E. coli's enzymes generate 2-phosphoglycerate. This strategy represents a general approach to decipher enzyme specificity under physiological conditions.
In metabolism research, thermodynamics is usually used to determine the directionality of a reaction or the feasibility of a pathway. However, the relationship between thermodynamic potentials and fluxes is not limited to questions of directionality: thermodynamics also affects the kinetics of reactions through the flux-force relationship, which states that the logarithm of the ratio between the forward and reverse fluxes is directly proportional to the change in Gibbs energy due to a reaction (ΔrG'). Accordingly, if an enzyme catalyzes a reaction with a ΔrG' of -5.7 kJ/mol then the forward flux will be roughly ten times the reverse flux. As ΔrG' approaches equilibrium (ΔrG' = 0 kJ/mol), exponentially more enzyme counterproductively catalyzes the reverse reaction, reducing the net rate at which the reaction proceeds. Thus, the enzyme level required to achieve a given flux increases dramatically near equilibrium. Here, we develop a framework for quantifying the degree to which pathways suffer these thermodynamic limitations on flux. For each pathway, we calculate a single thermodynamically-derived metric (the Max-min Driving Force, MDF), which enables objective ranking of pathways by the degree to which their flux is constrained by low thermodynamic driving force. Our framework accounts for the effect of pH, ionic strength and metabolite concentration ranges and allows us to quantify how alterations to the pathway structure affect the pathway's thermodynamics. Applying this methodology to pathways of central metabolism sheds light on some of their features, including metabolic bypasses (e.g., fermentation pathways bypassing substrate-level phosphorylation), substrate channeling (e.g., of oxaloacetate from malate dehydrogenase to citrate synthase), and use of alternative cofactors (e.g., quinone as an electron acceptor instead of NAD). The methods presented here place another arrow in metabolic engineers' quiver, providing a simple means
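The flux-force relationship described in this abstract can be illustrated with a minimal Python sketch. It assumes the standard form ln(J+/J-) = -ΔrG'/(RT) and a temperature of 298.15 K. Neither the temperature nor the function names come from the abstract; they are assumptions made for illustration.

```python
import math

R = 8.314e-3   # gas constant in kJ/(mol*K)
T = 298.15     # assumed temperature in K (not stated in the abstract)

def forward_reverse_ratio(delta_g, temperature=T):
    """Ratio of forward to reverse flux, J+/J-, for a given Gibbs energy change (kJ/mol)."""
    return math.exp(-delta_g / (R * temperature))

def net_flux_fraction(delta_g, temperature=T):
    """Net flux as a fraction of the forward flux: (J+ - J-)/J+ = 1 - exp(delta_g/RT)."""
    return 1.0 - math.exp(delta_g / (R * temperature))

# The abstract's example (-5.7 kJ/mol) plus two values closer to equilibrium.
for dg in (-5.7, -1.0, -0.1):
    print(f"dG' = {dg:5.1f} kJ/mol  J+/J- = {forward_reverse_ratio(dg):6.2f}  "
          f"net/forward = {net_flux_fraction(dg):.3f}")
```

At -5.7 kJ/mol the ratio is close to ten, matching the figure in the abstract. As the Gibbs energy change approaches zero the net fraction falls toward zero, which is why the enzyme level needed for a given net flux rises steeply near equilibrium.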
The microscopic world of a cell can be as alien to our human-centered intuition as the confinement of quarks within protons or the event horizon of a black hole. We are prone to thinking by analogy (Golgi cisternae stack like pancakes, red blood cells look like donuts), but very little in our human experience is truly comparable to the immensely crowded, membrane-subdivided interior of a eukaryotic cell or the intricately layered structures of a mammalian tissue. So in our daily efforts to understand how cells work, we are faced with a challenge: how do we develop intuition that works at the microscopic scale?
Livestock production impacts air and water quality, ocean health, and greenhouse gas (GHG) emissions on regional to global scales and it is the largest use of land globally. Quantifying the environmental impacts of the various livestock categories, mostly arising from feed production, is thus a grand challenge of sustainability science. Here, we quantify land, irrigation water, and reactive nitrogen (Nr) impacts due to feed production, and recast published full life cycle GHG emission estimates, for each of the major animal-based categories in the US diet. Our calculations reveal that the environmental costs per consumed calorie of dairy, poultry, pork, and eggs are mutually comparable (to within a factor of 2), but strikingly lower than the impacts of beef. Beef production requires 28, 11, 5, and 6 times more land, irrigation water, GHG, and Nr, respectively, than the average of the other livestock categories. Preliminary analysis of three staple plant foods shows two-to sixfold lower land, GHG, and Nr requirements than those of the nonbeef animal-derived calories, whereas irrigation requirements are comparable. Our analysis is based on the best data currently available, but follow-up studies are necessary to improve parameter estimates and fill remaining knowledge gaps. Data imperfections notwithstanding, the key conclusion-that beef production demands about 1 order of magnitude more resources than alternative livestock categories-is robust under existing uncertainties. The study thus elucidates the multiple environmental benefits of potential, easy-to-implement dietary changes, and highlights the uniquely high resource demands of beef.
Proteomics techniques generate an avalanche of data and promise to satisfy biologists' long-held desire to measure absolute protein abundances on a genome-wide scale. However, can this knowledge be translated into a clearer picture of how cells invest their protein resources? This article aims to give a broad perspective on the composition of proteomes as gleaned from recent quantitative proteomics studies. We describe proteomaps, an approach for visualizing the composition of proteomes with a focus on protein abundances and functions. In proteomaps, each protein is shown as a polygon-shaped tile, with an area representing protein abundance. Functionally related proteins appear in adjacent regions. General trends in proteomes, such as the dominance of metabolism and protein production, become easily visible. We make interactive visualizations of published proteome datasets accessible at www.proteomaps.net. We suggest that evaluating the way protein resources are allocated by various organisms and cell types in different conditions will sharpen our understanding of how and why cells regulate the composition of their proteomes.
To understand gene function, genetic analysis uses large perturbations such as gene deletion, knockdown or overexpression. Large perturbations have drawbacks: they move the cell far from its normal working point, and can thus be masked by off-target effects or compensation by other genes. Here, we offer a complementary approach, called noise genetics. We use natural cell-cell variations in protein level and localization, and correlate them to the natural variations of the phenotype of the same cells. Observing these variations is made possible by recent advances in dynamic proteomics that allow measuring proteins over time in individual living cells. Using motility of human cancer cells as a model system, and time-lapse microscopy on 566 fluorescently tagged proteins, we found 74 candidate motility genes whose level or localization strongly correlate with motility in individual cells. We recovered 30 known motility genes, and validated several novel ones by mild knockdown experiments. Noise genetics can complement standard genetics for a variety of phenotypes.
Most genes change expression levels across conditions, but it is unclear which of these changes represents specific regulation and what determines their quantitative degree. Here, we accurately measured activities of ~900 S. cerevisiae and ~1800 E. coli promoters using fluorescent reporters. We show that in both organisms 60-90% of promoters change their expression between conditions by a constant global scaling factor that depends only on the conditions and not on the promoter's identity. Quantifying such global effects allows precise characterization of specific regulation: promoters deviating from the global scale line. These are organized into few functionally related groups that also adhere to scale lines and preserve their relative activities across conditions. Thus, only several scaling factors suffice to accurately describe genome-wide expression profiles across conditions. We present a parameter-free passive resource allocation model that quantitatively accounts for the global scaling factors. It suggests that many changes in expression across conditions result from global effects and not specific regulation, and provides means for quantitative interpretation of expression profiles.
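The idea of a single global scaling factor between two conditions can be illustrated with a small sketch. The abstract does not describe the fitting procedure, so the median of per-promoter ratios used here is a stand-in chosen for robustness to outliers, and the activity values, the 1.5-fold tolerance and all names are made up for illustration.

```python
from statistics import median


def global_scale_factor(activity_a, activity_b):
    """Estimate one scaling factor relating promoter activities in
    condition B to those in condition A (median of the ratios B/A)."""
    return median(b / a for a, b in zip(activity_a, activity_b))


def deviating_promoters(activity_a, activity_b, fold_tolerance=1.5):
    """Indices of promoters whose B/A ratio differs from the global
    factor by more than fold_tolerance (a hypothetical cutoff)."""
    k = global_scale_factor(activity_a, activity_b)
    flagged = []
    for i, (a, b) in enumerate(zip(activity_a, activity_b)):
        relative = (b / a) / k
        if relative > fold_tolerance or relative < 1.0 / fold_tolerance:
            flagged.append(i)
    return flagged


# Illustrative numbers only: four promoters scale 2-fold, one scales 8-fold.
cond_a = [10.0, 20.0, 40.0, 80.0, 50.0]
cond_b = [20.0, 40.0, 80.0, 160.0, 400.0]
print(global_scale_factor(cond_a, cond_b))
print(deviating_promoters(cond_a, cond_b))
```

In this toy data set the promoters that follow the global factor lie on one line through the origin, and the single outlier is the kind of promoter the abstract would call specifically regulated.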
Steady-state metabolite concentrations in a microorganism typically span several orders of magnitude. The underlying principles governing these concentrations remain poorly understood. Here, we hypothesize that observed variation can be explained in terms of a compromise between factors that favor minimizing metabolite pool sizes (e.g., limited solvent capacity) and the need to effectively utilize existing enzymes. The latter requires adequate thermodynamic driving force in metabolic reactions so that forward flux substantially exceeds reverse flux. To test this hypothesis, we developed a method, metabolic tug-of-war (mTOW), which computes steady-state metabolite concentrations in microorganisms on a genome-scale. mTOW is shown to explain up to 55% of the observed variation in measured metabolite concentrations in E. coli and C. acetobutylicum across various growth media. Our approach, based strictly on first thermodynamic principles, is the first method that successfully predicts high-throughput metabolite concentration data in bacteria across conditions.
Contrary to the textbook portrayal of glycolysis as a single pathway conserved across all domains of life, not all sugar-consuming organisms use the canonical Embden-Meyerhof-Parnas (EMP) glycolytic pathway. Prokaryotic glucose metabolism is particularly diverse, including several alternative glycolytic pathways, the most common of which is the Entner-Doudoroff (ED) pathway. The prevalence of the ED pathway is puzzling as it produces only one ATP per glucose, half as much as the EMP pathway. We argue that the diversity of prokaryotic glucose metabolism may reflect a tradeoff between a pathway's energy (ATP) yield and the amount of enzymatic protein required to catalyze pathway flux. We introduce methods for analyzing pathways in terms of thermodynamics and kinetics and show that the ED pathway is expected to require several-fold less enzymatic protein to achieve the same glucose conversion rate as the EMP pathway. Through genomic analysis, we further show that prokaryotes use different glycolytic pathways depending on their energy supply. Specifically, energy-deprived anaerobes overwhelmingly rely upon the higher ATP yield of the EMP pathway, whereas the ED pathway is common among facultative anaerobes and even more common among aerobes. In addition to demonstrating how protein costs can explain the use of alternative metabolic strategies, this study illustrates a direct connection between an organism's environment and the thermodynamic and biochemical properties of the metabolic pathways it employs.
Translational coupling is the interdependence of translation efficiency of neighboring genes encoded within an operon. The degree of coupling may be quantified by measuring how the translation rate of a gene is modulated by the translation rate of its upstream gene. Translational coupling was observed in prokaryotic operons several decades ago, but the quantitative range of modulation that translational coupling produces and the factors governing this modulation were only partially characterized. In this study, we systematically quantify and characterize translational coupling in E. coli synthetic operons using a library of plasmids carrying fluorescent reporter genes that are controlled by a set of different ribosome binding site (RBS) sequences. The downstream gene expression level is found to be enhanced by the upstream gene expression via translational coupling, with the enhancement level varying from almost no coupling to over 10-fold depending on the upstream gene's sequence. Additionally, we find that the level of translational coupling in our system is similar between the second and third locations in the operon. The coupling depends on the distance between the stop codon of the upstream gene and the start codon of the downstream gene. This study is the first to systematically and quantitatively characterize translational coupling in a synthetic E. coli operon. Our analysis will be useful in accurate manipulation of gene expression in synthetic biology and serves as a step toward understanding the mechanisms involved in translational expression modulation.
Protein levels are a dominant factor shaping natural and synthetic biological systems. Although proper functioning of metabolic pathways relies on precise control of enzyme levels, the experimental ability to balance the levels of many genes in parallel is a major outstanding challenge. Here, we introduce a rapid and modular method to span the expression space of several proteins in parallel. By combinatorially pairing genes with a compact set of ribosome-binding sites, we modulate protein abundance by several orders of magnitude. We demonstrate our strategy by using a synthetic operon containing fluorescent proteins to span a 3D color space. Using the same approach, we modulate a recombinant carotenoid biosynthesis pathway in Escherichia coli to reveal a diversity of phenotypes, each characterized by a distinct carotenoid accumulation profile. In a single combinatorial assembly, we achieve a yield of the industrially valuable compound astaxanthin 4-fold higher than previously reported. The methodology presented here provides an efficient tool for exploring a high-dimensional expression space to locate desirable phenotypes.
Sustainability indicators strive to convey the impacts of human activities on natural resource utilization, yet many fail to express these impacts in a simple relatable manner. We introduce a new sustainability indicator, EcoTime, which recasts an environmental burden of a process or item (e.g., the emission of 10 kg CO2 associated with a car trip) in time units (seconds, days, etc.). The EcoTime units represent the burden's share of a benchmark quota calculated according to location or context. For example, a developed country's average yearly CO2 emissions of 11 tons per capita would translate to 365 EcoTime days, in which case the 10 kg CO2 mentioned above would equal approximately 8 EcoTime hours. Since time units are commonly used, the EcoTime indicator is easy to communicate to a varying audience, alleviating challenges often associated with existing sustainability indicators. It leverages our innate ability to easily grasp contrasting time units over several orders of magnitude, ranging from seconds to years. Another key advantage of EcoTime is that its value shifts attention from the absolute environmental impact, which may not be meaningful to most people, to impact magnitude relative to world resource availability or usage, thus giving the burden an intuitive, intrinsic context. In addition, EcoTimes of different impact types can be conveniently and succinctly grouped as a vector (e.g., GHG emissions, water, or land footprints), or, because of the similar units, as a composite scalar. We provide several case study examples of the methodology. (C) 2012 Elsevier Ltd. All rights reserved.
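The EcoTime conversion in the example above is a simple proportion, sketched here. The sketch assumes that "ton" means a metric ton of 1000 kg and that the benchmark quota is spread evenly over a 365-day year; the function name is illustrative.

```python
def ecotime_hours(burden_kg, yearly_quota_kg, days_per_year=365):
    """Express a burden as its share of a yearly benchmark quota,
    converted to hours of that year."""
    share_of_year = burden_kg / yearly_quota_kg
    return share_of_year * days_per_year * 24


# 10 kg CO2 against a quota of 11 metric tons per capita per year
# (metric tons are an assumption of this sketch).
print(ecotime_hours(10, 11 * 1000))
```

With these inputs the result is just under 8 hours, consistent with the figure quoted in the abstract.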
Novel methods such as mass-spectrometry enable a view of the proteomes of cells in unprecedented detail. Recently, these efforts have culminated in quantitative measurements of the number of copies per cell for most expressed proteins in organisms ranging from bacteria to mammalian cells. Here, we estimate the expected total number of proteins per unit of cell volume using known parameters related to the composition of cells such as the fraction of cell mass that is protein, and the average protein length. Using simple arguments, we estimate a range of 2-4 million proteins per cubic micron (i.e., 1 fL) in bacteria, yeast, and mammalian cells. Interestingly, we find that measured values that are reported for fission yeast and mammalian cells are often about 3-10 times lower. We discuss this apparent discrepancy and how to use the estimate as a benchmark to recalibrate proteome-wide quantitative censuses or to revisit assumptions about cell composition.
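The kind of back-of-the-envelope estimate described above can be sketched as follows. The abstract names the ingredients (protein fraction of cell mass, average protein length) but not their values, so every default below is an assumed, typical order-of-magnitude figure rather than a number from the paper.

```python
AVOGADRO = 6.022e23  # molecules per mole


def proteins_per_femtoliter(
    density_g_per_ml=1.1,               # assumed cell density
    dry_mass_fraction=0.3,              # assumed dry mass / wet mass
    protein_fraction_of_dry_mass=0.55,  # assumed protein share of dry mass
    mean_length_aa=350,                 # assumed average protein length
    mean_residue_mass_da=110,           # assumed average amino acid mass
):
    """Estimate the number of protein molecules in 1 fL of cell volume."""
    cell_mass_g = density_g_per_ml * 1e-12  # 1 fL = 1e-12 mL
    protein_mass_g = cell_mass_g * dry_mass_fraction * protein_fraction_of_dry_mass
    protein_molar_mass = mean_length_aa * mean_residue_mass_da  # g/mol
    return protein_mass_g / protein_molar_mass * AVOGADRO


# Shorter average proteins give more molecules per unit volume.
for length in (300, 350, 450):
    print(length, proteins_per_femtoliter(mean_length_aa=length))
```

With these assumed inputs the estimate stays within the 2-4 million per cubic micron range quoted in the abstract; changing any assumed fraction shifts the result proportionally.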
Michaelis and Menten's mechanism for enzymatic catalysis is remarkable both in its simplicity and its wide applicability. The extension for reversible processes, as done by Haldane, makes it even more relevant as most enzymes catalyze reactions that are reversible in nature and carry in vivo flux in both directions. Here, we decompose the reversible Michaelis-Menten equation into three terms, each with a clear physical meaning: catalytic capacity, substrate saturation and thermodynamic driving force. This decomposition facilitates a better understanding of enzyme kinetics and highlights the relationship between thermodynamics and kinetics, a relationship which is often neglected. We further demonstrate how our separable rate law can be understood from different points of view, shedding light on factors shaping enzyme catalysis. (c) 2013 Federation of European Biochemical Societies. Published by Elsevier B. V. All rights reserved.
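The three-term decomposition named above can be written out for a single-substrate, single-product reaction. This is a sketch under stated assumptions: it uses the standard reversible Michaelis-Menten rate law with the Haldane relationship, a temperature of 298.15 K, and parameter names chosen for illustration; the abstract does not give the formula itself.

```python
import math

R = 8.314e-3  # gas constant in kJ/(mol*K)
T = 298.15    # assumed temperature in K


def reversible_mm_rate(e, s, p, kcat_f, kcat_r, ks, kp):
    """Standard reversible Michaelis-Menten rate for S <-> P."""
    return e * (kcat_f * s / ks - kcat_r * p / kp) / (1 + s / ks + p / kp)


def separable_terms(e, s, p, kcat_f, kcat_r, ks, kp):
    """Return (capacity, saturation, driving_force); their product
    is the net rate."""
    capacity = e * kcat_f
    saturation = (s / ks) / (1 + s / ks + p / kp)
    keq = (kcat_f / ks) / (kcat_r / kp)         # Haldane relationship
    delta_g = R * T * math.log((p / s) / keq)   # reaction Gibbs energy, kJ/mol
    driving_force = 1 - math.exp(delta_g / (R * T))
    return capacity, saturation, driving_force


# Illustrative parameter values only.
params = dict(e=1.0, s=2.0, p=0.5, kcat_f=10.0, kcat_r=1.0, ks=1.0, kp=0.5)
capacity, saturation, driving_force = separable_terms(**params)
print(capacity * saturation * driving_force, reversible_mm_rate(**params))
```

The product of the three terms reproduces the ordinary reversible rate law, so the decomposition is a rearrangement rather than a new model; its value lies in separating the thermodynamic factor from the saturation and capacity factors.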
Electrosynthesis is a promising approach that enables the biological production of commodities, like fuels and fine chemicals, using renewably produced electricity. Several techniques have been proposed to mediate the transfer of electrons from the cathode to living cells. Of these, the electroproduction of formate as a mediator seems especially promising: formate is readily soluble, of low toxicity and can be produced at relatively high efficiency and at reasonable current density. While organisms that are capable of formatotrophic growth, i.e. growth on formate, exist naturally, they are generally less suitable for bulk cultivation and industrial needs. Hence, it may be helpful to engineer a model organism of industrial relevance, such as E. coli, for growth on formate. There are numerous metabolic pathways that can potentially support formatotrophic growth. Here we analyze these diverse pathways according to various criteria including biomass yield, thermodynamic favorability, chemical motive force, kinetics and the practical challenges posed by their expression. We find that the reductive glycine pathway, composed of the tetrahydrofolate system, the glycine cleavage system, serine hydroxymethyltransferase and serine deaminase, is a promising candidate to support electrosynthesis in E. coli. The approach presented here exemplifies how combining different computational approaches into a systematic analysis methodology provides assistance in redesigning metabolism. This article is part of a Special Issue entitled: Metals in Bioenergetics and Biomimetics Systems. (C) 2012 Elsevier B.V. All rights reserved.
Standard Gibbs energies of reactions are increasingly being used in metabolic modeling for applying thermodynamic constraints on reaction rates, metabolite concentrations and kinetic parameters. The increasing scope and diversity of metabolic models has led scientists to look for genome-scale solutions that can estimate the standard Gibbs energy of all the reactions in metabolism. Group contribution methods greatly increase coverage, albeit at the price of decreased precision. We present here a way to combine the estimations of group contribution with the more accurate reactant contributions by decomposing each reaction into two parts and applying one of the methods on each of them. This method gives priority to the reactant contributions over group contributions while guaranteeing that all estimations will be consistent, i.e. will not violate the first law of thermodynamics. We show that there is a significant increase in the accuracy of our estimations compared to standard group contribution. Specifically, our cross-validation results show an 80% reduction in the median absolute residual for reactions that can be derived by reactant contributions only. We provide the full framework and source code for deriving estimates of standard reaction Gibbs energy, as well as confidence intervals, and believe this will facilitate the wide use of thermodynamic data for a better understanding of metabolism.
The laws of thermodynamics constrain the action of biochemical systems. However, thermodynamic data on biochemical compounds can be difficult to find and is cumbersome to perform calculations with manually. Even simple thermodynamic questions like 'how much Gibbs energy is released by ATP hydrolysis at pH 5?' are complicated excessively by the search for accurate data. To address this problem, eQuilibrator couples a comprehensive and accurate database of thermodynamic properties of biochemical compounds and reactions with a simple and powerful online search and calculation interface. The web interface to eQuilibrator (http://equilibrator.weizmann.ac.il) enables easy calculation of Gibbs energies of compounds and reactions given arbitrary pH, ionic strength and metabolite concentrations. The eQuilibrator code is open-source and all thermodynamic source data are freely downloadable in standard formats. Here we describe the database characteristics and implementation and demonstrate its use.
Regulation of proteins across the cell cycle is a basic process in cell biology. It has been difficult to study this globally in human cells due to lack of methods to accurately follow protein levels and localizations over time. Estimates based on global mRNA measurements suggest that only a few percent of human genes have cell-cycle dependent mRNA levels. Here, we used dynamic proteomics to study the cell-cycle dependence of proteins. We used 495 clones of a human cell line, each with a different protein tagged fluorescently at its endogenous locus. Protein level and localization were quantified in individual cells over 24 h of growth using time-lapse microscopy. Instead of standard chemical or mechanical methods for cell synchronization, we employed in-silico synchronization to place protein levels and localization on a time axis between two cell divisions. This non-perturbative synchronization approach, together with the high accuracy of the measurements, allowed a sensitive assay of cell-cycle dependence. We further developed a computational approach that uses texture features to evaluate changes in protein localizations. We find that 40% of the proteins showed cell cycle dependence, of which 11% showed changes in protein level and 35% in localization. This suggests that a broader range of cell-cycle dependent proteins exists in human cells than was previously appreciated. Most of the cell-cycle dependent proteins exhibit changes in cellular localization. Such changes can be a useful tool in the regulation of the cell cycle, being fast and efficient.
Background: Constraint-based modeling is increasingly employed for metabolic network analysis. Its underlying assumption is that natural metabolic phenotypes can be predicted by adding physicochemical constraints to remove unrealistic metabolic flux solutions. The loopless-COBRA approach provides an additional constraint that eliminates thermodynamically infeasible internal cycles (or loops) from the space of solutions. This allows the prediction of flux solutions that are more consistent with experimental data. However, it is not clear if this approach over-constrains the models by removing non-loop solutions as well. Results: Here we apply Gordan's theorem from linear algebra to prove for the first time that the constraints added in loopless-COBRA do not over-constrain the problem beyond the elimination of the loops themselves. Conclusions: The loopless-COBRA constraints can be reliably applied. Furthermore, this proof may be adapted to evaluate the theoretical soundness for other methods in constraint-based modeling.
Thermodynamics impose a major constraint on the structure of metabolic pathways. Here, we use carbon fixation pathways to demonstrate how thermodynamics shape the structure of pathways and determine the cellular resources they consume. We analyze the energetic profile of prototypical reactions and show that each reaction type displays a characteristic change in Gibbs energy. Specifically, although carbon fixation pathways display a considerable structural variability, they are all energetically constrained by two types of reactions: carboxylation and carboxyl reduction. In fact, all adenosine triphosphate (ATP) molecules consumed by carbon fixation pathways - with a single exception - are used, directly or indirectly, to power one of these unfavorable reactions. When an indirect coupling is employed, the energy released by ATP hydrolysis is used to establish another chemical bond with high energy of hydrolysis, e.g. a thioester. This bond is cleaved by a downstream enzyme to energize an unfavorable reaction. Notably, many pathways exhibit reduced ATP requirement as they couple unfavorable carboxylation or carboxyl reduction reactions to exergonic reactions other than ATP hydrolysis. In the most extreme example, the reductive acetyl coenzyme A (acetyl-CoA) pathway bypasses almost all ATP-consuming reactions. On the other hand, the reductive pentose phosphate pathway appears to be the least ATP-efficient because it is the only carbon fixation pathway that invests ATP in metabolic aims other than carboxylation and carboxyl reduction. Altogether, our analysis indicates that basic thermodynamic considerations accurately predict the resource investment required to support a metabolic pathway and further identifies biochemical mechanisms that can decrease this requirement. (C) 2012 Elsevier B.V. All rights reserved.
Motivation: The laws of thermodynamics describe a direct, quantitative relationship between metabolite concentrations and reaction directionality. Despite great efforts, thermodynamic data suffers from limited coverage, scattered accessibility and nonstandard annotations. We present a framework for unifying thermodynamic data from multiple sources and demonstrate two new techniques for extrapolating the Gibbs energies of unmeasured reactions and conditions. Results: Both methods account for changes in cellular conditions (pH, ionic strength, etc.) by using linear regression over the ΔG° of pseudoisomers and reactions. The Pseudoisomeric Reactant Contribution method systematically infers compound formation energies using measured K' and pKa data. The Pseudoisomeric Group Contribution method extends the group contribution method and achieves a high coverage of unmeasured reactions. We define a continuous index that predicts the reversibility of a reaction under a given physiological concentration range. In the characteristic physiological range 3 μM to 3 mM, we find that roughly half of the reactions in Escherichia coli's metabolism are reversible. These new tools can increase the accuracy of thermodynamic-based models, especially in non-standard pH and ionic strengths. The reversibility index can help modelers decide which reactions are reversible in physiological conditions.
Identifying the factors that determine microbial growth rate under various environmental and genetic conditions is a major challenge of systems biology. While current genome-scale metabolic modeling approaches enable us to successfully predict a variety of metabolic phenotypes, including maximal biomass yield, the prediction of actual growth rate is a long-standing goal. This gap stems from strictly relying on data regarding reaction stoichiometry and directionality, without accounting for enzyme kinetic considerations. Here we present a novel metabolic network-based approach, MetabOlic Modeling with ENzyme kineTics (MOMENT), which predicts metabolic flux rate and growth rate by utilizing prior data on enzyme turnover rates and enzyme molecular weights, without requiring measurements of nutrient uptake rates. The method is based on an identified design principle of metabolism in which enzymes catalyzing high flux reactions across different media tend to be more efficient in terms of having higher turnover numbers. Extending upon previous attempts to utilize kinetic data in genome-scale metabolic modeling, our approach takes into account the requirement for specific enzyme concentrations for catalyzing predicted metabolic flux rates, considering isozymes, protein complexes, and multi-functional enzymes. MOMENT is shown to significantly improve the prediction accuracy of various metabolic phenotypes in E. coli, including intracellular flux rates and changes in gene expression levels under different growth rates. Most importantly, MOMENT is shown to predict growth rates of E. coli under a diverse set of media that are correlated with experimental measurements, markedly improving upon existing state-of-the-art stoichiometric modeling approaches. These results support the view that a physiological bound on cellular enzyme concentrations is a key factor that determines microbial growth rate.
Metabolic pathways may seem arbitrary and unnecessarily complex. In many cases, a chemist might devise a simpler route for the biochemical transformation, so why has nature chosen such complex solutions? In this review, we distill lessons from a century of metabolic research and introduce new observations suggesting that the intricate structure of metabolic pathways can be explained by a small set of biochemical principles. Using glycolysis as an example, we demonstrate how three key biochemical constraints (thermodynamic favorability, availability of enzymatic mechanisms and the physicochemical properties of pathway intermediates) eliminate otherwise plausible metabolic strategies. Considering these constraints, glycolysis contains no unnecessary steps and represents one of the very few pathway structures that meet cellular demands. The analysis presented here can be applied to metabolic engineering efforts for the rational design of pathways that produce a desired product while satisfying biochemical constraints.
Metabolic engineering of plants can reduce the cost and environmental impact of agriculture while providing for the needs of a growing population. Although our understanding of plant metabolism continues to increase at a rapid pace, relatively few plant metabolic engineering projects with commercial potential have emerged, in part because of a lack of principles for the rational manipulation of plant phenotype. One underexplored approach to identifying such design principles derives from analysis of the dominant constraints on plant fitness, and the evolutionary innovations in response to those constraints, that gave rise to the enormous diversity of natural plant metabolic pathways.
While the reductive pentose phosphate cycle is responsible for the fixation of most of the carbon in the biosphere, it has several natural substitutes. In fact, due to the characterization of three new carbon fixation pathways in the last decade, the diversity of known metabolic solutions for autotrophic growth has doubled. In this review, the different pathways are analysed and compared according to various criteria, trying to connect each of the different metabolic alternatives to suitable environments or metabolic goals. The different roles of carbon fixation are discussed; in addition to sustaining autotrophic growth it can also be used for energy conservation and as an electron sink for the recycling of reduced electron carriers. Our main focus in this review is on thermodynamic and kinetic aspects, including thermodynamically challenging reactions, the ATP requirement of each pathway, energetic constraints on carbon fixation, and factors that are expected to limit the rate of the pathways. Finally, possible metabolic structures of yet unknown carbon fixation pathways are suggested and discussed.
Background: C4 plants such as corn and sugarcane assimilate atmospheric CO2 into biomass by means of the C4 carbon fixation pathway. We asked how PEP formation rate, a key step in the carbon fixation pathway, might work at a precise rate, regulated by light, despite fluctuations in substrate and enzyme levels constituting and regulating this process. Results: We present a putative mechanism for robustness in C4 carbon fixation, involving a key enzyme in the pathway, pyruvate orthophosphate dikinase (PPDK), which is regulated by a bifunctional enzyme, Regulatory Protein (RP). The robust mechanism is based on avidity of the bifunctional enzyme RP to its multimeric substrate PPDK, and on a product-inhibition feedback loop that couples the system output to the activity of the bifunctional regulator. The model provides an explanation for several unusual biochemical characteristics of the system and predicts that the system's output, phosphoenolpyruvate (PEP) formation rate, is insensitive to fluctuations in enzyme levels (PPDK and RP), substrate levels (ATP and pyruvate) and the catalytic rate of PPDK, while remaining sensitive to the system's input (light levels). Conclusions: The presented PPDK mechanism is a new way to achieve robustness using product inhibition as a feedback loop on a bifunctional regulatory enzyme. This mechanism exhibits robustness to protein and metabolite levels as well as to catalytic rate changes. At the same time, the output of the system remains tuned to input levels.
Latency and ongoing replication(1) have both been proposed to explain the drug-insensitive human immunodeficiency virus (HIV) reservoir maintained during antiretroviral therapy. Here we explore a novel mechanism for ongoing HIV replication in the face of antiretroviral drugs. We propose a model whereby multiple infections(2,3) per cell lead to reduced sensitivity to drugs without requiring drug-resistant mutations, and experimentally validate the model using multiple infections per cell by cell-free HIV in the presence of the drug tenofovir. We then examine the drug sensitivity of cell-to-cell spread of HIV(4-7), a mode of HIV transmission that can lead to multiple infection events per target cell(8-10). Infections originating from cell-free virus decrease strongly in the presence of antiretrovirals tenofovir and efavirenz whereas infections involving cell-to-cell spread are markedly less sensitive to the drugs. The reduction in sensitivity is sufficient to keep multiple rounds of infection from terminating in the presence of drugs. We examine replication from cell-to-cell spread in the presence of clinical drug concentrations using a stochastic infection model and find that replication is intermittent, without substantial accumulation of mutations. If cell-to-cell spread has the same properties in vivo, it may have adverse consequences for the immune system(11-13), lead to therapy failure in individuals with risk factors(14), and potentially contribute to viral persistence and hence be a barrier to curing HIV infection.
What governs the concentrations of metabolites within living cells? Beyond specific metabolic and enzymatic considerations, are there global trends that affect their values? We hypothesize that the physico-chemical properties of metabolites considerably affect their in vivo concentrations. The recently achieved experimental capability to measure the concentrations of many metabolites simultaneously has made the testing of this hypothesis possible. Here, we analyze such recently available data sets of metabolite concentrations within E. coli, S. cerevisiae, B. subtilis and human. Overall, these data sets encompass more than twenty conditions, each containing dozens (28-108) of simultaneously measured metabolites. We test for correlations with various physico-chemical properties and find that the number of charged atoms, non-polar surface area, lipophilicity and solubility consistently correlate with concentration. In most data sets, a change in one of these properties elicits a ~100-fold increase in metabolite concentrations. We find that the non-polar surface area and number of charged atoms account for almost half of the variation in concentrations in the most reliable and comprehensive data set. Analyzing specific groups of metabolites, such as amino acids or phosphorylated nucleotides, reveals an even higher dependence of concentration on hydrophobicity. We suggest that these findings can be explained by evolutionary constraints imposed on metabolite concentrations and discuss possible selective pressures that can account for them. These include the reduction of solute leakage through the lipid membrane, avoidance of deleterious aggregates and reduction of non-specific hydrophobic binding. By highlighting the global constraints imposed on metabolic pathways, future research could shed light onto aspects of biochemical evolution and the chemical constraints that bound metabolic engineering efforts.
The kinetic parameters of enzymes are key to understanding the rate and specificity of most biological processes. Although specific trends are frequently studied for individual enzymes, global trends are rarely addressed. We performed an analysis of k_cat and K_M values of several thousand enzymes collected from the literature. We found that the "average enzyme" exhibits a k_cat of ~10 s^-1 and a k_cat/K_M of ~10^5 s^-1 M^-1, much below the diffusion limit and the characteristic textbook portrayal of kinetically superior enzymes. Why do most enzymes exhibit moderate catalytic efficiencies? Maximal rates may not evolve in cases where weaker selection pressures are expected. We find, for example, that enzymes operating in secondary metabolism are, on average, ~30-fold slower than those of central metabolism. We also find indications that the physicochemical properties of substrates affect the kinetic parameters. Specifically, low molecular mass and hydrophobicity appear to limit K_M optimization. In accordance, substitution with phosphate, CoA, or other large modifiers considerably lowers the K_M values of enzymes utilizing the substituted substrates. It therefore appears that both evolutionary selection pressures and physicochemical constraints shape the kinetic parameters of enzymes. It also seems likely that the catalytic efficiency of some enzymes toward their natural substrates could be increased in many cases by natural or laboratory evolution.
Cyanobacteria play a key role in marine photosynthesis, which contributes to the global carbon cycle and to the world oxygen supply. Genes encoding the photosystem-II (PSII) reaction centre are found in many cyanophage genomes, and it was suggested that the horizontal transfer of these genes might be involved in increasing phage fitness. Recently, evidence for the existence of phages carrying Photosystem-I (PSI) genes was also reported. Here, using a combination of different marine metagenomic datasets and a unique crossing of the datasets, we now describe the finding of phages that, as in plants and cyanobacteria, contain both PSII and PSI genes. In addition, these phages also contain NADH dehydrogenase genes. The presence of modified PSII and PSI genes in the same viral entities in combination with electron transfer proteins like NAD(P)H dehydrogenase (NDH-1) strongly points to a role in perturbation of the cyanobacterial host photosynthetic electron flow. We therefore suggest that, depending on the physiological condition of the infected cyanobacterial host, the viruses may use different options to maximize survival. The modified PSI may alternate between functioning with PSII in linear electron transfer and contributing to the production of both NADPH and ATP or functioning independently of PSII in cyclic mode via the NDH-1 complex and thus producing only ATP.
Central carbon metabolism uses a complex series of enzymatic steps to convert sugars into metabolic precursors. These precursors are then used to generate the entire biomass of the cell. Are there simplifying principles that can explain the structure of such metabolic networks? Here we address this question by studying central carbon metabolism in E. coli. We use all known classes of enzymes that work on carbohydrates to generate rules for converting compounds and for generating possible paths between compounds. We find that central carbon metabolism is built as a minimal walk between the 12 precursor metabolites that form the basis for biomass and one precursor essential for the positive net ATP balance in glycolysis: every pair of consecutive precursors in the network is connected by the minimal number of enzymatic steps. Similarly, input sugars are converted into precursors by the shortest possible enzymatic paths. This suggests an optimality principle for the structure of central carbon metabolism. The present approach may be used to study other metabolic networks and to design new minimal pathways.
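The "minimal number of enzymatic steps" criterion above can be illustrated with a breadth-first search over a directed compound graph. This is a minimal sketch, not the authors' method: the function name `min_steps` and the toy network (compounds "A" to "E" and their edges) are hypothetical placeholders, and real use would require edges generated from actual enzyme-class rules.

```python
from collections import deque

def min_steps(graph, source, target):
    """Return the minimal number of edges (steps) from source to target.

    graph maps each compound to the compounds reachable in one step.
    Returns None when target cannot be reached from source.
    """
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == target:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

# Hypothetical toy network: node names and edges are placeholders, not real data.
toy = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"], "E": []}
steps_a_to_e = min_steps(toy, "A", "E")
```

Under this reading, a path between two consecutive precursors would count as minimal when its length equals the value returned by `min_steps` for that pair.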
Carbon fixation is the process by which CO2 is incorporated into organic compounds. In modern agriculture in which water, light, and nutrients can be abundant, carbon fixation could become a significant growth-limiting factor. Hence, increasing the fixation rate is of major importance in the road toward sustainability in food and energy production. There have been recent attempts to improve the rate and specificity of Rubisco, the carboxylating enzyme operating in the Calvin-Benson cycle; however, they have achieved only limited success. Nature employs several alternative carbon fixation pathways, which prompted us to ask whether more efficient novel synthetic cycles could be devised. Using the entire repertoire of approximately 5,000 metabolic enzymes known to occur in nature, we computationally identified alternative carbon fixation pathways that combine existing metabolic building blocks from various organisms. We compared the natural and synthetic pathways based on physicochemical criteria that include kinetics, energetics, and topology. Our study suggests that some of the proposed synthetic pathways could have significant quantitative advantages over their natural counterparts, such as the overall kinetic rate. One such cycle, which is predicted to be two to three times faster than the Calvin-Benson cycle, employs the most effective carboxylating enzyme, phosphoenolpyruvate carboxylase, using the core of the naturally evolved C4 cycle. Although implementing such alternative cycles presents daunting challenges related to expression levels, activity, stability, localization, and regulation, we believe our findings suggest exciting avenues of exploration in the grand challenge of enhancing food and renewable fuel production via metabolic engineering and synthetic biology.
Rubisco (D-ribulose 1,5-bisphosphate carboxylase/oxygenase), probably the most abundant protein in the biosphere, performs an essential part in the process of carbon fixation through photosynthesis, thus facilitating life on earth. Despite the significant effect that Rubisco has on the fitness of plants and other photosynthetic organisms, this enzyme is known to have a low catalytic rate and a tendency to confuse its substrate, carbon dioxide, with oxygen. This apparent inefficiency is puzzling and raises questions regarding the roles of evolution versus biochemical constraints in shaping Rubisco. Here we examine these questions by analyzing the measured kinetic parameters of Rubisco from various organisms living in various environments. The analysis presented here suggests that the evolution of Rubisco is confined to an effectively one-dimensional landscape, which is manifested in simple power law correlations between its kinetic parameters. Within this one-dimensional landscape, which may represent biochemical and structural constraints, Rubisco appears to be tuned to the intracellular environment in which it resides such that the net photosynthesis rate is nearly optimal. Our analysis indicates that the specificity of Rubisco is not the main determinant of its efficiency but rather the trade-off between the carboxylation velocity and CO2 affinity. As a result, the presence of oxygen has only a moderate effect on the optimal performance of Rubisco, which is determined mostly by the local CO2 concentration. Rubisco appears as an experimentally testable example for the evolution of proteins subject both to strong selection pressure and to biochemical constraints that strongly confine the evolutionary plasticity to a low-dimensional landscape.
BioNumbers (http://www.bionumbers.hms.harvard.edu) is a database of key numbers in molecular and cell biology: the quantitative properties of biological systems of interest to computational, systems and molecular cell biologists. Contents of the database range from cell sizes to metabolite concentrations, from reaction rates to generation times, from genome sizes to the number of mitochondria in a cell. While always of importance to biologists, having numbers in hand is becoming increasingly critical for experimenting, modeling, and analyzing biological systems. BioNumbers was motivated by an appreciation of how long it can take to find even the simplest number in the vast biological literature. All numbers are taken directly from a literature source and that reference is provided with the number. BioNumbers is designed to be highly searchable and queries can be performed by keywords or browsed by menus. BioNumbers is a collaborative community platform where registered users can add content and make comments on existing data. All new entries and commentary are curated to maintain high quality. Here we describe the database characteristics and implementation, demonstrate its use, and discuss future directions for its development.
Although the quantitative description of biological systems has been going on for centuries, recent advances in the measurement of phenomena ranging from metabolism to gene expression to signal transduction have resulted in a new emphasis on biological numeracy. This article describes the confluence of two different approaches to biological numbers. First, an impressive array of quantitative measurements makes it possible to develop intuition about biological numbers ranging from how many gigatons of atmospheric carbon are fixed every year in the process of photosynthesis to the number of membrane transporters needed to provide sugars to rapidly dividing Escherichia coli cells. As a result of the vast array of such quantitative data, the BioNumbers web site has recently been developed as a repository for biology by the numbers. Second, a complementary and powerful tradition of numerical estimates familiar from the physical sciences and canonized in the so-called "Fermi problems" calls for efforts to estimate key biological quantities on the basis of a few foundational facts and simple ideas from physics and chemistry. In this article, we describe these two approaches and illustrate their synergism in several particularly appealing case studies. These case studies reveal the impact that an emphasis on numbers can have on important biological questions.
The sun's spectrum harvested through photosynthesis is the primary source of energy for life on earth. Plants, green algae, and cyanobacteria (the major primary producers on earth) utilize reaction centers that operate at wavelengths of 680 and 700 nm. Why were these wavelengths "chosen" in evolution? This study analyzes the efficiency of light conversion into chemical energy as a function of hypothetical reaction center absorption wavelengths given the sun's spectrum and the overpotential cost associated with charge separation. Surprisingly, it is found here that when taking into account the empirical charge separation cost the range 680-720 nm maximizes the conversion efficiency. This suggests the possibility that the wavelengths of photosystem I and II were optimized at some point in their evolution for the maximal utilization of the sun's spectrum.
A current challenge in biology is to understand the dynamics of protein circuits in living human cells. Can one define and test equations for the dynamics and variability of a protein over time? Here, we address this experimentally and theoretically, by means of accurate time-resolved measurements of endogenously tagged proteins in individual human cells. As a model system, we choose three stable proteins displaying cell-cycle-dependent dynamics. We find that protein accumulation with time per cell is quadratic for proteins with long mRNA lifetimes and approximately linear for a protein with short mRNA lifetime. Both behaviors correspond to a classical model of transcription and translation. A stochastic model, in which genes slowly switch between ON and OFF states, captures measured cell-cell variability. The data suggests, in accordance with the model, that switching to the gene ON state is exponentially distributed and that the cell-cell distribution of protein levels can be approximated by a Gamma distribution throughout the cell cycle. These results suggest that relatively simple models may describe protein dynamics in individual human cells.
Why do seemingly identical cells respond differently to a drug? To address this, we studied the dynamics and variability of the protein response of human cancer cells to a chemotherapy drug, camptothecin. We present a dynamic-proteomics approach that measures the levels and locations of nearly 1000 different endogenously tagged proteins in individual living cells at high temporal resolution. All cells show rapid translocation of proteins specific to the drug mechanism, including the drug target (topoisomerase-1), and slower, wide-ranging temporal waves of protein degradation and accumulation. However, the cells differ in the behavior of a subset of proteins. We identify proteins whose dynamics differ widely between cells, in a way that corresponds to the outcomes: cell death or survival. This opens the way to understanding molecular responses to drugs in individual cells.
Modulation of the activity of the molecular chaperone HSP90 has been extensively discussed as a means to alter phenotype in many traits and organisms. Such changes can be due to the exposure of cryptic genetic variation, which in some instances may also be accomplished by mild environmental alteration. Should such polymorphisms be widespread, natural selection may be more effective at producing phenotypic change in suboptimal environments. However, the frequency and identity of buffered polymorphisms in natural populations are unknown. Here, we employ quantitative genetic dissection of an Arabidopsis thaliana developmental response, hypocotyl elongation in the dark, to detail the underpinnings of genetic variation responsive to HSP90 modulation. We demonstrate that HSP90-dependent alleles occur in continuously distributed, environmentally responsive traits and are amenable to quantitative genetic mapping techniques. Furthermore, such alleles are frequent in natural populations and can have significant effects on natural phenotypic variation. We also find that HSP90 modulation has both general and allele-specific effects on developmental stability; that is, developmental stability is a phenotypic trait that can be affected by natural variation. However, effects of revealed variation on trait means outweigh effects of decreased developmental stability, and the HSP90-dependent trait alterations could be acted on by natural selection. Thus, HSP90 may centrally influence canalization, assimilation, and the rapid evolutionary alteration of phenotype through the concealment and exposure of cryptic genetic variation.
Physiological and evolutionary adaptations operate at very different time scales. Nevertheless, there are reasons to believe there should be a strong relationship between the two, as together they modify the phenotype. Physiological adaptations change phenotype by altering certain microscopic parameters; evolutionary adaptation can either alter genetically these same parameters or others to achieve distinct or similar ends. Although qualitative discussions of this relationship abound, there has been very little quantitative analysis. Here, we use the hemoglobin molecule as a model system to quantify the relationship between physiological and evolutionary adaptations. We compare measurements of oxygen saturation curves of 25 mammals with those of human hemoglobin under a wide range of physiological conditions. We fit the data sets to the Monod-Wyman-Changeux model to extract microscopic parameters. Our analysis demonstrates that physiological and evolutionary change act on different parameters. The main parameter that changes in the physiology of hemoglobin is relatively constant in evolution, whereas the main parameter that changes in the evolution of hemoglobin is relatively constant in physiology. This orthogonality suggests continued selection for physiological adaptability and hints at a role for this adaptability in evolutionary change.
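For readers unfamiliar with the Monod-Wyman-Changeux (MWC) model mentioned above, the following is a minimal sketch of its textbook fractional-saturation formula. It is not the fitting procedure used in the study. The function and parameter names (`mwc_saturation`, `k_r`, `c`, `l0`) are assumptions chosen for illustration: `k_r` is the ligand dissociation constant of the relaxed (R) state, `c` is the ratio of R-state to T-state dissociation constants, `l0` is the T/R equilibrium ratio without ligand, and `n` is the number of binding sites (4 for hemoglobin).

```python
def mwc_saturation(p_o2, k_r, c, l0, n=4):
    """Fractional saturation Y under the textbook MWC two-state model.

    p_o2 : ligand level (same units as k_r)
    k_r  : dissociation constant of the R state (assumed name)
    c    : K_R / K_T, ratio of R-state to T-state dissociation constants
    l0   : allosteric constant [T0]/[R0] in the absence of ligand
    n    : number of binding sites
    """
    x = p_o2 / k_r  # ligand level normalized to R-state affinity
    numerator = x * (1 + x) ** (n - 1) + l0 * c * x * (1 + c * x) ** (n - 1)
    denominator = (1 + x) ** n + l0 * (1 + c * x) ** n
    return numerator / denominator
```

With `l0 = 0` the expression reduces to simple hyperbolic binding, x / (1 + x), which is a convenient sanity check. Which of these parameters varies in physiology and which in evolution is the subject of the study itself and is not encoded here.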
Biological signaling systems produce an output, such as the level of a phosphorylated protein, in response to defined input signals. The output level as a function of the input level is called the system's input-output relation. One may ask whether this input-output relation is sensitive to changes in the concentrations of the system's components, such as proteins and ATP. Because component concentrations often vary from cell to cell, it might be expected that the input-output relation will likewise vary. If this is the case, different cells exposed to the same input signal will display different outputs. Such variability can be deleterious in systems where survival depends on accurate match of output to input. Here we suggest a mechanism that can provide input-output robustness, that is, an input-output relation that does not depend on variations in the concentrations of any of the system's components. The mechanism is based on certain bacterial signaling systems. It explains how specific molecular details can work together to provide robustness. Moreover, it suggests an approach that can help identify a wide family of nonequilibrium mechanisms that potentially have robust input-output relations.
Myelination in the peripheral nervous system requires close contact between Schwann cells and the axon, but the underlying molecular basis remains largely unknown. Here we show that cell adhesion molecules (CAMs) of the nectin-like (Necl, also known as SynCAM or Cadm) family mediate Schwann cell-axon interaction during myelination. Necl4 is the main Necl expressed by myelinating Schwann cells and is located along the internodes in direct apposition to Necl1, which is localized on axons. Necl4 serves as the glial binding partner for axonal Necl1, and the interaction between these two CAMs mediates Schwann cell adhesion. The disruption of the interaction between Necl1 and Necl4 by their soluble extracellular domains, or the expression of a dominant-negative Necl4 in Schwann cells, inhibits myelination. These results suggest that Necl proteins are important for mediating axon-glia contact during myelination in peripheral nerves.
Diverse cellular processes are carried out by distinct integrin-mediated adhesions. Cell spreading and migration are driven by focal complexes; robust adhesion to the extracellular matrix by focal adhesions; and matrix remodeling by fibrillar adhesions. The mechanism(s) regulating the spatio-temporal distribution and dynamics of the three types of adhesion are unknown. Here, we combine live-cell imaging, labeling with phosphospecific-antibodies and overexpression of a novel tyrosine phosphomimetic mutant of paxillin, to demonstrate that the modulation of tyrosine phosphorylation of paxillin regulates both the assembly and turnover of adhesion sites. Moreover, phosphorylated paxillin enhanced lamellipodial protrusions, whereas non-phosphorylated paxillin was essential for fibrillar adhesion formation and for fibronectin fibrillogenesis. We further show that focal adhesion kinase preferentially interacted with the tyrosine phosphomimetic paxillin and its recruitment is implicated in high turnover of focal complexes and translocation of focal adhesions. We created a mathematical model that recapitulates the salient features of the measured dynamics, and conclude that tyrosine phosphorylation of the adaptor protein paxillin functions as a major switch, regulating the adhesive phenotype of cells.
We present a protocol to tag proteins expressed from their endogenous chromosomal locations in individual mammalian cells using central dogma tagging. The protocol can be used to build libraries of cell clones, each expressing one endogenous protein tagged with a fluorophore such as the yellow fluorescent protein. Each round of library generation produces 100-200 cell clones and takes about 1 month. The protocol integrates procedures for high-throughput single-cell cloning using flow cytometry, high-throughput cDNA generation and 3' rapid amplification of cDNA ends, semi-automatic protein localization screening using fluorescent microscopy and freezing cells in 96-well format.
A long term goal for molecular biologists is to visualize and quantify the levels and localizations of all proteins at the single cell level under endogenous regulation throughout time. Recent advances in protein tagging, microscopy, and image analysis have brought this goal much closer. But how to integrate these techniques to arrive at proteome scale results? Here I review one approach, incorporating random endogenous gene tagging, high-throughput incubated time-lapse microscopy, and automated image analysis, that can provide information on, for example, the accumulation rates of proteins throughout the cell cycle and the variability of protein level expression. Dynamic proteomics has the potential to shed light on many long standing questions and could contribute to challenging undertakings such as following signal transduction in a mammalian cell from input to output.
Protein expression is a stochastic process that leads to phenotypic variation among cells(1-6). The cell-cell distribution of protein levels in microorganisms has been well characterized(7-23) but little is known about such variability in human cells. Here, we studied the variability of protein levels in human cells, as well as the temporal dynamics of this variability, and addressed whether cells with higher than average protein levels eventually have lower than average levels, and if so, over what timescale does this mixing occur. We measured fluctuations over time in the levels of 20 endogenous proteins in living human cells, tagged by the gene for yellow fluorescent protein at their chromosomal loci(24). We found variability with a standard deviation that ranged, for different proteins, from about 15% to 30% of the mean. Mixing between high and low levels occurred for all proteins, but the mixing time was longer than two cell generations (more than 40 h) for many proteins. We also tagged pairs of proteins with two colours, and found that the levels of proteins in the same biological pathway were far more correlated than those of proteins in different pathways. The persistent memory for protein levels that we found might underlie individuality in cell behaviour and could set a timescale needed for signals to affect fully every member of a cell population.
We examined cell cycle-dependent changes in the proteome of human cells by systematically measuring protein dynamics in individual living cells. We used time-lapse microscopy to measure the dynamics of a random subset of 20 nuclear proteins, each tagged with yellow fluorescent protein (YFP) at its endogenous chromosomal location. We synchronized the cells in silico by aligning protein dynamics in each cell between consecutive divisions. We observed widespread (40%) cell-cycle dependence of nuclear protein levels and detected previously unknown cell cycle-dependent localization changes. This approach to dynamic proteomics can aid in discovery and accurate quantification of the extensive regulation of protein concentration and localization in individual living cells.
Western harmony is comprised of sequences of chords, which obey grammatical rules. It is of interest to develop a compact representation of the harmonic movement of chord sequences. Here, we apply an approach from analysis of complex networks, known as "network motifs" to define repeating dynamical patterns in musical harmony. We describe each piece as a graph, where the nodes are chords and the directed edges connect chords which occur consecutively in the piece. We detect several patterns, each of which is a walk on this graph, which recur in diverse musical pieces from the Baroque to modern-day popular music. These patterns include cycles of three or four nodes, with up to two mutual edges (edges that point in both directions). Cliques and patterns with more than two mutual edges are rare. Some of these universal patterns of harmony are well known and correspond to basic principles of music theory such as hierarchy and directionality. This approach can be extended to search for recurring patterns in other musical components and to study other dynamical systems that can be represented as walks on graphs.
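The graph representation described above (chords as nodes, directed edges between consecutive chords, mutual edges pointing in both directions) can be sketched as follows. This is an illustrative sketch only: the function names and the example progression are hypothetical, and the choice to ignore immediate repetitions of the same chord (self-loops) is an assumption not stated in the abstract.

```python
def chord_graph(chords):
    """Return the set of directed edges (a, b) for consecutive chords a -> b.

    Immediate repeats of the same chord are skipped (assumption: no self-loops).
    """
    return {(a, b) for a, b in zip(chords, chords[1:]) if a != b}

def mutual_edges(edges):
    """Return unordered chord pairs that are connected in both directions."""
    return {frozenset((a, b)) for (a, b) in edges if (b, a) in edges}

# Hypothetical progression, used only to exercise the functions.
piece = ["C", "F", "G", "C", "G"]
edges = chord_graph(piece)
mutual = mutual_edges(edges)
```

In this toy progression the walk C, F, G, C traces a three-node cycle, and the pair C and G is joined by a mutual edge.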
Understanding the dynamics and variability of protein circuitry requires accurate measurements in living cells as well as theoretical models. To address this, we employed one of the best-studied protein circuits in human cells, the negative feedback loop between the tumor suppressor p53 and the oncogene Mdm2. We measured the dynamics of fluorescently tagged p53 and Mdm2 over several days in individual living cells. We found that isogenic cells in the same environment behaved in highly variable ways following DNA-damaging gamma irradiation: some cells showed undamped oscillations for at least 3 days (more than 10 peaks). The amplitude of the oscillations was much more variable than the period. Sister cells continued to oscillate in a correlated way after cell division, but lost correlation after about 11 h on average. Other cells showed low-frequency fluctuations that did not resemble oscillations. We also analyzed different families of mathematical models of the system, including a novel checkpoint mechanism. The models point to the possible source of the variability in the oscillations: low-frequency noise in protein production rates, rather than noise in other parameters such as degradation rates. This study provides a view of the extensive variability of the behavior of a protein circuit in living human cells, both from cell to cell and in the same cell over time.
Can complex engineered and biological networks be coarse-grained into smaller and more understandable versions in which each node represents an entire pattern in the original network? To address this, we define coarse-graining units as connectivity patterns which can serve as the nodes of a coarse-grained network and present algorithms to detect them. We use this approach to systematically reverse-engineer electronic circuits, forming understandable high-level maps from incomprehensible transistor wiring: first, a coarse-grained version in which each node is a gate made of several transistors is established. Then the coarse-grained network is itself coarse-grained, resulting in a high-level blueprint in which each node is a circuit module made of many gates. We apply our approach also to a mammalian protein signal-transduction network, to find a simplified coarse-grained network with three main signaling channels that resemble multi-layered perceptrons made of cross-interacting MAP-kinase cascades. We find that both biological and electronic networks are "self-dissimilar," with different network motifs at each level. The present approach may be used to simplify a variety of directed and nondirected, natural and designed networks.
King [preceding Comment, Phys. Rev. E 70, 058101 (2004)] points out biases in one of the two common algorithms for generating simple random graphs, the "matching" or "stub-pairing" algorithm. We clarify that in our simulations of simple graphs we used a different algorithm, the Markov-chain Monte Carlo switching algorithm, which is more uniform. As for multigraphs, the stub-pairing algorithm indeed uniformly samples configurations rather than multigraphs, as King points out, and thus is relevant for our model, which pertains to configurations. Finally, we demonstrate that the algorithm we used to generate families of random networks with scale-free out-degree and compact in-degree does not result in noticeable biases.
Biological and technological networks contain patterns, termed network motifs, which occur far more often than in randomized networks. Network motifs were suggested to be elementary building blocks that carry out key functions in the network. It is of interest to understand how network motifs combine to form larger structures. To address this, we present a systematic approach to define "motif generalizations": families of motifs of different sizes that share a common architectural theme. To define motif generalizations, we first define "roles" in a subgraph according to structural equivalence. For example, the feedforward loop triad (a motif in transcription, neuronal, and some electronic networks) has three roles: an input node, an output node, and an internal node. The roles are used to define possible generalizations of the motif. The feedforward loop can have three simple generalizations, based on replicating each of the three roles and their connections. We present algorithms for efficiently detecting motif generalizations. We find that the transcription networks of bacteria and yeast display only one of the three generalizations, the multi-output feedforward generalization. In contrast, the neuronal network of C. elegans mainly displays the multi-input generalization. Forward-logic electronic circuits display a multi-input, multi-output hybrid. Thus, networks which share a common motif can have very different generalizations of that motif. Using mathematical modeling, we describe the information processing functions of the different motif generalizations in transcription, neuronal, and electronic networks.
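The general idea of a Markov-chain switching randomization for simple directed graphs can be sketched as below. This is a generic illustration under stated assumptions, not the authors' implementation: the function name `switch_randomize`, its parameters, and the choice to treat a rejected proposal as a step in which the graph stays unchanged are all assumptions made for this sketch.

```python
import random

def switch_randomize(edges, n_attempts, seed=None):
    """Randomize a simple directed graph by repeated edge switching.

    Each attempt picks two edges (a, b) and (c, d) and proposes replacing
    them with (a, d) and (c, b). Proposals that would create a self-loop
    or a duplicate edge are rejected and the graph is left unchanged.
    Every node keeps its in-degree and out-degree.
    """
    rng = random.Random(seed)
    edges = list(edges)
    edge_set = set(edges)
    for _ in range(n_attempts):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if a == d or c == b:
            continue  # would create a self-loop
        if (a, d) in edge_set or (c, b) in edge_set:
            continue  # would create a duplicate edge
        edge_set.difference_update([(a, b), (c, d)])
        edge_set.update([(a, d), (c, b)])
        edges[i], edges[j] = (a, d), (c, b)
    return edges

# Hypothetical toy graph used only to exercise the function.
toy_edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
shuffled = switch_randomize(toy_edges, 200, seed=1)
```

Because each accepted switch only exchanges the targets of two edges, the degree sequence is preserved by construction; how many attempts are needed for good mixing is not addressed by this sketch.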
Biological and engineered networks have recently been shown to display network motifs: a small set of characteristic patterns that occur much more frequently than in randomized networks with the same degree sequence. Network motifs were demonstrated to play key information processing roles in biological regulation networks. Existing algorithms for detecting network motifs act by exhaustively enumerating all subgraphs with a given number of nodes in the network. The runtime of such algorithms increases strongly with network size. Here, we present a novel algorithm that allows estimation of subgraph concentrations and detection of network motifs at a runtime that is asymptotically independent of the network size. This algorithm is based on random sampling of subgraphs. Network motifs are detected with a surprisingly small number of samples in a wide variety of networks. Our method can be applied to estimate the concentrations of larger subgraphs in larger networks than was previously possible with exhaustive enumeration algorithms. We present results for high-order motifs in several biological networks and discuss their possible functions.
Genes and proteins generate molecular circuitry that enables the cell to process information and respond to stimuli. A major challenge is to identify characteristic patterns in this network of interactions that may shed light on basic cellular mechanisms. Previous studies have analyzed aspects of this network, concentrating on either transcription-regulation or protein-protein interactions. Here we search for composite network motifs: characteristic network patterns consisting of both transcription-regulation and protein-protein interactions that recur significantly more often than in random networks. To this end we developed algorithms for detecting motifs in networks with two or more types of interactions and applied them to an integrated data set of protein-protein interactions and transcription regulation in Saccharomyces cerevisiae. We found a two-protein mixed-feedback loop motif, five types of three-protein motifs exhibiting coregulation and complex formation, and many motifs involving four proteins. Virtually all four-protein motifs consisted of combinations of smaller motifs. This study presents a basic framework for detecting the building blocks of networks with multiple types of interactions.
Complex biological, technological, and sociological networks can be of very different sizes and connectivities, making it difficult to compare their structures. Here we present an approach to systematically study similarity in the local structure of networks, based on the significance profile (SP) of small subgraphs in the network compared to randomized networks. We find several superfamilies of previously unrelated networks with very similar SPs. One superfamily, including transcription networks of microorganisms, represents "rate-limited" information-processing networks strongly constrained by the response time of their components. A distinct superfamily includes protein signaling, developmental genetic networks, and neuronal wiring. Additional superfamilies include power grids, protein-structure networks and geometric networks, World Wide Web links and social networks, and word-adjacency networks from different languages.
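One common way to compute a significance profile is to take, for each subgraph type, a Z-score of its count in the real network against counts in randomized networks, and then normalize the vector of Z-scores to unit length. The sketch below assumes that definition; the abstract itself does not spell out the formula, and the function name, argument layout, and example counts are hypothetical.

```python
import math
from statistics import mean, stdev

def significance_profile(real_counts, random_counts):
    """Return the unit-length vector of subgraph Z-scores.

    real_counts   : list with one count per subgraph type in the real network
    random_counts : list of lists, one list of counts per randomized network
                    (at least two randomized networks are required)
    Subgraph types with zero spread in the randomized ensemble get Z = 0
    (assumption made here to avoid division by zero).
    """
    z_scores = []
    for i, n_real in enumerate(real_counts):
        samples = [counts[i] for counts in random_counts]
        spread = stdev(samples)
        z_scores.append((n_real - mean(samples)) / spread if spread > 0 else 0.0)
    norm = math.sqrt(sum(z * z for z in z_scores))
    return [z / norm for z in z_scores] if norm > 0 else z_scores

# Made-up counts for two subgraph types, for illustration only.
sp = significance_profile([10, 2], [[4, 5], [6, 5], [5, 6], [5, 4]])
```

Normalizing to unit length is what lets networks of very different sizes be compared, since raw Z-scores tend to grow with network size.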
Understanding the subgraph distribution in random networks is important for modeling complex systems. In classic Erdős networks, which exhibit a Poissonian degree distribution, the number of appearances of a subgraph G with n nodes and g edges scales with network size as $\sim N^{n-g}$. However, many natural networks have a non-Poissonian degree distribution. Here we present approximate equations for the average number of subgraphs in an ensemble of random sparse directed networks, characterized by an arbitrary degree sequence. We find scaling rules for the commonly occurring case of directed scale-free networks, in which the outgoing degree distribution scales as $P(k) \sim k^{-\gamma}$. Considering the power exponent of the degree distribution, $\gamma$, as a control parameter, we show that random networks exhibit transitions between three regimes. In each regime, the subgraph number of appearances follows a different scaling law, $\sim N^{\alpha}$, where $\alpha = n-g+s-1$ for $\gamma < 2$, $\alpha = n-g+s+1-\gamma$ for $2 < \gamma < \gamma_c$, and $\alpha = n-g$ for $\gamma > \gamma_c$, where s is the maximal outdegree in the subgraph, and $\gamma_c = s+1$. We find that certain subgraphs appear much more frequently than in Erdős networks. These results are in very good agreement with numerical simulations. This has implications for detecting network motifs, subgraphs that occur in natural networks significantly more than in their randomized counterparts.
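The three scaling regimes can be written as a small piecewise function. This is a sketch under stated assumptions: the two outer regimes are $\alpha = n-g+s-1$ for $\gamma < 2$ and $\alpha = n-g$ for $\gamma > \gamma_c$ with $\gamma_c = s+1$, and the intermediate regime is taken to be $\alpha = n-g+s+1-\gamma$, the linear form that joins the outer regimes continuously at $\gamma = 2$ and $\gamma = \gamma_c$. The function name is hypothetical, and the boundary points themselves are assigned by continuity.

```python
def subgraph_scaling_exponent(n, g, s, gamma):
    """Exponent alpha in the scaling (number of appearances) ~ N**alpha.

    n     : number of nodes in the subgraph
    g     : number of edges in the subgraph
    s     : maximal outdegree within the subgraph
    gamma : power-law exponent of the outgoing degree distribution
    """
    gamma_c = s + 1  # critical exponent above which Erdos-like scaling holds
    if gamma < 2:
        return n - g + s - 1
    if gamma < gamma_c:
        # Intermediate regime (assumed form, continuous at both boundaries).
        return n - g + s + 1 - gamma
    return n - g


if __name__ == "__main__":
    # Feed-forward loop: 3 nodes, 3 edges, maximal outdegree 2.
    for gamma in (1.5, 2.5, 4.0):
        print(gamma, subgraph_scaling_exponent(3, 3, 2, gamma))
```

For the feed-forward loop the exponent is larger than the Erdős value $n-g = 0$ whenever $\gamma < 3$, which illustrates why such subgraphs can be far more common in scale-free random networks.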
Complex networks are studied across many fields of science. To uncover their structural design principles, we defined network motifs, patterns of interconnections occurring in complex networks at numbers that are significantly higher than those in randomized networks. We found such motifs in networks from biochemistry, neurobiology, ecology, and engineering. The motifs shared by ecological food webs were distinct from the motifs shared by the genetic networks of Escherichia coli and Saccharomyces cerevisiae or from those found in the World Wide Web. Similar motifs were found in networks that perform information processing, even though they describe elements as different as biomolecules within a cell and synaptic connections between neurons in Caenorhabditis elegans. Motifs may thus define universal classes of networks. This approach may uncover the basic building blocks of most networks.
Little is known about the design principles(1-10) of transcriptional regulation networks that control gene expression in cells. Recent advances in data collection and analysis(2,11,12), however, are generating unprecedented amounts of information about gene regulation networks. To understand these complex wiring diagrams(1-10,13), we sought to break down such networks into basic building blocks(2). We generalize the notion of motifs, widely used for sequence analysis, to the level of networks. We define 'network motifs' as patterns of interconnections that recur in many different parts of a network at frequencies much higher than those found in randomized networks. We applied new algorithms for systematically detecting network motifs to one of the best-characterized regulation networks, that of direct transcriptional interactions in Escherichia coli(3,6). We find that much of the network is composed of repeated appearances of three highly significant motifs. Each network motif has a specific function in determining gene expression, such as generating temporal expression programs and governing the responses to fluctuating external signals. The motif structure also allows an easily interpretable view of the entire known transcriptional network of the organism. This approach may help define the basic computational elements of other biological networks.
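The definition of a network motif (a pattern far more frequent than in randomized networks) can be illustrated with a small sketch. It counts feed-forward loops in a directed network and compares the count with an ensemble of randomized networks that keep every node's in-degree and out-degree. The edge-swap randomization, the choice of the feed-forward loop as the pattern, and all function names are assumptions made for illustration, not the published algorithms.

```python
import random
import statistics


def count_ffl(edges):
    """Count feed-forward loops X->Y, Y->Z, X->Z (not necessarily induced)."""
    succ = {}
    for u, v in edges:
        succ.setdefault(u, set()).add(v)
    count = 0
    for x, targets in succ.items():
        for y in targets:
            for z in succ.get(y, ()):
                if z in targets and len({x, y, z}) == 3:
                    count += 1
    return count


def randomize(edges, n_swaps, rng):
    """Degree-preserving randomization by repeated edge swaps.

    Assumes `edges` holds at least two distinct directed pairs, no duplicates.
    """
    edge_list = list(edges)
    edge_set = set(edge_list)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edge_list)), 2)
        (a, b), (c, d) = edge_list[i], edge_list[j]
        # Propose a->d and c->b; reject self-loops and duplicate edges.
        if a == d or c == b or (a, d) in edge_set or (c, b) in edge_set:
            continue
        edge_set -= {(a, b), (c, d)}
        edge_set |= {(a, d), (c, b)}
        edge_list[i], edge_list[j] = (a, d), (c, b)
    return edge_list


def ffl_z_score(edges, n_random=100, swaps_per_edge=10, seed=0):
    """Return (real count, random mean, random std, Z-score or None)."""
    rng = random.Random(seed)
    real = count_ffl(edges)
    counts = [
        count_ffl(randomize(edges, swaps_per_edge * len(edges), rng))
        for _ in range(n_random)
    ]
    mean = statistics.mean(counts)
    std = statistics.pstdev(counts)
    z = (real - mean) / std if std > 0 else None
    return real, mean, std, z
```

A large positive Z-score would mark the feed-forward loop as a candidate motif. A practical analysis would repeat this for every subgraph type of a given size and use enough randomized networks for the mean and standard deviation to be stable.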