6.S: Genetic Basis of Complexity (References) - Biology


Genetic Complexity of Sinoatrial Node Dysfunction

The pacemaker cells of the cardiac sinoatrial node (SAN) are essential for normal cardiac automaticity. Dysfunction in cardiac pacemaking results in human sinoatrial node dysfunction (SND). SND more generally occurs in the elderly population and is associated with impaired pacemaker function causing abnormal heart rhythm. Individuals with SND have a variety of symptoms including sinus bradycardia, sinus arrest, SAN block, bradycardia/tachycardia syndrome, and syncope. Importantly, individuals with SND report chronotropic incompetence in response to stress and/or exercise. SND may be genetic or secondary to systemic or cardiovascular conditions. Current management of patients with SND is limited to the relief of arrhythmia symptoms and pacemaker implantation if indicated. The lack of effective therapeutic measures that target the underlying causes of SND makes management of these patients challenging, given the progressive nature of the disease, and highlights a critical need to better understand its mechanistic basis. This review focuses on current information on the genetics underlying SND, followed by future implications of this knowledge in the management of individuals with SND.

Keywords: GIRK4; HCN4; Nav1.5; atrial fibrillation; calsequestrin-2; genetics; sick sinus syndrome; sinoatrial node dysfunction.

Copyright © 2021 Wallace, El Refaey, Mesirca, Hund, Mangoni and Mohler.

Conflict of interest statement

The reviewer HZ declared a past co-authorship with the authors PM and MM to the handling editor. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

What causes the continuous distribution of phenotypes for quantitative traits?

The continuous variation for complex traits is due to genetic complexity and environmental sensitivity. Genetic complexity arises from segregating alleles at multiple loci. The effect of each of these alleles on the trait phenotype is often relatively small, and their expression is sensitive to the environment. Allelic effects can also depend on genetic background and sex. Because of this complexity, many genotypes can give rise to the same phenotype, and the same genotype can have different phenotypic effects in different environments. Thus, there is no clear relationship between genotype and phenotype.
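The argument above can be illustrated with a small simulation. This is a minimal sketch, not taken from the source: it assumes a purely additive model with equal per-locus effects, allele frequency 0.5, and Gaussian environmental noise, and the function names (`simulate_trait`, `distinct_values`) are hypothetical. It only shows how the number of distinct phenotype values grows as loci and environmental variation are added.

```python
import random


def simulate_trait(n_loci, n_individuals, env_sd, seed=0):
    """Additive polygenic trait under simplifying assumptions:
    each locus carries 0, 1, or 2 copies of a trait-increasing allele
    (allele frequency 0.5), all loci have the same small effect, and
    the environment adds Gaussian noise with standard deviation env_sd."""
    rng = random.Random(seed)
    effect = 1.0 / n_loci  # equal per-locus effect (assumption)
    phenotypes = []
    for _ in range(n_individuals):
        # Count trait-increasing alleles over both copies of every locus.
        allele_count = sum(
            (rng.random() < 0.5) + (rng.random() < 0.5) for _ in range(n_loci)
        )
        genetic_value = effect * allele_count
        phenotypes.append(genetic_value + rng.gauss(0.0, env_sd))
    return phenotypes


def distinct_values(values, digits=6):
    """Number of distinct phenotype values after rounding."""
    return len({round(v, digits) for v in values})


one_locus = simulate_trait(n_loci=1, n_individuals=2000, env_sd=0.0)
many_loci = simulate_trait(n_loci=50, n_individuals=2000, env_sd=0.0)
with_env = simulate_trait(n_loci=50, n_individuals=2000, env_sd=0.05)

print(distinct_values(one_locus))   # a handful of discrete classes
print(distinct_values(many_loci))   # many closely spaced classes
print(distinct_values(with_env))    # effectively continuous
```

With one locus and no environmental noise the phenotype falls into discrete classes; with many loci the classes crowd together, and environmental noise blurs them into a continuum.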

The genetic basis for the evolution of soma: mechanistic evidence for the co-option of a stress-induced gene into a developmental master regulator

In multicellular organisms with specialized cells, the most significant distinction among cell types is between reproductive (germ) cells and non-reproductive/somatic cells (soma). Although soma contributed to the marked increase in complexity of many multicellular lineages, little is known about its evolutionary origins. We have previously suggested that the evolution of genes responsible for the differentiation of somatic cells involved the co-option of life history trade-off genes that in unicellular organisms enhanced survival at a cost to immediate reproduction. In the multicellular green alga, Volvox carteri, cell fate is established early in development by the differential expression of a master regulatory gene known as regA. A closely related RegA-Like Sequence (RLS1) is present in its single-celled relative, Chlamydomonas reinhardtii. RLS1 is expressed in response to stress, and we proposed that an environmentally induced RLS1-like gene was co-opted into a developmental pathway in the lineage leading to V. carteri. However, the exact evolutionary scenario responsible for the postulated co-option event remains to be determined. Here, we show that in addition to being developmentally regulated, regA can also be induced by environmental cues, indicating that regA has maintained its ancestral regulation. We also found that the absence of a functional RegA protein confers increased sensitivity to stress, consistent with RegA having a direct or indirect role in stress responses. Overall, this study (i) provides mechanistic evidence for the co-option of an environmentally induced gene into a major developmental regulator, (ii) supports the view that major morphological innovations can evolve via regulatory changes and (iii) argues for the role of stress in the evolution of multicellular complexity.

Keywords: Volvox carteri; co-option; development; evolution of soma; regA; stress.


Novel thermal environment induces a rewiring of metabolic regulation

Temperature is a major factor modulating the expression of numerous genes in ectotherms and is particularly well studied in Drosophila [19, 20, 39]. Our experimental populations which evolved in a novel hot thermal environment displayed highly significant differences in gene expression involving many genes of well-defined pathways. Of particular interest were genes which were down-regulated in the hot-evolved populations, because they suggest a global down-regulation of energy production in hot-evolved flies, affecting glycolysis, TCA cycle, and oxidative phosphorylation pathways. Interestingly, a highly replicated study in Escherichia coli found that RNA polymerase was the most frequently targeted gene across replicates, resulting in a lower rate of protein synthesis [40], providing further evidence that an important evolutionary response to hot environments is to reduce the increase in energy production and protein synthesis, which is increased in hot environments and probably imposes a significant cost.

Consistent with modified metabolic rewiring of the hot-evolved populations, we found significant differences in CO2 production relative to the ancestral and cold-evolved control populations (see Fig. 2e; Additional file 1: Figure S3; and Supplementary Methods and Results). Contrary to naïve expectations, CO2 production was higher in the hot-evolved flies. Nevertheless, resting metabolism and gene expression are measured at two different moments of the daily cycle of the evolving populations, suggesting that the link between gene expression and energy production might not be straightforward. Additionally, higher CO2 production in hot-evolved flies is consistent with increasing O2 consumption associated with decreased AMPK activity [41]. Further insights into this counter-intuitive pattern of CO2 production come from a metabolomic analysis of D. melanogaster under a wide range of developmental temperatures [42]. At extreme temperatures the flies were depleted of sugars and energy metabolites (NAD+, NADP+, and AMP), which is attributed to their inability to maintain cellular homeostasis. If the hot conditions of our experiment have the same effect, flies not evolved to this environment may also be depleted of sugars and energy metabolites. In response, enzymes in the glycolysis, TCA cycle, and oxidative phosphorylation pathways could be up-regulated. Hot-evolved flies may have acquired the ability to maintain cellular homeostasis at high temperatures, allowing a higher resting metabolism without up-regulation of the metabolic pathway genes.

Our results contrast with a recent study in which CO2 production was conserved among D. melanogaster populations that evolved in different thermal environments [43]. With several experimental details differing between the studies (isofemale lines vs. pools of outbred individuals, 20 min measurements during the day vs. resting metabolism overnight), the interpretation of this apparent discrepancy is difficult. Nevertheless, it aligns well with the general controversy about the effect of temperature on the evolution of metabolism [44]. We conclude that the consistent differences in CO2 production between ancestral and evolved populations provide strong evidence of temperature-specific evolution of metabolism regulation but also indicate that the underlying physiological changes are more complex.

AMPK explains the phenotypic changes observed in hot-evolved populations

Based on the genomic analyses alone, it is not possible to rule out other genes in the Sestrin peak as targets of selection, or three other small genes that overlap with the selection signature of SNF4Aγ (Additional file 2: Table S3). In combination with the expression data, however, the role of SNF4Aγ and Sestrin as the primary drivers of the metabolic rewiring becomes evident. Sestrin modulates the phosphorylation rate of AMP-activated protein kinase (AMPK) [45], which is composed of SNF4Aγ and two other subunits. AMPK is a key player in energy homeostasis at the cellular and the organismal levels, and both SNF4Aγ and Sestrin are directly linked to AMPK activity [45,46,47]. Low levels of ATP result in the activation of AMPK, which causes up-regulation of glycolysis and biogenesis of mitochondria [48]. Furthermore, energetically costly pathways, such as fatty acid production and gluconeogenesis, are down-regulated by AMPK [49]. Inactivation of AMPK causes down-regulation of glycolysis and up-regulation of anabolic pathways such as fatty acid production, which were both seen in our data. Interestingly, Pfk, the target enzyme for AMPK in glycolysis, is the first down-regulated enzyme of the glycolysis pathway in our data set (Fig. 2a, blue arrow). In D. melanogaster, RNA interference-mediated down-regulation of SNF4Aγ increases glucose content of muscles and the fat body [50] and induces starvation behavior [41]. Some of the genes of the insulin receptor signaling pathway were also differentially expressed in the hot-evolved populations (Ilp6, InR; see Additional file 1: Figure S1). Moreover, some key enzymes involved in fatty acid production (ACCoAs, ACC, FASN2, Desat1, CG30008, CG33110, and CG18609; see Additional file 1: Figure S1) also show a signal of up-regulation, consistent with the direct inhibition of ACC by AMPK [51]. Increased temperatures and heat stress deplete fat storage in D. melanogaster [52] by invoking apoptosis in the fat body, a process dependent on SNF4Aγ [53] that links the starvation-like expression pattern observed here to temperature adaptation. Sestrin is also connected with autophagy regulation in Drosophila, through its role in activating AMPK [54, 55].

Thus, our results indicate that the activity of the key metabolic regulator AMPK is modulated through the differential regulation of the subunit SNF4Aγ and the interacting gene Sestrin in hot-evolved populations. Given the central role of SNF4Aγ and Sestrin for temperature-dependent metabolic rewiring, we reasoned that both genes should vary along temperature clines in natural populations. While we did not find evidence for clinality of Sestrin, the patterns for SNF4Aγ matched our expectations. A whole genome polymorphism analysis identified SNF4Aγ as one of the top candidates in clinal North American D. melanogaster populations [22]. Clinal and seasonal variation of SNF4Aγ in D. melanogaster and D. simulans further implicates temperature as an adaptive driver [21, 24]. In a reanalysis of clinal population genetic data [23], SNF4Aγ is among the 603 most differentiated genes shared by North American and Australian D. simulans populations. Gene expression of SNF4Aγ is clinal in European D. subobscura populations, with southern populations having lower expression levels [19], which parallels the response observed in our experimental evolution populations. Because the selected haplotype block may be partially maintained in other populations, we tested the diagnostic SNPs for clinal variation. Remarkably, populations from the extreme ends of the North American cline exhibit a clinal signal for the diagnostic SNPs. Nevertheless, the signal was mixed for less extreme populations.

Large-effect loci segregating at intermediate allele frequencies drive rapid evolution

The combined analysis of transcriptomic and whole genome resequencing data of a freshly collected D. simulans population evolving in a new thermal environment identified two genes, both connected to AMPK, a central metabolic switch. While many possibilities exist as to how metabolism could be regulated, the strong selection response in all replicates suggests that two major-effect loci are driving the adaptive metabolic response in our populations. The observed selection signature clearly indicates that adaptation in our E&R study [27] is dominated by a small number of loci with strong effect, providing another example for rapid adaptation driven by a few major-effect loci [12,13,14,15,16].

The two haplotypes driving the metabolic switch in our experimental populations segregate at intermediate frequencies in the founder population and show clinal variation. Thus, it is highly plausible that these genes contribute to similar adaptive processes in natural populations, which probably occur over very short time scales. Because temperature varies seasonally, it is possible that spatial and temporal heterogeneity maintains the selected alleles at intermediate frequency in D. simulans [10, 24].

The fact that few large-effect loci resulted in a clear selection signature in our experiment does, however, not preclude that several minor effect loci also influence the metabolic rewiring in hot environments. Yet, our computer simulations suggest that these two loci probably explain more than 50% of the phenotypic change, even when minor effect loci are also contributing (Fig. 4). Previously, it had been shown that major-effect alleles contributing to quantitative traits show the fastest selection response, but with an increasing number of generations these loci are out-competed because small-effect alleles gradually increase in frequency [56]. The reason for the loss of the large-effect alleles is that it is easier to obtain genotypes close to the fitness optimum with small-effect alleles, while large-effect alleles could cause overshooting, resulting in more extreme phenotypes than favored by selection. Hence, the analysis of these experimental populations after a longer time interval could be very informative to understand the dynamics of adaptive alleles in natural populations.

With the favored allele being fixed or close to fixation in southern populations in the USA, it would be interesting to study the adaptive response in these populations. Because AMPK will probably not further contribute to adaptation, such an experiment could reveal other adaptive signals that were not detected in this study. Would such populations be segregating for other major alleles, or would a polygenic response be detected?

Next steps

Experimental evolution provides an excellent framework for experimental testing of selected alleles. Allelic replacements with the CRISPR/Cas9 technology enable the direct comparison of selected and non-selected alleles in an otherwise homogeneous genetic background. Nevertheless, the mapping resolution in our study is still rather low. Replacing a genomic region of > 10 kb in D. simulans, a species with lower transformation efficiency than D. melanogaster, is extremely challenging. Thus, the next steps would require some further fine mapping of the target of selection. We anticipate that adding chromosomes without the selected alleles during an extended experimental evolution will provide more opportunity for recombination and thus yield a smaller candidate region. Once sufficiently small candidate regions are cloned, many follow-up experiments are conceivable, ranging from competition experiments of selected and non-selected alleles in an experimental evolution setting to detailed biochemical comparisons using metabolomics, transcriptomics, and proteomics.


To identify genetic variants that affect susceptibility of a variety of diseases, genome-wide association studies (GWAS) genotype a dense set of common SNPs (Single Nucleotide Polymorphism) and test allelic frequencies among a cohort of affected people and non-affected people [1]. Traditional analysis methods for GWAS data only consider one SNP at a time and test its association with disease. This type of analysis strategy is only suitable for simple Mendelian disorders. Some common complex diseases such as various types of cancers, cardiovascular disease, and diabetes are influenced by multiple genetic variants. Therefore, detecting high-order epistasis, which refers to the interactive effect of two or more genetic variants on complex human diseases, can help to unravel how genetic risk factors confer susceptibility to complex diseases [2]. However, the very large number of SNPs checked in a typical GWAS and the enormous number of possible SNP combinations make detecting high-order epistatic interactions from GWAS data computationally challenging [3]. Moreover, how to measure the association between a set of SNPs and the phenotype presents another grand statistical challenge.
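The scale of the computational challenge can be made concrete with a short calculation. This is an illustrative sketch only: the figure of 500,000 SNPs is an assumed round number, not one reported in the text, and `n_combinations` is a hypothetical helper.

```python
import math


def n_combinations(n_snps, order):
    """Number of distinct SNP sets of a given interaction order."""
    return math.comb(n_snps, order)


N_SNPS = 500_000  # assumed size of a typical GWAS panel (illustrative)

for order in (1, 2, 3):
    print(order, n_combinations(N_SNPS, order))
```

Under this assumption there are already about 1.25 × 10^11 SNP pairs, and the count grows by roughly another factor of N_SNPS / 3 when moving to triples, which is why exhaustive testing of high-order interactions quickly becomes impractical.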

During the past decade, two types of heuristic computational methods have been proposed to detect epistatic interactions: prediction/classification-based methods and association-based methods. Prediction/classification-based methods try to find the best set of SNPs, which can generate the highest prediction/classification accuracy including, for example, multifactor dimensionality reduction (MDR) [4], penalized logistic regression (e.g., stepPLR [5], and lassoPLR [6]), support vector machine (SVM) [7], and random forest [8]. MDR is a non-parametric and model-free method based on constructing a risk table for every SNP combination [4]. If the case and control ratio in a cell of this risk table is larger than 1, MDR will label it as "high risk", otherwise, "low risk". Using the risk table, MDR can predict disease risk and will select the SNP combination with the highest prediction accuracy. StepPLR and lassoPLR make some modifications to avoid the overfitting problems that standard logistic regression methods suffer from [9] when detecting epistatic interactions. For example, stepPLR combines the logistic regression criterion with a penalization of the L2-norm of the coefficients. This modification makes stepPLR more robust to high-order epistatic interactions [5]. Two machine learning methods, SVM [7] and random forest [8], have also been applied to detecting epistatic interactions. Machine learning methods are based on binary classification (prediction) and treat cases as positives and controls as negatives in SNP data. They use SVM or random forest as a predictor and select a set of SNPs with the highest prediction/classification accuracy by feature selection. Some prediction/classification-based methods can only be applied to small-scale analysis (i.e., a small set of SNPs) due to their computational complexity. Moreover, almost all prediction/classification-based methods tend to introduce a large number of false positives, which may result in a huge cost for further biological validation experiments [10].
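The MDR risk-table idea described above can be sketched in a few lines. This is a simplified illustration, not the published MDR implementation: it scores accuracy on the same data used to build the table (MDR as usually described relies on cross-validation), it uses binary genotypes for brevity, and the function names and toy data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def mdr_risk_table(genotypes, labels, snps, threshold=1.0):
    """Label each multi-locus genotype cell "high" or "low" risk.
    A cell is high risk when its case:control ratio exceeds the threshold."""
    counts = defaultdict(lambda: [0, 0])  # cell -> [cases, controls]
    for row, label in zip(genotypes, labels):
        cell = tuple(row[s] for s in snps)
        counts[cell][0 if label == 1 else 1] += 1
    table = {}
    for cell, (cases, controls) in counts.items():
        # cases > threshold * controls avoids dividing by zero controls.
        table[cell] = "high" if cases > threshold * controls else "low"
    return table


def mdr_accuracy(genotypes, labels, snps, table):
    """Fraction of individuals whose status matches the cell's risk label."""
    correct = 0
    for row, label in zip(genotypes, labels):
        cell = tuple(row[s] for s in snps)
        predicted = 1 if table.get(cell, "low") == "high" else 0
        correct += predicted == label
    return correct / len(labels)


# Toy data (assumed): three binary SNPs, disease status depends jointly on
# SNP 0 and SNP 2, and SNP 1 is irrelevant.
genotypes = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
labels = [a ^ c for a, _, c in genotypes]

accuracy = {}
for pair in combinations(range(3), 2):
    table = mdr_risk_table(genotypes, labels, pair)
    accuracy[pair] = mdr_accuracy(genotypes, labels, pair, table)

best_pair = max(accuracy, key=accuracy.get)
print(best_pair, accuracy[best_pair])
```

In this toy setting only the pair that actually interacts produces cells dominated by cases or by controls, so it is the pair selected.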

Bayesian epistasis association mapping (BEAM) is a scalable and association-based method [11]. It partitions SNPs into three groups: group 0 is for normal SNPs, group 1 contains disease SNPs affecting disease risk independently, and group 2 contains disease SNPs that jointly contribute to the disease risk (interactions). Given a fixed partition, BEAM can get the posterior probability of this partition from SNP data based on Bayesian theory. A Markov Chain Monte Carlo method is used to reach the optimal SNP partition with maximum posterior probability in BEAM. One drawback of BEAM is that identifying both single disease SNP and SNP combinations simultaneously makes BEAM over-complex and weakens its power.

Recently, we proposed a new Markov blanket-based method, DASSO-MB, to detect epistatic interactions in case-control studies [10]. The Markov blanket is a minimal set of variables that completely shields the target variable from all other variables, based on the Markov condition property [12]. Thus, Markov blanket methods can detect the causal disease SNPs with the fewest false positives. Furthermore, the heuristic search strategy in Markov blanket methods can avoid the time-consuming training process required by SVM and random forests. However, the faithfulness assumption in Markov blanket methods, which cannot always be guaranteed, may hinder their application in detecting epistatic interactions [13].

In this paper, we address the two critical challenges (small sample sizes and high dimensionality) in epistatic interaction detection by introducing a score-based Bayesian network structure learning method, EpiBN (Epistatic interaction detection using Bayesian Network model), which employs a Branch-and-Bound technique and a new scoring function. Bayesian networks provide a succinct representation of the joint probability distribution and conditional independence among a set of variables. In general, a score-based structure learning method for Bayesian networks first defines a scoring function reflecting the fitness between each possible structure and the observed data, and then searches for a structure with the maximum score. Compared with Markov blanket methods, the merits of applying a score-based Bayesian network structure learning method to epistatic interaction detection include: (1) the faithfulness assumption can be relaxed and (2) a heuristic search method can solve the classical XOR (Exclusive or) problem [14]. We apply the EpiBN method to simulated datasets based on four disease models and three real datasets: Age-related Macular Degeneration (AMD) dataset, late-onset Alzheimer's disease (LOAD) dataset, and autism dataset. We demonstrate that the proposed method outperforms some commonly-used methods such as SVM, MDR, and BEAM, especially when the number of samples is small.
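The general score-and-search idea, and why it copes with an XOR-type interaction, can be illustrated as follows. This sketch is not EpiBN: it substitutes a generic BIC-style score for the paper's scoring function and an exhaustive search over small parent sets for Branch-and-Bound, and all names and data are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def bic_score(data, labels, parents):
    """BIC-style score for the disease node given candidate parent SNPs:
    maximized log-likelihood minus a penalty on the number of parameters."""
    n = len(labels)
    configs = [tuple(row[p] for p in parents) for row in data]
    joint = Counter(zip(configs, labels))
    marginal = Counter(configs)
    log_lik = sum(c * math.log(c / marginal[cfg]) for (cfg, _), c in joint.items())
    # One free parameter per observed parent configuration (binary disease).
    n_params = len(marginal)
    return log_lik - 0.5 * n_params * math.log(n)


def best_parent_set(data, labels, max_parents=2):
    """Exhaustively score every parent set up to max_parents SNPs."""
    n_snps = len(data[0])
    candidates = [
        ps for k in range(max_parents + 1) for ps in combinations(range(n_snps), k)
    ]
    return max(candidates, key=lambda ps: bic_score(data, labels, ps))


# Toy XOR model (assumed): disease status is SNP 0 XOR SNP 1; SNP 2 is noise.
data = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)] * 10
labels = [a ^ b for a, b, _ in data]

print(bic_score(data, labels, ()))      # no parents
print(bic_score(data, labels, (0,)))    # single SNP: no marginal signal
print(best_parent_set(data, labels))    # the interacting pair
```

Under XOR neither SNP is informative on its own, so a single-SNP parent scores below the empty parent set once the penalty is applied, whereas scoring the pair jointly recovers the interaction.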


Our population epigenetic results, obtained in the setting of an innate immunity cell population, demonstrate extensive differences in DNA methylation profiles between two populations that differ in their genetic ancestry but share the same present-day environment. Such population differences were observed at the epigenome-wide level (explaining 12% of the total variance in DNA methylation) and involved 12,050 sites that were mostly located in genes with functions related to cell periphery or immune response regulation. Previous studies have searched for ancestry-related differences in DNA methylation in various human populations and cell types [16, 38,39,40,41, 43, 95]. Although comparisons across studies are complicated by differences in experimental settings and statistical thresholds used to detect ancestry-associated CpG sites, these range from 299 between Caucasian- and Asian/mixed-descent individuals living in Canada [16] to 36,897 between European CEU and African YRI [39]. An interesting insight that can be drawn from our analyses is that genes involved in the activation and regulation of immune responses tend to present higher levels of DNA methylation in individuals of European ancestry, with respect to those of African ancestry, mostly owing to genetic control. That up to 16% of immune-related genes that are hyper-methylated in Europeans are also differentially expressed between populations [48] could provide a mechanistic explanation for the ancestry-related differences in transcriptional responses to bacteria reported in macrophages, where European ancestry is associated with lower inflammatory responses [49].

Although variation in past environmental exposures and socioeconomic factors may contribute to population differences in DNA methylation, we found that 70% of differentially methylated sites between African and European ancestry groups were associated with at least one meQTL. This indicates that population differences in DNA methylation are mostly driven by DNA sequence variants [38, 40,41,42]. In some cases, a single genetic variant can account for important population differences at multiple CpG sites, as attested by the trans-meQTL we detected at CTCF, whose local genetic variation has been shown to alter distant DNA methylation patterns in whole blood [65]. We show that a CTCF variant (rs7203742) regulates DNA methylation of 30 distant CpGs, 40% of which are differentially methylated between populations. We also found that all CTCF trans-regulated CpGs fall within a TFBS, confirming our initial hypothesis about the mechanism by which a genetic variant might alter DNA methylation at a distant CpG site. Interestingly, 9 out of the 30 CTCF trans-regulated CpGs fall within a TFBS of CTCF, while the remaining 21 fall within a TFBS specific to other TFs such as YY1, ESR1, or ZNF143. This observation is consistent with a model of pioneer transcription factor activity [96] and suggests that CTCF acts as a pioneer factor that will generate changes in chromatin state that, in turn, will become accessible for binding of secondary factors.

At the genome-wide level, we find that the quantitative impact of DNA methylation on gene expression variation is lower than that reported by some previous studies, possibly reflecting differences in experimental settings and statistical power (e.g., cell types and sample sizes) [23, 65, 84, 89]. For example, a study of 204 healthy newborns detected substantial variation across tissues in the number of genes whose expression levels were associated with DNA methylation, ranging from 596 in fibroblasts to 3838 in T cells [23]. We detected, at the non-stimulated state, 811 eQTM-genes (6% of the total number of expressed genes), a figure that drops to 230 for reQTM-genes across stimulation conditions. However, a limitation of our study is that we measured DNA methylation at the basal state, while gene expression was obtained after 6 h. Studies including a more comprehensive range of epigenetic marks obtained at different time points—in different cell types and tissues originating from individuals of various ancestries—are needed to more precisely understand the interplay between these regulatory elements and quantify their respective roles in the regulation of transcriptional activity.

The detected eQTMs were found to be drastically enriched in genetic control (OR 33.2, P < 1 × 10^−326, Fig. 3c), which highlights the coordinated action of genetic and epigenetic factors in driving gene expression variation but raises questions about the causal role of DNA methylation [56]. Although cautious interpretation of causality in mediation analyses is required [97], our analysis provides a first estimate of the potential direct role of DNA methylation in regulating transcriptional activity, in both resting and stimulated monocytes. At the non-stimulated state, we find that 20% of eQTM-genes show evidence of a causal mediation effect of DNA methylation. Although a similar extent of mediation was found upon immune stimulation (17%), we detected specific patterns upon treatment with viral challenges, where a higher occurrence of positive associations was observed among mediated cases. These findings mostly reflected cases where high levels of DNA methylation were associated with low gene expression in the non-stimulated condition, thus requiring stronger responses to reach high levels of gene expression upon cell perturbation. These trends suggest a major, direct, and context-specific role of DNA methylation in the regulation of immune responses, whose complexity requires further investigation.

Finally, we found that meQTLs, in particular those associated with ancestry-related differences, are enriched in GWAS hits related to immune disorders. This suggests that DNA methylation has an important impact on the cellular activity of monocytes and ultimately affects phenotypic outcomes. Nonetheless, a large fraction of the variance of DNA methylation and gene expression remains unexplained. Additional work is needed to quantify the relative impact of genetic, epigenetic, environmental, and lifestyle factors in driving variation of DNA methylation and gene expression, both in resting and stimulated cells. Furthermore, although the causal mediation analyses presented in this study reinforce the notion that DNA methylation can play a direct role in regulating gene expression in humans [23, 98], monitoring the kinetics of variation in DNA methylation and gene expression after exposure to different infectious agents will broaden our understanding of the interplay between these molecular phenotypes and their impact on endpoint phenotypes.


Through the combination of biochemistry, muscle cell biology, and the use of animal models, molecular genetic information has shed light on the pathomechanism of muscular dystrophies. As reviewed, primary genetic or secondary functional disruption of the matrix–plasma membrane linkage appears to be the cause of several forms of muscular dystrophy. The mechanical property of skeletal muscle is maintained by contractile elements and elastic elements, which are provided by the sarcomere and the extracellular matrix, respectively. In addition, lateral transmission of the contractile force is mediated by matrix–receptor interaction. Thus, because the matrix–cytoskeleton linkage is key to the maintenance of skeletal muscle function, several therapeutic strategies have been proposed, as introduced in this review and by others. Hopefully, such approaches will contribute to a greater understanding of the disease etiology and lead to appropriate therapeutic strategies to treat muscular dystrophies.


PCWDEs originally obtained by beetles from bacteria and fungi via HGT enabled efficient symbiont-independent digestion of plant biomass, the most abundant source of carbohydrates on Earth. We propose that this key innovation facilitated the evolution of uniquely specialized plant-feeding habits, such as leaf and seed mining and stem and wood boring, and likely also some forms of specialized fungus feeding, for example fungus farming in Platypodinae and Scolytinae (67). While this remains uncertain, the appearance and expansions of putative PCWDEs and invertases in beetle genomes are correlated with significant increases in diversification rate among specialized herbivorous beetles (Buprestoidea and Phytophaga). Our findings may help explain the disparity in the degree of feeding specialization and species richness observed among groups of herbivorous beetles possessing or lacking a diverse repertoire of PCWDEs, as well as the existence of groups of beetles that feed on plants (notably including angiosperms) but are not unusually species-rich. PCWDEs originally obtained via HGT likely played an important role in the adaptive radiation of other groups of herbivorous insects, for example certain Lepidoptera and Hemiptera, which have at least some of these gene families (8, 14, 68).

The extraordinary diversity of beetles thus appears to have resulted from multiple factors, including a low rate of lineage extinction over a long evolutionary history (2, 5), codiversification with angiosperms (2), and the adaptive radiation of specialized herbivorous beetles following convergent horizontal transfers (and “domestication”) of microbial genes encoding PCWDEs. More broadly, our findings show how large-scale genomic data can reveal new insights into the evolution and genomic basis of insect biodiversity and underscore the intimacy and complexity of the relationships between insects, plants, and microorganisms, as well as their concerted roles in the “origins of terrestrial organic diversity” (60).


We thank Dr. Karsten Zengler and Marc Abrams for reviewing the manuscript and providing constructive suggestions. This work was supported by NIH Grants AI124316 and GM057089, and Novo Nordisk Foundation Grant NNF10CC1016517. We are grateful to Drs. Rebecca Lindsey, Nancy Strockbine, Shi Chen, Sang Jun Lee, Dana Boyd, Mehmet Berkmen, Henning Sørum, David Rozak, Shannon Lyn Johnson, Craig Winstanely, Roger Johnson, and Weihua Huang for generously providing bacterial strains for this study.