Transcription Factor expression level

Could someone explain to me in detail what is meant by "TF expression level", what it represents, and how it is measured? It is used, for example, in figure 3 of this paper: A census of human transcription factors: function, expression and evolution.

Transcription factors are an important cellular regulatory mechanism to control gene expression. The expression of TF is important for cell function, differentiation and also for responses to environmental signals.

We know of quite a number of different TFs, but the exact function is known for only a small fraction of them (that is, in which tissues they are expressed, when, and which genes they regulate). This is where the paper you read comes in.

They test for the expression of roughly 1400 TF in a number of different tissues, to see where they are expressed (figure 3 from the paper):

The group used microarrays to analyze the expression levels (here, how much transcript is made from each gene, on the assumption that the transcript is also translated into protein). Quantification used the signal strength on the array: the more transcript is present, the stronger the signal.
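To make "signal strength as expression level" concrete, here is a minimal sketch in Python. It is not the paper's pipeline: the intensity values, the tissue names and the detection cutoff are all invented for illustration. Log2 transformation is used only because it is a common way to report array signals.

```python
import math

# Hypothetical background-corrected probe intensities (arbitrary units)
# for one TF across three tissues; the values are invented for illustration.
intensities = {"liver": 1800.0, "brain": 95.0, "heart": 410.0}

# Hypothetical detection cutoff; a real study would derive this from the data.
DETECTION_THRESHOLD = 200.0


def expression_level(intensity):
    """Return the log2-transformed signal for one probe intensity."""
    return math.log2(intensity)


def is_expressed(intensity, threshold=DETECTION_THRESHOLD):
    """Call the TF 'expressed' when its raw signal exceeds the cutoff."""
    return intensity > threshold


levels = {tissue: expression_level(v) for tissue, v in intensities.items()}
calls = {tissue: is_expressed(v) for tissue, v in intensities.items()}

for tissue in intensities:
    print(f"{tissue}: log2 signal = {levels[tissue]:.2f}, expressed = {calls[tissue]}")
```

A stronger signal gives a higher level, so in this toy data the TF would be called expressed in liver and heart but not in brain.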

The expression of a certain TF in a specific tissue suggests that it is involved in gene regulation there, either directly or indirectly.

2.10: Regulation of Gene Expression

Regulation is all about decision making. Gene regulation is, therefore, all about understanding how cells make decisions about which genes to turn on, turn off or to tune up or tune down. In the following section we discuss some of the fundamental mechanisms and principles used by cells to regulate gene expression in response to changes in cellular or external factors. This biology is important for understanding how cells adjust to changing environments, including how some cells in multicellular organisms decide to become specialized for certain functions (e.g. tissues).

Since the subject of regulation is both a very deep and broad topic of study in biology, in Bis2a we don't try to cover every detail - there are simply too many. Rather, as we have done for all other topics, we try to focus on (a) outlining some of the core logical constructs and questions that you must have when you approach ANY scenario involving regulation, (b) learning some common vocabulary and ubiquitous mechanisms and (c) examining a few concrete examples that illustrate the points made in a and b.

Synthetic biology technologies for beta cell generation

Synthetic messenger RNAs

Synthetic messenger RNAs (Syn-mRNAs) are the most robust and efficient tools to generate iPSCs from somatic cells and outperform alternative integrative and non-integrative DNA-based reprogramming methods. 97 Recently, they have been used to differentiate human pancreatic ductal cells into glucose-sensitive insulin-secreting cells. 98 Corritore et al. 98 demonstrated that transfection of synthetic mRNA encoding for the transcription factor MafA alone can reprogram adult pancreatic ductal cells into insulin-secreting cells. However, insulin and somatostatin co-expressing cells were also generated, and the glucose-responsive insulin secretion was not optimal. 98 The dosage and timing of the transcription factor expression can be controlled by using synthetic RNAs, which can aid the differentiation of beta cells from closely related cell types as well as PSCs.

Intriguingly, synthetic mRNA-based technology has also been used to efficiently sort and enrich desired cell types, including pancreatic beta-like cells, in vitro. 99 Miki et al. 99 developed a synthetic mRNA-based switch purification system utilizing microRNAs (miRNAs) that could sort a heterogeneous iPSC-derived differentiated population into a homogeneous one (Fig. 3B). In particular, the authors developed an mRNA-based switch for miRNA-375 and were able to sort exclusively insulin-producing cells derived from iPSCs. Therefore, syn-mRNAs hold enormous potential for controlling critical cell-fate decisions and for the purification of desired cell types from a heterogeneous iPSC-derived population.


Transcription factors are essential for the regulation of gene expression and are, as a consequence, found in all living organisms. The number of transcription factors found within an organism increases with genome size, and larger genomes tend to have more transcription factors per gene. [12]

There are approximately 2800 proteins in the human genome that contain DNA-binding domains, and 1600 of these are presumed to function as transcription factors, [3] though other studies indicate it to be a smaller number. [13] Therefore, approximately 10% of genes in the genome code for transcription factors, which makes this family the single largest family of human proteins. Furthermore, genes are often flanked by several binding sites for distinct transcription factors, and efficient expression of each of these genes requires the cooperative action of several different transcription factors (see, for example, hepatocyte nuclear factors). Hence, the combinatorial use of a subset of the approximately 2000 human transcription factors easily accounts for the unique regulation of each gene in the human genome during development. [11]
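The combinatorial argument can be made concrete with a quick count. The sketch below takes the text's figure of roughly 2000 transcription factors and, as an assumption not taken from the text, a rough count of 20,000 protein-coding genes. It only counts possible sets of factors and does not claim that cells use all of them.

```python
import math

N_TFS = 2000      # approximate number of human TFs quoted in the text
N_GENES = 20_000  # assumption: rough count of human protein-coding genes


def n_combinations(k, n=N_TFS):
    """Number of distinct unordered sets of k transcription factors."""
    return math.comb(n, k)


for k in (1, 2, 3):
    combos = n_combinations(k)
    print(f"sets of {k} TFs: {combos:,} (exceeds gene count: {combos > N_GENES})")
```

Even pairs of factors already give far more distinct sets than there are genes, which is the sense in which combinatorial use "easily accounts for" gene-specific regulation.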

Transcription factors bind to either enhancer or promoter regions of DNA adjacent to the genes that they regulate. Depending on the transcription factor, the transcription of the adjacent gene is either up- or down-regulated. Transcription factors use a variety of mechanisms for the regulation of gene expression. [14] These mechanisms include:

  • stabilize or block the binding of RNA polymerase to DNA
  • catalyze the acetylation or deacetylation of histone proteins. The transcription factor can either do this directly or recruit other proteins with this catalytic activity. Many transcription factors use one or the other of two opposing mechanisms to regulate transcription: [15]
      • histone acetyltransferase (HAT) activity – acetylates histone proteins, which weakens the association of DNA with histones and makes the DNA more accessible to transcription, thereby up-regulating transcription
      • histone deacetylase (HDAC) activity – deacetylates histone proteins, which strengthens the association of DNA with histones and makes the DNA less accessible to transcription, thereby down-regulating transcription

Transcription factors are one of the groups of proteins that read and interpret the genetic "blueprint" in the DNA. They bind to the DNA and help initiate a program of increased or decreased gene transcription. As such, they are vital for many important cellular processes. Below are some of the important functions and biological roles transcription factors are involved in:

    Basal transcription regulation

    In eukaryotes, an important class of transcription factors called general transcription factors (GTFs) are necessary for transcription to occur. [17] [18] [19] Many of these GTFs do not actually bind DNA, but rather are part of the large transcription preinitiation complex that interacts with RNA polymerase directly. The most common GTFs are TFIIA, TFIIB, TFIID (see also TATA binding protein), TFIIE, TFIIF, and TFIIH. [20] The preinitiation complex binds to promoter regions of DNA upstream of the gene that it regulates.

    Differential enhancement of transcription

    Other transcription factors differentially regulate the expression of various genes by binding to enhancer regions of DNA adjacent to regulated genes. These transcription factors are critical to making sure that genes are expressed in the right cell at the right time and in the right amount, depending on the changing requirements of the organism.

    Development

    Many transcription factors in multicellular organisms are involved in development. [21] Responding to stimuli, these transcription factors turn on/off the transcription of the appropriate genes, which, in turn, allows for changes in cell morphology or activities needed for cell fate determination and cellular differentiation. The Hox transcription factor family, for example, is important for proper body pattern formation in organisms as diverse as fruit flies and humans. [22] [23] Another example is the transcription factor encoded by the sex-determining region Y (SRY) gene, which plays a major role in determining sex in humans. [24]

    Response to intercellular signals

    Cells can communicate with each other by releasing molecules that produce signaling cascades within another receptive cell. If the signal requires upregulation or downregulation of genes in the recipient cell, often transcription factors will be downstream in the signaling cascade. [25] Estrogen signaling is an example of a fairly short signaling cascade that involves the estrogen receptor transcription factor: Estrogen is secreted by tissues such as the ovaries and placenta, crosses the cell membrane of the recipient cell, and is bound by the estrogen receptor in the cell's cytoplasm. The estrogen receptor then goes to the cell's nucleus and binds to its DNA-binding sites, changing the transcriptional regulation of the associated genes. [26]

    Response to environment

    Not only do transcription factors act downstream of signaling cascades related to biological stimuli but they can also be downstream of signaling cascades involved in environmental stimuli. Examples include heat shock factor (HSF), which upregulates genes necessary for survival at higher temperatures, [27] hypoxia inducible factor (HIF), which upregulates genes necessary for cell survival in low-oxygen environments, [28] and sterol regulatory element binding protein (SREBP), which helps maintain proper lipid levels in the cell. [29]

    Cell cycle control

    Many transcription factors, especially some that are proto-oncogenes or tumor suppressors, help regulate the cell cycle and as such determine how large a cell will get and when it can divide into two daughter cells. [30] [31] One example is the Myc oncogene, which has important roles in cell growth and apoptosis. [32]

    Pathogenesis

    Transcription factors can also be used to alter gene expression in a host cell to promote pathogenesis. Well-studied examples are the transcription-activator like effectors (TAL effectors) secreted by Xanthomonas bacteria. When injected into plants, these proteins can enter the nucleus of the plant cell, bind plant promoter sequences, and activate transcription of plant genes that aid in bacterial infection. [33] TAL effectors contain a central repeat region in which there is a simple relationship between the identity of two critical residues in sequential repeats and sequential DNA bases in the TAL effector's target site. [34] [35] This property likely makes it easier for these proteins to evolve in order to better compete with the defense mechanisms of the host cell. [36]

    It is common in biology for important processes to have multiple layers of regulation and control. This is also true with transcription factors: Not only do transcription factors control the rates of transcription to regulate the amounts of gene products (RNA and protein) available to the cell but transcription factors themselves are regulated (often by other transcription factors). Below is a brief synopsis of some of the ways that the activity of transcription factors can be regulated:

    Synthesis

    Transcription factors (like all proteins) are transcribed from a gene on a chromosome into RNA, and then the RNA is translated into protein. Any of these steps can be regulated to affect the production (and thus activity) of a transcription factor. An implication of this is that transcription factors can regulate themselves. For example, in a negative feedback loop, the transcription factor acts as its own repressor: If the transcription factor protein binds the DNA of its own gene, it down-regulates the production of more of itself. This is one mechanism to maintain low levels of a transcription factor in a cell. [37]
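The negative feedback loop described above can be illustrated with a toy simulation. This is a minimal sketch, not a model from the cited source: the rate law (production `beta / (1 + x / K)`, first-order decay `alpha * x`), the parameter values and the simple Euler integration are all assumptions chosen for illustration.

```python
def simulate(beta=10.0, alpha=1.0, K=1.0, autoregulated=True, dt=0.01, t_end=20.0):
    """Integrate the TF level x over time with forward Euler steps.

    beta  : maximal production rate (hypothetical units)
    alpha : first-order degradation/dilution rate
    K     : TF level at which self-repression halves production
    """
    x = 0.0
    steps = int(round(t_end / dt))
    for _ in range(steps):
        # With autoregulation, the TF represses its own production.
        production = beta / (1.0 + x / K) if autoregulated else beta
        x += dt * (production - alpha * x)
    return x


with_feedback = simulate(autoregulated=True)
without_feedback = simulate(autoregulated=False)

print(f"steady level with self-repression:    {with_feedback:.2f}")
print(f"steady level without self-repression: {without_feedback:.2f}")
```

With these assumed parameters the self-repressed factor settles at a much lower level than the unregulated one, which matches the text's point that negative autoregulation is one way to keep a transcription factor at low levels.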

    Nuclear localization

    In eukaryotes, transcription factors (like most proteins) are transcribed in the nucleus but are then translated in the cell's cytoplasm. Many proteins that are active in the nucleus contain nuclear localization signals that direct them to the nucleus. But, for many transcription factors, this is a key point in their regulation. [38] Important classes of transcription factors such as some nuclear receptors must first bind a ligand while in the cytoplasm before they can relocate to the nucleus. [38]

    Activation

    Transcription factors may be activated (or deactivated) through their signal-sensing domain by a number of mechanisms including:

    • ligand binding – Not only is ligand binding able to influence where a transcription factor is located within a cell but ligand binding can also affect whether the transcription factor is in an active state and capable of binding DNA or other cofactors (see, for example, nuclear receptors).
    • phosphorylation [39][40] – Many transcription factors such as STAT proteins must be phosphorylated before they can bind DNA.
    • interaction with other transcription factors (e.g., homo- or hetero-dimerization) or coregulatory proteins

    Accessibility of DNA-binding site

    In eukaryotes, DNA is organized with the help of histones into compact particles called nucleosomes, where sequences of about 147 DNA base pairs make 1.65 turns around histone protein octamers. DNA within nucleosomes is inaccessible to many transcription factors. Some transcription factors, so-called pioneer factors, are still able to bind their DNA binding sites on the nucleosomal DNA. For most other transcription factors, the nucleosome should be actively unwound by molecular motors such as chromatin remodelers. [41] Alternatively, the nucleosome can be partially unwrapped by thermal fluctuations, allowing temporary access to the transcription factor binding site. In many cases, a transcription factor needs to compete for binding to its DNA binding site with other transcription factors and histones or non-histone chromatin proteins. [42] Pairs of transcription factors and other proteins can play antagonistic roles (activator versus repressor) in the regulation of the same gene.

    Availability of other cofactors/transcription factors

    Most transcription factors do not work alone. Many large TF families form complex homotypic or heterotypic interactions through dimerization. [43] For gene transcription to occur, a number of transcription factors must bind to DNA regulatory sequences. This collection of transcription factors, in turn, recruits intermediary proteins such as cofactors that allow efficient recruitment of the preinitiation complex and RNA polymerase. Thus, for a single transcription factor to initiate transcription, all of these other proteins must also be present, and the transcription factor must be in a state where it can bind to them if necessary. Cofactors are proteins that modulate the effects of transcription factors. Cofactors are interchangeable between specific gene promoters; the protein complex that occupies the promoter DNA and the amino acid sequence of the cofactor determine its spatial conformation. For example, certain steroid receptors can exchange cofactors with NF-κB, which is a switch between inflammation and cellular differentiation; thereby steroids can affect the inflammatory response and function of certain tissues. [44]

    Interaction with methylated cytosine

    Transcription factors and methylated cytosines in DNA both have major roles in regulating gene expression. (Methylation of cytosine in DNA primarily occurs where cytosine is followed by guanine in the 5’ to 3’ DNA sequence, a CpG site.) Methylation of CpG sites in a promoter region of a gene usually represses gene transcription, [45] while methylation of CpGs in the body of a gene increases expression. [46] TET enzymes play a central role in demethylation of methylated cytosines. Demethylation of CpGs in a gene promoter by TET enzyme activity increases transcription of the gene. [47]

    The DNA binding sites of 519 transcription factors were evaluated. [48] Of these, 169 transcription factors (33%) did not have CpG dinucleotides in their binding sites, and 33 transcription factors (6%) could bind to a CpG-containing motif but did not display a preference for a binding site with either a methylated or unmethylated CpG. There were 117 transcription factors (23%) that were inhibited from binding to their binding sequence if it contained a methylated CpG site, 175 transcription factors (34%) that had enhanced binding if their binding sequence had a methylated CpG site, and 25 transcription factors (5%) were either inhibited or had enhanced binding depending on where in the binding sequence the methylated CpG was located.
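The percentages quoted above follow directly from the counts. The short check below uses only the numbers given in the text; the category labels are paraphrased and the variable names are illustrative.

```python
# Counts of transcription factors per category, as quoted in the text.
counts = {
    "no CpG in binding site": 169,
    "CpG present, no methylation preference": 33,
    "binding inhibited by methylated CpG": 117,
    "binding enhanced by methylated CpG": 175,
    "effect depends on CpG position": 25,
}

total = sum(counts.values())

# Round each share to the nearest whole percent, as the text does.
percentages = {label: round(100 * n / total) for label, n in counts.items()}

print(f"total transcription factors evaluated: {total}")
for label, pct in percentages.items():
    print(f"{label}: {counts[label]} ({pct}%)")
```

The five categories add up to all 519 factors. Because each share is rounded separately, the rounded percentages sum to 101 rather than 100.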

    TET enzymes do not specifically bind to methylcytosine except when recruited (see DNA demethylation). Multiple transcription factors important in cell differentiation and lineage specification, including NANOG, SALL4A, WT1, EBF1, PU.1, and E2A, have been shown to recruit TET enzymes to specific genomic loci (primarily enhancers) to act on methylcytosine (mC) and convert it to hydroxymethylcytosine (hmC) (and in most cases marking them for subsequent complete demethylation to cytosine). [49] TET-mediated conversion of mC to hmC appears to disrupt the binding of 5mC-binding proteins including MECP2 and MBD (Methyl-CpG-binding domain) proteins, facilitating nucleosome remodeling and the binding of transcription factors, thereby activating transcription of those genes. EGR1 is an important transcription factor in memory formation. It has an essential role in brain neuron epigenetic reprogramming. The transcription factor EGR1 recruits the TET1 protein that initiates a pathway of DNA demethylation. [50] EGR1, together with TET1, is employed in programming the distribution of methylation sites on brain DNA during brain development and in learning (see Epigenetics in learning and memory).

    Transcription factors are modular in structure and contain the following domains: [1]

    • DNA-binding domain (DBD), which attaches to specific sequences of DNA (enhancer or promoter sequences) adjacent to regulated genes. DNA sequences that bind transcription factors are often referred to as response elements.
    • Activation domain (AD), which contains binding sites for other proteins such as transcription coregulators. These binding sites are frequently referred to as activation functions (AFs), transactivation domains (TADs) or trans-activating domains (not to be confused with topologically associating domains, also abbreviated TAD). [51]
    • An optional signal-sensing domain (SSD) (e.g., a ligand binding domain), which senses external signals and, in response, transmits these signals to the rest of the transcription complex, resulting in up- or down-regulation of gene expression. Also, the DBD and signal-sensing domains may reside on separate proteins that associate within the transcription complex to regulate gene expression.

    DNA-binding domain

    The portion (domain) of the transcription factor that binds DNA is called its DNA-binding domain. Below is a partial list of some of the major families of DNA-binding domains/transcription factors:

    Family | InterPro | Pfam | SCOP
    basic helix-loop-helix [52] | IPR001092 | PF00010 | 47460
    basic-leucine zipper (bZIP) [53] | IPR004827 | PF00170 | 57959
    C-terminal effector domain of the bipartite response regulators | IPR001789 | PF00072 | 46894
    AP2/ERF/GCC box | IPR001471 | PF00847 | 54176
    helix-turn-helix [54] | | |
    homeodomain proteins (encoded by homeobox genes; they play critical roles in the regulation of development) [55] [56] | IPR009057 | PF00046 | 46689
    lambda repressor-like | IPR010982 | | 47413
    srf-like (serum response factor) | IPR002100 | PF00319 | 55455
    paired box [57] | | |
    winged helix | IPR013196 | PF08279 | 46785
    zinc fingers [58] | | |
    * multi-domain Cys2His2 zinc fingers [59] | IPR007087 | PF00096 | 57667
    * Zn2/Cys6 | | | 57701
    * Zn2/Cys8 nuclear receptor zinc finger | IPR001628 | PF00105 | 57716

    Response elements

    The DNA sequence that a transcription factor binds to is called a transcription factor-binding site or response element. [60]

    Transcription factors interact with their binding sites using a combination of electrostatic (of which hydrogen bonds are a special case) and Van der Waals forces. Due to the nature of these chemical interactions, most transcription factors bind DNA in a sequence specific manner. However, not all bases in the transcription factor-binding site may actually interact with the transcription factor. In addition, some of these interactions may be weaker than others. Thus, transcription factors do not bind just one sequence but are capable of binding a subset of closely related sequences, each with a different strength of interaction.

    For example, although the consensus binding site for the TATA-binding protein (TBP) is TATAAAA, the TBP transcription factor can also bind similar sequences such as TATATAT or TATATAA.
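One simple way to see how close these variants are to the consensus is to count mismatching positions. This is only a crude illustration: a mismatch count is not a measure of binding strength, and real analyses typically use position weight matrices built from experimental data. The function name is hypothetical.

```python
CONSENSUS = "TATAAAA"  # TBP consensus site quoted in the text


def mismatches(site, consensus=CONSENSUS):
    """Count positions at which a candidate site differs from the consensus."""
    if len(site) != len(consensus):
        raise ValueError("site and consensus must have the same length")
    return sum(a != b for a, b in zip(site, consensus))


for site in ("TATAAAA", "TATATAT", "TATATAA"):
    print(f"{site}: {mismatches(site)} mismatch(es) to {CONSENSUS}")
```

Both variants named in the text differ from the consensus at only one or two of the seven positions, which fits the idea of a family of closely related binding sequences.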

    Because transcription factors can bind a set of related sequences and these sequences tend to be short, potential transcription factor binding sites can occur by chance if the DNA sequence is long enough. It is unlikely, however, that a transcription factor will bind all compatible sequences in the genome of the cell. Other constraints, such as DNA accessibility in the cell or availability of cofactors may also help dictate where a transcription factor will actually bind. Thus, given the genome sequence it is still difficult to predict where a transcription factor will actually bind in a living cell.
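The claim that short sites occur by chance can be estimated with simple arithmetic. The sketch below assumes a random sequence in which the four bases are equally frequent and independent, which real genomes are not, so the result is an order-of-magnitude guide only. The genome length of 3 billion base pairs is an assumed round figure for the human genome.

```python
def expected_chance_sites(site_length, genome_length):
    """Expected number of exact matches to one fixed site on one strand.

    Assumes equal, independent base frequencies (probability 1/4 per base).
    """
    n_positions = genome_length - site_length + 1
    if n_positions <= 0:
        return 0.0
    return n_positions / 4 ** site_length


HUMAN_GENOME_BP = 3_000_000_000  # assumption: rough human genome size

for k in (6, 7, 10, 16):
    expected = expected_chance_sites(k, HUMAN_GENOME_BP)
    print(f"{k}-bp site: about {expected:,.1f} chance matches per strand")
```

Under these assumptions a 7-bp site is expected far more often than any factor could meaningfully regulate, which is why accessibility and cofactors matter for where binding actually happens.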

    Additional recognition specificity, however, may be obtained through the use of more than one DNA-binding domain (for example tandem DBDs in the same transcription factor or through dimerization of two transcription factors) that bind to two or more adjacent sequences of DNA.

    Transcription factors are of clinical significance for at least two reasons: (1) mutations can be associated with specific diseases, and (2) they can be targets of medications.

    Disorders

    Due to their important roles in development, intercellular signaling, and cell cycle, some human diseases have been associated with mutations in transcription factors. [61]

    Many transcription factors are either tumor suppressors or oncogenes, and, thus, mutations or aberrant regulation of them is associated with cancer. Three groups of transcription factors are known to be important in human cancer: (1) the NF-kappaB and AP-1 families, (2) the STAT family and (3) the steroid receptors. [62]

    Below are a few of the better-studied examples:

    Condition | Description | Locus
    Rett syndrome | Mutations in the MECP2 transcription factor are associated with Rett syndrome, a neurodevelopmental disorder. [63] [64] | Xq28
    Diabetes | A rare form of diabetes called MODY (Maturity onset diabetes of the young) can be caused by mutations in hepatocyte nuclear factors (HNFs) [65] or insulin promoter factor-1 (IPF1/Pdx1). [66] | multiple
    Developmental verbal dyspraxia | Mutations in the FOXP2 transcription factor are associated with developmental verbal dyspraxia, a disease in which individuals are unable to produce the finely coordinated movements required for speech. [67] | 7q31
    Autoimmune diseases | Mutations in the FOXP3 transcription factor cause a rare form of autoimmune disease called IPEX. [68] | Xp11.23-q13.3
    Li-Fraumeni syndrome | Caused by mutations in the tumor suppressor p53. [69] | 17p13.1
    Breast cancer | The STAT family is relevant to breast cancer. [70] | multiple
    Multiple cancers | The HOX family are involved in a variety of cancers. [71] | multiple
    Osteoarthritis | Mutation or reduced activity of SOX9 [72] |

    Potential drug targets

    Approximately 10% of currently prescribed drugs directly target the nuclear receptor class of transcription factors. [73] Examples include tamoxifen and bicalutamide for the treatment of breast and prostate cancer, respectively, and various types of anti-inflammatory and anabolic steroids. [74] In addition, transcription factors are often indirectly modulated by drugs through signaling cascades. It might be possible to directly target other less-explored transcription factors such as NF-κB with drugs. [75] [76] [77] [78] Transcription factors outside the nuclear receptor family are thought to be more difficult to target with small molecule therapeutics since it is not clear that they are "drugable" but progress has been made on Pax2 [79] [80] and the notch pathway. [81]

    Gene duplications have played a crucial role in the evolution of species. This applies particularly to transcription factors. Once a transcription factor gene has been duplicated, mutations can accumulate in one copy without negatively affecting the regulation of downstream targets. However, changes of the DNA binding specificities of the single-copy LEAFY transcription factor, which occurs in most land plants, have recently been elucidated. In that respect, a single-copy transcription factor can undergo a change of specificity through a promiscuous intermediate without losing function. Similar mechanisms have been proposed in the context of all alternative phylogenetic hypotheses, and the role of transcription factors in the evolution of all species. [82] [83]

    There are different technologies available to analyze transcription factors. On the genomic level, DNA sequencing [84] and database research are commonly used. [85] The protein form of the transcription factor can be detected on a western blot using specific antibodies. Electrophoretic mobility shift assays (EMSA) [86] can detect the activation profile of transcription factors. A multiplex approach to activation profiling is a TF chip system, in which several different transcription factors can be detected in parallel.

    The most commonly used method for identifying transcription factor binding sites is chromatin immunoprecipitation (ChIP). [87] This technique relies on chemical fixation of chromatin with formaldehyde, followed by co-precipitation of DNA and the transcription factor of interest using an antibody that specifically targets that protein. The DNA sequences can then be identified by microarray or high-throughput sequencing (ChIP-seq) to determine transcription factor binding sites. If no antibody is available for the protein of interest, DamID may be a convenient alternative. [88]

    As described in more detail below, transcription factors may be classified by their (1) mechanism of action, (2) regulatory function, or (3) sequence homology (and hence structural similarity) in their DNA-binding domains.

    Mechanistic

    There are two mechanistic classes of transcription factors:

    • General transcription factors are involved in the formation of a preinitiation complex. The most common are abbreviated as TFIIA, TFIIB, TFIID, TFIIE, TFIIF, and TFIIH. They are ubiquitous and interact with the core promoter region surrounding the transcription start site(s) of all class II genes. [89]
    • Upstream transcription factors are proteins that bind somewhere upstream of the initiation site to stimulate or repress transcription. These are roughly synonymous with specific transcription factors, because they vary considerably depending on what recognition sequences are present in the proximity of the gene. [90]

    Functional

    Transcription factors have been classified according to their regulatory function: [11]

    • I. constitutively active – present in all cells at all times – general transcription factors, Sp1, NF1, CCAAT
    • II. conditionally active – requires activation
      • II.A developmental (cell specific) – expression is tightly controlled, but, once expressed, requires no additional activation – GATA, HNF, PIT-1, MyoD, Myf5, Hox, Winged Helix
      • II.B signal-dependent – requires external signal for activation
        • II.B.1 extracellular ligand (endocrine or paracrine)-dependent – nuclear receptors
        • II.B.2 intracellular ligand (autocrine)-dependent - activated by small intracellular molecules – SREBP, p53, orphan nuclear receptors
        • II.B.3 cell membrane receptor-dependent – second messenger signaling cascades resulting in the phosphorylation of the transcription factor
          • II.B.3.a resident nuclear factors – reside in the nucleus regardless of activation state – CREB, AP-1, Mef2
          • II.B.3.b latent cytoplasmic factors – inactive form resides in the cytoplasm, but, when activated, is translocated into the nucleus – STAT, R-SMAD, NF-κB, Notch, TUBBY, NFAT

          Structural

          Transcription factors are often classified based on the sequence similarity and hence the tertiary structure of their DNA-binding domains: [91] [10] [92] [9]

          Function of Transcription Factors

          The principal role transcription factors play is in allowing cells to differentiate. Through their ability to initiate or repress site-specific transcription, each cell in our bodies can differentiate into a different cell type despite containing the exact same genetic code. Turning genes on or off allows cells to develop into the different tissues and organs that make up our bodies. (In fact, embryonic and adult stem cells are a class of undifferentiated cells that can differentiate into the cell type that they are placed next to.)

          Transcription factors also make genetic fine-tuning possible. Modulating the activity and the amount of transcription factor can upregulate (increase) or downregulate (decrease) the rates of the chosen gene’s transcription. Therefore, these alterations not only allow a gene to be expressed, but they determine at which level each gene is expressed. For example, when the insulin levels in our blood are elevated, our cells trigger a downregulation in the expression of insulin receptor. So, although insulin receptors continue to be made for life-sustaining purposes, the levels of expression subside to accommodate our body’s new internal conditions.

          The method of action is reflected in the classification the factor falls under, as discussed below.

          Regulation of Gene Expression | Genetics

          In this article we will discuss the regulation of gene expression in prokaryotes and eukaryotes.

          The DNA of a microbial cell consists of genes, a few to thousands, which do not express at the same time. At a particular time only a few genes express and synthesize the desired protein. The other genes remain silent at this moment and express when required. Requirement of gene expression is governed by the environment in which they grow. This shows that the genes have a property to switch on and switch off.

          The genetic code specifies the 20 different amino acids that make up the different proteins, each amino acid being encoded by codons. Synthesising every protein at all times would consume energy to no purpose, because not all proteins are needed at the same time.

          Hence, there is need to control the synthesis of those amino acids (proteins) which are not required. By doing this the energy of a living cell is conserved and cells become more competent. Therefore, a control system is operative which is known as gene regulation.

          There are certain substrates called inducers that induce enzyme synthesis. For example, if yeast cells are grown in a medium containing lactose, the enzyme lactase is formed. Lactase hydrolyses lactose into glucose and galactose. In the absence of lactose, lactase synthesis does not occur.

          This shows that lactose induces the enzyme lactase. Therefore, lactase is known as an inducible enzyme. In addition, sometimes the end product of metabolism has an inhibitory effect on the synthesis of an enzyme. This phenomenon is called feedback or end-product inhibition.

          From the foregoing discussion it appears that a cell has auto-control mediated by the genes themselves. Francois Jacob and Jacques Monod (1961) at the Pasteur Institute (Paris) were the first to put forward a hypothesis to explain the induction and repression of enzyme synthesis.

          They investigated the regulation of the genes that control lactose fermentation in E. coli through synthesis of an enzyme, β-galactosidase. For this significant contribution to the field of biochemistry they were awarded the Nobel Prize in Medicine in 1965.

          Regulation of Gene Expression in Prokaryotes:

          Gene expression of prokaryotes is controlled basically at two levels i.e. transcription and translation stages. In addition, mRNA degradation and protein modification also play a role in regulation. Most of the prokaryotic genes that are regulated are controlled at transcriptional stage.

          Other control measures operating at different levels are given in Table. 10.2:

          Transcriptional Control in Prokaryotes:

          It is a general strategy in a living organism that chemical changes occur by a metabolic pathway through a chain of reactions. Each step is determined by the enzymes. Again synthesis of an enzyme comes under the control of genetic material i.e. DNA in living organisms. Enzymes (proteins) are synthesised via two steps: transcription and translation.

          Transcription refers to the synthesis of mRNA. Transcription is regulated at or around the promoter region of a gene. By controlling the ability of RNA polymerase to bind to the promoter, the cell can modulate the amount of message being transcribed from the structural gene. Even after RNA polymerase has bound, the cell can still modulate transcription.

          By doing so the amount of gene product synthesized is also modulated. The coding region is also called the structural gene. Adjacent to it are regulatory regions that control the structural genes. The regulatory regions are composed of a promoter (for the initiation of transcription) and an operator (where a diffusible regulatory protein binds).

          The molecular mechanisms for each of the regulatory patterns vary widely but usually fall into one of two major groups: negative regulation and positive regulation. In negative regulation an inhibitor is present in the cell and prevents transcription. This inhibitor is called a repressor.

          An inducer, i.e. an antagonist of the repressor, is required to permit the initiation of transcription. In a positively regulated system an effector molecule (i.e. a protein, small molecule or molecular complex) activates a promoter. The repressor proteins produce negative control, whereas the activator proteins produce positive control.

          Since the initiation of transcription is accomplished in three steps (RNA polymerase binding, isomerization in which a few nucleotides of the promoter are unwound, and release of RNA polymerase from the promoter region), the negative regulators usually block the binding, whereas the activators interact with RNA polymerase to facilitate one or more of these steps.

          Fig. 10.19 shows the negative and positive regulation mechanism of the genes. In negative regulation (A) an inhibitor is bound to the DNA molecule. It must be removed for efficient transcription. In positive regulation (B) an effector molecule must bind to DNA for transcription.

          i. The Lac Operon Model (Jacob-Monod Model):

          For the first time Jacob and Monod (1961) gave the concept of operon model to explain the regulation of gene action. An operon is defined as several distinct genes situated in tandem, all controlled by a common regulatory region.

          Commonly an operon consists of repressor, promoter, operator and structural genes. The message produced by an operon is polycistronic because the information of all the structural genes resides on a single molecule of mRNA.

          The regulatory mechanism of operon responsible for utilization of lactose as a carbon source is called the lac operon. It was extensively studied for the first time by Jacob and Monod (1961). Lactose is a disaccharide which is composed of glucose and galactose (Fig. 10.20).

          The lactose utilizing system consists of two types of components: the structural genes (lacZ, lacY and lacA), the products of which are required for transport and metabolism of lactose, and the regulatory genes (lacI, lacO and lacP). These two components together constitute the lac operon (Fig. 10.21a).

          One of the key features is that an operon provides a mechanism for the coordinate expression of structural genes controlled by regulatory genes. Secondly, an operon shows polarity i.e. the genes Z, Y and A synthesise equal quantities of three enzymes: β-galactosidase (by lacZ), permease (by lacY) and acetylase (by lacA). These are synthesized in a fixed order, i.e. β-galactosidase first and acetylase last.

          (i) The Structural Genes:

          The structural genes are transcribed together into one long polycistronic mRNA molecule. The number of structural genes corresponds to the number of proteins. The structural genes of an operon are controlled coordinately rather than independently, and are not transcribed into separate mRNA molecules.

          Which structural genes are present depends on the substrate to be utilized. For example, in the lac operon three structural genes (Z, Y and A) are associated with lactose utilization (Fig. 10.21A). β-galactosidase is the product of lacZ; it cleaves the β-1 → 4 linkage of lactose and releases the free monosaccharides.

          This enzyme is a tetramer of four identical subunits, each with a molecular weight of 116,400. The enzyme permease (a product of lacY) facilitates the entry of lactose into the bacterium.

          Permease has a molecular weight of 46,500. It is hydrophobic. Cells mutant in lacZ or lacY are designated Lac–, i.e. the bacteria cannot grow on lactose as the sole carbon source. The enzyme transacetylase (30,000 MW) is a product of lacA, to which no definite role has been assigned.

          (ii) The Operator Gene:

          The lac operon consists of a promoter (P) and an operator (O) together with the structural genes. The initiation codon of lacZ is ATG on the coding strand (TAC on the template strand), which corresponds to AUG of the mRNA. It is situated 10 bp away from the end of the operator gene. However, the lac operon cannot function in the presence of sugars other than lactose.

          The operator gene is about 28 bp in length and lies adjacent to the lacZ gene. The base pairs in the operator region are palindromic, i.e. they show two-fold symmetry about a central point (Fig. 10.22). The operator overlaps the promoter region.
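
          The two-fold symmetry mentioned above can be checked computationally: a DNA palindrome reads the same as its own reverse complement. The following is a minimal Python sketch under stated assumptions. The function names are hypothetical, and the example sequences are the EcoRI site GAATTC and a one-base variant of it, not the lac operator (whose sequence is not given here).

```python
# Illustrative only: the sequences below are NOT the lac operator.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))

def symmetry_fraction(seq: str) -> float:
    """Fraction of positions equal to the reverse complement (1.0 = perfect palindrome)."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    return sum(a == b for a, b in zip(seq, rc)) / len(seq)

print(reverse_complement("GAATTC"))  # a perfect palindrome maps onto itself
print(symmetry_fraction("GAATTC"))
print(symmetry_fraction("GAATTA"))   # an imperfect palindrome scores below 1.0
```

          A fractional score is used rather than a yes/no test because natural operator sites are often only approximately symmetric.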

          The lac repressor proteins (a tetramer of four subunits) bind to the lac operator in vitro and protect part of the promoter region from digestion by DNase.

          The repressor proteins bind to the operator and form an operator-repressor complex, which in turn physically blocks the transcription of the Z, Y and A genes by preventing the release of RNA polymerase to begin transcription (Fig. 10.21b).

          In bacteriophage λ there are two operators, OL and OR, which have different base sequences. Lambda repressor (gpcI) is rapidly synthesized, binds to OL and OR and inhibits the synthesis of mRNA and production of the proteins gpcII and gpcIII.

          (iii) The Promoter Gene:

          The promoter gene is about 100 nucleotides long and continuous with the operator gene. Gilbert (1974) and Dickson (1975) worked out the complete nucleotide sequence of the control region of the lac operon. The promoter gene lies between the operator gene and the regulator gene.

          Like operators, the promoter region contains a palindromic sequence of nucleotides (Figs. 10.22 and 10.23). Such palindromic sequences are recognized by proteins that have symmetrically arranged subunits. This section of two-fold symmetry is present at the CRP site, which binds a protein called CRP (cyclic AMP receptor protein). CRP is encoded by the crp gene (Fig. 10.25).

          It has been shown experimentally that CRP binds to cAMP (cyclic AMP, found in E. coli and other organisms) and forms a cAMP-CRP complex. This complex is required for transcription because it binds to the promoter and enhances the attachment of RNA polymerase to the promoter.

          Therefore, it increases transcription and, in turn, translation. Thus, cAMP-CRP is a positive regulator in contrast to the repressor, and the lac operon is controlled both positively and negatively.

          According to a model proposed by Pribnow (1975) the promoter region consists of three important components which are present at a fixed position to each other.

          These components are:

          (i) The recognition sequence,

          (ii) The binding sequence, and

          (iii) An mRNA initiation site.

          The recognition sequence is situated outside the polymerase binding site, which is why it is not protected from DNase. Firstly, RNA polymerase binds to DNA and forms a complex with the recognition sequence. The binding site is 7 bp long (5′ TATGTTG) and lies in a region that is protected from DNase. In other organisms the sequence does not differ by more than two bases. Hence, it can be written as 5′ TATPuATG.

          The mRNA initiation site is present near the binding site on one of the two bases. The initiation site is also protected from DNase. However, the promoter and operator overlap in the lac operon. Moreover, there is a sequence 5′ CCGG, 20 bp to the left of the mRNA initiation site. This is known as the HpaII site (5′ CCGG) because it is cleaved by the restriction enzyme HpaII.

          (iv) The Repressor (Regulator) Gene:

          Repressor gene determines the transcription of structural gene.

          It is of two types:

          It codes for the amino acid sequence of a defined repressor protein.

          After synthesis the repressor molecules diffuse away from the ribosome and bind to the operator in the absence of an inducer. As a result, the path of RNA polymerase is blocked and mRNA is not transcribed. Consequently, no protein synthesis occurs. This type of mechanism occurs in the inducible system, in which the repressor is active.

          Moreover, when an inducer (e.g. lactose) is present, it binds to the repressor proteins and forms an inducer-repressor complex. This complex cannot bind to the operator. On formation of the complex the repressor undergoes a change in conformation and becomes inactive. Consequently, the structural genes can be transcribed into polycistronic mRNA, and the latter is translated into enzymes (proteins).

          In contrast, in the repressible system the regulator gene synthesizes a repressor protein that is inactive and, therefore, fails to bind to the operator. Consequently, proteins are synthesised from the structural genes.

          However, the repressor proteins can be activated in the presence of a co-repressor. The co-repressor together with the repressor proteins forms the repressor-co-repressor complex. This complex binds to the operator gene and blocks protein synthesis.

          Jacob and Monod (1961) could not identify the repressor protein. Gilbert and Müller-Hill (1966) succeeded in isolating the lac repressor from Lac mutant cells of E. coli in which the amount of lac repressor was about ten times greater than in normal cells. The lac repressor protein has been crystallized. It has a molecular weight of about 150,000.

          It consists of four subunits; each has 347 amino acid residues and a molecular weight of about 40,000 Daltons. The repressor proteins have strong affinity for a segment of 12-15 base pairs of the operator gene. This binding of the repressor blocks the synthesis of the mRNA transcript by RNA polymerase.

          The lac operon is induced when E. coli cells are kept in a medium containing lactose. The lactose is taken up into the cell, where it undergoes transglycosylation, i.e. molecular rearrangement from lactose to allolactose. In allolactose the galactosyl residue is present on the 6 rather than the 4 position of glucose (Fig. 10.20). This rearrangement is carried out by β-galactosidase, which is constitutively present in the cell at a low level before induction.

          Allolactose is the real inducer molecule. The lac repressor protein is an allosteric molecule with specific binding sites for DNA and inducer. Allolactose binds to the lac repressor to form an inducer-repressor complex. Binding of the inducer allosterically changes the repressor, lowering its affinity for lacO DNA.

          Consequently the repressor is released from lacO due to changes in its three-dimensional conformation. This is called the allosteric effect. Once free, lacO allows RNA polymerase to form the mRNA transcript. Here, allolactose acts as the effector molecule and prevents the regulatory protein from binding to the lacO (operator) gene.

          ii. Positive Regulation of the lac Operon-Catabolic Control:

          Cyclic AMP (cAMP) is a small molecule which is distributed in animal tissues and controls the action of many hormones. It is also present in E. coli and other bacteria. cAMP is synthesized by the enzyme adenyl cyclase (Fig. 10.24). Its concentration is directly regulated by glucose metabolism.

          The lac operon has an additional positive regulatory control mechanism to avoid wasting energy on the synthesis of lactose-utilizing proteins while there is an adequate supply of glucose.

          When E. coli grows in a medium containing glucose, the cAMP concentration in the cells falls. This mechanism is poorly understood. However, the noteworthy point is that cAMP regulates the activity of the lac operon (and other operons also).

          In contrast, when E. coli cells are fed with an alternate carbon source, e.g. succinate, the cAMP level increases. The cya locus encodes the enzyme adenylate cyclase that converts ATP to cAMP.

          How cAMP increases transcription is not clearly known. It has been shown experimentally that cAMP binds to the protein expressed by the crp locus, which is known as cAMP receptor protein (CRP) or catabolite activator protein (CAP) (Fig. 10.25).

          The CRP-cAMP complex binds to the CAP-binding site present on the lac promoter. The bound CRP-cAMP complex promotes helix destabilization downstream and facilitates RNA polymerase binding. This results in efficient open promoter complex formation and, in turn, transcription.
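
          The combined negative (repressor) and positive (CRP-cAMP) control described above can be summarized as a small truth table. The sketch below is a deliberately qualitative simplification, not a quantitative model: the function name and the labels "off", "low" and "high" are assumptions made for illustration.

```python
def lac_operon_state(lactose_present: bool, glucose_present: bool) -> str:
    """Qualitative transcription state of lacZYA (illustrative labels only)."""
    # Negative control: without lactose there is no allolactose,
    # so the repressor stays bound to lacO.
    repressor_bound = not lactose_present
    # Positive control: glucose lowers cAMP, so the CRP-cAMP complex
    # is assumed to occupy the promoter only when glucose is absent.
    crp_camp_bound = not glucose_present
    if repressor_bound:
        return "off"
    return "high" if crp_camp_bound else "low"

for lactose in (False, True):
    for glucose in (False, True):
        print(lactose, glucose, lac_operon_state(lactose, glucose))
```

          Treating both inputs as booleans hides the graded response of real cells, but it makes the logic explicit: lactose lifts repression, and the absence of glucose is additionally needed for full activation.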

          iii. The PaJaMo Experiment:

          The key experiment in understanding the induction of β-galactosidase was done by Arthur Pardee, Jacob and Monod; therefore, it is called the PaJaMo experiment. They found that if a DNA molecule containing the lac operon enters a cell devoid of the lac operon (lac–), then the lac– cells are converted into lac+ cells.

          The lac operon expresses in the new cells, provided the DNA contains complete genes or open reading frames and a good promoter. The genes express and RNA polymerase binds to the promoter. The genes are transcribed, ribosomes bind to the mRNA, and β-galactosidase is synthesised.

          II. Regulation of Gene Expression in Eukaryotes:

          There is much variation and complexity in the regulation of genes in eukaryotes, because different genes are expressed at different developmental stages of cells or in different tissues under the influence of different types of stimuli imposed by the external environment. Eukaryotic DNA passes through several levels of organization: the double-stranded linear thread, nucleosomes, fibres, chromatids and chromosomes.

          Gene expression and regulation take place only when DNA is in the double-stranded linear form. Moreover, if the promoter or regulator region of any gene is packed into condensed chromosome structure, initiation of transcription does not take place.

          Therefore, changes in the state of chromatin brought about by chromatin remodeling result in gene activation. Thus the packaging of DNA influences gene expression. In the majority of cases regulation of gene expression takes place at the transcription level. Regulation of expression at the processing or translation level may also occur in eukaryotes.

          Gene expression can be regulated at several steps in the pathway from DNA to RNA to protein in a cell as described below:

          i. Transcriptional control:

          Controlling the gene expression during transcription

          ii. RNA processing control:

          Control of processing of primary RNA transcripts to form mature mRNA

          iii. RNA transport control:

          Control of transport of mature mRNA from nucleus to cytoplasm

          iv. Translational control:

          Selection of mRNAs in cytoplasm to be translated by ribosome.

          v. mRNA degradation control:

          Selective degradation of certain mRNA molecules in the cytoplasm, or

          vi. Protein activity control:

          Selective activation, inactivation or compartmentalization of specific protein molecule after their synthesis. Only transcriptional control ensures that no superfluous intermediates are synthesized.

          (i) Regulation through Transcriptional Factors:

          Unlike in prokaryotes, there are multiple DNA-binding proteins called transcription factors that control transcription in eukaryotes. These proteins are grouped into two major classes: the general transcriptional factors (GTFs) and the regulatory transcriptional factors (RTFs). The eukaryotic RNA polymerase fails to recognize the promoter directly.

          Therefore, the GTFs first bind the promoter directly (at the TATA sequence found in many eukaryotic promoters). RNA polymerase then starts transcription at the promoter site. The RTFs bind the regulatory sites of genes, which may be far away from the promoter.

          The RTFs bind to the regulatory sequences of a gene and control the rate of assembly of GTFs at the promoter. The RTFs either increase or decrease transcription. An increase in transcription is called activation; a decrease in the level of transcription is called repression.

          (ii) Britten-Davidson Model for Gene Regulation:

          Regulation at the transcription level involves both activation and repression of genes, because genes may be switched on in some cases and switched off in others. Various models have been proposed for regulation of gene expression in eukaryotes. In 1969, Britten and Davidson proposed a model called the gene battery model or Britten-Davidson model, which is very popular. This model was further elaborated in 1973.

          According to this model, there are four classes of sequences:

          (i) Producer genes (which are comparable to structural genes of prokaryotes),

          (ii) Receptor site (comparable to operator gene in bacterial operon),

          (iii) Integrator gene (comparable to regulator gene synthesizing an activator RNA which may or may not synthesize protein before it activates the receptor site), and

          (iv) Sensor site (regulates the activity of integrator gene which can be transcribed only after activation of sensor site). The four classes of sequences are interrelated (Fig. 10.27).

          In this model producer gene and integrator gene are involved in transcription, whereas the receptor and sensor sequences help in recognition without participating in RNA synthesis.

          It has been proposed that receptor site and integrator gene are repeated several times so that the activity of a large number of genes may be controlled in the same cell, same activator may recognize all the repeats, and several enzymes of one pathway may be synthesized simultaneously.

          Transcription of the same gene may occur at different developmental stages. This is achieved by several receptor sites and integrator genes. Each producer gene possesses many receptor sites, and each site responds to one activator (Fig. 10.28), so that several genes can be recognized by a single activator. But at different times the same gene may be activated by different activators.

          A set of structural genes controlled by one sensor site is called ‘gene battery’. Several sets of genes may be activated when major changes are required. If one sensor site gets associated with them, transcription of all integrators may be caused at the same time. Thus, transcription of several producer genes is caused through receptor sites.

          Effect of transcription factor resource sharing on gene expression noise

          Gene expression is intrinsically a stochastic (noisy) process with important implications for cellular functions. Deciphering the underlying mechanisms of gene expression noise remains one of the key challenges of regulatory biology. Theoretical models of transcription often incorporate the kinetics of how transcription factors (TFs) interact with a single promoter to impact gene expression noise. However, inside single cells multiple identical gene copies as well as additional binding sites can compete for a limiting pool of TFs. Here we develop a simple kinetic model of transcription, which explicitly incorporates this interplay between TF copy number and its binding sites. We show that TF sharing enhances noise in mRNA distribution across an isogenic population of cells. Moreover, when a single gene copy shares its TFs with multiple competitor sites, the mRNA variance as a function of the mean remains unaltered by their presence. Hence, all the data for variance as a function of mean expression collapse onto a single master curve independent of the strength and number of competitor sites. However, this result does not hold true when the competition stems from multiple copies of the same gene. Therefore, although previous studies showed that the mean expression follows a universal master curve, our findings suggest that different scenarios of competition bear distinct signatures at the level of variance. Intriguingly, the introduction of competitor sites can transform a unimodal mRNA distribution into a multimodal distribution. These results demonstrate the impact of limited availability of TF resource on the regulation of noise in gene expression.
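
          To make the idea of a shared TF pool concrete, here is a minimal stochastic (Gillespie) simulation sketch in Python. It is not the authors' model: the reaction scheme (one promoter that transcribes only while a TF is bound, plus identical competitor sites), all rate constants, and the names simulate_cell and mean_and_variance are assumptions chosen for illustration.

```python
import random

def simulate_cell(n_tf, n_competitors, t_end=20.0, k_on=1.0, k_off=1.0,
                  r=10.0, gamma=1.0, rng=None):
    """Gillespie simulation of one cell; returns the mRNA count at time t_end."""
    rng = rng or random.Random()
    promoter_bound = 0   # 1 if a TF sits on the gene's promoter, else 0
    comp_bound = 0       # number of occupied competitor sites
    mrna = 0
    t = 0.0
    while True:
        free_tf = n_tf - promoter_bound - comp_bound
        rates = [
            k_on * free_tf * (1 - promoter_bound),          # 0: TF binds promoter
            k_off * promoter_bound,                         # 1: TF leaves promoter
            k_on * free_tf * (n_competitors - comp_bound),  # 2: TF binds a competitor
            k_off * comp_bound,                             # 3: TF leaves a competitor
            r * promoter_bound,                             # 4: one mRNA is made
            gamma * mrna,                                   # 5: one mRNA is degraded
        ]
        total = sum(rates)
        if total == 0:
            return mrna                    # no reaction can fire any more
        t += rng.expovariate(total)        # waiting time to the next reaction
        if t > t_end:
            return mrna
        event = rng.choices(range(6), weights=rates)[0]
        if event == 0:
            promoter_bound = 1
        elif event == 1:
            promoter_bound = 0
        elif event == 2:
            comp_bound += 1
        elif event == 3:
            comp_bound -= 1
        elif event == 4:
            mrna += 1
        else:
            mrna -= 1

def mean_and_variance(n_cells, seed=0, **params):
    """Mean and variance of mRNA counts across a simulated isogenic population."""
    rng = random.Random(seed)
    counts = [simulate_cell(rng=rng, **params) for _ in range(n_cells)]
    mean = sum(counts) / n_cells
    variance = sum((c - mean) ** 2 for c in counts) / n_cells
    return mean, variance

for n_comp in (0, 5):
    mean, var = mean_and_variance(100, n_tf=2, n_competitors=n_comp)
    print(n_comp, round(mean, 2), round(var, 2))
```

          Comparing the printed mean and variance with and without competitor sites gives a feel for how a limiting TF pool can shift both moments. The numbers depend entirely on the assumed rates and should not be read as the paper's results.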

          Conflict of interest statement

          The authors have declared that no competing interests exist.


          Fig 1. Different scenarios of competition for a pool of TFs among promoters and other…

          Fig 2. A kinetic model of transcription, incorporating the TFs’ binding and unbinding to multiple…

          Fig 3. Our model of TF sharing predicts the behavior of the first two moments…

          Fig 4. Prediction of mRNA variance as a function of mean, when the promoters share…

          Fig 5. Introduction of competitor sites can lead to multi-modal mRNA distribution.

          Decoupling transcription factor expression and activity enables dimmer switch gene regulation

          Gene-regulatory networks achieve complex mappings of inputs to outputs through mechanisms that are poorly understood. We found that in the galactose-responsive pathway in Saccharomyces cerevisiae, the decision to activate the transcription of genes encoding pathway components is controlled independently from the expression level, resulting in behavior resembling that of a mechanical dimmer switch. This was not a direct result of chromatin regulation or combinatorial control at galactose-responsive promoters; rather, this behavior was achieved by hierarchical regulation of the expression and activity of a single transcription factor. Hierarchical regulation is ubiquitous, and thus dimmer switch regulation is likely a key feature of many biological systems. Dimmer switch gene regulation may allow cells to fine-tune their responses to multi-input environments on both physiological and evolutionary time scales.

          Copyright © 2021 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works.

          Conflict of interest statement

          Competing interests: The authors declare no competing interests.


          Fig. 1. Switch-like activation and rheostat-like control of expression level are genetically decoupled in the…

          Fig. 2. Decoupling occurs through regulation of the Gal4p transcription factor rather than directly at…

          Fig. 3. Mechanistic model: switch-and-rheostat works through decoupled regulation of Gal4p activity and abundance.

          Enhancers and Transcription

          In some eukaryotic genes, there are regions that help increase or enhance transcription. These regions, called enhancers, are not necessarily close to the genes they enhance. They can be located upstream of a gene, within the coding region of the gene, downstream of a gene, or may be thousands of nucleotides away.

          Enhancer regions are binding sequences, or sites, for transcription factors. When a DNA-bending protein binds to an enhancer, the shape of the DNA changes. This shape change allows the interaction between the activators bound to the enhancers and the transcription factors bound to the promoter region and the RNA polymerase to occur. Whereas DNA is generally depicted as a straight line in two dimensions, it is actually a three-dimensional object. Therefore, a nucleotide sequence thousands of nucleotides away can fold over and interact with a specific promoter.

          Figure 1: Enhancers: An enhancer is a DNA sequence that promotes transcription. Each enhancer is made up of short DNA sequences called distal control elements. Activators bound to the distal control elements interact with mediator proteins and transcription factors.


          Data details and processing

          We analyzed a cohort of 50 Yoruban samples, for which genotypes of SNPs that are fully ascertained from sequencing data [29] along with RNA-sequencing data [30] are publicly available. Briefly, the raw dataset consists of 10,553,953 genotyped SNPs and expression measurements (quantile-quantile normalized values) of 18,147 genes with Ensembl gene ID across these 50 samples. Standard filters have been applied to the genetic data: minor allele frequency >0.05, SNP missingness rate <0.1 and individual missingness rate <0.1 [53]. After filtering, data for analysis consist of 50 samples with 7,206,056 SNPs.
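
          The three filters listed above (minor allele frequency, SNP missingness, individual missingness) can be sketched as follows. This is an illustrative Python sketch rather than the pipeline actually used: the genotype coding (0/1/2 allele copies, None for a missing call), the function names, and the choice to evaluate all filters on the unfiltered matrix are assumptions.

```python
def minor_allele_frequency(genotypes):
    """MAF from genotypes coded 0/1/2 (allele copies); None marks a missing call."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    freq = sum(called) / (2 * len(called))
    return min(freq, 1 - freq)

def missing_rate(values):
    """Fraction of entries that are missing (None)."""
    return sum(v is None for v in values) / len(values)

def filter_genotypes(matrix, maf_min=0.05, snp_miss_max=0.1, ind_miss_max=0.1):
    """matrix[s][i] is the genotype of SNP s in individual i.

    Returns (indices of SNPs kept, indices of individuals kept).
    """
    n_ind = len(matrix[0])
    keep_ind = [i for i in range(n_ind)
                if missing_rate([row[i] for row in matrix]) < ind_miss_max]
    keep_snp = [s for s, row in enumerate(matrix)
                if missing_rate(row) < snp_miss_max
                and minor_allele_frequency(row) > maf_min]
    return keep_snp, keep_ind

toy = [
    [0, 1, 2, 1, 0, 1, 0, 2, 1, 0],        # common variant, fully called
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],        # monomorphic, fails the MAF filter
    [0, 1, None, None, 1, 0, 2, 1, 0, 1],  # 20% missing, fails SNP missingness
]
print(filter_genotypes(toy))
```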

          Association testing

          For association analysis, we considered only SNPs that resided within candidate regulatory regions along the genome. For trans association, we tested for association between a SNP and every gene; we considered SNPs within the span of known exons and TFs (including introns) [54]. We tested for association using linear regression (Figure S2 in Additional file 1).
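
          A single SNP-gene association test by linear regression can be sketched as below. This is a minimal ordinary least-squares sketch using only the Python standard library. The function name and the toy data are hypothetical, and the conversion of the t statistic to a P-value (Student's t distribution with n - 2 degrees of freedom) is left out to avoid extra dependencies.

```python
import math

def linear_association(genotypes, expression):
    """Fit expression = a + b * genotype; return (slope b, t statistic, degrees of freedom)."""
    n = len(genotypes)
    mean_x = sum(genotypes) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in genotypes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(genotypes, expression))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(genotypes, expression))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    # A perfect fit has zero standard error; report an infinite t in that case.
    t_stat = slope / std_err if std_err > 0 else math.copysign(math.inf, slope)
    return slope, t_stat, n - 2

# Toy data: expression rises with the number of alternative alleles.
slope, t_stat, df = linear_association([0, 0, 1, 1, 2, 2],
                                       [1.0, 1.2, 2.1, 1.9, 3.0, 3.2])
print(slope, t_stat, df)
```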

          Obtaining a random distribution of association test statistics

          Examining the random distribution of association tests was helpful in evaluating the empirical significance of results. This was achieved by generating 100,000 random pairs of sources and targets for exonic and TF variation separately. We used a strict randomization process of edge switching. We picked a source gene from all sources in the real data; we then picked a target gene from all targets in the real data with a P-value cutoff of 10^-6. When evaluating the number of targets per TF source, we created 1,000 sets of random TF source and gene target pairs; each set contained 370 such pairs, corresponding to 370 TF source-target pairs at a P-value cutoff of 10^-6 in the real data.

          Identifying topological properties of source-target pairs projected on the PPI network

          We used the PPI network provided by the Human Protein Reference Database [49]. The undirected network contains 9,671 nodes and 37,041 edges. For each node, we calculated its degree: the number of edges incident on the node. We defined a distance between every two nodes as the number of edges on the shortest path between them. All pair-wise shortest paths were determined using the Floyd-Warshall algorithm [55]. In cases where the network had more than one connected component, nodes from two different components were defined to have a distance of twice the maximal distance obtained within the components.
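
          The degree and distance definitions above can be sketched in Python as follows. This is an illustrative implementation of the Floyd-Warshall algorithm on a toy graph, including the stated convention for nodes in different components. The function names and the example edge list are assumptions, not the study's code or data.

```python
import math

def degrees(n, edges):
    """Degree of each node 0..n-1 in an undirected graph."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def all_pairs_distances(n, edges):
    """Floyd-Warshall shortest-path lengths (in edges) for an undirected graph."""
    dist = [[0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for u, v in edges:
        dist[u][v] = dist[v][u] = 1
    for k in range(n):
        for i in range(n):
            d_ik = dist[i][k]
            if d_ik == math.inf:
                continue
            for j in range(n):
                if d_ik + dist[k][j] < dist[i][j]:
                    dist[i][j] = d_ik + dist[k][j]
    # Pairs in different components get twice the largest finite distance.
    finite_max = max(d for row in dist for d in row if d != math.inf)
    for row in dist:
        for j, d in enumerate(row):
            if d == math.inf:
                row[j] = 2 * finite_max
    return dist

toy_edges = [(0, 1), (1, 2), (3, 4)]   # two components: {0, 1, 2} and {3, 4}
print(degrees(5, toy_edges))
print(all_pairs_distances(5, toy_edges))
```

          Floyd-Warshall takes time cubic in the number of nodes, so for an unweighted network of several thousand nodes a breadth-first search from each node would likely be a faster way to obtain the same distances.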

          Identifying topological trends across association P-values

          For exons, we observed the emergence of true positive associations between P-values 10^-6 and 10^-7 (Figure S2a in Additional file 1). Therefore, we focused on P-values <10^-6 and sorted all source-target pairs according to the significance of their association signal. We considered each prefix of this list, that is, each subset of source-target pairs exceeding a particular threshold, for significance of association signal. For each such subset, we reported each one of the topological properties defined above averaged over the subset. We calculated Spearman's correlation coefficient between significance thresholds and each of these cumulative averages. In a similar process, we randomly chose an equal number of arbitrary source-target pairs on the PPI network. Adding these pairs one by one created a distribution of analogous cumulative averages for permuted pairs. We recorded the Spearman correlation coefficient for these 100,000 permuted distributions. We calculated the empirical P-value for the significance of the observed correlation coefficients by counting the number of times when permuted r > real r and divided this by the number of permutations.
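
          Two steps of this procedure lend themselves to a short sketch: the cumulative (prefix) averages over the sorted list, and the empirical P-value as the fraction of permutations with r greater than the real r. The Python below is a minimal illustration with hypothetical function names and made-up numbers.

```python
def prefix_averages(values):
    """Running mean over each prefix of a list sorted by association significance."""
    averages, total = [], 0.0
    for count, value in enumerate(values, start=1):
        total += value
        averages.append(total / count)
    return averages

def empirical_p_value(real_r, permuted_rs):
    """Fraction of permutations whose correlation exceeds the observed one."""
    return sum(r > real_r for r in permuted_rs) / len(permuted_rs)

print(prefix_averages([2, 4, 6]))
print(empirical_p_value(0.5, [0.1, 0.6, 0.7, 0.2]))
```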

          Expression analysis

          We calculated all pairwise co-expression correlations for all gene pairs in the dataset using the Spearman rank-correlation test, and thereby obtained the distribution of the correlation coefficient r. To determine whether the distribution of r between source-target pairs differed from its background distribution, we employed the Wilcoxon rank-sum test.
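
          Spearman's coefficient is the Pearson correlation of the ranks of the two expression vectors. A minimal standard-library sketch follows. The function names are hypothetical, tied values are given their average rank, and inputs with constant values are not handled.

```python
import math

def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman rank correlation between two expression vectors."""
    return pearson(average_ranks(x), average_ranks(y))

print(average_ranks([10, 20, 20, 30]))
print(spearman([1, 2, 3, 4], [10, 20, 30, 40]))
```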

          Enrichment of eSNPs for cis effects

          We examined whether eSNPs that were associated with a target's expression level also affected expression levels of the corresponding source. We tested this by considering, for each source-target pair, the one eSNP most associated with the expression of the target. We tallied the source-target pairs for which this eSNP was also significantly associated (P <0.05) with the expression level of the source. Under the null, the number of such pairs is a random variable that is binomially distributed: Bin(n = number of unique source genes, P = 0.05).
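
          Under this binomial null, the probability of seeing at least k such pairs out of n is an upper-tail sum. A minimal Python sketch, in which the function name and the example numbers are illustrative and not taken from the study:

```python
from math import comb

def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Bin(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Hypothetical example: 8 of 40 source genes also associated at P < 0.05.
print(binomial_tail(8, 40, 0.05))
```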

          Unit and path annotation

          We defined units of genes by considering a TF source and its gene targets. We examined shortest paths within the PPI network between eSNP exon source and its gene target. The enrichment of units and paths with gene subsets from the Gene Ontology [38], and KEGG [37] databases was calculated by Genatomy [36]. We reported only units or paths with annotations that had a significant FDR of 0.05 or better. The description of genes in units or paths is cited from the National Center for Biotechnology Information Gene database and GeneCards [56].

          Finding transcription factor source-target pairs in the experimental database

          The ChIP Enrichment Analysis (ChEA) database [39] represents a collection of interactions describing the binding of transcription factors to DNA, collected from ChIP-X (ChIP-chip, ChIP-sequencing, ChIP-paired-end tag and DNA adenine methyltransferase identification) experiments. For each TF source and target, we examined whether the pair was present in ChEA. We repeated the same procedure for 100,000 permuted pairs of a random TF source and a random gene target. We then compared, using Fisher's exact test, the number of pairs in ChEA between real and permuted pairs, out of all pairs where the TF source was included in the database.
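
The comparison can be framed as a 2x2 table (real or permuted pairs against present or absent in ChEA). The sketch below computes a one-sided Fisher's exact P-value from the hypergeometric distribution. The text does not state whether the test was one-sided or two-sided, so the one-sided form is an assumption, and the counts are hypothetical.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a) under the hypergeometric null with fixed margins.
    """
    row1 = a + b          # e.g. number of real pairs
    col1 = a + c          # e.g. number of pairs found in the database
    total = a + b + c + d
    upper = min(row1, col1)
    numerator = sum(
        comb(row1, x) * comb(total - row1, col1 - x) for x in range(a, upper + 1)
    )
    return numerator / comb(total, col1)


# Hypothetical counts: rows are real and permuted pairs,
# columns are pairs present in and absent from the database.
print(fisher_exact_greater(12, 28, 150, 9850))
```

`scipy.stats.fisher_exact` offers the same test with a choice of alternative hypothesis if SciPy is available.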

          Finding PPI network decomposition to clusters

          The decomposition of the PPI network to clusters was computed by using the Louvain algorithm presented in [57]. This is a heuristic method that is based on modularity optimization. The method consists of two phases and partitions the network into clusters such that the number of edges between clusters is significantly less than expected by chance. The method provides a mathematical measure for modularity with network-size normalized values, ranging from 0 (low modularity) to 1 (maximum modularity). This method has been previously applied to various biological networks [58] and specifically to a PPI network [59].
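
The quantity that the Louvain method optimizes can be computed for any given partition. The sketch below evaluates the standard (Newman) modularity of an unweighted, undirected network; it scores a partition rather than searching for one, and the toy network is hypothetical.

```python
def modularity(edges, community):
    """Modularity Q of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community: dict mapping each node to its cluster label.
    """
    m = len(edges)
    internal = {}     # edges with both endpoints in the cluster
    degree_sum = {}   # total degree of the nodes in the cluster
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(
        internal.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree_sum.items()
    )


# Hypothetical network: two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_clusters = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_cluster = {node: "a" for node in range(6)}
print(modularity(edges, two_clusters))  # 5/14, about 0.357
print(modularity(edges, one_cluster))   # 0.0
```

Splitting the toy network along its single bridging edge scores higher than keeping everything in one cluster, which is the kind of partition a modularity-optimizing heuristic favors.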

          Significance of source and target residing in the same PPI cluster

          For each exon and TF source-target pair, we recorded whether both source and target resided in the same PPI cluster. We repeated the same procedure with 100,000 permuted unique source-target pairs from nodes on the PPI network. We then compared the number of cluster co-occurrences between real data and permutations using Fisher's exact test.

          Comparing shortest paths annotation content

          We recorded all genes along the shortest paths between exonic sources and targets, both in real and permuted data. We then looked for enrichment in this set of genes (at least 10 genes per category, FDR < 0.05). We created 1,000 permuted sets of 55 shortest paths each (drawn from the 17,564 shortest paths in permutations) that followed the exact length distribution of the 55 real paths. For each of the six categories that were not enriched in permutations, we performed two analyses: first, we counted how many genes from each category appeared in the real paths (with repetitions, that is, if gene X from category Y appeared in two shortest paths, we counted it twice), and second, we counted how many of the 55 paths had at least one gene from this category. We repeated the same procedures for the 1,000 permuted sets. For each category, we then counted how many of the 1,000 permutations achieved equal or greater numbers than seen for the real data (empirical P-value).
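
The length-matched sampling and the two per-category counts can be sketched as follows. This is an illustrative reconstruction under stated assumptions: paths are lists of gene identifiers, a category is a set of genes, and all names and toy data are hypothetical.

```python
import random
from collections import Counter, defaultdict


def sample_length_matched(real_paths, permuted_paths, rng):
    """Draw permuted paths whose length distribution equals that of real_paths.

    Raises ValueError if the pool lacks enough paths of some length.
    """
    by_length = defaultdict(list)
    for path in permuted_paths:
        by_length[len(path)].append(path)
    sample = []
    for length, count in Counter(len(p) for p in real_paths).items():
        sample.extend(rng.sample(by_length[length], count))
    return sample


def count_category_hits(paths, category):
    """Return (gene occurrences with repetition, paths with at least one hit)."""
    gene_hits = sum(1 for path in paths for gene in path if gene in category)
    path_hits = sum(1 for path in paths if any(g in category for g in path))
    return gene_hits, path_hits


def empirical_p(real_value, permuted_values):
    """Fraction of permutations with a value equal to or greater than the real one."""
    hits = sum(1 for v in permuted_values if v >= real_value)
    return hits / len(permuted_values)


# Toy data (hypothetical gene identifiers).
real = [["a", "b", "c"], ["a", "e"]]
pool = [["x", "y", "z"], ["a", "y", "c"], ["p", "q"], ["a", "q"], ["r", "s"]]
category = {"a", "c"}

rng = random.Random(0)
matched = sample_length_matched(real, pool, rng)
print(count_category_hits(real, category))  # (3, 2)
print(count_category_hits(matched, category))
```

Repeating the sampling step 1,000 times and passing the resulting counts to `empirical_p` gives the per-category empirical P-value described above.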
