Polymorphism in number of chromosomes?

The answer to this question, saying that Down Syndrome - a trisomy of human chromosome 21 - is caused by de novo mutation (rather than resulting from standing variation) made me think about polymorphisms in the number of autosomes (not so much for sex chromosomes because of dosage compensation). The reason for this question is that I would never have thought that an aneuploidy, in theory, could be a segregating trait because of meiotic barriers.

I found evidence of ploidy polymorphisms in plants (especially their hybrids) that occur when plants with different karyotypes hybridise [see, for example, Vandenhout et al. (1995) and Husband (2014)]. I also found that this applies to some aquatic vertebrates; 'fish', however, are known for their karyotype diversity [see Zhou et al. (2007) and Zhao et al. (2016)].

In sharp contrast, mammals are (almost) strictly diploid and most mammal species are characterised by fixed chromosome numbers. Exceptions exist, e.g. in mice and other rodents that exhibit intraspecific variation in diploid numbers; these are created by Robertsonian translocations [Graphodatsky et al. (2011)].

However, all of these are diploid and variation always needs to be expressed as a chromosome number of the form $2n$ because of meiosis. The karyotype of a human with trisomy 21 cannot be expressed in that form as this person has $2n+1$ chromosomes. All cases of autosomal aneuploidy I am aware of cause disease. My (related) questions now are:

(1) Are there described cases of non-deleterious aneuploidies in autosomes?

Those cannot be segregating in populations, as aneuploidies are not passed on to children, so more importantly:

(2) Are there indirectly segregating aneuploidies in autosomes (i.e. some sort of heritable trisomy that, e.g., results from a trisomy followed by chromosomal rearrangement)?

If so, do any of these also exist in mammals, particularly in humans?

Good questions, but you make a claim that isn't necessarily true:

However, all of these are diploid and variation always needs to be expressed as a chromosome number of the form $2n$ because of meiosis. The karyotype of a human with trisomy 21 cannot be expressed in that form as this person has $2n+1$ chromosomes.

The emphasized text isn't necessarily true - there are certainly more involved ways to describe the karyotype of a given individual or cell line. For instance, take the GM12878 cell line, likely the most well-defined cell line in terms of genetics, epigenetics, and chromatin biology today. Its karyotype is listed as:

46,XX[23].arr[hg19] 9p13.1(38,787,480-40,911,212)x3

This is a fairly common way of representing changes known as copy number variation, wherein only a small region of a given chromosome is duplicated or deleted. For this cell line, which relatively closely mimics lymphoblastic B cells, the genome is nearly normal (46 chromosomes, XX) with only a small amplification on chromosome 9 (9p13.1(38,787,480-40,911,212)x3). This is a mild example, but smaller copy number variations like this are quite common and can amplify or delete either large chunks or very focal regions of genomes.
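To make the notation concrete, here is a minimal Python sketch that pulls the array segment out of the karyotype string quoted above and computes the size of the amplified region. The helper name `parse_segment` and the regular expression are illustrative assumptions, not part of any standard karyotype parser, and the coordinates are assumed to be inclusive.

```python
import re

# Hypothetical pattern for one array segment, e.g. "9p13.1(38,787,480-40,911,212)x3".
SEGMENT = re.compile(
    r"(?P<band>[0-9XY]+[pq][0-9.]+)"           # cytoband, e.g. 9p13.1
    r"\((?P<start>[\d,]+)-(?P<end>[\d,]+)\)"   # base-pair coordinates
    r"x(?P<copies>\d+)"                        # observed copy number
)


def parse_segment(text):
    """Return band, coordinates, length and copy number of the first segment found."""
    match = SEGMENT.search(text)
    if match is None:
        raise ValueError(f"no array segment found in {text!r}")
    start = int(match.group("start").replace(",", ""))
    end = int(match.group("end").replace(",", ""))
    return {
        "band": match.group("band"),
        "start": start,
        "end": end,
        "length_bp": end - start + 1,  # assumes inclusive coordinates
        "copies": int(match.group("copies")),
    }


karyotype = "46,XX[23].arr[hg19] 9p13.1(38,787,480-40,911,212)x3"
segment = parse_segment(karyotype)
print(segment["band"], segment["length_bp"], "bp at", segment["copies"], "copies")
```

Under these assumptions the amplified region spans roughly 2.1 Mb, which is small compared with a whole-chromosome trisomy.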

But to answer your first question, no, there are no known non-deleterious trisomies in autosomes. Trisomy 13 (Patau syndrome) and trisomy 18 (Edwards syndrome) children can survive to birth, but typically die within the first few months of life. Unsurprisingly, these chromosomes are also among the smallest chromosomes with respect to the number of transcripts that they encode [1]. This suggests that the additional amount of genetic material determines the severity of the defects, and this trend also holds true in mice. From a figure caption of the cited paper (which is likely behind a paywall for those without institutional access):

A correlation between degree of aneuploidy and organismal fitness in humans and mice. (A) The number of known human transcripts/chromosome in humans. Trisomies below the line develop to birth. Trisomies above the line are embryonic lethal. Only those chromosomes containing the least amount of transcripts survive birth. (B) In mice, survival of the embryo is inversely correlated with the size of the chromosome that is present in three copies. Linear regression analysis fits the data with an $R^2$ of 0.71.

I was unable to find any examples for your second question. Cancer cells can get very messed up in terms of copy number, but I've never seen anything about heritable aneuploidy.

Where monomorphism means having only one form and dimorphism means there are only two forms, the term polymorphism is a very specific term in genetics and biology. The term relates to the multiple forms of a gene that can exist.

Polymorphism refers to forms that are discontinuous (have discrete variation), bimodal (having or involving two modes), or polymodal (multiple modes). For example, earlobes are either attached, or they are not; it is an either/or trait.

Height, on the other hand, is not a set characteristic. It varies with genetics, but continuously rather than in discrete forms, so it is not an example of polymorphism.

Genetic polymorphism refers to the occurrence of two or more genetically determined phenotypes in a certain population, in such proportions that the rarest of them cannot be maintained by recurrent mutation alone (a general frequency of mutation).

Polymorphism promotes diversity and persists over many generations because no single form has an overall advantage or disadvantage over the others in terms of natural selection.

Originally used to describe visible forms of genes, polymorphism is now used to include cryptic modes such as blood types, which require a blood test to decipher.


Its short generation time and large number of progeny have made Drosophila melanogaster one of the best model systems for performing large-scale genetic screens for mutations that affect a given process. For example, the classic screens for mutations that affect the patterning of the embryo led to the discovery of almost all of the genes that control segmentation [1]. Similar screens have identified many of the factors that mediate the development of the nervous system [2], and enhancer and suppressor screens have proved a valuable approach for finding novel components of signal transduction pathways [3,4]. The range of forward genetic screens has been significantly advanced by the adaptation of the yeast flp/FRT recombination system to generate mitotic clones of cells that are homozygous for a particular mutagenized chromosome arm [5]. This technique makes Drosophila the only multicellular organism in which it is possible to perform phenotypic screens for mutations that affect the behavior of almost any cell at any stage of development [6,7,8].

The most common mutagens used in Drosophila are P elements, which generate mutations by inserting into genes, and chemicals, such as ethyl methanesulfonate (EMS), which modifies bases in the DNA to cause mainly single base substitutions, also known as point mutations [9]. The main advantage of P elements is that it is very straightforward to identify the gene that has been mutated, but P elements are relatively inefficient mutagens. The majority of Drosophila genes are predicted to be 'cold spots' for P-element insertion [10], and P-element insertions have been recovered in only a fifth of the genes within a 2.9 megabase (Mb) region that has been thoroughly characterized [11]. This makes P elements a poor mutagen for saturation screens that set out to identify most of the genes required for a particular process. In contrast, chemical mutagens cause a much higher mutation rate, and have less bias in their ability to cause mutations in different loci. The main drawback of these mutagens, however, is that it is not trivial to map point mutations to specific genes, and this is often the rate-limiting step in the analysis.

Two complementary approaches have been standard in mapping point mutations: mapping by meiotic recombination between visible genetic markers, and deletion mapping, in which the mutation is positioned by its failure to complement deficiencies. However, mapping a mutation between widely spaced markers can only give a statistical estimate of its position, and existing collections of deficiencies do not cover the whole genome. Furthermore, the exact positions of many of the visible markers and the breakpoints of the deletions have often only been inferred from cytological and genetic data, and this can make it difficult to link the genetic and molecular maps [12]. Various other approaches have been used to refine the location of the gene, such as P-mediated male recombination [13], or meiotic recombination between two P elements that flank the region containing the mutation. However, several rounds of P-mediated recombination are often necessary to narrow down the region sufficiently for candidate genes to be tested.

In other organisms, which lack the many visible genetic markers and deletions of Drosophila, mutations have been commonly mapped using molecular polymorphisms, such as restriction fragment length polymorphisms (RFLPs) or SNPs [14,15]. These have the advantage that they directly link the genetic and physical maps of the chromosome, and can provide a very high density of markers that allow the precise mapping of mutations. For example, the sequencing of the human genome has revealed a very large number of SNPs that cover the genome at a density of one per 1.9 kb, and this will greatly facilitate the mapping of both single-gene and polygenic disease loci [16]. This approach has also been employed to a limited extent in Drosophila, but up till now the difficulty and expense of discovering polymorphisms at the DNA level has confined its use to the mapping of mutations within small regions between closely linked markers [17,18].

The recent completion of the Drosophila genome sequence [19] makes it possible to search for molecular polymorphisms much more efficiently. It should now be possible to discover enough SNPs to allow the rapid mapping of point mutations within entire chromosome arms. Teeter et al. [20] and Hoskins et al. [21] have already generated a collection of SNPs that are polymorphic between different inbred wild-type strains. Here we report a high-density map for chromosome arm 3R that has been specifically designed for the rapid mapping of mutations from clonal screens that use the standard FRT chromosome. This approach has two advantages over the use of SNP maps derived from wild-type lines. First, most genetic screens are performed in more complex genetic backgrounds that cannot be traced back to a specific wild-type isolate, and it is therefore more convenient to have a collection of SNPs that can be used directly on the mutagenized chromosomes. Second, by screening for polymorphisms between this chromosome and a standard third chromosome that carries four visible mutations on 3R, we have developed a hybrid mapping strategy that exploits both traditional and molecular markers. This reduces the cost of SNP mapping by a factor of four, and makes it affordable for even small Drosophila laboratories. Using this strategy, a mutant can be mapped within two months to a region of about 50 kb with a single meiotic recombination cross.

11.11A: MHC Polymorphism and Antigen Binding

Major histocompatibility complex (MHC) is a cell-surface molecule encoded by a large gene family in all vertebrates. MHC molecules display a molecular fraction called an epitope and mediate interactions of leukocytes with other leukocytes or body cells. The MHC gene family is divided into three subgroups: class I, class II, and class III. Diversity of antigen presentation, mediated by MHC classes I and II, is attained in multiple ways:

  1. The MHC's genetic encoding is polygenic,
  2. MHC genes are highly polymorphic and have many variants,
  3. Several MHC genes are expressed from both inherited alleles (variants).

MHC gene families are found in all vertebrates, though they vary widely. Chickens have among the smallest known MHC regions (19 genes).

In humans, the MHC region occurs on chromosome 6. Human MHC class I and II are also called human leukocyte antigen (HLA). To clarify the usage, some of the biomedical literature uses HLA to refer specifically to the HLA protein molecules and reserves MHC for the region of the genome that encodes for this molecule, but this is not a consistent convention.

Figure: HLA MHC complex: The human leukocyte antigen (HLA) system is the name of the major histocompatibility complex (MHC) in humans. The super locus contains a large number of genes related to immune system function in humans. This group of genes resides on chromosome 6, encodes cell-surface antigen-presenting proteins and has many other functions.

The most intensely-studied HLA genes are the nine so-called classical MHC genes: HLA-A, HLA-B, HLA-C, HLA-DPA1, HLA-DPB1, HLA-DQA1, HLA-DQB1, HLA-DRA, and HLA-DRB1. In humans, the MHC is divided into three regions: classes I, II, and III. The A, B, C, E, F, and G genes belong to MHC class I, whereas the six D genes belong to class II. MHC genes are expressed in co-dominant fashion. This means that the alleles inherited from both progenitors are expressed in an equivalent way. As there are 3 Class-I genes, named in humans HLA-A, HLA-B and HLA-C, and as each person inherits a set of genes from each progenitor, that means that any cell in an individual can express 6 different types of MHC-I molecules.

In the Class-II locus, each person inherits a pair of HLA-DP genes (DPA1 and DPB1, which encode the α and β chains), a pair of HLA-DQ genes (DQA1 and DQB1, for the α and β chains), one HLA-DRα gene (DRA1) and one or two HLA-DRβ genes (DRB1 and DRB3, -4 or -5). That means that one heterozygous individual can inherit 6 or 8 Class-II alleles, three or four from each progenitor.

The set of alleles that is present on each chromosome is called an MHC haplotype. In humans, each HLA allele is named with a number. Each heterozygous individual will have two MHC haplotypes, one on each chromosome (one of paternal origin and the other of maternal origin).
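The class I arithmetic above (three genes, two haplotypes, co-dominant expression) can be sketched in a few lines of Python. The haplotypes below are illustrative examples only, and the function name `distinct_class_i_molecules` is a hypothetical helper introduced for this sketch.

```python
CLASS_I_GENES = ("HLA-A", "HLA-B", "HLA-C")


def distinct_class_i_molecules(maternal, paternal):
    """Count distinct (gene, allele) pairs expressed from two haplotypes.

    Co-dominant expression is assumed: both inherited alleles of each
    class I gene are expressed, so identical alleles are counted once.
    """
    expressed = set()
    for haplotype in (maternal, paternal):
        for gene in CLASS_I_GENES:
            expressed.add((gene, haplotype[gene]))
    return len(expressed)


# Illustrative haplotypes (allele names chosen only as examples).
maternal = {"HLA-A": "A*02:01", "HLA-B": "B*07:02", "HLA-C": "C*07:02"}
paternal = {"HLA-A": "A*01:01", "HLA-B": "B*08:01", "HLA-C": "C*07:01"}
fully_heterozygous = distinct_class_i_molecules(maternal, paternal)

# Same individual, but homozygous at HLA-A: one fewer distinct molecule.
paternal_shared_a = dict(paternal, **{"HLA-A": "A*02:01"})
homozygous_at_a = distinct_class_i_molecules(maternal, paternal_shared_a)
print(fully_heterozygous, homozygous_at_a)
```

An individual heterozygous at all three loci reaches the maximum of six distinct class I molecules; each homozygous locus lowers the count by one.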

The MHC genes are highly polymorphic; this means that there are many different alleles among the individuals of a population. The polymorphism is so high that in a mixed (non-endogamic) population no two individuals have exactly the same set of MHC genes and molecules, with the exception of identical twins.

The polymorphic regions in each allele are located in the region for peptide contact, which is going to be displayed to the lymphocyte. For this reason, the contact region for each allele of MHC molecule is highly variable, as the polymorphic residues of the MHC will create specific clefts in which only certain types of residues of the peptide can enter. This imposes a very specific link between the MHC molecule and the peptide, and it implies that each MHC variant will be able to bind specifically only those peptides that are able to properly enter in the cleft of the MHC molecule, which is variable for each allele. In this way, the MHC molecules have a broad specificity, because they can bind many, but not all, types of possible peptides.

The evolution of the MHC polymorphism ensures that a population will not succumb to a new pathogen or a mutated one, because at least some individuals will be able to develop an adequate immune response to win over the pathogen. The variations in the MHC molecules (responsible for the polymorphism) are the result of the inheritance of different MHC molecules, and they are not induced by recombination, as is the case for the antigen receptors.

Because of the high levels of allelic diversity found within its genes, MHC has also attracted the attention of many evolutionary biologists.

The basic principles of SNP array are the same as the DNA microarray. These are the convergence of DNA hybridization, fluorescence microscopy, and solid surface DNA capture. The three mandatory components of the SNP arrays are: [3]

  1. An array containing immobilized allele-specific oligonucleotide (ASO) probes.
  2. Fragmented nucleic acid sequences of target, labelled with fluorescent dyes.
  3. A detection system that records and interprets the hybridization signal.

The ASO probes are often chosen based on sequencing of a representative panel of individuals: positions found to vary in the panel at a specified frequency are used as the basis for probes. SNP chips are generally described by the number of SNP positions they assay. Two probes must be used for each SNP position to detect both alleles; if only one probe were used, experimental failure would be indistinguishable from homozygosity of the non-probed allele. [4]
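The reason for using two probes per SNP can be illustrated with a toy genotype caller. This is a simplified sketch, not how any particular array platform calls genotypes: the function name `call_genotype`, the intensity units, and both thresholds are assumptions made for illustration.

```python
def call_genotype(intensity_a, intensity_b, min_total=200.0, het_band=(0.3, 0.7)):
    """Call AA, AB, BB or 'no call' from the signals of two allele-specific probes.

    min_total and het_band are hypothetical thresholds chosen for this sketch.
    """
    total = intensity_a + intensity_b
    if total < min_total:
        # Both probes are dim: the assay failed, so no genotype is reported.
        return "no call"
    fraction_b = intensity_b / total
    if fraction_b < het_band[0]:
        return "AA"
    if fraction_b > het_band[1]:
        return "BB"
    return "AB"


calls = [
    call_genotype(1000.0, 50.0),   # strong A signal only
    call_genotype(500.0, 450.0),   # both probes bright
    call_genotype(40.0, 900.0),    # strong B signal only
    call_genotype(30.0, 20.0),     # both probes dim
]
print(calls)
```

With only the B probe, the first and last cases would look the same (a weak B signal), which is exactly the ambiguity between a failed experiment and homozygosity for the non-probed allele.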

A SNP array is a useful tool for studying slight variations between whole genomes. The most important clinical applications of SNP arrays are for determining disease susceptibility [5] and for measuring the efficacy of drug therapies designed specifically for individuals. [6] In research, SNP arrays are most frequently used for genome-wide association studies. [7] Each individual has many SNPs. SNP-based genetic linkage analysis can be used to map disease loci, and determine disease susceptibility genes in individuals. The combination of SNP maps and high density SNP arrays allows SNPs to be used as markers for genetic diseases that have complex traits. For example, genome-wide association studies have identified SNPs associated with diseases such as rheumatoid arthritis [8] and prostate cancer. [9] A SNP array can also be used to generate a virtual karyotype using software to determine the copy number of each SNP on the array and then align the SNPs in chromosomal order. [10]

SNPs can also be used to study genetic abnormalities in cancer. For example, SNP arrays can be used to study loss of heterozygosity (LOH). LOH occurs when one allele of a gene is mutated in a deleterious way and the normally-functioning allele is lost. LOH occurs commonly in oncogenesis. For example, tumor suppressor genes help keep cancer from developing. If a person has one mutated and dysfunctional copy of a tumor suppressor gene and their second, functional copy of the gene gets damaged, they may become more likely to develop cancer. [11]

Other chip-based methods such as comparative genomic hybridization can detect genomic gains or deletions leading to LOH. SNP arrays, however, have an additional advantage of being able to detect copy-neutral LOH (also called uniparental disomy or gene conversion). Copy-neutral LOH is a form of allelic imbalance. In copy-neutral LOH, one allele or whole chromosome from a parent is missing. This problem leads to duplication of the other parental allele. Copy-neutral LOH may be pathological. For example, say that the mother's allele is wild-type and fully functional, and the father's allele is mutated. If the mother's allele is missing and the child has two copies of the father's mutant allele, disease can occur.
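The distinction between copy-loss and copy-neutral LOH can be sketched as a simple classification rule that combines a copy-number estimate with the fraction of heterozygous genotype calls in a segment. The function name `classify_segment` and the 5% heterozygosity threshold are assumptions for illustration; real pipelines use more elaborate statistics, and long homozygous runs can also arise for other reasons.

```python
def classify_segment(copy_number, genotypes, het_threshold=0.05):
    """Label a chromosomal segment from its copy number and SNP genotype calls.

    copy_number: estimated integer copy number of the segment.
    genotypes: list of calls such as "AA", "AB", "BB" (other values are ignored).
    het_threshold: hypothetical cutoff below which the segment counts as LOH.
    """
    if copy_number < 2:
        return "copy-loss LOH"
    if copy_number > 2:
        return "gain"
    informative = [g for g in genotypes if g in ("AA", "AB", "BB")]
    if not informative:
        return "uninformative"
    het_fraction = sum(g == "AB" for g in informative) / len(informative)
    if het_fraction < het_threshold:
        # Two copies are present, but both appear to come from one parent.
        return "copy-neutral LOH"
    return "normal"


labels = [
    classify_segment(2, ["AA", "BB", "AA", "BB"] * 10),
    classify_segment(2, ["AA", "AB", "BB", "AB"]),
    classify_segment(1, ["AA", "BB"]),
    classify_segment(3, ["AB", "AA"]),
]
print(labels)
```

A method that measures copy number alone would label the first segment as normal, which is why the genotype information from SNP arrays is needed to detect copy-neutral LOH.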

High density SNP arrays help scientists identify patterns of allelic imbalance. These studies have potential prognostic and diagnostic uses. Because LOH is so common in many human cancers, SNP arrays have great potential in cancer diagnostics. For example, recent SNP array studies have shown that solid tumors such as gastric cancer and liver cancer show LOH, as do non-solid malignancies such as hematologic malignancies, ALL, MDS, CML and others. These studies may provide insights into how these diseases develop, as well as information about how to create therapies for them. [12]

Breeding in a number of animal and plant species has been revolutionized by the emergence of SNP arrays. The method is based on the prediction of genetic merit by incorporating relationships among individuals based on SNP array data. [13] This process is known as genomic selection.

Zoom along a three-dimensional rendering of 650,000 nucleotides of human chromosome 11 to see how little actually encodes protein. Take a guided tour of less than 1% of your genetic material to see new and unusual views of your chromosomal landscape.

Just as we chart the world around us, we can map human chromosomes. The features of chromosomes can include protein-coding genes, ancient molecular parasites known as transposons, or stretches of repeat sequences.

We will now take a tour of about 650,000 nucleotides from the tip of the short arm of human Chromosome 11. This is equal to about half of one percent of the entire chromosome and about 1/5,000th of the human genome. From a distance we can discern 28 genes, denoted by red and yellow blocks. The red "exons" carry the DNA code for protein, while the yellow "introns" are noncoding. Also prominent are more than 500 transposons, or jumping genes, denoted by blue and purple blocks. If we zoom in, we can take a closer look at the structure of this chromosome region. We first encounter a cluster of five small genes, averaging about 1,500 nucleotides in length. These encode components of hemoglobin, the oxygen carrying molecule of the blood. Beta globin is a common component of adult blood, and a mutation to a single nucleotide in this gene is responsible for sickle cell anemia. Delta globin, a minor component of adult blood, is followed by a nonfunctional copy of beta globin, termed a pseudogene. Gamma and epsilon globins are expressed in the embryo and fetus.

(1:25) Next we encounter two small genes that encode olfactory receptors, common features of Chromosome 11. These are followed by an "intergenic region" of 183,000 nucleotides, lacking any known genes. Scattered throughout this region are numerous "simple repeats" composed of multiple copies of a repeated sequence of 2-50 nucleotides. Two green blocks identify repeats longer than 100 nucleotides. Variations in the number of repeats between people create a DNA difference, or "polymorphism," which can be used in forensic biology, paternity testing, or disease diagnosis. Blue and purple boxes identify the more than 100 transposons that litter the intergenic region. These molecular parasites make up about half of the human genome by weight, and the majority move about using an enzyme that was later borrowed by viruses such as HIV. Each of the millions of transposons in the human genome arose from an individual "jump" at some point in evolution. The majority of transposons have not jumped for millions of years and, thus, are "molecular fossils." As we will see, the majority of transposons are located within gene clusters and even within genes, a fact that perplexes scientists.

(02:57) The intergenic region is followed by two adjacent ubiquilin genes, which are involved in key cell processes, from replication to "programmed" death. Ubiquilin 3 is expressed specifically in the testis, where it is believed to help regulate sperm development. These are followed by a cluster of gene locations (LOC) thought to encode olfactory receptors, which receive stimuli in the nose to allow us to detect smells. At 31,110 nucleotides long, the first gene in this cluster, LOC120009, is the longest we will encounter on our journey. Its 11 coding exons are indicated in red, but most of its bulk comes from its yellow introns and 29 blue and purple transposons. However, the majority of olfactory receptors are short. The next four gene locations are more typical of olfactory receptors in having only one or two coding exons. About 60% of our smell receptors are nonfunctional. Presumably, humans have less need for smell in locating food and interacting socially. The mutations that inactivate many receptors vary among people, meaning that there is a DNA basis for the observation that some people can smell better than others! It also suggests that the loss of smelling acuity has occurred very recently in human evolution and is still ongoing.

(04:38) Next follows a cluster of four genes in the tripartite motif (TRIM) family. TRIM proteins contain three motifs, or structures, through which they bind to DNA to regulate gene activity. Averaging about 21,000 nucleotides and having about eight coding exons, the TRIM genes come very close to the average size of human genes. Different proteins can be produced by a single TRIM gene, by making different combinations of coding exons. TRIM 34 and 22 help mediate the antiviral activity of interferon and offer insight into the fight against HIV. Our tour ends with another cluster of nine olfactory receptor genes (LOC). Chromosome 11 contains about 40% of the estimated 1,000 genes for olfactory receptors in the human genome. There is such a concentration of receptor genes at the tip of Chromosome 11 that this whole region could be called an olfactory supercluster, in which the beta globin, ubiquilin, and TRIM clusters are embedded.


The human genome consists of over 3 billion base pairs which reside in every nucleated cell of the body [1, 2]. The genome, which has remained well conserved throughout evolution, is at least 99.5 % identical between any two humans on the planet [3]. Modern genomic tools have revealed that it is more complex, diverse, and dynamic than previously thought, even though the genetic variation is limited to between 0.1 % [4–6] and 0.4 % [7] of the genome. Sequence variations, even in non-protein coding regions of the DNA, have begun to alter our understanding of the human genome. While some studies have linked certain variants to being predictive of disease susceptibility and drug response, the majority of diseases have a very complex genetic signature (reviewed in [8, 9]). Biomedical research is shifting towards understanding the functional importance of many such variations and their association with human diseases.

At the heart of these novel discoveries are the modern DNA sequencing tools, which continue to evolve at a rapid pace. The new sequencing technologies continue to become cheaper and more precise, and facilitate novel medical and biological breakthroughs all over the world [10, 11]. Scientific research has become nearly inconceivable without employing sequencing technology but, with the progress of technology and the increasing sequencing of individuals, a massive amount of data is being generated. However, any data without context and analysis is useless. The data from sequencing must be carefully annotated, securely stored, and easily accessible from repositories when needed. Such arduous tasks require functional collaboration among clinicians, researchers, and health professionals [12].

In a recent thread in the ResearchGate portal [13], an ongoing discussion on the difference between a mutation and a polymorphism elicited a response from more than three hundred participants from various scientific backgrounds. The variety of responses prompted us to write this document as a paper aimed at stimulating the discussion further and possibly finding a consensus on the usage of the terms mutation and polymorphism in the context of a reference sequence in a personal genome project.

Polymorphism for the number of tandemly multiplicated glycerol-3-phosphate dehydrogenase genes in Drosophila melanogaster

A 26-kilobase-pair region encompassing the sn-glycerol-3-phosphate dehydrogenase (sn-glycerol-3-phosphate:NAD+ 2-oxidoreductase, EC locus in Drosophila melanogaster from two natural populations in Japan was surveyed by restriction mapping. Both tandem duplications and triplications in this region were found in both populations. Detailed analysis of 86 chromosome 2 lines revealed restriction site and allozyme polymorphisms in the transcriptional unit: two restriction sites and the allozymes [fast (F) or slow (S)] were polymorphic among both duplication-bearing chromosomes and those carrying the standard sequence. This finding suggests recurrent recombination and/or gene conversion in this 5-kilobase-pair region. The differences observed for restriction site and allozyme haplotypes among the triplicated sequence both within and between populations, together with the distribution in natural populations, suggest a relatively recent ancestry of the triplication events and an independent origin in respective populations. Such events may represent the process of the formation of multigene families [compare Ohta, T. (1987) Genetics 115, 207-213]. Finally, the evolution of this type of polymorphism is discussed.

Gene Density and Human Nucleotide Polymorphism

Bret A. Payseur, Michael W. Nachman, Gene Density and Human Nucleotide Polymorphism, Molecular Biology and Evolution, Volume 19, Issue 3, March 2002, Pages 336–340

Population genetics theory indicates that natural selection will affect levels and patterns of genetic variation at closely linked loci. Background selection ( Charlesworth, Morgan, and Charlesworth 1993 ) proposes that the removal of recurrent deleterious mutations and associated neutral variants will cause a reduction of nucleotide variation in low-recombination regions. The strength of background selection depends on the deleterious mutation rate, the magnitude of selection and dominance, and the recombination rate. Genetic hitchhiking ( Maynard Smith and Haigh 1974 ), the fixation of advantageous alleles and the associated fixation of linked neutral alleles, can also decrease nucleotide diversity in low-recombination regions. The extent of genetic hitchhiking depends on the strength of selection and the rate of recombination. Therefore, under both background selection and genetic hitchhiking, theory predicts that genomic regions that rarely recombine may be subject to reductions in nucleotide diversity. Furthermore, if the rate of deleterious mutation or selective sweeps (or both) is sufficiently high, background selection ( Hudson and Kaplan 1995 ) and genetic hitchhiking ( Wiehe and Stephan 1993 ) models predict an overall positive correlation between nucleotide polymorphism and recombination rate.

Empirical investigations of nucleotide variation support these predictions. In Drosophila melanogaster, regions of the genome with little recombination show reduced heterozygosity ( Aguade, Miyashita, and Langley 1989; Begun and Aquadro 1991; Berry, Ajioka, and Kreitman 1991 ). Furthermore, there is evidence that nucleotide variation and recombination rate are positively correlated in several taxa, including fruit flies ( Begun and Aquadro 1992 ), house mice ( Nachman 1997 ), goatgrasses ( Dvorak, Luo, and Yang 1998 ), sea beets ( Kraft et al. 1998 ), tomatoes ( Stephan and Langley 1998 ), humans ( Nachman et al. 1998; Przeworski, Hudson, and Di Rienzo 2000; Nachman 2001 ), and maize ( Tenaillon et al. 2001 ). The combination of theoretical and empirical results indicates that selection acting at linked sites is likely to be a major force shaping genomic patterns of nucleotide variation.

The documented relationship between nucleotide variation and recombination rate raises the question of whether other measurable variables can explain additional variation in nucleotide polymorphism in the context of selection at linked sites. We predict that the effects of selection at linked sites will depend on local gene density. If selection acts primarily on genes, genomic regions with high gene density will harbor more potential selective targets than genomic regions with low gene density. This prediction should be valid irrespective of whether positive or purifying selection is driving observed patterns. Humans provide a good system in which this prediction can be tested, for two reasons. First, gene density varies substantially across the genome ( International Human Genome Sequencing Consortium 2001; Venter et al. 2001 ). For example, sequence data suggest that chromosome 19 has an average of 23 genes per Mbp, whereas chromosome 4 averages only 6 genes per Mbp ( Venter et al. 2001 ). Second, estimates of nucleotide polymorphism assessed using reasonable sample sizes are available for multiple loci across the human genome. Here, we demonstrate that nucleotide diversity and gene density are negatively correlated in humans. This result provides further evidence for the importance of selection at linked sites and suggests that the number of genes in a genomic region is a reasonable indicator of selective intensity.

We assessed the relationship between nucleotide polymorphism (measured by Watterson's θ̂ [1975]) and gene density using data from sequence-based studies of variation that sampled more than 10 chromosomes (table 1). The variance in θ̂ can be quite large with sample sizes smaller than 10 (Pluzhnikov and Donnelly 1996).

For X-linked loci, nucleotide diversity was multiplied by 4/3 to account for the fact that the effective population size of the X chromosome is ¾ of that of the autosomes (assuming a sex ratio of 1). Sequence-based maps of the human genome (June 2001 version) were used to estimate the base pair position of each locus. Gene density was estimated by counting the number of genes in a window, including 1 Mbp of sequence on either side of each locus. Gene density estimates based on a 10-Mbp window gave similar results. Recombination rates were taken from Payseur and Nachman (2000), who compared the genetic and physical positions of microsatellites spaced at approximately 2-Mbp intervals. Recombination rates for X-linked loci were multiplied by ⅔ to correct for differences in population recombination rates. All variables were approximately normally distributed (visual inspection of histograms; Shapiro-Wilk goodness-of-fit test, P > 0.05). Additionally, the residuals from the regression of θ̂ on all variables were normally distributed (P > 0.05). All analyses were done using least-squares linear regression.
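The window count and the two X-chromosome corrections described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, each gene is assumed to be represented by a single coordinate in base pairs, and the gene positions and input values are made up rather than taken from the study.

```python
from bisect import bisect_left, bisect_right

MBP = 1_000_000  # one megabase pair, in bp


def genes_in_window(gene_positions, locus_pos, flank=MBP):
    """Count genes lying within `flank` bp on either side of a locus.

    `gene_positions` must be sorted ascending. Representing a gene by one
    coordinate is an assumption made for this sketch.
    """
    lo = bisect_left(gene_positions, locus_pos - flank)
    hi = bisect_right(gene_positions, locus_pos + flank)
    return hi - lo


def scale_for_x(theta, rec_rate, x_linked):
    """Apply the corrections from the text to an X-linked locus.

    Diversity is multiplied by 4/3 and recombination rate by 2/3;
    autosomal loci are returned unchanged.
    """
    if x_linked:
        return theta * 4 / 3, rec_rate * 2 / 3
    return theta, rec_rate


# Made-up gene coordinates on one chromosome (bp), sorted ascending.
genes = [400_000, 1_200_000, 2_500_000, 2_900_000, 3_100_000, 5_000_000]

# Window of 1 Mbp on either side of a locus at 2 Mbp: 1,000,000..3,000,000.
count = genes_in_window(genes, 2_000_000)
theta_x, rec_x = scale_for_x(0.0006, 1.5, x_linked=True)
```

Passing `flank=5 * MBP` would give the 10-Mbp window mentioned in the text.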

Nucleotide polymorphism and recombination rate are strongly, positively correlated (R² = 0.63; P = 0.0002; fig. 1a) for these data, despite no evidence for a positive relationship between divergence and recombination rate (P > 0.05). Comparing the residuals of the regression of nucleotide polymorphism on recombination rate with gene density reveals a significant negative association (R² = 0.25; P = 0.04; fig. 1b). As predicted, nucleotide polymorphism is reduced in regions with higher gene density, once recombination rate variation is taken into account. A model including both recombination rate and gene density as independent variables explains 68% (adjusted R²; P = 0.0001; recombination rate: P = 0.0001; gene density: P = 0.05) of the variation in nucleotide polymorphism. There is weak evidence for a negative association between nucleotide polymorphism and gene density alone (R² = 0.17; P = 0.10). There is no evidence of a statistical interaction between recombination rate and gene density, although such an interaction would be difficult to detect with our small sample size.

We also asked whether an alternative measure of nucleotide variation, the average pairwise divergence between sequences, π̂ (Nei and Li 1979), is associated with gene density. There is a slight trend toward a negative relationship, but it is not statistically significant (P > 0.05 in all tests). θ̂ and π̂ incorporate different aspects of the data in their estimates of nucleotide diversity. Whereas θ̂ is estimated by counting the number of segregating sites in the total sample, π̂ is estimated by comparing all the pairwise sequence combinations and calculating the average number of differences. As a result, π̂ contains information about allele frequencies and θ̂ does not. However, θ̂ has a lower sampling variance than π̂. Using the average number of sampled chromosomes (n = 124) and the average θ̂ or π̂ value (approximately 0.1%) for the studies included in our analysis, under an infinite sites model with no recombination, the sampling variance of π̂ (0.034%) is nearly twice that of θ̂ (0.019%). Although this effect may be ameliorated by recombination (Pluzhnikov and Donnelly 1996), the increased statistical difficulty in estimating π̂ may contribute to our failure to detect an association between gene density and π̂.
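To make the difference between the two estimators concrete, here is a minimal Python sketch that computes both per site from a small alignment. The function names and the four toy sequences are assumptions for illustration only. Watterson's estimator divides the number of segregating sites S by a_n = 1 + 1/2 + ... + 1/(n − 1), while π̂ averages the differences over all pairs of sequences.

```python
from itertools import combinations


def segregating_sites(seqs):
    """Number of alignment columns that contain more than one state."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def watterson_theta(seqs):
    """Per-site Watterson estimate: S / a_n / L, with a_n = sum of 1/i for i < n."""
    n = len(seqs)
    a_n = sum(1 / i for i in range(1, n))
    return segregating_sites(seqs) / a_n / len(seqs[0])


def nucleotide_diversity(seqs):
    """Per-site pi: mean number of pairwise differences, divided by L."""
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / len(pairs) / len(seqs[0])


# Toy alignment (made up): 4 sequences of length 10 with 2 variable sites.
sample = [
    "ACGTACGTAC",
    "ACGTACGTAC",
    "ACGAACGTAC",
    "ACGAACGTTC",
]

theta_w = watterson_theta(sample)
pi_hat = nucleotide_diversity(sample)
```

In this toy sample one variant is carried by two sequences and the other by one. Changing how many sequences carry a variant would change `pi_hat` but leave `theta_w` untouched, which is the sense in which π̂ contains allele frequency information and θ̂ does not.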

An alternative interpretation of our results is that nucleotide polymorphism is shaped by other variables that are correlated with gene density or recombination rate. GC content is positively correlated with both gene density (International Human Genome Sequencing Consortium 2001) and recombination rate (Fullerton, Bernardo Carvalho, and Clark 2001) in humans. Consequently, we asked whether GC content was associated with nucleotide polymorphism alone or once gene density and recombination rate had been taken into account. There is no evidence that GC content affects levels of polymorphism in these data (P > 0.05, bivariate and multiple linear regression analyses), although a weak correlation between SNP (single nucleotide polymorphism) heterozygosity and GC content in humans has been reported (International SNP Map Working Group 2001). This discrepancy may reflect the relatively small number of loci used in our study.
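The "once X has been taken into account" comparisons used in this section (gene density after recombination rate, and GC content after both) follow one pattern: fit a least-squares line, keep the residuals, and test those against the next variable. The Python sketch below illustrates that pattern with hypothetical helper names and made-up numbers chosen so that the second predictor is uncorrelated with the first. It is not a reanalysis of the study's data and it omits the significance tests.

```python
def linear_fit(x, y):
    """Least-squares fit of y = intercept + slope * x; returns slope, intercept, R^2."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept, sxy ** 2 / (sxx * syy)


def residuals(x, y):
    """Observed y minus the value predicted by the fitted line."""
    slope, intercept, _ = linear_fit(x, y)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Made-up values for four loci (illustration only).
rec_rate = [1.0, 2.0, 3.0, 4.0]          # recombination rate
gene_density = [10.0, 20.0, 20.0, 10.0]  # genes per window
theta = [0.06, 0.07, 0.09, 0.12]         # nucleotide polymorphism, in %

# Step 1: remove the part of theta explained by recombination rate.
theta_resid = residuals(rec_rate, theta)

# Step 2: ask whether what is left over tracks gene density.
slope, intercept, r2 = linear_fit(gene_density, theta_resid)
```

With these toy numbers the slope of the second fit is negative, which is the direction of the association reported above. Real data would be far noisier, and a multiple regression with both predictors fitted jointly is the more general form of the same question.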

Several conclusions follow from these results. First, natural selection at the molecular level has a pronounced effect on the levels of nucleotide heterozygosity in humans. Even if the total number of sites under selection is relatively modest, it is clear that the effects on linked, neutral variation can be substantial. It remains to be seen whether the patterns depicted in figure 1 are driven mainly by positive selection and associated genetic hitchhiking, purifying selection, or some combination of both. Background selection and genetic hitchhiking are not mutually exclusive, and it seems likely that both processes contribute to observed patterns (Kim and Stephan 2000). Second, these results suggest that the density of genes is a reasonable indicator of the potential for selection and that genes are likely the targets of selection in many cases. However, the high degree of sequence conservation between human and mouse in intergenic regions suggests that many of these intergenic regions may also be functional, possibly containing important cis-regulatory elements (Shabalina et al. 2001). The degree to which the densities of genes and cis-regulatory elements covary is therefore an interesting question for further investigation. Finally, our results indicate that levels of human nucleotide polymorphism can be predicted with reasonable precision, given knowledge of local recombination rate and gene density. Because recombination rate and gene density can now be measured throughout the human genome, this predictive ability could assist efforts to map genes underlying complex diseases.


Please note that this Glossary is a work in progress. If you encounter missing terms or want to suggest definitions, please let us know.

  • 3’ rule: for all descriptions, the most 3’ position possible of the reference sequence is arbitrarily assigned to have been changed. When ATTTG changes to ATTG, HGVS describes this as a change of the T at position 4 (not the T at position 2 or 3).
  • allele: variant forms of the same gene (MESH). HGVS: a series of variants on one chromosome. For descriptions, see Recommendations (DNA, RNA or protein).
  • amino acid: a letter from the protein code (see Standards).
  • cap site: first nucleotide of a transcript (5’ end) to which a specially altered nucleotide is added.
  • break point: the site where two sequences which are in different positions in the reference sequence are joined as a consequence of genomic rearrangement (Structural Variant).
  • cDNA: cDNA, “copy DNA” or “complementary DNA”, is the DNA copy of a single stranded RNA molecule synthesized using the enzyme reverse transcriptase (Wikipedia, MESH). NOTE: cDNA is not the same as “coding DNA” (see below).
  • CDS: coding DNA sequence, a sequence translated into an amino acid sequence (protein).
  • chimerism: the occurrence in one individual of two or more cell populations, derived from different zygotes, with different sequences (based on MESH). Opposite of mosaicism. For descriptions, see General/Characters used.
  • cis: two variants are “in cis” when they are on the same allele (DNA molecule, chromosome).
  • CNV: copy number variant (CNV), a variant in a genome where the number of copies of a large stretch of DNA differs from that in the reference genome; a copy can be missing (deleted) or be present more than once (duplicated, triplicated, …, or amplified). NOTE: a “large stretch” is not defined precisely but usually covers at least an exon of a gene or 1,000 nucleotides or more. Alias: CNP (copy number polymorphism).
  • coding DNA: the segments of a genome or segment of a transcript (RNA molecule) which code for a protein.
  • coding DNA reference sequence: a DNA reference sequence (see Reference Sequence), based on a protein-coding transcript of a gene, which can be used for nucleotide numbering using the “c.” prefix. Such a reference sequence includes the coding DNA sequence (CDS) and the 5’ and 3’ UTR regions. NOTE: a coding DNA reference sequence is not a cDNA sequence (see above).
  • complex: HGVS: a sequence change where, compared to a reference sequence, a range of changes occur that cannot be described as one of the basic variant types (substitution, deletion, duplication, insertion, conversion, inversion, deletion-insertion, or repeated sequence).
  • compound heterozygote: used in cases of autosomal recessive disease where the disease-causing variants on both alleles at a given locus are not identical (opposite of homozygous).
  • conversion: HGVS-DNA: a sequence change where, compared to a reference sequence, a range of nucleotides are replaced by a sequence from elsewhere in the genome. NOTE: conversion variants are described as a Deletion-Insertion (see DNA or RNA).
  • Crick strand: see plus (+) strand.
  • deletion
    • one or more letters of the DNA code are missing (deleted). A deletion is indicated using “del”.
    • HGVS-DNA: a sequence change where, compared to a reference sequence, one or more nucleotides are not present (deleted). For descriptions, see Recommendations (DNA, RNA or protein).
  • deletion-insertion
    • one or more letters in the DNA code are missing and replaced by several new letters.
    • HGVS-DNA: a sequence change where, compared to a reference sequence, one or more nucleotides are replaced by one or more other nucleotides and which is not a substitution, inversion or conversion. For descriptions, see Recommendations (DNA, RNA or protein).
  • duplication
    • one or more letters of the DNA code are present twice (doubled, duplicated).
    • HGVS-DNA: a sequence change where, compared to a reference sequence, a copy of one or more nucleotides is inserted directly 3’ of the original copy of that sequence. NOTE: diagnostic assays (like MLPA) usually detect an additional copy of a specific sequence. Whether the additional copy is a duplication or an insertion remains to be determined. For descriptions, see Recommendations (DNA, RNA or protein).
  • insertion
    • one or more letters in the DNA, RNA or amino acid code are new (have been inserted).
    • HGVS-DNA: a sequence change where, compared to the reference sequence, one or more residues are inserted and where the insertion is not a copy of a sequence immediately upstream. For descriptions, see Recommendations (DNA, RNA or protein).
  • missense
    • a variant in which a codon is changed to one directing the incorporation of a different amino acid (based on MESH).
    • HGVS: a variant in a protein sequence where, compared to the reference sequence, one amino acid is replaced by another amino acid.
  • mutation
    • HGVS: confusing term, do not use, use variant (see Basics).
    • biology: a change in the sequence.
    • medicine: a sequence variant associated with a disease phenotype.
  • nonsense
    • a variant that changed an amino acid-specifying codon to a stop codon (termination codon, based on MESH).
    • HGVS: a variant in a protein sequence where, compared to the reference sequence, an amino acid is replaced by a translational stop codon (termination codon).
  • polymorphism NOTE: please do not use this term, see Terminology.
    • HGVS: confusing term, do not use, use variant (see Basics).
    • biology: a sequence variant present in the population at a frequency of 1% or higher.
    • medicine: a sequence variant not associated with a disease phenotype.
  • silent
    • a variant in a DNA sequence that does not change the amino acid sequence of the encoded protein (based on MESH).
    • HGVS: an amino acid residue in a protein sequence where, compared to the reference sequence, the DNA sequence changed but not the encoded amino acid.
  • substitution
    • one letter of the DNA, RNA or amino acid code is replaced (substituted) by one other letter.
    • HGVS-DNA: a sequence change where, compared to a reference sequence, one residue is replaced by one other residue. For descriptions, see Recommendations (DNA, RNA or protein).
  • translocation
    • a chromosome abnormality characterized by chromosome breakage and transfer of the broken-off portion to a non-homologous chromosome (based on MESH).
    • HGVS: a sequence change where, compared to a reference sequence, from a specific nucleotide position (the break point) all nucleotides upstream derive from another chromosome than those downstream. NOTE: a translocation occurs when two chromosomes break and the fragments rejoin with the non-homologous chromosome. A full description of a (reciprocal) translocation consists of 2 parts, one describing the first junction, the second describing the other junction (e.g. the chromosome 4X junction and the chromosome X4 junction).
  • translocation, balanced: a translocation with an even exchange of DNA sequences and no segments deleted or duplicated.
  • translocation, unbalanced: a translocation with an uneven exchange of DNA sequences and segments being deleted or duplicated.
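
The 3’ rule from the glossary lends itself to a small worked example. The Python sketch below shifts a deletion as far 3’ as possible in a reference string and reproduces the ATTTG case. The function names are hypothetical and the approach is a simple illustration, not the procedure of any particular HGVS tool.

```python
def apply_deletion(ref, start, length):
    """Return `ref` with `length` bases removed, starting at 1-based `start`."""
    i = start - 1
    return ref[:i] + ref[i + length:]


def shift_deletion_3prime(ref, start, length):
    """Return the most 3' 1-based start position that yields the same sequence.

    The deleted window can slide one base 3' whenever the base leaving the
    window on its 5' side equals the base entering it on its 3' side.
    """
    i = start - 1  # 0-based index of the first deleted base
    while i + length < len(ref) and ref[i] == ref[i + length]:
        i += 1
    return i + 1


ref = "ATTTG"
# Deleting the T at position 2, 3 or 4 gives ATTG in every case;
# the 3' rule reports position 4.
positions = [shift_deletion_3prime(ref, p, 1) for p in (2, 3, 4)]
```

The same function handles longer deletions: in `ACGCGT`, deleting `CG` at positions 2 to 3 is reported at positions 4 to 5, since both choices leave `ACGT`.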


