A gene is a locatable region of genomic sequence, corresponding to a unit of inheritance, which is associated with regulatory regions, transcribed regions and/or other functional sequence regions. The physical development and phenotype of organisms can be thought of as a product of genes interacting with each other and with the environment.A concise definition of a gene, taking into account complex patterns of regulation and transcription, genic conservation and non-coding RNA genes, has been proposed by Gerstein et al. "A gene is a union of genomic sequences encoding a coherent set of potentially overlapping functional products".
Colloquially, the term gene is often used to refer to an inheritable trait which is usually accompanied by a phenotype as in ("tall genes" or "bad genes") -- the proper scientific term for this is allele.
In cells, genes consist of a long strand of DNA that contains a promoter, which controls the activity of a gene, and coding and non-coding sequence. Coding sequence determines what the gene produces, while non-coding sequence can regulate the conditions of gene expression. When a gene is active, the coding and non-coding sequence is copied in a process called transcription, producing an RNA copy of the gene's information. This RNA can then direct the synthesis of proteins via the genetic code. But some RNAs are used directly, for example as part of the ribosome. These molecules resulting from gene expression, whether RNA or protein, are known as gene products.
Genes often contain regions that do not encode products, but regulate gene expression. The genes of eukaryotic organisms can contain regions called introns that are removed from the messenger RNA in a process called splicing. The regions encoding gene products are called exons. In eukaryotes, a single gene can encode multiple proteins, which are produced through the creation of different arrangements of exons through alternative splicing. In prokaryotes (bacteria and archaea), introns are less common and genes often contain a single uninterrupted stretch of DNA, called a cistron, that codes for a product. Prokaryotic genes are often arranged in groups called operons with promoter and operator sequences that regulate transcription of a single long RNA. This RNA contains multiple coding sequences. Each coding sequence is preceded by a Shine-Dalgarno sequence that ribosomes recognize.
The total set of genes in an organism is known as its genome. An organism's genome size is generally lower in prokaryotes, both in number of base pairs and number of genes, than even single-celled eukaryotes. However, there is no clear relationship between genome sizes and complexity in eukaryotic organisms. One of the largest known genomes belongs to the single-celled amoeba Amoeba dubia, with over 670 billion base pairs, some 200 times larger than the human genome. The estimated number of genes in the human genome has been repeatedly revised downward since the completion of the Human Genome Project; current estimates place the human genome at just under 3 billion base pairs and about 20,000–25,000 protein-coding genes, with a article from 2007 giving a number of 20,488 plus perhaps 100 more yet to be discovered. The gene density of a genome is a measure of the number of genes per million base pairs (called a megabase, Mb); prokaryotic genomes have much higher gene densities than eukaryotes. The gene density of the human genome is roughly 12–15 genes per megabase pair.
Colloquially, the term gene is often used to refer to an inheritable trait which is usually accompanied by a phenotype as in ("tall genes" or "bad genes") -- the proper scientific term for this is allele.
In cells, genes consist of a long strand of DNA that contains a promoter, which controls the activity of a gene, and coding and non-coding sequence. Coding sequence determines what the gene produces, while non-coding sequence can regulate the conditions of gene expression. When a gene is active, the coding and non-coding sequence is copied in a process called transcription, producing an RNA copy of the gene's information. This RNA can then direct the synthesis of proteins via the genetic code. But some RNAs are used directly, for example as part of the ribosome. These molecules resulting from gene expression, whether RNA or protein, are known as gene products.
Genes often contain regions that do not encode products, but regulate gene expression. The genes of eukaryotic organisms can contain regions called introns that are removed from the messenger RNA in a process called splicing. The regions encoding gene products are called exons. In eukaryotes, a single gene can encode multiple proteins, which are produced through the creation of different arrangements of exons through alternative splicing. In prokaryotes (bacteria and archaea), introns are less common and genes often contain a single uninterrupted stretch of DNA, called a cistron, that codes for a product. Prokaryotic genes are often arranged in groups called operons with promoter and operator sequences that regulate transcription of a single long RNA. This RNA contains multiple coding sequences. Each coding sequence is preceded by a Shine-Dalgarno sequence that ribosomes recognize.
The total set of genes in an organism is known as its genome. An organism's genome size is generally lower in prokaryotes, both in number of base pairs and number of genes, than even single-celled eukaryotes. However, there is no clear relationship between genome sizes and complexity in eukaryotic organisms. One of the largest known genomes belongs to the single-celled amoeba Amoeba dubia, with over 670 billion base pairs, some 200 times larger than the human genome. The estimated number of genes in the human genome has been repeatedly revised downward since the completion of the Human Genome Project; current estimates place the human genome at just under 3 billion base pairs and about 20,000–25,000 protein-coding genes, with a article from 2007 giving a number of 20,488 plus perhaps 100 more yet to be discovered. The gene density of a genome is a measure of the number of genes per million base pairs (called a megabase, Mb); prokaryotic genomes have much higher gene densities than eukaryotes. The gene density of the human genome is roughly 12–15 genes per megabase pair.