MOLECULAR & CELLULAR NEUROBIOLOGY
Master Course Cognitive Neuroscience - Radboud University, Nijmegen
Chapter 4: Genomics
The Human Genome Project: purpose and goal
The Human Genome Project, the first large international effort in the history of biological research, was initiated on October 1, 1990, with completion planned for 2005. With improvements in technology and competition from the private sector, however, the timetable was accelerated: a rough draft covering 90% of the genome was completed in 2000, and the complete sequence became available in 2003. The Human Genome Project sequenced the DNA blueprint for the development of a single fertilized egg into a complex organism. While the overall objective was to sequence the human genome, other goals were completed along the way that markedly accelerated the efforts of all investigators involved in biological or medical research.
The first goal was to develop a genetic map. This meant developing markers (unique DNA sequences) along each chromosome, each with a readily identifiable chromosomal position, to provide highly informative signposts for the identification of nearby genes. This goal provided thousands of markers spaced 5 to 10 million base pairs apart, spanning the entire human genome, and led to the creation of a genetic "road map" for each chromosome. As will become evident in a future section of this text, it is this genetic map, with DNA sequences (markers) of known positions (loci) along each chromosome, that enables the mapping of a gene's chromosomal location by genetic linkage analysis. Genetic linkage analysis accelerated the mapping of numerous genes responsible for diseases. Currently over 1500 disease-causing genes are known, owing to the more rapid identification of genes facilitated by the Human Genome Project.
The policy of the Human Genome Project is that the entire human DNA sequence, including all identified genes, will be available to the public. Each gene, as it is sequenced, is entered into a publicly accessible database and is available at no cost.
In the United States, GenBank (at http://www.ncbi.nlm.nih.gov) is run by the National Center for Biotechnology Information (NCBI) and serves as the public repository of DNA sequence information. The results of the publicly funded Human Genome Project consist not only of the DNA sequences of the various genes but also of the intervening sequences.
Another goal was to develop a physical map of the regions of the DNA that are expressed as genes. The markers on this map are referred to as expressed sequence tags (ESTs) and are short sequences of 200 to 300 bp. Each sequence is unique and represents a fragment of a specific gene that has yet to be fully characterized. ESTs are generated by extracting all of the mRNAs in a cell type, which represent all of the genes expressed at that time in that cell. The mRNA is converted to cDNA with the enzyme reverse transcriptase and the sequences are amplified by the polymerase chain reaction (PCR); unique sequences are then selected and entered into GenBank as ESTs. The sequences of these ESTs are then matched to the plethora of sequences available in the DNA sequence repository. Thus, ESTs mapped to their chromosomal locations can be used as markers to identify novel genes responsible for disease. The development of this physical map has tremendously accelerated the efforts of investigators to identify novel genes relevant to normal physiology or disease. When a locus harboring a disease gene is mapped to a region, the ESTs in that region serve as candidate genes and greatly facilitate the identification of the gene of interest.
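The matching of an EST to a position in a reference sequence can be illustrated with a toy Python sketch. It is deliberately simplified: it looks only for exact matches on either strand, whereas real EST mapping relies on alignment tools that tolerate mismatches and gaps. The function names and the two short reference sequences are made up for illustration and are not real genomic data.

```python
# Toy sketch: locate an EST in a set of reference sequences by exact matching.
# All names and sequences below are hypothetical illustrations.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def locate_est(est, reference):
    """Return a list of (sequence name, 0-based start, strand) exact hits."""
    hits = []
    for name, seq in reference.items():
        for strand, query in (("+", est), ("-", reverse_complement(est))):
            start = seq.find(query)
            while start != -1:
                hits.append((name, start, strand))
                start = seq.find(query, start + 1)
    return hits


# Made-up reference fragments standing in for mapped chromosomal regions.
toy_reference = {
    "regionA": "GGGATTACAGGG",
    "regionB": "CCCTGTAATCCC",
}

print(locate_est("ATTACA", toy_reference))
```

An EST that hits a region already linked to a disease locus would, in the terms used above, become a candidate gene for that disease.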
Information from the draft human genome sequence
By the numbers
How it is arranged
How the human compares with other organisms
Variations and mutations
Applications, future challenges
Deriving meaningful knowledge from the DNA sequence will define research through the coming decades and inform our understanding of biological systems. This enormous task will require the expertise and creativity of tens of thousands of scientists from varied disciplines in both the public and private sectors worldwide. The draft sequence is already having an impact on finding genes associated with disease. A number of genes have been pinpointed and associated with breast cancer, muscle disease, deafness, and blindness. Additionally, finding the DNA sequences underlying such common diseases as cardiovascular disease, diabetes, arthritis, and cancers is being aided by the human variation maps (SNPs) generated in the Human Genome Project in cooperation with the private sector. These genes and SNPs provide focused targets for the development of effective new therapies.
One of the greatest impacts of having the sequence may well be in enabling an entirely new approach to biological research. In the past, researchers studied one or a few genes at a time. With whole-genome sequences and new high-throughput technologies, they can approach questions systematically and on a grand scale. They can study all the genes in a genome, for example, or all the transcripts in a particular tissue, organ, or tumor, or how tens of thousands of genes and proteins work together in interconnected networks to orchestrate the chemistry of life.
Anticipated benefits
Genomics timeline
Year | Event
1869 | DNA first isolated
1909 | Word gene is coined
1952 | Genes are made of DNA
1953 | DNA double helix described
1961 | mRNA isolated
1966 | Genetic code cracked
1972 | First animal gene cloned
1981 | First transgenic mice and fruit flies
1983 | First disease gene mapped - Huntington
1987 | First human genetic map
1994 | First GM food on the market: Flavr Savr tomato
1996 | Yeast genome sequenced
1996 | First mammal cloned - Dolly
1997 | E. coli genome sequenced
1998 | Roundworm C. elegans genome sequenced
2000 | Fruit fly genome sequenced
2000 | 90% of human genome sequenced
2003 | Complete human genome sequenced
Translation of genomic information to future clinical practice
As the annotation of the human genome becomes stable, a user-friendly, distilled view can be developed, as in the figure above. The diagram (a) of a chromosome 3 region (12,300–12,450 kb) contains the PPAR-γ gene structure (dark blue) with an alternative promoter (light blue), hypothetical noncoding functional regions (green shaded boxes), and functional variants (red). Note that introns in the gene structure are scaled down relative to the exons. Zooming in on two sequence segments (b) shows the translated sequence with functional variants highlighted in blue (nucleotide changes) and pink (amino-acid changes). Amino-acid numbering includes the propeptide sequence. The variants (c, pink) can be viewed in the monomer protein structure (grey) in a linked database. Also shown is the binding position of an antidiabetic thiazolidinedione drug (blue), part of the other monomeric unit (green) of the dimeric receptor, and the ligand (yellow). Using linked information from a range of sources, a summary of the known, modelled or predicted biological consequences (such as biochemical, structural, medical or pharmacological) could be curated (and updated regularly) for each functional variant in tabular form (d). A small subset of this information would define the disease or drug outcome or side effect associated with each variant, would constitute specific risk information of value in clinical assessment, and would be exported (red outlined boxes). For maximum usefulness, therefore, the exported information would be subject to stringent filters and would include only data for which the medical relevance was well established for each particular disease discipline. For example, variants of uncertain significance would be excluded from the filtered risk information, although all data would be available in the public domain. All the information in a–d would be curated in the public domain.
The use of personal genetic information in a clinical setting would be initiated or consented to by an individual. The individual sequence acquired could be as little as one or more individual genotypes, or as much as a complete genome sequence. The information would be private and owned by the individual, and might be stored electronically, protected by a high-security code requiring unique personal identifiers (such as multiple fingerprint identification) for access only with consent of the individual (e). The information might be taken either before consultation (as illustrated here) or afterwards, and in either case would be subject to counselling by the practitioner and consent by the individual. A specific investigation would be initiated by a consultation (f). The personal genetic information would then be supplied by the individual, for interpretation with respect to an agreed set of variants and/or a specific phenotype. The practitioner would use the available risk information concerning each variant to provide a genetic assessment for the individual (g). The top line refers to the variant featured in d and f; the second line is a hypothetical entry for a variant on another chromosome and does not represent a known variant. In the case illustrated, the individual has the heterozygous genotype TC at position 3: 12,450,610. This corresponds to having both Pro 495 and Ala 495 forms of the protein PPAR-γ. This genotype confers an increased risk of insulin-resistant diabetes on the individual, and also resistance to the thiazolidinedione class of antidiabetic drugs. Combining this with risk information for other genotypes would help to inform subsequent clinical decisions (h).
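The assessment step (g) amounts to looking up each personal genotype in the filtered risk information. The Python sketch below shows one way such a lookup could be structured. The table layout, the function names and the second variant are hypothetical. The single risk entry simply restates the illustrative case described above and is not clinical data.

```python
# Minimal sketch of a genotype-to-risk lookup. The table and all names are
# hypothetical; the one entry restates the illustrative case from the text.

# Keys are (chromosome, position, genotype). Genotypes are stored with their
# two alleles sorted so that "TC" and "CT" refer to the same entry.
RISK_TABLE = {
    ("3", 12_450_610, "CT"): [
        "increased risk of insulin-resistant diabetes",
        "resistance to thiazolidinedione antidiabetic drugs",
    ],
}


def normalise_genotype(genotype):
    """Upper-case the genotype and sort its alleles, e.g. 'tc' -> 'CT'."""
    return "".join(sorted(genotype.upper()))


def assess(personal_genotypes, risk_table=RISK_TABLE):
    """Return the curated findings for each variant that has a table entry.

    Variants without an entry (for example, variants of uncertain
    significance that were filtered out) are left out of the report.
    """
    report = {}
    for (chromosome, position), genotype in personal_genotypes.items():
        key = (chromosome, position, normalise_genotype(genotype))
        if key in risk_table:
            report[(chromosome, position)] = risk_table[key]
    return report


# The second variant is an invented placeholder with no curated entry.
individual = {("3", 12_450_610): "TC", ("7", 1_000): "GG"}
for variant, findings in assess(individual).items():
    print(variant, findings)
```

Keeping the curated table separate from the personal genotypes mirrors the split described above: the risk information is public, while the genotypes stay with the individual and are supplied only for the lookup.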
The HapMap project
While the Human Genome Project was completed in 2003, other large-scale human genome projects continue. The sequence of the human genome differs by only 0.1% among human beings. This one-tenth of 1%, however, translates into 3 million bases. These 3 million bases are now considered to be responsible for essentially all human variation, including predisposition or resistance to diseases. Thus, it became evident that identifying the sequences responsible for human variation would represent a major quest for the next decade. A great deal of human variation appears to be due to single-nucleotide polymorphisms (SNPs), which are distributed throughout the human genome at an average frequency of about one SNP per 1000 base pairs. While identifying the SNPs responsible for human variation, and the mechanism by which each sequence change exerts its effect, is of crucial importance, it is perhaps of even more immediate importance to identify those SNPs that predispose to disease. Their potential to facilitate diagnosis, prevention, and treatment could be enormous. The difficulty lies in how to identify them. Searching for SNPs that predispose to disease is quite a different task from identifying mutations responsible for single-gene disorders. A particular SNP is neither necessary nor sufficient for a particular disease and thus contributes only a small percentage of the predisposition to the disease. Inheriting several of these SNPs may produce a cumulative effect, expressed in the phenotype of a polygenic disease. The diseases that ultimately must be understood are those due to multiple genes that interact significantly with the environment, such as cardiac diseases, cancer, and mental illness.
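The figures in this paragraph can be checked with simple arithmetic. The Python sketch below assumes a genome length of roughly 3 billion base pairs. That round number is not stated in the text but is implied by it, since 0.1% of the genome is given as 3 million bases. The function names are illustrative.

```python
# Back-of-the-envelope check of the variation figures quoted in the text.

GENOME_SIZE_BP = 3_000_000_000  # assumed approximate genome length (not from the text)
DIFFERENCE_FRACTION = 0.001     # 0.1% difference between individuals (from the text)
SNP_SPACING_BP = 1_000          # about one SNP per 1000 base pairs (from the text)


def differing_bases(genome_size_bp, fraction):
    """Number of bases that differ between two individuals."""
    return round(genome_size_bp * fraction)


def expected_snps(genome_size_bp, spacing_bp):
    """Number of SNPs expected at one SNP per `spacing_bp` base pairs."""
    return genome_size_bp // spacing_bp


print(differing_bases(GENOME_SIZE_BP, DIFFERENCE_FRACTION))
print(expected_snps(GENOME_SIZE_BP, SNP_SPACING_BP))
```

Both calculations give the same order of magnitude, about 3 million, which is why the 0.1% difference and the one-SNP-per-1000-bp density are two views of the same estimate.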
To facilitate future studies identifying SNPs and their related phenotypes in polygenic diseases, a consortium of Canada, Japan, the United Kingdom, China, Nigeria, and the United States was formed to sequence and identify SNPs. The overriding question was whether SNPs are coinherited in blocks (haplotypes), hence the name HapMap Project. The published results do indeed indicate that many SNPs are coinherited as blocks and exert a combined effect. One can therefore select SNPs that are tagged to other SNPs, making it practical to scan the genome using 300,000 to 500,000 SNPs as opposed to several million. While each human being has only 3 million SNPs, it is estimated that there are about 17 million in the general population. It now appears that 500,000-SNP chips can be used for genome-wide scans, which significantly decreases the cost compared with having to use 2 or 3 million SNPs. One continuing challenge is the low frequency of these SNPs. Many of them appear to occur at a frequency of less than 5%, which makes detection by current technology very difficult. Common SNPs that occur at a frequency of 5 or 10% can, however, be detected in genome-wide scans with 500,000 SNPs as markers. Probably only 50,000 to 100,000 SNPs are responsible for significant change in humans, since most SNPs do not affect coding regions, although the percentage of SNPs in noncoding promoter regions that may markedly influence transcription remains to be determined. See also under "Genetic variations: SNPs and CNVs".
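The idea of tagging can be illustrated with a small Python sketch. It is not the HapMap consortium's procedure. It assumes a simple greedy selection in which one SNP tags another when their squared correlation (r²) across haplotypes reaches a threshold, and the threshold of 0.8 is an assumption for this example. The function names and the eight toy haplotypes are invented.

```python
# Sketch of greedy tag-SNP selection, assuming pairwise r^2 as the measure of
# co-inheritance. Names, threshold and data are illustrative assumptions.
from itertools import combinations


def r_squared(a, b):
    """Squared correlation between two biallelic SNPs coded 0/1 per haplotype."""
    n = len(a)
    pa = sum(a) / n
    pb = sum(b) / n
    pab = sum(1 for x, y in zip(a, b) if x == 1 and y == 1) / n
    denom = pa * (1 - pa) * pb * (1 - pb)
    if denom == 0:
        return 0.0  # a SNP with no variation cannot tag or be tagged here
    d = pab - pa * pb
    return d * d / denom


def select_tag_snps(snps, threshold=0.8):
    """Pick tag SNPs so that every SNP is itself a tag or tagged by one."""
    names = list(snps)
    tagged_by = {name: {name} for name in names}
    for a, b in combinations(names, 2):
        if r_squared(snps[a], snps[b]) >= threshold:
            tagged_by[a].add(b)
            tagged_by[b].add(a)
    untagged = set(names)
    tags = []
    while untagged:
        # Choose the SNP covering the most still-untagged SNPs
        # (ties go to the SNP listed first).
        best = max(names, key=lambda name: len(tagged_by[name] & untagged))
        tags.append(best)
        untagged -= tagged_by[best]
    return tags


# Invented data: five SNPs observed on eight haplotypes, forming two blocks.
toy_snps = {
    "snp1": [0, 0, 0, 0, 1, 1, 1, 1],
    "snp2": [0, 0, 0, 0, 1, 1, 1, 1],
    "snp3": [1, 1, 1, 1, 0, 0, 0, 0],
    "snp4": [0, 1, 0, 1, 0, 1, 0, 1],
    "snp5": [0, 1, 0, 1, 0, 1, 0, 1],
}

print(select_tag_snps(toy_snps))
```

In this toy data, two tag SNPs stand in for all five, which is the same reduction, on a small scale, that lets a chip of a few hundred thousand SNPs represent several million.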