MOLECULAR & CELLULAR NEUROBIOLOGY 
Master Course Cognitive Neuroscience - Radboud University, Nijmegen

 


Chapter 4: Genomics

  The genome
  Genomics research
  The Human Genome and HapMap Projects
  Functional Genomics
  Pharmacogenomics
  Genetic variations: SNPs and CNVs
  Genome-wide association studies (GWAS)
  Molecular networks

 

 

Molecular networks

Common human diseases arise from a complex interplay between changes in DNA (both rare and common variants) and a broad range of other factors, such as diet, age, gender and exposure to environmental toxins or stressors. Life-sciences and biomedical research now generate a deluge of data of many different types, including genome-wide single-nucleotide polymorphism (SNP) genotypes, whole-genome transcription data, next-generation DNA sequencing data, RNA sequencing data, chromatin immunoprecipitation sequencing (ChIP-seq) data and image data. The challenge is to integrate these large-scale, high-dimensional data sets to better understand the molecular networks underlying the physiological states associated with disease. To this end, DNA variation, molecular profiling and clinical data are integrated to construct causal probabilistic networks of disease, which give a more comprehensive view of disease than can be achieved by examining each data dimension on its own. The predictive networks produced by this type of integrative modelling help link molecular states to physiological ones, offering an alternative route to understanding how molecular states drive complex disease processes.
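
As a minimal sketch of what a causal probabilistic network encodes, the Python example below models a hypothetical three-node chain, genotype G → transcript level E → disease D, whose joint distribution factorises as P(G) P(E | G) P(D | E). All probabilities are invented for illustration; real disease networks are learned from data and contain thousands of nodes.

```python
from itertools import product
from typing import Optional

# Hypothetical conditional probability tables (illustrative numbers only).
P_G = {1: 0.3, 0: 0.7}           # P(G = g): carrier (1) or non-carrier (0) of a risk allele
P_E1_GIVEN_G = {1: 0.8, 0: 0.2}  # P(E = 1 | G = g): transcript level is high
P_D1_GIVEN_E = {1: 0.4, 0: 0.1}  # P(D = 1 | E = e): disease present


def bernoulli(p_one: float, value: int) -> float:
    """Probability of a binary outcome `value`, given P(outcome = 1)."""
    return p_one if value == 1 else 1.0 - p_one


def joint(g: int, e: int, d: int) -> float:
    """Joint probability under the chain factorisation P(G) P(E | G) P(D | E)."""
    return P_G[g] * bernoulli(P_E1_GIVEN_G[g], e) * bernoulli(P_D1_GIVEN_E[e], d)


def p_disease(given_g: Optional[int] = None) -> float:
    """P(D = 1), optionally conditioned on genotype, by summing the joint."""
    gs = [0, 1] if given_g is None else [given_g]
    numerator = sum(joint(g, e, 1) for g, e in product(gs, [0, 1]))
    denominator = sum(joint(g, e, d) for g, e, d in product(gs, [0, 1], [0, 1]))
    return numerator / denominator


print(round(p_disease(), 3))   # 0.214: population risk
print(round(p_disease(1), 3))  # 0.34: risk in carriers
print(round(p_disease(0), 3))  # 0.16: risk in non-carriers
```

In this toy network the genotype changes disease risk only through E, which is the sense in which an intermediate molecular trait links DNA variation to disease.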

Genome-wide association studies (GWAS) provide insights into human diseases

The human genome comprises roughly three billion nucleotides, so the number of nucleotide changes that could affect gene activity, and especially the number of combinations of such changes, is effectively infinite relative to our ability to test their effects experimentally. For this reason, exploiting naturally occurring DNA variation in human populations is among the most attractive approaches to inferring the constellation of genes that affect disease risk. Because germline DNA variation is present from conception, it is not a consequence of disease; for most diseases, DNA changes that correlate with disease can therefore be inferred either to tag or to directly represent causal components of disease. DNA variation thus offers a direct route into disease aetiology (Figure 1a). GWAS (see also GWAS) are now well established as a way to uncover genetic loci that affect disease risk or progression.
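
In its simplest form, a GWAS tests each SNP separately for a difference in allele frequency between cases and controls. The sketch below runs a Pearson chi-square test (1 degree of freedom) on a 2×2 table of allele counts. The counts are hypothetical, and real studies typically use regression models that adjust for covariates such as population structure.

```python
import math


def allelic_chi_square(case_alt: int, case_ref: int, ctrl_alt: int, ctrl_ref: int):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 allele-count table.

    Rows are cases and controls; columns are alternative and reference alleles.
    Returns (chi-square statistic, p-value).
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Upper-tail probability of a chi-square variable with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: 1,000 cases and 1,000 controls give 2,000 alleles per group.
chi2, p = allelic_chi_square(case_alt=700, case_ref=1300, ctrl_alt=600, ctrl_ref=1400)
print(f"chi2 = {chi2:.2f}, p = {p:.1e}")
```

With these counts the association is nominally significant (p ≈ 7×10⁻⁴) but falls far short of the conventional genome-wide threshold of 5×10⁻⁸, which corrects for the roughly one million independent tests in a genome-wide scan.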

To understand how any one gene contributes to human disease, it must be studied in the context of the molecular networks that define disease states. Indeed, several studies have now shown that single diseases or traits, such as height, can involve tens or even hundreds of genes, and that these genes may be non-randomly distributed with respect to biological function.
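
Whether trait-associated genes cluster by biological function can be checked with an enrichment test. A common choice, sketched here, is the one-sided hypergeometric test: given a total number of genes, of which some belong to a pathway, it asks how likely it is to find at least k pathway genes among the trait-associated genes by chance. The gene counts in the example are hypothetical.

```python
from math import comb


def enrichment_p(total_genes: int, pathway_genes: int, hits: int, hits_in_pathway: int) -> float:
    """One-sided hypergeometric P(X >= hits_in_pathway).

    X counts pathway genes among `hits` genes drawn without replacement
    from `total_genes` genes, `pathway_genes` of which are in the pathway.
    """
    upper = min(pathway_genes, hits)
    tail = sum(
        comb(pathway_genes, i) * comb(total_genes - pathway_genes, hits - i)
        for i in range(hits_in_pathway, upper + 1)
    )
    return tail / comb(total_genes, hits)


# Hypothetical numbers: 20,000 genes, a 200-gene pathway, 100 trait-associated genes.
# By chance about 100 * 200 / 20,000 = 1 of them is expected to fall in the pathway.
p = enrichment_p(total_genes=20_000, pathway_genes=200, hits=100, hits_in_pathway=8)
print(f"P(at least 8 pathway genes by chance) = {p:.1e}")
```

In practice the test is repeated over many pathways, so the resulting p-values need a multiple-testing correction.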

Constructing networks that underlie core biological processes associated with disease makes it possible to identify the functional units that respond to genetic perturbations and, in turn, affect disease risk (Figure 1c). Any given gene can then be studied in the context of the many networks in which it operates, to learn whether one or more of them influences physiological states associated with the disease. Such mappings allow not only the identification of causal relationships among genes, and between genes and more complex traits such as disease, but also, more generally, the construction of predictive gene networks.

Figure 1. Hierarchy of causal relationships. a, Classic genetic association approaches seek to identify variations in DNA that correlate with disease state or with quantitative traits associated with disease. The attraction of this approach is the identification of the genetic causes of disease. b, Changes in DNA on their own do not lead to disease but, instead, lead to changes in molecular traits that go on to affect disease risk. By layering in molecular phenotypes as intermediate phenotypes, causal relationships between genes and disease can be established directly. c, Disease gene networks sense constellations of genetic and environmental perturbations. Therefore, a more realistic model is one in which constellations of genetic and environmental perturbations affect molecular states of networks that in turn affect disease risk.

 

Core subnetworks associated with disease provide a path directly linking molecular biology to physiology, and it is this link that may ultimately lead to greater clinical impact (Figure 2). Networks have now been modelled both within and between multiple tissues relevant to disease. Identifying subnetworks that interact across islet, adipose, liver, muscle and brain tissues has highlighted the value of using a network framework to model physiological states associated with diabetes directly. Modelling cross-tissue networks has revealed coherent subnetworks that did not appear in any single-tissue network but were instead specific to cross-tissue interactions. This shows that modelling molecular interactions operating between tissues is critical if we hope to understand physiological states associated with disease.
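
To illustrate why some subnetworks appear only in a cross-tissue model, the sketch below treats each tissue's network as an edge list, prefixes node names with the tissue in which they were measured (a hypothetical naming convention, with hypothetical genes G1, G2, G7, G8), and finds connected components by breadth-first search. A component spanning two tissues exists only once the cross-tissue edge is included.

```python
from collections import defaultdict, deque


def components(edges):
    """Connected components of an undirected graph given as an edge list."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, comps = set(), []
    for start in list(adj):
        if start in seen:
            continue
        comp, queue = set(), deque([start])
        seen.add(start)
        while queue:  # breadth-first search from `start`
            node = queue.popleft()
            comp.add(node)
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        comps.append(comp)
    return comps


def spans_tissues(comp):
    """True if the component contains nodes from more than one tissue."""
    return len({node.split(":")[0] for node in comp}) > 1


liver = [("liver:G1", "liver:G2")]
adipose = [("adipose:G7", "adipose:G8")]
cross = [("liver:G2", "adipose:G7")]  # relationship observed only between tissues

print(any(spans_tissues(c) for c in components(liver + adipose)))          # False
print(any(spans_tissues(c) for c in components(liver + adipose + cross)))  # True
```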

Figure 2. Linking molecular biology to physiology through molecular networks. a, Before the molecular biology revolution, disease was studied primarily in the context of physiology. b, As a result of the molecular biology revolution, physiology has played a less prominent role in the study of the molecular bases of disease, given the reductionist push to associate molecular changes in a given gene (affecting protein levels, activity or function) directly with changes in disease states. c, The complexity of molecular biology — given the ability to monitor DNA variation, RNA variation, metabolite variation and protein variation in populations on a comprehensive scale — has driven a systems view of disease, in which networks of interacting molecular entities are constructed to define physiological states of the system associated with disease. In this way, the molecular networks allow a direct link between molecular biology and clinical medicine by connecting molecular biology to physiology.

Whereas classic molecular biology offered very narrow views connecting individual molecular entities to disease, today's technologies generate comprehensive snapshots of living systems, enabling a more systems-level view of the molecular states underlying disease-associated physiological states. In a single experiment, we can now generate terabytes of genotype, sequence, gene expression, physiological and imaging data. The degree to which each of these data types informs our view of disease may vary, but they provide complementary views that are useful individually and potentially exceptionally valuable when considered collectively.

Disease-associated networks comprise hundreds of genes that interact in complex ways and collectively associate with physiological states such as fat mass, insulin levels and atherosclerotic-lesion size. Such networks can be inferred to cause variation in disease-associated traits, and they can also respond to (or sense) the genetic and environmental variations that influence disease risk.

Different types of genetic variations are mapped to a phenotype network; strongly interconnected clusters (dark gray) are identified among disease-associated genes.
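
How strongly interconnected a gene's neighbourhood is can be quantified by its local clustering coefficient: the fraction of pairs of its neighbours that are themselves connected. A minimal sketch on a hypothetical five-gene network:

```python
from itertools import combinations


def clustering_coefficient(adj, node):
    """Fraction of a node's neighbour pairs that are themselves connected."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(neighbours, 2) if b in adj[a])
    return links / (k * (k - 1) / 2)


# Hypothetical undirected network: A, B and C form a tightly knit triangle.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

for node in sorted(adj):
    print(node, round(clustering_coefficient(adj, node), 2))
# prints, one per line: A 1.0, B 1.0, C 0.33, D 0.0, E 0.0
```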

Perspectives

The disease-associated molecular networks that can be constructed today are necessarily based on grossly incomplete data. Even with the ability to assay DNA and RNA variation comprehensively in whole populations, the information is incomplete: we are far from fully characterizing rare variation, DNA variation other than SNPs and copy-number variation, variation in non-coding RNA levels, and variation in the different isoforms of genes in any single sample, let alone in entire populations. Beyond DNA and RNA, existing technologies cannot measure all protein-associated traits, or all the interactions between proteins and DNA or RNA, metabolites and other molecular entities important to the functioning of living systems. Furthermore, the high-dimensional data we can routinely generate in populations today represent only a snapshot at a single time point. Such a snapshot may reveal the functional units of the system under study and how these units relate to one another, but it does not give a complete understanding of how the functional units are put together, nor of the mechanisms underlying the complex functions carried out by individual cells, by entire organs and by whole systems comprising multiple organs.

Technological advances, however, allow the generation of data of ever higher dimensionality, so we continue to progress towards a more complete understanding of human disease. Next-generation sequencing technologies are already having a major impact, for example by identifying rare variants in tumour tissue associated with different cancer types. Subsequent generations of sequencing technologies are on the horizon and promise to deliver the sequence of an entire human genome in days and at reasonable cost. Sequencing can also be used to identify patterns of methylation, to characterize the transcriptome fully and to identify transcripts that are being actively translated. The sequencing revolution therefore stands ready to provide unprecedented snapshots of complex systems, allowing a more accurate network view, which in turn will lead to models of disease with greater predictive power.

The primary aims of generating and mining large-scale biological data sets are to learn the fundamental rules that govern complex living systems and to derive, as a result, predictive models of their behaviour. Without sophisticated mathematical algorithms capable of appropriately integrating the large-scale data, and without high-performance computing environments in which to apply these algorithms, it will be difficult to build generally predictive models. Information-systems support services will become increasingly critical both for building predictive models and for representing complex states of knowledge and making such knowledge accessible to researchers so that they may refine and correct the models of disease. Recent successes in programming machines to mine complex data to derive the fundamental laws of motion perhaps represent a glimpse into the future of biology, in which machines may be able to derive fundamental rules in complex living systems, given large-scale data sets.

 

Gene networks

 

Networks provide a convenient framework for exploring the context within which single genes operate. A network is simply a graphical model made up of nodes and edges, which makes it a practical way to visualize complex mathematical models describing how the variables of a system associate with one another in different contexts of interest. In gene networks associated with biological systems, the nodes typically represent genes, gene products or other important molecular entities, and an edge between two nodes indicates a relationship between the corresponding entities.

Cells comprise many tens of thousands of proteins, metabolites, RNAs and DNAs, all interacting in complex ways. In turn, complex biological systems comprise many types of cells operating within and between the many types of tissue that make up different organ systems, all of which interact to give rise to the vast array of phenotypes observed in living systems. Modelling the full extent of the relationships between molecular entities, between cells and between organ systems is a daunting task, and networks offer a convenient framework for representing them. In a biological context, a network can be viewed as a graphical model of the relationships among DNAs, RNAs, proteins, metabolites and higher-order phenotypes such as disease state. In this way, networks make it possible to visualize extremely large-scale, complex relationships among molecular and higher-order phenotypes in any given context.

Biological networks comprise nodes, which represent molecular entities that are observed to vary in the population under study (for example DNA variations, RNA levels, protein states or metabolite levels). Edges between the nodes represent relationships between the molecular entities, and these edges can either be directed, indicating a cause–effect relationship, or undirected, indicating an association or interaction. For example, a DNA node in the network representing a given locus that varies in a population of interest may be connected to a transcript-abundance trait, indicating that changes at the particular DNA locus induce changes in the levels of the transcript. The potentially millions of such relationships represented in a network define the overall connectivity structure, or topology, of the network. The more classic pathway view represents molecular processes on an individual level, whereas networks represent global (population-level) metrics describing variations between individuals in a population of interest; these variations in turn define the coherent biological processes in the tissue or cells associated with the network.
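
A sketch of the mixed topology described above: directed edges record inferred cause → effect relationships (such as a DNA locus changing a transcript's abundance), undirected edges record associations or interactions, and a node's degree (its number of edges) is one simple summary of network topology. All node names, including the SNP identifier, and all edges are hypothetical.

```python
from collections import Counter

# Hypothetical network. Directed edges encode inferred cause -> effect relationships;
# undirected edges encode associations or physical interactions.
directed = [
    ("SNP_rs1", "transcript_X"),       # variation at a locus changes transcript abundance
    ("transcript_X", "protein_X"),
    ("protein_X", "disease_state"),
]
undirected = [
    ("transcript_X", "transcript_Y"),  # co-expression
    ("transcript_X", "transcript_W"),
    ("protein_X", "protein_Z"),        # protein-protein interaction
]

in_deg, out_deg, undirected_deg = Counter(), Counter(), Counter()
for source, target in directed:
    out_deg[source] += 1
    in_deg[target] += 1
for a, b in undirected:
    undirected_deg[a] += 1
    undirected_deg[b] += 1

nodes = sorted(set(in_deg) | set(out_deg) | set(undirected_deg))
total = {node: in_deg[node] + out_deg[node] + undirected_deg[node] for node in nodes}
hub = max(nodes, key=total.get)  # most highly connected node
print(hub, total[hub])           # transcript_X 4
print(Counter(total.values()))   # degree distribution: number of nodes with each degree
```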

 

See also: Bioinformatics - pathway analysis

 

