Course Outline

Day 1 – Monday, June 3

9:15 – 10 a.m.

Refreshments, networking and computer setup

10 a.m. – noon

Lecture 1: Principles of GWA Analysis (Speaker: Dr. Jo Knight)

Jo Knight will give a lecture on GWA analysis accompanied by a practical. Topics covered will include the principles of association studies, types of data, quality control issues (e.g., population stratification and missingness) and visualization of results. Most of the practical will be carried out using plink and R.

1 – 4 p.m.

Lecture 2: Rare Variant Analysis I (Speaker: Dr. Gao Wang)

Overview of rare variant analysis in families and unrelated samples:
− Pipeline for cleaning and analyzing sequence data;
− Methods for rare variant analysis;
− Analysis of exome chip data.

Lab Practical: Power calculations and association of rare variants using VASA software

Day 2 – Tuesday, June 4 

9 a.m. – noon

Lecture 3: From Hypothesis Free GWA Study to Hypothesis Driven GWA Study (Speaker: Dr. Lei Sun)

It is often the case that a GWA study successfully identifies one or a few susceptibility loci, but the associated variants account for only a small proportion of the heritability. In the literature, efforts to identify the missing heritability include, for example, analyzing imputed ungenotyped SNPs and copy-number variations (CNVs), exploring GxG interactions, and, more recently, generating next-generation sequencing (NGS) data and studying rare variants. This lecture will focus on the hypothesis-driven GWA study (GWAS-HD) analytical approach, which mines existing GWA study data and improves power by incorporating available biological knowledge into the prioritization and interpretation of the initial GWA study results.

We start with a case study of identifying susceptibility loci for meconium ileus in cystic fibrosis (CF) patients (Sun et al., 2012). We show that the traditional GWA study (n=3763, Illumina 610 platform) was able to identify two genes (SLC6A14 and SLC26A9) associated with meconium ileus; however, these low-hanging fruit account for only ~5 per cent of the phenotypic variation. We then discuss the central issue of multiple hypothesis testing, including an overview of the family-wise error rate and the false discovery rate. Next, we revisit the CF example, introduce the GWAS-HD framework and discuss relevant statistical techniques, including stratified FDR control (Sun et al., 2006) and the weighted p-value method (Roeder et al., 2006), and the related software package, SFDR (Yoo et al., 2010). Using these approaches, GWAS-HD analyzes all available GWAS SNPs but re-ranks them at the genome-wide level based on weights determined by the pre-specified hypothesis. GWAS-HD then focuses on the set of prioritized SNPs only, and we discuss various analytical strategies for jointly analyzing multiple (correlated) SNPs, including LASSO (Tibshirani, 1996), the aggregate risk score method (The International Schizophrenia Consortium, 2009) and other approaches. The application to the CF case study shows that the GWAS-HD framework can yield considerably more information than the standard GWA study approach and can provide overall statistical evidence for the biological hypothesis itself. We conclude the lecture with discussion and take-home messages.
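As a rough illustration of the multiple-testing ideas named above, the following Python sketch compares pooled Benjamini-Hochberg FDR control, FDR control applied separately within strata, and a weighted Bonferroni rule. It is a minimal sketch under stated assumptions, not the SFDR package or the lecturer's implementation: the function names, the toy p-values, the stratum labels and the weights are all hypothetical, and the stratified rule is simplified to running the Benjamini-Hochberg procedure independently in each stratum at the same level.

```python
import numpy as np

def bh_reject(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up procedure; returns a boolean rejection mask."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    reject = np.zeros(m, dtype=bool)
    if m == 0:
        return reject
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m
    passed = p[order] <= thresholds
    if passed.any():
        k = np.nonzero(passed)[0].max()   # largest rank meeting its threshold
        reject[order[:k + 1]] = True      # reject everything up to that rank
    return reject

def stratified_bh_reject(pvals, strata, alpha=0.05):
    """Apply BH separately within each stratum (simplified stratified FDR)."""
    p = np.asarray(pvals, dtype=float)
    s = np.asarray(strata)
    reject = np.zeros(p.size, dtype=bool)
    for label in np.unique(s):
        idx = np.nonzero(s == label)[0]
        reject[idx] = bh_reject(p[idx], alpha)
    return reject

def weighted_bonferroni_reject(pvals, weights, alpha=0.05):
    """Reject when p_i <= alpha * w_i / m, with weights rescaled to mean 1."""
    p = np.asarray(pvals, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.mean()
    return p <= alpha * w / p.size

# Hypothetical toy data: four SNPs in a prioritized stratum, six elsewhere.
pvals = [0.001, 0.02, 0.03, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.99]
strata = ["prior"] * 4 + ["other"] * 6
weights = [4, 4, 4, 4, 1, 1, 1, 1, 1, 1]

print("pooled BH rejections:     ", int(bh_reject(pvals).sum()))
print("stratified BH rejections: ", int(stratified_bh_reject(pvals, strata).sum()))
print("weighted Bonferroni:      ", int(weighted_bonferroni_reject(pvals, weights).sum()))
```

In this toy setting the prioritized stratum concentrates the small p-values, so testing it separately can recover signals that the pooled analysis misses. Whether that holds in real data depends on how informative the pre-specified hypothesis is.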


· Sun et al. (2012). Multiple apical plasma membrane constituents are associated with susceptibility to meconium ileus in individuals with cystic fibrosis. Nature Genetics 44:562-569.

· Sun L, Craiu RV, Paterson AD and Bull SB (2006). Stratified false discovery control for large-scale hypothesis testing with application to genome-wide association studies. Genetic Epidemiology 30:519-530.

· Roeder, K., Bacanu, S.-A., Wasserman, L. and Devlin, B. (2006). Using linkage genome scans to improve power of association in genome scans. Am. J. Hum. Genet. 78:243–252.

· Yoo YJ, Bull SB, Paterson AD, Waggott D, The DCCT/EDIC Research Group, Sun L (2010). Were genome-wide linkage studies a waste of time? Exploiting candidate regions within genome-wide association studies. Genetic Epidemiology 34:107-118.

· Tibshirani, R. (1996) Regression shrinkage and selection via the lasso. J. R. Statist. Soc. B, 58, 267–288.

· The International Schizophrenia Consortium (2009). Common polygenic variation contributes to risk of schizophrenia and bipolar disorder. Nature 460:748-752.

Lab Practical: TBA

1 – 4 p.m.

Lecture 4: From GWA Study to Prediction (Speaker: Dr. Cornelia van Duijn)

GWA studies are rapidly uncovering an increasing number of loci involved in complex diseases. With some exceptions, such as age-related macular degeneration, it remains difficult to predict future risk from these common variants for many traits, including dyslipidemia and Alzheimer's disease. At present, the utility of genome tests is limited, predominantly because they lack predictive ability and clear benefits for disease prevention that are specific to genetic risk groups. In the near future, personal genome tests will likely be based on whole genome sequencing, but will these technological advances increase the utility of personal genome testing?

This lecture will discuss the utility of testing, which depends on the predictive ability of the test, the likelihood of actionable test results, and the options available for the reduction of risks. For monogenic forms of disease, it will be a challenge to recognize new causal variants among all rare variants that are found using sequencing. For complex diseases, the predictive ability of genetic tests will be mainly restricted by the heritability of the disease, but also by the genetic complexity of the disease etiology, which determines the extent to which the heritability can be understood. Given that numerous genetic and non-genetic risk factors may contribute to complex diseases, the predictive ability of genetic models will likely remain modest for most people. However, an important exception may be those with extreme phenotypes, e.g., extremely high levels of lipids, which are clinically most relevant. Another avenue for improving predictions will be identification of biomarkers using new -omic technology. A powerful approach will be to build the search for biomarkers on genetic studies. 

Day 3 – Wednesday, June 5 

9 a.m. – noon

Lecture 5: Penalized Regression and Maternal Effects (Speaker: Dr. Heather Cordell)
− Maternal and maternal-fetal effects and parent-of-origin effects using different study designs (trios, case and control mother duos, large pedigrees) and a discussion of different software for assessing these effects;
− Using penalized regression in GWA and candidate gene studies including fine mapping and selection of SNPs within associated pathways.

Lab Practical: Analysis for maternal/fetal effects and imprinting.  Analysis using penalized regression.

1 – 4 p.m.

Lecture 6: 1,000 Genomes Imputation (Speaker: Dr. Bryan Howie)
An overview of the rationale, application and method of 1,000 Genomes-based imputation:
– Introduction to statistical methods for genotype imputation;
– Examples of how imputation can improve power and resolution in GWAS;
– Discussion of current and future reference panels for imputation;
– Best practices for imputation with data from the 1,000 Genomes Project;
– Strategies for quality control and association analysis of imputed genotypes.

Lab Practical: Example imputation analysis -- downloading software and 1,000 Genomes data, running imputation, and evaluating output files.

Day 4 – Thursday, June 6 

9 a.m. – noon

Lecture 7: Rare Variant Analysis II (Speaker: Dr. Andrew Morris)
GWA studies have been successful in identifying loci contributing effects to a range of complex human traits.  The majority of reproducible associations within these loci are with common variants, each of modest effect, which together explain only a small proportion of heritability.  It has been suggested that much of the unexplained genetic component of complex traits can thus be attributed to rare variation.  However, GWA study genotyping chips have been designed primarily to capture common variation, and thus are underpowered to detect the effects of rare variants. 

In this lecture, I demonstrate, by simulation, that imputation from an existing scaffold of genome-wide genotype data up to high-density reference panels has the potential to identify rare variant associations with complex traits, without the need for costly re-sequencing experiments.  This approach is then applied to genome-wide association studies of seven common complex diseases, imputed up to publicly available reference panels, and identifies genome-wide significant evidence of rare variant association in PRDM10 with coronary artery disease and multiple genes in the MHC with type 1 diabetes.  The results of these analyses highlight that GWA studies have the potential to offer an exciting opportunity for gene discovery through association with rare variants, conceivably leading to substantial advancements in our understanding of the genetic architecture underlying complex human traits.

Lab Practical: Students will test for association of type 1 diabetes with imputed rare variants within genes across the MHC, using data from The Wellcome Trust Case Control Consortium. Students will investigate the impact of: (i) the minor allele frequency threshold for inclusion of rare variants in the analysis; (ii) filtering rare variants on the basis of annotation; and (iii) gene boundary definition.
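To make item (i) concrete, the sketch below shows one simple way a minor allele frequency threshold determines which variants enter a gene-level burden score, followed by a permutation test on case-control status. It is an illustrative sketch only, not the method or software used in the practical: the function names are hypothetical, genotypes are assumed to be coded as counts (0/1/2) of the minor allele, and the permutation test of the mean burden difference stands in for whichever association test the lab actually uses.

```python
import numpy as np

def minor_allele_freq(genotypes):
    """MAF per variant from an (n_samples, n_variants) array of 0/1/2 counts."""
    freq = np.asarray(genotypes, dtype=float).mean(axis=0) / 2.0
    return np.minimum(freq, 1.0 - freq)

def burden_scores(genotypes, maf_threshold=0.01):
    """Sum allele counts over variants with 0 < MAF < maf_threshold.

    Assumes each column counts the minor allele (an assumption of this sketch).
    Returns the per-sample burden score and the mask of variants included.
    """
    g = np.asarray(genotypes)
    maf = minor_allele_freq(g)
    rare = (maf > 0) & (maf < maf_threshold)
    return g[:, rare].sum(axis=1), rare

def permutation_pvalue(scores, is_case, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the case-control mean burden difference."""
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores, dtype=float)
    is_case = np.asarray(is_case, dtype=bool)
    observed = scores[is_case].mean() - scores[~is_case].mean()
    hits = 0
    for _ in range(n_perm):
        perm = rng.permutation(is_case)   # shuffle case/control labels
        diff = scores[perm].mean() - scores[~perm].mean()
        if abs(diff) >= abs(observed):
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Re-running `burden_scores` with different `maf_threshold` values changes the set of variants included, and therefore the score that is tested. That is the sensitivity students are asked to examine in item (i).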

1 – 4 p.m.

Lecture 8: Epigenetic Measures (Speaker: Dr. Rafal Kustra)
Epigenetic modification refers to extra-DNA sequence changes to the organism's DNA which are now being studied together with phenotypes. By far the most commonly studied epigenetic modification is methylation, specifically cytosine methylation, on which this lecture focuses exclusively. Methylation is known to affect the regulation of gene expression, and various experiments strongly suggest that site-specific methylation is partially heritable. This has led to an explosion of experiments that search for 'epimutations' associated with clinical (and other) phenotypes, including diseases that are known to be heritable but for which few classical mutations have been found so far. This session will introduce a few popular designs of genome-wide methylation studies and discuss statistical issues specific to their analysis. I will concentrate on studies based on tiling arrays that interrogate the whole genome with fairly regular density. Two basic designs allow the tiling arrays to measure either the methylated or the unmethylated fraction at each region, and we will discuss the difference in analysis between the two. One of the most important issues in analyzing such data is sequence-specific effects – each probe on the array has a bias associated with its sequence – and we will discuss methods to correct for such biases. A similar but more specific problem is the density of CG bisequences in a region, which is a very strong confounder in methylation studies and, depending on study design, may be very hard to deal with during analysis. Finally, we will discuss an attractive feature of epigenomic studies – the availability of a “gold standard” experiment which, while cost-prohibitive to apply genome-wide, can directly confirm the results of the primary analysis.

Software libraries based on R/Bioconductor will also be introduced and demonstrated, including rMAT and Starr. While there are few epigenetic-specific packages, some designed for generic tiling arrays or for genome-wide Chromatin ImmunoPrecipitation (ChIP) experiments are often adopted for epigenomic data analysis, even those performed with restriction enzymes. During the lecture a recently published dataset will be introduced to showcase some highlights from the analysis and some results will be presented to illustrate the concepts.

Lab Practical

Day 5 – Friday, June 7 

9 a.m. – noon

Lecture 9: Pathway Analysis (Speaker: Dr. Gary Bader)

This lecture will cover pathway analysis methods that help interpret large gene lists identified by GWA studies and other genomics studies. Fundamentals of working with large gene lists, pathways and databases containing prior information about gene-gene interactions and biological pathways will be covered. 

Lab Practical: The lab component will cover a typical pathway analysis workflow.

1 – 4 p.m.

Lecture 10: Closing Lecture (Speaker: Dr. Katrina Goddard)

4 p.m.

Closing Remarks (Dr. Lyle Palmer)

Lab wrap-up (outstanding questions, etc.)

Course evaluation