PB HLTH 240C — Syllabus

This course provides an introduction to computational statistics, with emphasis on statistical methods and software for addressing high-dimensional inference problems that arise in current biological and medical research.
Topics of interest, to be surveyed in terms of both statistical methodology and software implementation, include:

  • numerical and graphical summaries of data;
  • loss-based estimation with cross-validation: parametric and non-parametric density estimation and regression (e.g., maximum likelihood estimation, class prediction), variable selection;
  • the expectation-maximization (EM) algorithm;
  • smoothing: robust local regression, kernel density estimation, splines;
  • cross-validation;
  • the bootstrap;
  • Monte Carlo procedures: Markov chain Monte Carlo (MCMC), importance sampling;
  • hidden Markov models (HMMs);
  • cluster analysis;
  • multiple hypothesis testing;
  • the design of in silico experiments.

The course also discusses statistical computing resources, with emphasis on the R language and environment (www.r-project.org).

The statistical methods and software are motivated by and illustrated on data structures that arise in current high-dimensional inference problems in biology and medicine.
