PB HLTH 240C — Syllabus
This course provides an introduction to computational statistics, with emphasis on statistical methods and software for addressing high-dimensional inference problems that arise in current biological and medical research.
Topics of interest, to be surveyed in terms of both statistical methodology and software implementation, include:
- numerical and graphical summaries of data;
- loss-based estimation with cross-validation: parametric and non-parametric density estimation and regression (e.g., maximum likelihood estimation, class prediction), variable selection;
- the expectation-maximization (EM) algorithm;
- smoothing: robust local regression, kernel density estimation, splines;
- the bootstrap;
- Monte Carlo procedures: Markov chain Monte Carlo (MCMC), importance sampling;
- hidden Markov models (HMMs);
- cluster analysis;
- multiple hypothesis testing;
- the design of in silico experiments.
The course also discusses statistical computing resources, with emphasis on the R language and environment (www.r-project.org).
The statistical methods and software are motivated by, and illustrated on, data structures that arise in current high-dimensional inference problems in biology and medicine.