Quantitative Course Resources

Click here to download a PDF of all resources below. To search for all courses, visit this link.



Courses marked with an asterisk (*) count for BBS credit


Biostatistics, Programming, and Computer Science


*Biomedical Informatics (BMI) 713 / Genetics 229 Computational Statistics for Biomedical Sciences



Course Description: Analysis of large datasets has become an integral part of biological and biomedical sciences. This course will provide a practical introduction to data analysis, with high-throughput sequencing data as the main source of examples. In the first half, it will cover basic statistical concepts and techniques, including hypothesis testing, nonparametric methods, principal component analysis, correlation analysis, and linear regression. In the second half, it will cover several advanced topics, focusing on issues that one encounters in the literature but that are seldom covered in introductory statistics courses. To carry out statistical tests and visualize data, students will learn R, a powerful programming language for statistical computing and graphics. The class will be a combination of lecture and computer lab. We will use recent literature to motivate the statistical methods, and assignments will frequently include attempts to reproduce published findings.


Prerequisites: No previous knowledge of statistics or programming is assumed. However, those with little or no programming experience may need to devote extra time. Additional sessions will also be provided for those interested in learning Python, a widely used programming language.


*BST 282 Introduction to Computational Biology and Bioinformatics (formerly Bio 512)



Basic problems, technology platforms, algorithms and data analysis approaches in computational biology. Algorithms covered include dynamic programming, hidden Markov model, Gibbs sampler, clustering and classification methods.


This course is targeted at students with some statistics and computer programming background who have an interest in exploring genomic data analysis and algorithm development as a potential future direction.


Course restricted: Biostatistics students only (or instructor permission). If you are not a BIO student but took STAT110 and CS50 (FAS courses), please contact the Registrar's Office for an override.


*STAT 110 Introduction to Probability



A comprehensive introduction to probability. Basics: sample spaces and events, conditional probability, and Bayes' Theorem. Univariate distributions: density functions, expectation and variance, Normal, t, Binomial, Negative Binomial, Poisson, Beta, and Gamma distributions. Multivariate distributions: joint and conditional distributions, independence, transformations, and Multivariate Normal. Limit laws: law of large numbers, central limit theorem. Markov chains: transition probabilities, stationary distributions, convergence.


*STAT 111 Introduction to Theoretical Statistics



Basic concepts of statistical inference from frequentist and Bayesian perspectives. Topics include maximum likelihood methods, confidence and Bayesian interval estimation, hypothesis testing, least squares methods and categorical data analysis.


*STAT 115/215 Introduction to Computational Biology and Bioinformatics



The course will cover basic technology platforms, data analysis problems and algorithms in computational biology. Topics include sequence alignment and search, high throughput experiments for gene expression, transcription factor binding and epigenetic profiling, motif finding, RNA/protein structure prediction, proteomics and genome-wide association studies. Computational algorithms covered include hidden Markov model, Gibbs sampler, clustering and classification methods.


Good quantitative skills, strong interest in biology, willingness and diligence to learn programming.


Statistics 215 meets with the Statistics 115 class, but graduate students are required to do more coding, complete a research project, and submit a written report during reading period, in addition to completing all work assigned for Statistics 115.


*ES 150 Introduction to Probability with Engineering Applications



This course introduces students to probability theory and statistics, and their applications to physical, biological and information systems. Topics include: random variables, distributions and densities, conditional expectations, Bayes' rules, laws of large numbers, central limit theorems, Markov chains, Bayesian statistical inferences and parameter estimations. The goal of this course is to prepare students with adequate knowledge of probability theory and statistical methods, which will be useful in the study of several advanced undergraduate/graduate courses and in formulating and solving practical engineering problems.


*BST 281 Genomic Data Manipulation



Introduction to genomic data, computational methods for interpreting these data, and a survey of current functional genomics research. Covers biological data processing, programming for large datasets, high-throughput data (sequencing, proteomics, expression, etc.), and related publications. 

This course is targeted at students in experimental biology programs with an interest in understanding how available genomic techniques and resources can be applied in their research. 


*BST 210 Applied Regression Analysis



Topics include model interpretation, model building, and model assessment for linear regression with continuous outcomes, logistic regression with binary outcomes, and proportional hazards regression with survival time outcomes. Specific topics include regression diagnostics, confounding and effect modification, goodness of fit, data transformations, splines and additive models, ordinal, multinomial, and conditional logistic regression, generalized linear models, overdispersion, Poisson regression for rate outcomes, hazard functions, and missing data. The course will provide students with the skills necessary to perform regression analyses and to critically interpret statistical issues related to regression applications in the public health literature.


*Math 19B Linear Algebra, Probability and Statistics for the Life Sciences



Probability, statistics and linear algebra with applications to life sciences, chemistry, and environmental life sciences. Linear algebra includes matrices, eigenvalues, eigenvectors, determinants, and applications to probability, statistics, dynamical systems. Basic probability and statistics are introduced, as are standard models, techniques, and their uses including the central limit theorem, Markov chains, curve fitting, regression, and pattern analysis.


*Math 21B Linear Algebra and Differential Equations



Matrices provide the algebraic structure for solving myriad problems across the sciences. We study matrices and related topics such as linear transformations and linear spaces, determinants, eigenvalues, and eigenvectors. Applications include dynamical systems, ordinary and partial differential equations, and an introduction to Fourier series.


*MCB 112 Biological Data Analysis



Biology has become a computational science, requiring analysis of large data sets from genomics, imaging, and other technologies. This course teaches computational methods in biological data analysis, using an empirical and experimental framework suited to the complexities of biological data, emphasizing computational control experiments. The course is primarily aimed at biologists learning computational methods, but is also suited for computational statistical scientists learning about biological data.


*MIT 6.047/878 Computational Biology



Covers the algorithmic and machine learning foundations of computational biology combining theory with practice. We cover both foundational topics in computational biology, and current research frontiers. We study fundamental techniques, recent advances in the field, and work directly with current large-scale biological datasets.



Biological sequence analysis, hidden Markov models, gene finding, comparative genomics, RNA structure, sequence alignment, hashing



Gene expression, clustering/classification, EM/Gibbs sampling, motifs, Bayesian networks, microRNAs, regulatory genomics, epigenomics



Gene/species trees, phylogenomics, coalescent, personal genomics, population genomics, human ancestry, recent selection, disease mapping


In addition to the technical material in the course, the term project provides practical experience: (1) writing an NIH-style research proposal, (2) reviewing peer proposals, (3) planning and carrying out independent research, (4) presenting research results orally in a conference setting, and (5) writing results in a journal-style scientific paper. You will work on a project of your choice with regular feedback and advice from a mentor, your peers, and the teaching staff.


*Stat 121a Data Science



Learning from data in order to gain useful predictions and insights. This course introduces methods for five key facets of an investigation: data wrangling, cleaning, and sampling to get a suitable data set; data management to be able to access big data quickly and reliably; exploratory data analysis to generate hypotheses and intuition; prediction based on statistical methods such as regression and classification; and communication of results through visualization, stories, and interpretable summaries. Built around three modules: prediction and elections, recommendation and business analytics, and clustering and text analysis.


*Stat 139 Statistical Sleuthing through Linear Models



A serious introduction to statistical inference with linear models and related methods. Topics include t-tools and nonparametric alternatives (including bootstrapping and permutation-based methods), multiple-group comparisons, analysis of variance, linear regression, model checking and refinement, and causation versus correlation. Emphasis on thinking statistically, evaluating assumptions, and developing tools for real-life applications.


*Stat 149 Statistical Sleuthing through Generalized Linear Models



Sequel to Statistics 139, emphasizing common methods for analyzing continuous non-normal and categorical data. Topics include logistic regression, log-linear models, multinomial logit models, proportional odds models for ordinal data, Gamma and inverse-Gaussian models, over-dispersion, analysis of deviance, model selection and criticism, model diagnostics, and an introduction to non-parametric regression methods.


*SCRB 152 Asking Cells Who They Are: Computational Transcriptomics Using RNA-Seq



This course is a hands-on introduction to computational analysis of RNA sequencing data as a measure of genome-wide transcription. We will cover methods spanning the spectrum of RNA-Seq analysis: starting from raw sequencing reads, obtaining gene expression measures, and interpreting biological significance by differential expression analyses, clustering, and visualization. Coursework will consist of programming assignments in Python exploring real datasets. The course will emphasize skills applicable to independent biological research.


*Systems Biology 200 Dynamic and Stochastic Processes in Cells



Rigorous introduction to (i) dynamical systems theory as a tool to understand molecular and cellular biology, and (ii) stochastic processes in single cells, using tools from statistical physics and information theory.


*HST 508/Biophysics 170



This course provides a foundation in the following four areas: evolutionary and population genetics; comparative genomics; structural genomics and proteomics; and functional genomics and regulation.


*SEAS AC209a Intro to Data Science

*SEAS AC209b Advanced Topics in Data Science

*CS 50 Introduction to Computer Science

*MIT 6.00 Introduction to Computer Science and Programming

*Cell Biology 302qc. Advanced Experimental Design for Biologists


*Neurobiology 206qc. Bootcamp in Quantitative Methods

*Genetics 303qc. Current Tools for Gene Analysis



Harvard Chan Bioinformatics Core Workshops

HarvardX/EdX Statistics and R for Life Sciences


Introduction to Statistics

Free online course from udacity.com

Course does not have a start date; students start class on their own time
Course is self-paced


Mathematical Biostatistics



7-week course, 3-5 hours a week of work
Course includes use of R statistical programming language
(This course was taken by multiple students from the previous year)


Hopkins – JHUSPH Open Courseware: Introduction to Biostatistics

4-week, 10-lecture series with practice problem sets associated with each lecture http://ocw.jhsph.edu/index.cfm/go/viewCourse/course/IntroBiostats/coursePage/schedule/


Statistics: Making Sense of Data

Free online course from Coursera
Pending start dates



Computing for Data Analysis

4 weeks, 3-5 hours a week of work
Course is free


Learn to Program: The Fundamentals

No courses currently planned, but future sessions can be added to a watch list https://www.coursera.org/course/programming1


Learn to Program: Crafting Quality Code

Focus on writing quality Python code that runs correctly and efficiently.
No courses currently planned, but future sessions can be added to a watch list https://www.coursera.org/course/programming2


Introduction to Computer Science – Programming Methodology

Khan Academy Linear Algebra

HMS Institute for Quantitative Social Science



*MIT 6.0001  Introduction to Computer Science and Programming using Python

*MIT 6.0002 Introduction to Computational Thinking and Data Science

SEAS January ComputeFest Workshop

Free workshops
Covers basics of computer science, R as well as Python


Intro to Computer Science

Free course & paid course available
Covers basics of computer science as well as Python https://www.udacity.com/course/cs101


Google’s Python Class

All this material makes up an intensive 2-day class
The videos are organized as the day-1 and day-2 sections
Class is free


The Hitchhiker’s Guide to Python

Python for Data Analysis

Available for purchase on Amazon, $23.99



Learn the fundamentals of programming to build web apps and manipulate data
Course is free


Dataquest Python



Intro to Data Science with R

Offers 2-day or 3-day course options https://www.rstudio.com/training/curriculum/intro-to-data-science.htm


Roger Peng, Introducing R and basic programming concepts. Computing for Data Analysis

Week 1: http://www.youtube.com/playlist?list=PLjTlxb-wKvXNSDfcKPFH2gzHGyjpeCZmJ
Week 2: http://www.youtube.com/playlist?list=PLjTlxb-wKvXNnjUTX4C8IeIhPBjPkng6B
Week 3: http://www.youtube.com/playlist?list=PLjTlxb-wKvXOzI2h0F2_rYZHIXz8GWBop
Week 4: http://www.youtube.com/playlist?list=PLjTlxbwKvXOdzysAE6qrEBN_aSBC0LZS
Johns Hopkins
4 weeks, 3-5 hours a week of work
Class is free


The R Book by Michael Crawley

Available for purchase on Amazon, $73.72 + free shipping http://www.amazon.com/The-Book-Michael-J-Crawley/dp/0470973927


The Art of R Programming by Norman Matloff

Available for purchase on Amazon, $25.35 + free shipping http://www.amazon.com/The-Art-Programming-Statistical-Software/dp/1593273843


Google’s R Style Guide

R Programming

R Programming – Research Technology Consulting


Course is free
Highly recommended by past BIRT fellows



Coursera Introduction to R



Introduction to Databases

Introduction to Databases is being launched on the new edX-based platform in June, but can still be accessed through the link provided




SQL Tutorial

This SQL tutorial teaches you how to use SQL to access and manipulate data in MySQL, SQL Server, Access, Oracle, Sybase, DB2, and other database systems http://www.w3schools.com/sql/default.asp


MySQL Crash Course

© 2016 President and Fellows of Harvard College