Appendix A Statistical and mathematical fundamentals

Phylogenetic biology draws on a variety of themes in statistics and mathematics. Given the diverse backgrounds of people coming to phylogenetic biology, one of the challenges of introducing people to this field is ensuring a foundation in these fundamentals. Many people have a solid working grasp of many of these topics, others need a refresher, and some have never seen them.

Throughout the text I have covered some of these fundamentals, but here I provide additional resources.

A.1 Probability theory and General statistics

I put together this short summary of statistical concepts commonly encountered in biology - https://bitbucket.org/caseywdunn/statistics/raw/master/statistics_in_biology.pdf. It is short and technical, with a focus on the relationship between concepts, such as the exponential family of distributions.

A.2 Bayes theorem

Grant Sanderson of 3Blue1Brown provides an excellent introduction to Bayes Theorem and conditional probability in this video - https://youtu.be/HZGCoVF3YvM. Note that he uses \(E\) for Evidence rather than \(D\) for Data as is common practice in phylogenetic applications.

This video provides additional perspective, including its application to medical testing - https://youtu.be/R13BD8qKeTg.

The MCMC robot by Paul Lewis is an excellent interactive introduction to MCMC sampling - https://phylogeny.uconn.edu/mcmc-robot/.

A.3 Linear algebra

Linear algebra is a rich field of math that springs from a surprisingly simple starting point - linear combinations of vectors, often represented as rows and columns of matrices. From basic operations such as vector addition and scalar multiplication, one can build matrix operations and many sophisticated data analyses. Many familiar data analysis methods, such as regression, solving systems of equations, Principal Component Analyses (PCA), and common clustering methods, are built from linear algebra under the hood. So are many aspects of phylogenetic analysis, including the models of evolution themselves and the representation of the phylogenetic trees in matrix forms that computers can easily manupulate.

Though linear algebra is fundamental to many topics at the intersection of science, mathematics, and computation, most science students have far less exposure to this critical field than to statistics, calculus, and other quantitative topics. This is too bad, as even a basic working knowledge of linear algebra can provide great insight into the tools scientists use daily and allow us to better apply, implement, and extend them.

The following resources are a good place to start in your own studies of linear algebra: