Statistical Rethinking with Tidymodels

version 0.0.0.9

This book is an attempt to re-express the code in the 2023 edition of McElreath’s course, ‘Statistical Rethinking.’ His models are re-fit with tidymodels using brms, plots are redone with ggplot2, and the general data wrangling code predominantly follows the tidyverse style. It borrows extensively from a previous version of this book authored by Solomon Kurz.
Author

James H Wade

Published

January 8, 2023

Preface

Why this book?

This book is based on the second edition of Richard McElreath’s text, Statistical rethinking: A Bayesian course with examples in R and Stan1 and A Solomon Kurz’s Statistical rethinking with brms, ggplot2, and the tidyverse (plus corresponding bookdown)2. Kurz contributions show how to fit the models with Paul Bürkner’s brms package35, which makes it easy to fit Bayesian regression models in R6 using Hamiltonian Monte Carlo. Kurz also prefers plotting and data wrangling with the packages from the tidyverse7,8, as do I. Most of the prose is either quoted from McElreath or written by Kurz. The purpose of this version is three fold:

  1. Provide a learning space for me to explore Bayesian data analysis during the course of the 2023 offering of the Statistical Rethinking course
  2. Update the example code to use a tidymodels approach
  3. Convert the book from RMarkdown to Quarto

My assumptions about you

If you’re looking at this project, I’m assuming you’re a graduate student, academic, or researcher of some sort and that you have some foundation in statistics. If you’re rusty, consider checking out the free texts by Çetinkaya-Rundel and Hardin9 or Navarro10 before diving into Statistical Rethinking. I’m also assuming you understand the rudiments of R and have at least a vague idea about what the tidyverse is. If you’re totally new to R, consider starting with Hands on Programming with R by Garret Grolemund11 or R programming for data science by Roger Peng12. For an introduction to the tidyvese-style of data analysis, the best source I’ve found is Grolemund and Wickham’s R for Data Science (R4DS)13, which I extensively link to throughout this project. Another nice alternative is Baumer, Kaplan, and Horton’s Modern Data Science with R14.

Please note that you do not need to be totally fluent in statistics or R. Otherwise why would you need this project, anyway? The most important key to learning is intrinsic motivation to learn the material - you have to want to learn it. If you’re curios, willing to try, and persistently tinker, this is for you. Something else that can be a huge help is a project to apply your new skills as you learn them. This is hands on learning! Don’t be afraid to make mistakes. I will make plenty throughout. Let’s learn together!

How to use and understand this project

This project is not meant as a stand alone book. It’s a supplement to the second edition of McElreath’s text and follow the structure of his text, chapter by chapter, translating his analyses into tidymodels using brms along with tidyverse styling. Some sections in the text are composed entirely of equations or prose, leaving us nothing to translate. When we run into those sections, the corresponding sections in this project will sometimes be blank or omitted, though I do highlight some of the important points in quotes and prose of my own. So I imagine students might reference this project as they progress through McElreath’s text. I also imagine working data scientists might use this project in conjunction with the text as they flip to the specific sections that seem relevant to solving their data challenges.

I reproduce the bulk of the figures in the text, too. The plots in the first few chapters are the closest to those in the text. Kurz is passionate about data visualization and liked to play around with color palettes, formatting templates, and other conventions quite a bit. As a result, the plots in each chapter have their own look and feel. For more on some of these topics, check out:

In this project, I use a handful of formatting conventions gleaned from R4DS, The tidyverse style guide18, and R markdown: The definitive guide19.

  • R code blocks and their output appear in a gray background. E.g.,
2 + 2 == 5
[1] FALSE
  • R and the names of specific package (e.g., brms) are in boldface font.
  • Functions are in a typewriter font and followed by parentheses, all atop a gray background (e.g., brm()).
  • When I want to make explicit the package a given function comes from, I insert the double-colon operator :: between the package name and the function (e.g., tidybayes::median_qi()).
  • R objects, such as data or function arguments, are in typewriter font atop gray backgrounds (e.g., chimpanzees, .width = .5).
  • You can detect hyperlinks by their typical blue-colored font.
  • In the text, McElreath indexed his models with names like m4.1 (i.e., the first model of [Chapter 4][Geocentric Models]). I primarily followed that convention, but replaced the m with a b to stand for the brms package.

R setup

To get the full benefit from this book, you’ll need some software. Happily, everything will be free (provided you have access to a decent personal computer and an good internet connection).

First, you’ll need to install R, which you can learn about at https://cran.r-project.org/.

Though not necessary, your R experience might be more enjoyable if done through the free RStudio interface, which you can learn about at https://rstudio.com/products/rstudio/.

Once you have installed R, execute the following to install the bulk of the add-on packages. This code also installs the package manager package pak. check_installed() from rlang will ask you if you want install a package if you do not already have them.

# install helper packages to help with package installations
if (!require(rlang)) {
  install.packages("rlang")
}

library(rlang)

packages <- c("ape", "bayesplot", "brms", "broom", "dagitty", "devtools",
              "flextable", "GGally", "ggdag", "ggdark", "ggmcmc",
              "ggrepel", "ggthemes", "ggtree", "ghibli", "gtools",
              "invgamma", "loo", "patchwork", "posterior", "psych",
              "rcartocolor", "Rcpp", "remotes", "santoku",
              "StanHeaders", "statebins", "tidybayes", "tidyverse",
              "viridis", "viridisLite", "wesanderson")

check_installed("pak")
check_installed(packages)

The rethinking package by McElreath is used extensively. Before you install it, you first need to install cmdstanr.

cmdstanr Install

The cmdstanr developers recommend running this is a fresh R session or at least restarting your current session before installing.

install.packages("cmdstanr", repos = c("https://mc-stan.org/r-packages/", getOption("repos")))
# you likely them need to install cmdstan
cmdstanr::install_cmdstan()

rethinking Installation

With cmdstanr and cmdstan installed, you are now ready to install rethinking. To do so, run the following code:

library(pak)
pak("rmcelreath/rethinking")

Other Package Installations

A few of the other packages are not officially available via the Comprehensive R Archive Network (CRAN; https://cran.r-project.org/). You can download them directly from GitHub by executing the following.

pak("EdwinTh/dutchmasters")
pak("gadenbuie/ggpomological")
pak("GuangchuangYu/ggtree")
pak("UrbanInstitute/urbnmapr")

It’s possible you’ll have problems installing some of these packages. Here are some likely suspects and where you can find help:

Updates and Versions

This section will be updated as the book evolves.

Version 0.0.0.9000

This is the development version of the book. It is not ready for consumption by any but the most eager learners.

How to Contribute

If you have insights on how to improve any of this work, please share your thoughts on GitHub at https://github.com/JamesHWade/statistical-rethinking-with-tidymodels/issues. The contribution guidelines for this book are listed at https://github.com/JamesHWade/statistical-rethinking-with-tidymodels/CONTRIBUTING.md.

Thank you’s are in order

Here are the contributors that Kurz mentions in the brms edition of this book that I include here as well. For an up-to-date list of contributors, see the GitHub contributors page.

As Kurz said: science is better when we work together.

License and citation

This book is licensed under the Creative Commons Zero v1.0 Universal icense. You can learn the details, here. In short, you can use my work. Just make sure you give me the appropriate credit the same way you would for any other scholarly resource. Here’s the citation information:

@book{wadeStatisticalRethinkingwithTidymodels2023,
  title = {Statistical Rethinking with Tidymodels},
  author = {Wade, James H.},
  year = {2023},
  month = {Jan},
  edition = {version 0.0.0.9000},
  url = {https://jameshwade.quarto.pub/statistical-rethinking-with-tidymodels/}
}

You can do this, too

This project is powered by Quarto. It’s easy to do, and there is a robust community of developers eager to help you with issues if you have any. Check out the GitHub Discussion page or the Posit Community for Quarto help.

The source code of the project is available on GitHub at https://github.com/JamesHWade/statistical-rethinking-with-tidymodels.