Despite all the hype around Deep Learning models and AI-as-a-Service APIs, there’s still a need for Data Scientists to explain, in simple terms, what factors influence a given prediction. Even more importantly, sometimes we want to construct a model that represents a real-world process, rather than have input values feed into a programmatically optimized series of neural networks and produce a predicted value.
I am not attempting to argue Deep Learning is not effective. Deep Learning is among the best tools we have right now for predicting outcomes. However, prediction alone does not necessarily lead to a better human understanding of what influences said prediction. In many cases a human can’t understand why a Deep Learning model calculates its predictions because, by the time we understand the current model, the Deep Learning routine has already updated based on new information. Case in point: Reinforcement Learning routines are always adapting to the decisions of Actors in near real time. We typically do not judge the efficiency of a Reinforcement Learning model based on the decisions it makes, but on the associated effects on the human actors interacting with the model. Are players in the battle arena giving up when playing against bots? Are chatbot responses receiving poor reviews?
Bayesian Analytics, or Bayesian Analysis, uses a Bayesian approach to create human-understandable insights from models. “But Bayesian Statistics is so hard! It has lots of weird symbols and probabilities and stuff!” Yes, learning Bayesian Statistics can be a challenge, but the notation is no more complex than in any other type of statistics. Most of us were just taught frequentist stats first, so it seems more intuitive.
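To give a feel for how little machinery the basic idea needs, here is a minimal sketch of a Bayesian update for a coin-flip style problem. It is an illustration I am adding, not an example from either book below: the function names, the flat Beta(1, 1) prior, and the 7-out-of-10 data are all assumptions chosen for simplicity. It relies on the standard result that a Beta prior combined with binomial data gives a Beta posterior.

```python
def beta_binomial_update(prior_a, prior_b, successes, failures):
    """Return the posterior Beta parameters after observing binomial data.

    With a Beta(prior_a, prior_b) prior on a success probability, the
    posterior is Beta(prior_a + successes, prior_b + failures).
    """
    return prior_a + successes, prior_b + failures


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Hypothetical data: 7 successes and 3 failures, starting from a flat prior.
a, b = beta_binomial_update(1.0, 1.0, successes=7, failures=3)
print(f"posterior: Beta({a:g}, {b:g}), mean = {beta_mean(a, b):.3f}")
```

The result reads as a plain statement about the unknown rate ("given the prior and the data, the rate is most plausibly around two thirds"), which is the kind of human-readable output this article is after. Real models rarely have such a closed form, which is where samplers like Stan come in.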
Regardless, an easy way to learn Bayesian Analysis is using this book: https://www.amazon.com/Doing-Bayesian-Data-Analysis-Tutorial/dp/0124058884/ followed by this book: https://www.amazon.com/Bayesian-Analysis-Chapman-Statistical-Science-dp-1439840954/dp/1439840954/
Between Kruschke’s and Gelman’s books, you can get a strong foundation in using Bayesian statistics for analysis. (See Andrew Gelman’s recent review of the Santa Clara SARS-CoV-2 antibody prevalence study for why a strong foundation in Bayesian statistics is important for analysis: https://statmodeling.stat.columbia.edu/2020/04/19/fatal-flaws-in-stanford-study-of-coronavirus-prevalence/)
Unfortunately, both Kruschke and Gelman use R rather than Python in their examples. Fortunately, the MCMC sampling applications BUGS, JAGS, and Stan are not actually R or Python programs; R and Python merely call their APIs. So, setting up Python to use Stan, for example, is no harder than setting up R.
My process for setting up a local virtual environment is below. Please note, if you just want to get started, Google Colab is much easier. As an example, see the notebook provided by Ethan Steinburg in the comments of Gelman’s article: https://colab.research.google.com/drive/110EIVw8dZ7XHpVK8pcvLDHg0CN7yrS_t
For a local environment, it’s a little more complex, but not too bad.
PyStan’s repo documentation isn’t bad either: https://github.com/stan-dev/pystan. This article is a supplement that describes my typical workflow.
This configuration assumes you have Anaconda installed and are able to set up a virtual environment on your machine.
Once you have Anaconda installed and accessible via the command line, run the following commands the first time you set up the environment:
conda create -n stan_env python==3.7 numpy scipy matplotlib libpython m2w64-toolchain -c conda-forge -c msys2
conda activate stan_env
python -m pip install pystan arviz scikit-learn statsmodels plotly seaborn nbformat
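The first line creates a Python 3.7 environment with the packages required for the PyStan installation. The libpython and m2w64-toolchain packages are Windows-specific and supply the C++ compiler toolchain that PyStan needs there.
The second line activates the environment.
The third line installs PyStan and the packages I often use for analysis.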
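If you want to confirm that the installs landed in the active environment, a quick check like the following can help. This is a minimal sketch using only the standard library; the helper name `missing_modules` and the list of import names are my own. Note that the import name can differ from the pip name (scikit-learn is imported as `sklearn`), and that `pystan` is assumed here to be the import name, matching the `import pystan` used later in this article.

```python
import importlib.util

# Import names for the packages installed above (assumed list, adjust as needed).
# scikit-learn is imported as "sklearn"; "pystan" matches this article's import.
EXPECTED_MODULES = [
    "pystan", "arviz", "sklearn", "statsmodels", "plotly", "seaborn", "nbformat",
]


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(EXPECTED_MODULES)
print("missing:", missing if missing else "none")
```

Run it with the stan_env environment active. Anything listed as missing was either not installed or was installed into a different Python than the one you are running.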
Additionally, you probably want to use Jupyter Lab for development, so here are some additional configurations, again only necessary the first time you activate the environment.
pip install --user ipykernel
python -m ipykernel install --user --name=stan_env
conda install ipywidgets
conda install -c conda-forge nodejs
jupyter labextension install jupyterlab-plotly
These commands register the environment as a Jupyter kernel, then install the widgets needed for visualization, nodejs for rendering the widgets, and the plotly extension for interactive visuals.
Now you should be ready to launch Jupyter in your new pystan environment!
Make sure you have the stan_env virtual environment active by typing…
conda activate stan_env
… in your terminal / command line / powershell
Then type “jupyter lab” (after activating the virtual environment).
Once Jupyter Lab loads, try executing “import pystan”. If there are no errors, congrats! You now have a functional PyStan Jupyter Notebook! (The “pystan” import name applies to PyStan 2.x; PyStan 3 is imported as “stan” instead.)
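A successful import only shows that the package is present. To check that the compiler toolchain works too, you can compile and sample from a tiny model. The following is a sketch under assumptions: it uses the PyStan 2.x interface (`pystan.StanModel` and its `sampling` method), and the Bernoulli model, the data, and the helper name `run_smoke_test` are illustrative choices of mine rather than part of the setup above.

```python
# A minimal Bernoulli model written in the Stan language (illustrative only).
MODEL_CODE = """
data {
  int<lower=0> N;
  int<lower=0, upper=1> y[N];
}
parameters {
  real<lower=0, upper=1> theta;
}
model {
  theta ~ beta(1, 1);
  y ~ bernoulli(theta);
}
"""

# Hypothetical data: ten binary outcomes.
DATA = {"N": 10, "y": [0, 1, 0, 0, 0, 0, 0, 0, 0, 1]}


def run_smoke_test(model_code=MODEL_CODE, data=DATA):
    """Compile the model and draw samples; assumes the PyStan 2.x API."""
    import pystan  # imported here so the definitions load without PyStan

    model = pystan.StanModel(model_code=model_code)  # compiles the model to C++
    fit = model.sampling(data=data, iter=1000, chains=4, seed=1)
    return fit
```

In a notebook cell, calling `fit = run_smoke_test()` and then `print(fit)` should show a posterior summary for `theta`. The first call can take a minute or so because the model is compiled; an error at that stage usually points to the compiler setup rather than to Jupyter.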
Next time you need to use the notebook, you only need to type
conda activate stan_env
And you are ready to launch your Jupyter Lab or Jupyter Notebook.