A Bayesian and a Frequentist Discuss a Simple Result

Joe Thorley · 2017-04-19 · 5 minute read

Background

The following scenario and dialogue reflect my current understanding and relationship to the Bayesian and Frequentist paradigms.

Scenario

Our scene opens on two disheveled statisticians staring longingly at a bank vault full of $1 coins.

After randomly selecting a coin from the vault, and confirming that it appears perfectly ordinary, the two companions toss it nine times and observe one head.

Dialogue

Bayesian: What do you consider the best estimate of the probability of throwing a head to be?

Frequentist: I consider it to be 0.11 because the data are most likely if the probability is 1 in 9.
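
A quick check in base R confirms the figure: the binomial likelihood of observing one head in nine throws peaks at a probability of 1 in 9.

# Likelihood of 1 head in 9 throws as a function of the probability of a head
likelihood <- function(p) dbinom(1, size = 9, prob = p)
# The probability that maximizes the likelihood
optimize(likelihood, interval = c(0, 1), maximum = TRUE)$maximum
# approximately 0.111, i.e. 1 in 9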

Bayesian: I’m sorry but I just can’t accept that conclusion. Based on my prior knowledge of the world, I consider it extremely unlikely that this coin throws a head only one time in nine… Anyway, why are you interested in simply choosing the value that maximizes the probability of the data given the value? I’m interested in knowing the probability of the value given the data.

Frequentist: I choose the value that maximizes the likelihood of the data because prior knowledge is subjective.

Bayesian: Of course it is. At first you don’t know much about the world but as you observe it your knowledge changes. It’s how science works.

Frequentist: No, science is objective - I don’t want my conclusions to be subjective.

Bayesian: Neither do I particularly, but I can examine the extent to which my conclusions are sensitive to my knowledge. You are just ducking the issue.

Frequentist: No I’m not - I am working in a logically consistent objective framework.

Bayesian: Really? How certain are you that the estimate is 0.11?

The frequentist quickly codes up the following model using tmbr.

library(dplyr)
library(magrittr)
library(purrr)
library(tmbr)

data <- data.frame(Throws = 9, Heads = 1)

model <- "
#include <TMB.hpp>

template<class Type>
Type objective_function<Type>::operator() () {

DATA_VECTOR(Throws);
DATA_VECTOR(Heads);

PARAMETER(bHeads);

vector<Type> eHeads = Heads;

Type nll = 0.0;

for(int i = 0; i < Heads.size(); i++){
  eHeads(i) = invlogit(bHeads);
  nll -= dbinom(Heads(i), Throws(i), eHeads(i), true);
}
return nll;
}"

model %<>% model(gen_inits = function(data) list(bHeads = 0)) %>% 
  analyse(data = data)
# A tibble: 1 × 6
      n     K     logLik       AICc            duration converged
  <int> <int>      <dbl>      <dbl>      <S4: Duration>     <lgl>
1     1     1 -0.9422643 -0.1154714 0.0459170341491699s      TRUE
coef(model, conf_level = 0.95) %>% select(estimate, lower, upper) %>% map_df(plogis)
# A tibble: 1 × 3
   estimate      lower     upper
      <dbl>      <dbl>     <dbl>
1 0.1111111 0.01539349 0.4998535
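
For comparison, a binomial GLM with a logit link fitted in base R gives roughly the same estimate and limits (a sketch, assuming the tmbr fit amounts to a standard maximum likelihood fit with Wald intervals).

# Equivalent maximum likelihood fit in base R
fit <- glm(cbind(Heads, Throws - Heads) ~ 1, family = binomial, data = data)
# Estimate and 95% Wald limits back-transformed from the logit scale
plogis(coef(fit)[1] + c(0, -1.96, 1.96) * sqrt(vcov(fit)[1, 1]))
# roughly 0.11, 0.02 and 0.50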

Frequentist: Easy! The 95% confidence limits are 0.02 and 0.5.

Bayesian: Yes, but how certain are you that this coin is biased towards tails?

Frequentist: Well, in 95% of the datasets analysed the confidence interval will include the underlying value so … if this coin’s probability of throwing heads is in the confidence interval it’s biased towards tails.

The Bayesian quickly codes up the equivalent model with a strong prior using jmbr.

library(jmbr)

model <- "model{
  bHeads ~ dnorm(0, 100)

  for(i in 1:length(Heads)) {
    logit(eHeads[i]) <- bHeads
    Heads[i] ~ dbinom(eHeads, Throws[i])
  }
}"

model %<>% model() %>% 
  analyse(data = data)
# A tibble: 1 × 8
      n     K nsamples nchains nsims            duration  rhat converged
  <int> <int>    <int>   <int> <int>      <S4: Duration> <dbl>     <lgl>
1     1     1     2000       4  4000 0.0590369701385498s     1      TRUE
coef(model, conf_level = 0.95) %>% select(estimate, lower, upper) %>% map_df(plogis)
   estimate     lower    upper
      <dbl>     <dbl>    <dbl>
1 0.4921855 0.4445491 0.540532
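
As a rough check, the posterior under the same strong prior can be approximated on a grid in base R; in the JAGS code dnorm(0, 100) specifies a mean of 0 and a precision of 100, i.e. a standard deviation of 0.1, on the logit scale.

# Grid approximation of the posterior for bHeads under the strong prior
b <- seq(-1, 1, length.out = 10000)
posterior <- dnorm(b, 0, 0.1) * dbinom(1, 9, plogis(b))
posterior <- posterior / sum(posterior)
# Approximate posterior median and 95% limits on the probability scale
plogis(b[findInterval(c(0.5, 0.025, 0.975), cumsum(posterior))])
# roughly 0.49, 0.44 and 0.54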

Bayesian: Well given that I consider the probability of throwing heads for a randomly selected coin to be heavily weighted towards 0.5, I am 95% sure that the probability of throwing heads using this coin is between 0.44 and 0.54.

Frequentist: There you go being subjective.

Bayesian: I’d rather be subjective than obtuse.

Frequentist: Whatever! My methods are faster, you know.

Bayesian: Not really if you use INLA. Although I still prefer MCMC methods because they provide full information on the parameters. Ha - beat that.

Frequentist: Paah - data cloning uses MCMC but is completely invariant to the choice of priors.

Bayesian: Damn!

Frequentist: You know that with enough data we would reach the same conclusions. We could both fit a hierarchical model that incorporates the results from other coins to parameterise the variation among coins - the term that you just plucked out of the air.

Bayesian: That is true. But could you analyse the data from one set of coins and then use the posterior distribution as the prior in the analysis of a subsequent set of coins and so on and arrive at the same conclusions?

Frequentist: You know I can’t!

Bayesian: Well that’s a problem then because each analysis has to start from scratch.

Frequentist: That’s the way I like it.
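
The Bayesian’s point about sequential updating can be sketched with a conjugate Beta-Binomial model (a hypothetical example rather than the logit-normal model above): updating one batch of throws at a time gives exactly the same posterior as analysing all the throws at once.

# Shape parameters of a Beta(1, 1) prior on the probability of a head
prior <- c(1, 1)
# Updating with the observed 1 head and 8 tails gives a Beta(2, 9) posterior
posterior1 <- prior + c(1, 8)
# Updating again with a hypothetical second coin's 3 heads and 6 tails
posterior2 <- posterior1 + c(3, 6)
# Analysing all 4 heads and 14 tails at once gives the identical posterior
identical(posterior2, prior + c(4, 14))
# TRUE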

Bayesian: How much do you think our coin is worth?

Frequentist: A dollar?

Bayesian: Are you kidding me! Here you are asserting that you are confident that this coin, which to the human eye is perfectly ordinary, is heavily biased towards tails. Surely it’s worth more than all the other coins in the vault put together? I put it to you that you are a Bayesian in practice.

Frequentist: Whatever. As the Dude put it

Yeah, well, ya know, that’s just, like, your opinion, man.

There is a quiet pause as they both mull over the implications of this statement. Gradually their gaze returns to the remaining coins.

Bayesian: Maybe we should sample a few more before anyone comes back - just to make sure…

Conclusions

Thoughts and reflections appreciated.