A Brief Overview of Hierarchical Bayesian State-Space Models

Introduction

On a call with a potential client today, I told them that I almost exclusively use Bayesian models. I then added that we usually end up including hierarchical structure through one or more random effects and that, if possible, we like our models to be of the state-space type. Later, in an email, they asked for further explanation of what hierarchical Bayesian state-space models are and why we use them, which inspired me to write this short post.

Qualifier

This blog post should in no way be considered a complete or 100% technically correct explanation of hierarchical Bayesian state-space models, but rather an attempt to provide a brief overview.

Bayesian Models

Models can be fitted using a frequentist or Bayesian approach. There are at least three advantages to a Bayesian approach:

The estimates are exact for any sample size. This is important. Even with very sparse data, the predicted values and associated uncertainties are valid given the model and the priors. This is not the case with many frequentist methods, which rely on large-sample approximations and therefore require minimum sample sizes in order to produce reliable estimates.
Calculation of the uncertainties of derived parameters is trivial. In most cases clients aren’t interested in, for example, the standard deviation of a random effect in log-transformed space. They simply want to know the percent change in the population abundance due to an impact, with a 95% credible interval. Calculation of the uncertainties associated with derived parameters is straightforward with Bayesian models and frequently very challenging with frequentist models.
Bayesian models are easy to understand, implement and modify when formulated in the BUGS language. Even so-called difficult models, for which it may not be possible to calculate frequentist estimates, are seen for what they are: nothing more than a series of simple deterministic and stochastic relationships.
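
To make the second point concrete, here is a minimal sketch in Python of how a derived parameter and its uncertainty fall out of posterior draws. The draws are simulated stand-ins for MCMC output, and the abundances and spreads are made-up values, not results from any real analysis.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical posterior draws (stand-ins for MCMC output) of log abundance
# before and after an impact. The means and spreads are invented.
log_before = rng.normal(loc=np.log(1000.0), scale=0.10, size=10_000)
log_after = rng.normal(loc=np.log(800.0), scale=0.12, size=10_000)

# Derived parameter: percent change in abundance, computed draw by draw.
pct_change = (np.exp(log_after) / np.exp(log_before) - 1.0) * 100.0

# Summarise with the posterior median and a 95% credible interval.
lower, median, upper = np.percentile(pct_change, [2.5, 50.0, 97.5])
print(f"percent change: {median:.1f}% (95% CRI {lower:.1f}% to {upper:.1f}%)")
```

With a fitted model the two sets of draws would come from the same joint posterior, so any correlation between them carries through to the derived quantity automatically.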

There are, however, several downsides to adopting a Bayesian approach.

It can take hours of processor time to fit a complex Bayesian model to a big data set.
You need to specify prior probability distributions for the primary parameters. Some people consider this the disadvantage of Bayesian statistics because the shape of the prior probability distributions can affect the estimates. I actually consider it an advantage because it allows one to incorporate prior information into an analysis.

It also takes much longer to code, validate and describe a Bayesian model than to call a pre-existing frequentist model (see Petr Keil for an excellent discussion of the joys and frustrations of being a Bayesian). However, I experience this as a creative process that allows me to more fully understand my models, data and estimates, and therefore the system under study, which is why people pay us in the first place.
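
The influence of the prior mentioned above can be illustrated with the simplest possible case, a proportion with a conjugate Beta prior. This is only a sketch: the function name, the counts and the two priors are hypothetical choices made for illustration.

```python
def posterior_mean(successes, trials, prior_a, prior_b):
    """Posterior mean of a proportion under a Beta(prior_a, prior_b) prior."""
    # Conjugate update: Beta(a, b) prior plus binomial data
    # gives a Beta(a + successes, b + failures) posterior.
    post_a = prior_a + successes
    post_b = prior_b + (trials - successes)
    return post_a / (post_a + post_b)


flat = (1.0, 1.0)         # uniform prior on the proportion
informative = (8.0, 2.0)  # prior belief that the proportion is near 0.8

# Hypothetical data: the same observed proportion (0.4) at two sample sizes.
for k, n in [(2, 5), (200, 500)]:
    m_flat = posterior_mean(k, n, *flat)
    m_inf = posterior_mean(k, n, *informative)
    print(f"n={n}: flat prior {m_flat:.3f}, informative prior {m_inf:.3f}")
```

With only five observations the two priors give noticeably different posterior means, while with five hundred they nearly agree. That is the sense in which the prior matters most when data are sparse.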

Hierarchical Models

All statistical models contain at least one stochastic relationship. For example, in the case of a linear regression, the stochastic relationship is the normal distribution that links the predicted values to the observed values. Hierarchical models are those with two or more stochastic relationships. Depending on the model structure the additional stochastic relationships (also known as random effects) can be used to:

Make predictions for a larger population. For example, in fish surveys we typically swim a subset of the possible sites. By modeling the distribution from which the sites are drawn we are able to predict the number of fish at other unswum sites. This in turn allows us to estimate the total number of fish in the river.
Account for all sources of variation. Ecological datasets are typically affected by a range of unknown and unwanted extraneous influences whose unmodeled presence can lead to erroneous conclusions. Fortunately, accounting for such influences through, for example, spatially and temporally distributed random effects is straightforward.
Assess variability. The standard deviation of a random effect indicates a factor’s importance as an explanatory variable.

However, in order to model a factor as a random effect, the factor levels need to be viewable as a representative sample from a larger population. Other limitations are that models can take substantially longer to run in the presence of random effects, and that estimates can be imprecise if data are available for only a few levels.

Although random effects can be implemented in a frequentist framework, the fact that it is trivial to add them to a Bayesian model is yet another reason why I am a Bayesian.
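
The two-level structure described in this section can be sketched as a simulation of the fish survey example. The code below only generates data from a hierarchical model; it does not fit one. All numbers (site counts, mean, standard deviation) are made-up assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

n_sites = 200          # all possible sites in the river (hypothetical)
n_surveyed = 20        # sites actually swum (hypothetical)
mu_log = np.log(30.0)  # mean log expected count across sites (made up)
sd_site = 0.5          # standard deviation of the site random effect (made up)

# Stochastic relationship 1: site effects are drawn from a common distribution.
site_effect = rng.normal(0.0, sd_site, size=n_sites)
expected = np.exp(mu_log + site_effect)

# Stochastic relationship 2: counts vary around each site's expected value.
counts = rng.poisson(expected)

# Only a subset of the sites is surveyed.
surveyed = rng.choice(n_sites, size=n_surveyed, replace=False)

# Crude stand-in for model-based prediction: scale the surveyed mean up to
# all sites. A fitted hierarchical model would also quantify the uncertainty.
naive_total = counts[surveyed].mean() * n_sites
print(f"true total: {counts.sum()}, naive expansion: {naive_total:.0f}")
```

Fitting the model reverses this process: the surveyed counts are used to estimate `mu_log` and `sd_site`, which in turn describe the unswum sites.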

State-Space Models

Many datasets we analyse consist of fish counts (hence the company name). However, our clients aren’t really interested in how many fish the field crew has counted. They want to know how many fish there are in the river and how those numbers are changing through time. So our task as modelers is to use the observed counts to estimate unobserved latent variables such as the actual abundance. In order to do this, we need to explicitly model the underlying ecological or state process, i.e., how many fish there are, and the observational process, i.e., what proportion of the fish are seen. The main advantages of such an approach are:

Biases in the observation process can be accounted for.
Unobserved population parameters of interest such as abundance, survival and density-dependence can be modeled in the absence of any direct data.

It should be relatively obvious that such models, which belong to the class of state-space models, are a special type of hierarchical model and are therefore, like the rest of their brethren, naturally fitted within a Bayesian framework.
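
The separation between the state process and the observation process can be sketched as a simulation. This is a toy under stated assumptions, not one of our production models: the starting abundance, the detection probability and the process variation are invented values, and the growth model is deliberately simple.

```python
import numpy as np

rng = np.random.default_rng(7)

n_years = 10
p_detect = 0.4    # assumed probability that any given fish is seen
growth_sd = 0.15  # assumed year-to-year variation in log growth

# State process: the true (unobserved) abundance changes through time.
abundance = np.empty(n_years, dtype=int)
abundance[0] = 500
for t in range(1, n_years):
    growth = np.exp(rng.normal(0.0, growth_sd))
    abundance[t] = rng.poisson(abundance[t - 1] * growth)

# Observation process: the crew sees only a proportion of the fish present.
counts = rng.binomial(abundance, p_detect)

for year, (n_true, n_seen) in enumerate(zip(abundance, counts), start=1):
    print(f"year {year}: true abundance {n_true}, counted {n_seen}")
```

The raw counts systematically understate the abundance. A state-space model estimates both layers together, so the abundance and the detection probability each come with their own uncertainty.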
