Musings of a Computational Biologist

https://www.joethorley.io/
Recent content on Musings of a Computational Biologist
Last updated Thu, 31 Oct 2019 00:00:00 +0000

Data Analysis in the context of Wildlife Management and Ecology (version 2)
https://www.joethorley.io/post/2019/data-analysis/
Thu, 31 Oct 2019 00:00:00 +0000
This morning I gave the latest version of my presentation on Data Analysis in the context of Wildlife Management and Ecology to the Haida Gwaii Institute semester students. It included probability, uncertainty, data, models, confidence intervals, significance and decision theory. They had some great questions and interesting thoughts and were very engaging. My presentation can be viewed at https://www.joethorley.io/slides/19-analysis-wildlife#1. Based on some of the questions, I’ve updated the Further Reading list:

Censored Water Quality Data
https://www.joethorley.io/post/2019/censored/
Wed, 23 Oct 2019 00:00:00 +0000
Often the concentration of a water quality parameter below a detection limit (DL) cannot be measured reliably. A common approach to dealing with such non-detects is to substitute them with the value 0, DL/2 or DL, which discards information and introduces biases. A more reliable approach is to explicitly model the non-detects using censored probability distributions.
Simulated Data
Consider the following log-normally distributed parameter (log mean of 0 and log SD of 1).

Everything is a Thing
https://www.joethorley.io/post/2019/things/
Sun, 29 Sep 2019 00:00:00 +0000
In Object Process Methodology (OPM) everything is either an object or a process where
An object is a thing that exists
A process is a thing that can transform an object by creating or destroying it, or by changing its state.
In other words, everything is a thing.

Significance, Thresholds and Decision-Making
https://www.joethorley.io/post/2019/significance/
Mon, 12 Aug 2019 00:00:00 +0000
Introduction
Our understanding of nature is incomplete. And always will be. Yet we must act. How are we to take rational actions in the face of uncertainty? Or to put the question in more concrete terms – how should we use data to inform actions?
Statistical Models
As schematically depicted in Figure 1, statistical inference provides a well-defined pathway from data to uncertainty (in the form of posterior probability distributions).

Multiplying leads to the Apocalypse
https://www.joethorley.io/post/2019/multiplying-apocalypse/
Thu, 25 Jul 2019 00:00:00 +0000
On a finite planet, population growth is unsustainable. This is a hard truth that for historical and religious reasons people hardly dare utter. In more concrete terms it means that unless we begin simply replacing ourselves then disease, starvation and war will increase the death rate. In Biblical terms it means that multiplying leads to the apocalypse.

Exponential Growth
https://www.joethorley.io/post/2019/exponential-growth/
Wed, 03 Apr 2019 00:00:00 +0000
Exponential Growth
A variable is said to grow exponentially when its rate of change is proportional to its current value
\[\frac{dV}{dt} = \mu V\]
It is called exponential growth because the variable’s value at any given time ($t$) is an exponential function of time
\[V_t = V_0 {\rm e}^{\mu t}\]
    t <- 1:30
    mu <- 0.25
    V0 <- 1
    Vt <- V0 * exp(mu * t)
    plot(Vt ~ t, type = "l")
Calculating $\mu$
The growth rate $\mu$ which has units of $t^{-1}$ is the instantaneous growth rate.

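
The feed summary stops before showing how $\mu$ is calculated, so the following is only a minimal sketch and not the post's own method. It reuses the illustrative values from the snippet above and assumes noise-free values of $V_t$. The names `mu_hat`, `mu_slope` and `doubling_time` are hypothetical, and the doubling time is an addition that does not appear in the summary.

```r
# Illustrative values taken from the snippet above
t <- 1:30
mu <- 0.25
V0 <- 1
Vt <- V0 * exp(mu * t)

# Rearranging V_t = V_0 * exp(mu * t) for two time points gives
# mu = log(V_2 / V_1) / (t_2 - t_1)
mu_hat <- log(Vt[30] / Vt[1]) / (t[30] - t[1])

# Equivalently, mu is the slope of log(V_t) regressed on t
mu_slope <- unname(coef(lm(log(Vt) ~ t))[2])

# Time taken for the variable to double at rate mu (not in the summary)
doubling_time <- log(2) / mu
```

With real, noisy data the two estimates would generally differ, and the regression on the log scale is usually the more stable choice because it uses every observation.
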
Extra-Poisson Variation with the Negative Binomial Distribution
https://www.joethorley.io/post/2018/extra-poisson-nbd/
Mon, 26 Nov 2018 00:00:00 +0000
Poisson Distribution
The Poisson distribution describes the probability of the number of rare independent events given a base rate ($\lambda$).
    set.seed(101)
    n <- 1e+05
    lambda <- 1.7
    rpois <- rpois(n, lambda)
    hist(rpois, breaks = seq(0, ceiling(max(rpois)), by = 1))
The variance ($\sigma^2$) and mean ($\mu$) of a Poisson distribution are both $\lambda$
    round(mean(rpois), 1)
    ## [1] 1.7
    round(var(rpois), 1)
    ## [1] 1.7
And the dispersion index ($\text{DI}$), which is defined to be $\sigma^2 / \mu$, is 1.

A Little Bit of Chaos
https://www.joethorley.io/post/2018/chaos/
Sun, 18 Nov 2018 00:00:00 +0000
One of the simplest ecological population models is the exponential growth model
\[N_{t+1} = \alpha \cdot N_t\]
which indicates the rate of exponential increase (if $\alpha > 1$) or decrease (if $0 \leq \alpha < 1$) or stasis (if $\alpha = 1$). Before going any further let's put some R code in place to facilitate our exploration.
    library(ggplot2)
    update <- function(rule, ..., initial = 100, t = 100, rule_name = substitute(rule)) {
      for(i in 2:t) initial[i] <- rule(initial[i-1], .

Certain vs Uncertain Relationships
https://www.joethorley.io/post/2018/uncertainty/
Sat, 17 Nov 2018 00:00:00 +0000
If you place $100 in your bank which charges $5 in fees and issues 1% annual simple interest then after 365 days you will have $96 in your account.
We can write this mathematically as
\[y = x - 5 + x \cdot 0.01\]
where $x$ is our initial bank balance and $y$ is our final balance. Alternatively, if you had placed your $100 on a horse race at odds of 3:1 it is possible that at the end of the race you receive your original stake back plus an additional $300 for a grand total of $400 if your horse wins or $0 if it loses.

An Embarrassingly Simple Insight into Polynomial Regression
https://www.joethorley.io/post/2018/polynomial/
Fri, 16 Nov 2018 00:00:00 +0000
Linear regression is of course defined by the certain relationship
\[\mu = \alpha + \beta \cdot x\]
and the uncertain relationship
\[y \sim N(\mu, \sigma)\]
where $\alpha$ is the intercept and $\beta$ is the slope. For example with $\alpha = 0$ and $\beta = 10$ the deterministic relationship can be represented as follows
    beta <- 10
    x <- 0:10
    mu <- beta * x
    plot(x, mu, type = "l")
I was just thinking about how I would like the slope, i.e. $\beta$, to vary with $x$ and came up with the following certain relationship
\[\mu = \alpha + (\beta + \beta_2 \cdot x) \cdot x\]
which with \(\beta_2 = -0.

Data Analysis in the context of Wildlife Management and Ecology
https://www.joethorley.io/post/2018/context/
Mon, 29 Oct 2018 00:00:00 +0000
Yesterday I gave a presentation to the semester students on Haida Gwaii. They were really engaging and a lot of fun. Our discussions included uncertainty, information, confidence intervals, significance and effect sizes. The presentation I provided can be viewed here. I learnt a lot from them. The only other thing I would like to add is that if anyone is interested in learning more then they should read Statistical Rethinking by Richard McElreath.

A Minimalist Hugo Theme
https://www.joethorley.io/post/2018/theme/
Thu, 25 Oct 2018 00:00:00 +0000
This web site is built using Hugo and Blogdown.
The theme is Tanka with minor customizations. The theme is released under GPL-3.

ssdtools Presentation
https://www.joethorley.io/post/2018/ssdtools/
Tue, 02 Oct 2018 00:00:00 +0000
ssdtools is an R package (and associated Shiny app) for analysing Species Sensitivity Distributions. In October 2018, I gave the following presentation at the 45th Annual Canadian Ecotoxicity Workshop (CEW) in Vancouver:

Notes on the Negative Binomial Distribution in Ecology
https://www.joethorley.io/post/2018/nbd/
Mon, 06 Aug 2018 00:00:00 +0000
There are many different formulations of the negative binomial distribution (NBD).
Default Formulation
In the default formulation, it is the number of failures to occur in a series of Bernoulli trials before a target number of successes is reached. For example consider the number of failures that occur before 10 successes ($N$) are achieved where the probability of success ($\rho$) is 0.5. With R this can be achieved as follows.

Reflections on Canadian Environmental Regulations
https://www.joethorley.io/post/2017/review/
Thu, 24 Aug 2017 00:00:00 +0000
In June 2017, the federal government released a discussion paper that outlines the changes being considered for Canada’s environmental assessment and regulatory processes. The changes are intended to:
Regain public trust;
Protect the environment;
Advance reconciliation with Indigenous peoples and,
Ensure good projects go ahead and resources get to market.
I would like to provide some reflections from a computational biologist’s perspective on the first two objectives. Firstly, in order to regain the public trust, the results should be reproducible (Peng 2009).

Principles of Data Management (for Biologists)
https://www.joethorley.io/post/2017/data/
Mon, 14 Aug 2017 00:00:00 +0000
The following was presented at the Skidegate Council of the Haida Nation office on August 14th 2017. It is also provided below.
Principles of Data Management (for Biologists)
Introduction
Biologists spend $1,000,000s collecting data with little regard for its management.
Study Design
Study design should precede data management
Identify question(s): what do we want to know and why?
Assess existing data/understanding: what do we already know?

Effect Sizes with the log Transform
https://www.joethorley.io/post/2017/effects/
Wed, 26 Jul 2017 00:00:00 +0000
Often when modeling abundance the log transform is used to ensure the expected values are positive. For example
\[\log(\mu) = \alpha_0 + \beta_1 \cdot x_1 + \beta_2 \cdot x_2\]
where the observed count is
\[y \sim dpois(\mu).\]
An additional bonus of using the log transform is that estimates of effect sizes (proportional changes in the expected abundance) complete with confidence intervals (Bradford, Korman, and Higgins 2005) can be calculated from the coefficient table alone using the formula
\[\exp(\beta \cdot \Delta) - 1.

Languaging Bayesian Models
https://www.joethorley.io/post/2017/language/
Thu, 27 Apr 2017 00:00:00 +0000
I just watched a thoroughly entertaining and thought-provoking talk by Richard McElreath. At the end he makes suggestions for a new Bayesian language that avoids such frequentist-centric terms as data, parameter, likelihood and even prior or posterior! This got me thinking… Fundamentally, there are just two types of things in Bayesian modelling. Let us call them variables and relationships. The variables themselves can be known or unknown with unknown variables differing in how uncertain they are.

A Bayesian and a Frequentist Discuss a Simple Result
https://www.joethorley.io/post/2017/bayesian/
Wed, 19 Apr 2017 00:00:00 +0000
Background
The following scenario and dialogue reflect my current understanding and relationship to the Bayesian and Frequentist paradigms.
Scenario
Our scene opens on two dishevelled statisticians staring longingly at a bank vault full of $1 coins. After randomly selecting a coin from the vault, and confirming that it appears perfectly ordinary, the two companions toss it nine times and observe one head.
Dialogue
Bayesian: What do you consider the best estimate of the probability of throwing a head to be?

Using P-Values with Confidence
https://www.joethorley.io/post/2017/pvalues/
Tue, 14 Mar 2017 00:00:00 +0000
The following was presented at the College of Applied Biology’s annual conference Evidence Matters: Professional Practice in a Post-Truth World in Victoria, BC on March 3rd 2015. It is also provided below with additional text.
Using P-Values with Confidence
Background
The p-value is perhaps the most ubiquitous statistical index. It is also the most misunderstood, and/or misused, and/or maligned depending on whom you ask.
American Statistical Association
Wasserstein, R.

An Overview of the Statistical Challenges to Understanding the Ecology and Management of Regulated Rivers
https://www.joethorley.io/post/2015/rivers/
Thu, 07 May 2015 00:00:00 +0000
The following was presented at the Columbia Mountains Institute’s conference on Regulated Rivers: Environment, Ecology, and Management in Castlegar, BC, which ran from May 6th to 7th, 2015. It is also provided below with additional text.
An Overview of the Statistical Challenges to Understanding the Ecology and Management of Regulated Rivers (with additional text)
Introduction
The statistical challenges associated with understanding the environment, ecology and management of regulated rivers are numerous.

Scale Readers Should Not Age Fish
https://www.joethorley.io/post/2015/scales/
Wed, 11 Feb 2015 00:00:00 +0000
A Well-Established Technique
Reading the age of a fish from the rings (annuli) on its scales is a well-established technique with a history dating back over 250 years (Ricker 1975). However, like many well-established methods the general approach has failed to keep pace with modern analytic development. In particular, hierarchical Bayesian methods now make it possible to accurately predict the actual (as opposed to inferred) ages of fish by explicitly modeling the biases that confound the relationship between age and annuli numbers.

A Brief Overview of Hierarchical Bayesian State-Space Models
https://www.joethorley.io/post/2014/state/
Wed, 30 Apr 2014 00:00:00 +0000
Introduction
On a call with a potential client today, I told them that I almost exclusively use Bayesian models. I then added that we usually end up including hierarchical structure through one or more random effects and that, if possible, we like our models to be of the state-space type. Later in an email, they asked for further explanation of what hierarchical Bayesian state-space models are and why we use them - which inspired me to write this short post.

Geometric to Arithmetic Mean
https://www.joethorley.io/post/2014/geo/
Mon, 31 Mar 2014 00:00:00 +0000
Statistical analysts often model positive responses (such as fish densities) using a log-normal distribution.
By default the expected values for such models represent the geometric mean ($\mu_G$) but readers are typically most interested in the arithmetic mean ($\mu_A$). The difference can be important particularly because $\mu_G \leq \mu_A$. The geometric mean of a log-normal distribution can be converted to its arithmetic mean using the equation
\[ \mu_A = \exp(\log(\mu_G) + \sigma_L^2 / 2) \]

Midge Creek Bull Trout
https://www.joethorley.io/post/2011/midge/
Mon, 19 Sep 2011 00:00:00 +0000
The following video from Gary Pavan shows bull trout (or bullies as the local anglers call them) from Kootenay Lake that have swum 20 km up Midge Creek in order to spawn. The fish were prevented from going any further by a natural falls below which they have aggregated.

About
https://www.joethorley.io/about/
This website is a random collection of my thoughts related to computational biology that are too extensive for Mastodon. I am a Senior Computational Biologist with Poisson Consulting

CC BY
https://www.joethorley.io/cc-by/
Except where indicated otherwise, the contents of this webpage are released under CC BY 4.0. The Hugo-Blogdown theme is released under GPL-3.