class: center, middle, inverse, title-slide

# Data Analysis
## in the context of wildlife ecology and management
### Joe Thorley Ph.D. R.P.Bio.
### 2019-10-31

---

## What is Data Analysis?

Data Analysis is about

--

- reducing uncertainty (judgement to knowledge)

--

- using information (data)

--

- given assumed relationships (model)

--

Data Analysis is *not* about

--

- management (making optimal decisions)!

---

## Regulate Industry?

---
> Judgement-Based Management

---
background-image: url(19-analysis-wildlife_files/09_1987_1dollar_rev.jpg)

## Uncertainty is described using Probability

---

## Uncertainty is described using Probability

<img src="19-analysis-wildlife_files/figure-html/unnamed-chunk-3-1.svg" style="display: block; margin: auto;" />

---

## Uncertainty is described using Probability

<img src="19-analysis-wildlife_files/figure-html/unnamed-chunk-4-1.svg" style="display: block; margin: auto;" />

---

## Uncertainty is described using Probability

<img src="19-analysis-wildlife_files/figure-html/unnamed-chunk-5-1.svg" style="display: block; margin: auto;" />

---

## Uncertainty is described using Probability

<img src="19-analysis-wildlife_files/figure-html/unnamed-chunk-6-1.svg" style="display: block; margin: auto;" />

---
layout: false

## A Tossed Coin

- What is the probability of a heads before I toss the coin?

--

- What is the probability after I toss it but before I look at it?

--

- What is the probability after I look at it but before I tell you?

--

- What is the probability after I tell you it's tails?

--

- What is the probability if I then tell you I was lying and it's heads?

--

- What is the probability if I'm willing to bet you $10 it's heads?

--

- Are you willing to accept odds of 10:1?

---

## PROBABILITY DOES NOT EXIST

--

De Finetti (1970) Theory of Probability

--

- Probability is a measure of **your** uncertainty about the possible states of the world.

--

<img src="19-analysis-wildlife_files/figure-html/unnamed-chunk-7-1.svg" style="display: block; margin: auto;" />

---

## Probability Distributions

---

## Probability Distributions

--

`$$\text{Value} \sim N(0, 1)$$`

---
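## Probability Distributions: a Sketch

A statement such as Value ~ N(0, 1) can be read as saying how much probability sits on each range of values. The following is a minimal sketch, assuming Python's standard-library `statistics.NormalDist`; the helper `prob_between` and the interval bounds are illustrative choices, not taken from the slides.

```python
from statistics import NormalDist

# Standard normal distribution, matching Value ~ N(0, 1).
value = NormalDist(mu=0.0, sigma=1.0)


def prob_between(dist, lower, upper):
    """Probability that a draw from dist falls between lower and upper."""
    return dist.cdf(upper) - dist.cdf(lower)


# Illustrative intervals (assumed for this example only).
p_within_1 = prob_between(value, -1.0, 1.0)
p_within_2 = prob_between(value, -2.0, 2.0)
p_above_0 = 1.0 - value.cdf(0.0)

print(f"P(-1 < Value < 1) = {p_within_1:.3f}")
print(f"P(-2 < Value < 2) = {p_within_2:.3f}")
print(f"P(Value > 0)      = {p_above_0:.3f}")
```

Roughly two thirds of the probability lies within one unit of zero, which is one way to express how uncertain we are about the value.

---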
> Data Analysis

--

`$$\text{Knowledge} \propto \text{Judgement} \cdot P(\text{Data}|\text{Model})$$`

---
background-image: url(19-analysis-wildlife_files/bison-1801981_1920.jpg)

## Data are Coded Observations

--

> At easter about an hour after midday I was walking by the river and I saw a female bison, and another female behind it and another male one behind it. I went back at sunset but didn't see anything.

--

"04/01/18 1:05 two female and one male bison 44.4280 N 110.5885 W"

---
background-image: url(19-analysis-wildlife_files/bison-1801981_1920.jpg)

## Data are Machine-Readable

--

This dataset has at least six problems!

--

- Numbers are mispunched

--

- Two different observers

--

- 4th of January

--

- Just after midnight

--

- In the Gobi Desert

--

- Missing zero count!

--

*80% of data analysis is data cleansing and tidying!!*

---

## Count and Industry Data
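
---

## Data Cleaning: a Sketch

Two of the problems listed earlier ("4th of January" and "Just after midnight") come from how the coded string "04/01/18 1:05" is read. The following is a minimal sketch using Python's standard-library `datetime`. It assumes the intended reading is 1 April 2018 at 13:05, inferred from the field note ("At easter about an hour after midday"), and all variable names are illustrative.

```python
from datetime import datetime

raw_date = "04/01/18"  # date string from the coded observation
raw_time = "1:05"      # time string from the coded observation

# The same date string read day-first and month-first.
day_first = datetime.strptime(raw_date, "%d/%m/%y").date()    # 4 January 2018
month_first = datetime.strptime(raw_date, "%m/%d/%y").date()  # 1 April 2018

# The same time string read on a 24-hour clock and as an afternoon time.
as_24h = datetime.strptime(raw_time, "%H:%M").time()           # just after midnight
as_pm = datetime.strptime(raw_time + " PM", "%I:%M %p").time()  # just after midday

# Assumed intended reading, written in the unambiguous ISO 8601 form.
unambiguous = datetime.combine(month_first, as_pm).isoformat()

print("day-first reading:  ", day_first.isoformat(), as_24h.isoformat())
print("month-first reading:", month_first.isoformat(), as_pm.isoformat())
print("recorded as:        ", unambiguous)
```

Recording dates and times in ISO 8601 form at the point of data entry avoids having to guess the convention afterwards.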

---
class: inverse
background-image: url(19-analysis-wildlife_files/haida-gwaii.png)

---
class: inverse
background-image: url(19-analysis-wildlife_files/haida-gwaii.png)

# All Models are Wrong

---
class: inverse
background-image: url(19-analysis-wildlife_files/haida-gwaii.png)

# All Models are Wrong (but some are Useful)

--

> The map is not the territory

Alfred Korzybski

--

> Everything simple is false. Everything which is complex is unusable.

Paul Valery

--

> There is no such thing as an absolutely True thought.

Steven Gray

---

## Data Analysis

--

`$$\log(\mu) = \beta_0 + \cdots + \Delta \cdot \text{Industry}$$`

--

`$$\text{Count} \sim \text{Poisson}(\mu)$$`

---

## Data Analysis

`$$\log(\mu) = \beta_0 + \cdots + \Delta \cdot \text{Industry}$$`

`$$\text{Count} \sim \text{Poisson}(\mu)$$`

`$$\Delta \sim N(0, 1)$$`

---

## Data Analysis

`$$\log(\mu) = \beta_0 + \cdots + \Delta \cdot \text{Industry}$$`

`$$\text{Count} \sim \text{Poisson}(\mu)$$`

`$$\Delta \sim N(0, 1)$$`

`$$\beta_0 \sim N(0, 1)$$`

---
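
## Data Analysis: a Sketch

The model on the preceding slides combines a Poisson likelihood with N(0, 1) priors on the intercept and the industry effect. The following is a minimal sketch of how the posterior for the industry effect could be approximated on a grid, using only Python's standard library. The counts are hypothetical, the terms elided by the dots in the linear predictor are left out, and the grid ranges are assumptions chosen to cover the bulk of the posterior for these made-up data.

```python
import math

# Hypothetical data (not from the slides): four sites without industry (0)
# and four sites with industry (1).
industry = [0, 0, 0, 0, 1, 1, 1, 1]
count = [12, 9, 11, 10, 6, 5, 7, 4]


def log_normal(x, mean=0.0, sd=1.0):
    """Log density of N(mean, sd) at x."""
    return -0.5 * ((x - mean) / sd) ** 2 - math.log(sd * math.sqrt(2.0 * math.pi))


def log_poisson(k, mu):
    """Log probability of count k under Poisson(mu)."""
    return k * math.log(mu) - mu - math.lgamma(k + 1)


def log_posterior(beta0, delta):
    """Unnormalized log posterior: log priors plus log likelihood."""
    total = log_normal(beta0) + log_normal(delta)  # beta0, delta ~ N(0, 1)
    for x, y in zip(industry, count):
        mu = math.exp(beta0 + delta * x)  # log(mu) = beta0 + delta * Industry
        total += log_poisson(y, mu)       # Count ~ Poisson(mu)
    return total


# Assumed grid ranges for the two parameters.
beta0_grid = [i * 0.02 for i in range(0, 201)]         # 0 to 4
delta_grid = [-2.0 + i * 0.02 for i in range(0, 201)]  # -2 to 2

# Evaluate the posterior on the grid and normalize it.
log_post = [[log_posterior(b, d) for b in beta0_grid] for d in delta_grid]
peak = max(max(row) for row in log_post)
weights = [[math.exp(v - peak) for v in row] for row in log_post]
total_weight = sum(sum(row) for row in weights)

# Marginal posterior of delta: sum over the beta0 axis.
delta_marginal = [sum(row) / total_weight for row in weights]

delta_mean = sum(d * p for d, p in zip(delta_grid, delta_marginal))
prob_negative = sum(p for d, p in zip(delta_grid, delta_marginal) if d < 0)

print(f"posterior mean of delta: {delta_mean:.2f}")
print(f"P(delta < 0 | data):     {prob_negative:.3f}")
```

A grid is only practical for one or two parameters; it is used here because it makes the multiplication of prior and likelihood explicit.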
---
> 'uncertainty laundering'

Gelman (2016)

---

## Significance

--

The effect of industry on abundance is **not significant**.

--

We can't be sure it's having an effect.

--

Therefore there is no need to change regulations!

---

## Significance

--

Significance depends on effect size **and sample size**.

--

Statistical significance `\(\neq\)` ecological significance

--

Significance does not take account of the costs/benefits of the various options.

--

----

Significance is a reasonable criterion for model selection.

---
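## Significance: a Sketch

The dependence of significance on sample size can be shown with a small calculation. The following is a minimal sketch that swaps in a two-sided z-test on a difference in means with a known standard deviation, which is a simplification and not the Poisson model from the earlier slides. The effect size, standard deviation and sample sizes are hypothetical.

```python
import math
from statistics import NormalDist


def two_sided_p(effect, sd, n_per_group):
    """Two-sided p-value for a difference in means between two equal-sized
    groups with known standard deviation (z-test)."""
    se = sd * math.sqrt(2.0 / n_per_group)  # standard error of the difference
    z = effect / se
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Hypothetical values: the effect is held fixed, only the sample size changes.
effect = 0.5
sd = 2.0
p_values = {n: two_sided_p(effect, sd, n) for n in (10, 50, 250)}

for n, p in p_values.items():
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"n = {n:3d} per group: p = {p:.3f} ({verdict} at 0.05)")
```

The effect is identical in all three cases; only the amount of data changes the verdict.

---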
> Decision Theory

Choose the option that maximizes expected net benefit given the uncertainty.

---

## Decision Theory

--

Requires a loss function (challenging to develop)

--

but criteria are explicit

--

and decisions 'optimal'.

---

## Loss Function

---

## Summary

--

- Uncertainty is personal

--

- Clean and tidy data are essential

--

- Models are inescapable

--

- Statistical significance does not indicate ecological significance

--

- Decisions should maximize the expected net benefit given the uncertainty

---

## Further Reading

Amrhein, V., Greenland, S., and McShane, B. 2019. Scientists rise up against statistical significance. Nature 567(7748): 305–307. doi:10.1038/d41586-019-00857-9.

McElreath, R. 2016. Statistical rethinking: a Bayesian course with examples in R and Stan. CRC Press/Taylor & Francis Group, Boca Raton.

Williams, P.J., and Hooten, M.B. 2016. Combining statistical inference and decisions in ecology. Ecological Applications 26(6): 1930–1942. doi:10.1890/15-1593.1.

---
class: center, middle

# Thanks!

For more information see

www.joethorley.io

www.poissonconsulting.ca
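
---

## Backup: Decision Theory Sketch

The Decision Theory slides describe choosing the option that maximizes expected net benefit given the uncertainty. The following is a minimal sketch of that calculation in Python's standard library. Everything numeric is hypothetical: the posterior draws for the industry effect, the two options, and the `net_benefit` function standing in for a loss function are all assumptions made for illustration.

```python
import math
import random

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical posterior draws for the effect of industry on log abundance.
delta_draws = [random.gauss(-0.6, 0.3) for _ in range(10_000)]

# Hypothetical values in arbitrary units.
WILDLIFE_VALUE = 40.0   # value placed on the full population
REGULATION_COST = 5.0   # cost of regulating industry


def net_benefit(option, delta):
    """Hypothetical net benefit of an option for one possible effect size."""
    if option == "regulate":
        # Regulation is assumed to remove the effect at a fixed cost.
        return -REGULATION_COST
    # Without regulation the proportional change in abundance is exp(delta) - 1.
    return WILDLIFE_VALUE * (math.exp(delta) - 1.0)


options = ["regulate", "no_change"]

# Average the net benefit over the uncertainty in delta for each option.
expected = {
    option: sum(net_benefit(option, d) for d in delta_draws) / len(delta_draws)
    for option in options
}
best_option = max(expected, key=expected.get)

for option in options:
    print(f"{option:10s} expected net benefit = {expected[option]:.1f}")
print("best option:", best_option)
```

The choice depends on the values written into `net_benefit` as much as on the posterior, which is why the criteria have to be made explicit.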