Detecting trends in low/high abundance species
From the inbox:
What is the best answer to this question posed below as a comment on the technical report we are writing regarding our forest bird trend data? If we have a bias in detecting trends for abundant or common species vs. uncommon or rare species, then I need to state this. I suspect that it is easier to detect a trend for a common species because there are more observations to work with … hence, more difficult for a rare species?
General statistical question – is detection of significant increase more likely than detection of significant decrease due in part to issues of sample size? Given that declining species probably are less common to begin with, wouldn’t it be more difficult to detect significant trends for those species?
It sounds like two different questions to me:
- Is it easier to detect an increase than to detect a decrease?
No, since the problem is symmetric. Take the exact same dataset and reverse the years: if there is a significant increase in one direction, there will be an equally significant decrease in the opposite direction.
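The symmetry argument can be checked with a short sketch. The counts here are made up for illustration (a Poisson series with an increasing mean is assumed), and `x_reversed` is just a name introduced for this example.

```r
# Hypothetical example data: any series of yearly counts would do.
set.seed(1)
n <- 10
x <- 1:n
y <- rpois(n, lambda = 1:n)

# Fit the trend with the years in their original order and reversed.
x_reversed <- rev(x)
fit_forward  <- summary(lm(y ~ x))
fit_reversed <- summary(lm(y ~ x_reversed))

slope_forward  <- fit_forward$coefficients["x", ]
slope_reversed <- fit_reversed$coefficients["x_reversed", ]

# The slopes have opposite signs; the two-sided p-values are identical.
rbind(forward = slope_forward, reversed = slope_reversed)
```

Reversing the years only flips the sign of the slope and its t statistic, so a two-sided test cannot favor increases over decreases for the same data.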
- Is it easier to detect trends in more abundant species?
The answer is complicated. On one hand, an average increase of one individual per year is easier to detect in a rare species than in a common species. On the other hand, an average increase of 10% in individuals per year is easier to detect in a common species than in a rare species. This is due to the signal-to-noise ratio, which is high for the rare species in the former case but high for the common species in the latter. It is not clear to me that either of these is a fair comparison. I'm sure we could determine the break-even point, which will depend on how rare is rare and how common is common, as well as on how many observations go into each mean (iirc the data point for each year is the mean of all surveys in that season).
Does the abundance matter, or is it only the signal:noise ratio, as it would seem? I.e., a greater ratio gives greater power regardless of abundance.
My answer was based on assuming a Poisson model for each survey with a mean that changes from year to year. This mean would effectively be the abundance. Since the Poisson distribution has a variance equal to its mean, the noise (let's define it as the square root of the abundance) is directly related to the abundance. So yes, abundance matters through its effect on the noise.
If abundance starts at 1 and increases by 1 per year, then over 9 years the signal is 9 while the noise ranges from 1 to about 3. In contrast, if abundance starts at 100 and increases by 1 per year, then over 9 years the signal is 9 while the noise is ~10. The former has a signal-to-noise ratio of about 3 while the latter has a ratio of about 1. But this makes sense, since a 1-individual increase is much more meaningful if you started with only 1 than if you started with 100. So, to make the comparison reasonable, let abundance in the latter case increase by 100 each year (so that the relative increase in the two scenarios is the same). Now over 9 years the signal is 900 while the noise ranges from 10 to about 30, so the signal-to-noise ratio is ~30. The break-even point is where the high-abundance scenario increases by 30 over the 9 years: the signal would then be 30 and the noise ~10, giving a signal-to-noise ratio of about 3.
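The back-of-the-envelope numbers above can be reproduced with a few lines of R. The helper `signal_to_noise()` is a hypothetical name introduced here, and it assumes the noise is the square root of the abundance at the end of the period, which corresponds to the upper end of the noise ranges quoted above.

```r
# Rough signal-to-noise ratio for a linear change in a Poisson mean.
# signal: total change in the mean over the period
# noise:  sqrt of the final mean (the Poisson standard deviation at the end)
signal_to_noise <- function(start, total_change) {
  noise <- sqrt(start + total_change)
  c(signal = total_change, noise = noise, ratio = total_change / noise)
}

# The four scenarios discussed in the text, each over 9 years.
scenarios <- rbind(
  rare_plus_1_per_year     = signal_to_noise(start = 1,   total_change = 9),
  common_plus_1_per_year   = signal_to_noise(start = 100, total_change = 9),
  common_plus_100_per_year = signal_to_noise(start = 100, total_change = 900),
  common_plus_30_total     = signal_to_noise(start = 100, total_change = 30)
)
round(scenarios, 1)
```

Using the final-year noise is only one convention; taking the starting or average abundance instead shifts the ratios somewhat but not their ordering.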
Let's try a simulation and see what happens. My goal here is to create a situation where the p-value associated with a linear increase in Poisson observations is approximately the same whether you start at a Poisson mean of 1 or a Poisson mean of 100.
Consider data over 10 years.
n = 10
x = 1:n
Simulate counts for a rare species and perform a regression on year (x).
set.seed(1)
lambda = 1:n
y = rpois(n, lambda)
summary(lm(y ~ x))
##
## Call:
## lm(formula = y ~ x)
##
## Residuals:
##    Min     1Q Median     3Q    Max
## -2.982 -1.136 -0.173  1.395  2.454
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   -1.200      1.276   -0.94  0.37435
## x              1.436      0.206    6.99  0.00011 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.87 on 8 degrees of freedom
## Multiple R-squared:  0.859, Adjusted R-squared:  0.842
## F-statistic: 48.8 on 1 and 8 DF,  p-value: 0.000114
Simulate counts for a common species (using the same random number seed to make the comparison more direct) and perform a regression on year (x).
set.seed(1)
lambda = seq(100, 190, length = n)
y = rpois(n, lambda)
summary(lm(y ~ x))
##
## Call:
## lm(formula = y ~ x)
##
## Residuals:
##    Min     1Q Median     3Q    Max
## -20.29  -6.88   3.21   8.74   9.74
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)    96.20       7.65   12.58  1.5e-06 ***
## x               9.02       1.23    7.32  8.2e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 11.2 on 8 degrees of freedom
## Multiple R-squared:  0.87, Adjusted R-squared:  0.854
## F-statistic: 53.6 on 1 and 8 DF,  p-value: 8.24e-05
To get a similar p-value in the high-abundance case, I needed a total increase of 90 over the 9 years (10 per year) rather than the 30 estimated earlier, but this seems in the ballpark.
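A single seed gives only one draw per scenario, so the p-values are themselves noisy. One rough way to compare the scenarios is to repeat the simulation many times and record how often the slope is significant. This is a sketch rather than part of the original analysis: the helper `trend_power()`, the 2000 replicates, and the 0.05 threshold are all assumptions made for illustration.

```r
# Estimate power as the fraction of simulated series whose
# regression slope has a p-value below alpha.
trend_power <- function(lambda, n_sims = 2000, alpha = 0.05) {
  x <- seq_along(lambda)
  p_values <- replicate(n_sims, {
    y <- rpois(length(lambda), lambda)
    summary(lm(y ~ x))$coefficients["x", "Pr(>|t|)"]
  })
  # na.rm guards against the rare degenerate series with no variation.
  mean(p_values < alpha, na.rm = TRUE)
}

set.seed(1)
n <- 10
power_estimates <- c(
  rare           = trend_power(1:n),                            # 1 to 10
  common_plus_30 = trend_power(seq(100, 130, length.out = n)),  # +30 in total
  common_plus_90 = trend_power(seq(100, 190, length.out = n))   # +90 in total
)
power_estimates
```

Comparing the estimated power of the rare scenario with the two common scenarios gives a less seed-dependent view of where the break-even point lies.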
The bottom line of this analysis is that detecting an increase of one individual per year when you start at 1 individual is about as hard as detecting an increase of roughly 3 to 10 individuals per year when you start at 100 individuals.
All of this was based on a single Poisson observation each year. You take multiple surveys per year and use their mean. Taking the mean decreases the noise by a factor equal to the square root of the number of surveys taken each year. In addition, there is probably more noise than a Poisson model would give, due to weather, time of day, time of year, etc. It is not clear how these would ultimately affect your ability to detect a trend.
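Both effects can be illustrated with a small simulation. Everything numeric here is hypothetical: an abundance of 100, 4 surveys per year, and a negative binomial with `size = 20` standing in for extra-Poisson noise (the negative binomial is an assumed choice, not something the survey data are known to follow).

```r
set.seed(1)
lambda <- 100     # hypothetical true abundance in one year
k      <- 4       # hypothetical number of surveys per year
n_sims <- 10000   # number of simulated years

# Each row is one simulated year: k Poisson surveys, then their mean.
poisson_means <- rowMeans(matrix(rpois(n_sims * k, lambda), ncol = k))

# Overdispersed surveys with the same mean; with this parameterization
# the variance of a single survey is lambda + lambda^2 / size.
size <- 20        # hypothetical dispersion parameter
negbin_means <- rowMeans(
  matrix(rnbinom(n_sims * k, size = size, mu = lambda), ncol = k)
)

noise_summary <- c(
  single_poisson_sd     = sqrt(lambda),       # one survey per year
  theory_mean_sd        = sqrt(lambda / k),   # mean of k Poisson surveys
  simulated_mean_sd     = sd(poisson_means),
  overdispersed_mean_sd = sd(negbin_means)
)
round(noise_summary, 2)
```

Averaging pulls the noise down by the square root of the number of surveys, while overdispersion pushes it back up, so the net effect depends on how much extra variation the surveys actually carry.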
Another comment from the reply:
I believe the length of the series on the x-axis also matters a great deal, so that more years gives you a lot more power notwithstanding signal-to-noise. But, that's a third question.
Agreed. If the increase is consistent then having more years will give you more power.
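The effect of series length can be explored by simulation. In this sketch the helper `power_for_years()`, the starting mean of 100, the increase of 3 per year, the 1000 replicates, and the 0.05 threshold are all hypothetical choices made for illustration.

```r
# Estimated power to detect a fixed yearly increase in a Poisson mean,
# as a function of the number of years in the series.
power_for_years <- function(n_years, start = 100, per_year = 3,
                            n_sims = 1000, alpha = 0.05) {
  x <- seq_len(n_years)
  lambda <- start + per_year * (x - 1)
  p_values <- replicate(n_sims, {
    y <- rpois(n_years, lambda)
    summary(lm(y ~ x))$coefficients["x", "Pr(>|t|)"]
  })
  mean(p_values < alpha, na.rm = TRUE)
}

set.seed(1)
years <- c(5, 10, 15, 20)
power_by_years <- setNames(sapply(years, power_for_years),
                           paste0(years, "_years"))
power_by_years
```

One reason more years help: under ordinary least squares, the standard error of the slope shrinks as the spread of the year values grows, and for years 1 to n the sum of squared deviations from the mean year is n(n^2 - 1)/12, which grows roughly with the cube of the series length.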