Site occupancy can change over time (depending on your definition of time!)
Examples: a species colonizes newly suitable sites between breeding seasons, or goes locally extinct from sites it previously occupied.
We want to understand what drives these patterns of change, while accounting for imperfect detection.
Solution:
"Dynamic" or multi-season occupancy model (MacKenzie et al. 2003 Ecology)
Model overview diagram
Review parameters of interest
State submodel:
Detection submodel:
Parameters:
$z_{i, 1}$: Initial occupancy state of site $i$
$\psi$: Initial occupancy probability at time $t=1$
Math:
$$ z_{i,1} \sim \mathrm{Bernoulli}(\psi) $$
Parameters:
$z_{i,t}$: Occupancy state of site $i$ in time $t$
$\epsilon$: Probability of extinction, i.e. the probability that a site occupied at time $t-1$ is unoccupied at time $t$
$\phi$: Probability of persistence, equal to $1-\epsilon$
$\gamma$: Probability of colonization, i.e. the probability that a site unoccupied at time $t-1$ is occupied at time $t$
Math:
$$ z_{i,t} \sim \begin{cases} \mathrm{Bernoulli}(1 - \epsilon), & \text{if } z_{i, t-1} = 1 \\ \mathrm{Bernoulli}(\gamma), & \text{if } z_{i, t-1} = 0 \end{cases} $$
Alternatively:
$$ z_{i,t} \sim \mathrm{Bernoulli}(z_{i,t-1}\cdot(1-\epsilon) + (1-z_{i,t-1})\cdot\gamma) $$
When $z_{i,t-1} = 1$:
$$ z_{i,t} \sim \mathrm{Bernoulli}(1\cdot(1-\epsilon) + 0\cdot\gamma) = \mathrm{Bernoulli}(1-\epsilon) $$
When $z_{i,t-1} = 0$:
$$ z_{i,t} \sim \mathrm{Bernoulli}(0\cdot(1-\epsilon) + 1\cdot\gamma) = \mathrm{Bernoulli}(\gamma) $$
Parameters:
$p$: Probability of detection
$z_{i,t}$: Occupancy state of site $i$ in time $t$
Data:
$y_{i,j,t}$: Detection/non-detection data at site $i$, time $t$, occasion $j$
Math:
$$ y_{i,j,t} \sim \mathrm{Bernoulli}(p \cdot z_{i,t}) $$
When $z_{i,t} = 1$:
$$ y_{i,j,t} \sim \mathrm{Bernoulli}(p \cdot 1) = \mathrm{Bernoulli}(p) $$
When $z_{i,t} = 0$:
$$ y_{i,j,t} \sim \mathrm{Bernoulli}(p \cdot 0) = \mathrm{Bernoulli}(0), \text{ i.e. } y_{i,j,t} = 0 $$
MacKenzie, D. I., Nichols, J. D., Hines, J. E., Knutson, M. G., & Franklin, A. B. (2003). Estimating site occupancy, colonization, and local extinction when a species is detected imperfectly. Ecology, 84(8), 2200-2207.
Royle, J. A., & Kéry, M. (2007). A Bayesian state‐space formulation of dynamic occupancy models. Ecology, 88(7), 1813-1823.
Kéry, M., & Royle, J. A. Applied Hierarchical Modeling in Ecology (AHM), Volume 2, Chapter 4. Academic Press.
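To build intuition for these equations before touching real data, here is a minimal simulation of the state and detection processes (a sketch; the parameter values are arbitrary, and M, J, and T are overwritten when we load the real data below):
set.seed(123)
M <- 50; J <- 3; T <- 5                      # sites, occasions, seasons (arbitrary)
psi <- 0.4; eps <- 0.2; gam <- 0.1; p <- 0.5 # arbitrary parameter values
z <- matrix(NA, M, T)
z[, 1] <- rbinom(M, 1, psi)                  # initial occupancy
for (t in 2:T){
  # persistence at occupied sites, colonization at unoccupied sites
  z[, t] <- rbinom(M, 1, z[, t-1] * (1 - eps) + (1 - z[, t-1]) * gam)
}
y_sim <- array(NA, dim = c(M, J, T))
for (t in 1:T){
  for (j in 1:J){
    y_sim[, j, t] <- rbinom(M, 1, p * z[, t]) # detections only at occupied sites
  }
}
colMeans(z)                                   # realized occupancy by season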
Loxia curvirostra, European (common) crossbill

library(unmarked)
data(crossbill)
head(crossbill)
| id | ele | forest | surveys | det991 | det992 | det993 | det001 | det002 | det003 | ⋯ | date043 | date051 | date052 | date053 | date061 | date062 | date063 | date071 | date072 | date073 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | ⋯ | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | |
| 1 | 1 | 450 | 3 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | ⋯ | 74 | 35 | 52 | 73 | 19 | 56 | 65 | 20 | 46 | 54 |
| 2 | 2 | 450 | 21 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | ⋯ | 59 | 20 | 35 | 65 | 15 | 28 | 55 | 19 | 50 | 56 |
| 3 | 3 | 1050 | 32 | 3 | NA | NA | NA | 0 | 0 | 0 | ⋯ | 95 | 24 | 59 | 87 | 23 | 62 | 90 | 22 | 55 | 61 |
| 4 | 4 | 950 | 9 | 3 | 0 | 0 | 0 | 1 | 0 | 0 | ⋯ | 74 | 22 | 60 | 76 | 22 | 57 | 78 | 22 | 49 | 70 |
| 5 | 5 | 1150 | 35 | 3 | 0 | 0 | 0 | 1 | 1 | 1 | ⋯ | 69 | 28 | 51 | 73 | 23 | 42 | 70 | 15 | 43 | 60 |
| 6 | 6 | 550 | 2 | 3 | NA | NA | NA | 0 | 0 | 0 | ⋯ | 64 | 27 | 39 | 55 | 29 | 40 | 52 | 21 | 40 | 49 |
First, select only the columns corresponding to detection/non-detection data
ycols <- grepl("det", names(crossbill))
head(ycols)
y <- as.matrix(crossbill[,ycols])
head(y)
| det991 | det992 | det993 | det001 | det002 | det003 | det011 | det012 | det013 | det021 | ⋯ | det043 | det051 | det052 | det053 | det061 | det062 | det063 | det071 | det072 | det073 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ⋯ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ⋯ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| NA | NA | NA | 0 | 0 | 0 | 1 | 1 | 0 | 0 | ⋯ | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 |
| 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | ⋯ | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 |
| 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | ⋯ | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 |
| NA | NA | NA | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ⋯ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
The y-matrix is in "wide" format; we want to convert it to an M x J x T array.
M: Sites
J: Sampling occasions
T: Time periods
M <- nrow(y)
J <- 3
T <- ncol(y) / J
y_array <- array(NA, dim = c(M, J, T))
period <- rep(1:T, each = J)
period
for (t in 1:T){
y_array[1:M,1:J,t] <- y[, which(period == t)]
}
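A quick optional sanity check on the reshaping: the columns for a given period in the wide matrix should match the corresponding slice of the array (shown here for period 2):
table(y[, period == 2] == y_array[, , 2], useNA = "ifany")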
print(y_array[1:5,,1:3])
, , 1
[,1] [,2] [,3]
[1,] 0 0 0
[2,] 0 0 0
[3,] NA NA NA
[4,] 0 0 0
[5,] 0 0 0
, , 2
[,1] [,2] [,3]
[1,] 0 0 0
[2,] 0 0 0
[3,] 0 0 0
[4,] 1 0 0
[5,] 1 1 1
, , 3
[,1] [,2] [,3]
[1,] 0 0 0
[2,] 0 0 0
[3,] 1 1 0
[4,] 0 0 0
[5,] 0 0 1
jags_data <- list(M = M, J = J, T = T,
y = y_array
)
str(jags_data)
List of 4
 $ M: int 267
 $ J: num 3
 $ T: num 9
 $ y: int [1:267, 1:3, 1:9] 0 0 NA 0 0 NA 0 0 0 0 ...
Initial occupancy state
Math:
$$ z_{i,1} \sim \mathrm{Bernoulli}(\psi) $$
BUGS:
for (i in 1:M){
z[i, 1] ~ dbern(psi)
}
Occupancy state at time t
Math:
at $t = 1$:
$$ z_{i,1} \sim \mathrm{Bernoulli}(\psi) $$
at $t > 1$:
$$ z_{i,t} \sim \mathrm{Bernoulli}(z_{i, t-1}\cdot(1-\epsilon) + (1-z_{i, t-1})\cdot\gamma) $$
BUGS:
for (i in 1:M){
z[i, 1] ~ dbern(psi)
for (t in 2:T){
z[i, t] ~ dbern(z[i, t-1] * (1 - eps) + (1 - z[i, t-1]) * gam)
}
}
Detection process
Math:
$$ y_{i, j, t} \sim \mathrm{Bernoulli}(p \cdot z_{i, t}) $$
BUGS:
# Detection
for (i in 1:M){
for (t in 1:T){
for (j in 1:J){
y[i,j,t] ~ dbern(p * z[i, t])
}
}
}
cat("
model{
# State
for (i in 1:M){
z[i, 1] ~ dbern(psi)
for (t in 2:T){
z[i, t] ~ dbern(z[i, t-1] * (1 - eps) + (1 - z[i, t-1]) * gam)
}
}
# Detection
for (i in 1:M){
for (t in 1:T){
for (j in 1:J){
y[i,j,t] ~ dbern(p * z[i, t])
}
}
}
# Priors
psi ~ dunif(0, 1)
eps ~ dunif(0, 1)
gam ~ dunif(0, 1)
p ~ dunif(0, 1)
}
", file = "dynocc_null.bugs")
pars <- c("psi", "eps", "gam", "p")
inits <- function() list(z = matrix(1, M, T))
library(jagsUI)
jags_null <- jags(data = jags_data,
inits = inits,
parameters.to.save = pars,
model.file = "dynocc_null.bugs",
n.chains = 3, n.iter = 1000, n.burnin = 500,
parallel=TRUE)
Processing function input....... Done. Beginning parallel processing using 3 cores. Console output will be suppressed. Parallel processing completed. Calculating statistics....... Done.
jagsUI::traceplot(jags_null)
jags_null
JAGS output for model 'dynocc_null.bugs', generated by jagsUI.
Estimates based on 3 chains of 1000 iterations,
adaptation = 100 iterations (sufficient),
burn-in = 500 iterations and thin rate = 1,
yielding 1500 total samples from the joint posterior.
MCMC ran in parallel for 0.548 minutes at time 2023-07-18 13:51:35.75341.
mean sd 2.5% 50% 97.5% overlap0 f Rhat n.eff
psi 0.313 0.032 0.254 0.313 0.382 FALSE 1 1.004 417
eps 0.177 0.023 0.133 0.177 0.223 FALSE 1 1.050 48
gam 0.125 0.012 0.102 0.124 0.151 FALSE 1 1.032 68
p 0.518 0.014 0.492 0.518 0.545 FALSE 1 1.044 53
deviance 3644.009 78.074 3495.490 3642.852 3798.456 FALSE 1 1.057 41
Successful convergence based on Rhat values (all < 1.1).
Rhat is the potential scale reduction factor (at convergence, Rhat=1).
For each parameter, n.eff is a crude measure of effective sample size.
overlap0 checks if 0 falls in the parameter's 95% credible interval.
f is the proportion of the posterior with the same sign as the mean;
i.e., our confidence that the parameter is positive or negative.
DIC info: (pD = var(deviance)/2)
pD = 2899.2 and DIC = 6543.184
DIC is an estimate of expected predictive error (lower is better).
hist(jags_null$sims.list$psi)
abline(v=jags_null$mean$psi, col='red')
psi_est <- jags_null$mean$psi
psi_ci <- quantile(jags_null$sims.list$psi, c(0.025, 0.975))
plot(1, psi_est, cex=2, pch=19, xaxt='n', xlab="", ylab="psi")
segments(1, psi_ci[1], 1, psi_ci[2])
head(crossbill)
| id | ele | forest | surveys | det991 | det992 | det993 | det001 | det002 | det003 | ⋯ | date043 | date051 | date052 | date053 | date061 | date062 | date063 | date071 | date072 | date073 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | ⋯ | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | |
| 1 | 1 | 450 | 3 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | ⋯ | 74 | 35 | 52 | 73 | 19 | 56 | 65 | 20 | 46 | 54 |
| 2 | 2 | 450 | 21 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | ⋯ | 59 | 20 | 35 | 65 | 15 | 28 | 55 | 19 | 50 | 56 |
| 3 | 3 | 1050 | 32 | 3 | NA | NA | NA | 0 | 0 | 0 | ⋯ | 95 | 24 | 59 | 87 | 23 | 62 | 90 | 22 | 55 | 61 |
| 4 | 4 | 950 | 9 | 3 | 0 | 0 | 0 | 1 | 0 | 0 | ⋯ | 74 | 22 | 60 | 76 | 22 | 57 | 78 | 22 | 49 | 70 |
| 5 | 5 | 1150 | 35 | 3 | 0 | 0 | 0 | 1 | 1 | 1 | ⋯ | 69 | 28 | 51 | 73 | 23 | 42 | 70 | 15 | 43 | 60 |
| 6 | 6 | 550 | 2 | 3 | NA | NA | NA | 0 | 0 | 0 | ⋯ | 64 | 27 | 39 | 55 | 29 | 40 | 52 | 21 | 40 | 49 |
Select only the columns corresponding to date data
datecols <- grepl("date", names(crossbill))
date <- as.matrix(crossbill[,datecols])
head(date)
| date991 | date992 | date993 | date001 | date002 | date003 | date011 | date012 | date013 | date021 | ⋯ | date043 | date051 | date052 | date053 | date061 | date062 | date063 | date071 | date072 | date073 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 34 | 59 | 65 | 33 | 48 | 68 | 30 | 54 | 76 | 29 | ⋯ | 74 | 35 | 52 | 73 | 19 | 56 | 65 | 20 | 46 | 54 |
| 17 | 33 | 65 | 22 | 50 | 66 | 14 | 28 | 61 | 13 | ⋯ | 59 | 20 | 35 | 65 | 15 | 28 | 55 | 19 | 50 | 56 |
| NA | NA | NA | 38 | 67 | 85 | 28 | 50 | 83 | 30 | ⋯ | 95 | 24 | 59 | 87 | 23 | 62 | 90 | 22 | 55 | 61 |
| 29 | 59 | 65 | 33 | 48 | 68 | 24 | 54 | 76 | 39 | ⋯ | 74 | 22 | 60 | 76 | 22 | 57 | 78 | 22 | 49 | 70 |
| 24 | 45 | 65 | 21 | 46 | 71 | 31 | 56 | 73 | 23 | ⋯ | 69 | 28 | 51 | 73 | 23 | 42 | 70 | 15 | 43 | 60 |
| NA | NA | NA | 22 | 35 | 72 | 27 | 55 | 80 | 28 | ⋯ | 64 | 27 | 39 | 55 | 29 | 40 | 52 | 21 | 40 | 49 |
Convert to an array, the same way we did with the y-matrix
date_array <- array(NA, dim = c(M, J, T))
period <- rep(1:T, each = J)
for (t in 1:T){
date_array[,,t] <- date[, which(period == t)]
}
print(date_array[1:5,,1:3])
, , 1
[,1] [,2] [,3]
[1,] 34 59 65
[2,] 17 33 65
[3,] NA NA NA
[4,] 29 59 65
[5,] 24 45 65
, , 2
[,1] [,2] [,3]
[1,] 33 48 68
[2,] 22 50 66
[3,] 38 67 85
[4,] 33 48 68
[5,] 21 46 71
, , 3
[,1] [,2] [,3]
[1,] 30 54 76
[2,] 14 28 61
[3,] 28 50 83
[4,] 24 54 76
[5,] 31 56 73
Including the site-level covariates, which we standardize
jags_data <- list(M = M, J = J, T = T,
y = y_array,
date = (date_array - mean(date_array, na.rm=TRUE)) / sd(date_array, na.rm=TRUE),
ele = (crossbill$ele - mean(crossbill$ele)) / sd(crossbill$ele),
forest = (crossbill$forest - mean(crossbill$forest)) / sd(crossbill$forest)
)
str(jags_data)
List of 7
 $ M     : int 267
 $ J     : num 3
 $ T     : num 9
 $ y     : int [1:267, 1:3, 1:9] 0 0 NA 0 0 NA 0 0 0 0 ...
 $ date  : num [1:267, 1:3, 1:9] -0.921 -1.692 NA -1.148 -1.375 ...
 $ ele   : num [1:267] -1.1539 -1.1539 -0.2175 -0.3735 -0.0614 ...
 $ forest: num [1:267] -1.1471 -0.4967 -0.0992 -0.9303 0.0092 ...
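As an aside, the manual centering and scaling above is equivalent to R's built-in scale(); a quick check:
# scale() returns a one-column matrix; as.vector() drops the attributes
all.equal(as.vector(scale(crossbill$ele)), jags_data$ele)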
Initial occupancy: Elevation
Colonization: Forest
Extinction: None
Detection: Date
Initial occupancy:
$$ \mathrm{logit}(\psi_{i,1}) = \alpha_0 + \alpha_{ele} \cdot \mathrm{ele}_i $$
BUGS:
for (i in 1:M){
logit(psi[i,1]) <- alpha_0 + alpha_ele * ele[i]
}
# psi priors
alpha_0 ~ dlogis(0, 1)
alpha_ele ~ dlogis(0, 1)
Why dlogis?
A standard logistic prior on the logit scale back-transforms to an exactly uniform prior on the probability scale; a wide normal prior, by contrast, piles up prior mass near 0 and 1.
# Typical normal prior
hist(plogis(rnorm(1000, 0, 10)))
# dlogis prior
hist(plogis(rlogis(1000, 0, 1)))
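A quick numerical check of the claim: back-transformed draws from a standard logistic distribution are flat on (0, 1), so the quartiles should land near 0.25, 0.50, and 0.75:
summary(plogis(rlogis(1e5)))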
Add linear predictor for gamma:
for (i in 1:M){
logit(psi[i,1]) <- alpha_0 + alpha_ele * ele[i]
for (t in 2:T){
logit(gam[i,t]) <- beta_0 + beta_forest * forest[i]
}
}
# psi priors
alpha_0 ~ dlogis(0, 1)
alpha_ele ~ dlogis(0, 1)
# gamma priors
beta_0 ~ dlogis(0, 1)
beta_forest ~ dlogis(0, 1)
# eps priors
eps ~ dunif(0, 1)
Extinction ($\epsilon$) has no covariates, so it stays the same as in the old model.
Note that the loop over $t$ starts at 2; you could instead index $t$ at 1 and go to $T-1$.
Detection parameters:
$$ \mathrm{logit}(p_{i,j,t}) = \rho_0 + \rho_{date} \cdot \mathrm{date}_{i,j,t} $$
BUGS:
for (i in 1:M){
for (t in 1:T){
for (j in 1:J){
logit(p[i,j,t]) <- rho_0 + rho_date * date[i,j,t]
}
}
}
# detection priors
rho_0 ~ dlogis(0, 1)
rho_date ~ dlogis(0, 1)
cat("
model{
# State submodel
for (i in 1:M){
z[i, 1] ~ dbern(psi[i,1])
for (t in 2:T){
z[i, t] ~ dbern(z[i, t-1] * (1 - eps) + (1 - z[i, t-1]) * gam[i,t])
}
}
# ----------------------
# Calculate psi, gamma, eps
for (i in 1:M){
logit(psi[i,1]) <- alpha_0 + alpha_ele * ele[i]
for (t in 2:T){
logit(gam[i,t]) <- beta_0 + beta_forest * forest[i]
}
}
# psi priors
alpha_0 ~ dlogis(0, 1)
alpha_ele ~ dlogis(0, 1)
# gamma priors
beta_0 ~ dlogis(0, 1)
beta_forest ~ dlogis(0, 1)
# eps priors
eps ~ dunif(0, 1)
#--------------------------
# Detection submodel
for (i in 1:M){
for (t in 1:T){
for (j in 1:J){
y[i,j,t] ~ dbern(p[i,j,t] * z[i, t])
}
}
}
# -----------------------
# Calculate p
for (i in 1:M){
for (t in 1:T){
for (j in 1:J){
logit(p[i,j,t]) <- rho_0 + rho_date * date[i,j,t]
}
}
}
rho_0 ~ dlogis(0, 1)
rho_date ~ dlogis(0, 1)
}
", file = "dynocc_covs.bugs")
pars <- c("alpha_0", "alpha_ele", "beta_0", "beta_forest",
"rho_0", "rho_date")
inits <- function() list(z = matrix(1, M, T))
library(jagsUI)
jags_covs <- jags(data = jags_data,
inits = inits,
parameters.to.save = pars,
model.file = "dynocc_covs.bugs",
n.chains = 3, n.iter = 1000, n.burnin = 500,
parallel=TRUE)
Processing function input....... Done. Beginning parallel processing using 3 cores. Console output will be suppressed.
Error in checkForRemoteErrors(val): 3 nodes produced errors; first error: RUNTIME ERROR: Unable to resolve the following parameters: date[3,1,1] (line 53) date[3,2,1] (line 53) date[3,3,1] (line 53) date[6,1,1] (line 53) date[6,2,1] (line 53) date[6,3,1] (line 53) date[7,1,3] (line 53) date[7,1,4] (line 53) date[7,2,3] (line 53) date[7,2,4] (line 53) date[7,3,3] (line 53) date[7,3,4] (line 53) date[9,1,5] (line 53) date[9,2,5] (line 53) date[9,3,5] (line 53) date[14,1,3] (line 53) date[14,2,3] (line 53) date[14,3,3] (line 53) date[15,1,2] (line 53) date[15,2,2] (line 53) date[15,3,2] (line 53) date[22,1,6] (line 53) date[22,2,6] (line 53) date[22,3,6] (line 53) date[34,1,3] (line 53) date[34,1,4] (line 53) date[34,2,3] (line 53) date[34,2,4] (line 53) date[34,3,3] (line 53) date[34,3,4] (line 53) date[36,1,2] (line 53) date[36,1,3] (line 53) date[36,2,2] (line 53) date[36,2,3] (line 53) date[36,3,2] (line 53) date[36,3,3] (line 53) date[38,1,1] (line 53) date[38,2,1] (line 53) date[38,3,1] (line 53) date[43,1,3] (line 53) date[43,2,3] (line 53) date[43,3,1] Traceback: 1. jags(data = jags_data, inits = inits, parameters.to.save = pars, . model.file = "dynocc_covs.bugs", n.chains = 3, n.iter = 1000, . n.burnin = 500, parallel = TRUE) 2. run.parallel(data, inits, parameters.to.save, model.file, n.chains, . n.adapt, n.iter, n.burnin, n.thin, modules = modules, factories = factories, . DIC = DIC, verbose = verbose, n.cores = n.cores) 3. clusterApply(cl = cl, x = 1:n.chains, fun = jags.clust) 4. staticClusterApply(cl, fun, length(x), argfun) 5. checkForRemoteErrors(val) 6. stop(count, " nodes produced errors; first error: ", firstmsg, . domain = NA)
jags_data$date[43,3,1]
sum(is.na(jags_data$date))
Missing date values are imputed as draws from a normal distribution whose mean and SD are estimated from the observed dates (because date was standardized, these should come out near 0 and 1).
# Impute missing dates
for (i in 1:M){
for (t in 1:T){
for (j in 1:J){
date[i,j,t] ~ dnorm(mu_date, tau_date)
}
}
}
mu_date ~ dnorm(0, 0.01)
sigma_date ~ dunif(0, 10)
tau_date <- pow(sigma_date, -2)
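A reminder of why the pow(sigma_date, -2) line is there: JAGS parameterizes dnorm with a precision (tau = 1/sigma^2), while R parameterizes it with a standard deviation. For example:
sigma <- 1.5
tau <- 1 / sigma^2              # JAGS-style precision
dnorm(0, mean = 0, sd = sigma)  # R's dnorm takes the SD, not the precision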
It's OK that date has a mixture of data and missing values!
One way to think about it:
When date[i,j,t] is not NA: information flows from left to right (the observed value informs mu_date and sigma_date).
When date[i,j,t] is NA: information flows from right to left (the distribution generates a plausible value for the missing date).
cat("
model{
# State submodel
for (i in 1:M){
z[i, 1] ~ dbern(psi[i,1])
for (t in 2:T){
z[i, t] ~ dbern(z[i, t-1] * (1 - eps) + (1 - z[i, t-1]) * gam[i,t])
}
}
# ----------------------
# Calculate psi, gamma, eps
for (i in 1:M){
logit(psi[i,1]) <- alpha_0 + alpha_ele * ele[i]
for (t in 2:T){
logit(gam[i,t]) <- beta_0 + beta_forest * forest[i]
}
}
# psi priors
alpha_0 ~ dlogis(0, 1)
alpha_ele ~ dlogis(0, 1)
# gamma priors
beta_0 ~ dlogis(0, 1)
beta_forest ~ dlogis(0, 1)
# eps priors
eps ~ dunif(0, 1)
# -----------------------
# Detection submodel
for (i in 1:M){
for (t in 1:T){
for (j in 1:J){
y[i,j,t] ~ dbern(p[i,j,t] * z[i, t])
}
}
}
# -----------------------
# Impute missing dates
for (i in 1:M){
for (t in 1:T){
for (j in 1:J){
date[i,j,t] ~ dnorm(mu_date, tau_date)
}
}
}
mu_date ~ dnorm(0, 0.01)
sigma_date ~ dunif(0, 10)
tau_date <- pow(sigma_date, -2)
# -----------------------
# Calculate p
for (i in 1:M){
for (t in 1:T){
for (j in 1:J){
logit(p[i,j,t]) <- rho_0 + rho_date * date[i,j,t]
}
}
}
rho_0 ~ dlogis(0, 1)
rho_date ~ dlogis(0, 1)
}
", file = "dynocc_covs.bugs")
pars <- c("alpha_0", "alpha_ele", "beta_0", "beta_forest",
"rho_0", "rho_date", "mu_date", "sigma_date")
inits <- function() list(z = matrix(1, M, T))
library(jagsUI)
jags_covs <- jags(data = jags_data,
inits = inits,
parameters.to.save = pars,
model.file = "dynocc_covs.bugs",
n.chains = 3, n.iter = 1000, n.burnin = 500,
parallel=TRUE)
Processing function input....... Done. Beginning parallel processing using 3 cores. Console output will be suppressed. Parallel processing completed. Calculating statistics....... Done.
jags_covs
jagsUI::traceplot(jags_covs)
JAGS output for model 'dynocc_covs.bugs', generated by jagsUI.
Estimates based on 3 chains of 1000 iterations,
adaptation = 100 iterations (sufficient),
burn-in = 500 iterations and thin rate = 1,
yielding 1500 total samples from the joint posterior.
MCMC ran in parallel for 1.592 minutes at time 2023-07-18 13:54:03.252241.
mean sd 2.5% 50% 97.5% overlap0 f Rhat
alpha_0 -0.881 0.156 -1.169 -0.882 -0.576 FALSE 1.000 1.005
alpha_ele 0.508 0.147 0.220 0.506 0.821 FALSE 1.000 1.004
beta_0 -1.851 0.110 -2.065 -1.847 -1.646 FALSE 1.000 1.013
beta_forest 0.702 0.095 0.516 0.701 0.899 FALSE 1.000 1.008
rho_0 0.103 0.056 -0.010 0.103 0.213 TRUE 0.965 1.008
rho_date 0.219 0.044 0.134 0.219 0.304 FALSE 1.000 1.001
mu_date 0.000 0.012 -0.025 0.000 0.023 TRUE 0.502 1.000
sigma_date 1.000 0.008 0.984 1.000 1.017 FALSE 1.000 1.001
deviance 22388.215 74.134 22246.979 22384.989 22538.551 FALSE 1.000 1.010
n.eff
alpha_0 441
alpha_ele 458
beta_0 151
beta_forest 298
rho_0 299
rho_date 1010
mu_date 1500
sigma_date 1500
deviance 213
Successful convergence based on Rhat values (all < 1.1).
Rhat is the potential scale reduction factor (at convergence, Rhat=1).
For each parameter, n.eff is a crude measure of effective sample size.
overlap0 checks if 0 falls in the parameter's 95% credible interval.
f is the proportion of the posterior with the same sign as the mean;
i.e., our confidence that the parameter is positive or negative.
DIC info: (pD = var(deviance)/2)
pD = 2725.7 and DIC = 25113.87
DIC is an estimate of expected predictive error (lower is better).
mean sd 2.5% 50% 97.5% overlap0 f Rhat
alpha_ele 0.503 0.153 0.195 0.505 0.799 FALSE 0.999 1.008
mean sd 2.5% 50% 97.5% overlap0 f Rhat
beta_forest 0.703 0.089 0.528 0.704 0.873 FALSE 1.000 1.002
mean sd 2.5% 50% 97.5% overlap0 f Rhat
rho_date 0.220 0.048 0.125 0.222 0.311 FALSE 1.000 0.999
elev_rng <- range(crossbill$ele)
elev_seq <- seq(elev_rng[1], elev_rng[2], length.out=100) # 1.
elev_std <- (elev_seq - mean(crossbill$ele)) / sd(crossbill$ele) # 2.
head(elev_std)
Recall the corresponding line in the model:
logit(psi[i,1]) <- alpha_0 + alpha_ele * ele[i]
psi_est <- jags_covs$mean$alpha_0 + jags_covs$mean$alpha_ele * elev_std # 3.
psi_est <- plogis(psi_est)
head(psi_est)
plot(elev_seq, psi_est, type='l', xlab="Elevation", ylab="psi") # 4.
What about the 95% credible interval?
We need to calculate psi for each saved iteration in the JAGS output (not just the posterior mean).
nsamples <- jags_covs$mcmc.info$n.samples
nsamples
psi_post <- matrix(NA, nrow=nsamples, ncol=100)
for (i in 1:100){
psi_post[,i] <- jags_covs$sims.list$alpha_0 + jags_covs$sims.list$alpha_ele * elev_std[i]
}
psi_post <- plogis(psi_post)
psi_post[1:5,1:5]
| 0.1538320 | 0.1567736 | 0.1597607 | 0.1627938 | 0.1658732 |
| 0.1874531 | 0.1904333 | 0.1934497 | 0.1965022 | 0.1995909 |
| 0.2345080 | 0.2366951 | 0.2388962 | 0.2411113 | 0.2433404 |
| 0.2296686 | 0.2315533 | 0.2334487 | 0.2353549 | 0.2372719 |
| 0.1844937 | 0.1872250 | 0.1899872 | 0.1927805 | 0.1956050 |
psi_lower <- apply(psi_post, 2, quantile, 0.025) # lower bound of 95% ci
psi_upper <- apply(psi_post, 2, quantile, 0.975) # upper bound of 95% ci
head(psi_lower)
head(psi_upper)
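As an aside, the loop used to build psi_post can be replaced by a single vectorized expression using outer(); this is a sketch that should reproduce psi_post exactly:
# nsamples x 100 matrix: rows are posterior draws, columns are elevation values
psi_post_v <- plogis(jags_covs$sims.list$alpha_0 +
                     outer(jags_covs$sims.list$alpha_ele, elev_std))
all.equal(psi_post_v, psi_post)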
Plot in base R
plot(elev_seq, psi_est, type='l', xlab="Elevation", ylab="psi", ylim=c(0,0.8))
polygon(c(elev_seq, rev(elev_seq)),
c(psi_lower, rev(psi_upper)),
col='grey90', border=NA)
lines(elev_seq, psi_est)
Plot in ggplot2
library(ggplot2)
plot_data <- data.frame(elev=elev_seq, psi=psi_est, lower=psi_lower, upper=psi_upper)
head(plot_data)
| elev | psi | lower | upper | |
|---|---|---|---|---|
| <dbl> | <dbl> | <dbl> | <dbl> | |
| 1 | 250.0000 | 0.1645080 | 0.09931026 | 0.2518344 |
| 2 | 275.2525 | 0.1672765 | 0.10193656 | 0.2541165 |
| 3 | 300.5051 | 0.1700822 | 0.10447124 | 0.2563221 |
| 4 | 325.7576 | 0.1729252 | 0.10730957 | 0.2583171 |
| 5 | 351.0101 | 0.1758056 | 0.11047750 | 0.2599927 |
| 6 | 376.2626 | 0.1787237 | 0.11390621 | 0.2619797 |
ggplot(data=plot_data, aes(x=elev)) +
geom_ribbon(aes(ymin=lower, ymax=upper), alpha=0.2) +
geom_line(aes(y=psi))
Right now we only estimate $\psi$ at $t=1$!
What about for other times?
For this we need a derived parameter, i.e. a quantity that is a function of other parameters already in the model.
$\psi_{t}$ at times $t>1$ is a function of occupancy in the previous time step, extinction, and colonization
This math is very similar to JAGS code we already have in our model:
for (t in 2:T){
z[i, t] ~ dbern(z[i, t-1] * (1 - eps) + (1 - z[i, t-1]) * gam[i,t])
}
We just need to replace z with psi.
# Derive psi at t>1
for (i in 1:M){
for (t in 2:T){
psi[i, t] <- psi[i, t-1] * (1 - eps) + (1 - psi[i, t-1]) * gam[i,t]
}
}
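To see what this recursion does, here is the same calculation written as a small, hypothetical R helper (assuming constant eps and gam, and plugging in roughly the null-model estimates); psi converges toward the equilibrium gam / (gam + eps):
psi_traj <- function(psi1, eps, gam, T){
  psi <- numeric(T)
  psi[1] <- psi1
  for (t in 2:T){
    psi[t] <- psi[t-1] * (1 - eps) + (1 - psi[t-1]) * gam
  }
  psi
}
round(psi_traj(0.31, 0.18, 0.12, 9), 3)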
psi is now indexed by both i and t.
cat("
model{
# State submodel
for (i in 1:M){
z[i, 1] ~ dbern(psi[i,1])
for (t in 2:T){
z[i, t] ~ dbern(z[i, t-1] * (1 - eps) + (1 - z[i, t-1]) * gam[i,t])
}
}
# ----------------------
# Calculate psi, gamma, eps
for (i in 1:M){
logit(psi[i,1]) <- alpha_0 + alpha_ele * ele[i]
for (t in 2:T){
logit(gam[i,t]) <- beta_0 + beta_forest * forest[i]
}
}
# psi priors
alpha_0 ~ dlogis(0, 1)
alpha_ele ~ dlogis(0, 1)
# gamma priors
beta_0 ~ dlogis(0, 1)
beta_forest ~ dlogis(0, 1)
# eps priors
eps ~ dunif(0, 1)
# -----------------------
# Detection submodel
for (i in 1:M){
for (t in 1:T){
for (j in 1:J){
y[i,j,t] ~ dbern(p[i,j,t] * z[i, t])
}
}
}
# -----------------------
# Impute missing dates
for (i in 1:M){
for (t in 1:T){
for (j in 1:J){
date[i,j,t] ~ dnorm(mu_date, tau_date)
}
}
}
mu_date ~ dnorm(0, 0.01)
sigma_date ~ dunif(0, 10)
tau_date <- pow(sigma_date, -2)
# -----------------------
# Calculate p
for (i in 1:M){
for (t in 1:T){
for (j in 1:J){
logit(p[i,j,t]) <- rho_0 + rho_date * date[i,j,t]
}
}
}
rho_0 ~ dlogis(0, 1)
rho_date ~ dlogis(0, 1)
# -----------------------
# Derive psi at t>1
for (i in 1:M){
for (t in 2:T){
psi[i, t] <- psi[i, t-1] * (1 - eps) + (1 - psi[i, t-1]) * gam[i,t]
}
}
}
", file = "dynocc_covs.bugs")
pars <- c("alpha_0", "alpha_ele", "beta_0", "beta_forest",
"rho_0", "rho_date", "mu_date", "sigma_date", "psi")
inits <- function() list(z = matrix(1, M, T))
library(jagsUI)
jags_covs <- jags(data = jags_data,
inits = inits,
parameters.to.save = pars,
model.file = "dynocc_covs.bugs",
n.chains = 3, n.iter = 1000, n.burnin = 500,
parallel=TRUE)
Processing function input....... Done. Beginning parallel processing using 3 cores. Console output will be suppressed. Parallel processing completed. Calculating statistics....... Done.
dim(jags_covs$sims.list$psi)
Plot psi over time for sites 1-5
psi_est <- psi_lower <- psi_upper <- matrix(NA, nrow=5, ncol=9)
for (t in 1:9){
for (i in 1:5){
psi_it <- jags_covs$sims.list$psi[,i,t]
psi_est[i,t] <- mean(psi_it)
psi_lower[i,t] <- quantile(psi_it, 0.025)
psi_upper[i,t] <- quantile(psi_it, 0.975)
}
}
cols <- rainbow(5)
plot(1:9, psi_est[1,1:9], ylim=c(0,0.5), type='o', pch=19,
col = cols[1],
ylab="psi", xlab="time")
for (i in 2:5){
lines(1:9, psi_est[i,1:9], type='o', pch=19, col=cols[i])
}
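The credible intervals computed above are not drawn yet; if the plotting device from the previous chunk is still active, one simple way to add them is:
for (i in 1:5){
  segments(1:9, psi_lower[i, ], 1:9, psi_upper[i, ], col = cols[i])
}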
Use JAGS to fit the following model: initial occupancy as a function of elevation, elevation², and forest; colonization as a function of forest; extinction as a function of elevation; detection as a function of date.
Also try plotting the effect of elevation on extinction.
JAGS model
cat("
model{
# State submodel
for (i in 1:M){
z[i, 1] ~ dbern(psi[i,1])
for (t in 2:T){
z[i, t] ~ dbern(z[i, t-1] * (1 - eps[i,t]) + (1 - z[i, t-1]) * gam[i,t])
}
}
# ----------------------
# Calculate psi, gamma, eps
for (i in 1:M){
logit(psi[i,1]) <- alpha_0 + alpha_ele * ele[i] + alpha_ele2 * ele2[i] + alpha_forest * forest[i]
for (t in 2:T){
logit(gam[i,t]) <- beta_0 + beta_forest * forest[i]
logit(eps[i,t]) <- eps_0 + eps_ele * ele[i]
}
}
# psi priors
alpha_0 ~ dlogis(0, 1)
alpha_ele ~ dlogis(0, 1)
alpha_ele2 ~ dlogis(0, 1)
alpha_forest ~ dlogis(0, 1)
# gamma priors
beta_0 ~ dlogis(0, 1)
beta_forest ~ dlogis(0, 1)
# eps priors
eps_0 ~ dlogis(0,1)
eps_ele ~ dlogis(0, 1)
# -----------------------
# Detection submodel
for (i in 1:M){
for (t in 1:T){
for (j in 1:J){
y[i,j,t] ~ dbern(p[i,j,t] * z[i, t])
}
}
}
# -----------------------
# Impute missing dates
for (i in 1:M){
for (t in 1:T){
for (j in 1:J){
date[i,j,t] ~ dnorm(mu_date, tau_date)
}
}
}
mu_date ~ dnorm(0, 0.01)
sigma_date ~ dunif(0, 10)
tau_date <- pow(sigma_date, -2)
# -----------------------
# Calculate p
for (i in 1:M){
for (t in 1:T){
for (j in 1:J){
logit(p[i,j,t]) <- rho_0 + rho_date * date[i,j,t]
}
}
}
rho_0 ~ dlogis(0, 1)
rho_date ~ dlogis(0, 1)
# -----------------------
# Derive psi at t>1
for (i in 1:M){
for (t in 2:T){
psi[i, t] <- psi[i, t-1] * (1 - eps[i,t]) + (1 - psi[i, t-1]) * gam[i,t]
}
}
}
", file = "dynocc_exercise.bugs")
jags_data$ele2 <- jags_data$ele^2
pars <- c("alpha_0", "alpha_ele", "alpha_ele2", "alpha_forest", "beta_0", "beta_forest",
"rho_0", "rho_date", "eps_0", "eps_ele")
inits <- function() list(z = matrix(1, M, T))
library(jagsUI)
jags_exercise <- jags(data = jags_data,
inits = inits,
parameters.to.save = pars,
model.file = "dynocc_exercise.bugs",
n.chains = 3, n.iter = 1000, n.burnin = 500,
parallel=TRUE)
Processing function input....... Done. Beginning parallel processing using 3 cores. Console output will be suppressed. Parallel processing completed. Calculating statistics....... Done.
jags_exercise
JAGS output for model 'dynocc_exercise.bugs', generated by jagsUI.
Estimates based on 3 chains of 1000 iterations,
adaptation = 100 iterations (sufficient),
burn-in = 500 iterations and thin rate = 1,
yielding 1500 total samples from the joint posterior.
MCMC ran in parallel for 1.679 minutes at time 2023-07-18 13:57:18.6436.
mean sd 2.5% 50% 97.5% overlap0 f
alpha_0 0.019 0.292 -0.567 0.015 0.617 TRUE 0.519
alpha_ele 1.340 0.258 0.846 1.339 1.872 FALSE 1.000
alpha_ele2 -1.262 0.290 -1.855 -1.253 -0.721 FALSE 1.000
alpha_forest 0.424 0.205 0.036 0.427 0.851 FALSE 0.983
beta_0 -1.781 0.102 -1.997 -1.778 -1.595 FALSE 1.000
beta_forest 0.715 0.091 0.536 0.716 0.899 FALSE 1.000
rho_0 0.120 0.058 0.005 0.121 0.228 FALSE 0.980
rho_date 0.181 0.046 0.091 0.181 0.266 FALSE 1.000
eps_0 -1.191 0.148 -1.488 -1.192 -0.913 FALSE 1.000
eps_ele -1.062 0.178 -1.414 -1.052 -0.714 FALSE 1.000
deviance 22365.496 68.972 22238.201 22362.378 22500.734 FALSE 1.000
Rhat n.eff
alpha_0 1.035 68
alpha_ele 1.019 130
alpha_ele2 1.045 53
alpha_forest 1.006 394
beta_0 1.015 147
beta_forest 1.003 632
rho_0 1.004 608
rho_date 1.001 1500
eps_0 1.011 184
eps_ele 1.008 301
deviance 1.012 173
Successful convergence based on Rhat values (all < 1.1).
Rhat is the potential scale reduction factor (at convergence, Rhat=1).
For each parameter, n.eff is a crude measure of effective sample size.
overlap0 checks if 0 falls in the parameter's 95% credible interval.
f is the proportion of the posterior with the same sign as the mean;
i.e., our confidence that the parameter is positive or negative.
DIC info: (pD = var(deviance)/2)
pD = 2354 and DIC = 24719.48
DIC is an estimate of expected predictive error (lower is better).
# Note: we reuse the psi_* object names below, but these are extinction (eps) predictions
psi_est <- jags_exercise$mean$eps_0 + jags_exercise$mean$eps_ele * elev_std # 3.
psi_est <- plogis(psi_est)
psi_post <- matrix(NA, nrow=nsamples, ncol=100)
for (i in 1:100){
psi_post[,i] <- jags_exercise$sims.list$eps_0 + jags_exercise$sims.list$eps_ele * elev_std[i]
}
psi_post <- plogis(psi_post)
psi_lower <- apply(psi_post, 2, quantile, 0.025) # lower bound of 95% ci
psi_upper <- apply(psi_post, 2, quantile, 0.975) # upper bound of 95% ci
plot(elev_seq, psi_est, type='l', xlab="Elevation", ylab="eps", ylim=c(0,0.8))
polygon(c(elev_seq, rev(elev_seq)),
c(psi_lower, rev(psi_upper)),
col='grey90', border=NA)
lines(elev_seq, psi_est)
Next, we fit the same models in unmarked. Look at the data again:
head(crossbill)
| id | ele | forest | surveys | det991 | det992 | det993 | det001 | det002 | det003 | ⋯ | date043 | date051 | date052 | date053 | date061 | date062 | date063 | date071 | date072 | date073 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | ⋯ | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | |
| 1 | 1 | 450 | 3 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | ⋯ | 74 | 35 | 52 | 73 | 19 | 56 | 65 | 20 | 46 | 54 |
| 2 | 2 | 450 | 21 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | ⋯ | 59 | 20 | 35 | 65 | 15 | 28 | 55 | 19 | 50 | 56 |
| 3 | 3 | 1050 | 32 | 3 | NA | NA | NA | 0 | 0 | 0 | ⋯ | 95 | 24 | 59 | 87 | 23 | 62 | 90 | 22 | 55 | 61 |
| 4 | 4 | 950 | 9 | 3 | 0 | 0 | 0 | 1 | 0 | 0 | ⋯ | 74 | 22 | 60 | 76 | 22 | 57 | 78 | 22 | 49 | 70 |
| 5 | 5 | 1150 | 35 | 3 | 0 | 0 | 0 | 1 | 1 | 1 | ⋯ | 69 | 28 | 51 | 73 | 23 | 42 | 70 | 15 | 43 | 60 |
| 6 | 6 | 550 | 2 | 3 | NA | NA | NA | 0 | 0 | 0 | ⋯ | 64 | 27 | 39 | 55 | 29 | 40 | 52 | 21 | 40 | 49 |
Grab the detection/non-detection data.
This time we want to keep it in "wide" format (M x TJ), because that's what unmarked uses.
ycols <- grepl("det", names(crossbill))
y <- crossbill[,ycols]
names(y)
head(y)
| det991 | det992 | det993 | det001 | det002 | det003 | det011 | det012 | det013 | det021 | ⋯ | det043 | det051 | det052 | det053 | det061 | det062 | det063 | det071 | det072 | det073 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | ⋯ | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ⋯ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ⋯ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | NA | NA | NA | 0 | 0 | 0 | 1 | 1 | 0 | 0 | ⋯ | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 |
| 4 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | ⋯ | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 |
| 5 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | ⋯ | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 |
| 6 | NA | NA | NA | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ⋯ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Get the site covariates:
site_covs <- crossbill[,1:4]
head(site_covs)
| id | ele | forest | surveys | |
|---|---|---|---|---|
| <int> | <int> | <int> | <int> | |
| 1 | 1 | 450 | 3 | 3 |
| 2 | 2 | 450 | 21 | 3 |
| 3 | 3 | 1050 | 32 | 3 |
| 4 | 4 | 950 | 9 | 3 |
| 5 | 5 | 1150 | 35 | 3 |
| 6 | 6 | 550 | 2 | 3 |
And the observation covariate (date):
datecols <- grepl("date", names(crossbill))
date <- crossbill[,datecols]
names(date)
Make the unmarkedFrame
Special unmarked frame type: unmarkedMultFrame
We need to specify the number of seasons (numPrimary)
umf <- unmarkedMultFrame(y = y, siteCovs = site_covs, obsCovs = list(date = date), numPrimary = 9)
summary(umf)
unmarkedFrame Object
267 sites
Maximum number of observations per site: 27
Mean number of observations per site: 24.99
Number of primary survey periods: 9
Number of secondary survey periods: 3
Sites with at least one detection: 165
Tabulation of y observations:
0 1 <NA>
5311 1362 536
Site-level covariates:
id ele forest surveys
Min. : 1.0 Min. : 250 Min. : 0.00 Min. :2.000
1st Qu.: 67.5 1st Qu.: 550 1st Qu.: 8.50 1st Qu.:3.000
Median :134.0 Median :1150 Median :33.00 Median :3.000
Mean :134.0 Mean :1189 Mean :34.75 Mean :2.824
3rd Qu.:200.5 3rd Qu.:1850 3rd Qu.:56.50 3rd Qu.:3.000
Max. :267.0 Max. :2750 Max. :99.00 Max. :3.000
Observation-level covariates:
date
Min. : 10.00
1st Qu.: 37.00
Median : 55.00
Mean : 54.29
3rd Qu.: 71.00
Max. :118.00
NA's :567
We will use the colext function.
We need to specify R formulas for four parameters:
psiformula, epsilonformula, gammaformula, and pformula. For the null model, all four will be intercept-only, or ~1.
mod_null <- colext(psiformula = ~1, epsilonformula = ~1, gammaformula = ~1, pformula = ~1, umf)
summary(mod_null)
Call:
colext(psiformula = ~1, gammaformula = ~1, epsilonformula = ~1,
pformula = ~1, data = umf)
Initial (logit-scale):
Estimate SE z P(>|z|)
-0.795 0.154 -5.16 2.52e-07
Colonization (logit-scale):
Estimate SE z P(>|z|)
-1.96 0.111 -17.6 1.63e-69
Extinction (logit-scale):
Estimate SE z P(>|z|)
-1.56 0.152 -10.2 1.37e-24
Detection (logit-scale):
Estimate SE z P(>|z|)
0.0668 0.0564 1.18 0.236
AIC: 5197.441
Number of sites: 267
optim convergence code: 0
optim iterations: 53
Bootstrap iterations: 0
plogis(-0.795)
Or use the backTransform function:
backTransform(mod_null, 'psi')
Backtransformed linear combination(s) of Initial estimate(s)
Estimate SE LinComb (Intercept)
0.311 0.033 -0.795 1
Transformation: logistic
backTransform(mod_null, 'det')
Backtransformed linear combination(s) of Detection estimate(s)
Estimate SE LinComb (Intercept)
0.517 0.0141 0.0668 1
Transformation: logistic
confint(backTransform(mod_null, 'psi'))
| 0.025 | 0.975 | |
|---|---|---|
| 0.2502368 | 0.3792124 |
Or use the predict function:
head(predict(mod_null, 'psi'))
| Predicted | SE | lower | upper | |
|---|---|---|---|---|
| <dbl> | <dbl> | <dbl> | <dbl> | |
| 1 | 0.3110701 | 0.03304615 | 0.2502368 | 0.3792124 |
| 2 | 0.3110701 | 0.03304615 | 0.2502368 | 0.3792124 |
| 3 | 0.3110701 | 0.03304615 | 0.2502368 | 0.3792124 |
| 4 | 0.3110701 | 0.03304615 | 0.2502368 | 0.3792124 |
| 5 | 0.3110701 | 0.03304615 | 0.2502368 | 0.3792124 |
| 6 | 0.3110701 | 0.03304615 | 0.2502368 | 0.3792124 |
cbind(JAGS=unlist(jags_null$mean[c(1,3,2,4)]),
unmarked=plogis(coef(mod_null)))
| JAGS | unmarked | |
|---|---|---|
| psi | 0.3127166 | 0.3110701 |
| gam | 0.1246883 | 0.1231143 |
| eps | 0.1772180 | 0.1736495 |
| p | 0.5180958 | 0.5166957 |
(same as with JAGS)
mod_covs <- colext(psiformula = ~scale(ele), epsilonformula = ~1,
gammaformula = ~scale(forest), pformula = ~scale(date), umf)
summary(mod_covs)
Warning message: “Some observations have been discarded because correspoding covariates were missing.”
Call:
colext(psiformula = ~scale(ele), gammaformula = ~scale(forest),
epsilonformula = ~1, pformula = ~scale(date), data = umf)
Initial (logit-scale):
Estimate SE z P(>|z|)
(Intercept) -0.890 0.157 -5.65 1.60e-08
scale(ele) 0.506 0.151 3.34 8.27e-04
Colonization (logit-scale):
Estimate SE z P(>|z|)
(Intercept) -1.857 0.1086 -17.10 1.46e-65
scale(forest) 0.705 0.0908 7.77 8.14e-15
Extinction (logit-scale):
Estimate SE z P(>|z|)
-1.41 0.146 -9.66 4.62e-22
Detection (logit-scale):
Estimate SE z P(>|z|)
(Intercept) 0.101 0.0566 1.78 7.53e-02
scale(date) 0.219 0.0458 4.79 1.69e-06
AIC: 5103.782
Number of sites: 267
optim convergence code: 0
optim iterations: 55
Bootstrap iterations: 0
Get estimates on probability scale
backTransform(mod_covs, 'psi')
Error in .local(obj, ...): Cannot directly backTransform an unmarkedEstimate with length > 1.
Traceback:
1. backTransform(mod_covs, "psi")
2. backTransform(mod_covs, "psi")
3. .local(obj, ...)
4. stop("Cannot directly backTransform an unmarkedEstimate with length > 1.")
head(predict(mod_covs, "psi"))
Warning message: “Some observations have been discarded because correspoding covariates were missing.”
| Predicted | SE | lower | upper | |
|---|---|---|---|---|
| <dbl> | <dbl> | <dbl> | <dbl> | |
| 1 | 0.1863059 | 0.03803834 | 0.1228206 | 0.2724154 |
| 2 | 0.1863059 | 0.03803834 | 0.1228206 | 0.2724154 |
| 3 | 0.2689311 | 0.03249979 | 0.2101468 | 0.3371403 |
| 4 | 0.2536809 | 0.03304904 | 0.1944722 | 0.3236729 |
| 5 | 0.2847483 | 0.03239262 | 0.2256995 | 0.3522185 |
| 6 | 0.1985835 | 0.03706077 | 0.1356867 | 0.2811520 |
head(predict(mod_covs, "det"))
Warning message: “Some observations have been discarded because correspoding covariates were missing.”
| Predicted | SE | lower | upper | |
|---|---|---|---|---|
| <dbl> | <dbl> | <dbl> | <dbl> | |
| 1 | 0.4746512 | 0.01801346 | 0.4395302 | 0.5100248 |
| 2 | 0.5368267 | 0.01415879 | 0.5089904 | 0.5644353 |
| 3 | 0.5516519 | 0.01478593 | 0.5225291 | 0.5804248 |
| 4 | 0.4721677 | 0.01833149 | 0.4364438 | 0.5081789 |
| 5 | 0.5094962 | 0.01466858 | 0.4807466 | 0.5381831 |
| 6 | 0.5590319 | 0.01530854 | 0.5288481 | 0.5887856 |
Compare with JAGS
cbind(JAGS=unlist(jags_covs$mean[1:6]),
unmarked=coef(mod_covs)[c(1:4,6:7)])
| JAGS | unmarked | |
|---|---|---|
| alpha_0 | -0.8942170 | -0.8899519 |
| alpha_ele | 0.5087193 | 0.5063164 |
| beta_0 | -1.8539007 | -1.8570536 |
| beta_forest | 0.7022862 | 0.7047543 |
| rho_0 | 0.1052687 | 0.1006335 |
| rho_date | 0.2205359 | 0.2194877 |
Procedure for applying the parametric bootstrap: simulate many datasets from the fitted model, compute the fit statistic for each simulated dataset, and compare that distribution to the statistic computed from the real data.
If the model fits well, they should be similar.
Default GOF test in unmarked is sum of squared residuals (SSE).
We can do parametric bootstrap with the parboot function.
gof <- suppressWarnings(parboot(mod_covs, statistic=SSE, nsim=30))
gof
plot(gof)
Call: parboot(object = mod_covs, statistic = SSE, nsim = 30)
Parametric Bootstrap Statistics:
t0 mean(t0 - t_B) StdDev(t0 - t_B) Pr(t_B > t0)
SSE 1004 -35.8 33.1 0.806
t_B quantiles:
0% 2.5% 25% 50% 75% 97.5% 100%
[1,] 964 973 1018 1039 1065 1092 1097
t0 = Original statistic computed from data
t_B = Vector of bootstrap samples
Alternative statistic: MacKenzie-Bailey chi-square (MacKenzie & Bailey 2004, JABES)
Test is based on comparing expected vs. observed frequency of specific detection histories (001, 110, 101, etc.)
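For intuition, the observed frequencies of detection histories can be tabulated directly from the data; a quick look at season 1 (histories with missing occasions show up as "NA" strings):
hist1 <- apply(y_array[, , 1], 1, paste, collapse = "")
head(sort(table(hist1), decreasing = TRUE))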
library(AICcmodavg)
mb_test <- suppressWarnings(mb.gof.test(mod_covs, nsim=30))
Thoughts on GOF tests
unmarked's approach to calculating residuals (and thus SSE) is crude, particularly when there are NAs in the data.
To get estimated occupancy for each season, use the projected function.
Mean for each $t$ across sites:
unmarked::projected(mod_covs)
| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | |
|---|---|---|---|---|---|---|---|---|---|
| unoccupied | 0.6989645 | 0.6479712 | 0.6216321 | 0.6078406 | 0.6005742 | 0.5967567 | 0.5947811 | 0.5937927 | 0.593331 |
| occupied | 0.3010355 | 0.3520288 | 0.3783679 | 0.3921594 | 0.3994258 | 0.4032433 | 0.4052189 | 0.4062073 | 0.406669 |
For each site separately:
proj <- unmarked::projected(mod_covs, mean = FALSE)
dim(proj)
proj <- t(proj[2,,]) # just "occupied" probability
proj[1:5,]
| 0.1863059 | 0.2025223 | 0.2144960 | 0.2233369 | 0.2298648 | 0.2346847 | 0.2382436 | 0.2408713 | 0.2428115 |
| 0.1863059 | 0.2302722 | 0.2612360 | 0.2830426 | 0.2984001 | 0.3092159 | 0.3168330 | 0.3221974 | 0.3259754 |
| 0.2689311 | 0.3089392 | 0.3359947 | 0.3542911 | 0.3666640 | 0.3750312 | 0.3806896 | 0.3845160 | 0.3871037 |
| 0.2536809 | 0.2596870 | 0.2640620 | 0.2672489 | 0.2695703 | 0.2712613 | 0.2724930 | 0.2733903 | 0.2740438 |
| 0.2847483 | 0.3258822 | 0.3533398 | 0.3716683 | 0.3839029 | 0.3920697 | 0.3975212 | 0.4011602 | 0.4035893 |
Plot for first 5 sites
cols <- rainbow(5)
plot(1:9, proj[1,1:9], ylim=c(0,0.5), type='o', pch=19,
col = cols[1],
ylab="psi", xlab="time")
for (i in 2:5){
lines(1:9, proj[i,1:9], type='o', pch=19, col=cols[i])
}
Getting CIs on these estimates
Use the parboot function for parametric bootstrapping.
Steps:
Write a statistic function to apply to each simulated dataset, then pass it to parboot.
proj_output <- function(mod){
unmarked::projected(mod)[2,] # keep only the "occupied" row
}
pb <- suppressWarnings(parboot(mod_covs, statistic=proj_output, nsim=30))
pb
Call: parboot(object = mod_covs, statistic = proj_output, nsim = 30)
Parametric Bootstrap Statistics:
t0 mean(t0 - t_B) StdDev(t0 - t_B) Pr(t_B > t0)
1 0.301 6.69e-03 0.0319 0.613
2 0.352 3.94e-03 0.0249 0.613
3 0.378 2.14e-03 0.0234 0.613
4 0.392 8.93e-04 0.0238 0.613
5 0.399 1.43e-05 0.0245 0.581
6 0.403 -6.08e-04 0.0252 0.548
7 0.405 -1.05e-03 0.0257 0.516
8 0.406 -1.36e-03 0.0261 0.484
9 0.407 -1.58e-03 0.0264 0.484
t_B quantiles:
0% 2.5% 25% 50% 75% 97.5% 100%
1 0.21 0.23 0.28 0.29 0.31 0.35 0.36
2 0.28 0.31 0.33 0.34 0.36 0.40 0.41
3 0.33 0.34 0.36 0.37 0.38 0.43 0.44
4 0.35 0.36 0.38 0.39 0.40 0.44 0.46
5 0.36 0.36 0.39 0.40 0.41 0.45 0.47
6 0.36 0.37 0.39 0.40 0.42 0.46 0.48
7 0.36 0.37 0.39 0.40 0.42 0.46 0.48
8 0.36 0.37 0.39 0.41 0.42 0.46 0.48
9 0.36 0.37 0.39 0.41 0.42 0.46 0.49
t0 = Original statistic computed from data
t_B = Vector of bootstrap samples
head(pb@t.star)
| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|
| 0.2819571 | 0.3290566 | 0.3537790 | 0.3669009 | 0.3739101 | 0.3776565 | 0.3796460 | 0.3806853 | 0.3812114 |
| 0.2892106 | 0.3492453 | 0.3819693 | 0.4001991 | 0.4105348 | 0.4164738 | 0.4199181 | 0.4219258 | 0.4230968 |
| 0.3021260 | 0.3456247 | 0.3704970 | 0.3848454 | 0.3931749 | 0.3980267 | 0.4008527 | 0.4024917 | 0.4034333 |
| 0.3305423 | 0.3887629 | 0.4197473 | 0.4365379 | 0.4457576 | 0.4508602 | 0.4536897 | 0.4552508 | 0.4560998 |
| 0.3440246 | 0.3760141 | 0.3946504 | 0.4055333 | 0.4118947 | 0.4156106 | 0.4177751 | 0.4190292 | 0.4197493 |
| 0.2725891 | 0.3386260 | 0.3752054 | 0.3958530 | 0.4077027 | 0.4146015 | 0.4186675 | 0.4210884 | 0.4225420 |
Calculate 95% CIs
proj_lower <- apply(pb@t.star, 2, quantile, 0.025)
proj_upper <- apply(pb@t.star, 2, quantile, 0.975)
proj_est <- projected(mod_covs)[2,]
Make plot
plot(1:9, proj_est, ylim=c(0.2,0.5), type='o', pch=19,
xlab="time", ylab="psi")
segments(1:9, proj_lower, 1:9, proj_upper)
Use the matching map data in the Switzerland dataset.
data(Switzerland)
head(Switzerland)
| x | y | elevation | forest | water | |
|---|---|---|---|---|---|
| <int> | <int> | <int> | <int> | <int> | |
| 1 | 910942 | 54276 | 340 | 6 | 0 |
| 2 | 910942 | 55276 | 340 | 11 | 1 |
| 3 | 911942 | 54276 | 380 | 4 | 0 |
| 4 | 911942 | 55276 | 390 | 72 | 3 |
| 5 | 911942 | 56276 | 357 | 9 | 7 |
| 6 | 911942 | 57276 | 357 | 5 | 5 |
Convert the dataset to raster format
(technically we should specify a projection, but we won't worry about that here)
library(terra)
sw_raster <- rast(Switzerland, type="xyz")
names(sw_raster) <- c("ele", "forest", "water")
plot(sw_raster)
terra 1.7.39
pr_map <- predict(mod_covs, type="psi", newdata=sw_raster)
plot(pr_map)
Use unmarked to fit the following models:
Model 1: initial occupancy ~ elevation + forest; colonization ~ forest; extinction ~ elevation; detection ~ date
Model 2: same as Model 1, but with an added quadratic effect of elevation on initial occupancy
If using unmarked, also make a map of initial occupancy using the Switzerland dataset for both models
Hint: you can use the I() function in formulas to run a calculation, such as
~I(a^2)
mod_exercise1 <- colext(psiformula = ~scale(ele)+scale(forest),
gammaformula = ~scale(forest),
epsilonformula = ~scale(ele),
pformula = ~scale(date),
data = umf)
mod_exercise1
Warning message: “Some observations have been discarded because correspoding covariates were missing.”
Call:
colext(psiformula = ~scale(ele) + scale(forest), gammaformula = ~scale(forest),
epsilonformula = ~scale(ele), pformula = ~scale(date), data = umf)
Initial:
Estimate SE z P(>|z|)
(Intercept) -1.059 0.182 -5.81 6.10e-09
scale(ele) 0.715 0.182 3.92 8.76e-05
scale(forest) 0.856 0.177 4.83 1.33e-06
Colonization:
Estimate SE z P(>|z|)
(Intercept) -1.746 0.1005 -17.36 1.63e-67
scale(forest) 0.709 0.0878 8.08 6.47e-16
Extinction:
Estimate SE z P(>|z|)
(Intercept) -1.14 0.144 -7.91 2.51e-15
scale(ele) -1.09 0.178 -6.10 1.05e-09
Detection:
Estimate SE z P(>|z|)
(Intercept) 0.145 0.0548 2.64 0.008266
scale(date) 0.177 0.0468 3.79 0.000152
AIC: 5039.136
pr_exercise1 <- predict(mod_exercise1, 'psi', newdata=sw_raster)
plot(pr_exercise1[[1]])
mod_exercise2 <- colext(psiformula = ~scale(ele)+scale(forest)+I(scale(ele)^2),
gammaformula = ~scale(forest),
epsilonformula = ~scale(ele),
pformula = ~scale(date),
data = umf)
mod_exercise2
Warning message: “Some observations have been discarded because correspoding covariates were missing.”
Call:
colext(psiformula = ~scale(ele) + scale(forest) + I(scale(ele)^2),
gammaformula = ~scale(forest), epsilonformula = ~scale(ele),
pformula = ~scale(date), data = umf)
Initial:
Estimate SE z P(>|z|)
(Intercept) -0.000405 0.298 -0.00136 9.99e-01
scale(ele) 1.349477 0.273 4.95019 7.41e-07
scale(forest) 0.424758 0.205 2.07119 3.83e-02
I(scale(ele)^2) -1.226293 0.299 -4.10108 4.11e-05
Colonization:
Estimate SE z P(>|z|)
(Intercept) -1.790 0.1061 -16.87 7.16e-64
scale(forest) 0.715 0.0895 7.99 1.37e-15
Extinction:
Estimate SE z P(>|z|)
(Intercept) -1.19 0.150 -7.95 1.86e-15
scale(ele) -1.09 0.181 -6.03 1.65e-09
Detection:
Estimate SE z P(>|z|)
(Intercept) 0.122 0.0562 2.17 0.030286
scale(date) 0.172 0.0466 3.70 0.000216
AIC: 5017.826
plotEffects(mod_exercise2, "psi", "ele")
pr_exercise2 <- predict(mod_exercise2, 'psi', newdata=sw_raster)
plot(pr_exercise2[[1]])
Use the fitList and modSel functions.
fl <- fitList(mod_covs=mod_covs, mod_exercise1=mod_exercise1, mod_exercise2=mod_exercise2)
modSel(fl)
Warning message: “Some observations have been discarded because correspoding covariates were missing.” Warning message: “Some observations have been discarded because correspoding covariates were missing.” Warning message: “Some observations have been discarded because correspoding covariates were missing.” Warning message: “Some observations have been discarded because correspoding covariates were missing.”
              nPars     AIC delta   AICwt cumltvWt
mod_exercise2    10 5017.83  0.00 1.0e+00     1.00
mod_exercise1     9 5039.14 21.31 2.4e-05     1.00
mod_covs          7 5103.78 85.96 2.2e-19     1.00
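The AICwt column can be reproduced by hand from the delta-AIC values; a quick check:
delta <- c(0, 21.31, 85.96)
round(exp(-delta / 2) / sum(exp(-delta / 2)), 5)  # matches the AICwt column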
The ubms package fits the same models using an unmarked-style interface, with Stan doing the sampling under the hood.
library(ubms)
mod_stan <- stan_colext(psiformula = ~scale(ele), epsilonformula = ~1,
gammaformula = ~scale(forest), pformula = ~scale(date), umf,
chains = 3, iter = 300, refresh=0)
Attaching package: ‘ubms’
The following objects are masked from ‘package:unmarked’:
fitList, projected
The following object is masked from ‘package:base’:
gamma
Warning message:
“Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help. See
https://mc-stan.org/misc/warnings.html#bulk-ess”
Warning message:
“Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
Running the chains for more iterations may help. See
https://mc-stan.org/misc/warnings.html#tail-ess”
mod_stan
traceplot(mod_stan)
Call:
stan_colext(psiformula = ~scale(ele), gammaformula = ~scale(forest),
epsilonformula = ~1, pformula = ~scale(date), data = umf,
chains = 3, iter = 300, refresh = 0)
Occupancy (logit-scale):
Estimate SD 2.5% 97.5% n_eff Rhat
(Intercept) -0.876 0.175 -1.249 -0.546 462 1.004
scale(ele) 0.498 0.158 0.184 0.804 705 0.995
Colonization (logit-scale):
Estimate SD 2.5% 97.5% n_eff Rhat
(Intercept) -1.859 0.1069 -2.049 -1.659 400 0.999
scale(forest) 0.706 0.0893 0.529 0.864 682 0.999
Extinction (logit-scale):
Estimate SD 2.5% 97.5% n_eff Rhat
-1.41 0.138 -1.68 -1.15 448 0.995
Detection (logit-scale):
Estimate SD 2.5% 97.5% n_eff Rhat
(Intercept) 0.099 0.0563 -0.0102 0.206 399 0.997
scale(date) 0.222 0.0453 0.1276 0.307 836 1.001
LOOIC: 5110.618
Runtime: 36.793 sec
'pars' not specified. Showing first 10 parameters by default.
Compare unmarked and ubms output
cbind(unmarked=coef(mod_covs), ubms=coef(mod_stan))
| unmarked | ubms | |
|---|---|---|
| psi(Int) | -0.8899519 | -0.87648184 |
| psi(scale(ele)) | 0.5063164 | 0.49840203 |
| col(Int) | -1.8570536 | -1.85921071 |
| col(scale(forest)) | 0.7047543 | 0.70561970 |
| ext(Int) | -1.4072120 | -1.41252267 |
| p(Int) | 0.1006335 | 0.09903921 |
| p(scale(date)) | 0.2194877 | 0.22245250 |
Some interesting helper functions:
plot_posteriors(mod_stan)
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
plot(mod_stan)
gof_test <- gof(mod_stan)
gof_test
plot(gof_test)
MacKenzie-Bailey Chi-square
Point estimate = 404.289
Posterior predictive p = 0
plot_effects(mod_stan, 'state')
?stan_colext