Site occupancy can change over time (depending on your definition of time!)
Examples: sites may be colonized, or go locally extinct, between breeding seasons or years.
We want to understand what drives these patterns of change, while accounting for imperfect detection.
Solution: the "dynamic" or multi-season occupancy model (MacKenzie et al. 2003, Ecology)
Model overview diagram
Review parameters of interest
State submodel:
Detection submodel:
Parameters:
$z_{i, 1}$: Initial occupancy state of site $i$
$\psi$: Initial occupancy probability at time $t=1$
Math:
$$ z_{i,1} \sim \mathrm{Bernoulli}(\psi) $$

Parameters:
$z_{i,t}$: Occupancy state of site $i$ in time $t$
$\epsilon$: Probability of extinction, i.e. the probability that a site occupied at time $t-1$ is unoccupied at time $t$
$\phi$: Probability of persistence; $\phi = 1 - \epsilon$
$\gamma$: Probability of colonization, i.e. the probability that a site unoccupied at time $t-1$ becomes occupied at time $t$
Math:
$$ z_{i,t} \sim \begin{cases} \mathrm{Bernoulli}(1 - \epsilon), & \text{if } z_{i, t-1} = 1 \\ \mathrm{Bernoulli}(\gamma), & \text{if } z_{i, t-1} = 0 \end{cases} $$

Alternatively:

$$ z_{i,t} \sim \mathrm{Bernoulli}(z_{i,t-1}\cdot(1-\epsilon) + (1-z_{i,t-1})\cdot\gamma) $$

When $z_{i,t-1} = 1$:

$$ z_{i,t} \sim \mathrm{Bernoulli}(1\cdot(1-\epsilon) + 0\cdot\gamma) = \mathrm{Bernoulli}(1-\epsilon) $$

When $z_{i,t-1} = 0$:

$$ z_{i,t} \sim \mathrm{Bernoulli}(0\cdot(1-\epsilon) + 1\cdot\gamma) = \mathrm{Bernoulli}(\gamma) $$

Parameters:
$p$: Probability of detection
$z_{i,t}$: Occupancy state of site $i$ in time $t$
Data:
$y_{i,j,t}$: Detection/non-detection data at site $i$, time $t$, occasion $j$
Math:
$$ y_{i,j,t} \sim \mathrm{Bernoulli}(p \cdot z_{i,t}) $$

When $z_{i,t} = 1$:

$$ y_{i,j,t} \sim \mathrm{Bernoulli}(p \cdot 1) = \mathrm{Bernoulli}(p) $$

When $z_{i,t} = 0$:

$$ y_{i,j,t} \sim \mathrm{Bernoulli}(p \cdot 0) = \mathrm{Bernoulli}(0), \quad \text{so } y_{i,j,t} = 0 $$

References:

MacKenzie, D. I., Nichols, J. D., Hines, J. E., Knutson, M. G., & Franklin, A. B. (2003). Estimating site occupancy, colonization, and local extinction when a species is detected imperfectly. Ecology, 84(8), 2200-2207.
Royle, J. A., & Kéry, M. (2007). A Bayesian state‐space formulation of dynamic occupancy models. Ecology, 88(7), 1813-1823.
AHM vol 2, Chapter 4
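To make the data-generating process concrete, here is a minimal R simulation of the state and detection submodels above (object names and parameter values are ours, chosen arbitrarily):

# Minimal simulation of the dynamic occupancy process (all values arbitrary)
set.seed(123)
nsites <- 30; noccasions <- 3; nyears <- 5
psi0 <- 0.4; eps <- 0.2; gam <- 0.1; p <- 0.5
z_sim <- matrix(NA, nsites, nyears)
z_sim[, 1] <- rbinom(nsites, 1, psi0)            # initial occupancy
for (t in 2:nyears){
  z_sim[, t] <- rbinom(nsites, 1,
                       z_sim[, t-1] * (1 - eps) + (1 - z_sim[, t-1]) * gam)
}
y_sim <- array(NA, dim = c(nsites, noccasions, nyears))
for (t in 1:nyears){
  for (j in 1:noccasions){
    y_sim[, j, t] <- rbinom(nsites, 1, p * z_sim[, t])  # detections only where z = 1
  }
}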
Example dataset: Loxia curvirostra, the European (common) crossbill
library(unmarked)
data(crossbill)
head(crossbill)
id | ele | forest | surveys | det991 | det992 | det993 | det001 | det002 | det003 | ⋯ | date043 | date051 | date052 | date053 | date061 | date062 | date063 | date071 | date072 | date073 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
<int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | ⋯ | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | |
1 | 1 | 450 | 3 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | ⋯ | 74 | 35 | 52 | 73 | 19 | 56 | 65 | 20 | 46 | 54 |
2 | 2 | 450 | 21 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | ⋯ | 59 | 20 | 35 | 65 | 15 | 28 | 55 | 19 | 50 | 56 |
3 | 3 | 1050 | 32 | 3 | NA | NA | NA | 0 | 0 | 0 | ⋯ | 95 | 24 | 59 | 87 | 23 | 62 | 90 | 22 | 55 | 61 |
4 | 4 | 950 | 9 | 3 | 0 | 0 | 0 | 1 | 0 | 0 | ⋯ | 74 | 22 | 60 | 76 | 22 | 57 | 78 | 22 | 49 | 70 |
5 | 5 | 1150 | 35 | 3 | 0 | 0 | 0 | 1 | 1 | 1 | ⋯ | 69 | 28 | 51 | 73 | 23 | 42 | 70 | 15 | 43 | 60 |
6 | 6 | 550 | 2 | 3 | NA | NA | NA | 0 | 0 | 0 | ⋯ | 64 | 27 | 39 | 55 | 29 | 40 | 52 | 21 | 40 | 49 |
First, select only the columns corresponding to detection/non-detection data
ycols <- grepl("det", names(crossbill))
head(ycols) # FALSE FALSE FALSE FALSE TRUE TRUE (id, ele, forest, surveys are not detection columns)
y <- as.matrix(crossbill[,ycols])
head(y)
det991 | det992 | det993 | det001 | det002 | det003 | det011 | det012 | det013 | det021 | ⋯ | det043 | det051 | det052 | det053 | det061 | det062 | det063 | det071 | det072 | det073 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ⋯ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ⋯ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
NA | NA | NA | 0 | 0 | 0 | 1 | 1 | 0 | 0 | ⋯ | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 |
0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | ⋯ | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 |
0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | ⋯ | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 |
NA | NA | NA | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ⋯ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
The y-matrix is in "wide" format; we want to convert it to an M x J x T array, where:
M: Sites
J: Sampling occasions
T: Time periods
M <- nrow(y)
J <- 3
T <- ncol(y) / J
y_array <- array(NA, dim = c(M, J, T))
period <- rep(1:T, each = J)
period # primary period for each column of y: 1 1 1 2 2 2 ... 9 9 9
for (t in 1:T){
y_array[1:M,1:J,t] <- y[, which(period == t)]
}
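As a quick sanity check (our addition), the first slice of the array should match the first J columns of y:

all(y_array[,,1] == y[,1:3], na.rm = TRUE) # should be TRUE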
print(y_array[1:5,,1:3])
, , 1

     [,1] [,2] [,3]
[1,]    0    0    0
[2,]    0    0    0
[3,]   NA   NA   NA
[4,]    0    0    0
[5,]    0    0    0

, , 2

     [,1] [,2] [,3]
[1,]    0    0    0
[2,]    0    0    0
[3,]    0    0    0
[4,]    1    0    0
[5,]    1    1    1

, , 3

     [,1] [,2] [,3]
[1,]    0    0    0
[2,]    0    0    0
[3,]    1    1    0
[4,]    0    0    0
[5,]    0    0    1
jags_data <- list(M = M, J = J, T = T,
y = y_array
)
str(jags_data)
List of 4
 $ M: int 267
 $ J: num 3
 $ T: num 9
 $ y: int [1:267, 1:3, 1:9] 0 0 NA 0 0 NA 0 0 0 0 ...
Initial occupancy state
Math:
$$ z_{i,1} \sim \mathrm{Bernoulli}(\psi) $$

BUGS:
for (i in 1:M){
z[i, 1] ~ dbern(psi)
}
Occupancy state at time t
Math:
at $t = 1$:
$$ z_{i,1} \sim \mathrm{Bernoulli}(\psi) $$

at $t > 1$:

$$ z_{i,t} \sim \mathrm{Bernoulli}(z_{i, t-1}\cdot(1-\epsilon) + (1-z_{i, t-1})\cdot\gamma) $$

BUGS:
for (i in 1:M){
z[i, 1] ~ dbern(psi)
for (t in 2:T){
z[i, t] ~ dbern(z[i, t-1] * (1 - eps) + (1 - z[i, t-1]) * gam)
}
}
Detection process
Math:
$$ y_{i, j, t} \sim \mathrm{Bernoulli}(p \cdot z_{i, t}) $$

BUGS:
# Detection
for (i in 1:M){
for (t in 1:T){
for (j in 1:J){
y[i,j,t] ~ dbern(p * z[i, t])
}
}
}
cat("
model{
# State
for (i in 1:M){
z[i, 1] ~ dbern(psi)
for (t in 2:T){
z[i, t] ~ dbern(z[i, t-1] * (1 - eps) + (1 - z[i, t-1]) * gam)
}
}
# Detection
for (i in 1:M){
for (t in 1:T){
for (j in 1:J){
y[i,j,t] ~ dbern(p * z[i, t])
}
}
}
# Priors
psi ~ dunif(0, 1)
eps ~ dunif(0, 1)
gam ~ dunif(0, 1)
p ~ dunif(0, 1)
}
", file = "dynocc_null.bugs")
pars <- c("psi", "eps", "gam", "p")
inits <- function() list(z = matrix(1, M, T))
library(jagsUI)
jags_null <- jags(data = jags_data,
inits = inits,
parameters.to.save = pars,
model.file = "dynocc_null.bugs",
n.chains = 3, n.iter = 1000, n.burnin = 500,
parallel=TRUE)
Processing function input....... Done. Beginning parallel processing using 3 cores. Console output will be suppressed. Parallel processing completed. Calculating statistics....... Done.
jagsUI::traceplot(jags_null)
jags_null
JAGS output for model 'dynocc_null.bugs', generated by jagsUI.
Estimates based on 3 chains of 1000 iterations,
adaptation = 100 iterations (sufficient),
burn-in = 500 iterations and thin rate = 1,
yielding 1500 total samples from the joint posterior.
MCMC ran in parallel for 0.548 minutes at time 2023-07-18 13:51:35.75341.

             mean     sd     2.5%      50%    97.5% overlap0 f  Rhat n.eff
psi         0.313  0.032    0.254    0.313    0.382    FALSE 1 1.004   417
eps         0.177  0.023    0.133    0.177    0.223    FALSE 1 1.050    48
gam         0.125  0.012    0.102    0.124    0.151    FALSE 1 1.032    68
p           0.518  0.014    0.492    0.518    0.545    FALSE 1 1.044    53
deviance 3644.009 78.074 3495.490 3642.852 3798.456    FALSE 1 1.057    41

Successful convergence based on Rhat values (all < 1.1).
Rhat is the potential scale reduction factor (at convergence, Rhat=1).
For each parameter, n.eff is a crude measure of effective sample size.
overlap0 checks if 0 falls in the parameter's 95% credible interval.
f is the proportion of the posterior with the same sign as the mean;
i.e., our confidence that the parameter is positive or negative.

DIC info: (pD = var(deviance)/2)
pD = 2899.2 and DIC = 6543.184
DIC is an estimate of expected predictive error (lower is better).
psi
hist(jags_null$sims.list$psi)
abline(v=jags_null$mean$psi, col='red')
psi_est <- jags_null$mean$psi
psi_ci <- quantile(jags_null$sims.list$psi, c(0.025, 0.975))
plot(1, psi_est, cex=2, pch=19, xaxt='n', xlab="psi")
segments(1, psi_ci[1], 1, psi_ci[2])
head(crossbill)
id | ele | forest | surveys | det991 | det992 | det993 | det001 | det002 | det003 | ⋯ | date043 | date051 | date052 | date053 | date061 | date062 | date063 | date071 | date072 | date073 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
<int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | ⋯ | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | |
1 | 1 | 450 | 3 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | ⋯ | 74 | 35 | 52 | 73 | 19 | 56 | 65 | 20 | 46 | 54 |
2 | 2 | 450 | 21 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | ⋯ | 59 | 20 | 35 | 65 | 15 | 28 | 55 | 19 | 50 | 56 |
3 | 3 | 1050 | 32 | 3 | NA | NA | NA | 0 | 0 | 0 | ⋯ | 95 | 24 | 59 | 87 | 23 | 62 | 90 | 22 | 55 | 61 |
4 | 4 | 950 | 9 | 3 | 0 | 0 | 0 | 1 | 0 | 0 | ⋯ | 74 | 22 | 60 | 76 | 22 | 57 | 78 | 22 | 49 | 70 |
5 | 5 | 1150 | 35 | 3 | 0 | 0 | 0 | 1 | 1 | 1 | ⋯ | 69 | 28 | 51 | 73 | 23 | 42 | 70 | 15 | 43 | 60 |
6 | 6 | 550 | 2 | 3 | NA | NA | NA | 0 | 0 | 0 | ⋯ | 64 | 27 | 39 | 55 | 29 | 40 | 52 | 21 | 40 | 49 |
Select only the columns corresponding to date data
datecols <- grepl("date", names(crossbill))
date <- as.matrix(crossbill[,datecols])
head(date)
date991 | date992 | date993 | date001 | date002 | date003 | date011 | date012 | date013 | date021 | ⋯ | date043 | date051 | date052 | date053 | date061 | date062 | date063 | date071 | date072 | date073 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
34 | 59 | 65 | 33 | 48 | 68 | 30 | 54 | 76 | 29 | ⋯ | 74 | 35 | 52 | 73 | 19 | 56 | 65 | 20 | 46 | 54 |
17 | 33 | 65 | 22 | 50 | 66 | 14 | 28 | 61 | 13 | ⋯ | 59 | 20 | 35 | 65 | 15 | 28 | 55 | 19 | 50 | 56 |
NA | NA | NA | 38 | 67 | 85 | 28 | 50 | 83 | 30 | ⋯ | 95 | 24 | 59 | 87 | 23 | 62 | 90 | 22 | 55 | 61 |
29 | 59 | 65 | 33 | 48 | 68 | 24 | 54 | 76 | 39 | ⋯ | 74 | 22 | 60 | 76 | 22 | 57 | 78 | 22 | 49 | 70 |
24 | 45 | 65 | 21 | 46 | 71 | 31 | 56 | 73 | 23 | ⋯ | 69 | 28 | 51 | 73 | 23 | 42 | 70 | 15 | 43 | 60 |
NA | NA | NA | 22 | 35 | 72 | 27 | 55 | 80 | 28 | ⋯ | 64 | 27 | 39 | 55 | 29 | 40 | 52 | 21 | 40 | 49 |
Convert to an array, the same way we did with the y-matrix
date_array <- array(NA, dim = c(M, J, T))
period <- rep(1:T, each = J)
for (t in 1:T){
date_array[,,t] <- date[, which(period == t)]
}
print(date_array[1:5,,1:3])
, , 1

     [,1] [,2] [,3]
[1,]   34   59   65
[2,]   17   33   65
[3,]   NA   NA   NA
[4,]   29   59   65
[5,]   24   45   65

, , 2

     [,1] [,2] [,3]
[1,]   33   48   68
[2,]   22   50   66
[3,]   38   67   85
[4,]   33   48   68
[5,]   21   46   71

, , 3

     [,1] [,2] [,3]
[1,]   30   54   76
[2,]   14   28   61
[3,]   28   50   83
[4,]   24   54   76
[5,]   31   56   73
Including the site-level covariates, which we standardize
jags_data <- list(M = M, J = J, T = T,
y = y_array,
date = (date_array - mean(date_array, na.rm=TRUE)) / sd(date_array, na.rm=TRUE),
ele = (crossbill$ele - mean(crossbill$ele)) / sd(crossbill$ele),
forest = (crossbill$forest - mean(crossbill$forest)) / sd(crossbill$forest)
)
str(jags_data)
List of 7
 $ M     : int 267
 $ J     : num 3
 $ T     : num 9
 $ y     : int [1:267, 1:3, 1:9] 0 0 NA 0 0 NA 0 0 0 0 ...
 $ date  : num [1:267, 1:3, 1:9] -0.921 -1.692 NA -1.148 -1.375 ...
 $ ele   : num [1:267] -1.1539 -1.1539 -0.2175 -0.3735 -0.0614 ...
 $ forest: num [1:267] -1.1471 -0.4967 -0.0992 -0.9303 0.0092 ...
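The same standardization is applied three times; a small helper function (ours, not in the original code) does the same thing more tidily:

# Helper to z-score a vector or array, ignoring NAs
std <- function(x) (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
jags_data <- list(M = M, J = J, T = T, y = y_array,
                  date = std(date_array),
                  ele = std(crossbill$ele),
                  forest = std(crossbill$forest))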
Initial occupancy: Elevation
Colonization: Forest
Extinction: None
Detection: Date
Initial occupancy:
$$ \mathrm{logit}(\psi_{i,1}) = \alpha_0 + \alpha_{ele} \cdot \mathrm{ele}_i $$

BUGS:

for (i in 1:M){
logit(psi[i,1]) <- alpha_0 + alpha_ele * ele[i]
}
# psi priors
alpha_0 ~ dlogis(0, 1)
alpha_ele ~ dlogis(0, 1)
Why dlogis? It looks better (flatter) on the original (probability) scale than a vague normal prior.
# Typical normal prior
hist(plogis(rnorm(1000, 0, 10)))
# dlogis prior
hist(plogis(rlogis(1000, 0, 1)))
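In fact, since plogis is the CDF of the standard logistic distribution, plogis of a Logistic(0, 1) draw is exactly Uniform(0, 1), so the dlogis(0, 1) prior is flat on the probability scale. A quick check (ours):

# plogis is the logistic CDF, so plogis(x) for x ~ Logistic(0,1) is Uniform(0,1)
x <- rlogis(1e5, 0, 1)
quantile(plogis(x), c(0.25, 0.5, 0.75)) # approximately 0.25, 0.50, 0.75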
Add a linear predictor for gamma:
for (i in 1:M){
logit(psi[i,1]) <- alpha_0 + alpha_ele * ele[i]
for (t in 2:T){
logit(gam[i,t]) <- beta_0 + beta_forest * forest[i]
}
}
# psi priors
alpha_0 ~ dlogis(0, 1)
alpha_ele ~ dlogis(0, 1)
# gamma priors
beta_0 ~ dlogis(0, 1)
beta_forest ~ dlogis(0, 1)
# eps priors
eps ~ dunif(0, 1)
Notes:
Extinction (eps) has no covariates, so it stays the same as in the old model
t starts at 2 (alternatively, you could loop from 1 to T-1 instead)

Detection parameters:
$$ \mathrm{logit}(p_{i,j,t}) = \rho_0 + \rho_{date} \cdot \mathrm{date}_{i,j,t} $$

BUGS:

for (i in 1:M){
for (t in 1:T){
for (j in 1:J){
logit(p[i,j,t]) <- rho_0 + rho_date * date[i,j,t]
}
}
}
# detection priors
rho_0 ~ dlogis(0, 1)
rho_date ~ dlogis(0, 1)
cat("
model{
# State submodel
for (i in 1:M){
z[i, 1] ~ dbern(psi[i,1])
for (t in 2:T){
z[i, t] ~ dbern(z[i, t-1] * (1 - eps) + (1 - z[i, t-1]) * gam[i,t])
}
}
# ----------------------
# Calculate psi, gamma, eps
for (i in 1:M){
logit(psi[i,1]) <- alpha_0 + alpha_ele * ele[i]
for (t in 2:T){
logit(gam[i,t]) <- beta_0 + beta_forest * forest[i]
}
}
# psi priors
alpha_0 ~ dlogis(0, 1)
alpha_ele ~ dlogis(0, 1)
# gamma priors
beta_0 ~ dlogis(0, 1)
beta_forest ~ dlogis(0, 1)
# eps priors
eps ~ dunif(0, 1)
#--------------------------
# Detection submodel
for (i in 1:M){
for (t in 1:T){
for (j in 1:J){
y[i,j,t] ~ dbern(p[i,j,t] * z[i, t])
}
}
}
# -----------------------
# Calculate p
for (i in 1:M){
for (t in 1:T){
for (j in 1:J){
logit(p[i,j,t]) <- rho_0 + rho_date * date[i,j,t]
}
}
}
rho_0 ~ dlogis(0, 1)
rho_date ~ dlogis(0, 1)
}
", file = "dynocc_covs.bugs")
pars <- c("alpha_0", "alpha_ele", "beta_0", "beta_forest",
"rho_0", "rho_date")
inits <- function() list(z = matrix(1, M, T))
library(jagsUI)
jags_covs <- jags(data = jags_data,
inits = inits,
parameters.to.save = pars,
model.file = "dynocc_covs.bugs",
n.chains = 3, n.iter = 1000, n.burnin = 500,
parallel=TRUE)
Processing function input....... Done. Beginning parallel processing using 3 cores. Console output will be suppressed.
Error in checkForRemoteErrors(val): 3 nodes produced errors; first error:
RUNTIME ERROR: Unable to resolve the following parameters:
date[3,1,1] (line 53)  date[3,2,1] (line 53)  date[3,3,1] (line 53)
date[6,1,1] (line 53)  date[6,2,1] (line 53)  date[6,3,1] (line 53)
date[7,1,3] (line 53)  date[7,1,4] (line 53)  date[7,2,3] (line 53)
date[7,2,4] (line 53)  date[7,3,3] (line 53)  date[7,3,4] (line 53)
date[9,1,5] (line 53)  date[9,2,5] (line 53)  date[9,3,5] (line 53)
date[14,1,3] (line 53) date[14,2,3] (line 53) date[14,3,3] (line 53)
date[15,1,2] (line 53) date[15,2,2] (line 53) date[15,3,2] (line 53)
date[22,1,6] (line 53) date[22,2,6] (line 53) date[22,3,6] (line 53)
date[34,1,3] (line 53) date[34,1,4] (line 53) date[34,2,3] (line 53)
date[34,2,4] (line 53) date[34,3,3] (line 53) date[34,3,4] (line 53)
date[36,1,2] (line 53) date[36,1,3] (line 53) date[36,2,2] (line 53)
date[36,2,3] (line 53) date[36,3,2] (line 53) date[36,3,3] (line 53)
date[38,1,1] (line 53) date[38,2,1] (line 53) date[38,3,1] (line 53)
date[43,1,3] (line 53) date[43,2,3] (line 53) date[43,3,1]

Traceback:
1. jags(data = jags_data, inits = inits, parameters.to.save = pars,
   model.file = "dynocc_covs.bugs", n.chains = 3, n.iter = 1000,
   n.burnin = 500, parallel = TRUE)
2. run.parallel(data, inits, parameters.to.save, model.file, n.chains,
   n.adapt, n.iter, n.burnin, n.thin, modules = modules, factories = factories,
   DIC = DIC, verbose = verbose, n.cores = n.cores)
3. clusterApply(cl = cl, x = 1:n.chains, fun = jags.clust)
4. staticClusterApply(cl, fun, length(x), argfun)
5. checkForRemoteErrors(val)
6. stop(count, " nodes produced errors; first error: ", firstmsg, domain = NA)
jags_data$date[43,3,1] # NA
sum(is.na(jags_data$date)) # 567 missing values
Missing date values are modeled as draws from a normal distribution whose mean and SD are estimated from the observed dates. (Since we standardized date, we expect mu_date near 0 and sigma_date near 1.)
# Impute missing dates
for (i in 1:M){
for (t in 1:T){
for (j in 1:J){
date[i,j,t] ~ dnorm(mu_date, tau_date)
}
}
}
mu_date ~ dnorm(0, 0.01)
sigma_date ~ dunif(0, 10)
tau_date <- pow(sigma_date, -2) # JAGS parameterizes dnorm with precision = 1/variance
It's OK that date has a mixture of data and missing values!
One way to think about it:
When date[i,j,t] is not NA: information flows from left to right (the observed dates inform mu_date and sigma_date)
When date[i,j,t] is NA: information flows from right to left (the distribution imputes the missing value)

cat("
model{
# State submodel
for (i in 1:M){
z[i, 1] ~ dbern(psi[i,1])
for (t in 2:T){
z[i, t] ~ dbern(z[i, t-1] * (1 - eps) + (1 - z[i, t-1]) * gam[i,t])
}
}
# ----------------------
# Calculate psi, gamma, eps
for (i in 1:M){
logit(psi[i,1]) <- alpha_0 + alpha_ele * ele[i]
for (t in 2:T){
logit(gam[i,t]) <- beta_0 + beta_forest * forest[i]
}
}
# psi priors
alpha_0 ~ dlogis(0, 1)
alpha_ele ~ dlogis(0, 1)
# gamma priors
beta_0 ~ dlogis(0, 1)
beta_forest ~ dlogis(0, 1)
# eps priors
eps ~ dunif(0, 1)
# -----------------------
# Detection submodel
for (i in 1:M){
for (t in 1:T){
for (j in 1:J){
y[i,j,t] ~ dbern(p[i,j,t] * z[i, t])
}
}
}
# -----------------------
# Impute missing dates
for (i in 1:M){
for (t in 1:T){
for (j in 1:J){
date[i,j,t] ~ dnorm(mu_date, tau_date)
}
}
}
mu_date ~ dnorm(0, 0.01)
sigma_date ~ dunif(0, 10)
tau_date <- pow(sigma_date, -2)
# -----------------------
# Calculate p
for (i in 1:M){
for (t in 1:T){
for (j in 1:J){
logit(p[i,j,t]) <- rho_0 + rho_date * date[i,j,t]
}
}
}
rho_0 ~ dlogis(0, 1)
rho_date ~ dlogis(0, 1)
}
", file = "dynocc_covs.bugs")
pars <- c("alpha_0", "alpha_ele", "beta_0", "beta_forest",
"rho_0", "rho_date", "mu_date", "sigma_date")
inits <- function() list(z = matrix(1, M, T))
library(jagsUI)
jags_covs <- jags(data = jags_data,
inits = inits,
parameters.to.save = pars,
model.file = "dynocc_covs.bugs",
n.chains = 3, n.iter = 1000, n.burnin = 500,
parallel=TRUE)
Processing function input....... Done. Beginning parallel processing using 3 cores. Console output will be suppressed. Parallel processing completed. Calculating statistics....... Done.
jags_covs
jagsUI::traceplot(jags_covs)
JAGS output for model 'dynocc_covs.bugs', generated by jagsUI.
Estimates based on 3 chains of 1000 iterations,
adaptation = 100 iterations (sufficient),
burn-in = 500 iterations and thin rate = 1,
yielding 1500 total samples from the joint posterior.
MCMC ran in parallel for 1.592 minutes at time 2023-07-18 13:54:03.252241.

                  mean     sd      2.5%       50%     97.5% overlap0     f  Rhat n.eff
alpha_0         -0.881  0.156    -1.169    -0.882    -0.576    FALSE 1.000 1.005   441
alpha_ele        0.508  0.147     0.220     0.506     0.821    FALSE 1.000 1.004   458
beta_0          -1.851  0.110    -2.065    -1.847    -1.646    FALSE 1.000 1.013   151
beta_forest      0.702  0.095     0.516     0.701     0.899    FALSE 1.000 1.008   298
rho_0            0.103  0.056    -0.010     0.103     0.213     TRUE 0.965 1.008   299
rho_date         0.219  0.044     0.134     0.219     0.304    FALSE 1.000 1.001  1010
mu_date          0.000  0.012    -0.025     0.000     0.023     TRUE 0.502 1.000  1500
sigma_date       1.000  0.008     0.984     1.000     1.017    FALSE 1.000 1.001  1500
deviance     22388.215 74.134 22246.979 22384.989 22538.551    FALSE 1.000 1.010   213

Successful convergence based on Rhat values (all < 1.1).
Rhat is the potential scale reduction factor (at convergence, Rhat=1).
For each parameter, n.eff is a crude measure of effective sample size.
overlap0 checks if 0 falls in the parameter's 95% credible interval.
f is the proportion of the posterior with the same sign as the mean;
i.e., our confidence that the parameter is positive or negative.

DIC info: (pD = var(deviance)/2)
pD = 2725.7 and DIC = 25113.87
DIC is an estimate of expected predictive error (lower is better).
Effect of elevation on initial occupancy:

           mean    sd  2.5%   50% 97.5% overlap0     f  Rhat
alpha_ele 0.503 0.153 0.195 0.505 0.799    FALSE 0.999 1.008
Effect of forest on colonization:

             mean    sd  2.5%   50% 97.5% overlap0     f  Rhat
beta_forest 0.703 0.089 0.528 0.704 0.873    FALSE 1.000 1.002
Effect of survey date on detection:

           mean    sd  2.5%   50% 97.5% overlap0     f  Rhat
rho_date  0.220 0.048 0.125 0.222 0.311    FALSE 1.000 0.999
Plotting the effect of elevation on psi
elev_rng <- range(crossbill$ele)
elev_seq <- seq(elev_rng[1], elev_rng[2], length.out=100) # 1. Sequence of elevations spanning the observed range
elev_std <- (elev_seq - mean(crossbill$ele)) / sd(crossbill$ele) # 2. Standardize it the same way as the model covariate
head(elev_std)
Recall the linear predictor from the model:

logit(psi[i,1]) <- alpha_0 + alpha_ele * ele[i]
psi_est <- jags_covs$mean$alpha_0 + jags_covs$mean$alpha_ele * elev_std # 3. Linear predictor at the posterior means
psi_est <- plogis(psi_est) # back-transform to the probability scale
head(psi_est)
plot(elev_seq, psi_est, type='l', xlab="Elevation", ylab="psi") # 4. Plot the relationship
What about the 95% credible interval?
We need to calculate psi for each saved iteration in the JAGS output (not just the posterior mean).
nsamples <- jags_covs$mcmc.info$n.samples
nsamples # 1500
psi_post <- matrix(NA, nrow=nsamples, ncol=100)
for (i in 1:100){
psi_post[,i] <- jags_covs$sims.list$alpha_0 + jags_covs$sims.list$alpha_ele * elev_std[i]
}
psi_post <- plogis(psi_post)
psi_post[1:5,1:5]
0.1538320 | 0.1567736 | 0.1597607 | 0.1627938 | 0.1658732 |
0.1874531 | 0.1904333 | 0.1934497 | 0.1965022 | 0.1995909 |
0.2345080 | 0.2366951 | 0.2388962 | 0.2411113 | 0.2433404 |
0.2296686 | 0.2315533 | 0.2334487 | 0.2353549 | 0.2372719 |
0.1844937 | 0.1872250 | 0.1899872 | 0.1927805 | 0.1956050 |
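As an aside, the loop can be replaced with a single vectorized expression (a sketch equivalent to the loop above):

# Each column of the result corresponds to one standardized elevation value
psi_post <- plogis(jags_covs$sims.list$alpha_0 +
                   outer(jags_covs$sims.list$alpha_ele, elev_std))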
psi_lower <- apply(psi_post, 2, quantile, 0.025) # lower bound of 95% ci
psi_upper <- apply(psi_post, 2, quantile, 0.975) # upper bound of 95% ci
head(psi_lower)
head(psi_upper)
Plot in base R
plot(elev_seq, psi_est, type='l', xlab="Elevation", ylab="psi", ylim=c(0,0.8))
polygon(c(elev_seq, rev(elev_seq)),
c(psi_lower, rev(psi_upper)),
col='grey90', border=NA)
lines(elev_seq, psi_est)
Plot in ggplot2
library(ggplot2)
plot_data <- data.frame(elev=elev_seq, psi=psi_est, lower=psi_lower, upper=psi_upper)
head(plot_data)
elev | psi | lower | upper | |
---|---|---|---|---|
<dbl> | <dbl> | <dbl> | <dbl> | |
1 | 250.0000 | 0.1645080 | 0.09931026 | 0.2518344 |
2 | 275.2525 | 0.1672765 | 0.10193656 | 0.2541165 |
3 | 300.5051 | 0.1700822 | 0.10447124 | 0.2563221 |
4 | 325.7576 | 0.1729252 | 0.10730957 | 0.2583171 |
5 | 351.0101 | 0.1758056 | 0.11047750 | 0.2599927 |
6 | 376.2626 | 0.1787237 | 0.11390621 | 0.2619797 |
ggplot(data=plot_data, aes(x=elev)) +
geom_ribbon(aes(ymin=lower, ymax=upper), alpha=0.2) +
geom_line(aes(y=psi))
Right now we only estimate $\psi$ at $t=1$!
What about for other times?
For this we need a derived parameter, i.e. a quantity that is a function of other parameters already in the model.
$\psi_{t}$ at times $t>1$ is a function of occupancy in the previous time step, extinction, and colonization
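In math (mirroring the state equation above):

$$ \psi_{i,t} = \psi_{i,t-1}\cdot(1-\epsilon) + (1-\psi_{i,t-1})\cdot\gamma_{i,t} $$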
This math is very similar to JAGS code we already have in our model:
for (t in 2:T){
z[i, t] ~ dbern(z[i, t-1] * (1 - eps) + (1 - z[i, t-1]) * gam[i,t])
}
We just need to replace z with psi.
# Derive psi at t>1
for (i in 1:M){
for (t in 2:T){
psi[i, t] <- psi[i, t-1] * (1 - eps) + (1 - psi[i, t-1]) * gam[i,t]
}
}
psi is now indexed by both i and t.
cat("
model{
# State submodel
for (i in 1:M){
z[i, 1] ~ dbern(psi[i,1])
for (t in 2:T){
z[i, t] ~ dbern(z[i, t-1] * (1 - eps) + (1 - z[i, t-1]) * gam[i,t])
}
}
# ----------------------
# Calculate psi, gamma, eps
for (i in 1:M){
logit(psi[i,1]) <- alpha_0 + alpha_ele * ele[i]
for (t in 2:T){
logit(gam[i,t]) <- beta_0 + beta_forest * forest[i]
}
}
# psi priors
alpha_0 ~ dlogis(0, 1)
alpha_ele ~ dlogis(0, 1)
# gamma priors
beta_0 ~ dlogis(0, 1)
beta_forest ~ dlogis(0, 1)
# eps priors
eps ~ dunif(0, 1)
# -----------------------
# Detection submodel
for (i in 1:M){
for (t in 1:T){
for (j in 1:J){
y[i,j,t] ~ dbern(p[i,j,t] * z[i, t])
}
}
}
# -----------------------
# Impute missing dates
for (i in 1:M){
for (t in 1:T){
for (j in 1:J){
date[i,j,t] ~ dnorm(mu_date, tau_date)
}
}
}
mu_date ~ dnorm(0, 0.01)
sigma_date ~ dunif(0, 10)
tau_date <- pow(sigma_date, -2)
# -----------------------
# Calculate p
for (i in 1:M){
for (t in 1:T){
for (j in 1:J){
logit(p[i,j,t]) <- rho_0 + rho_date * date[i,j,t]
}
}
}
rho_0 ~ dlogis(0, 1)
rho_date ~ dlogis(0, 1)
# -----------------------
# Derive psi at t>1
for (i in 1:M){
for (t in 2:T){
psi[i, t] <- psi[i, t-1] * (1 - eps) + (1 - psi[i, t-1]) * gam[i,t]
}
}
}
", file = "dynocc_covs.bugs")
pars <- c("alpha_0", "alpha_ele", "beta_0", "beta_forest",
"rho_0", "rho_date", "mu_date", "sigma_date", "psi")
inits <- function() list(z = matrix(1, M, T))
library(jagsUI)
jags_covs <- jags(data = jags_data,
inits = inits,
parameters.to.save = pars,
model.file = "dynocc_covs.bugs",
n.chains = 3, n.iter = 1000, n.burnin = 500,
parallel=TRUE)
Processing function input....... Done. Beginning parallel processing using 3 cores. Console output will be suppressed. Parallel processing completed. Calculating statistics....... Done.
dim(jags_covs$sims.list$psi) # 1500 x 267 x 9: samples x sites x primary periods
Plot psi over time for sites 1-5:
psi_est <- psi_lower <- psi_upper <- matrix(NA, nrow=5, ncol=9)
for (t in 1:9){
for (i in 1:5){
psi_it <- jags_covs$sims.list$psi[,i,t]
psi_est[i,t] <- mean(psi_it)
psi_lower[i,t] <- quantile(psi_it, 0.025)
psi_upper[i,t] <- quantile(psi_it, 0.975)
}
}
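The same summaries can also be computed without the double loop using apply (a sketch equivalent to the loop above):

psi_sims <- jags_covs$sims.list$psi[, 1:5, ] # samples x 5 sites x 9 years
psi_est <- apply(psi_sims, c(2, 3), mean)
psi_lower <- apply(psi_sims, c(2, 3), quantile, 0.025)
psi_upper <- apply(psi_sims, c(2, 3), quantile, 0.975)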
cols <- rainbow(5)
plot(1:9, psi_est[1,1:9], ylim=c(0,0.5), type='o', pch=19,
col = cols[1],
ylab="psi", xlab="time")
for (i in 2:5){
lines(1:9, psi_est[i,1:9], type='o', pch=19, col=cols[i])
}
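A legend (our addition) makes the sites easier to tell apart:

legend("topleft", legend = paste("Site", 1:5), col = cols, pch = 19, lty = 1)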
Use JAGS to fit the following model (shown in full below): initial occupancy depends on elevation, elevation squared, and forest; colonization depends on forest; extinction depends on elevation; detection depends on date.
Also try plotting the effect of elevation on extinction.
JAGS model
cat("
model{
# State submodel
for (i in 1:M){
z[i, 1] ~ dbern(psi[i,1])
for (t in 2:T){
z[i, t] ~ dbern(z[i, t-1] * (1 - eps[i,t]) + (1 - z[i, t-1]) * gam[i,t])
}
}
# ----------------------
# Calculate psi, gamma, eps
for (i in 1:M){
logit(psi[i,1]) <- alpha_0 + alpha_ele * ele[i] + alpha_ele2 * ele2[i] + alpha_forest * forest[i]
for (t in 2:T){
logit(gam[i,t]) <- beta_0 + beta_forest * forest[i]
logit(eps[i,t]) <- eps_0 + eps_ele * ele[i]
}
}
# psi priors
alpha_0 ~ dlogis(0, 1)
alpha_ele ~ dlogis(0, 1)
alpha_ele2 ~ dlogis(0, 1)
alpha_forest ~ dlogis(0, 1)
# gamma priors
beta_0 ~ dlogis(0, 1)
beta_forest ~ dlogis(0, 1)
# eps priors
eps_0 ~ dlogis(0,1)
eps_ele ~ dlogis(0, 1)
# -----------------------
# Detection submodel
for (i in 1:M){
for (t in 1:T){
for (j in 1:J){
y[i,j,t] ~ dbern(p[i,j,t] * z[i, t])
}
}
}
# -----------------------
# Impute missing dates
for (i in 1:M){
for (t in 1:T){
for (j in 1:J){
date[i,j,t] ~ dnorm(mu_date, tau_date)
}
}
}
mu_date ~ dnorm(0, 0.01)
sigma_date ~ dunif(0, 10)
tau_date <- pow(sigma_date, -2)
# -----------------------
# Calculate p
for (i in 1:M){
for (t in 1:T){
for (j in 1:J){
logit(p[i,j,t]) <- rho_0 + rho_date * date[i,j,t]
}
}
}
rho_0 ~ dlogis(0, 1)
rho_date ~ dlogis(0, 1)
# -----------------------
# Derive psi at t>1
for (i in 1:M){
for (t in 2:T){
psi[i, t] <- psi[i, t-1] * (1 - eps[i,t]) + (1 - psi[i, t-1]) * gam[i,t]
}
}
}
", file = "dynocc_exercise.bugs")
jags_data$ele2 <- jags_data$ele^2
pars <- c("alpha_0", "alpha_ele", "alpha_ele2", "alpha_forest", "beta_0", "beta_forest",
"rho_0", "rho_date", "eps_0", "eps_ele")
inits <- function() list(z = matrix(1, M, T))
library(jagsUI)
jags_exercise <- jags(data = jags_data,
inits = inits,
parameters.to.save = pars,
model.file = "dynocc_exercise.bugs",
n.chains = 3, n.iter = 1000, n.burnin = 500,
parallel=TRUE)
Processing function input....... Done. Beginning parallel processing using 3 cores. Console output will be suppressed. Parallel processing completed. Calculating statistics....... Done.
jags_exercise
JAGS output for model 'dynocc_exercise.bugs', generated by jagsUI.
Estimates based on 3 chains of 1000 iterations,
adaptation = 100 iterations (sufficient),
burn-in = 500 iterations and thin rate = 1,
yielding 1500 total samples from the joint posterior.
MCMC ran in parallel for 1.679 minutes at time 2023-07-18 13:57:18.6436.

                   mean     sd      2.5%       50%     97.5% overlap0     f  Rhat n.eff
alpha_0           0.019  0.292    -0.567     0.015     0.617     TRUE 0.519 1.035    68
alpha_ele         1.340  0.258     0.846     1.339     1.872    FALSE 1.000 1.019   130
alpha_ele2       -1.262  0.290    -1.855    -1.253    -0.721    FALSE 1.000 1.045    53
alpha_forest      0.424  0.205     0.036     0.427     0.851    FALSE 0.983 1.006   394
beta_0           -1.781  0.102    -1.997    -1.778    -1.595    FALSE 1.000 1.015   147
beta_forest       0.715  0.091     0.536     0.716     0.899    FALSE 1.000 1.003   632
rho_0             0.120  0.058     0.005     0.121     0.228    FALSE 0.980 1.004   608
rho_date          0.181  0.046     0.091     0.181     0.266    FALSE 1.000 1.001  1500
eps_0            -1.191  0.148    -1.488    -1.192    -0.913    FALSE 1.000 1.011   184
eps_ele          -1.062  0.178    -1.414    -1.052    -0.714    FALSE 1.000 1.008   301
deviance      22365.496 68.972 22238.201 22362.378 22500.734    FALSE 1.000 1.012   173

Successful convergence based on Rhat values (all < 1.1).
Rhat is the potential scale reduction factor (at convergence, Rhat=1).
For each parameter, n.eff is a crude measure of effective sample size.
overlap0 checks if 0 falls in the parameter's 95% credible interval.
f is the proportion of the posterior with the same sign as the mean;
i.e., our confidence that the parameter is positive or negative.

DIC info: (pD = var(deviance)/2)
pD = 2354 and DIC = 24719.48
DIC is an estimate of expected predictive error (lower is better).
# Posterior mean of extinction (eps) as a function of elevation
eps_est <- plogis(jags_exercise$mean$eps_0 + jags_exercise$mean$eps_ele * elev_std)
# Posterior draws for the 95% CI
eps_post <- matrix(NA, nrow=nsamples, ncol=100)
for (i in 1:100){
eps_post[,i] <- jags_exercise$sims.list$eps_0 + jags_exercise$sims.list$eps_ele * elev_std[i]
}
eps_post <- plogis(eps_post)
eps_lower <- apply(eps_post, 2, quantile, 0.025) # lower bound of 95% CI
eps_upper <- apply(eps_post, 2, quantile, 0.975) # upper bound of 95% CI
plot(elev_seq, eps_est, type='l', xlab="Elevation", ylab="eps", ylim=c(0,0.8))
polygon(c(elev_seq, rev(elev_seq)),
c(eps_lower, rev(eps_upper)),
col='grey90', border=NA)
lines(elev_seq, eps_est)
unmarked
Look at the data again:
head(crossbill)
id | ele | forest | surveys | det991 | det992 | det993 | det001 | det002 | det003 | ⋯ | date043 | date051 | date052 | date053 | date061 | date062 | date063 | date071 | date072 | date073 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
<int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | ⋯ | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | |
1 | 1 | 450 | 3 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | ⋯ | 74 | 35 | 52 | 73 | 19 | 56 | 65 | 20 | 46 | 54 |
2 | 2 | 450 | 21 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | ⋯ | 59 | 20 | 35 | 65 | 15 | 28 | 55 | 19 | 50 | 56 |
3 | 3 | 1050 | 32 | 3 | NA | NA | NA | 0 | 0 | 0 | ⋯ | 95 | 24 | 59 | 87 | 23 | 62 | 90 | 22 | 55 | 61 |
4 | 4 | 950 | 9 | 3 | 0 | 0 | 0 | 1 | 0 | 0 | ⋯ | 74 | 22 | 60 | 76 | 22 | 57 | 78 | 22 | 49 | 70 |
5 | 5 | 1150 | 35 | 3 | 0 | 0 | 0 | 1 | 1 | 1 | ⋯ | 69 | 28 | 51 | 73 | 23 | 42 | 70 | 15 | 43 | 60 |
6 | 6 | 550 | 2 | 3 | NA | NA | NA | 0 | 0 | 0 | ⋯ | 64 | 27 | 39 | 55 | 29 | 40 | 52 | 21 | 40 | 49 |
Grab the detection/non-detection data.
This time we want to keep it in "wide" format (M x TJ), because that's what unmarked uses.
ycols <- grepl("det", names(crossbill))
y <- crossbill[,ycols]
names(y)
head(y)
det991 | det992 | det993 | det001 | det002 | det003 | det011 | det012 | det013 | det021 | ⋯ | det043 | det051 | det052 | det053 | det061 | det062 | det063 | det071 | det072 | det073 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
<int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | ⋯ | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | |
1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ⋯ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ⋯ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
3 | NA | NA | NA | 0 | 0 | 0 | 1 | 1 | 0 | 0 | ⋯ | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 |
4 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | ⋯ | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 |
5 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | ⋯ | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 |
6 | NA | NA | NA | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ⋯ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Get the site covariates:
site_covs <- crossbill[,1:4]
head(site_covs)
id | ele | forest | surveys | |
---|---|---|---|---|
<int> | <int> | <int> | <int> | |
1 | 1 | 450 | 3 | 3 |
2 | 2 | 450 | 21 | 3 |
3 | 3 | 1050 | 32 | 3 |
4 | 4 | 950 | 9 | 3 |
5 | 5 | 1150 | 35 | 3 |
6 | 6 | 550 | 2 | 3 |
And the observation covariate (date):
datecols <- grepl("date", names(crossbill))
date <- crossbill[,datecols]
names(date)
Make the unmarkedFrame
Special unmarked frame type: unmarkedMultFrame
We need to specify the number of seasons (numPrimary)
umf <- unmarkedMultFrame(y = y, siteCovs = site_covs, obsCovs = list(date = date), numPrimary = 9)
summary(umf)
unmarkedFrame Object

267 sites
Maximum number of observations per site: 27
Mean number of observations per site: 24.99
Number of primary survey periods: 9
Number of secondary survey periods: 3
Sites with at least one detection: 165

Tabulation of y observations:
   0    1 <NA>
5311 1362  536

Site-level covariates:
       id             ele           forest         surveys
 Min.   :  1.0   Min.   : 250   Min.   : 0.00   Min.   :2.000
 1st Qu.: 67.5   1st Qu.: 550   1st Qu.: 8.50   1st Qu.:3.000
 Median :134.0   Median :1150   Median :33.00   Median :3.000
 Mean   :134.0   Mean   :1189   Mean   :34.75   Mean   :2.824
 3rd Qu.:200.5   3rd Qu.:1850   3rd Qu.:56.50   3rd Qu.:3.000
 Max.   :267.0   Max.   :2750   Max.   :99.00   Max.   :3.000

Observation-level covariates:
      date
 Min.   : 10.00
 1st Qu.: 37.00
 Median : 55.00
 Mean   : 54.29
 3rd Qu.: 71.00
 Max.   :118.00
 NA's   :567
We will use the colext function. We need to specify R formulas for four parameters:
psiformula
epsilonformula
gammaformula
pformula

For the null model, all four will be intercept-only, or ~1.
mod_null <- colext(psiformula = ~1, epsilonformula = ~1, gammaformula = ~1, pformula = ~1, umf)
summary(mod_null)
Call:
colext(psiformula = ~1, gammaformula = ~1, epsilonformula = ~1,
    pformula = ~1, data = umf)

Initial (logit-scale):
 Estimate    SE     z  P(>|z|)
   -0.795 0.154 -5.16 2.52e-07

Colonization (logit-scale):
 Estimate    SE     z  P(>|z|)
    -1.96 0.111 -17.6 1.63e-69

Extinction (logit-scale):
 Estimate    SE     z  P(>|z|)
    -1.56 0.152 -10.2 1.37e-24

Detection (logit-scale):
 Estimate     SE    z P(>|z|)
   0.0668 0.0564 1.18   0.236

AIC: 5197.441
Number of sites: 267
optim convergence code: 0
optim iterations: 53
Bootstrap iterations: 0
plogis(-0.795) # 0.311

Or use the backTransform function:
backTransform(mod_null, 'psi')
Backtransformed linear combination(s) of Initial estimate(s)

 Estimate    SE LinComb (Intercept)
    0.311 0.033  -0.795           1

Transformation: logistic
backTransform(mod_null, 'det')
Backtransformed linear combination(s) of Detection estimate(s)

 Estimate     SE LinComb (Intercept)
    0.517 0.0141  0.0668           1

Transformation: logistic
confint(backTransform(mod_null, 'psi'))
0.025 | 0.975 | |
---|---|---|
0.2502368 | 0.3792124 |
Or use the predict function:
head(predict(mod_null, 'psi'))
Predicted | SE | lower | upper | |
---|---|---|---|---|
<dbl> | <dbl> | <dbl> | <dbl> | |
1 | 0.3110701 | 0.03304615 | 0.2502368 | 0.3792124 |
2 | 0.3110701 | 0.03304615 | 0.2502368 | 0.3792124 |
3 | 0.3110701 | 0.03304615 | 0.2502368 | 0.3792124 |
4 | 0.3110701 | 0.03304615 | 0.2502368 | 0.3792124 |
5 | 0.3110701 | 0.03304615 | 0.2502368 | 0.3792124 |
6 | 0.3110701 | 0.03304615 | 0.2502368 | 0.3792124 |
cbind(JAGS=unlist(jags_null$mean[c(1,3,2,4)]),
unmarked=plogis(coef(mod_null)))
JAGS | unmarked | |
---|---|---|
psi | 0.3127166 | 0.3110701 |
gam | 0.1246883 | 0.1231143 |
eps | 0.1772180 | 0.1736495 |
p | 0.5180958 | 0.5166957 |
(same as with JAGS)
mod_covs <- colext(psiformula = ~scale(ele), epsilonformula = ~1,
gammaformula = ~scale(forest), pformula = ~scale(date), umf)
summary(mod_covs)
Warning message: “Some observations have been discarded because correspoding covariates were missing.”
Call:
colext(psiformula = ~scale(ele), gammaformula = ~scale(forest),
    epsilonformula = ~1, pformula = ~scale(date), data = umf)

Initial (logit-scale):
            Estimate    SE     z  P(>|z|)
(Intercept)   -0.890 0.157 -5.65 1.60e-08
scale(ele)     0.506 0.151  3.34 8.27e-04

Colonization (logit-scale):
              Estimate     SE      z  P(>|z|)
(Intercept)     -1.857 0.1086 -17.10 1.46e-65
scale(forest)    0.705 0.0908   7.77 8.14e-15

Extinction (logit-scale):
 Estimate    SE     z  P(>|z|)
    -1.41 0.146 -9.66 4.62e-22

Detection (logit-scale):
            Estimate     SE    z  P(>|z|)
(Intercept)    0.101 0.0566 1.78 7.53e-02
scale(date)    0.219 0.0458 4.79 1.69e-06

AIC: 5103.782
Number of sites: 267
optim convergence code: 0
optim iterations: 55
Bootstrap iterations: 0
Get estimates on probability scale
backTransform(mod_covs, 'psi')
Error in .local(obj, ...): Cannot directly backTransform an unmarkedEstimate with length > 1.

Traceback:
1. backTransform(mod_covs, "psi")
2. backTransform(mod_covs, "psi")
3. .local(obj, ...)
4. stop("Cannot directly backTransform an unmarkedEstimate with length > 1.")
head(predict(mod_covs, "psi"))
Warning message: “Some observations have been discarded because correspoding covariates were missing.”
Predicted | SE | lower | upper | |
---|---|---|---|---|
<dbl> | <dbl> | <dbl> | <dbl> | |
1 | 0.1863059 | 0.03803834 | 0.1228206 | 0.2724154 |
2 | 0.1863059 | 0.03803834 | 0.1228206 | 0.2724154 |
3 | 0.2689311 | 0.03249979 | 0.2101468 | 0.3371403 |
4 | 0.2536809 | 0.03304904 | 0.1944722 | 0.3236729 |
5 | 0.2847483 | 0.03239262 | 0.2256995 | 0.3522185 |
6 | 0.1985835 | 0.03706077 | 0.1356867 | 0.2811520 |
head(predict(mod_covs, "det"))
Warning message: “Some observations have been discarded because correspoding covariates were missing.”
Predicted | SE | lower | upper | |
---|---|---|---|---|
<dbl> | <dbl> | <dbl> | <dbl> | |
1 | 0.4746512 | 0.01801346 | 0.4395302 | 0.5100248 |
2 | 0.5368267 | 0.01415879 | 0.5089904 | 0.5644353 |
3 | 0.5516519 | 0.01478593 | 0.5225291 | 0.5804248 |
4 | 0.4721677 | 0.01833149 | 0.4364438 | 0.5081789 |
5 | 0.5094962 | 0.01466858 | 0.4807466 | 0.5381831 |
6 | 0.5590319 | 0.01530854 | 0.5288481 | 0.5887856 |
Compare with JAGS
cbind(JAGS=unlist(jags_covs$mean[1:6]),
unmarked=coef(mod_covs)[c(1:4,6:7)])
JAGS | unmarked | |
---|---|---|
alpha_0 | -0.8942170 | -0.8899519 |
alpha_ele | 0.5087193 | 0.5063164 |
beta_0 | -1.8539007 | -1.8570536 |
beta_forest | 0.7022862 | 0.7047543 |
rho_0 | 0.1052687 | 0.1006335 |
rho_date | 0.2205359 | 0.2194877 |
Procedure, applying a parametric bootstrap:
1. Simulate new datasets from the fitted model
2. Compute a fit statistic on each simulated dataset
3. Compare the simulated statistics to the statistic computed from the real data

If the model fits well, they should be similar.
The default GOF test in unmarked is the sum of squared residuals (SSE). We can do the parametric bootstrap with the parboot function.
gof <- suppressWarnings(parboot(mod_covs, statistic=SSE, nsim=30))
gof
plot(gof)
Call: parboot(object = mod_covs, statistic = SSE, nsim = 30)

Parametric Bootstrap Statistics:
      t0 mean(t0 - t_B) StdDev(t0 - t_B) Pr(t_B > t0)
SSE 1004          -35.8             33.1        0.806

t_B quantiles:
      0% 2.5%  25%  50%  75% 97.5% 100%
[1,] 964  973 1018 1039 1065  1092 1097

t0 = Original statistic computed from data
t_B = Vector of bootstrap samples
Alternative statistic: MacKenzie-Bailey chi-square (MacKenzie & Bailey 2004, JABES)
Test is based on comparing expected vs. observed frequency of specific detection histories (001, 110, 101, etc.)
library(AICcmodavg)
mb_test <- suppressWarnings(mb.gof.test(mod_covs, nsim=30))
Thoughts on GOF tests:
unmarked's approach to calculating residuals (and thus SSE) is crude, especially with NAs

Use the projected function
Mean for each $t$ across sites:
unmarked::projected(mod_covs)
1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | |
---|---|---|---|---|---|---|---|---|---|
unoccupied | 0.6989645 | 0.6479712 | 0.6216321 | 0.6078406 | 0.6005742 | 0.5967567 | 0.5947811 | 0.5937927 | 0.593331 |
occupied | 0.3010355 | 0.3520288 | 0.3783679 | 0.3921594 | 0.3994258 | 0.4032433 | 0.4052189 | 0.4062073 | 0.406669 |
For each site separately:
proj <- unmarked::projected(mod_covs, mean = FALSE)
dim(proj)
proj <- t(proj[2,,]) # just "occupied" probability
proj[1:5,]
0.1863059 | 0.2025223 | 0.2144960 | 0.2233369 | 0.2298648 | 0.2346847 | 0.2382436 | 0.2408713 | 0.2428115 |
0.1863059 | 0.2302722 | 0.2612360 | 0.2830426 | 0.2984001 | 0.3092159 | 0.3168330 | 0.3221974 | 0.3259754 |
0.2689311 | 0.3089392 | 0.3359947 | 0.3542911 | 0.3666640 | 0.3750312 | 0.3806896 | 0.3845160 | 0.3871037 |
0.2536809 | 0.2596870 | 0.2640620 | 0.2672489 | 0.2695703 | 0.2712613 | 0.2724930 | 0.2733903 | 0.2740438 |
0.2847483 | 0.3258822 | 0.3533398 | 0.3716683 | 0.3839029 | 0.3920697 | 0.3975212 | 0.4011602 | 0.4035893 |
Plot for first 5 sites
cols <- rainbow(5)
plot(1:9, proj[1,1:9], ylim=c(0,0.5), type='o', pch=19,
col = cols[1],
ylab="psi", xlab="time")
for (i in 2:5){
lines(1:9, proj[i,1:9], type='o', pch=19, col=cols[i])
}
Getting CIs on these estimates

Use the parboot function for parametric bootstrapping. Steps:
1. Simulate datasets from the fitted model
2. Apply our statistic function to each dataset (see ?parboot)
3. Get CIs from the quantiles of the resulting bootstrap statistics
proj_output <- function(mod){
unmarked::projected(mod)[2,] # just save "occupied" only
}
pb <- suppressWarnings(parboot(mod_covs, statistic=proj_output, nsim=30))
pb
Call: parboot(object = mod_covs, statistic = proj_output, nsim = 30)

Parametric Bootstrap Statistics:
     t0 mean(t0 - t_B) StdDev(t0 - t_B) Pr(t_B > t0)
1 0.301       6.69e-03           0.0319        0.613
2 0.352       3.94e-03           0.0249        0.613
3 0.378       2.14e-03           0.0234        0.613
4 0.392       8.93e-04           0.0238        0.613
5 0.399       1.43e-05           0.0245        0.581
6 0.403      -6.08e-04           0.0252        0.548
7 0.405      -1.05e-03           0.0257        0.516
8 0.406      -1.36e-03           0.0261        0.484
9 0.407      -1.58e-03           0.0264        0.484

t_B quantiles:
    0% 2.5%  25%  50%  75% 97.5% 100%
1 0.21 0.23 0.28 0.29 0.31  0.35 0.36
2 0.28 0.31 0.33 0.34 0.36  0.40 0.41
3 0.33 0.34 0.36 0.37 0.38  0.43 0.44
4 0.35 0.36 0.38 0.39 0.40  0.44 0.46
5 0.36 0.36 0.39 0.40 0.41  0.45 0.47
6 0.36 0.37 0.39 0.40 0.42  0.46 0.48
7 0.36 0.37 0.39 0.40 0.42  0.46 0.48
8 0.36 0.37 0.39 0.41 0.42  0.46 0.48
9 0.36 0.37 0.39 0.41 0.42  0.46 0.49

t0 = Original statistic computed from data
t_B = Vector of bootstrap samples
head(pb@t.star)
1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
---|---|---|---|---|---|---|---|---|
0.2819571 | 0.3290566 | 0.3537790 | 0.3669009 | 0.3739101 | 0.3776565 | 0.3796460 | 0.3806853 | 0.3812114 |
0.2892106 | 0.3492453 | 0.3819693 | 0.4001991 | 0.4105348 | 0.4164738 | 0.4199181 | 0.4219258 | 0.4230968 |
0.3021260 | 0.3456247 | 0.3704970 | 0.3848454 | 0.3931749 | 0.3980267 | 0.4008527 | 0.4024917 | 0.4034333 |
0.3305423 | 0.3887629 | 0.4197473 | 0.4365379 | 0.4457576 | 0.4508602 | 0.4536897 | 0.4552508 | 0.4560998 |
0.3440246 | 0.3760141 | 0.3946504 | 0.4055333 | 0.4118947 | 0.4156106 | 0.4177751 | 0.4190292 | 0.4197493 |
0.2725891 | 0.3386260 | 0.3752054 | 0.3958530 | 0.4077027 | 0.4146015 | 0.4186675 | 0.4210884 | 0.4225420 |
Calculate 95% CIs
proj_lower <- apply(pb@t.star, 2, quantile, 0.025)
proj_upper <- apply(pb@t.star, 2, quantile, 0.975)
proj_est <- projected(mod_covs)[2,]
Make plot
plot(1:9, proj_est, ylim=c(0.2,0.5), type='o', pch=19,
xlab="time", ylab="psi")
segments(1:9, proj_lower, 1:9, proj_upper)
Use matching map data in the Switzerland dataset.
data(Switzerland)
head(Switzerland)
x | y | elevation | forest | water | |
---|---|---|---|---|---|
<int> | <int> | <int> | <int> | <int> | |
1 | 910942 | 54276 | 340 | 6 | 0 |
2 | 910942 | 55276 | 340 | 11 | 1 |
3 | 911942 | 54276 | 380 | 4 | 0 |
4 | 911942 | 55276 | 390 | 72 | 3 |
5 | 911942 | 56276 | 357 | 9 | 7 |
6 | 911942 | 57276 | 357 | 5 | 5 |
Convert dataset to raster format
(technically need to specify projection, but we won't worry about it)
library(terra)
sw_raster <- rast(Switzerland, type="xyz")
names(sw_raster) <- c("ele", "forest", "water")
plot(sw_raster)
terra 1.7.39
pr_map <- predict(mod_covs, type="psi", newdata=sw_raster)
plot(pr_map)
Use unmarked to fit the following models:
Model 1: initial occupancy ~ elevation + forest; colonization ~ forest; extinction ~ elevation; detection ~ date
Model 2: as Model 1, but adding a quadratic effect of elevation on initial occupancy
If using unmarked, also make a map of initial occupancy using the Switzerland dataset for both models.
Hint: you can use the I() function in formulas to run a calculation, such as ~I(a^2)
mod_exercise1 <- colext(psiformula = ~scale(ele)+scale(forest),
gammaformula = ~scale(forest),
epsilonformula = ~scale(ele),
pformula = ~scale(date),
data = umf)
mod_exercise1
Warning message: “Some observations have been discarded because correspoding covariates were missing.”
Call:
colext(psiformula = ~scale(ele) + scale(forest), gammaformula = ~scale(forest),
    epsilonformula = ~scale(ele), pformula = ~scale(date), data = umf)

Initial:
              Estimate    SE     z  P(>|z|)
(Intercept)     -1.059 0.182 -5.81 6.10e-09
scale(ele)       0.715 0.182  3.92 8.76e-05
scale(forest)    0.856 0.177  4.83 1.33e-06

Colonization:
              Estimate     SE      z  P(>|z|)
(Intercept)     -1.746 0.1005 -17.36 1.63e-67
scale(forest)    0.709 0.0878   8.08 6.47e-16

Extinction:
            Estimate    SE     z  P(>|z|)
(Intercept)    -1.14 0.144 -7.91 2.51e-15
scale(ele)     -1.09 0.178 -6.10 1.05e-09

Detection:
            Estimate     SE    z  P(>|z|)
(Intercept)    0.145 0.0548 2.64 0.008266
scale(date)    0.177 0.0468 3.79 0.000152

AIC: 5039.136
pr_exercise1 <- predict(mod_exercise1, 'psi', newdata=sw_raster)
plot(pr_exercise1[[1]])
mod_exercise2 <- colext(psiformula = ~scale(ele)+scale(forest)+I(scale(ele)^2),
gammaformula = ~scale(forest),
epsilonformula = ~scale(ele),
pformula = ~scale(date),
data = umf)
mod_exercise2
Warning message: “Some observations have been discarded because correspoding covariates were missing.”
Call:
colext(psiformula = ~scale(ele) + scale(forest) + I(scale(ele)^2),
    gammaformula = ~scale(forest), epsilonformula = ~scale(ele),
    pformula = ~scale(date), data = umf)

Initial:
                 Estimate    SE        z  P(>|z|)
(Intercept)     -0.000405 0.298 -0.00136 9.99e-01
scale(ele)       1.349477 0.273  4.95019 7.41e-07
scale(forest)    0.424758 0.205  2.07119 3.83e-02
I(scale(ele)^2) -1.226293 0.299 -4.10108 4.11e-05

Colonization:
              Estimate     SE      z  P(>|z|)
(Intercept)     -1.790 0.1061 -16.87 7.16e-64
scale(forest)    0.715 0.0895   7.99 1.37e-15

Extinction:
            Estimate    SE     z  P(>|z|)
(Intercept)    -1.19 0.150 -7.95 1.86e-15
scale(ele)     -1.09 0.181 -6.03 1.65e-09

Detection:
            Estimate     SE    z  P(>|z|)
(Intercept)    0.122 0.0562 2.17 0.030286
scale(date)    0.172 0.0466 3.70 0.000216

AIC: 5017.826
plotEffects(mod_exercise2, "psi", "ele")
pr_exercise2 <- predict(mod_exercise2, 'psi', newdata=sw_raster)
plot(pr_exercise2[[1]])
Use the fitList and modSel functions.
fl <- fitList(mod_covs=mod_covs, mod_exercise1=mod_exercise1, mod_exercise2=mod_exercise2)
modSel(fl)
Warning message: “Some observations have been discarded because correspoding covariates were missing.” Warning message: “Some observations have been discarded because correspoding covariates were missing.” Warning message: “Some observations have been discarded because correspoding covariates were missing.” Warning message: “Some observations have been discarded because correspoding covariates were missing.”
              nPars     AIC delta   AICwt cumltvWt
mod_exercise2    10 5017.83  0.00 1.0e+00     1.00
mod_exercise1     9 5039.14 21.31 2.4e-05     1.00
mod_covs          7 5103.78 85.96 2.2e-19     1.00
ubms: an unmarked-style interface for fitting these models in Stan
library(ubms)
mod_stan <- stan_colext(psiformula = ~scale(ele), epsilonformula = ~1,
gammaformula = ~scale(forest), pformula = ~scale(date), umf,
chains = 3, iter = 300, refresh=0)
Attaching package: ‘ubms’

The following objects are masked from ‘package:unmarked’:
    fitList, projected

The following object is masked from ‘package:base’:
    gamma

Warning message:
“Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help. See https://mc-stan.org/misc/warnings.html#bulk-ess”
Warning message:
“Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
Running the chains for more iterations may help. See https://mc-stan.org/misc/warnings.html#tail-ess”
mod_stan
traceplot(mod_stan)
Call:
stan_colext(psiformula = ~scale(ele), gammaformula = ~scale(forest),
    epsilonformula = ~1, pformula = ~scale(date), data = umf,
    chains = 3, iter = 300, refresh = 0)

Occupancy (logit-scale):
            Estimate    SD   2.5%  97.5% n_eff  Rhat
(Intercept)   -0.876 0.175 -1.249 -0.546   462 1.004
scale(ele)     0.498 0.158  0.184  0.804   705 0.995

Colonization (logit-scale):
              Estimate     SD   2.5%  97.5% n_eff  Rhat
(Intercept)     -1.859 0.1069 -2.049 -1.659   400 0.999
scale(forest)    0.706 0.0893  0.529  0.864   682 0.999

Extinction (logit-scale):
 Estimate    SD  2.5% 97.5% n_eff  Rhat
    -1.41 0.138 -1.68 -1.15   448 0.995

Detection (logit-scale):
            Estimate     SD    2.5% 97.5% n_eff  Rhat
(Intercept)    0.099 0.0563 -0.0102 0.206   399 0.997
scale(date)    0.222 0.0453  0.1276 0.307   836 1.001

LOOIC: 5110.618
Runtime: 36.793 sec
'pars' not specified. Showing first 10 parameters by default.
Compare unmarked and ubms output:
cbind(unmarked=coef(mod_covs), ubms=coef(mod_stan))
unmarked | ubms | |
---|---|---|
psi(Int) | -0.8899519 | -0.87648184 |
psi(scale(ele)) | 0.5063164 | 0.49840203 |
col(Int) | -1.8570536 | -1.85921071 |
col(scale(forest)) | 0.7047543 | 0.70561970 |
ext(Int) | -1.4072120 | -1.41252267 |
p(Int) | 0.1006335 | 0.09903921 |
p(scale(date)) | 0.2194877 | 0.22245250 |
Some interesting helper functions:
plot_posteriors(mod_stan)
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
plot(mod_stan)
gof_test <- gof(mod_stan)
gof_test
plot(gof_test)
MacKenzie-Bailey Chi-square
Point estimate = 404.289
Posterior predictive p = 0
plot_effects(mod_stan, 'state')
?stan_colext