Dynamic occupancy models¶

Ken Kellner¶

Outline¶

  1. Background
  2. Model description
  3. Dataset
  4. Fit model with JAGS
  5. Exercise
  6. Fit model with unmarked
  7. Exercise

Background¶

Site occupancy can change over time (depending on your definition of time!)

Examples:

  • Anthropogenic land use change affects species distribution
  • Spread of an invasive species
  • Reintroduction of endangered species
  • Climate change reduces or expands species ranges

We want to understand what drives these patterns of change, while accounting for imperfect detection.

Solution:

"Dynamic" or multi-season occupancy model (MacKenzie et al. 2003 Ecology)

Model overview diagram

Review parameters of interest

State submodel:

  • Initial probability of occupancy: $\psi$
  • Probability occupied sites become unoccupied (local extinction): $\epsilon$
  • Probability occupied sites stay occupied (persistence, $1 - \epsilon$): $\phi$
  • Probability unoccupied sites become occupied (colonization): $\gamma$

Detection submodel:

  • Probability of detection: $p$

Model description¶

State model at $t=1$¶

Parameters:

$z_{i, 1}$: Initial occupancy state of site $i$

$\psi$: Initial occupancy probability at time $t=1$

Math:

$$ z_{i,1} \sim \mathrm{Bernoulli}(\psi) $$

State model for $t > 1$¶

Parameters:

$z_{i,t}$: Occupancy state of site $i$ in time $t$

$\epsilon$: Probability of extinction, i.e. probability site occupied in time $t-1$ is unoccupied in time $t$

$\phi$: Probability of persistence; $1-\epsilon$

$\gamma$: Probability of colonization; i.e. probability non-occupied site becomes occupied in time $t$

Math:

$$ z_{i,t} \sim \begin{cases} \mathrm{Bernoulli}(1 - \epsilon), & \text{if } z_{i, t-1} = 1 \\ \mathrm{Bernoulli}(\gamma), & \text{if } z_{i, t-1} = 0 \end{cases} $$

Alternatively:

$$ z_{i,t} \sim \mathrm{Bernoulli}(z_{i,t-1}\cdot(1-\epsilon) + (1-z_{i,t-1})\cdot\gamma) $$

When $z_{i,t-1} = 1$:

$$ z_{i,t} \sim \mathrm{Bernoulli}(1\cdot(1-\epsilon) + 0\cdot\gamma) = \mathrm{Bernoulli}(1-\epsilon) $$

When $z_{i,t-1} = 0$:

$$ z_{i,t} \sim \mathrm{Bernoulli}(0\cdot(1-\epsilon) + 1\cdot\gamma) = \mathrm{Bernoulli}(\gamma) $$

Detection submodel¶

Parameters:

$p$: Probability of detection

$z_{i,t}$: Occupancy state of site $i$ in time $t$

Data:

$y_{i,j,t}$: Detection/non-detection data at site $i$, time $t$, occasion $j$

Math:

$$ y_{i,j,t} \sim \mathrm{Bernoulli}(p \cdot z_{i,t}) $$

When $z_{i,t} = 1$:

$$ y_{i,j,t} \sim \mathrm{Bernoulli}(p \cdot 1) = \mathrm{Bernoulli}(p) $$

When $z_{i,t} = 0$:

$$ y_{i,j,t} \sim \mathrm{Bernoulli}(p \cdot 0) = \mathrm{Bernoulli}(0), \quad \text{so } y_{i,j,t} = 0 $$

Data needed¶

  • Detection/non-detection data from multiple primary periods ($t$)
  • As with single-season occupancy, we need repeated samples ($j$) within at least some sites and primary periods (see the simulation sketch below)
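To make the required data structure concrete, here is a minimal simulation sketch (not part of the original workshop code; the parameter values are arbitrary) that generates detection/non-detection data under the model above:

# Minimal sketch: simulate data from the dynamic occupancy model
set.seed(123)
M <- 100; J <- 3; T <- 5                # sites, occasions, primary periods
psi <- 0.4; eps <- 0.2; gam <- 0.1; p <- 0.5

z <- matrix(NA, M, T)
z[, 1] <- rbinom(M, 1, psi)             # initial occupancy state
for (t in 2:T){
  # persist if occupied at t-1, colonize if not
  z[, t] <- rbinom(M, 1, z[, t-1] * (1 - eps) + (1 - z[, t-1]) * gam)
}

y <- array(NA, dim = c(M, J, T))        # detection/non-detection data
for (t in 1:T){
  for (j in 1:J){
    y[, j, t] <- rbinom(M, 1, p * z[, t])  # detections only at occupied sites
  }
}
str(y)  # same M x J x T structure we build from the real data below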

To learn more:¶

  • MacKenzie, D. I., Nichols, J. D., Hines, J. E., Knutson, M. G., & Franklin, A. B. (2003). Estimating site occupancy, colonization, and local extinction when a species is detected imperfectly. Ecology, 84(8), 2200-2207.

  • Royle, J. A., & Kéry, M. (2007). A Bayesian state-space formulation of dynamic occupancy models. Ecology, 88(7), 1813-1823.

  • Kéry, M., & Royle, J. A. Applied Hierarchical Modeling in Ecology (AHM), Vol. 2, Chapter 4.

Example dataset¶

Loxia curvirostra, European (common) crossbill

Study design¶

  • 267 quadrats
  • 3 surveys per year
  • 9 years

Look at the raw data¶

In [1]:
library(unmarked)
data(crossbill)
head(crossbill)
A data.frame: 6 × 58
   id  ele forest surveys det991 det992 det993 det001 det002 det003 ... date071 date072 date073
1   1  450      3       3      0      0      0      0      0      0 ...      20      46      54
2   2  450     21       3      0      0      0      0      0      0 ...      19      50      56
3   3 1050     32       3     NA     NA     NA      0      0      0 ...      22      55      61
4   4  950      9       3      0      0      0      1      0      0 ...      22      49      70
5   5 1150     35       3      0      0      0      1      1      1 ...      15      43      60
6   6  550      2       3     NA     NA     NA      0      0      0 ...      21      40      49

Dynamic occupancy model in JAGS¶

Simple model with no covariates¶

Format the y-matrix for JAGS¶

First, select only the columns corresponding to detection/non-detection data

In [2]:
ycols <- grepl("det", names(crossbill))
head(ycols)
y <- as.matrix(crossbill[,ycols])
head(y)
[1] FALSE FALSE FALSE FALSE  TRUE  TRUE
A matrix: 6 × 27 of type int
     det991 det992 det993 det001 det002 det003 ... det071 det072 det073
[1,]      0      0      0      0      0      0 ...      0      0      0
[2,]      0      0      0      0      0      0 ...      0      0      0
[3,]     NA     NA     NA      0      0      0 ...      0      1      0
[4,]      0      0      0      1      0      0 ...      0      1      1
[5,]      0      0      0      1      1      1 ...      1      1      1
[6,]     NA     NA     NA      0      0      0 ...      0      0      0

The y-matrix is in "wide" format; we want to convert it to an M x J x T array

M: Sites

J: Sampling occasions

T: Time periods

In [3]:
M <- nrow(y)
J <- 3
T <- ncol(y) / J
In [4]:
y_array <- array(NA, dim = c(M, J, T))
period <- rep(1:T, each = J)
period
 [1] 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5 6 6 6 7 7 7 8 8 8 9 9 9
In [5]:
for (t in 1:T){
    y_array[1:M,1:J,t] <- y[, which(period == t)]
}

print(y_array[1:5,,1:3])
, , 1

     [,1] [,2] [,3]
[1,]    0    0    0
[2,]    0    0    0
[3,]   NA   NA   NA
[4,]    0    0    0
[5,]    0    0    0

, , 2

     [,1] [,2] [,3]
[1,]    0    0    0
[2,]    0    0    0
[3,]    0    0    0
[4,]    1    0    0
[5,]    1    1    1

, , 3

     [,1] [,2] [,3]
[1,]    0    0    0
[2,]    0    0    0
[3,]    1    1    0
[4,]    0    0    0
[5,]    0    0    1

Bundle the data for JAGS¶

In [6]:
jags_data <- list(M = M, J = J, T = T,
                  y = y_array
                 )
str(jags_data)
List of 4
 $ M: int 267
 $ J: num 3
 $ T: num 9
 $ y: int [1:267, 1:3, 1:9] 0 0 NA 0 0 NA 0 0 0 0 ...

Write the model in JAGS language¶

Initial occupancy state

Math:

$$ z_{i,1} \sim \mathrm{Bernoulli}(\psi) $$

BUGS:

for (i in 1:M){
    z[i, 1] ~ dbern(psi)
}

Occupancy state at time t

Math:

at $t = 1$:

$$ z_{i,1} \sim \mathrm{Bernoulli}(\psi) $$

at $t > 1$:

$$ z_{i,t} \sim \mathrm{Bernoulli}(z_{i, t-1}\cdot(1-\epsilon) + (1-z_{i, t-1})\cdot\gamma) $$

BUGS:

for (i in 1:M){

    z[i, 1] ~ dbern(psi)

    for (t in 2:T){
        z[i, t] ~ dbern(z[i, t-1] * (1 - eps) + (1 - z[i, t-1]) * gam)
    }
}

Detection process

Math:

$$ y_{i, j, t} \sim \mathrm{Bernoulli}(p \cdot z_{i, t}) $$

BUGS:

# Detection
for (i in 1:M){
    for (t in 1:T){
        for (j in 1:J){
            y[i,j,t] ~ dbern(p * z[i, t])
        }
    }
}

Complete model file¶

In [7]:
cat("

model{

# State
for (i in 1:M){
    
    z[i, 1] ~ dbern(psi)

    for (t in 2:T){
        z[i, t] ~ dbern(z[i, t-1] * (1 - eps) + (1 - z[i, t-1]) * gam)
    }
}

# Detection
for (i in 1:M){
    for (t in 1:T){
        for (j in 1:J){
            y[i,j,t] ~ dbern(p * z[i, t])
        }
    }
}

# Priors
psi ~ dunif(0, 1)
eps ~ dunif(0, 1)
gam ~ dunif(0, 1)
p ~ dunif(0, 1)


}
", file = "dynocc_null.bugs")

Run model¶

In [8]:
pars <- c("psi", "eps", "gam", "p")

inits <- function() list(z = matrix(1, M, T))

library(jagsUI)
jags_null <- jags(data = jags_data, 
                  inits = inits, 
                  parameters.to.save = pars, 
                  model.file = "dynocc_null.bugs",
                  n.chains = 3, n.iter = 1000, n.burnin = 500, 
                  parallel=TRUE)
Processing function input....... 

Done. 
 
Beginning parallel processing using 3 cores. Console output will be suppressed.

Parallel processing completed.

Calculating statistics....... 

Done. 
In [9]:
jagsUI::traceplot(jags_null)
jags_null
JAGS output for model 'dynocc_null.bugs', generated by jagsUI.
Estimates based on 3 chains of 1000 iterations,
adaptation = 100 iterations (sufficient),
burn-in = 500 iterations and thin rate = 1,
yielding 1500 total samples from the joint posterior. 
MCMC ran in parallel for 0.548 minutes at time 2023-07-18 13:51:35.75341.

             mean     sd     2.5%      50%    97.5% overlap0 f  Rhat n.eff
psi         0.313  0.032    0.254    0.313    0.382    FALSE 1 1.004   417
eps         0.177  0.023    0.133    0.177    0.223    FALSE 1 1.050    48
gam         0.125  0.012    0.102    0.124    0.151    FALSE 1 1.032    68
p           0.518  0.014    0.492    0.518    0.545    FALSE 1 1.044    53
deviance 3644.009 78.074 3495.490 3642.852 3798.456    FALSE 1 1.057    41

Successful convergence based on Rhat values (all < 1.1). 
Rhat is the potential scale reduction factor (at convergence, Rhat=1). 
For each parameter, n.eff is a crude measure of effective sample size. 

overlap0 checks if 0 falls in the parameter's 95% credible interval.
f is the proportion of the posterior with the same sign as the mean;
i.e., our confidence that the parameter is positive or negative.

DIC info: (pD = var(deviance)/2) 
pD = 2899.2 and DIC = 6543.184 
DIC is an estimate of expected predictive error (lower is better).

Plot the posterior of psi¶

In [10]:
hist(jags_null$sims.list$psi)
abline(v=jags_null$mean$psi, col='red')

Plot 95% credible intervals¶

In [11]:
psi_est <- jags_null$mean$psi
psi_ci <- quantile(jags_null$sims.list$psi, c(0.025, 0.975))

plot(1, psi_est, cex=2, pch=19, xaxt='n', xlab="", ylab="psi")
segments(1, psi_ci[1], 1, psi_ci[2])

Model with covariates¶

Site covariates¶

In [12]:
head(crossbill)
A data.frame: 6 × 58
   id  ele forest surveys det991 det992 det993 det001 det002 det003 ... date071 date072 date073
1   1  450      3       3      0      0      0      0      0      0 ...      20      46      54
2   2  450     21       3      0      0      0      0      0      0 ...      19      50      56
3   3 1050     32       3     NA     NA     NA      0      0      0 ...      22      55      61
4   4  950      9       3      0      0      0      1      0      0 ...      22      49      70
5   5 1150     35       3      0      0      0      1      1      1 ...      15      43      60
6   6  550      2       3     NA     NA     NA      0      0      0 ...      21      40      49

Format the detection covariate matrix for JAGS¶

Select only the columns corresponding to date data

In [13]:
datecols <- grepl("date", names(crossbill))
date <- as.matrix(crossbill[,datecols])
head(date)
A matrix: 6 × 27 of type int
     date991 date992 date993 date001 date002 date003 ... date071 date072 date073
[1,]      34      59      65      33      48      68 ...      20      46      54
[2,]      17      33      65      22      50      66 ...      19      50      56
[3,]      NA      NA      NA      38      67      85 ...      22      55      61
[4,]      29      59      65      33      48      68 ...      22      49      70
[5,]      24      45      65      21      46      71 ...      15      43      60
[6,]      NA      NA      NA      22      35      72 ...      21      40      49

Convert to an array, the same way we did with the y-matrix

In [14]:
date_array <- array(NA, dim = c(M, J, T))
period <- rep(1:T, each = J)

for (t in 1:T){
    date_array[,,t] <- date[, which(period == t)]
}
print(date_array[1:5,,1:3])
, , 1

     [,1] [,2] [,3]
[1,]   34   59   65
[2,]   17   33   65
[3,]   NA   NA   NA
[4,]   29   59   65
[5,]   24   45   65

, , 2

     [,1] [,2] [,3]
[1,]   33   48   68
[2,]   22   50   66
[3,]   38   67   85
[4,]   33   48   68
[5,]   21   46   71

, , 3

     [,1] [,2] [,3]
[1,]   30   54   76
[2,]   14   28   61
[3,]   28   50   83
[4,]   24   54   76
[5,]   31   56   73

Bundle the data for JAGS¶

Including the covariates, which we standardize (ele and forest are site-level; date is observation-level)

In [15]:
jags_data <- list(M = M, J = J, T = T,
                  y = y_array,
                  date = (date_array - mean(date_array, na.rm=TRUE)) / sd(date_array, na.rm=TRUE),
                  ele = (crossbill$ele - mean(crossbill$ele)) / sd(crossbill$ele),
                  forest = (crossbill$forest - mean(crossbill$forest)) / sd(crossbill$forest)
                 )
str(jags_data)
List of 7
 $ M     : int 267
 $ J     : num 3
 $ T     : num 9
 $ y     : int [1:267, 1:3, 1:9] 0 0 NA 0 0 NA 0 0 0 0 ...
 $ date  : num [1:267, 1:3, 1:9] -0.921 -1.692 NA -1.148 -1.375 ...
 $ ele   : num [1:267] -1.1539 -1.1539 -0.2175 -0.3735 -0.0614 ...
 $ forest: num [1:267] -1.1471 -0.4967 -0.0992 -0.9303 0.0092 ...

Add covariates to JAGS code¶

Initial occupancy: Elevation

Colonization: Forest

Extinction: None

Detection: Date

Initial occupancy:

$$ \mathrm{logit}(\psi_{i,1}) = \alpha_0 + \alpha_{ele} \cdot \mathrm{ele}_i $$
for (i in 1:M){
  logit(psi[i,1]) <- alpha_0 + alpha_ele * ele[i]
}
# psi priors
alpha_0 ~ dlogis(0, 1)
alpha_ele ~ dlogis(0, 1)

Why dlogis?

A wide normal prior on the logit scale piles up near 0 and 1 when transformed to the probability scale; a standard logistic prior, dlogis(0, 1), is uniform on the probability scale.

In [16]:
# Typical normal prior
hist(plogis(rnorm(1000, 0, 10)))
In [17]:
# dlogis prior
hist(plogis(rlogis(1000, 0, 1)))

Add linear predictor for gamma:

$$ \mathrm{logit}(\gamma_{i,t}) = \beta_0 + \beta_{forest} \cdot \mathrm{forest}_{i} $$
for (i in 1:M){
  logit(psi[i,1]) <- alpha_0 + alpha_ele * ele[i]
  for (t in 2:T){
    logit(gam[i,t]) <- beta_0 + beta_forest * forest[i]
  }
}

# psi priors
alpha_0 ~ dlogis(0, 1)
alpha_ele ~ dlogis(0, 1)
# gamma priors
beta_0 ~ dlogis(0, 1)
beta_forest ~ dlogis(0, 1)
# eps priors
eps ~ dunif(0, 1)
  • Extinction (eps) has no covariates, so it stays the same as in the old model
  • Note that t starts at 2
  • We could also start t at 1 and go to T-1 instead (sketch below)
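A quick sketch of that alternative indexing (an equivalent model with a shifted index; not used below):

for (i in 1:M){
  for (t in 1:(T-1)){
    logit(gam[i,t]) <- beta_0 + beta_forest * forest[i]
  }
}
# the state model would then refer to gam[i, t-1]:
# z[i, t] ~ dbern(z[i, t-1] * (1 - eps) + (1 - z[i, t-1]) * gam[i, t-1])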

Detection parameters:

$$ \mathrm{logit}(p_{i,j,t}) = \rho_0 + \rho_{date} \cdot \mathrm{date}_{i,j,t} $$
  • Now we need to also loop over the $J$ occasions in each of the $T$ years
  • Note here that $t$ starts with 1 instead of 2.
for (i in 1:M){
  for (t in 1:T){
    for (j in 1:J){
      logit(p[i,j,t]) <- rho_0 + rho_date * date[i,j,t]
    }
  }  
}

# detection priors
rho_0 ~ dlogis(0, 1)
rho_date ~ dlogis(0, 1)

Complete JAGS model¶

In [18]:
cat("

model{

# State submodel
for (i in 1:M){
    
    z[i, 1] ~ dbern(psi[i,1])

    for (t in 2:T){
        z[i, t] ~ dbern(z[i, t-1] * (1 - eps) + (1 - z[i, t-1]) * gam[i,t])
    }
}

# ----------------------

# Calculate psi, gamma, eps
for (i in 1:M){

    logit(psi[i,1]) <- alpha_0 + alpha_ele * ele[i]

    for (t in 2:T){
        logit(gam[i,t]) <- beta_0 + beta_forest * forest[i]
    }
}

# psi priors
alpha_0 ~ dlogis(0, 1)
alpha_ele ~ dlogis(0, 1)
# gamma priors
beta_0 ~ dlogis(0, 1)
beta_forest ~ dlogis(0, 1)
# eps priors
eps ~ dunif(0, 1)

#--------------------------

# Detection submodel
for (i in 1:M){
    for (t in 1:T){
        for (j in 1:J){
            y[i,j,t] ~ dbern(p[i,j,t] * z[i, t])
        }
    }
}

# -----------------------

# Calculate p
for (i in 1:M){
    for (t in 1:T){
        for (j in 1:J){
            logit(p[i,j,t]) <- rho_0 + rho_date * date[i,j,t]
        }
    }  
}
rho_0 ~ dlogis(0, 1)
rho_date ~ dlogis(0, 1)

}
", file = "dynocc_covs.bugs")

Run model¶

In [19]:
pars <- c("alpha_0", "alpha_ele", "beta_0", "beta_forest",
          "rho_0", "rho_date")

inits <- function() list(z = matrix(1, M, T))

library(jagsUI)
jags_covs <- jags(data = jags_data, 
                  inits = inits, 
                  parameters.to.save = pars, 
                  model.file = "dynocc_covs.bugs",
                  n.chains = 3, n.iter = 1000, n.burnin = 500, 
                  parallel=TRUE)
Processing function input....... 

Done. 
 
Beginning parallel processing using 3 cores. Console output will be suppressed.
Error in checkForRemoteErrors(val): 3 nodes produced errors; first error: RUNTIME ERROR:
Unable to resolve the following parameters:
date[3,1,1] (line 53)
date[3,2,1] (line 53)
date[3,3,1] (line 53)
date[6,1,1] (line 53)
date[6,2,1] (line 53)
date[6,3,1] (line 53)
date[7,1,3] (line 53)
date[7,1,4] (line 53)
date[7,2,3] (line 53)
date[7,2,4] (line 53)
date[7,3,3] (line 53)
date[7,3,4] (line 53)
date[9,1,5] (line 53)
date[9,2,5] (line 53)
date[9,3,5] (line 53)
date[14,1,3] (line 53)
date[14,2,3] (line 53)
date[14,3,3] (line 53)
date[15,1,2] (line 53)
date[15,2,2] (line 53)
date[15,3,2] (line 53)
date[22,1,6] (line 53)
date[22,2,6] (line 53)
date[22,3,6] (line 53)
date[34,1,3] (line 53)
date[34,1,4] (line 53)
date[34,2,3] (line 53)
date[34,2,4] (line 53)
date[34,3,3] (line 53)
date[34,3,4] (line 53)
date[36,1,2] (line 53)
date[36,1,3] (line 53)
date[36,2,2] (line 53)
date[36,2,3] (line 53)
date[36,3,2] (line 53)
date[36,3,3] (line 53)
date[38,1,1] (line 53)
date[38,2,1] (line 53)
date[38,3,1] (line 53)
date[43,1,3] (line 53)
date[43,2,3] (line 53)
date[43,3,1]
Traceback:

1. jags(data = jags_data, inits = inits, parameters.to.save = pars, 
 .     model.file = "dynocc_covs.bugs", n.chains = 3, n.iter = 1000, 
 .     n.burnin = 500, parallel = TRUE)
2. run.parallel(data, inits, parameters.to.save, model.file, n.chains, 
 .     n.adapt, n.iter, n.burnin, n.thin, modules = modules, factories = factories, 
 .     DIC = DIC, verbose = verbose, n.cores = n.cores)
3. clusterApply(cl = cl, x = 1:n.chains, fun = jags.clust)
4. staticClusterApply(cl, fun, length(x), argfun)
5. checkForRemoteErrors(val)
6. stop(count, " nodes produced errors; first error: ", firstmsg, 
 .     domain = NA)
In [20]:
jags_data$date[43,3,1]
<NA>
In [21]:
sum(is.na(jags_data$date))
567

Missing values¶

(or unequal sample sizes)

A very common problem in multi-year datasets!

Solutions:

  • Change the indexing to work around the missing values (sketched below)
  • Impute the missing values (the approach we take here)
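A sketch of the first option (illustrative only; the helper arrays nobs and obs are hypothetical and not used in the rest of this tutorial). In R, precompute which occasions are observed at each site and year:

# Sketch: index the observed occasions so JAGS never sees missing covariates
nobs <- apply(!is.na(date_array), c(1, 3), sum)  # occasions observed per site-year
obs <- array(NA, dim = c(M, T, J))               # which occasions those are
for (i in 1:M){
  for (t in 1:T){
    idx <- which(!is.na(date_array[i, , t]))
    if (length(idx) > 0) obs[i, t, 1:length(idx)] <- idx
  }
}

The BUGS detection loop would then run for (j in 1:nobs[i,t]) over y[i, obs[i,t,j], t], and site-years with no observations at all would have to be skipped from the loop entirely, which is part of why imputation is often simpler. We take the imputation route below.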

Impute missing dates¶

Missing date values are simulated as coming from a normal distribution with mean and SD of the original dataset

# Impute missing dates
for (i in 1:M){
    for (t in 1:T){
        for (j in 1:J){
            date[i,j,t] ~ dnorm(mu_date, tau_date)
        }
    }  
}
mu_date ~ dnorm(0, 0.01)
sigma_date ~ dunif(0, 10)
tau_date <- pow(sigma_date, -2)

It's OK that date has a mixture of data and missing values!

One way to think about it:

  • When date[i,j,t] is not NA: information flows from left to right (the data inform mu_date and sigma_date)
  • When date[i,j,t] is NA: information flows from right to left (the distribution generates, i.e. imputes, the missing value)
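One hedged illustration of the second case: if you also monitored the date node (not done in this tutorial; it is a large array), the observed entries would come back as constants while the NA entries would have genuine posterior draws, i.e. the imputed values.

# Illustration only: add the partially observed date node to the monitors
pars_check <- c(pars, "date")   # monitors the whole M x J x T array
# After refitting with parameters.to.save = pars_check, an imputed entry
# such as date[43,3,1] has a posterior distribution:
# hist(fit$sims.list$date[, 43, 3, 1])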

Complete model ... again¶

In [22]:
cat("

model{

# State submodel
for (i in 1:M){
    
    z[i, 1] ~ dbern(psi[i,1])

    for (t in 2:T){
        z[i, t] ~ dbern(z[i, t-1] * (1 - eps) + (1 - z[i, t-1]) * gam[i,t])
    }
}

# ----------------------

# Calculate psi, gamma, eps
for (i in 1:M){

    logit(psi[i,1]) <- alpha_0 + alpha_ele * ele[i]

    for (t in 2:T){
        logit(gam[i,t]) <- beta_0 + beta_forest * forest[i]
    }
}

# psi priors
alpha_0 ~ dlogis(0, 1)
alpha_ele ~ dlogis(0, 1)
# gamma priors
beta_0 ~ dlogis(0, 1)
beta_forest ~ dlogis(0, 1)
# eps priors
eps ~ dunif(0, 1)

# -----------------------

# Detection submodel
for (i in 1:M){
    for (t in 1:T){
        for (j in 1:J){
            y[i,j,t] ~ dbern(p[i,j,t] * z[i, t])
        }
    }
}

# -----------------------

# Impute missing dates
for (i in 1:M){
    for (t in 1:T){
        for (j in 1:J){
            date[i,j,t] ~ dnorm(mu_date, tau_date)
        }
    }  
}
mu_date ~ dnorm(0, 0.01)
sigma_date ~ dunif(0, 10)
tau_date <- pow(sigma_date, -2)

# -----------------------

# Calculate p
for (i in 1:M){
    for (t in 1:T){
        for (j in 1:J){
            logit(p[i,j,t]) <- rho_0 + rho_date * date[i,j,t]
        }
    }  
}
rho_0 ~ dlogis(0, 1)
rho_date ~ dlogis(0, 1)

}
", file = "dynocc_covs.bugs")
In [23]:
pars <- c("alpha_0", "alpha_ele", "beta_0", "beta_forest",
          "rho_0", "rho_date", "mu_date", "sigma_date")

inits <- function() list(z = matrix(1, M, T))

library(jagsUI)
jags_covs <- jags(data = jags_data, 
                  inits = inits, 
                  parameters.to.save = pars, 
                  model.file = "dynocc_covs.bugs",
                  n.chains = 3, n.iter = 1000, n.burnin = 500, 
                  parallel=TRUE)
Processing function input....... 

Done. 
 
Beginning parallel processing using 3 cores. Console output will be suppressed.

Parallel processing completed.

Calculating statistics....... 

Done. 
In [24]:
jags_covs
jagsUI::traceplot(jags_covs)
JAGS output for model 'dynocc_covs.bugs', generated by jagsUI.
Estimates based on 3 chains of 1000 iterations,
adaptation = 100 iterations (sufficient),
burn-in = 500 iterations and thin rate = 1,
yielding 1500 total samples from the joint posterior. 
MCMC ran in parallel for 1.592 minutes at time 2023-07-18 13:54:03.252241.

                 mean     sd      2.5%       50%     97.5% overlap0     f  Rhat
alpha_0        -0.881  0.156    -1.169    -0.882    -0.576    FALSE 1.000 1.005
alpha_ele       0.508  0.147     0.220     0.506     0.821    FALSE 1.000 1.004
beta_0         -1.851  0.110    -2.065    -1.847    -1.646    FALSE 1.000 1.013
beta_forest     0.702  0.095     0.516     0.701     0.899    FALSE 1.000 1.008
rho_0           0.103  0.056    -0.010     0.103     0.213     TRUE 0.965 1.008
rho_date        0.219  0.044     0.134     0.219     0.304    FALSE 1.000 1.001
mu_date         0.000  0.012    -0.025     0.000     0.023     TRUE 0.502 1.000
sigma_date      1.000  0.008     0.984     1.000     1.017    FALSE 1.000 1.001
deviance    22388.215 74.134 22246.979 22384.989 22538.551    FALSE 1.000 1.010
            n.eff
alpha_0       441
alpha_ele     458
beta_0        151
beta_forest   298
rho_0         299
rho_date     1010
mu_date      1500
sigma_date   1500
deviance      213

Successful convergence based on Rhat values (all < 1.1). 
Rhat is the potential scale reduction factor (at convergence, Rhat=1). 
For each parameter, n.eff is a crude measure of effective sample size. 

overlap0 checks if 0 falls in the parameter's 95% credible interval.
f is the proportion of the posterior with the same sign as the mean;
i.e., our confidence that the parameter is positive or negative.

DIC info: (pD = var(deviance)/2) 
pD = 2725.7 and DIC = 25113.87 
DIC is an estimate of expected predictive error (lower is better).

Plot effect of elevation on psi¶

  1. Generate sequence of elevation values
  2. Standardize them
  3. Plug them into the linear predictor for psi
  4. Plot results
In [25]:
elev_rng <- range(crossbill$ele)
elev_seq <- seq(elev_rng[1], elev_rng[2], length.out=100) # 1.
elev_std <- (elev_seq - mean(crossbill$ele)) / sd(crossbill$ele) # 2.
head(elev_std)
[1] -1.466059 -1.426646 -1.387233 -1.347820 -1.308407 -1.268994
$$ \mathrm{logit}(\psi_{i,1}) = \alpha_0 + \alpha_{ele} \cdot \mathrm{ele}_i $$
logit(psi[i,1]) <- alpha_0 + alpha_ele * ele[i]
In [26]:
psi_est <- jags_covs$mean$alpha_0 + jags_covs$mean$alpha_ele * elev_std # 3.
psi_est <- plogis(psi_est)
head(psi_est)
[1] 0.1645080 0.1672765 0.1700822 0.1729252 0.1758056 0.1787237
In [27]:
plot(elev_seq, psi_est, type='l', xlab="Elevation", ylab="psi") # 4.

What about the 95% credible interval?

We need to calculate psi for each saved iteration in the JAGS output (not just the posterior mean).

In [28]:
nsamples <- jags_covs$mcmc.info$n.samples
nsamples
1500
In [29]:
psi_post <- matrix(NA, nrow=nsamples, ncol=100)
for (i in 1:100){
  psi_post[,i] <- jags_covs$sims.list$alpha_0 + jags_covs$sims.list$alpha_ele * elev_std[i]
}
psi_post <- plogis(psi_post)
psi_post[1:5,1:5]
A matrix: 5 × 5 of type dbl
0.1538320 0.1567736 0.1597607 0.1627938 0.1658732
0.1874531 0.1904333 0.1934497 0.1965022 0.1995909
0.2345080 0.2366951 0.2388962 0.2411113 0.2433404
0.2296686 0.2315533 0.2334487 0.2353549 0.2372719
0.1844937 0.1872250 0.1899872 0.1927805 0.1956050
In [30]:
psi_lower <- apply(psi_post, 2, quantile, 0.025) # lower bound of 95% ci
psi_upper <- apply(psi_post, 2, quantile, 0.975) # upper bound of 95% ci

head(psi_lower)
head(psi_upper)
[1] 0.09931026 0.10193656 0.10447124 0.10730957 0.11047750 0.11390621
[1] 0.2518344 0.2541165 0.2563221 0.2583171 0.2599927 0.2619797

Plot in base R

In [31]:
plot(elev_seq, psi_est, type='l', xlab="Elevation", ylab="psi", ylim=c(0,0.8))
polygon(c(elev_seq, rev(elev_seq)),
        c(psi_lower, rev(psi_upper)), 
        col='grey90', border=NA)
lines(elev_seq, psi_est)

Plot in ggplot2

In [32]:
library(ggplot2)

plot_data <- data.frame(elev=elev_seq, psi=psi_est, lower=psi_lower, upper=psi_upper)
head(plot_data)
A data.frame: 6 × 4
      elev       psi      lower     upper
1 250.0000 0.1645080 0.09931026 0.2518344
2 275.2525 0.1672765 0.10193656 0.2541165
3 300.5051 0.1700822 0.10447124 0.2563221
4 325.7576 0.1729252 0.10730957 0.2583171
5 351.0101 0.1758056 0.11047750 0.2599927
6 376.2626 0.1787237 0.11390621 0.2619797
In [33]:
ggplot(data=plot_data, aes(x=elev)) +
  geom_ribbon(aes(ymin=lower, ymax=upper), alpha=0.2) +
  geom_line(aes(y=psi))

Estimate occupancy probability for years 2...T¶

Right now we only estimate $\psi$ at $t=1$!

What about for other times?

  • For this we need a derived parameter, that is a function of other parameters already in the model

  • $\psi_{t}$ at times $t>1$ is a function of occupancy in the previous time step, extinction, and colonization

$$\psi_{i,t} = \psi_{i,t-1} \cdot (1-\epsilon) + (1 - \psi_{i,t-1}) \cdot \gamma_{i,t} $$

This math is very similar to JAGS code we already have in our model:

for (t in 2:T){
    z[i, t] ~ dbern(z[i, t-1] * (1 - eps) + (1 - z[i, t-1]) * gam[i,t])
}

We just need to replace z with psi.

# Derive psi at t>1
for (i in 1:M){
    for (t in 2:T){
        psi[i, t] <- psi[i, t-1] * (1 - eps) + (1 - psi[i, t-1]) * gam[i,t]
    }
}
  • Note that psi is now indexed by both i and t
In [34]:
cat("

model{

# State submodel
for (i in 1:M){
    
    z[i, 1] ~ dbern(psi[i,1])

    for (t in 2:T){
        z[i, t] ~ dbern(z[i, t-1] * (1 - eps) + (1 - z[i, t-1]) * gam[i,t])
    }
}

# ----------------------

# Calculate psi, gamma, eps
for (i in 1:M){

    logit(psi[i,1]) <- alpha_0 + alpha_ele * ele[i]

    for (t in 2:T){
        logit(gam[i,t]) <- beta_0 + beta_forest * forest[i]
    }
}

# psi priors
alpha_0 ~ dlogis(0, 1)
alpha_ele ~ dlogis(0, 1)
# gamma priors
beta_0 ~ dlogis(0, 1)
beta_forest ~ dlogis(0, 1)
# eps priors
eps ~ dunif(0, 1)

# -----------------------

# Detection submodel
for (i in 1:M){
    for (t in 1:T){
        for (j in 1:J){
            y[i,j,t] ~ dbern(p[i,j,t] * z[i, t])
        }
    }
}
# -----------------------

# Impute missing dates
for (i in 1:M){
    for (t in 1:T){
        for (j in 1:J){
            date[i,j,t] ~ dnorm(mu_date, tau_date)
        }
    }  
}
mu_date ~ dnorm(0, 0.01)
sigma_date ~ dunif(0, 10)
tau_date <- pow(sigma_date, -2)

# -----------------------

# Calculate p
for (i in 1:M){
    for (t in 1:T){
        for (j in 1:J){
            logit(p[i,j,t]) <- rho_0 + rho_date * date[i,j,t]
        }
    }  
}
rho_0 ~ dlogis(0, 1)
rho_date ~ dlogis(0, 1)

# -----------------------

# Derive psi at t>1
for (i in 1:M){
    for (t in 2:T){
        psi[i, t] <- psi[i, t-1] * (1 - eps) + (1 - psi[i, t-1]) * gam[i,t]
    }
}

}
", file = "dynocc_covs.bugs")
In [35]:
pars <- c("alpha_0", "alpha_ele", "beta_0", "beta_forest",
          "rho_0", "rho_date", "mu_date", "sigma_date", "psi")

inits <- function() list(z = matrix(1, M, T))

library(jagsUI)
jags_covs <- jags(data = jags_data, 
                  inits = inits, 
                  parameters.to.save = pars, 
                  model.file = "dynocc_covs.bugs",
                  n.chains = 3, n.iter = 1000, n.burnin = 500, 
                  parallel=TRUE)
Processing function input....... 

Done. 
 
Beginning parallel processing using 3 cores. Console output will be suppressed.

Parallel processing completed.

Calculating statistics....... 

Done. 
In [36]:
dim(jags_covs$sims.list$psi)
[1] 1500  267    9

Plot psi over time for sites 1-5

In [37]:
psi_est <- psi_lower <- psi_upper <- matrix(NA, nrow=5, ncol=9)
for (t in 1:9){
  for (i in 1:5){
    psi_it <- jags_covs$sims.list$psi[,i,t]
    psi_est[i,t] <- mean(psi_it)
    psi_lower[i,t] <- quantile(psi_it, 0.025)
    psi_upper[i,t] <- quantile(psi_it, 0.975)
  }
}
In [38]:
cols <- rainbow(5)
plot(1:9, psi_est[1,1:9], ylim=c(0,0.5), type='o', pch=19, 
     col = cols[1],
     ylab="psi", xlab="time")
for (i in 2:5){
  lines(1:9, psi_est[i,1:9], type='o', pch=19, col=cols[i])    
}

Exercise¶

Use JAGS to fit the following model:

  • initial occupancy: elevation + elevation^2 + forest
  • colonization: forest
  • extinction: elevation
  • detection: date

Also try plotting the effect of elevation on extinction.

JAGS model

In [39]:
cat("

model{

# State submodel
for (i in 1:M){
    
    z[i, 1] ~ dbern(psi[i,1])

    for (t in 2:T){
        z[i, t] ~ dbern(z[i, t-1] * (1 - eps[i,t]) + (1 - z[i, t-1]) * gam[i,t])
    }
}

# ----------------------

# Calculate psi, gamma, eps
for (i in 1:M){

    logit(psi[i,1]) <- alpha_0 + alpha_ele * ele[i] + alpha_ele2 * ele2[i] + alpha_forest * forest[i]

    for (t in 2:T){
        logit(gam[i,t]) <- beta_0 + beta_forest * forest[i]
        logit(eps[i,t]) <- eps_0 + eps_ele * ele[i]
    }
}

# psi priors
alpha_0 ~ dlogis(0, 1)
alpha_ele ~ dlogis(0, 1)
alpha_ele2 ~ dlogis(0, 1)
alpha_forest ~ dlogis(0, 1)
# gamma priors
beta_0 ~ dlogis(0, 1)
beta_forest ~ dlogis(0, 1)
# eps priors
eps_0 ~ dlogis(0,1)
eps_ele ~ dlogis(0, 1)

# -----------------------

# Detection submodel
for (i in 1:M){
    for (t in 1:T){
        for (j in 1:J){
            y[i,j,t] ~ dbern(p[i,j,t] * z[i, t])
        }
    }
}
# -----------------------

# Impute missing dates
for (i in 1:M){
    for (t in 1:T){
        for (j in 1:J){
            date[i,j,t] ~ dnorm(mu_date, tau_date)
        }
    }  
}
mu_date ~ dnorm(0, 0.01)
sigma_date ~ dunif(0, 10)
tau_date <- pow(sigma_date, -2)

# -----------------------

# Calculate p
for (i in 1:M){
    for (t in 1:T){
        for (j in 1:J){
            logit(p[i,j,t]) <- rho_0 + rho_date * date[i,j,t]
        }
    }  
}
rho_0 ~ dlogis(0, 1)
rho_date ~ dlogis(0, 1)

# -----------------------

# Derive psi at t>1
for (i in 1:M){
    for (t in 2:T){
        psi[i, t] <- psi[i, t-1] * (1 - eps[i,t]) + (1 - psi[i, t-1]) * gam[i,t]
    }
}

}
", file = "dynocc_exercise.bugs")
In [40]:
jags_data$ele2 <- jags_data$ele^2

pars <- c("alpha_0", "alpha_ele", "alpha_ele2", "alpha_forest", "beta_0", "beta_forest",
          "rho_0", "rho_date", "eps_0", "eps_ele")

inits <- function() list(z = matrix(1, M, T))

library(jagsUI)
jags_exercise <- jags(data = jags_data, 
                  inits = inits, 
                  parameters.to.save = pars, 
                  model.file = "dynocc_exercise.bugs",
                  n.chains = 3, n.iter = 1000, n.burnin = 500, 
                  parallel=TRUE)
Processing function input....... 

Done. 
 
Beginning parallel processing using 3 cores. Console output will be suppressed.

Parallel processing completed.

Calculating statistics....... 

Done. 
In [41]:
jags_exercise
JAGS output for model 'dynocc_exercise.bugs', generated by jagsUI.
Estimates based on 3 chains of 1000 iterations,
adaptation = 100 iterations (sufficient),
burn-in = 500 iterations and thin rate = 1,
yielding 1500 total samples from the joint posterior. 
MCMC ran in parallel for 1.679 minutes at time 2023-07-18 13:57:18.6436.

                  mean     sd      2.5%       50%     97.5% overlap0     f
alpha_0          0.019  0.292    -0.567     0.015     0.617     TRUE 0.519
alpha_ele        1.340  0.258     0.846     1.339     1.872    FALSE 1.000
alpha_ele2      -1.262  0.290    -1.855    -1.253    -0.721    FALSE 1.000
alpha_forest     0.424  0.205     0.036     0.427     0.851    FALSE 0.983
beta_0          -1.781  0.102    -1.997    -1.778    -1.595    FALSE 1.000
beta_forest      0.715  0.091     0.536     0.716     0.899    FALSE 1.000
rho_0            0.120  0.058     0.005     0.121     0.228    FALSE 0.980
rho_date         0.181  0.046     0.091     0.181     0.266    FALSE 1.000
eps_0           -1.191  0.148    -1.488    -1.192    -0.913    FALSE 1.000
eps_ele         -1.062  0.178    -1.414    -1.052    -0.714    FALSE 1.000
deviance     22365.496 68.972 22238.201 22362.378 22500.734    FALSE 1.000
              Rhat n.eff
alpha_0      1.035    68
alpha_ele    1.019   130
alpha_ele2   1.045    53
alpha_forest 1.006   394
beta_0       1.015   147
beta_forest  1.003   632
rho_0        1.004   608
rho_date     1.001  1500
eps_0        1.011   184
eps_ele      1.008   301
deviance     1.012   173

Successful convergence based on Rhat values (all < 1.1). 
Rhat is the potential scale reduction factor (at convergence, Rhat=1). 
For each parameter, n.eff is a crude measure of effective sample size. 

overlap0 checks if 0 falls in the parameter's 95% credible interval.
f is the proportion of the posterior with the same sign as the mean;
i.e., our confidence that the parameter is positive or negative.

DIC info: (pD = var(deviance)/2) 
pD = 2354 and DIC = 24719.48 
DIC is an estimate of expected predictive error (lower is better).
In [42]:
eps_est <- jags_exercise$mean$eps_0 + jags_exercise$mean$eps_ele * elev_std
eps_est <- plogis(eps_est)

eps_post <- matrix(NA, nrow=nsamples, ncol=100)
for (i in 1:100){
  eps_post[,i] <- jags_exercise$sims.list$eps_0 + jags_exercise$sims.list$eps_ele * elev_std[i]
}
eps_post <- plogis(eps_post)

eps_lower <- apply(eps_post, 2, quantile, 0.025) # lower bound of 95% ci
eps_upper <- apply(eps_post, 2, quantile, 0.975) # upper bound of 95% ci

plot(elev_seq, eps_est, type='l', xlab="Elevation", ylab="eps", ylim=c(0,0.8))
polygon(c(elev_seq, rev(elev_seq)),
        c(eps_lower, rev(eps_upper)), 
        col='grey90', border=NA)
lines(elev_seq, eps_est)

Dynamic occupancy model in unmarked¶

Look at the data again:

In [43]:
head(crossbill)
A data.frame: 6 × 58
   id  ele forest surveys det991 det992 det993 det001 det002 det003 ... date071 date072 date073
1   1  450      3       3      0      0      0      0      0      0 ...      20      46      54
2   2  450     21       3      0      0      0      0      0      0 ...      19      50      56
3   3 1050     32       3     NA     NA     NA      0      0      0 ...      22      55      61
4   4  950      9       3      0      0      0      1      0      0 ...      22      49      70
5   5 1150     35       3      0      0      0      1      1      1 ...      15      43      60
6   6  550      2       3     NA     NA     NA      0      0      0 ...      21      40      49

Grab the detection/non-detection data.

This time we want to keep it in "wide" format (M x TJ), because that's what unmarked uses.

In [44]:
ycols <- grepl("det", names(crossbill))
y <- crossbill[,ycols]
names(y)
head(y)
 [1] "det991" "det992" "det993" "det001" "det002" "det003" "det011" "det012"
 [9] "det013" "det021" "det022" "det023" "det031" "det032" "det033" "det041"
[17] "det042" "det043" "det051" "det052" "det053" "det061" "det062" "det063"
[25] "det071" "det072" "det073"
A data.frame: 6 × 27
  det991 det992 det993 det001 det002 det003 ... det071 det072 det073
1      0      0      0      0      0      0 ...      0      0      0
2      0      0      0      0      0      0 ...      0      0      0
3     NA     NA     NA      0      0      0 ...      0      1      0
4      0      0      0      1      0      0 ...      0      1      1
5      0      0      0      1      1      1 ...      1      1      1
6     NA     NA     NA      0      0      0 ...      0      0      0

Get the site covariates:

In [45]:
site_covs <- crossbill[,1:4]
head(site_covs)
A data.frame: 6 × 4
  id  ele forest surveys
1  1  450      3       3
2  2  450     21       3
3  3 1050     32       3
4  4  950      9       3
5  5 1150     35       3
6  6  550      2       3

And the observation covariate (date):

In [46]:
datecols <- grepl("date", names(crossbill))
date <- crossbill[,datecols]
names(date)
 [1] "date991" "date992" "date993" "date001" "date002" "date003" "date011"
 [8] "date012" "date013" "date021" "date022" "date023" "date031" "date032"
[15] "date033" "date041" "date042" "date043" "date051" "date052" "date053"
[22] "date061" "date062" "date063" "date071" "date072" "date073"

Make the unmarkedFrame

Special unmarked frame type: unmarkedMultFrame

We need to specify the number of seasons (numPrimary)

In [47]:
umf <- unmarkedMultFrame(y = y, siteCovs = site_covs, obsCovs = list(date = date), numPrimary = 9)
summary(umf)
unmarkedFrame Object

267 sites
Maximum number of observations per site: 27 
Mean number of observations per site: 24.99 
Number of primary survey periods: 9 
Number of secondary survey periods: 3 
Sites with at least one detection: 165 

Tabulation of y observations:
   0    1 <NA> 
5311 1362  536 

Site-level covariates:
       id             ele           forest         surveys     
 Min.   :  1.0   Min.   : 250   Min.   : 0.00   Min.   :2.000  
 1st Qu.: 67.5   1st Qu.: 550   1st Qu.: 8.50   1st Qu.:3.000  
 Median :134.0   Median :1150   Median :33.00   Median :3.000  
 Mean   :134.0   Mean   :1189   Mean   :34.75   Mean   :2.824  
 3rd Qu.:200.5   3rd Qu.:1850   3rd Qu.:56.50   3rd Qu.:3.000  
 Max.   :267.0   Max.   :2750   Max.   :99.00   Max.   :3.000  

Observation-level covariates:
      date       
 Min.   : 10.00  
 1st Qu.: 37.00  
 Median : 55.00  
 Mean   : 54.29  
 3rd Qu.: 71.00  
 Max.   :118.00  
 NA's   :567     

Fit a null model¶

We will use the colext function.

We need to specify R formulas for four parameters:

  • $\psi$, psiformula
  • $\epsilon$, epsilonformula
  • $\gamma$, gammaformula
  • $p$, pformula

For the null model, all four will be intercept-only, or ~1.

In [48]:
mod_null <- colext(psiformula = ~1, epsilonformula = ~1, gammaformula = ~1, pformula = ~1, umf)
summary(mod_null)
Call:
colext(psiformula = ~1, gammaformula = ~1, epsilonformula = ~1, 
    pformula = ~1, data = umf)

Initial (logit-scale):
 Estimate    SE     z  P(>|z|)
   -0.795 0.154 -5.16 2.52e-07

Colonization (logit-scale):
 Estimate    SE     z  P(>|z|)
    -1.96 0.111 -17.6 1.63e-69

Extinction (logit-scale):
 Estimate    SE     z  P(>|z|)
    -1.56 0.152 -10.2 1.37e-24

Detection (logit-scale):
 Estimate     SE    z P(>|z|)
   0.0668 0.0564 1.18   0.236

AIC: 5197.441 
Number of sites: 267
optim convergence code: 0
optim iterations: 53 
Bootstrap iterations: 0 

Get parameters on the probability scale¶

In [49]:
plogis(-0.795)
0.311096082023045

Or use the backTransform function:

In [50]:
backTransform(mod_null, 'psi')
Backtransformed linear combination(s) of Initial estimate(s)

 Estimate    SE LinComb (Intercept)
    0.311 0.033  -0.795           1

Transformation: logistic 
In [51]:
backTransform(mod_null, 'det')
Backtransformed linear combination(s) of Detection estimate(s)

 Estimate     SE LinComb (Intercept)
    0.517 0.0141  0.0668           1

Transformation: logistic 
In [52]:
confint(backTransform(mod_null, 'psi'))
A matrix: 1 × 2 of type dbl
     0.025     0.975
 0.2502368 0.3792124

Or use the predict function:

In [53]:
head(predict(mod_null, 'psi'))
A data.frame: 6 × 4
  Predicted         SE     lower     upper
1 0.3110701 0.03304615 0.2502368 0.3792124
2 0.3110701 0.03304615 0.2502368 0.3792124
3 0.3110701 0.03304615 0.2502368 0.3792124
4 0.3110701 0.03304615 0.2502368 0.3792124
5 0.3110701 0.03304615 0.2502368 0.3792124
6 0.3110701 0.03304615 0.2502368 0.3792124

Comparison with JAGS estimates¶

In [54]:
cbind(JAGS=unlist(jags_null$mean[c(1,3,2,4)]),
      unmarked=plogis(coef(mod_null)))
A matrix: 4 × 2 of type dbl
         JAGS  unmarked
psi 0.3127166 0.3110701
gam 0.1246883 0.1231143
eps 0.1772180 0.1736495
p   0.5180958 0.5166957

Model with covariates¶

  • $\psi$: elevation
  • $\epsilon$: intercept-only
  • $\gamma$: forest
  • $p$: date

(same as with JAGS)

In [55]:
mod_covs <- colext(psiformula = ~scale(ele), epsilonformula = ~1, 
                   gammaformula = ~scale(forest), pformula = ~scale(date), umf)
summary(mod_covs)
Warning message:
“Some observations have been discarded because correspoding covariates were missing.”
Call:
colext(psiformula = ~scale(ele), gammaformula = ~scale(forest), 
    epsilonformula = ~1, pformula = ~scale(date), data = umf)

Initial (logit-scale):
            Estimate    SE     z  P(>|z|)
(Intercept)   -0.890 0.157 -5.65 1.60e-08
scale(ele)     0.506 0.151  3.34 8.27e-04

Colonization (logit-scale):
              Estimate     SE      z  P(>|z|)
(Intercept)     -1.857 0.1086 -17.10 1.46e-65
scale(forest)    0.705 0.0908   7.77 8.14e-15

Extinction (logit-scale):
 Estimate    SE     z  P(>|z|)
    -1.41 0.146 -9.66 4.62e-22

Detection (logit-scale):
            Estimate     SE    z  P(>|z|)
(Intercept)    0.101 0.0566 1.78 7.53e-02
scale(date)    0.219 0.0458 4.79 1.69e-06

AIC: 5103.782 
Number of sites: 267
optim convergence code: 0
optim iterations: 55 
Bootstrap iterations: 0 

Get estimates on probability scale

In [56]:
backTransform(mod_covs, 'psi')
Error in .local(obj, ...): Cannot directly backTransform an unmarkedEstimate with length > 1.
Traceback:

1. backTransform(mod_covs, "psi")
2. backTransform(mod_covs, "psi")
3. .local(obj, ...)
4. stop("Cannot directly backTransform an unmarkedEstimate with length > 1.")
In [57]:
head(predict(mod_covs, "psi"))
Warning message:
“Some observations have been discarded because correspoding covariates were missing.”
A data.frame: 6 × 4
  Predicted         SE     lower     upper
1 0.1863059 0.03803834 0.1228206 0.2724154
2 0.1863059 0.03803834 0.1228206 0.2724154
3 0.2689311 0.03249979 0.2101468 0.3371403
4 0.2536809 0.03304904 0.1944722 0.3236729
5 0.2847483 0.03239262 0.2256995 0.3522185
6 0.1985835 0.03706077 0.1356867 0.2811520
In [58]:
head(predict(mod_covs, "det"))
Warning message:
“Some observations have been discarded because correspoding covariates were missing.”
A data.frame: 6 × 4
  Predicted         SE     lower     upper
1 0.4746512 0.01801346 0.4395302 0.5100248
2 0.5368267 0.01415879 0.5089904 0.5644353
3 0.5516519 0.01478593 0.5225291 0.5804248
4 0.4721677 0.01833149 0.4364438 0.5081789
5 0.5094962 0.01466858 0.4807466 0.5381831
6 0.5590319 0.01530854 0.5288481 0.5887856

Compare with JAGS

In [59]:
cbind(JAGS=unlist(jags_covs$mean[1:6]),
      unmarked=coef(mod_covs)[c(1:4,6:7)])
A matrix: 6 × 2 of type dbl
                  JAGS   unmarked
alpha_0     -0.8942170 -0.8899519
alpha_ele    0.5087193  0.5063164
beta_0      -1.8539007 -1.8570536
beta_forest  0.7022862  0.7047543
rho_0        0.1052687  0.1006335
rho_date     0.2205359  0.2194877

Model goodness-of-fit¶

Procedure (parametric bootstrap):

  1. Calculate a fit statistic for the model fit to the real data
  2. Simulate $n$ datasets from the fitted model
  3. Calculate the fit statistic for each simulated dataset
  4. Compare the real fit statistic to the distribution of simulated fit statistics

If the model fits well, the real and simulated statistics should be similar.

Default GOF test in unmarked is sum of squared residuals (SSE).

$$ \mathrm{SSE} = \sum_{i,j,t}{(y_{i,j,t} - \hat{y}_{i,j,t})^2}, \qquad \hat{y}_{i,j,t} = \psi_{i,t} \cdot p_{i,j,t} $$
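As a rough illustration (a sketch, assuming residuals() returns $y - \hat{y}$ for this fit type), the statistic can be computed by hand:

# Sketch: the SSE statistic by hand
sse_by_hand <- function(fit){
  sum(residuals(fit)^2, na.rm = TRUE)   # sum of squared residuals
}
sse_by_hand(mod_covs)   # should match unmarked's built-in SSE(mod_covs)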

We can do parametric bootstrap with the parboot function.

In [60]:
gof <- suppressWarnings(parboot(mod_covs, statistic=SSE, nsim=30))
In [61]:
gof
plot(gof)
Call: parboot(object = mod_covs, statistic = SSE, nsim = 30)

Parametric Bootstrap Statistics:
      t0 mean(t0 - t_B) StdDev(t0 - t_B) Pr(t_B > t0)
SSE 1004          -35.8             33.1        0.806

t_B quantiles:
      0% 2.5%  25%  50%  75% 97.5% 100%
[1,] 964  973 1018 1039 1065  1092 1097

t0 = Original statistic computed from data
t_B = Vector of bootstrap samples

Alternative statistic: MacKenzie-Bailey chi-square (MacKenzie & Bailey 2004, JABES)

The test compares expected vs. observed frequencies of specific detection histories (001, 110, 101, etc.).
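As a quick illustration (not in the original code), we can tabulate the observed detection-history frequencies for one primary period; the test compares counts like these to their expected values under the fitted model:

# Observed detection histories for primary period 1
# (histories with missing occasions show up as e.g. 'NANANA')
hist_t1 <- apply(y_array[, , 1], 1, paste, collapse = "")
sort(table(hist_t1), decreasing = TRUE)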

In [62]:
library(AICcmodavg)
mb_test <- suppressWarnings(mb.gof.test(mod_covs, nsim=30))

Thoughts on GOF tests

  • unmarked's approach to calculating residuals (and thus SSE) is crude
  • M-B test struggles when there are a lot of repeated samples or a lot of NAs
  • More work needed in this area

Get $\psi$ for each time $t$¶

Use the projected function

Mean for each $t$ across sites:

In [63]:
unmarked::projected(mod_covs)
A matrix: 2 × 9 of type dbl
                   1         2         3         4         5         6         7         8        9
unoccupied 0.6989645 0.6479712 0.6216321 0.6078406 0.6005742 0.5967567 0.5947811 0.5937927 0.593331
occupied   0.3010355 0.3520288 0.3783679 0.3921594 0.3994258 0.4032433 0.4052189 0.4062073 0.406669

For each site separately:

In [64]:
proj <- unmarked::projected(mod_covs, mean = FALSE)
dim(proj)
[1]   2   9 267
In [65]:
proj <- t(proj[2,,])  # just "occupied" probability
proj[1:5,]
A matrix: 5 × 9 of type dbl
          [,1]      [,2]      [,3]      [,4]      [,5]      [,6]      [,7]      [,8]      [,9]
[1,] 0.1863059 0.2025223 0.2144960 0.2233369 0.2298648 0.2346847 0.2382436 0.2408713 0.2428115
[2,] 0.1863059 0.2302722 0.2612360 0.2830426 0.2984001 0.3092159 0.3168330 0.3221974 0.3259754
[3,] 0.2689311 0.3089392 0.3359947 0.3542911 0.3666640 0.3750312 0.3806896 0.3845160 0.3871037
[4,] 0.2536809 0.2596870 0.2640620 0.2672489 0.2695703 0.2712613 0.2724930 0.2733903 0.2740438
[5,] 0.2847483 0.3258822 0.3533398 0.3716683 0.3839029 0.3920697 0.3975212 0.4011602 0.4035893

Plot for first 5 sites

In [66]:
cols <- rainbow(5)
plot(1:9, proj[1,1:9], ylim=c(0,0.5), type='o', pch=19, 
     col = cols[1],
     ylab="psi", xlab="time")
for (i in 2:5){
  lines(1:9, proj[i,1:9], type='o', pch=19, col=cols[i])    
}

Getting CIs on these estimates

Use parboot function for parametric bootstrapping

Steps:

  • Simulate $n$ datasets from model
  • Apply statistic function to each dataset
  • Return results
In [67]:
?parboot
In [68]:
proj_output <- function(mod){
  unmarked::projected(mod)[2,] # keep only the "occupied" row
}
pb <- suppressWarnings(parboot(mod_covs, statistic=proj_output, nsim=30))
In [69]:
pb
Call: parboot(object = mod_covs, statistic = proj_output, nsim = 30)

Parametric Bootstrap Statistics:
     t0 mean(t0 - t_B) StdDev(t0 - t_B) Pr(t_B > t0)
1 0.301       6.69e-03           0.0319        0.613
2 0.352       3.94e-03           0.0249        0.613
3 0.378       2.14e-03           0.0234        0.613
4 0.392       8.93e-04           0.0238        0.613
5 0.399       1.43e-05           0.0245        0.581
6 0.403      -6.08e-04           0.0252        0.548
7 0.405      -1.05e-03           0.0257        0.516
8 0.406      -1.36e-03           0.0261        0.484
9 0.407      -1.58e-03           0.0264        0.484

t_B quantiles:
    0% 2.5%  25%  50%  75% 97.5% 100%
1 0.21 0.23 0.28 0.29 0.31  0.35 0.36
2 0.28 0.31 0.33 0.34 0.36  0.40 0.41
3 0.33 0.34 0.36 0.37 0.38  0.43 0.44
4 0.35 0.36 0.38 0.39 0.40  0.44 0.46
5 0.36 0.36 0.39 0.40 0.41  0.45 0.47
6 0.36 0.37 0.39 0.40 0.42  0.46 0.48
7 0.36 0.37 0.39 0.40 0.42  0.46 0.48
8 0.36 0.37 0.39 0.41 0.42  0.46 0.48
9 0.36 0.37 0.39 0.41 0.42  0.46 0.49

t0 = Original statistic computed from data
t_B = Vector of bootstrap samples
In [70]:
head(pb@t.star)
A matrix: 6 × 9 of type dbl
        1         2         3         4         5         6         7         8         9
0.2819571 0.3290566 0.3537790 0.3669009 0.3739101 0.3776565 0.3796460 0.3806853 0.3812114
0.2892106 0.3492453 0.3819693 0.4001991 0.4105348 0.4164738 0.4199181 0.4219258 0.4230968
0.3021260 0.3456247 0.3704970 0.3848454 0.3931749 0.3980267 0.4008527 0.4024917 0.4034333
0.3305423 0.3887629 0.4197473 0.4365379 0.4457576 0.4508602 0.4536897 0.4552508 0.4560998
0.3440246 0.3760141 0.3946504 0.4055333 0.4118947 0.4156106 0.4177751 0.4190292 0.4197493
0.2725891 0.3386260 0.3752054 0.3958530 0.4077027 0.4146015 0.4186675 0.4210884 0.4225420

Calculate 95% CIs

In [71]:
proj_lower <- apply(pb@t.star, 2, quantile, 0.025)
proj_upper <- apply(pb@t.star, 2, quantile, 0.975)
proj_est <- projected(mod_covs)[2,]

Make plot

In [72]:
plot(1:9, proj_est, ylim=c(0.2,0.5), type='o', pch=19,
     xlab="time", ylab="psi")
segments(1:9, proj_lower, 1:9, proj_upper)

Build a map of occupancy¶

Use the matching map data in the Switzerland dataset.

In [73]:
data(Switzerland)
head(Switzerland)
A data.frame: 6 × 5
       x     y elevation forest water
1 910942 54276       340      6     0
2 910942 55276       340     11     1
3 911942 54276       380      4     0
4 911942 55276       390     72     3
5 911942 56276       357      9     7
6 911942 57276       357      5     5

Convert dataset to raster format

(technically need to specify projection, but we won't worry about it)

In [74]:
library(terra)
sw_raster <- rast(Switzerland, type="xyz")
names(sw_raster) <- c("ele", "forest", "water")
plot(sw_raster)
terra 1.7.39

In [75]:
pr_map <- predict(mod_covs, type="psi", newdata=sw_raster)
plot(pr_map)

Exercise¶

Use unmarked to fit the following models:

Model 1:

  • initial occupancy: elevation + forest
  • colonization: forest
  • extinction: elevation
  • detection: date

Model 2:

  • initial occupancy: elevation + elevation^2 + forest
  • colonization: forest
  • extinction: elevation
  • detection: date

Also make a map of initial occupancy using the Switzerland dataset for both models

Hint: you can use the I() function in formulas to run a calculation, such as

~I(a^2)

In [76]:
mod_exercise1 <- colext(psiformula = ~scale(ele)+scale(forest),
                       gammaformula = ~scale(forest),
                       epsilonformula = ~scale(ele),
                       pformula = ~scale(date),
                       data = umf)
mod_exercise1
Warning message:
“Some observations have been discarded because correspoding covariates were missing.”
Call:
colext(psiformula = ~scale(ele) + scale(forest), gammaformula = ~scale(forest), 
    epsilonformula = ~scale(ele), pformula = ~scale(date), data = umf)

Initial:
              Estimate    SE     z  P(>|z|)
(Intercept)     -1.059 0.182 -5.81 6.10e-09
scale(ele)       0.715 0.182  3.92 8.76e-05
scale(forest)    0.856 0.177  4.83 1.33e-06

Colonization:
              Estimate     SE      z  P(>|z|)
(Intercept)     -1.746 0.1005 -17.36 1.63e-67
scale(forest)    0.709 0.0878   8.08 6.47e-16

Extinction:
            Estimate    SE     z  P(>|z|)
(Intercept)    -1.14 0.144 -7.91 2.51e-15
scale(ele)     -1.09 0.178 -6.10 1.05e-09

Detection:
            Estimate     SE    z  P(>|z|)
(Intercept)    0.145 0.0548 2.64 0.008266
scale(date)    0.177 0.0468 3.79 0.000152

AIC: 5039.136 
In [77]:
pr_exercise1 <- predict(mod_exercise1, 'psi', newdata=sw_raster)
plot(pr_exercise1[[1]])
In [78]:
mod_exercise2 <- colext(psiformula = ~scale(ele)+scale(forest)+I(scale(ele)^2),
                       gammaformula = ~scale(forest),
                       epsilonformula = ~scale(ele),
                       pformula = ~scale(date),
                       data = umf)
mod_exercise2
Warning message:
“Some observations have been discarded because correspoding covariates were missing.”
Call:
colext(psiformula = ~scale(ele) + scale(forest) + I(scale(ele)^2), 
    gammaformula = ~scale(forest), epsilonformula = ~scale(ele), 
    pformula = ~scale(date), data = umf)

Initial:
                 Estimate    SE        z  P(>|z|)
(Intercept)     -0.000405 0.298 -0.00136 9.99e-01
scale(ele)       1.349477 0.273  4.95019 7.41e-07
scale(forest)    0.424758 0.205  2.07119 3.83e-02
I(scale(ele)^2) -1.226293 0.299 -4.10108 4.11e-05

Colonization:
              Estimate     SE      z  P(>|z|)
(Intercept)     -1.790 0.1061 -16.87 7.16e-64
scale(forest)    0.715 0.0895   7.99 1.37e-15

Extinction:
            Estimate    SE     z  P(>|z|)
(Intercept)    -1.19 0.150 -7.95 1.86e-15
scale(ele)     -1.09 0.181 -6.03 1.65e-09

Detection:
            Estimate     SE    z  P(>|z|)
(Intercept)    0.122 0.0562 2.17 0.030286
scale(date)    0.172 0.0466 3.70 0.000216

AIC: 5017.826 
In [79]:
plotEffects(mod_exercise2, "psi", "ele")
In [80]:
pr_exercise2 <- predict(mod_exercise2, 'psi', newdata=sw_raster)
plot(pr_exercise2[[1]])

Model selection¶

Use the fitList and modSel functions.

In [81]:
fl <- fitList(mod_covs=mod_covs, mod_exercise1=mod_exercise1, mod_exercise2=mod_exercise2)
modSel(fl)
Warning message:
“Some observations have been discarded because correspoding covariates were missing.”
Warning message:
“Some observations have been discarded because correspoding covariates were missing.”
Warning message:
“Some observations have been discarded because correspoding covariates were missing.”
Warning message:
“Some observations have been discarded because correspoding covariates were missing.”
              nPars     AIC delta   AICwt cumltvWt
mod_exercise2    10 5017.83  0.00 1.0e+00     1.00
mod_exercise1     9 5039.14 21.31 2.4e-05     1.00
mod_covs          7 5103.78 85.96 2.2e-19     1.00

An alternative: ubms¶

  • unmarked-style interface
  • Fit in Bayesian framework with Stan
In [82]:
library(ubms)

mod_stan <- stan_colext(psiformula = ~scale(ele), epsilonformula = ~1, 
                        gammaformula = ~scale(forest), pformula = ~scale(date), umf,
                        chains = 3, iter = 300, refresh=0)
Attaching package: ‘ubms’


The following objects are masked from ‘package:unmarked’:

    fitList, projected


The following object is masked from ‘package:base’:

    gamma


Warning message:
“Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help. See
https://mc-stan.org/misc/warnings.html#bulk-ess”
Warning message:
“Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
Running the chains for more iterations may help. See
https://mc-stan.org/misc/warnings.html#tail-ess”
In [83]:
mod_stan
traceplot(mod_stan)
Call:
stan_colext(psiformula = ~scale(ele), gammaformula = ~scale(forest), 
    epsilonformula = ~1, pformula = ~scale(date), data = umf, 
    chains = 3, iter = 300, refresh = 0)

Occupancy (logit-scale):
            Estimate    SD   2.5%  97.5% n_eff  Rhat
(Intercept)   -0.876 0.175 -1.249 -0.546   462 1.004
scale(ele)     0.498 0.158  0.184  0.804   705 0.995

Colonization (logit-scale):
              Estimate     SD   2.5%  97.5% n_eff  Rhat
(Intercept)     -1.859 0.1069 -2.049 -1.659   400 0.999
scale(forest)    0.706 0.0893  0.529  0.864   682 0.999

Extinction (logit-scale):
 Estimate    SD  2.5% 97.5% n_eff  Rhat
    -1.41 0.138 -1.68 -1.15   448 0.995

Detection (logit-scale):
            Estimate     SD    2.5% 97.5% n_eff  Rhat
(Intercept)    0.099 0.0563 -0.0102 0.206   399 0.997
scale(date)    0.222 0.0453  0.1276 0.307   836 1.001

LOOIC: 5110.618
Runtime: 36.793 sec
'pars' not specified. Showing first 10 parameters by default.

Compare unmarked and ubms output

In [84]:
cbind(unmarked=coef(mod_covs), ubms=coef(mod_stan))
A matrix: 7 × 2 of type dbl
                     unmarked        ubms
psi(Int)           -0.8899519 -0.87648184
psi(scale(ele))     0.5063164  0.49840203
col(Int)           -1.8570536 -1.85921071
col(scale(forest))  0.7047543  0.70561970
ext(Int)           -1.4072120 -1.41252267
p(Int)              0.1006335  0.09903921
p(scale(date))      0.2194877  0.22245250

Some interesting helper functions:

In [85]:
plot_posteriors(mod_stan)
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
In [86]:
plot(mod_stan)
In [87]:
gof_test <- gof(mod_stan)
gof_test
plot(gof_test)
MacKenzie-Bailey Chi-square 
Point estimate = 404.289
Posterior predictive p = 0
In [88]:
plot_effects(mod_stan, 'state')
?stan_colext