###############################################################
#                                                             #
#    Solutions to the exercises in the book                   #
#    Kery, M. 2010. Introduction to WinBUGS for ecologists.   # 
#    Academic Press, Burlington.			      #
#                                                             #
###############################################################



# This document contains the solutions to the exercises in the book. This is mostly R and WinBUGS code, 
# therefore I chose to format this document as a simple text file, so you can open it in R directly or 
# in TinnR or similar text editors from which many people run R. 

# The document also contains some additional exercises, along with their solutions, which are not 
# contained in the first edition of the book.

# Some exercises deal with the Swiss hare data set, which you must first download from the book website 
# at www.mbr-pwrc.usgs.gov/pubanalysis/kerybook.
# This is a tab-delimited text file, which you can read into your R workspace using code similar to this:

hares <- read.table("F:/a_solutions WB book/hares.txt", header = TRUE)

# To execute the code, you can copy-paste parts of it into an open R window. Alternatively, you can open 
# the text file in Tinn-R (though beware of possible problems with sink(); check the book appendix for a 
# workaround) or any other R editor.

# All code was written and tried out with R 2.10.1. and WinBUGS 1.4.3.

# Note that you will have to set appropriately the following things, which are not documented in this code:
#    1. set an appropriate R working directory
#    2. change the bugs.directory setting in the bugs() function
#       when using a non-English computer (e.g., to "C:/Programme/WinBUGS14/" on German locales)
#    3. load the package R2WinBUGS and possibly other packages also


# Created in September 2010 by Marc Kery 
# Last changes: 20 November 2012








###############################################################
#                                                             #
#        Chapter 2: Introduction to Bayesian inference        #
#                                                             #
###############################################################




1. A classical application of Bayes rule
   -------------------------------------
   
Exercise:
Use Bayes' rule to compute the probability that you actually have the disease for which you have just 
been tested positively. What you know is this: 5% of the population to which you belong have the 
disease and the test has a sensitivity of 99% and a specificity of 95%. Denote D = 'have the disease' 
and N = 'not have the disease' and + = 'get a positive test result' and - = 'get a negative test result'.
| means 'given'. Also note that the test sensitivity is p(+|D) and specificity is p(-|N).

Solution:
Start with p(D|+) = p(+|D) * p(D) / p(+)

You can get a positive test result when you have the disease or when you don't have it. Hence, the 
denominator can be expressed as follows:

p(+) = p(+|D) * p(D) + (1-p(-|N)) * p(N)

which is this:

0.99 * 0.05 + (1-0.95) * (1-0.05)

Altogether, we get this for the probability to have the disease, given a positive test result:

p(D|+) = (0.99 * 0.05) / (0.99 * 0.05 + (1-0.95) * (1-0.05)) = 0.51

So there is only about a 51% chance of actually having the disease when you were just given a positive 
test result. Most people find this very surprising, given the very small error rates of the testing 
procedure (1 minus sensitivity and specificity, respectively). Note that this analysis assumes that 
what is effectively the prior of your analysis, p(D) = 0.05, is appropriate also for you. If there is a 
history in your family of having the disease in question, then the proportion of the general population
having the disease will NOT be a suitable prior for you in this analysis.








###############################################################
#                                                             # 
#        Chapter 4: First session in WinBUGS                  #
#                                                             #
###############################################################




1. Experiment with a few trivial errors in WinBUGS
   -----------------------------------------------
    
A colleague commented to me that she did not find this exercise very enlightening, since there are so 
many ways to make an error in coding an analysis in WinBUGS. The latter may well be true, but it is 
still very important that you get a feel for what may have gone wrong when WinBUGS crashes or reports 
an error. So here you can experiment with a few such causes for errors and see what WinBUGS tells you 
about the kind of error you made.

The part of the exercise with missing values is important for practical analyses, since those will
always include missing values, and you have to know how to deal with them. Therefore, this is probably 
the most important exercise in this section to solve.








###############################################################
#                                                             #
#        Chapter 5: Running WinBUGS from R                    #
#                                                             #
###############################################################




1. Informative priors
   ------------------
    
Retain a sample of the two data generation mechanism, i.e., one pair of data sets (with 10 or 1000 
birds measured). Then define a model with prior information included. You also have to chose the 
initial values carefully, because WinBUGS will crash when supplied with initials outside of the range 
over which a prior of a parameter is defined.


# Save model
sink("model.txt")
cat("
model {

# Priors
 population.mean ~ dunif(500, 590)	# Average believed to lie between 500 and 590 g
 precision <- 1 / population.variance
 population.variance <- population.sd * population.sd
 population.sd ~ dunif(0,100)

# Likelihood
 for(i in 1:nobs){
    mass[i] ~ dnorm(population.mean, precision)
 }
}
",fill=TRUE)
sink()


# Bundle data
win.data <- list(mass = y1000, nobs = length(y1000))

# Function to generate starting values: have to adapt that, too
inits <- function()
  list (population.mean = rnorm(1,550), population.sd = runif(1, 1, 30))


# Parameters to be monitored (= to estimate)
params <- c("population.mean", "population.sd", "population.variance")

# MCMC settings
nc <- 3		# Number of chains
ni <- 1000	# Number of draws from posterior (for each chain)
nb <- 1		# Number of draws to discard as burn-in
nt <- 1		# Thinning rate


# Start Gibbs sampler: Run model in WinBUGS and save results in object called out
out <- bugs(data = win.data, inits = inits, parameters.to.save = params, model.file = "model.txt", 
n.thin = nt, n.chains=nc, n.burnin=nb, n.iter=ni, debug=TRUE, DIC=TRUE, working.directory=getwd())

print(out, dig = 3)

# Here's the solution
> print(out, dig = 3)
Inference for Bugs model at "model.txt", fit using WinBUGS,
 3 chains, each with 1000 iterations (first 1 discarded)
 n.sims = 2997 iterations saved
                        mean     sd     2.5%     25%     50%     75%    97.5%  Rhat n.eff
population.mean      589.912  0.101  589.600  589.90  589.90  590.00  590.000 1.003  1500
population.sd         31.158  0.714   29.789   30.67   31.14   31.65   32.541 1.001  3000
population.variance  971.318 44.531  887.360  940.90  969.70 1002.00 1059.100 1.001  3000
deviance            9714.048  2.675 9711.000 9712.00 9713.00 9715.00 9721.000 1.021   550

So you see that the population mean is heavily affected by your choice of prior. This is not a bad 
thing, but rather, part of the essence of Bayesian inference: that your conclusion about a parameter
(embodied by the posterior distribution) is the result of both the information in the data AND of what
you knew or assumed about the magnitude of that parameter before getting your data.




2. Comparison of the inference about the parameters from the large and the small data sets
   ---------------------------------------------------------------------------------------
    
You have to first rewrite the model file to contain the non-informative priors. Then, you can simply 
change the 'data statements' before running the analysis.

win.data <- list(mass = y1000, nobs = length(y1000))		# To analyse the large data set
win.data <- list(mass = y10, nobs = length(y10))		# To analyse the small data set


Here's the summary of the analysis of one particular large data set:

> print( out, dig = 3)# Produces a summary of the object
Inference for Bugs model at "model.txt", fit using WinBUGS,
 3 chains, each with 1000 iterations (first 1 discarded)
 n.sims = 2997 iterations saved
                        mean     sd    2.5%    25%     50%     75%    97.5%  Rhat n.eff
population.mean      600.140  0.937  598.30  599.5  600.10  600.80  601.900 1.001  3000
population.sd         29.457  0.671   28.17   29.0   29.45   29.88   30.801 1.001  3000
population.variance  868.148 39.647  793.66  840.9  867.10  892.90  948.490 1.001  3000
deviance            9601.074  2.225 9599.00 9600.0 9600.00 9602.00 9607.000 1.012  2600
[...]

And here's the summary of the analysis of one particular small data set:

> print( out, dig = 3)# Produces a summary of the object
Inference for Bugs model at "model.txt", fit using WinBUGS,
 3 chains, each with 1000 iterations (first 1 discarded)
 n.sims = 2997 iterations saved
                       mean      sd    2.5%    25%    50%    75%    97.5%  Rhat n.eff
population.mean     587.722   8.996 569.980 582.10 587.70 593.30  605.310 1.002  1200
population.sd        26.788   8.416  16.269  21.21  25.12  30.13   48.412 1.004  1300
population.variance 788.383 611.895 264.750 449.70 630.80 907.60 2344.200 1.004  1300
deviance             92.241   2.729  89.640  90.33  91.43  93.20   99.840 1.009  1100
[...]

The large data set contains much more information about the population parameters, which are the aim 
of the estimation: the mean and the standard deviation of the body mass of the population of male 
peregrines. Therefore, the posterior means from analysing the large data set are much closer to the 
known, true values. In addition, the estimates are much more precise, i.e., the posterior sd for both 
parameters is much smaller in the large than in the small data set.




3. Some effects of the length of the Markov chains 
   -----------------------------------------------
    
For this analysis you can either take the native WinBUGS file of the analysis (i.e., the file with the
odc ending), or run WinBUGS from R and ask for only 1 or so iterations to be run. With bugs(..., debug 
= TRUE) WinBUGS then will stay open and you can run the chains for an additional 10, 100 etc. iterations 
and inspect the estimates after the required number of iterations.

# Bundle data
win.data <- list(mass = y1000, nobs = length(y1000))

# Function to generate starting values: have to adapt that, too
inits <- function()
  list (population.mean = rnorm(1, 600, 30), population.sd = runif(1, 0, 30))


# Parameters to be monitored (= to estimate)
params <- c("population.mean", "population.sd", "population.variance")

# MCMC settings
nc <- 3		
ni <- 3 	
nb <- 1		
nt <- 1		


# Start Gibbs sampler: Run model in WinBUGS and save results in object called out
out <- bugs(data = win.data, inits = inits, parameters.to.save = params, model.file = "model.txt", 
n.thin=nt, n.chains=nc, n.burnin=nb, n.iter=ni, debug=TRUE, DIC=TRUE, working.directory=getwd())

Summaries of the node statistics:

After approx. 10 iterations:
Node statistics
	 node	 mean	 sd	 MC error	2.5%	median	97.5%	start	sample
	deviance	9608.0	21.93	6.663	9599.0	9601.0	9720.0	2	30
	population.mean	600.1	1.188	0.2873	597.3	600.3	602.2	2	30
	population.sd	30.09	1.995	0.6734	27.82	29.54	38.43	2	30
	population.variance	909.2	130.5	43.74	774.2	872.3	1477.0	2	30


After approx. 100 iterations:
Node statistics
	 node	 mean	 sd	 MC error	2.5%	median	97.5%	start	sample
	deviance	9602.0	7.548	0.6984	9599.0	9600.0	9609.0	2	297
	population.mean	600.2	0.9721	0.07288	598.1	600.2	602.0	2	297
	population.sd	29.47	0.8875	0.0687	28.13	29.39	31.13	2	297
	population.variance	869.4	55.53	4.339	791.1	863.5	968.8	2	297


After approx. 1,000 iterations:
	 node	 mean	 sd	 MC error	2.5%	median	97.5%	start	sample
	deviance	9601.0	3.032	0.0779	9599.0	9600.0	9607.0	2	2997
	population.mean	600.1	0.9501	0.01912	598.2	600.1	602.0	2	2997
	population.sd	29.46	0.6804	0.01407	28.24	29.41	30.8	2	2997
	population.variance	868.2	40.58	0.8468	797.5	865.1	948.7	2	2997


After approx. 10,000 iterations:
Node statistics
	 node	 mean	 sd	 MC error	2.5%	median	97.5%	start	sample
	deviance	9601.0	2.13	0.01487	9599.0	9600.0	9606.0	2	29997
	population.mean	600.1	0.9381	0.005661	598.3	600.1	602.0	2	29997
	population.sd	29.44	0.6642	0.003665	28.18	29.43	30.77	2	29997
	population.variance	867.1	39.21	0.2173	794.1	865.9	946.9	2	29997


You see that with increasing number of MCMC samples from the posterior distributions of the parameters,
you decrease the Monte Carlo error, but that the estimates of the posterior distribution are hardly 
changed. In particular, the posterior standard deviation (sd) does not go down with increasing MCMC 
sample size. The posterior sd is a measure of the precision of the estimate of a parameter, i.e., of 
how concentrated ('narrow') the posterior distribution is. This is only a function of the data and 
of the priors (and, the model), but not of the length of the Markov chains with which you explore that 
posterior distribution.

To look at the values of the Brooks-Gelman-Rubin statistics, you can repeatedly run the analysis with 
different chain lengths and then look at the summary of the analysis produced by the bugs() function. 
You will see that with increasing chain length, the value of this convergence criterion will come down 
and become essentially equal to 1 with the largest chain length.




4. Posterior standard deviation and Monte Carlo (MC) error 
   -------------------------------------------------------
    
Look at the summaries of the node statistics above. You will see that the posterior sd does not 
systematically change as you increase chain length, but that the MC error becomes smaller and smaller. 
According to a rule of thumb, the MC error should be smaller that about 1-5% of the posterior sd for
acceptable chain length.




5. Derived quantities (compute CV); and also compute the residuals
   ---------------------------------------------------------------

Exercise:
We compute the CV and, in addition to what is asked for in the book, also add a line in the BUGS code to
compute the residuals. Note that in the book I ask for a "point estimate (with SE)". This is not 
strictly correct, but what I mean is that you should report the posterior mean as a point estimate and 
the posterior standard deviation as a measure of the precision of that estimate. The latter is analogous
to a standard error (SE) in ML inference.

Solution:
To compute the CV (in %) and the residuals, we simply add two lines that define these two estimands 
(called 'node' by WinBUGS). When you change the name of the BUGS model file (as I do below), remember to
make the appropriate change also when you call the bugs function.


# Save BUGS description of the model to working directory
sink("model_cv_resi.txt")
cat("
model {

# Priors
 population.mean ~ dunif(0,5000)
 precision <- 1 / population.variance
 population.variance <- population.sd * population.sd
 population.sd ~ dunif(0,100)
 cv <- 100*(population.sd / population.mean) 	# Line to compute the CV in %

# Likelihood
 for(i in 1:nobs){
    mass[i] ~ dnorm(population.mean, precision)
    resi[i] <- mass[i] - population.mean  	# Line to compute the residuals
 }
}
",fill=TRUE)
sink()

# Package all the stuff to be handed over to WinBUGS
# Bundle data
win.data <- list(mass = y10, nobs = length(y10)) 


# Function to generate starting values
inits <- function()
  list (population.mean = rnorm(1,600), population.sd = runif(1, 1, 30))

# Parameters to be monitored (= to estimate)
params <- c("population.mean", "population.sd", "population.variance", "precision", "cv", "resi")

# MCMC settings
nc <- 3
ni <- 1200
nb <- 200
nt <- 2

# Start Gibbs sampler: Run model in WinBUGS and save results in object called out
out <- bugs(data = win.data, inits = inits, parameters.to.save = params, model.file = "model_cv_resi.txt",
n.thin=nt, n.chains=nc, n.burnin=nb, n.iter=ni, debug=TRUE, DIC=TRUE, working.directory=getwd())

# Look at the results
print(out, dig = 3)
> print(out, dig = 3)
Inference for Bugs model at "model_cv_resi.txt", fit using WinBUGS,
 3 chains, each with 1200 iterations (first 200 discarded), n.thin = 2
 n.sims = 1500 iterations saved
                        mean      sd    2.5%     25%      50%      75%    97.5%  Rhat n.eff
population.mean      615.411  12.934 589.795 607.200  615.600  623.700  642.005 1.000  1500
population.sd         37.841  10.796  23.044  30.165   35.890   43.105   64.310 1.002   830
population.variance 1548.392 991.761 531.112 910.050 1288.000 1857.750 4136.287 1.002   830
precision              0.001   0.000   0.000   0.001    0.001    0.001    0.002 1.002   830
cv                     6.153   1.770   3.738   4.890    5.829    7.014   10.557 1.002   820
resi[1]                3.738  12.933 -22.831  -4.577    3.512   11.970   29.357 1.000  1500
resi[2]               25.865  12.933  -0.706  17.550   25.640   34.093   51.482 1.000  1500
resi[3]               13.933  12.933 -12.641   5.618   13.705   22.163   39.552 1.000  1500
resi[4]               25.859  12.933  -0.712  17.547   25.635   34.090   51.477 1.000  1500
resi[5]               15.590  12.933 -10.981   7.275   15.365   23.820   41.212 1.000  1500
resi[6]              -40.275  12.933 -66.846 -48.590  -40.500  -32.047  -14.658 1.000  1500
resi[7]               21.267  12.933  -5.304  12.950   21.040   29.492   46.887 1.000  1500
resi[8]              -68.865  12.933 -95.435 -77.180  -69.090  -60.638  -43.248 1.000  1500
resi[9]              -13.444  12.933 -40.011 -21.760  -13.670   -5.215   12.172 1.000  1500
resi[10]              21.156  12.933  -5.415  12.840   20.930   29.383   46.772 1.000  1500
deviance              99.291   2.485  96.730  97.460   98.535  100.400  106.152 1.002  1400
[...]

# So you might report the results of this analysis as follows: 
"The CV (in %, posterior mean and sd) of body mass was estimated at 6.15 (1.77)."


# To draw a picture of the posterior distribution of the CV of body mass, you could produce a histogram:
hist(out$sims.list$cv, xlab = 'Coefficient of variation (%) of body mass', ylab = 'Frequency',
breaks = 40, las = 1, col = 'grey')

# You could also draw a smoothed version of this using the function density():
plot(density(out$sims.list$cv), xlab = 'Coefficient of variation (%) of body mass', 
ylab = 'Frequency', las = 1, col = 'grey', lwd = 2, main = '')

# To look at residuals you could do this:
plot(out$mean$resi, xlab = 'Observation number', ylab = '', las = 1, lwd = 2, main = '')
abline(h = 0, lwd = 1)

# Remember that in Bayesian inference, every unknown has a (posterior) distribution. So to summarize 
(graphically) the distribution of each residual, you can do this:

index <- matrix(rep(1:10, 1500), ncol = 10, byrow = TRUE)	#...
boxplot(out$sims.list$resi ~ index, col = 'gray', xlab = 'Observation number')
abline(h = 0, lwd = 1)

 
 
 
6. Swiss hare data: Mean density
   -----------------------------

Exercise:
Summarize what the data tell you about the mean hare density. You can use the same BUGS model description as 
before in this chapter and just fit the 'model of the mean' to the variable named 'mean.density'. To get an 
impression of what might be useful ranges for the uniform distributions used to convey our ignorance about 
the mean density, we first summarize the contents of that variable.
summary(hares$mean.density)

Solution:
First, don't forget to read in the data, for instance, like this:
hares <- read.table("F:/a_solutions WB book/hares.txt", header = TRUE)

# Save BUGS description of the model to working directory
sink("model.txt")
cat("
model {

# Priors
 population.mean ~ dunif(0,100)
 precision <- 1 / population.variance
 population.variance <- population.sd * population.sd
 population.sd ~ dunif(0,100)

# Likelihood
 for(i in 1:nobs){
    y[i] ~ dnorm(population.mean, precision)
 }
}
",fill=TRUE)
sink()

# Bundle data
win.data <- list(y = hares$mean.density, nobs = length(hares$mean.density)) 


# Function to generate starting values
inits <- function()
  list (population.mean = runif(1, 0,100), population.sd = runif(1, 1, 30))

# Parameters to be monitored (= to estimate)
params <- c("population.mean")

# MCMC settings
nc <- 3		# Number of chains
ni <- 6000	# Number of draws from posterior (for each chain)
nb <- 2000	# Number of draws to discard as burn-in
nt <- 4		# Thinning rate

# Start Gibbs sampler: Run model in WinBUGS and save results in object called out
out <- bugs(data = win.data, inits = inits, parameters.to.save = params, model.file = "model.txt", 
n.thin = nt, n.chains = nc, n.burnin = nb, n.iter = ni, debug = TRUE, DIC = TRUE, working.directory = getwd())

# Look at the results
print(out, dig = 3)

> print(out, dig = 3)
Inference for Bugs model at "model.txt", fit using WinBUGS,
 3 chains, each with 1200 iterations (first 200 discarded), n.thin = 2
 n.sims = 1500 iterations saved
                    mean    sd     2.5%      25%      50%      75%    97.5%  Rhat n.eff
population.mean    4.744 0.145    4.459    4.646    4.748    4.845    5.015 1.001  1500
deviance        3735.790 2.063 3734.000 3734.000 3735.000 3736.000 3741.000 1.002  1500


So, we might report the results of this analysis in the Results section of our paper as follows:
"Mean density of European Brown hares in Switzerland (posterior mean and sd) was estimated at
4.74 (0.15), with a 95% Bayesian credible interval of (4.46, 5.02). "








###############################################################
#                                                             #
#        Chapter 6: Basics of linear models                   #
#                                                             #
###############################################################




1. Fitting a custom-built design matrix
   ------------------------------------
 
Exercise:
Using R (lm() function), we want to fit a 'semi-interactive' ANCOVA model, using a custom-built design 
matrix, where the slope of mass on length is identical in population 1 and 3. For your convenience, here 
are part of the data again:

mass <- c(6, 8, 5, 7, 9, 11)
pop <- factor(c(1,1,2,2,3,3))
svl <- c(40, 45, 39, 50, 52, 57)

fm <- lm(mass ~ pop*svl)			# Fully interactive model which uses up all d.f.
summary(fm)

Solution:
Here is a quick solution to build the custom design matrix:
DM <- model.matrix(~ svl * pop - 1 - svl)	# Build a design matrix to start with
DM[5:6,4] <- DM[5:6,6]				# Write the last two elements in col. 6 into col 4
DM <- DM[,-6]					# Delete column 6

# Fit that custom design matrix (have to subtract intercept to override R's treatment parameterisation):
fm <- lm(mass ~ DM-1)
summary(fm)

This shows again that if you understand design matrices, you have full freedom to fit any 
kind of linear model you want, in R, WinBUGS or other useful stats packages.








###############################################################
#                                                             #
#        Chapter 7: t-tests                                   #
#                                                             #
###############################################################




1. t-test with heterogeneous variances
   ------------------------------------

Exercise:
Adapt the code to directly test for equal variances of wingspan.

Solution:
We simply add a line that computes the difference between male and female wingspan and then monitor that
node, called delta.sigma2 below. The following assumes that you have the data from chapter 7.2.2. 
in your R workspace.

sink("h.ttest.2.txt")
cat("
model {
   # Priors
   mu1 ~ dnorm(0,0.001)
   mu2 ~ dnorm(0,0.001)
   sigma1 ~ dunif(0, 1000)
   sigma2 ~ dunif(0, 1000)

   # Likelihood
   for (i in 1:n1) {
        y1[i] ~ dnorm(mu1, tau1) 
   }

   for (i in 1:n2) {
        y2[i] ~ dnorm(mu2, tau2) 
   }

   # Derived quantities
   delta.mu <- mu2 - mu1
   sigma2.1 <- sigma1 * sigma1
   sigma2.2 <- sigma2 * sigma2
   tau1 <- 1 / (sigma1 * sigma1)
   tau2 <- 1 / (sigma2 * sigma2)
   delta.sigma2 <- sigma2.2 - sigma2.1
}
",fill=TRUE)
sink()

data <- list("y1", "y2", "n1", "n2")

inits <- function(){ list(mu1=rnorm(1), mu2=rnorm(1), sigma1 = rlnorm(1), sigma2 = rlnorm(1))}

parameters <- c("mu1","mu2", "delta.mu", "delta.sigma2", "sigma1", "sigma2", "sigma2.1", "sigma2.2")

# MCMC settings
nc <- 3
ni <- 2000
nb <- 500
nt <- 1

out <- bugs (data, inits, parameters, "h.ttest.2.txt", n.thin=nt,n.chains=nc, n.burnin=nb, n.iter=ni, 
debug = TRUE)

print(out, dig = 3)

# Look at posterior for some of the variance parameters
par(mfrow = c(3,1))
hist(out$sims.list$sigma2.1)
hist(out$sims.list$sigma2.2)
hist(out$sims.list$delta.sigma2)


Since the posterior distribution of the difference between the variance of male and female wingspan overlaps
zero (which means that there is no difference), we don't have much evidence for a sex-related variability
in the wingspan of peregrine falcons




2. Effects of assumption violations on the t-test with homogeneous variances
   -------------------------------------------------------------------------

Exercise:
Simulate data that differ between two groups in the variance, but not in the mean, and see whether this 
screws a t-test that assumes constant variance.

Solution:
We will conduct a little simulation exercise wherein we repeatedly generate and then analyse data sets.
For this, it is convenient to define functions for data generation and analysis.

We first define a function to simulate the data. To make everything as comparable as possible, we 
assume an even sample size.

data.fn <- function(n1 = 50, n2 = 50, mu = 100, sigma1 = 5, sigma2 = 3){
   n1 <- n1					# Number of females
   n2 <- n2					# Number of males
   mu <- mu					# Population mean for both sexes
   sigma1 <- sigma1				# Population SD for females
   sigma2 <- sigma2				# Population SD for males
   n <- n1+n2					# Total sample size
   y1 <- rnorm(n1, mu, sigma1)			# Data for females
   y2 <- rnorm(n2, mu, sigma2)			# Data for males
   y <- c(y1, y2)					# Aggregate both data sets
   x <- rep(c(0,1), c(n1, n2))			# Indicator for male
   boxplot(y ~ x, col = "grey", xlab = "Male", ylab = "Wingspan (cm)", las = 1)
   return(list(x=x, y=y))
}

To define this function, you execute everything between the line starting with data.fn and the closing 
curly bracket. To execute it, you then type 'data.fn().'

Second, we define another function to use WinBUGS to fit the linear model underlying a t-test. Depending
on what magnitude of sigma you choose in the data generation, you will have to adapt the upper limit of
the uniform prior for that parameter:

fit.fn <- function(x = x, y = y) {
# Define BUGS model
sink("ttest.txt")
cat("
model {

# Priors
 mu1 ~ dnorm(0,0.001)
 delta ~ dnorm(0,0.001)
 tau <- 1/ (sigma * sigma)
 sigma ~ dunif(0, 50)

# Likelihood
 for (i in 1:n) {
    y[i] ~ dnorm(mu[i], tau) 
    mu[i] <- mu1 + delta *x[i]
 }

# Derived quantities: one of the greatest things about a Bayesian analysis
 mu2 <- mu1 + delta		# Difference in wingspan
}
",fill=TRUE)
sink()

# Bundle data
win.data <- list(x=x, y=y, n = length(x))

# Inits function
inits <- function(){list(mu1=rnorm(1), delta=rnorm(1), sigma = runif(1))}

# Parameters to estimate
params <- c("mu1","mu2", "delta", "sigma")

# MCMC settings
nc <- 3
ni <- 600		
nb <- 100	
nt <- 1

# Start Gibbs sampler
out <- bugs(data = win.data, inits = inits, parameters = params, model = "ttest.txt", 
n.thin = nt, n.chains = nc, n.burnin = nb, n.iter = ni, debug = FALSE, working.directory = getwd())

out
}


And third, we set up structures to contain the estimates from n.iter executions of these two functions and
then run the simulation for n.iter replicates. We will save the summary table from each analysis in a 3D
array.

n.iter <- 50		# The number of simreps
esti.table <- array(NA, dim = c(5,9,n.iter))

Assuming we have run this model once already and have the results in an object called out, we can do the
following to name the two first dimensions of the array.

rownames(esti.table) <-  rownames(out$summary)
colnames(esti.table) <-  colnames(out$summary)

start.time <- Sys.time()
for(i in 1:n.iter){
   print(i)
   data <- data.fn(n1 = 50, n2 = 50, mu = 100, sigma1 = 20, sigma2 = 10)
   fm <- fit.fn(data$x, data$y)
   esti.table[,,i] <- fm$summary
}
end.time = Sys.time()
cat('\nProcessing time: ',round(difftime(end.time,start.time,units='mins'),dig=2),' minutes\n',sep='')

We look at the n.iter simulations
esti.table

Then we summarize results of simulation:
apply(esti.table, c(1,2), mean)

>From this there does not appear to be a great sensitivity to heterogeneous variances. Not sure whether this is generally true, though.




3. Homogenous and heterogeneous-variance test for landuse effects on hare density (Swiss hare data)
   ------------------------------------------------------------------------------------------------

Exercise:
Fit the two t-test models to mean.density in the Swiss hare data set. 

Solution:
To avoid committing pseudo-replication or having to use a mixed model (with a site random effect) we
first average mean.density over years within a site.

MMD <- aggregate(hares$mean.density, by = list(hares$site), FUN = mean, na.rm = TRUE)$x
LANDUSE <- aggregate(as.numeric(hares$landuse), by = list(hares$site), FUN = mean, na.rm = TRUE)$x -1
# LANDUSE = 1 is grass and LANDUSE = 0 is arable

We fit the t-test model with homogenous variance

# Define BUGS model
sink("ttest.txt")
cat("
model {

# Priors
 mu1 ~ dnorm(0,0.001)
 delta ~ dnorm(0,0.001)
 tau <- 1/ (sigma * sigma)
 sigma ~ dunif(0, 10)

# Likelihood
 for (i in 1:n) {
    y[i] ~ dnorm(mu[i], tau) 
    mu[i] <- mu1 + delta *x[i]
 }

# Derived quantities
 mu2 <- mu1 + delta	# Mean density in grassland areas
}
",fill=TRUE)
sink()

# Bundle data
win.data <- list(x = LANDUSE, y = MMD, n = length(MMD))

# Inits function
inits <- function(){list(mu1=rnorm(1), delta=rnorm(1), sigma = rlnorm(1))}

# Parameters to estimate
params <- c("mu1","mu2", "delta", "sigma")

# MCMC settings
nc <- 3
ni <- 1200
nb <- 200
nt <- 2

# Start Gibbs sampler
out <- bugs(data = win.data, inits = inits, parameters = params, model = "ttest.txt", n.thin = nt, 
n.chains = nc, n.burnin = nb, n.iter = ni, debug = TRUE, working.directory = getwd())

print(out, dig = 3)
> print(out, dig = 3)
Inference for Bugs model at "ttest.txt", fit using WinBUGS,
 3 chains, each with 1200 iterations (first 200 discarded), n.thin = 2
 n.sims = 1500 iterations saved
            mean    sd    2.5%     25%     50%     75%   97.5%  Rhat n.eff
mu1        4.933 0.493   3.928   4.630   4.933   5.258   5.852 1.003  1500
mu2        2.954 0.786   1.480   2.399   2.949   3.489   4.512 1.003   830
delta     -1.980 0.937  -3.742  -2.627  -2.040  -1.334  -0.115 1.002  1000
sigma      3.165 0.305   2.629   2.942   3.145   3.345   3.836 1.000  1500
deviance 286.239 2.538 283.400 284.400 285.500 287.300 292.352 1.006  1200

For each parameter, n.eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

DIC info (using the rule, pD = Dbar-Dhat)
pD = 2.9 and DIC = 289.1
DIC is an estimate of expected predictive error (lower deviance is better).


So the mean density in arable areas (mu1) is estimated to be higher than that in grassland areas (mu2),
with the 95% CRI for the difference in mean density (delta) being (-3.742, -0.115). This analysis 
assumes constant variance between the two types of landuse. 



Next, we fit the model that relaxes this assumption. In addition, we add a line that computes the 
difference in the variance of mean density between the two types of landuse:

# Define BUGS model
sink("h.ttest.txt")
cat("
model {

# Priors
 mu1 ~ dnorm(0,0.001)
 mu2 ~ dnorm(0,0.001)
 tau1 <- 1 / ( sigma1 * sigma1)
 sigma1 ~ dunif(0, 100)
 tau2 <- 1 / ( sigma2 * sigma2)
 sigma2 ~ dunif(0, 100)

# Likelihood
 for (i in 1:n1) {
    y1[i] ~ dnorm(mu1, tau1) 
 }

 for (i in 1:n2) {
    y2[i] ~ dnorm(mu2, tau2) 
 }

# Derived quantities
 delta.mean <- mu2 - mu1
 delta.variance <- (1/tau2) - (1/tau1)
 
}
",fill=TRUE)
sink()

# Bundle data: have to select the right data
win.data <- list(y1 = MMD[LANDUSE == 0], y2 = MMD[LANDUSE == 1], n1 = length(MMD[LANDUSE == 0]),
 n2 = length(MMD[LANDUSE == 1]))

# Inits function
inits <- function(){ list(mu1=rnorm(1), mu2=rnorm(1), sigma1 = rlnorm(1), sigma2 = rlnorm(1))}

# Parameters to estimate
params <- c("mu1","mu2", "delta.mean", "delta.variance", "sigma1", "sigma2")

# MCMC settings
nc <- 3
ni <- 2200
nb <- 200
nt <- 2

# Start Gibbs sampler
out <- bugs(data = win.data, inits = inits, parameters = params, model = "h.ttest.txt", 
n.thin = nt, n.chains = nc, n.burnin = nb, n.iter = ni, debug = TRUE, working.directory = getwd())

print(out, dig = 3)
> print(out, dig = 3)
Inference for Bugs model at "h.ttest.txt", fit using WinBUGS,
 3 chains, each with 2200 iterations (first 200 discarded), n.thin = 2
 n.sims = 3000 iterations saved
                  mean    sd    2.5%     25%     50%     75%   97.5%  Rhat n.eff
mu1              4.921 0.559   3.770   4.559   4.930   5.289   6.036 1.002  1500
mu2              2.968 0.568   1.855   2.599   2.976   3.343   4.059 1.001  3000
delta.mean      -1.953 0.809  -3.529  -2.475  -1.958  -1.427  -0.386 1.001  3000
delta.variance  -7.102 3.935 -15.180  -9.444  -7.005  -4.705   0.545 1.001  3000
sigma1           3.510 0.423   2.797   3.208   3.466   3.764   4.487 1.002  3000
sigma2           2.274 0.475   1.586   1.941   2.196   2.520   3.427 1.001  2400
deviance       282.517 3.167 278.600 280.200 281.800 284.100 290.302 1.003  1100

So there is quite strong evidence for heterogeneous variances of counts in arable and grassland 
sites. The probability that the variance is different is this (for my MCMC sample):

mean(out$sims.list$delta.variance<0)
> mean(out$sims.list$delta.variance<0)
[1] 0.968




4. Modeling the variance
   ---------------------

Exercise:
We model the variance in the counts as a function of the elevation of a site.

Solution:
For this, we also need the elevation of each site. Then, we model a linear effect of elevation on the log
of the variance, that is, as for the mean in a linear model, we specify a linear model for a transformation
of the variance of the model.

MMD <- aggregate(hares$mean.density, by = list(hares$site), FUN = mean, na.rm = TRUE)$x
LANDUSE <- aggregate(as.numeric(hares$landuse), by = list(hares$site), FUN = mean, na.rm = TRUE)$x
# LANDUSE = 1 is arable
ELEV <- aggregate(as.numeric(hares$elevation), by = list(hares$site), FUN = mean, na.rm = TRUE)$x

Missing values in the explanatory variables of a model need special attention. Either we must specify
a prior distribution for them in the model, or we have to 'impute' them. Here, we simply replace NA's
with the mean elevation.

ELEV[ELEV == 'NaN'] <- mean(ELEV, na.rm = TRUE)

We change the notation somewhat to clarify that a linear model is used to explain variation both in the
mean AND in the variance of the response (mean density). Note that we also normalise the elevation 
covariate.


# Define model
sink("model.txt")
cat("
model {

# Priors
 alpha1 ~ dnorm(0,0.001)
 beta1 ~ dnorm(0,0.001)
 alpha2 ~ dnorm(0,0.001)
 beta2 ~ dnorm(0,0.001)

# Likelihood
 for (i in 1:n) {
    y[i] ~ dnorm(mu[i], tau[i])	# Note that both the mean and the variance are indexed by individual
        mu[i] <- alpha1 + beta1 *GRASS[i]
        tau[i] <- 1 / sigma2[i]
        log(sigma2[i]) <- alpha2 + beta2 *ELEV[i]
 }
}
",fill=TRUE)
sink()

# Bundle data
mean.elev <- mean(ELEV)
sd.elev = sd(ELEV)
st.ELEV <- (ELEV - mean.elev) / sd.elev
win.data <- list(y = MMD, GRASS = LANDUSE-1, ELEV = st.ELEV, n = length(MMD))

# Inits function
inits <- function(){list(alpha1=rnorm(1), beta1=rnorm(1), alpha2=rnorm(1), beta2=rnorm(1))}

# Parameters to estimate
params <- c("alpha1","beta1","alpha2","beta2")

# MCMC settings
nc <- 3
ni <- 6000
nb <- 2000
nt <- 2

# Start Gibbs sampler
out <- bugs(data = win.data, inits = inits, parameters = params, model = "model.txt", n.thin = nt, 
n.chains = nc, n.burnin = nb, n.iter = ni, debug = TRUE, working.directory = getwd(), bugs.directory = bd)

print(out, dig = 3)
> print(out, dig = 3)
Inference for Bugs model at "model.txt", fit using WinBUGS,
Inference for Bugs model at "model.txt", fit using WinBUGS,
 3 chains, each with 6000 iterations (first 2000 discarded), n.thin = 2
 n.sims = 6000 iterations saved
            mean    sd    2.5%     25%     50%     75%   97.5%  Rhat n.eff
alpha1     4.889 0.478   3.953   4.567   4.884   5.211   5.825 1.002  1700
beta1     -2.097 0.839  -3.728  -2.653  -2.107  -1.545  -0.416 1.001  6000
alpha2     2.231 0.196   1.876   2.093   2.222   2.361   2.637 1.002  1900
beta2     -0.355 0.211  -0.754  -0.499  -0.368  -0.222   0.089 1.001  5800
deviance 284.032 2.866 280.500 281.900 283.400 285.500 291.200 1.001  6000


So we estimate the mean density in arable areas to be 4.888 and that in grassland areas to be 2.098 less.
In addition, there is a hint of a negative relationship between the variance of the counts and elevation: sites
at greater elevations do appear to be less heterogeneous, as can be seen by the negative sign of beta2.
We can look at the 95% CRI for that parameter and see what proportion of the mass of the 
posterior is on either side of zero:

mean(out$sims.list$beta2 < 0)
> mean(out$sims.list$beta2 < 0)
[1]  0.9436667

So actually, there is quite strong evidence for such a variance-elevation relationship in these data. In
biological terms, this might suggest that higher sites vary less among each other in the suitability of their
habitat (as expressed by observed hare density).







 
###############################################################
#                                                             #
#        Chapter 8: Linear regression                         #
#                                                             #
###############################################################




1. Toy frog problem 
   ----------------

Exercise:
Fit the linear regression model to a toy data set composed of mass-length data of 5 frogs (exercise
courtesy of Oscar Ramos).

Solution:

# The data
mass <- c(10, 20, 23, 32, 35)
length <- c(5,7,10, 12, 15)
n <- length(mass)
plot(length, mass)


# Solution with R
fm <- lm(mass ~ length)
fm
abline(fm)

# Solution with WinBUGS

# Write model
sink("linreg.txt")
cat("
model {

# Priors
 alpha ~ dnorm(0,0.001)
 beta ~ dnorm(0,0.001)
 sigma ~ dunif(0, 100)

# Likelihood
 for (i in 1:n) {
    y[i] ~ dnorm(mu[i], tau) 
    mu[i] <- alpha + beta*x[i]
 }

# Derived quantities
 tau <- 1/ (sigma * sigma)

# Assess model fit using a sums-of-squares-type discrepancy
 for (i in 1:n) {
    residual[i] <- y[i]-mu[i]		# Residuals for observed data
    predicted[i] <- mu[i]		# Predicted values
    sq[i] <- pow(residual[i], 2)	# Squared residuals for observed data

# Generate replicate data and compute fit stats for them
    y.new[i] ~ dnorm(mu[i], tau) # one new data set at each MCMC iteration
    sq.new[i] <- pow(y.new[i]-predicted[i], 2)	# Squared residuals for new data
 }
 fit <- sum(sq[])			# Sum of squared residuals for actual data set
 fit.new <- sum(sq.new[])	# Sum of squared residuals for new data set
 test <- step(fit.new - fit)	# Test whether new data set more extreme, mean is Bayesian p-value
}
",fill=TRUE)
sink()

# Bundle data
win.data <- list(x=length, y=mass, n=length(length))

# Inits function
inits <- function(){ list(alpha=rnorm(1), beta=rnorm(1), sigma = rlnorm(1))}

# Parameters to estimate
params <- c("alpha", "beta", "sigma", "fit", "fit.new", "test", "predicted")

# MCMC settings
nc = 3  ;  ni=1200  ;  nb=200  ;  nt=1

# Start Gibbs sampler
out <- bugs(data = win.data, inits = inits, parameters = params, model = "linreg.txt", n.thin = nt, 
n.chains = nc, n.burnin = nb, n.iter = ni, debug = TRUE, bugs.directory="c:/Program files/WinBUGS14/")

print(out, dig = 3)
> print(out, dig = 3)
Inference for Bugs model at "linreg.txt", fit using WinBUGS,
 3 chains, each with 1200 iterations (first 200 discarded)
 n.sims = 3000 iterations saved
                mean       sd    2.5%    25%    50%     75%    97.5%  Rhat n.eff
alpha          0.105    8.496 -17.551 -3.495  0.207   3.907   17.592 1.007  1500
beta           2.435    0.872   0.640  2.076  2.421   2.785    4.144 1.012  3000
sigma          5.978    6.222   1.819  2.956  4.191   6.520   20.843 1.003  2300
fit          158.776 1001.987  25.829 33.367 49.140  95.080  772.495 1.005  1700
fit.new      379.442 2145.839   6.411 32.122 75.090 198.525 2374.349 1.001  2500
test           0.608    0.488   0.000  0.000  1.000   1.000    1.000 1.001  3000
predicted[1]  12.277    5.087   1.726 10.258 12.320  14.430   22.250 1.007   810
predicted[2]  17.147    4.242   9.296 15.630 17.190  18.812   24.691 1.014   620
predicted[3]  24.450    4.191  17.854 23.240 24.490  25.780   30.872 1.029   620
predicted[4]  29.319    4.988  21.258 27.887 29.360  30.853   37.000 1.028   820
predicted[5]  36.623    6.874  24.709 34.400 36.580  38.960   48.220 1.022  1300
deviance      28.546    5.149  22.760 24.760 27.125  30.832   41.840 1.003  1800
[...]

# Add line to plot
lines(length, out$mean$predicted, lwd = 3, col = "red")

Again, ML and Bayesian estimates are virtually identical numerically.




2. Get predictions for frogs with length 16 to 20 length units 
   -----------------------------------------------------------

One simple way to get such predictions is to simply add the five numbers 16:20 to the length data and
fill in the mass data accordingly with missing values. These NA's will then be updated automatically 
as part of the Bayesian model fitting. So we simply change the 'data statement' of the analysis and
repeat the rest. Specifying the response, y, as a node to be monitored by WinBUGS implicitly defines
only the missing elements of y to be treated as unknowns.


# Bundle data
x <- c(5,7,10,12,15, 16, 17, 18, 19, 20)
y <- c(10,20,23,32,35, NA, NA, NA, NA, NA)
n <- length(x)
win.data <- list("x","y", "n")

# Inits function
inits <- function(){ list(alpha=rnorm(1), beta=rnorm(1), sigma = rlnorm(1))}

# Parameters to estimate
params <- c("alpha","beta", "sigma", "fit", "fit.new", "predicted", "y")

# MCMC settings
nc = 3  ;  ni=1200  ;  nb=200  ;  nt=1

# Start Gibbs sampler
out <- bugs(data = win.data, inits = inits, parameters = params, model = "linreg.txt", 
n.thin = nt, n.chains = nc, n.burnin = nb, n.iter = ni, debug = TRUE, 
bugs.directory="c:/Program files/WinBUGS14/")

print(out, dig = 3)
> print(out, dig = 3)
Inference for Bugs model at "linreg.txt", fit using WinBUGS,
 3 chains, each with 1200 iterations (first 200 discarded)
 n.sims = 3000 iterations saved
                 mean       sd    2.5%    25%     50%     75%    97.5%  Rhat n.eff
alpha           0.105    7.827 -16.713 -3.485   0.261   4.030   15.761 1.002  1200
beta            2.438    0.775   0.866  2.068   2.418   2.798    4.083 1.002  1200
sigma           5.674    4.494   1.840  2.993   4.252   6.676   18.020 1.002  1000
fit           372.312  832.299  38.129 75.060 139.050 317.150 2414.349 1.002  1200
fit.new       506.481 1218.433  22.458 77.260 167.800 421.400 3463.299 1.002  1100
predicted[1]   12.295    4.596   2.729 10.200  12.310  14.460   21.860 1.003  1600
predicted[2]   17.171    3.672   9.746 15.540  17.155  18.793   24.921 1.003  2500
predicted[3]   24.486    3.323  17.929 23.210  24.480  25.790   30.770 1.001  3000
predicted[4]   29.362    3.927  21.409 27.777  29.325  30.850   37.190 1.002  3000
predicted[5]   36.676    5.568  25.495 34.388  36.650  38.972   48.291 1.003  2400
predicted[6]   39.114    6.213  26.359 36.500  39.080  41.690   52.030 1.003  2100
predicted[7]   41.552    6.884  27.339 38.638  41.485  44.460   55.982 1.003  1900
predicted[8]   43.990    7.576  28.295 40.785  43.900  47.270   59.830 1.003  1800
predicted[9]   46.428    8.282  29.248 42.850  46.325  50.073   63.655 1.003  1700
predicted[10]  48.866    9.000  30.308 44.895  48.745  52.840   67.902 1.003  1600
y[6]           39.307    9.089  20.129 35.278  39.270  43.133   58.311 1.001  3000
y[7]           41.713    9.129  23.748 37.267  41.450  45.790   61.744 1.003  2400
y[8]           43.857   10.453  22.055 39.545  43.800  48.140   64.451 1.001  3000
y[9]           46.578   11.401  25.187 41.818  46.245  51.103   69.991 1.001  3000
y[10]          48.920   11.661  25.336 43.977  48.795  53.700   73.092 1.002  2200
deviance       28.528    4.766  22.750 24.867  27.340  31.070   40.420 1.004   620


We see that the means of predicted[6:10] are the same as y[6:10], at least up to MCMC accuracy, but 
that the posterior sd's differ. This is because the predicted values refer to the expected mass for 
a frog of given length, while the y refer to the prediction of the mass of an individual frog, with 
which there is more uncertainty associated.




3. Regression of mean density on year in grassland areas (Swiss hare data)
   -----------------------------------------------------------------------

hares <- read.table("F:\\a_solutions WB book\\hares.txt", header = TRUE)

First we have to aggregate the data over years and restrict them to the grassland sites.

grass.data <- hares[hares$landuse == "grass",]
MMD <- aggregate(grass.data$mean.density, by = list(grass.data$year), FUN = mean, na.rm = TRUE)$x
year <- 1992:2008

Then we plot the data:
plot(year, MMD, type = "b", lwd = 3, col = "blue", main = "Mean observed density in grassland areas")

We fit fit first a linear and second a quadratic regression model using WinBUGS:


# Write model
sink("linreg.txt")
cat("
model {

# Priors
 alpha ~ dnorm(0,0.001)
 beta ~ dnorm(0,0.001)
 sigma ~ dunif(0, 100)
 tau <- 1/ (sigma * sigma)

# Likelihood
 for (i in 1:n) {
    y[i] ~ dnorm(mu[i], tau) 
    mu[i] <- alpha + beta*x[i]
    predicted.linear[i] <- mu[i]
 }
}
",fill=TRUE)
sink()

# Bundle data
# win.data <- list(x = year, y=MMD, n=length(MMD))  # Failure to standardise year leads to spurious convergence !
win.data <- list(x = ((year-mean(year)) / sd(year)), y=MMD, n=length(MMD))

# Inits function
inits <- function(){ list(alpha=rnorm(1), beta=rnorm(1), sigma = rlnorm(1))}

# Parameters to estimate
params <- c("alpha", "beta", "sigma", "predicted.linear")

# MCMC settings
nc = 3  ;  ni=1200  ;  nb=200  ;  nt=1

# Start Gibbs sampler
out <- bugs(data = win.data, inits = inits, parameters = params, model = "linreg.txt", n.thin = nt, 
n.chains = nc, n.burnin = nb, n.iter = ni, debug = TRUE, bugs.directory="c:/Program files/WinBUGS14/")

print(out, dig = 3)

We then plot the predicted mean density under the linear (straight-line) model:

lines(year, out$mean$predicted.linear, lwd = 3, col = "red")



Next, the quadratic (curvilinear) model:

# Write model
sink("quadreg.txt")
cat("
model {

# Priors
 alpha ~ dnorm(0,0.001)
 beta ~ dnorm(0,0.001)
 gamma ~ dnorm(0, 0.001)
 sigma ~ dunif(0, 100)
 tau <- 1/ (sigma * sigma)

# Likelihood
 for (i in 1:n) {
    y[i] ~ dnorm(mu[i], tau) 
    mu[i] <- alpha + beta*x[i] + gamma * pow(x[i], 2)
    predicted.quadratic[i] <- mu[i]
 }
}
",fill=TRUE)
sink()

# Bundle data
win.data <- list(x=((year-mean(year)) / sd(year)), y=MMD, n=length(MMD))

# Inits function
inits <- function(){ list(alpha=rnorm(1), beta=rnorm(1), sigma = rlnorm(1))}

# Parameters to estimate
params <- c("alpha", "beta", "gamma", "sigma", "predicted.quadratic")

# MCMC settings
nc = 3  ;  ni=1200  ;  nb=200  ;  nt=1

# Start Gibbs sampler
out <- bugs(data = win.data, inits = inits, parameters = params, model = "quadreg.txt", n.thin = nt, 
n.chains = nc, n.burnin = nb, n.iter = ni, debug = TRUE, bugs.directory="c:/Program files/WinBUGS14/")

print(out, dig = 3)

We then plot the predicted mean density under the quadratic model:
lines(year, out$mean$predicted.quadratic, lwd = 3, col = "brown")

There is a strong visual impression that the population decline of the Brown hare in grassland sites 
did not evolve in a straight-line fashion, but that there was a decline, followed by a stabilisation 
or even slight recovery. We can test for the quadratic effect in the trend by looking at the 
posterior distribution of the gamma parameter

hist(out$sims.list$gamma, breaks = 40, col = "grey")

The graph and the summary (above) show that zero is a very untypical value for gamma, so that we can 
assume that the population decline did not evolve linearly.

Note: This exercise provides another example for WinBUGS erroneously indicating convergence: this 
happened to me first when I forgot to normalise year in the analysis of the model with a linear 
effect of year, and all the Rhat values were close to one. It was only when I plotted the estimated 
regression line into the observed data that I noticed that something was wrong with the solution 
that WinBUGS gave me.








###############################################################
#                                                             #
#        Chapter 9: One-way ANOVA                             #
#                                                             #
###############################################################




1. Convert the ANOVA to an ANCOVA 
   ------------------------------

We want to add a single continuous explanatory variate into the (fixed-effects) one-way ANOVA. This 
transforms our model to what is often called an ANCOVA model. We won't bother with recreating the 
data to build in an effect of that covariate but just invent it and fit it to show how this is done. 
I assume that you have the data set in your R workspace already (otherwise, go and take the code to 
recreate one).

ngroups <- 5			# Number of populations
nsample <- 10			# Number of snakes in each
pop.means <- c(50, 40, 45, 55, 60) # Population mean SVL
sigma <- 3			# Residual sd
eps <- rnorm(ngroups * nsample, 0, sigma)	# Residuals 
x <- rep(1:5, rep(nsample, ngroups)) # Indicator for population
X <- as.matrix(model.matrix(~ as.factor(x)-1)) # Create design matrix
y <- as.numeric(X %*% as.matrix(pop.means) + eps)
boxplot(y~x, col="grey", xlab="Population", ylab="SVL", main="", las = 1)

hab <- rnorm(50)	# This is the covariate


# Write model
sink("ancova.txt")		# Change model name (optional)
cat("
model {
   # Priors
   for (i in 1:5){
      alpha[i] ~ dnorm(0, 0.001)
   }
   beta ~ dnorm(0, 0.01)	# Add prior for coefficient of hab
   sigma ~ dunif(0, 100)

   # Likelihood
   for (i in 1:50) {
        y[i] ~ dnorm(mean[i], tau) 
        mean[i] <- alpha[x[i]] + beta * hab[i]	# Adapt linear predictor
   }
tau <- 1/(sigma*sigma)

}
",fill=TRUE)
sink()

# Package data
win.data <- list("y", "x", "hab")	# Need to add hab to packaged data

# Define function to randomly choose inits	# Add hab covariate
inits <- function(){ list(alpha = rnorm(5, mean = mean(y)), beta = rnorm(1), sigma = rlnorm(1) )}

# Select parameters you want to estimate
params <- c("alpha", "sigma", "beta")

# MCMC settings
ni <- 1200
nb <- 200
nt <- 2
nc <- 3

# Start Gibbs sampling
out <- bugs (win.data, inits, params, "ancova.txt", n.thin=nt, n.chains=nc, n.burnin=nb, n.iter=ni, 
debug = TRUE)

print(out, dig = 3) 




2. Watch shrinkage happen 
   ----------------------

Exercise:
Fit both a fixed- and a random-effects ANOVA to the same data set and compare the group mean estimates.

Solution:
We take the data just generated in exercise 1 and fit the following model:

# Write fixed-effects ANOVA model
sink("fixefanova.txt")
cat("
model {
   # Priors
   for (i in 1:5){
      alpha[i] ~ dnorm(0, 0.001)
   }
   sigma ~ dunif(0, 100)

   # Likelihood
   for (i in 1:50) {
        y[i] ~ dnorm(mean[i], tau) 
        mean[i] <- alpha[x[i]]
   }
tau <- 1/(sigma*sigma)
}
",fill=TRUE)
sink()

# Package data
win.data <- list("y", "x")

# Inits function
inits <- function(){ list(alpha = rnorm(5, mean = mean(y)), sigma = rlnorm(1) )}

# Select parameters you want to estimate
params <- c("alpha", "sigma")

# MCMC settings
ni <- 1200   ;    nb <- 200   ;    nt <- 2    ;    nc <- 3

# Start Gibbs sampling
out.fix <- bugs (win.data, inits, params, "fixefanova.txt", n.thin=nt, n.chains=nc, n.burnin=nb, 
n.iter=ni, debug = TRUE)

print(out.fix, dig = 3) 

# Add fixed-effects group mean estimates into the plot
points(1:5, out.fix$mean$alpha, col = 'blue', cex = 2, lwd = 3)
abline(h = 50)




# Write random-effects ANOVA model
sink("ranefanova.txt")
cat("
model {
for (i in 1:5){
    pop.mean[i] ~ dnorm(mu, tau.group) # Prior for population means
    effe[i] <- pop.mean[i] - mu # Population effects as derived quant's
 }
 mu ~ dnorm(0,0.001)		# Hyperprior for grand mean svl
 sigma.group ~ dunif(0, 10)	# Hyperprior for sd of population effects
 sigma.res ~ dunif(0, 15)	# Prior for residual sd

   # Likelihood
   for (i in 1:50) {
        y[i] ~ dnorm(mean[i], tau.res) 
        mean[i] <- pop.mean[x[i]]
   }
 tau.group <- 1 / (sigma.group * sigma.group)
 tau.res <- 1 / (sigma.res * sigma.res)
}
",fill=TRUE)
sink()

# Package data
win.data <- list("y", "x")

# Inits function
inits <- function(){ list(pop.mean = rnorm(5, mean = mean(y)), sigma.res = rlnorm(1) )}

# Select parameters you want to estimate
params <- c("pop.mean", "sigma.res", "sigma.group")

# MCMC settings
ni <- 1200   ;    nb <- 200   ;    nt <- 2    ;    nc <- 3

# Start Gibbs sampling
out.ran <- bugs (win.data, inits, params, "ranefanova.txt", n.thin=nt, n.chains=nc, n.burnin=nb, 
n.iter=ni, debug = TRUE)

print(out.ran, dig = 3) 

# Add random-effects group mean estimates into the plot
points(1:5, out.ran$mean$pop.mean, col = 'red', cex = 2, lwd = 3, pch = 20)


So we see that the random-effects estimates are very slightly pulled in towards the grand mean. That is, 
they are 'shrunk' to the mean of 50.



We will observe more shrinkage by discarding most of the information in one group. That way, 
we will induce much more shrinkage in that group. We will turn the measurements of 8 snakes in 
population 5 into NA's.  Note that you won't see much shrinkage if you do this for population 1, 
whose mean is very close to the grand mean anyway.

yy <- y
yy[42:50] <- NA
boxplot(y~x, col="grey", xlab="Population", ylab="SVL", main="", las = 1)


# Fixed-effects ANOVA
# Package data
win.data <- list(y = yy, x = x)

# Inits function
inits <- function(){ list(alpha = rnorm(5, mean = mean(y)), sigma = rlnorm(1) )}

# Select parameters you want to estimate
params <- c("alpha", "sigma")

# MCMC settings
ni <- 1200   ;    nb <- 200   ;    nt <- 2    ;    nc <- 3

# Start Gibbs sampling
out.fix <- bugs (win.data, inits, params, "fixefanova.txt", n.thin=nt, n.chains=nc, n.burnin=nb, 
n.iter=ni, debug = FALSE)

print(out.fix, dig = 3) 

# Add fixed-effects group mean estimates into the plot
points(1:5, out.fix$mean$alpha, col = 'blue', cex = 2, lwd = 3)
abline(h=50)


# Random-effects ANOVA

# Inits function
inits <- function(){ list(pop.mean = rnorm(5, mean = mean(y)), sigma.res = rlnorm(1) )}

# Select parameters you want to estimate
params <- c("pop.mean", "sigma.res", "sigma.group")

# MCMC settings
ni <- 1200   ;    nb <- 200   ;    nt <- 2    ;    nc <- 3

# Start Gibbs sampling
out.ran <- bugs (win.data, inits, params, "ranefanova.txt", n.thin=nt, n.chains=nc, n.burnin=nb, 
n.iter=ni, debug = FALSE)

print(out.ran, dig = 3) 

# Add random-effects group mean estimates into the plot
points(1:5, out.ran$mean$pop.mean, col = 'red', cex = 2, lwd = 3, pch = 20)

In random-effects models, there is more shrinkage for groups about which we have less information. 
So in this case, the estimate of the mean of the last group was more shrunk than that for the other groups.




3. One-way ANOVA for mean density (Swiss hares) 
   -------------------------------------------

Exercise:
Fit both a fixed- and a random-effects ANOVA to the mean observed counts, treating years as 
replicates. Note: there are 17 years.

Exercise:
boxplot(hares$mean.density~hares$year, col="grey", xlab="Year", ylab="Mean counts", main="", las = 1)
abline(h = mean(hares$mean.density, na.rm = TRUE))

# Write fixed-effects ANOVA model
sink("fixefanova.txt")
cat("
model {
   # Priors
   for (i in 1:17){
      alpha[i] ~ dnorm(0, 0.001)
   }
   sigma ~ dunif(0, 100)

   # Likelihood
   for (i in 1:n) {
        y[i] ~ dnorm(mean[i], tau) 
        mean[i] <- alpha[x[i]]
   }
tau <- 1/(sigma*sigma)
}
",fill=TRUE)
sink()

# Package data
win.data <- list(y=hares$mean.density , x= as.numeric(factor(hares$year)), 
n = length(hares$mean.density))

# Inits function
inits <- function(){ list(alpha = rnorm(17, mean = mean(y)), sigma = rlnorm(1) )}

# Select parameters you want to estimate
params <- c("alpha", "sigma")

# MCMC settings
ni <- 1200   ;    nb <- 200   ;    nt <- 2    ;    nc <- 3

# Start Gibbs sampling
out.fix <- bugs (win.data, inits, params, "fixefanova.txt", n.thin=nt, n.chains=nc, n.burnin=nb, 
n.iter=ni, debug = TRUE)

print(out.fix, dig = 3) 

# Add fixed-effects group mean estimates into the plot
points(1:17, out.fix$mean$alpha, col = 'blue', cex = 2, lwd = 3)

# Note: the estimated means are always greater than the medians, shown by the boxplots. This is 
because the counts are skewed.




# Write random-effects ANOVA model
sink("ranefanova.txt")
cat("
model {
for (i in 1:17){
    pop.mean[i] ~ dnorm(mu, tau.group) # Prior for population means
    effe[i] <- pop.mean[i] - mu # Population effects as derived quant's
 }
 mu ~ dnorm(0,0.001)		# Hyperprior for grand mean svl
 sigma.group ~ dunif(0, 10)	# Hyperprior for sd of population effects
 sigma.res ~ dunif(0, 15)	# Prior for residual sd

   # Likelihood
   for (i in 1:n) {
        y[i] ~ dnorm(mean[i], tau.res) 
        mean[i] <- pop.mean[x[i]]
   }
 tau.group <- 1 / (sigma.group * sigma.group)
 tau.res <- 1 / (sigma.res * sigma.res)
}
",fill=TRUE)
sink()

# Package data
win.data <- list(y=hares$mean.density , x= as.numeric(factor(hares$year)), 
n = length(hares$mean.density))

# Inits function
inits <- function(){ list(pop.mean = rnorm(17, mean = mean(y)), sigma.res = rlnorm(1) )}

# Select parameters you want to estimate
params <- c("pop.mean", "sigma.res", "sigma.group")

# MCMC settings
ni <- 1200   ;    nb <- 200   ;    nt <- 2    ;    nc <- 3

# Start Gibbs sampling
out.ran <- bugs (win.data, inits, params, "ranefanova.txt", n.thin=nt, n.chains=nc, n.burnin=nb, 
n.iter=ni, debug = TRUE)

print(out.ran, dig = 3) 

# Add random-effects group mean estimates into the plot
points(1:17, out.ran$mean$pop.mean, col = 'red', cex = 2, lwd = 3, pch = 20)


Again, we clearly see the shrinkage towards the grand mean, shown by the thin line.








###############################################################
#                                                             #
#        Chapter 10: Two-way ANOVA                            #
#                                                             #
###############################################################




1. Fit the two-way ANOVA with interaction (snake toy data set)
   -----------------------------------------------------------

We fit a two-way ANOVA with interaction to the snake toy data set (chapter 6). Here are the data again:
mass <- c(6, 8, 5, 7, 9, 11)
pop <- factor(c(1,1,2,2,3,3))
region <- factor(c(1,1,1,1,2,2))
hab <- factor(c(1,2,3,1,2,3))
svl <- c(40, 45, 39, 50, 52, 57)
We want to fit a model like the one specified in R by typing lm(mass ~ pop * hab)

We may start with the code in chapter 10.5.2.


# Write model
sink("model.txt")
cat("
model {

# Priors
 for (i in 1:2){
    for(j in 1:3) {
       group.mean[i,j] ~ dnorm(0, 0.0001)
    }
 }
 sigma ~ dunif(0, 100)

# Likelihood
 for (i in 1:6) {
    mass[i] ~ dnorm(mean[i], tau) 
    mean[i] <- group.mean[region[i], hab[i]]
 }

# Derived quantities
 tau <- 1 / ( sigma * sigma)
}
",fill=TRUE)
sink()

# Bundle data
win.data <- list(mass=mass, hab = as.numeric(hab), region = as.numeric(region))

# Inits function
inits <- function(){list(sigma = rlnorm(1) )}

# Parameters to estimate
params <- c("group.mean", "sigma")

# MCMC settings
ni <- 6000
nb <- 2000
nt <- 2
nc <- 3

# Start Gibbs sampling
out <- bugs(win.data, inits, params, "model.txt", n.thin=nt, n.chains=nc, n.burnin=nb, n.iter=ni, 
debug = TRUE)

# Print estimates
print(out, dig = 3)
> print(out, dig = 3)
Inference for Bugs model at "model.txt", fit using WinBUGS,
 3 chains, each with 6000 iterations (first 2000 discarded), n.thin = 2
 n.sims = 6000 iterations saved
                  mean      sd     2.5%     25%    50%    75%   97.5%  Rhat n.eff
group.mean[1,1]  6.262  14.632  -27.362   4.089  6.466  8.663  38.055 1.009  5300
group.mean[1,2]  7.666  19.216  -37.341   4.618  7.992 11.060  49.904 1.002  4400
group.mean[1,3]  4.986  19.469  -40.871   1.823  5.037  8.178  47.974 1.003  6000
group.mean[2,1]  2.014 100.338 -187.507 -66.692  0.953 70.670 199.107 1.001  5000
group.mean[2,2]  8.528  19.195  -35.532   5.553  8.994 12.102  48.761 1.004  6000
group.mean[2,3] 10.497  20.405  -34.370   7.439 10.920 13.832  54.461 1.005  6000
sigma           13.282  17.854    0.520   1.941  5.667 16.552  67.890 1.008   280
deviance        37.076  16.149    9.026  24.250 36.745 49.740  66.010 1.006   360

For each parameter, n.eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

DIC info (using the rule, pD = Dbar-Dhat)
pD = -5.0 and DIC = 32.1
DIC is an estimate of expected predictive error (lower deviance is better).

Non-identifiability in a Bayesian analysis is perhaps slightly less easily diagnosed than in a 
classical analysis, since there is always some information about a parameter: if there is none in 
the data, then it comes from the prior. In our analysis, we have no information on snakes in habitat 
1 in region 2, hence the posterior distribution for the associated parameter is determined by the 
prior. We recognize that by the huge range of the posterior of group.mean[2,1].




2. Fit the two-way ANOVA with interaction (Swiss hare data)
   --------------------------------------------------------

Exercise:
Fit the model corresponding to what we would specify in R as lm(mean.density ~ landuse * region)

Solution:
We may again start with the code in chapter 10.5.2. We note that we have 8 regions and two landuse 
classes. We look at the data first:

attach(hares)
aggregate(mean.density, by = list(region, landuse), FUN = mean, na.rm = TRUE)
table(region, landuse)

So there are no grassland sites in three of the regions: CH.N, CH.SW and CH.W.

As a simple way to check whether the landuse effects are the same in all regions, we compute a 
derived variable called diff.density and monitor that.


# Write model
sink("model.txt")
cat("
model {

# Priors
 for (i in 1:2){
    for(j in 1:8) {
       group.mean[i,j] ~ dnorm(0, 0.0001)
    }
 }
 sigma ~ dunif(0, 100)

# Likelihood
 for (i in 1:n) {
    mean.density[i] ~ dnorm(mean[i], tau) 
    mean[i] <- group.mean[landuse[i], region[i]]
 }

# Derived quantities
 tau <- 1 / ( sigma * sigma)
 
 for(j in 1:8){
    diff.density[j] <- group.mean[1,j] - group.mean[2,j]
 }
 
}
",fill=TRUE)
sink()

# Bundle data
win.data <- list(mean.density=mean.density, region = as.numeric(region), landuse = as.numeric(landuse),
n = length(mean.density))

# Inits function
inits <- function(){list(sigma = rlnorm(1) )}

# Parameters to estimate
params <- c("group.mean", "sigma", "diff.density")

# MCMC settings
ni <- 6000
nb <- 2000
nt <- 2
nc <- 3

# Start Gibbs sampling
out <- bugs(win.data, inits, params, "model.txt", n.thin=nt, n.chains=nc, n.burnin=nb, n.iter=ni, 
debug = TRUE)

# Print estimates
print(out, dig = 3)
> print(out, dig = 3)
Inference for Bugs model at "model.txt", fit using WinBUGS,
 3 chains, each with 6000 iterations (first 2000 discarded), n.thin = 2
 n.sims = 6000 iterations saved
                    mean      sd     2.5%      25%      50%      75%    97.5%  Rhat n.eff
group.mean[1,1]    5.188   0.293    4.611    4.992    5.189    5.384    5.749 1.002  1800
group.mean[1,2]    3.136   0.908    1.366    2.533    3.128    3.749    4.949 1.001  6000
group.mean[1,3]    2.600   0.448    1.737    2.297    2.591    2.898    3.503 1.001  6000
group.mean[1,4]    3.232   0.720    1.792    2.750    3.226    3.724    4.628 1.001  2900
group.mean[1,5]    5.251   0.385    4.501    4.990    5.251    5.508    6.009 1.001  6000
group.mean[1,6]    9.625   0.406    8.839    9.349    9.623    9.901   10.420 1.001  6000
group.mean[1,7]    5.127   0.325    4.496    4.907    5.128    5.338    5.791 1.001  6000
group.mean[1,8]    4.464   0.673    3.180    3.999    4.451    4.917    5.787 1.001  6000
group.mean[2,1]    2.446   0.789    0.869    1.924    2.446    2.973    4.000 1.001  6000
group.mean[2,2]    2.863   0.653    1.572    2.430    2.864    3.286    4.136 1.001  6000
group.mean[2,3]    2.627   0.530    1.600    2.271    2.627    2.982    3.655 1.001  6000
group.mean[2,4]    3.819   0.335    3.159    3.592    3.818    4.050    4.477 1.001  6000
group.mean[2,5]    1.545 100.131 -196.305  -64.917    3.191   68.343  195.602 1.001  4200
group.mean[2,6]   -0.034  99.123 -191.400  -67.010   -0.792   67.582  194.020 1.001  3700
group.mean[2,7]    1.294 100.715 -198.205  -66.855    0.496   67.867  195.005 1.001  6000
group.mean[2,8]    0.557   1.263   -1.938   -0.310    0.548    1.395    3.025 1.001  3900
sigma              3.322   0.091    3.149    3.260    3.320    3.382    3.508 1.001  3200
diff.density[1]    2.742   0.839    1.101    2.174    2.756    3.309    4.398 1.001  6000
diff.density[2]    0.274   1.112   -1.912   -0.452    0.278    1.029    2.449 1.001  6000
diff.density[3]   -0.027   0.691   -1.371   -0.496   -0.034    0.444    1.323 1.001  6000
diff.density[4]   -0.587   0.797   -2.149   -1.123   -0.599   -0.051    0.966 1.001  3500
diff.density[5]    3.706 100.138 -190.305  -63.150    2.153   70.190  201.310 1.001  4200
diff.density[6]    9.659  99.138 -184.700  -57.932   10.255   76.825  201.105 1.001  3700
diff.density[7]    3.832 100.706 -189.905  -62.553    4.543   71.830  203.405 1.001  6000
diff.density[8]    3.907   1.447    1.080    2.936    3.918    4.872    6.762 1.001  4300
deviance        3545.099   5.291 3537.000 3541.000 3545.000 3548.000 3557.000 1.001  3200

For each parameter, n.eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

DIC info (using the rule, pD = Dbar-Dhat)
pD = 14.0 and DIC = 3559.1
DIC is an estimate of expected predictive error (lower deviance is better).


We see that the estimates for the primary parameters, for which there is no information in the data, 
the group.means for [2,5], [2,6] and [2,7] are totally influenced by their priors (see huge posterior 
sd's and 95% credible intervals). As an aside, we also  see that the effective number of parameters,
pD, is estimated correctly, because we estimate 13 mean parameters and a variance.

In addition, the difference in mean density between the two landuse types in the eight regions seems to 
differ quite a bit, as we can see from the estimates of the diff.density nodes. (Those for which there 
is no information for both arable and grassland again have huge posterior sd's and 95% CRI's). 








###############################################################
#                                                             #
#        Chapter 11: General linear model (ANCOVA)            #
#                                                             #
###############################################################




1. Probability of a parameter
   --------------------------

Exercise:
What's the probability that the mass-length slope is inferior in the Jura than in the Massif Central ?

Solution:
The derived quantity named "test1" provides the answer for this question since, this is exactly the 
contrast between the slope in the Jura mountains and that in the Massif Central. So the posterior 
distribution for that parameter gives us the answer. We plot the posterior distribution first:

hist(out$sims.list$test1, col = "gray")
abline(v = 0, lwd = 3, col = "red")

A fairly large part of the area of the posterior distribution lies to the left of the red line. This 
part corresponds to the probability that the slope is smaller in the Jura than in the Massif Central.
To get a numerical estimate of that probability, we can do the following. Note how useful it is in R 
to  be able to do calculations on logical values, where TRUE is interpreted as 1 and FALSE as 0.

mean(out$sims.list$test1 < 0)
> mean(out$sims.list$test1 < 0)
[1] 0.976

Another possibility would be to use the step function within WinBUGS.

NOTE: As always, your solutions will be numerically different, because you have a slightly different
data set and also, your MCMC samples are different from mine.




2. 'Neighbouring' models
   -------------------

It is very insightful to practice "going left and right" around a given model, i.e., go one step more
or less complex. Here, we reduce the ANCOVA with heterogeneous slopes first to an ANCOVA with 
homogeneous slopes and then to a simple linear regression. Try to do this without looking at the 
code in the book.

(b) First we make the slopes the same over all groups. This is done by removing the index of beta.

# Define model
sink("lm.txt")
cat("
model {

# Priors
 for (i in 1:n.group){		
    alpha[i] ~ dnorm(0, 0.001)	# Intercepts
 }
 beta ~ dnorm(0, 0.001)		# There is no only a single slope, so take that out of the loop
 sigma ~ dunif(0, 100)		# Residual standard deviation
 tau <- 1 / ( sigma * sigma)

# Likelihood
 for (i in 1:n) {
    mass[i] ~ dnorm(mu[i], tau)
    mu[i] <- alpha[pop[i]] + beta * length[i]  # remove the index of beta
 }
}
",fill=TRUE)
sink()

# Bundle data
win.data <- list(mass = as.numeric(mass), pop = as.numeric(pop), length = as.numeric(scale(length)), 
n.group = max(as.numeric(pop)), n = n)

# Inits function  # Also note that only a single init for beta now
inits <- function(){ list(alpha = rnorm(n.groups, 0, 2), beta = rnorm(1, 1, 1), sigma = rlnorm(1))}

# Parameters to estimate
parameters <- c("alpha", "beta", "sigma")	# No change here, since WB interprets beta as a scalar now

# MCMC settings
ni <- 1200
nb <- 200
nt <- 2
nc <- 3

# Start Markov chains
out <- bugs(win.data, inits, parameters, "lm.txt", n.thin=nt, n.chains=nc, n.burnin=nb, n.iter=ni, 
debug = TRUE)




(b) Next, we also make the intercept identical between groups. This brings us back to the simple 
linear regression. 

# Define model
sink("lm.txt")
cat("
model {

# Priors
 alpha ~ dnorm(0, 0.001)	# Intercept is also made a scalar now, no loop anymore
 beta ~ dnorm(0, 0.001)
 sigma ~ dunif(0, 100)
 tau <- 1 / ( sigma * sigma)

# Likelihood
 for (i in 1:n) {
    mass[i] ~ dnorm(mu[i], tau)
    mu[i] <- alpha + beta* length[i]  # no index on either alpha or beta
 }
}
",fill=TRUE)
sink()

# Bundle data # pop not needed anymore. WB will crash when provided with data for pop. Also, n.pop obsolete.
win.data <- list(mass = as.numeric(mass), length = as.numeric(scale(length)), n = n)

# Inits function  # Single init for alpha and for beta now
inits <- function(){ list(alpha = rnorm(1, 0, 2), beta = rnorm(1, 1, 1), sigma = rlnorm(1))}

# Parameters to estimate
parameters <- c("alpha", "beta", "sigma")	# No change here

# MCMC settings
ni <- 1200
nb <- 200
nt <- 2
nc <- 3

# Start Markov chains
out <- bugs(win.data, inits, parameters, "lm.txt", n.thin=nt, n.chains=nc, n.burnin=nb, n.iter=ni, 
debug = TRUE)




3. Add a quadratic term to the ANCOVA
   ----------------------------------

We have to decide whether we assume that the quadratic effect of length is to be the same in all three 
populations or not. Here we assume that it is different in all three populations.

# Define model
sink("lm.txt")
cat("
model {

# Priors
 for (i in 1:n.group){		
    alpha[i] ~ dnorm(0, 0.001)	# Intercepts
    beta1[i] ~ dnorm(0, 0.001)	# Linear effects
    beta2[i] ~ dnorm(0, 0.001)	# Quadratic effects of length
   
 }
 sigma ~ dunif(0, 100)		# Residual standard deviation
 tau <- 1 / ( sigma * sigma)

# Likelihood
 for (i in 1:n) {
    mass[i] ~ dnorm(mu[i], tau)
    mu[i] <- alpha[pop[i]] + beta1[pop[i]]* length[i] + beta2[pop[i]]* pow(length[i],2)
 }
}
",fill=TRUE)
sink()

# Bundle data
win.data <- list(mass = as.numeric(mass), pop = as.numeric(pop), length = as.numeric(scale(length)), 
n.group = max(as.numeric(pop)), n = n)

# Inits function
inits <- function(){ list(alpha = rnorm(n.groups, 0, 2), beta1 = rnorm(n.groups, 1, 1), 
beta2 = rnorm(n.groups, 1, 1), sigma = rlnorm(1))}

# Parameters to estimate
parameters <- c("alpha", "beta1", "beta2", "sigma")

# MCMC settings
ni <- 1200
nb <- 200
nt <- 2
nc <- 3

# Start Markov chains
out <- bugs(win.data, inits, parameters, "lm.txt", n.thin=nt, n.chains=nc, n.burnin=nb, n.iter=ni, 
debug = TRUE)

print(out, dig = 3)




4. ANCOVA pop*year for mean.density (Swiss hare data)
   --------------------------------------------------

Exercise:
Fit a pop * year model to mean density

Solution:

# Define model
sink("lm.txt")
cat("
model {

# Priors
 for (i in 1:n.group){		
    alpha[i] ~ dnorm(0, 0.001)	# Intercepts
    beta1[i] ~ dnorm(0, 0.001)	# Linear effects
   
 }
 sigma ~ dunif(0, 100)		# Residual standard deviation
 tau <- 1 / ( sigma * sigma)

# Likelihood
 for (i in 1:n) {
    mean.density[i] ~ dnorm(mu[i], tau)
    mu[i] <- alpha[pop[i]] + beta1[pop[i]]* year[i]
    residual[i] <- mean.density[i] - mu[i]
 }
}
",fill=TRUE)
sink()

# Bundle data
attach(hares)
win.data <- list(mean.density = mean.density, pop = as.numeric(site), year = as.numeric(scale(year)),
n.group = max(as.numeric(site)), n = length(mean.density))



# Inits function
inits <- function(){ list(alpha = rnorm(56, 0, 2),  beta1 = rnorm(56, 1, 1), sigma = rlnorm(1))}

# Parameters to estimate
parameters <- c("alpha", "beta1", "sigma", "residual")

# MCMC settings
ni <- 1200
nb <- 200
nt <- 2
nc <- 3

# Start Markov chains
out <- bugs(win.data, inits, parameters, "lm.txt", n.thin=nt, n.chains=nc, n.burnin=nb, n.iter=ni, 
debug = TRUE, DIC = FALSE)


print(out, dig = 3)

NOTE: 
This was a really tricky one: without the DIC=FALSE argument in the function bugs(), WinBUGS crashed 
and said something about 

-------------------------------
NIL dereference (read)

 DeviancePlugin.Monitor.Update   [00000080H] 
	.chain	INTEGER	0
	.i	INTEGER	0
	.monitor	DeviancePlugin.Monitor	[01003E90H] 
 UpdaterActions.UpdateMonitors   [0000011CH] 
	.chain	INTEGER	0
	[ MUCH more down here ]
-------------------------------

Because this cryptic error message said something about the deviance and in a similar case, switching 
off the calculation of the DIC had helped, I tried that ---- and it DID help. That's one of the 
things that one just has to know in order to succeed with some analyses with WinBUGS !


So to inspect the residuals, we can plot them:

plot(out$mean$residual)
abline(h = c(-4,-2, 0, 2, 4))
sum(abs(out$mean$residual) >4 )

We see that there are some outliers (defined here as units with absolute residuals greater than 4).
However, given that they only make up 13/952, and hence, about 1.4% of the data, this is probably 
not alarming.








###############################################################
#                                                             #
#        Chapter 12: Linear mixed model                       #
#                                                             #
###############################################################




1. Turning the LMM into an ANCOVA
   ------------------------------

Exercise:
Turn the LMM back into an ANCOVA.

Solution:
The LMM has explanatory variables pop*length, and the coefficients of both sets of coefficients
(alpha for pop and beta for length) are assumed to come from one normal distribution each. To turn
this model back into an ANCOVA, we simply remove these two distributional assumptions. I take my 
data set generated in 12.2. and the code in 12.4.2. and do the following:


# Define model
sink("model.txt")
cat("
model {

# Priors
 for (i in 1:ngroups){		
    alpha[i] ~ dnorm(0, 0.00001)	# Change the priors (don't choose them too tight !)
    beta[i] ~ dnorm(0, 0.00001)		# No hyperparams anymore
 }

# Delete the code for the hyperparams
 #mu.int ~ dnorm(0, 0.001)	# Mean hyperparameter for random intercepts
 #tau.int <- 1 / (sigma.int * sigma.int)
 #sigma.int ~ dunif(0, 100)	# SD hyperparameter for random intercepts

 #mu.slope ~ dnorm(0, 0.001)	# Mean hyperparameter for random slopes
 #tau.slope <- 1 / (sigma.slope * sigma.slope)
 #sigma.slope ~ dunif(0, 100)	# SD hyperparameter for slopes

 tau <- 1 / ( sigma * sigma)	# Residual precision
 sigma ~ dunif(0, 100)		# Residual standard deviation
 

 # Likelihood			# Note no change at all here
 for (i in 1:n) {
    mass[i] ~ dnorm(mu[i], tau)
    mu[i] <- alpha[pop[i]] + beta[pop[i]]* length[i]
 }
}
",fill=TRUE)
sink()

# Bundle data
win.data <- list(mass = as.numeric(mass), pop = as.numeric(pop), length = length, 
ngroups = max(as.numeric(pop)), n = n)

# Inits function # No inits anymore for hyperparams (they're not in the model anymore)
inits <- function(){ list(alpha = rnorm(n.groups, 0, 2), beta = rnorm(n.groups, 10, 2), 
sigma = rlnorm(1))}

# Parameters to estimate
parameters <- c("alpha", "beta", "sigma")

# MCMC settings
ni <- 2000
nb <- 500
nt <- 2
nc <- 3

# Start Gibbs sampling
out <- bugs(win.data, inits, parameters, "model.txt", n.thin=nt, n.chains=nc, n.burnin=nb, 
n.iter=ni, debug = TRUE)

print(out, dig = 2)

For fun, we can plot the data and the regression lines:

plot(length, mass, col= "grey", cex = 1.5)
for(i in 1:max(as.numeric(pop))){
   abline(out$mean$alpha[i], out$mean$beta[i], col = i)
}





2. Fit a random-coefficients regression to mean density (Swiss hares) 
   ------------------------------------------------------------------

Exercise:
Fit the model without intercept-slope correlation (mean.density ~ pop * year)

Solution:

# Define model
sink("model.txt")
cat("
model {

# Priors
 for (i in 1:ngroups){		
    alpha[i] ~ dnorm(mu.int, tau.int)	# Random intercepts
    beta[i] ~ dnorm(mu.slope, tau.slope) # Random slopes
 }

 mu.int ~ dnorm(0, 0.001)	# Mean hyperparameter for random intercepts
 tau.int <- 1 / (sigma.int * sigma.int)
 sigma.int ~ dunif(0, 100)	# SD hyperparameter for random intercepts

 mu.slope ~ dnorm(0, 0.001)	# Mean hyperparameter for random slopes
 tau.slope <- 1 / (sigma.slope * sigma.slope)
 sigma.slope ~ dunif(0, 100)	# SD hyperparameter for slopes

 tau <- 1 / ( sigma * sigma)	# Residual precision
 sigma ~ dunif(0, 100)		# Residual standard deviation

# Likelihood
 for (i in 1:n) {
    mean.density[i] ~ dnorm(mu[i], tau)
    mu[i] <- alpha[pop[i]] + beta[pop[i]]* year[i]
 }
}
",fill=TRUE)
sink()


# Bundle data
attach(hares)
win.data <- list(mean.density = mean.density, pop = as.numeric(site), year = as.numeric(scale(year)), 
ngroups = max(as.numeric(site)), n = length(mean.density))

# Inits function
inits <- function(){ list(alpha = rnorm(n.groups, 0, 2), beta = rnorm(n.groups, 10, 2), 
mu.int = rnorm(1, 0, 1), sigma.int = rlnorm(1), mu.slope = rnorm(1, 0, 1), sigma.slope = rlnorm(1),
sigma = rlnorm(1))}

# Parameters to estimate
parameters <- c("alpha", "beta", "mu.int", "sigma.int", "mu.slope", "sigma.slope", "sigma")

# MCMC settings
ni <- 2000
nb <- 500
nt <- 2
nc <- 3

# Start Gibbs sampling
out <- bugs(win.data, inits, parameters, "model.txt", n.thin=nt, n.chains=nc, n.burnin=nb, 
n.iter=ni, debug = TRUE)

print(out, dig = 2)

# compare with REML solutions
library('lme4')
lme.fit2 <- lmer(mean.density ~ scale(year) + (1 | site) + ( 0+ scale(year) | site))
lme.fit2








###############################################################
#                                                             #
#        Chapter 13: Introduction to the GLM                  #
#                                                             #
###############################################################




1. Predict mean hare density in arable areas
   -----------------------------------------

Exercise:
Add a code line that computes mean observed density in arable areas.

Solution:
We simply add a line that adds alpha and beta and exponentiate the result.

# Define model
sink("Poisson.t.test.txt")
cat("
model {

# Priors
 alpha ~ dnorm(0,0.001)
 beta ~ dnorm(0,0.001)

# Likelihood
 for (i in 1:n) {
    C[i] ~ dpois(lambda[i]) 
    log(lambda[i]) <- alpha + beta *x[i]

# Fit assessments
    Presi[i] <- (C[i] - lambda[i]) / sqrt(lambda[i])		# Pearson residuals
    C.new[i] ~ dpois(lambda[i])					# Replicate data set
    Presi.new[i] <- (C.new[i] - lambda[i]) / sqrt(lambda[i]) # Pearson resi
    D[i] <- pow(Presi[i], 2)
    D.new[i] <- pow(Presi.new[i], 2)
 }

# Compute mean hare density in arable areas
mean.arable <- exp(alpha + beta)

# Add up discrepancy measures
 fit <- sum(D[])
 fit.new <- sum(D.new[])
}
",fill=TRUE)
sink()

# Bundle data
win.data <- list(C = C, x = as.numeric(x)-1, n = length(x))

# Inits function
inits <- function(){ list(alpha=rlnorm(1), beta=rlnorm(1))}

# Parameters to estimate
params <- c("lambda","alpha", "beta", "mean.arable", "Presi", "fit", "fit.new")

# MCMC settings
nc <- 3
ni <- 3000
nb <- 1000
nt <- 2

# Start Gibbs sampler
out <- bugs(data=win.data, inits=inits, parameters.to.save=params, model.file="Poisson.t.test.txt", n.thin=nt, n.chains=nc, 
n.burnin=nb, n.iter=ni, debug = TRUE)

print(out, dig = 3)




2. Density difference as a derived quantity
   ----------------------------------------

Exercise:
Estimate the difference in density between arable and grassland areas

Solution:
We add another line of code

# Define model
sink("Poisson.t.test.txt")
cat("
model {

# Priors
 alpha ~ dnorm(0,0.001)
 beta ~ dnorm(0,0.001)

# Likelihood
 for (i in 1:n) {
    C[i] ~ dpois(lambda[i]) 
    log(lambda[i]) <- alpha + beta *x[i]

# Fit assessments
    Presi[i] <- (C[i] - lambda[i]) / sqrt(lambda[i])		# Pearson residuals
    C.new[i] ~ dpois(lambda[i])					# Replicate data set
    Presi.new[i] <- (C.new[i] - lambda[i]) / sqrt(lambda[i]) # Pearson resi
    D[i] <- pow(Presi[i], 2)
    D.new[i] <- pow(Presi.new[i], 2)
 }

# Compute difference in mean hare density
difference <- exp(alpha) - exp(alpha + beta)

# Add up discrepancy measures
 fit <- sum(D[])
 fit.new <- sum(D.new[])
}
",fill=TRUE)
sink()

# Bundle data
win.data <- list(C = C, x = as.numeric(x)-1, n = length(x))

# Inits function
inits <- function(){ list(alpha=rlnorm(1), beta=rlnorm(1))}

# Parameters to estimate
params <- c("lambda","alpha", "beta", "difference", "Presi", "fit", "fit.new")

# MCMC settings
nc <- 3
ni <- 3000
nb <- 1000
nt <- 2

# Start Gibbs sampler
out <- bugs(data=win.data, inits=inits, parameters.to.save=params, model.file="Poisson.t.test.txt", 
n.thin=nt, n.chains=nc, n.burnin=nb, n.iter=ni, debug = TRUE)

print(out, dig = 3)

To summarize the posterior distribution of the density, we plot it.

hist(out$sims.list$difference, breaks = 50, col = "grey", xlim = c(-10, 5), 
xlab = "Density difference grassland - arable land")
abline(v = 0, col = "red", lwd = 2)




3. Migrating raptors
   -----------------

Exercise:
Assume that we are counting raptors at some point where bird migration concentrates, for instance 
at an isthmus like Gibraltar or Panama, at a mountain pass like Col de Bretolet or along a ridge like
Hawk Mountain. We have seen not a single individual of some special raptor species A during 10 consecutive 
days. What is the probability to see two or more individuals of species A on day 11 ? Make explicit your 
reasoning and list the assumptions of your model used to answer the question (this exercise is inspired by
Bernardo 2003).

Solution:
We assume that raptors travel independently and that there is a constant probability that one is seen 
at the observatory. If this is so, then the data (ten zeroes) can be assumed to represent 10 observations 
from a Poisson distribution, whose mean we want to estimate. Using Bayesian posterior predictive distributions, 
it is possible the estimate the probability that on the 11th day, or indeed on any other 
particular day, at least one bird of species A is seen. 


# Define the data, i.e., the counts of raptor A counts over 10 days
C <-c(rep(0,10))


# Write model
sink("raptors.txt")
cat("
model {

 # Prior
 lambda ~ dunif(0,5)
 
 # Likelihood
 for (i in 1:n) {		# Loop over 10 observations
     C[i] ~ dpois(lambda) 	# They are assumed to come from a Poisson
}
y ~ dpois(lambda)		# To get a prediction for a new day
}
",fill=TRUE)
sink()

# Package data
win.data <- list(C = C, n = length(C))

# Inits
inits <- function(){ list(lambda=rlnorm(1))}

# Stuff to get estimates from
params <- c("lambda", "y")

# MCMC settings
nc <- 3
ni <- 15000
nb <- 3000
nt <- 2

out <- bugs(data=win.data, inits=inits, parameters.to.save=params, model.file="raptors.txt", 
n.thin=nt,n.chains=nc,n.burnin=nb, n.iter=ni, debug = TRUE, bugs.directory = "C:/Program files/WinBUGS14/")

print(out, dig = 5)
> print(out, dig = 5)
Inference for Bugs model at "raptors.txt", fit using WinBUGS,
 3 chains, each with 15000 iterations (first 3000 discarded), n.thin = 2
 n.sims = 18000 iterations saved
            mean      sd    2.5%    25%    50%    75%   97.5%    Rhat n.eff
lambda   0.10034 0.10100 0.00270 0.0283 0.0696 0.1387 0.37001 1.00100 18000
y        0.10128 0.33354 0.00000 0.0000 0.0000 0.0000 1.00000 1.00103 18000
deviance 2.00672 2.01991 0.05401 0.5660 1.3920 2.7750 7.40022 1.00100 18000

For each parameter, n.eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

DIC info (using the rule, pD = Dbar-Dhat)
pD = 0.0 and DIC = 2.0
DIC is an estimate of expected predictive error (lower deviance is better).

First, don't be confused by the numerical similarity of lambda and the variable y. Lambda is the average 
number of raptors per day (i.e., the Poisson parameter), while y is a random draw, or actually, a large 
number of random draws, from the Poisson distribution characterised by that mean.

Then note what may appear to be a paradox to some, namely that we can get a fairly large number 
of zero observations from a Poisson process and yet, the mean of that process is not 0. To answer the 
question about the probability of at least one raptor of species A on a given day, we use Bayesian posterior 
inference, i.e., the fact that a Bayesian mode of inference, unlike frequentist inference that 
estimates a single point using, e.g., maximum likelihood, gives us a whole distribution that describes 
the likely magnitude of an unobservable. That unobservable here is the number of raptors per day in a 
Poisson process with mean equal to lambda. By drawing a single observation from that Poisson process at every
step of the Gibbs sampler for lambda, we furthermore fully incorporate in our assessment the entire 
uncertainty about lambda. Let's plot the posterior distribution of the number of raptors A per day in this 
Poisson process:

hist(out$sims.list$y, col = "grey", xlab = "Number of raptors A per day")

It seems that under a Poisson(0.10034) process, the most likely observation is indeed zero, but that 
sometimes we also observe one's and rarely two's or even higher counts sometimes:

table(out$sims.list$y)
> table(out$sims.list$y)

    0     1     2     3 
16352  1480   161     7 

To estimate the probability that we see at least one raptor of species A during any given day, we express 
the proportion of cases with one or more individuals seen among all random draws from the modeled process.

more.than.0 <- out$sims.list$y >0	# Note calculation with logical funct'n
prob.one.or.more <- sum(more.than.0) / length(more.than.0)
prob.one.or.more
> prob.one.or.more
[1] 0.09155556

Hence, there is a 9% probability to see one or more individuals per day.




4. Modeling mean density when only zeroes are observed in one group
   ----------------------------------------------------------------

Exercise:
Take the simulated hare data from this chapter and turn all counts in one of the landuse types

Solution:
We may get another sample of hare counts and turn the first 10 to zero:
n.sites <- 10
x <- gl(n = 2, k = n.sites, labels = c("grassland", "arable"))
n <- 2*n.sites
lambda <- exp(0.69 + 0.92*(as.numeric(x)-1)) # x has levels 1 and 2, not 0 and 1
C <- rpois(n = n, lambda = lambda)			# Add Poisson noise
C[1:10] <- 0
C

First, the analysis using maximum likelihood:
poisson.t.test <- glm(C ~ x, family = poisson)	# Fit the model
summary(poisson.t.test)					# t-Test
anova(poisson.t.test, test = "Chisq")		# Likelihood ratio test (LRT)

We see that something went wrong with the algorithm, since the standard errors of the estimates are huge,
and furthermore, the estimates look nothing like the numbers we put into the data set.


Next, the analysis using Bayesian posterior inference:

# Define model
sink("Poisson.t.test.txt")
cat("
model {

# Priors
 alpha ~ dnorm(0,0.001)
 beta ~ dnorm(0,0.001)

# Likelihood
 for (i in 1:n) {
    C[i] ~ dpois(lambda[i]) 
    log(lambda[i]) <- alpha + beta *x[i]
 }
 
# Compute difference in mean hare density
mean.grass <- exp(alpha)
mean.arable <- exp(alpha + beta)
difference <- mean.grass - mean.arable
}
",fill=TRUE)
sink()

# Bundle data
win.data <- list(C = C, x = as.numeric(x)-1, n = length(x))

# Inits function
inits <- function(){ list(alpha=rlnorm(1), beta=rlnorm(1))}

# Parameters to estimate
params <- c("alpha", "beta", "mean.grass", "mean.arable", "difference")

# MCMC settings
nc <- 3
ni <- 20000
nb <- 10000
nt <- 10

# Start Gibbs sampler
out <- bugs(data=win.data, inits=inits, parameters.to.save=params, model.file="Poisson.t.test.txt", 
n.thin=nt, n.chains=nc, n.burnin=nb, n.iter=ni, debug = TRUE)
print(out, dig = 3)

We see that Bayesian inference is more robust here.




5. Swiss hare counts in arable and grassland areas
   -----------------------------------------------

Exercise:
Fit a 'Poisson t-test' to the Swiss hare counts (not the density) for year 2000.

Solution:
Get min counts at each site and landuse indicators for the year 2000
C <- apply(cbind(hares$count1[hares$year == 2000], hares$count2[hares$year == 2000]), 1, min, na.rm = TRUE) 
C[C == Inf] <- NA	# These are sites without counts in 2000
grass <- as.numeric(hares$landuse[hares$year == 2000]) - 1

# Define model
sink("Poisson.t.test.txt")
cat("
model {

# Priors
 alpha ~ dnorm(0,0.001)
 beta ~ dnorm(0,0.001)

# Likelihood
 for (i in 1:n) {
    C[i] ~ dpois(lambda[i]) 
    log(lambda[i]) <- alpha + beta *x[i]
 }
 
# Compute difference in mean hare density
mean.arable <- exp(alpha)
mean.grass <- exp(alpha + beta)
difference <- mean.arable - mean.grass
}
",fill=TRUE)
sink()

# Bundle data
win.data <- list(C = C, x = grass, n = length(C))

# Inits function
inits <- function(){ list(alpha=rlnorm(1), beta=rlnorm(1))}

# Parameters to estimate
params <- c("alpha", "beta", "mean.grass", "mean.arable", "difference")

# MCMC settings
nc <- 3
ni <- 3000
nb <- 1000
nt <- 2

# Start Gibbs sampler
out <- bugs(data=win.data, inits=inits, parameters.to.save=params, model.file="Poisson.t.test.txt", 
n.thin=nt, n.chains=nc, n.burnin=nb, n.iter=ni, debug = TRUE)
print(out, dig = 3)


It may be inadequate to assume that all the variability between counts will be accounted for by the Poisson 
variance. Sites vary in size and other things, and this will introduce much overdispersion. We can simply 
see this by comparing the residual deviance with the residual d.f, when fitting the GLM per maximum 
likelihood.

poisson.t.test <- glm(C ~ grass, family = poisson)	# Fit the model
summary(poisson.t.test)					# t-Test

The residual deviance is about 20 times greater than the residual d.f., whereas it should be about the same
magnitude under the Poisson assumption.








###############################################################
#                                                             #
#        Chapter 14: GLM specialities                         #
#                                                             #
###############################################################




1. Estimating a coefficient for an offset variable
   -----------------------------------------------

Exercise:
Declaring log(A) as an offset means that log(A) enters the linear predictor of a GLM with a coefficient fixed
at 1. This entails a precise interpretation of the response as a density then. In this sense, it may be 
confusing to say 'we estimate a coefficient for an offset', since some might say that the variable then
ceases to be an offset. Whatever, an offset is equivalent to the assumption that there is a constant
proportionality between y and A in a model such as y = g(theta), log(theta) = log(A) + stuff. This may well
not be true and we may want to test this assumption.

Solution:
We generate a data set or take the old one again.
n.site <- 10
A <- runif(n = 2*n.site, 2,5)		# Areas range in size from 2 to 5 km2
x <- gl(n = 2, k = n.site, labels = c("grassland", "arable"))
linear.predictor <- log(A) + 0.69 +(0.92*(as.numeric(x)-1))
lambda <- exp(linear.predictor)
C <- rpois(n = 2*n.site, lambda = lambda)	# Add Poisson noise

Then we specify a coefficient, alpha0, for the explanatory variable logA.

# Define model
sink("model.txt")
cat("
model {
# Priors
 alpha0 ~ dnorm(0,0.001) 
 alpha ~ dnorm(0,0.001)
 beta ~ dnorm(0,0.001)

# Likelihood
 for (i in 1:n) {
    C[i] ~ dpois(lambda[i]) 
    log(lambda[i]) <- alpha0 * logA[i] + alpha + beta *x[i]	# Note former 'offset'
 }
}
",fill=TRUE)
sink()

# Bundle data
win.data <- list(C = C, x = as.numeric(x)-1, logA = log(A), n = length(x))

# Inits function
inits <- function(){ list(alpha=rlnorm(1), beta=rlnorm(1))}

# Parameters to estimate
params <- c("lambda", "alpha0", "alpha", "beta")

# MCMC settings
nc <- 3
ni <- 11000
nb <- 1000
nt <- 2

# Start Gibbs sampling
out <- bugs(data=win.data, inits=inits, parameters.to.save=params, model.file="model.txt", n.thin=nt, 
n.chains=nc,n.burnin=nb, n.iter=ni, debug = TRUE)
print(out, dig = 3)

To test whether alpha0 could also be assumed to be equal to 1 we can plot the posterior distribution and see
how likely values other then 1 are.

hist(out$sims.list$alpha0, col = "grey", breaks = 50)
abline(v = 1, col = "red", lwd = 2)

We see that 1 is not an unusual value, so we can just as well take log(A) as an offset.




2. Swiss hare counts in arable and grassland areas
   -----------------------------------------------

Exercise:
Fit a 'Poisson t-test' to the Swiss hare counts for year 2000 accounting for variable area size by an offset 
and overdispersion by a random site effect. Also test for a constant proportionality between counts and area.

Solution:
Get min counts at each site and landuse indicators for the year 2000
C <- apply(cbind(hares$count1[hares$year == 2000], hares$count2[hares$year == 2000]), 1, min, na.rm = TRUE) 
C[C == Inf] <- NA	# These are sites without counts in 2000
GRASS <- as.numeric(hares$landuse[hares$year == 2000]) - 1
AREA <- as.numeric(hares$area[hares$year == 2000])


# Define model
sink("model.txt")
cat("
model {

# Priors
 alpha ~ dnorm(0,0.001)
 beta ~ dnorm(0,0.001)
 tau <- 1 / (sd * sd)
 sd ~ dunif(0, 3)

# Likelihood
 for (i in 1:n) {
    C[i] ~ dpois(lambda[i]) 
    log(lambda[i]) <- logA[i] + alpha + beta * x[i] + eps[i]
    eps[i] ~ dnorm(0, tau)
 }
 
# Compute difference in mean hare density
mean.arable <- exp(alpha)
mean.grass <- exp(alpha + beta)
difference <- mean.arable - mean.grass
}
",fill=TRUE)
sink()

# Bundle data
win.data <- list(C = C, x = GRASS, n = length(C), logA = log(AREA))

# Inits function
inits <- function(){ list(alpha=rlnorm(1), beta=rlnorm(1))}

# Parameters to estimate
params <- c("alpha", "beta", "mean.grass", "mean.arable", "difference", "sd")

# MCMC settings
nc <- 3
ni <- 3000
nb <- 1000
nt <- 2

# Start Gibbs sampler
out <- bugs(data=win.data, inits=inits, parameters.to.save=params, model.file="model.txt", 
n.thin=nt, n.chains=nc, n.burnin=nb, n.iter=ni, debug = TRUE)
print(out, dig = 3)
> print(out, dig = 3)
Inference for Bugs model at "model.txt", fit using WinBUGS,
 3 chains, each with 3000 iterations (first 1000 discarded), n.thin = 2
 n.sims = 3000 iterations saved
               mean    sd    2.5%     25%     50%     75%   97.5%  Rhat n.eff
alpha         1.295 0.168   0.969   1.185   1.298   1.408   1.627 1.046    49
beta         -0.771 0.290  -1.311  -0.971  -0.770  -0.582  -0.201 1.017   130
mean.grass    1.737 0.418   1.062   1.434   1.695   1.980   2.680 1.004  1000
mean.arable   3.704 0.623   2.635   3.270   3.664   4.090   5.091 1.050    46
difference    1.967 0.755   0.510   1.473   1.962   2.455   3.525 1.036    66
sd            0.845 0.109   0.652   0.768   0.837   0.914   1.074 1.001  3000
deviance    272.773 9.967 255.700 265.600 272.000 278.925 294.000 1.001  3000

For each parameter, n.eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

DIC info (using the rule, pD = Dbar-Dhat)
pD = 43.3 and DIC = 316.1
DIC is an estimate of expected predictive error (lower deviance is better).

Note that now mean.grass and mean.arable are densities (i.e., counts per unit area defined by the variable A).



Next, we fit a coefficient for log(A) to see whether density is constant over all area sizes.

# Define model
sink("model.txt")
cat("
model {

# Priors
 alpha0 ~ dnorm(0,0.01)		# Reduced uncertainty of priors a bit
 alpha ~ dnorm(0,0.01)
 beta ~ dnorm(0,0.01)
 tau <- 1 / (sd * sd)
 sd ~ dunif(0, 2)

# Likelihood
 for (i in 1:n) {
    C[i] ~ dpois(lambda[i]) 
    log(lambda[i]) <- alpha0 * logA[i] + alpha + beta * x[i] + eps[i]
    eps[i] ~ dnorm(0, tau)
 }
 
# Compute difference in mean hare density
mean.arable <- exp(alpha)
mean.grass <- exp(alpha + beta)
difference <- mean.arable - mean.grass
}
",fill=TRUE)
sink()

# Bundle data
win.data <- list(C = C, x = GRASS, n = length(C), logA = log(AREA))

# Inits function
inits <- function(){ list(alpha=rlnorm(1), beta=rlnorm(1))}

# Parameters to estimate
params <- c("alpha0", "alpha", "beta", "mean.grass", "mean.arable", "difference", "sd")

# MCMC settings # Note longer chains 
nc <- 3
ni <- 30000
nb <- 10000
nt <- 20

# Start Gibbs sampler
out <- bugs(data=win.data, inits=inits, parameters.to.save=params, model.file="model.txt", 
n.thin=nt, n.chains=nc, n.burnin=nb, n.iter=ni, debug = TRUE)
print(out, dig = 3)

> print(out, dig = 3)
Inference for Bugs model at "model.txt", fit using WinBUGS,
 3 chains, each with 30000 iterations (first 10000 discarded), n.thin = 20
 n.sims = 3000 iterations saved
               mean    sd    2.5%     25%     50%     75%   97.5%  Rhat n.eff
alpha0        1.358 0.241   0.885   1.197   1.349   1.518   1.842 1.036    74
alpha         0.663 0.449  -0.253   0.378   0.691   0.963   1.538 1.025    94
beta         -0.795 0.284  -1.377  -0.982  -0.785  -0.599  -0.249 1.002  1100
mean.grass    0.993 0.524   0.331   0.625   0.886   1.242   2.269 1.026    88
mean.arable   2.144 0.998   0.776   1.459   1.996   2.621   4.653 1.025    94
difference    1.151 0.662   0.282   0.716   1.021   1.431   2.869 1.030   160
sd            0.844 0.112   0.654   0.765   0.833   0.911   1.105 1.002  1000
deviance    271.934 9.939 254.297 265.100 271.200 278.000 293.100 1.001  3000

For each parameter, n.eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

DIC info (using the rule, pD = Dbar-Dhat)
pD = 43.2 and DIC = 315.1
DIC is an estimate of expected predictive error (lower deviance is better).

Note that now, mean.grass and mean.arable are no longer the mean density in the two landuse types.

mean(out$sims.list$alpha0 > 1)

So there is quite a bit of evidence for higher densities in larger sites.








###############################################################
#                                                             #
#        Chapter 15: Poisson ANCOVA                           #
#                                                             #
###############################################################




1. Multiple Poisson regression
   ---------------------------

Exercise:
Add another explanatory variable into the model.

Solution:
We create 300 values for our new explanatory variable.

nonsense <- runif(300, -2, 2)


# Define model
sink("glm.txt")
cat("
model {

# Priors
 for (i in 1:n.groups){		
    alpha[i] ~ dnorm(0, 0.01)		# Intercepts
    beta[i] ~ dnorm(0, 0.01)		# Slopes
 }
 delta ~ dnorm(0, 0.01)			# Slope of nonsense
 

# Likelihood
 for (i in 1:n) {
    C[i] ~ dpois(lambda[i])		# The random variable
    lambda[i] <- exp(alpha[pop[i]] + beta[pop[i]] * length[i] + delta * nonsense[i])
 }	# Note double-indexing: alpha[pop[i]]
}
",fill=TRUE)
sink()

# Bundle data
win.data <- list(C = C, pop = as.numeric(pop), n.groups = n.groups, length = length, n = n,
nonsense = nonsense)

# Inits function
inits <- function(){list(alpha=rlnorm(n.groups,3,1), beta=rlnorm(n.groups,2,1))}

# Parameters to estimate
params <- c("alpha", "beta", "delta")

# MCMC settings
ni <- 4500
nb <- 1500
nt <- 5
nc <- 3

# Start Gibbs sampling
out <- bugs(data = win.data, inits = inits, parameters.to.save = params, model.file = "glm.txt", 
n.thin = nt, n.chains = nc, n.burnin = nb, n.iter = ni, debug = TRUE)
print(out, dig = 3)

The posterior distribution of the coefficient for nonsense sits on zero, therefore we have not detected
any nonsense effects on the mite counts.




2. Polynomial Poisson regression
   -----------------------------

Exercise:
We check for the effect of wing-length squared on the mite counts, without building in this effect into the
data set.

Solution:
We fit a model with population-specific quadratic terms of length

# Define model
sink("glm.txt")
cat("
model {

# Priors
 for (i in 1:n.groups){		
    alpha[i] ~ dnorm(0, 0.01)		# Intercepts
    beta1[i] ~ dnorm(0, 0.01)		# Slopes of linear terms
    beta2[i] ~ dnorm(0, 0.01)		# Slopes of quadratic terms
 }
 
# Likelihood
 for (i in 1:n) {
    C[i] ~ dpois(lambda[i])		# The random variable
    lambda[i] <- exp(alpha[pop[i]] + beta1[pop[i]] * length[i] + beta2[pop[i]] * pow(length[i], 2))
 }
}
",fill=TRUE)
sink()

# Bundle data
win.data <- list(C = C, pop = as.numeric(pop), n.groups = n.groups, length = length, n = n)

# Inits function
inits <- function(){list(alpha=rlnorm(n.groups,3,1), beta1=rlnorm(n.groups,2,1))}

# Parameters to estimate
params <- c("alpha", "beta1", "beta2")

# MCMC settings
ni <- 10000
nb <- 5000
nt <- 5
nc <- 3

# Start Gibbs sampling
out <- bugs(data = win.data, inits = inits, parameters.to.save = params, model.file = "glm.txt", 
n.thin = nt, n.chains = nc, n.burnin = nb, n.iter = ni, debug = TRUE)
print(out, dig = 3)

Note: I had to use quite long chains to achieve convergence.




3. Poisson ANCOVA for Swiss hare counts
   ------------------------------------

Exercise:
Fit a Poisson regression that expresses counts as a function of site and year (note: this is essentially the
model that people use to analyse bird counts all over Europe in monitoring programs). You can account for
variable area by fitting log(area) as an offset.

Solution:
# Define model
sink("glm.txt")
cat("
model {

# Priors
 for (i in 1:n.groups){		
    alpha[i] ~ dnorm(0, 0.01)		# Intercepts
    beta1[i] ~ dnorm(0, 0.01)		# Slopes of linear terms
 }
 
# Likelihood
 for (i in 1:n) {
    C[i] ~ dpois(lambda[i])		# The random variable
    lambda[i] <- exp(logA[i] + alpha[pop[i]] + beta1[pop[i]] * year[i])
 }
}
",fill=TRUE)
sink()

# Bundle data
attach(hares)
npop <- max(as.numeric(site))
win.data <- list(C = count1, pop = as.numeric(site), n.groups = npop, 
year = as.numeric(scale(year)), n = length(count1), logA = log(area))

# Inits function
inits <- function(){list(alpha=rnorm(npop), beta1=rnorm(npop))}

# Parameters to estimate
params <- c("alpha", "beta1", "lambda")

# MCMC settings
ni <- 1000
nb <- 500
nt <- 5
nc <- 3

# Start Gibbs sampling
out <- bugs(data = win.data, inits = inits, parameters.to.save = params, model.file = "glm.txt", 
n.thin = nt, n.chains = nc, n.burnin = nb, n.iter = ni, debug = TRUE, DIC = FALSE)
# Note: got NIL dereference error, so switched DIC off

print(out, dig = 3)

# Quick plot of expected count (note lambda contains logA)
plot(1992:2008, out$mean$lambda[1:17], ylim = c(0, 200), main = "Expected counts (not density)", 
ylab = "Exp. count", xlab = "Year", lwd = 1, type = "l")

for(i in 2:56){
   start.index <- (i - 1) * 17 + 1
   end.index <- i * 17
   lines(1992:2008, out$mean$lambda[start.index:end.index], col = i)
}

# Compare with ML solutions
yscale <- scale(year)

fm <- glm(count1 ~ site * yscale - 1 -yscale, family = poisson, offset = log(area))
summary(fm)
predict.glm(fm)


# Compare estimates of intercepts and slopes: numerically about the same, as usual
par(mfrow = c(2,1))
plot(fm$coef[1:56], out$mean$alpha)
abline(0,1)
plot(fm$coef[57:112], out$mean$beta)
abline(0,1)








###############################################################
#                                                             #
#        Chapter 16: Poisson GLMM                             #
#                                                             #
###############################################################




1. Random-coefficients Poisson regression (Swiss hare data)
   --------------------------------------------------------

Exercise:
Fit the random-coefficients model (site * year) to the counts. Take a single count per year and site and
also include log(area) as an offset.

Solution:

# REML solution for fun
library('lme4')
glmm.fit <- glmer(count1 ~ year + (1 | site) + ( 0 + year | site), family = poisson, offset = log(area))
# Does not converge. Try scaling year

yr <- as.numeric(scale(year))

glmm.fit <- glmer(count1 ~ yr + (1 | site) + ( 0 + yr | site), family = poisson, offset = log(area))
glmm.fit

So we see that classical numerical solutions may often go astray.



Next the solution in BUGS

# Define model
sink("glmm.txt")
cat("
model {

# Priors
 for (i in 1:n.groups){		
    alpha[i] ~ dnorm(mu.int, tau.int)	# Intercepts
    beta[i] ~ dnorm(mu.beta, tau.beta)	# Slopes
}
 mu.int ~ dnorm(0, 0.00001)	# Hyperparameter for random intercepts
 tau.int <- 1 / (sigma.int * sigma.int)
 sigma.int ~ dunif(0, 5)

 mu.beta ~ dnorm(0, 0.00001)	# Hyperparameter for random slopes
 tau.beta <- 1 / (sigma.beta * sigma.beta)
 sigma.beta ~ dunif(0, 5)


# Poisson likelihood
 for (i in 1:n) {
    C[i] ~ dpois(lambda[i])
    lambda[i] <- A[i] * exp(alpha[pop[i]] + beta[pop[i]]* year[i])
    # lambda[i] <- exp(log(A[i]) + alpha[pop[i]] + beta[pop[i]]* year[i]) # Alternative formulation for offset
 }
}
",fill=TRUE)
sink()

# Bundle data
n.groups = max(as.numeric(site))
win.data <- list(C = count1, pop = as.numeric(site), year = yr, n.groups = n.groups, n = length(count1),
A = area)

# Inits function
inits <- function(){ list(alpha = rnorm(n.groups, 0, 2), beta = rnorm(n.groups, 0, 2), mu.int = rnorm(1, 0, 1), 
sigma.int = rlnorm(1), mu.beta = rnorm(1, 0, 1), sigma.beta = rlnorm(1))}

# Parameters to estimate
parameters <- c("alpha", "beta", "mu.int", "sigma.int", "mu.beta", "sigma.beta")

# MCMC settings
ni <- 2000
nb <- 500
nt <- 2
nc <- 3

# Start Gibbs sampling
out <- bugs(win.data, inits, parameters, "glmm.txt", n.thin=nt, n.chains=nc, n.burnin=nb, n.iter=ni, 
debug = TRUE)
print(out, dig = 3)

glmm.fit

We get the usual, numerically comparable estimates.








2. Random-coefficients Poisson regression with landuse-dependence of hyperparams (Swiss hares)
   -------------------------------------------------------------------------------------------

Exercise:
Now make the hyperparams dependent on landuse.

Solution:
Perhaps this could also be done with REML using lmer(), but I don't know how. In BUGS we simply define the 
hyperparameters as vectors of length two, rather than as scalars as before, and make them dependent on landuse
as an explanatory variable at the level of the sites. That is, we introduce LU as an explanatory variable of 
length 56.

attach(hares)
yr <- as.numeric(scale(year))
LU <- as.numeric(tapply(as.numeric(landuse), as.numeric(site), mean))	
# without the leading as.numeric() WinBUGS would crash


# Define model
sink("glmm.txt")
cat("
model {

# Priors
 for (i in 1:n.groups){	
       alpha[i] ~ dnorm(mu.int[LU[i]], tau.int[LU[i]])	# Intercepts 
       beta[i] ~ dnorm(mu.beta[LU[i]], tau.beta[LU[i]])	# Slopes
 }

 for(j in 1:2){
    mu.int[j] ~ dnorm(0, 0.00001)	# Hyperparameter for random intercepts
    tau.int[j] <- 1 / (sigma.int[j] * sigma.int[j])
    sigma.int[j] ~ dunif(0, 5)

    mu.beta[j] ~ dnorm(0, 0.00001)	# Hyperparameter for random slopes
    tau.beta[j] <- 1 / (sigma.beta[j] * sigma.beta[j])
    sigma.beta[j] ~ dunif(0, 5)
 }

# Poisson likelihood
 for (i in 1:n) {
    C[i] ~ dpois(lambda[i])
    lambda[i] <- A[i] * exp(alpha[pop[i]] + beta[pop[i]]* year[i])
    # lambda[i] <- exp(log(A[i]) + alpha[pop[i]] + beta[pop[i]]* year[i]) # Alternative formulation for offset
 }
}
",fill=TRUE)
sink()

# Bundle data
n.groups = max(as.numeric(site))
win.data <- list(C = count1, pop = as.numeric(site), year = yr, n.groups = n.groups, n = length(count1),
A = area, LU = LU)

# Inits function  # Must get two values for the four hyperparams
inits <- function(){ list(alpha = rnorm(n.groups, 0, 2), beta = rnorm(n.groups, 0, 2), mu.int = rnorm(2, 0, 1), 
sigma.int = runif(2), mu.beta = rnorm(2, 0, 1), sigma.beta = runif(2))}

# Parameters to estimate
parameters <- c("alpha", "beta", "mu.int", "sigma.int", "mu.beta", "sigma.beta")

# MCMC settings
ni <- 2000
nb <- 500
nt <- 2
nc <- 3

# Start Gibbs sampling
out <- bugs(win.data, inits, parameters, "glmm.txt", n.thin=nt, n.chains=nc, n.burnin=nb, n.iter=ni, 
debug = TRUE)
print(out, dig = 3)

>From looking at the posterior summaries, we conclude that, indeed, the hyperparameters for describing the 
inter-site variation in intercepts and slopes of the trends in density seem to vary between the two 
landuse types. We could easily test this formally by looking at the posterior distributions for 
derived quantities.

This is an excellent example for how WinBUGS frees the ecological modeller in you. Fitting such a custom model,
which tests a very specific ecological hypothesis, would be very challenging using other software, I believe.








###############################################################
#                                                             #
#        Chapter 17: Binomial GLM for two groups              #
#                                                             #
###############################################################




1. Bernoulli data as 'disaggregated' Binomial data
   -----------------------------------------------

Exercise:
Simulate and analyse the data in the chapter at a site-by-site level.

Solution:
We will treat the surveys for the two species at the same site as independent.

Generate the data:
n <- 50				# Number of sites
p.cr <- 13/50			# Occurrence.AND.detection probability Cross-leaved
p.ch <- 29/50			# Occurrence.AND.detection probability Chiltern gentian

C.cr <- rbinom(n, 1, prob = p.cr)   ;   C.cr		# Add Binomial noise
C.ch <- rbinom(n, 1, prob = p.ch)   ;   C.ch		# Add Binomial noise
C <- c(C.cr, C.ch)
species <-  gl(n = 2, k = 50, length = 100, labels = c("Cross-leaved gentian","Chiltern gentian"))

tapply(C, species, mean)



And here is the analysis using WinBUGS:

# Define model
sink("model.txt")
cat("
model {

# Priors
 alpha ~ dnorm(0,0.01)
 beta ~ dnorm(0,0.01)

# Likelihood
 for (i in 1:n) {
    C[i] ~ dbern(p[i])		# Bernoulli has a single param
    logit(p[i]) <- alpha + beta * species[i]
 }
# Derived quantities
 Occ.cross <- exp(alpha) / (1 + exp(alpha))
 Occ.chiltern <- exp(alpha + beta) / (1 + exp(alpha + beta))
 Occ.Diff <- Occ.chiltern - Occ.cross		# Test quantity
}
",fill=TRUE)
sink()

# Bundle data
win.data <- list(C = C, n = 100, species = as.numeric(species)-1)

# Inits function
inits <- function(){ list(alpha=rlnorm(1), beta=rlnorm(1))}

# Parameters to estimate
params <- c("alpha", "beta", "Occ.cross", "Occ.chiltern", "Occ.Diff")

# MCMC settings
nc <- 3
ni <- 1200
nb <- 200
nt <- 2

# Start Gibbs sampling
out <- bugs(data=win.data, inits=inits, parameters.to.save=params, model.file="model.txt", n.thin=nt, 
n.chains=nc, n.burnin=nb, n.iter=ni, debug = TRUE)

print(out, dig = 3)

This illustrates the fact that a binomial is simply a sum of N Bernoullis.




2. Probability of occurrence of a 'large' count (Swiss hares)
   ----------------------------------------------------------

Exercise:
Model the landuse-dependence of the occurrence of a 'large' count. Define 'large' appropriately.

Solution:
We will analyse the first count only and first look at the distribution of those counts to decide on what 
'large' could mean. We will assume counts from different years in the same area as independent and not bother
about site effects.

attach(hares)
hist(count1, breaks = 40, col = "grey")

One might choose 100 as a threshold. Check how many counts are greater than 100.

sum(count1 > 100, na.rm = TRUE)
> sum(count1 > 100, na.rm = TRUE)
[1] 22

That's not too many, so we try a lower threshold, 50 say. 
sum(count1 > 50, na.rm = TRUE)

It turns out there are 106 such counts, and we will use that as a threshold to designate a count as large.
large <- as.numeric(count1 > 50)

To analyse the landuse-dependence of the occurrence of a large count, we can simply adapt the code from 
exercise 1 above.


# Define model
sink("model.txt")
cat("
model {

# Priors
 alpha ~ dnorm(0,0.01)
 beta ~ dnorm(0,0.01)

# Likelihood
 for (i in 1:n) {
    large[i] ~ dbern(p[i])
    logit(p[i]) <- alpha + beta * grass[i]
 }
# Derived quantities
 mean.arable <- exp(alpha) / (1 + exp(alpha))
 mean.grass <- exp(alpha + beta) / (1 + exp(alpha + beta))
 difference <- mean.arable - mean.grass		# Test quantity
}
",fill=TRUE)
sink()

# Bundle data
win.data <- list(large = large, n = length(large), grass = as.numeric(landuse)-1)

# Inits function
inits <- function(){ list(alpha=rlnorm(1), beta=rlnorm(1))}

# Parameters to estimate
params <- c("alpha", "beta", "mean.arable", "mean.grass", "difference")

# MCMC settings
nc <- 3
ni <- 1200
nb <- 200
nt <- 2

# Start Gibbs sampling
out <- bugs(data=win.data, inits=inits, parameters.to.save=params, model.file="model.txt", n.thin=nt, 
n.chains=nc, n.burnin=nb, n.iter=ni, debug = TRUE)

print(out, dig = 3)
> print(out, dig = 3)
Inference for Bugs model at "model.txt", fit using WinBUGS,
 3 chains, each with 1200 iterations (first 200 discarded), n.thin = 2
 n.sims = 1500 iterations saved
               mean    sd    2.5%     25%     50%     75%   97.5%  Rhat n.eff
alpha        -1.522 0.117  -1.750  -1.599  -1.518  -1.441  -1.300 1.001  1500
beta         -0.651 0.265  -1.181  -0.825  -0.647  -0.480  -0.137 1.001  1500
mean.arable   0.180 0.017   0.148   0.168   0.180   0.191   0.214 1.001  1500
mean.grass    0.104 0.023   0.066   0.088   0.103   0.119   0.152 1.001  1500
difference    0.075 0.028   0.019   0.058   0.076   0.094   0.126 1.001  1500
deviance    579.717 1.986 577.800 578.300 579.100 580.500 584.900 1.002  1500

For each parameter, n.eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

DIC info (using the rule, pD = Dbar-Dhat)
pD = 2.0 and DIC = 581.7
DIC is an estimate of expected predictive error (lower deviance is better).


Conclusion: If we define a count of 51 and greater as 'large', then there is a 18% chance that a count in an
arable site is large, compared to a 10.4% chance in grassland sites. This difference is 'significant', because
the 95% credible interval for the difference (0.019-0.126) does not cover zero. 








###############################################################
#                                                             #
#        Chapter 18: Binomial ANCOVA                          #
#                                                             #
###############################################################




1. Add a further explanatory variable X
   ------------------------------------

Exercise:
Modify the model in this chapter by adding to it the main effect of a further explanatory variable, say X, 
along with its interaction with the explanatory variable named wet.

Solution:
We will not reassemble the data but use the one we got by executing the code in the chapter. We simply
generate a new random explanatory variable called X and then build the product of it with wetness.

X <- runif(length(wetness))
Xwet <- X * wetness


Here is the analysis using WinBUGS:

# Define model
sink("glm.txt")
cat("
model {

# Priors
 for (i in 1:n.groups){		
    alpha[i] ~ dnorm(0, 0.01)		# Intercepts
    beta1[i] ~ dnorm(0, 0.01)		# Slopes of wetness 
 }
    beta2 ~ dnorm(0, 0.01)		# Slope of X
    gamma ~ dnorm(0, 0.01)		# Slope of interaction X with wet
 
# Likelihood
 for (i in 1:n) {
    C[i] ~ dbin(p[i], N[i])
    logit(p[i]) <- alpha[pop[i]] + beta1[pop[i]]* wetness[i] + beta2 * X[i] + gamma * Xwet[i] 
 }

# Derived quantities
# Recover the effects relative to baseline level (no. 1)
 a.effe2 <- alpha[2] - alpha[1]		# Intercept Black Forest vs. Jura
 a.effe3 <- alpha[3] - alpha[1]		# Intercept Alps vs. Jura
 b.effe2 <- beta1[2] - beta1[1]		# Slope Black Forest vs. Jura
 b.effe3 <- beta1[3] - beta1[1]		# Slope Alps vs. Jura

}
",fill=TRUE)
sink()


# Bundle data
win.data <- list(C = C, N = N, pop = as.numeric(pop), n.groups = n.groups, wetness = wetness, n = n, 
X = X, Xwet = Xwet)

# Inits function
inits <- function(){ list(alpha = rlnorm(n.groups, 3, 1), beta1 = rlnorm(n.groups, 2, 1))}

# Parameters to estimate
params <- c("alpha", "beta1", "beta2", "gamma", "a.effe2", "a.effe3", "b.effe2", "b.effe3")

# MCMC settings
ni <- 5000
nb <- 1000
nt <- 4
nc <- 3

# Start Gibbs sampler
out <- bugs(data = win.data, inits = inits, parameters.to.save = params, model.file = "glm.txt", n.thin = nt,
n.chains = nc, n.burnin = nb, n.iter = ni, debug = FALSE)

print(out, dig = 3)

# Compare with MLE's
fm <- glm(C/N ~ pop * wetness + X + Xwet, weights = N, family = binomial)
summary(fm)

We note slight differences that may be due to the effects of the priors on the Bayesian estimates or to
small-sample bias in the MLE's.




2. Fit a binomial ANCOVA to the occurrence of large counts (Swiss hares)
   ---------------------------------------------------------------------

Exercise:
Fit a model with interacting effects of landuse and year (continuous). Also check for non-linear effects of year
by fitting a quadratic year effect in addition to the linear one.

Solution:
We choose 5 as a threshold for a large count this time. 

attach(hares)
large <- as.numeric(count1 > 5)

Here are the MLE's.
yr <- as.numeric(scale(year))
fm <- glm(large ~ landuse * yr - 1 - yr, family = binomial)
summary(fm)


Here is the Bayesian analysis using WinBUGS. Note that we use the numerical trick of 'stabilizing' the logit
to avoid numerical over- or underflow. Absent that, WinBUGS will often crash. Also, flat normal priors did not
lead to well-mixing chains, so I used uniform distributions instead.

# Define model
sink("glm.txt")
cat("
model {

# Priors
 for (i in 1:2){		
    alpha[i] ~ dunif(-5, 5)		# Intercepts for landuse
    beta[i] ~ dunif(-5, 5)		# Annual trend estimate
 }
 
# Likelihood
 for (i in 1:n) {
    large[i] ~ dbern(plim[i])
    logit(plim[i]) <- lplim[i]
    lplim[i] <- min(999, max(-999, logitp[i]))
    logitp[i] <- alpha[landuse[i]] + beta[landuse[i]]* yr[i] 
 }
}
",fill=TRUE)
sink()


# Bundle data
win.data <- list(large = large, landuse = as.numeric(landuse), yr = yr, n = length(count1))

# Inits function
inits <- function(){ list(alpha = rnorm(2), beta = rnorm(2))}

# Parameters to estimate
params <- c("alpha", "beta")

# MCMC settings
ni <- 5000
nb <- 1000
nt <- 4
nc <- 3

# Start Gibbs sampler
out <- bugs(data = win.data, inits = inits, parameters.to.save = params, model.file = "glm.txt", n.thin = nt,
n.chains = nc, n.burnin = nb, n.iter = ni, debug = TRUE)

print(out, dig = 5)
summary(fm)

We see the usual numerical similarity of Bayesian estimates and those obtained by the method of maximum 
likelihood.


We next check for non-linear annual trends in the occurrence of large counts. To do this, we simply add a 
quadratic effect of year.

Here are the MLE's.
yr <- as.numeric(scale(year))
yr2 <- yr^2
fm <- glm(large ~ landuse * (yr + yr2) - 1 - yr -yr2, family = binomial)
summary(fm)


Here is the Bayesian analysis. The vague, flat normal priors did not lead to very good mixing of the Markov 
chains, so as an alternative I tried uniform priors, and they worked better.

# Define model
sink("glm.txt")
cat("
model {

# Priors
 for (i in 1:2){		
#    alpha[i] ~ dnorm(0, 0.001)		# Intercepts for landuse
#    beta1[i] ~ dnorm(0, 0.001)		# Annual trend estimate (linear)
#    beta2[i] ~ dnorm(0, 0.001)		# Annual trend estimate (quadratic)

    alpha[i] ~ dunif(-5, 5)		# Intercepts for landuse
    beta1[i] ~ dunif(-1, 1)		# Annual trend estimate (linear)
    beta2[i] ~ dunif(-1, 1)		# Annual trend estimate (quadratic)

 }
 
# Likelihood
 for (i in 1:n) {
    large[i] ~ dbern(plim[i])
    logit(plim[i]) <- lplim[i]
    lplim[i] <- min(999, max(-999, logitp[i]))
    logitp[i] <- alpha[landuse[i]] + beta1[landuse[i]]* yr[i] + beta2[landuse[i]]* yr2[i] 
 }
}
",fill=TRUE)
sink()


# Bundle data
win.data <- list(large = large, landuse = as.numeric(landuse), yr = yr, yr2 = yr2, n = length(count1))

# Inits function
inits <- function(){ list(alpha = rnorm(2), beta1 = rnorm(2))}

# Parameters to estimate
params <- c("alpha", "beta1", "beta2")

# MCMC settings
ni <- 5000
nb <- 1000
nt <- 4
nc <- 3

# Start Gibbs sampler
out <- bugs(data = win.data, inits = inits, parameters.to.save = params, model.file = "glm.txt", n.thin = nt,
n.chains = nc, n.burnin = nb, n.iter = ni, debug = TRUE)

print(out, dig = 5)
summary(fm)

Now the similarity between Bayesian and ML estimates is somewhat more moderate, although the uncertainty intervals
overlap easily. In any case, there is no evidence for a quadratic effect of year.








###############################################################
#                                                             #
#        Chapter 19: Binomial GLMM                            #
#                                                             #
###############################################################




0. Extra exercise: do a series of GLM 'conversions'
   ------------------------------------------------

Exercise:
(a) Take the binomial GLMM and turn in back into a binomial GLM. Then turn that back into (b) a Poisson GLM 
and finally (c) into a normal GLM. 

(a) illustrates the difference between a random-effects model and a fixed-effects model.
(b) illustrates the differences between a binomial and a Poisson GLM
(c) illustrates the difference between a (Poisson) GLM and a simple normal linear model

You'll have to (re-) create the data set from chapter 19 first.

Solutions:

(a) Conversion Binomial GLMM into a binomial GLM
------------------------------------------------


Binomial GLMM (from the book chapter)
-------------------------------------

# Define model
sink("bin_glmm.txt")
cat("
model {

# Priors
 for (i in 1:n.groups){		
    alpha[i] ~ dnorm(mu.int, tau.int)	# Intercepts
    beta[i] ~ dnorm(mu.beta, tau.beta)	# Slopes
}
 mu.int ~ dnorm(0, 0.001)	# Hyperparameter for random intercepts
 tau.int <- 1 / (sigma.int * sigma.int)
 sigma.int ~ dunif(0, 10)

 mu.beta ~ dnorm(0, 0.001)	# Hyperparameter for random slopes
 tau.beta <- 1 / (sigma.beta * sigma.beta)
 sigma.beta ~ dunif(0, 10)

# Binomial likelihood
 for (i in 1:n) {
    C[i] ~ dbin(p[i], N[i])
    logit(p[i]) <- alpha[pop[i]] + beta[pop[i]]* prec[i]
 }
}
",fill=TRUE)
sink()

# Bundle data
win.data <- list(C = C, N = N, pop = as.numeric(pop), prec = prec, n.groups = n.groups, n = n)

# Inits function
inits <- function(){ list(alpha = rnorm(n.groups, 0, 2), beta = rnorm(n.groups, 1, 1), 
mu.int = rnorm(1, 0, 1), mu.beta = rnorm(1, 0, 1))}

# Parameters to estimate
params <- c("alpha", "beta", "mu.int", "sigma.int", "mu.beta", "sigma.beta")

# MCMC settings
ni <- 2000
nb <- 500
nt <- 2
nc <- 3

# Start Gibbs sampling
out.bin.glmm <- bugs(win.data, inits, params, "bin_glmm.txt", n.thin=nt, n.chains=nc, n.burnin=nb, 
n.iter=ni, debug = TRUE)
print(out.bin.glmm, dig = 2)



Binomial GLM
------------

# Define model
sink("bin_glm.txt")
cat("
model {

# Priors
 for (i in 1:n.groups){		
    alpha[i] ~ dnorm(0, 0.0001)	# Intercepts
    beta[i] ~ dnorm(0, 0.0001)	# Slopes
}

# Binomial likelihood
 for (i in 1:n) {
    C[i] ~ dbin(p[i], N[i])
    logit(p[i]) <- alpha[pop[i]] + beta[pop[i]]* prec[i]
 }
}
",fill=TRUE)
sink()

# Bundle data
win.data <- list(C = C, N = N, pop = as.numeric(pop), prec = prec, n.groups = n.groups, n = n)

# Inits function
inits <- function(){ list(alpha = rnorm(n.groups, 0, 2), beta = rnorm(n.groups, 1, 1))}

# Parameters to estimate
params <- c("alpha", "beta")

# Start Gibbs sampling
out.bin.glm <- bugs(win.data, inits, params, "bin_glm.txt", n.thin=nt, n.chains=nc, n.burnin=nb, 
n.iter=ni, debug = TRUE)

print(out.bin.glm, dig = 2)




(b) Conversion Binomial GLM into a Poisson GLM
------------------------------------------------

# Define model
sink("Poisson_glm.txt")
cat("
model {

# Priors
 for (i in 1:n.groups){		
    alpha[i] ~ dnorm(0, 0.0001)	# Intercepts
    beta[i] ~ dnorm(0, 0.0001)	# Slopes
}

# Poisson likelihood
 for (i in 1:n) {
    C[i] ~ dpois(lambda[i])
    log(lambda[i]) <- alpha[pop[i]] + beta[pop[i]]* prec[i]
#    lambda[i] <- exp(alpha[pop[i]] + beta[pop[i]]* prec[i])
 }
}
",fill=TRUE)
sink()

# Bundle data
win.data <- list(C = C, pop = as.numeric(pop), prec = prec, n.groups = n.groups, n = n)

# Inits function
inits <- function(){ list(alpha = rnorm(n.groups, 0, 2), beta = rnorm(n.groups, 1, 1))}

# Parameters to estimate
params <- c("alpha", "beta", "lambda")

# Start Gibbs sampling
out.Poisson.glm <- bugs(win.data, inits, params, "Poisson_glm.txt", n.thin=nt, n.chains=nc, n.burnin=nb, n.iter=ni, debug = TRUE)
print(out.Poisson.glm, dig = 2)




(c) Conversion Poisson GLM into a normal GLM (= LM)
---------------------------------------------------

# Define model
sink("lm.txt")
cat("
model {

# Priors
 for (i in 1:n.groups){		
    alpha[i] ~ dnorm(0, 0.0001)	# Intercepts
    beta[i] ~ dnorm(0, 0.0001)	# Slopes
}

tau <- 1/(sigma*sigma)
sigma ~ dunif(0, 100)

# Normal likelihood
 for (i in 1:n) {
    C[i] ~ dnorm(mu[i], tau)
    mu[i] <- alpha[pop[i]] + beta[pop[i]]* prec[i]
 }
}
",fill=TRUE)
sink()

# Bundle data
win.data <- list(C = C, pop = as.numeric(pop), prec = prec, n.groups = n.groups, n = n)

# Inits function
inits <- function(){ list(alpha = rnorm(n.groups, 0, 2), beta = rnorm(n.groups, 1, 1), sigma = rlnorm(1))}

# Parameters to estimate
params <- c("alpha", "beta", "sigma")

# Start Gibbs sampling
out.lm <- bugs(win.data, inits, params, "lm.txt", n.thin=nt, n.chains=nc, n.burnin=nb, n.iter=ni, debug = TRUE)
print(out.lm, dig = 2)




1. Get predictions of expected breeding success
   --------------------------------------------

Exercise:
For thw woodchat shrike example, form predictions of the expected breeding success and overlay this on a 
plot of the observed breeding success.

Solution:
Though this may be possible using lattice graphics, we try a simpler solution 'by hand' instead. We redraw
a grid of plots, one for each population. Since a 4 by 4 panel plot may appear too crammed, we can choose
other configurations by the par() command, for instance by selecting mfrow = c(2,2), we produce four 2 by 2 panel plots.

par(mfrow = c(4,4), mar = c(4, 4, 1, 2) + 0.1)
for(i in 1:16){
   print(i)
   realised.breeding.success <- C[((i-1)*10+1):(i*10)] / N[((i-1)*10+1):(i*10)] 
   precipitation <- prec[((i-1)*10+1):(i*10)]
   plot(precipitation, realised.breeding.success, ylim = c(0,1), xlim = c(0,1), las = 1,
   ylab = "Prop. successful", xlab = "Precipitation", cex = 1.5, col = "blue")
   browser()
}


All we have to do in WinBUGS to obtain estimates of the expected breeding success is to monitor p !

# Define model
sink("glmm.txt")
cat("
model {

# Priors
 for (i in 1:n.groups){		
    alpha[i] ~ dnorm(mu.int, tau.int)	# Intercepts
    beta[i] ~ dnorm(mu.beta, tau.beta)	# Slopes
}
 mu.int ~ dnorm(0, 0.001)	# Hyperparameter for random intercepts
 tau.int <- 1 / (sigma.int * sigma.int)
 sigma.int ~ dunif(0, 10)

 mu.beta ~ dnorm(0, 0.001)	# Hyperparameter for random slopes
 tau.beta <- 1 / (sigma.beta * sigma.beta)
 sigma.beta ~ dunif(0, 10)

# Binomial likelihood
 for (i in 1:n) {
    C[i] ~ dbin(p[i], N[i])
    logit(p[i]) <- alpha[pop[i]] + beta[pop[i]]* prec[i]
 }
}
",fill=TRUE)
sink()

# Bundle data
win.data <- list(C = C, N = N, pop = as.numeric(pop), prec = prec, n.groups = n.groups, n = n)

# Inits function
inits <- function(){ list(alpha = rnorm(n.groups, 0, 2), beta = rnorm(n.groups, 1, 1), mu.int = rnorm(1, 0, 1), mu.beta = rnorm(1, 0, 1))}

# Parameters to estimate
params <- c("alpha", "beta", "mu.int", "sigma.int", "mu.beta", "sigma.beta", "p")

# MCMC settings
ni <- 2000
nb <- 500
nt <- 2
nc <- 3

# Start Gibbs sampling
out <- bugs(win.data, inits, params, "glmm.txt", n.thin=nt, n.chains=nc, n.burnin=nb, n.iter=ni, debug = TRUE)
print(out, dig = 3)


We now adapt the graphics code to also draw the lines representing the expected breeding success. As a 
graphical trick we will order all three vectors in this graph by ascending values of precipitation.

par(mfrow = c(4,4), mar = c(4, 4, 1, 2) + 0.1)
for(i in 1:16){
   print(i)
   
   # Get values for each population
   precipitation <- prec[((i-1)*10+1):(i*10)]
   realised.breeding.success <- C[((i-1)*10+1):(i*10)] / N[((i-1)*10+1):(i*10)] 
   expected.breeding.success <- out$mean$p[((i-1)*10+1):(i*10)]
   
   # Determine order of prec
   ooo <- order(precipitation)
   
   # Plot stuff
   plot(precipitation[ooo], realised.breeding.success[ooo], ylim = c(0,1), xlim = c(0,1), las = 1,
   ylab = "Prop. successful", xlab = "Precipitation", cex = 1.5, col = "blue")
   lines(precipitation[ooo], expected.breeding.success[ooo], col = "red", lwd = 3)
   browser()
}




2. Fixed and random (shrinkage)
   ----------------------------

Exercise:
Refit the model by dropping the distributional assumptions about the intercepts and slopes of the relationship
between breeding success and precipitation, i.e., fit the fixed-effects counterpart of the model in this
chapter, and compare the estimates.

Solution:
We first save the output of the GLMM under a different name. 

out.glmm <- out

Then we refit the fixed-effects counterpart of the model.

# Define model
sink("glm.txt")
cat("
model {

# Priors
 for (i in 1:n.groups){		
    alpha[i] ~ dnorm(0, 0.001)	# Intercepts
    beta[i] ~ dnorm(0, 0.001)	# Slopes
}

# Binomial likelihood
 for (i in 1:n) {
    C[i] ~ dbin(p[i], N[i])
    logit(p[i]) <- alpha[pop[i]] + beta[pop[i]]* prec[i]
 }
}
",fill=TRUE)
sink()

# Bundle data
win.data <- list(C = C, N = N, pop = as.numeric(pop), prec = prec, n.groups = n.groups, n = n)

# Inits function
inits <- function(){ list(alpha = rnorm(n.groups), beta = rnorm(n.groups))}

# Parameters to estimate
params <- c("alpha", "beta")

# MCMC settings
ni <- 2000
nb <- 500
nt <- 2
nc <- 3

# Start Gibbs sampling
out.glm <- bugs(win.data, inits, params, "glm.txt", n.thin=nt, n.chains=nc, n.burnin=nb, n.iter=ni, debug = TRUE)
print(out.glm, dig = 3)

Now we plot both sets of estimates in the same graph and inspect them first in a table.
alpha.fix <- out.glm$mean$alpha
beta.fix <- out.glm$mean$beta
alpha.random <- out.glmm$mean$alpha
beta.random <- out.glmm$mean$beta

cbind(alpha.fix, alpha.random, beta.fix, beta.random)

plot(alpha.fix)
points(alpha.random)

par(mfrow = c(2,1), mar = c(4, 4, 1, 2) + 0.1)
plot(alpha.fix, ylim = c(-3,3), las = 1, ylab = "Intercept estimates", xlab = "Population", cex = 1.5, 
col = "blue") 
abline(h = mean(alpha.fix), col = "blue", lwd = 2)
points(alpha.random, col = "red", lwd = 2, cex = 1.5)
segments(1:16, alpha.fix, 1:16, alpha.random, lwd = 2, col = "green")

plot(beta.fix, ylim = c(-5,1), las = 1, ylab = "Slope estimates", xlab = "Population", cex = 1.5, 
col = "blue") 
abline(h = mean(beta.fix), col = "blue", lwd = 2)
points(beta.random, col = "red", lwd = 2, cex = 1.5)
segments(1:16, beta.fix, 1:16, beta.random, lwd = 2, col = "green")

We clearly see that, on average, under the random-effects model, more extreme group means (population means),
or those estimated with less precision, are pulled in more towards the overall mean. Here, the blue line 
shows the mean of the fixed-effects estimates.








###############################################################
#                                                             #
#        Chapter 20: Site-occ model                           #
#                                                             #
###############################################################




1. Prior sensitivity
   -----------------

Exercise:
Compare the uniform and the normal priors in terms of the posterior means of the primary model params.

Solution: 
As usual, I assume that the data generated float around in your R workspace. We then fit the model using 
first the uniform(-10, 10) and then the flat normal priors.

# Define model
sink("model1.txt")
cat("
model {

# Priors
 alpha.occ ~ dunif(-10, 10)	# Set A of priors
 beta.occ ~ dunif(-10, 10)
 alpha.p ~ dunif(-10, 10)
 beta.p ~ dunif(-10, 10)

# Likelihood
 for (i in 1:R) { #start initial loop over the R sites
 # True state model for the partially observed true state
    z[i] ~ dbern(psi[i])		# True occupancy z at site i
    logit(psi[i]) <- alpha.occ + beta.occ * humidity[i]

    for (j in 1:T) { # start a second loop over the T replicates
       # Observation model for the actual observations
       y[i,j] ~ dbern(eff.p[i,j])	# Detection-nondetection at i and j
       eff.p[i,j] <- z[i] * p[i,j]
       logit(p[i,j]) <- alpha.p + beta.p * humidity[i]
    }
 }
# Derived quantities
 occ.fs <- sum(z[])
}
",fill=TRUE)
sink()

# Bundle data
win.data <- list(y = y, humidity = humidity, R = dim(y)[1], T = dim(y)[2])

# Inits function
zst <- apply(y, 1, max)		#Good starting values for latent states essential !
inits <- function(){list(z = zst, alpha.occ=runif(1, -5, 5), beta.occ = runif(1, -5, 5), 
alpha.p = runif(1, -5, 5), beta.p = runif(1, -5, 5))}

# Parameters to estimate
params <- c("alpha.occ","beta.occ", "alpha.p", "beta.p", "occ.fs")

# MCMC settings
nc <- 3
nb <- 2000
ni <- 12000
nt <- 5

# Start Gibbs sampler
out1 <- bugs(win.data, inits, params, "model1.txt", n.chains=nc, n.iter=ni, n.burn = nb, 
n.thin=nt, debug = TRUE)
print(out1, dig = 3)


Next, we fit the model using the flat normal priors. (If you get the dreaded Trap 66 (postcondition violation),
try to stabilize the logit; see book appendix. Or just set off the Markov chains another time and hope that
it does not occur.)

# Define model
sink("model2.txt")
cat("
model {

# Priors
 alpha.occ ~ dnorm(0, 0.01) # Set B of priors
 beta.occ ~ dnorm(0, 0.01)
 alpha.p ~ dnorm(0, 0.01)
 beta.p ~ dnorm(0, 0.01)

# Likelihood
 for (i in 1:R) { #start initial loop over the R sites
 # True state model for the partially observed true state
    z[i] ~ dbern(psi[i])		# True occupancy z at site i
    logit(psi[i]) <- alpha.occ + beta.occ * humidity[i]

    for (j in 1:T) { # start a second loop over the T replicates
       # Observation model for the actual observations
       y[i,j] ~ dbern(eff.p[i,j])	# Detection-nondetection at i and j
       eff.p[i,j] <- z[i] * p[i,j]
       logit(p[i,j]) <- alpha.p + beta.p * humidity[i]
    }
 }
# Derived quantities
 occ.fs <- sum(z[])
}
",fill=TRUE)
sink()

# Start Gibbs sampler
out2 <- bugs(win.data, inits, params, "model2.txt", n.chains=nc, n.iter=ni, n.burn = nb, 
n.thin=nt, debug = TRUE)

Now we compare the estimates under both prior specifications.
print(out1, dig = 3)
print(out2, dig = 3)

We conclude that there is hardly any sensitvitiy to these choices of priors.


And finally, we check whether the uniform(-10, 10) priors did not 'squeeze in' the marginal posterior distributions
of the primary parameters by plotting them

par(mfrow = c(2,2))
hist(out1$sims.list$alpha.occ, breaks = 50, col = "grey", main = "alpha.occ", xlab = "", ylab = "",
xlim = c(-10, 10))
abline(v = c(-10, 10), col = "red")
hist(out1$sims.list$beta.occ, breaks = 50, col = "grey", main = "beta.occ", xlab = "", ylab = "",
xlim = c(-10, 10))
abline(v = c(-10, 10), col = "red")
hist(out1$sims.list$alpha.p, breaks = 50, col = "grey", main = "alpha.p", xlab = "", ylab = "",
xlim = c(-10, 10))
abline(v = c(-10, 10), col = "red")
hist(out1$sims.list$beta.p, breaks = 50, col = "grey", main = "beta.p", xlab = "", ylab = "",
xlim = c(-10, 10))
abline(v = c(-10, 10), col = "red")

>From looking at the range over which each marginal posterior distribution has support, it appears obvious 
that the chosen bounds of the uniform priors for the primary model parameters were sufficiently wide so
as to be uninformative.




2. Site and survey covariates
   --------------------------

Exercise:
Usually, in the analysis of a site-occ model, you will have covariates that vary by site only (site covariates),
but also covariates that vary by site and individual survey (here called survey covariates). You must
know then how to specify the analysis.

As an extra, we see what happens when there are missing values in the covariates (i.e., when the design 
is not balanced, that is, when not all sites were surveyed an equal number of times)

Solution:
Survey covariates have the same dimension as the response array, i.e., the matrix containing the 
detection-nondetection data y. 

We generate a survey covariate and fill it with random numbers. Then, we randomly turn 50 values into NA's.

X <- matrix(round(runif(450), dig = 2), ncol = 3)
NA.data <- sort(sample((1:450), size = 50))
X[NA.data] <- NA


# Define model
sink("model.txt")
cat("
model {

# Priors
 alpha.occ ~ dunif(-10, 10)	# Set A of priors
 beta.occ ~ dunif(-10, 10)
 alpha.p ~ dunif(-10, 10)
 beta.p ~ dunif(-10, 10)
 beta.X ~ dunif(-10, 10)

# Likelihood
 for (i in 1:R) {
 # True state model for the partially observed true state
    z[i] ~ dbern(psi[i])		# True occupancy z at site i
    logit(psi[i]) <- alpha.occ + beta.occ * humidity[i]

    for (j in 1:T) {
       # Observation model for the actual observations
       y[i,j] ~ dbern(eff.p[i,j])	# Detection-nondetection at i and j
       eff.p[i,j] <- z[i] * p[i,j]
       logit(p[i,j]) <- alpha.p + beta.p * humidity[i] + beta.X * X[i,j]
    }
 }
}
",fill=TRUE)
sink()

# Bundle data
win.data <- list(y = y, humidity = humidity, R = dim(y)[1], T = dim(y)[2], X = X)

# Inits function
zst <- apply(y, 1, max)
inits <- function(){list(z = zst, alpha.occ=runif(1, -5, 5), beta.occ = runif(1, -5, 5), 
alpha.p = runif(1, -5, 5), beta.p = runif(1, -5, 5))}

# Parameters to estimate
params <- c("alpha.occ","beta.occ", "alpha.p", "beta.p", "beta.X")

# MCMC settings
nc <- 3
nb <- 200
ni <- 1200
nt <- 5

# Start Gibbs sampler
out <- bugs(win.data, inits, params, "model.txt", n.chains=nc, n.iter=ni, n.burn = nb, n.thin=nt, 
debug = TRUE)

Executing this code, WinBUGS will crash saying 'made use of undefined node X'. The reason for this is that
we have missing values in an explanatory variable, and they need to be dealt with by either specifying a model
for them (e.g., defining a prior for X in our case) or by eliminating them before we start the analysis. 
The simplest way to get rid of the missing covariate values is by 'mean-imputing' them, i.e., by replacing
them with the mean of the observed covariates. Unless the number of missing values is excessively great,
this should not affect the results too much.

So here we first mean-impute the missing X's.

X[is.na(X)] <- mean(X, na.rm = TRUE)

Then, we repeat the analysis. We need not execute all of the above R code:

# Bundle data
win.data <- list(y = y, humidity = humidity, R = dim(y)[1], T = dim(y)[2], X = X)

# Start Gibbs sampler
out <- bugs(win.data, inits, params, "model.txt", n.chains=nc, n.iter=ni, n.burn = nb, n.thin=nt, 
debug = TRUE)
print(out, dig = 3)


Note that when we have missing values in the response, i.e., the array containing the detection-nondetection 
data, then that's no problem at all.




3. Occurrence of 'large populations' (Swiss hare data)
   ---------------------------------------------------

Exercise:
Collapse the hare counts of one particular year (e.g., 2000) to binary data where a 1 indicates the 
observation of a 'large population' (say, a count greater or equal 10). Estimate the proportion of sites 
inhabited by a 'large population', i.e., one that is capable of producing a count greater or equal 10. Also
fit a site covariate into the occupancy parameter (psi).

Solution:
We extract the count and elevation data from 2000 and then turn the counts into an indicator variable for
counts greater than 9.

attach(hares)
counts2000 <- cbind(count1[year == 2000], count2[year == 2000])
counts2000[counts2000 < 10] <- 0
counts2000[counts2000 > 9] <- 1
y <- counts2000

We directly fit a site-occupancy model with elevation as a covariate on the probability to be a 
'large population'. For this, we also extract the elevation data.

ELEV <- elevation[year == 2000]

We also must mean-impute some missing elevations and then standardise the covariate.
ELEV <- as.numeric(scale(ELEV))
ELEV[is.na(ELEV)] <- 0


# Define model
sink("model.txt")
cat("
model {

# Priors
 alpha.occ ~ dunif(-10, 10)
 beta.occ ~ dunif(-10, 10)
 p ~ dunif(0, 1)

# Likelihood
 for (i in 1:R) { #start initial loop over the R sites
 # True state model for the partially observed true state
    z[i] ~ dbern(psi[i])		# True occupancy z at site i
    logit(psi[i]) <- alpha.occ + beta.occ * ELEV[i]

    for (j in 1:T) { # start a second loop over the T replicates
       # Observation model for the actual observations
       y[i,j] ~ dbern(eff.p[i,j])	# Detection-nondetection at i and j
       eff.p[i,j] <- z[i] * p
    }
 }
# Derived quantities
 occ.fs <- sum(z[])	# Number of 'large populations' among 56
}
",fill=TRUE)
sink()

# Bundle data
win.data <- list(y = y, ELEV = ELEV, R = dim(y)[1], T = dim(y)[2])

# Inits function
zst <- apply(y, 1, max)		#Good starting values for latent states essential !
zst[is.na(zst)] <- 1
inits <- function(){list(z = zst, alpha.occ=runif(1, -5, 5), beta.occ = runif(1, -5, 5))}

# Parameters to estimate
params <- c("alpha.occ","beta.occ", "p", "occ.fs")

# MCMC settings
nc <- 3
nb <- 2000
ni <- 12000
nt <- 5

# Start Gibbs sampler
out <- bugs(win.data, inits, params, "model.txt", n.chains=nc, n.iter=ni, n.burn = nb, n.thin=nt, debug = TRUE)
print(out, dig = 3)
> print(out, dig = 3)
Inference for Bugs model at "model.txt", fit using WinBUGS,
 3 chains, each with 12000 iterations (first 2000 discarded), n.thin = 5
 n.sims = 6000 iterations saved
            mean    sd   2.5%    25%    50%    75%  97.5%  Rhat n.eff
alpha.occ  1.213 0.390  0.523  0.953  1.188  1.442  2.019 1.001  4000
beta.occ   0.024 0.438 -0.735 -0.265 -0.005  0.274  0.987 1.002  1400
p          0.901 0.044  0.798  0.875  0.907  0.933  0.967 1.001  5200
occ.fs    42.388 1.806 39.000 41.000 42.000 43.000 46.000 1.001  6000
deviance  38.073 8.834 29.670 30.500 35.380 42.232 60.800 1.002  1700

For each parameter, n.eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

DIC info (using the rule, pD = var(deviance)/2)
pD = 39.0 and DIC = 77.1
DIC is an estimate of expected predictive error (lower deviance is better).

Hence, we estimate that about 42 sites are occupied by populations that can produce counts greater than 9.
The detection probability is estimated at about 90%. This means that a 'large population' will produce counts
of 9 and less about 10% of the time. There does not seem to be a relationship between 'large populations' 
(as defined here) and the elevation of a site.




4. Simulation study
   ----------------

This is a bigger project that may well be part of a thesis.








###############################################################
#                                                             #
#        Chapter 21: N-mix model                              #
#                                                             #
###############################################################




1. Survey covariates
   -----------------

Exercise:
Generate a variation of the lizard data, wherein you build in a negative effect of the temperature during a 
survey on the detection probability of an individual lizard. Then analyse these data using an N-mix model.
We assume that surveys are conducted only during favourable temperatures (e.g., between 15 and 25 degrees)
and that lizards expose themselves more during colder weather. During surveys on hot days, they simply
stay concealed in the vegetation.

Solution: 
We will analyse the data in the horizontal format, i.e., retain the matrix format of the data.

R <- 200		# Number of surveyed sites
T <- 3			# Number of replicate counts at each site

vege <- sort(runif(n = R, min = -1.5, max =1.5))
alpha.lam <- 2		# Intercept
beta1.lam <- 2		# Linear effect of vegetation
beta2.lam <- -2		# Quadratic effect of vegetation
lam <- exp(alpha.lam + beta1.lam * vege + beta2.lam * (vege^2))
N <- rpois(n = R, lambda = lam)
table(N)		# Distribution of abundances across sites
sum(N > 0) / n.site	# Empirical occupancy

alpha.p <- 1		# Intercept
beta1.p <- -4		# Linear effect of vegetation
beta2.p <- -2		# Linear effect of temperature

y <- p <- array(NA, dim=(c(R, T)))			# Create arrays for y and p
temp <- array(runif(R*T, 15, 25), dim = c(R, T))	# Generate the temperature survey covariate
mean.temp <- mean(temp)					# Then we standardise the covariate
sd.temp <- sd(temp)
TEMP <- (temp - mean.temp) / sd.temp

for(j in 1:T){
     p[,j] <- plogis(alpha.p + beta1.p * vege + beta2.p * TEMP[,j]) 
     y[,j] <- rbinom(n = R, size = N, prob = p[,j])
}
y

sum(apply(y, 1, sum) > 0)	# Apparent distribution (proportion occupied sites)
sum(N > 0)			# True occupancy


Now the analysis using WinBUGS, and without stacking the data on top of each other. That is, we analyse the model
in the same array format that we saw in the site-occ examples.


# Define model
sink("Nmix.txt")
cat("
model {

# Priors
 alpha.lam ~ dunif(-10, 10)
 beta1.lam ~ dunif(-10, 10)
 beta2.lam ~ dunif(-10, 10)
 alpha.p ~ dunif(-10, 10)
 beta1.p ~ dunif(-10, 10)
 beta2.p ~ dunif(-10, 10)

# Likelihood
# Biological model for true abundance
 for (i in 1:R) {			# Loop over R sites
   N[i] ~ dpois(lambda[i])
   log(lambda[i]) <- alpha.lam + beta1.lam * vege[i] + beta2.lam * vege2[i]

 # Observation model for replicated counts
    for (j in 1:T) {			# Loop over all n observations
      y[i,j] ~ dbin(p[i,j], N[i])
      logit(p[i,j]) <- alpha.p + beta1.p * vege[i] + beta2.p * TEMP[i,j]
    }
 }

# Derived quantities
 totalN <- sum(N[])	# Estimate total population size across all sites
}
",fill=TRUE)
sink()

# Bundle data
vege2 = (vege * vege)
win.data <- list(R = R, T = T, y = y, vege = vege, vege2 = vege2, TEMP = TEMP)

# Inits function
Nst <- apply(y, 1, max) + 1
inits <- function(){list(N = Nst, alpha.lam=rnorm(1, 0, 1), beta1.lam=rnorm(1, 0, 1), 
beta2.lam=rnorm(1, 0, 1), alpha.p=rnorm(1, 0, 1), beta1.p=rnorm(1, 0, 1), beta2.p=rnorm(1, 0, 1))}

# Parameters to estimate
params <- c("N", "totalN", "alpha.lam", "beta1.lam", "beta2.lam", "alpha.p", "beta1.p", "beta2.p")

# MCMC settings
nc <- 3
ni <- 1200
nb <- 200
nt <- 5

# Start Gibbs sampler
out <- bugs(win.data, inits, params, "Nmix.txt", n.chains=nc, n.iter=ni, n.burn = nb, 
n.thin=nt, debug = TRUE)
print(out, dig = 3)
> print(out, dig = 3)
Inference for Bugs model at "Nmix.txt", fit using WinBUGS,
 3 chains, each with 1200 iterations (first 200 discarded), n.thin = 5
 n.sims = 600 iterations saved
             mean     sd    2.5%     25%     50%     75%    97.5%  Rhat n.eff
N[1]        0.000  0.000   0.000   0.000   0.000   0.000    0.000 1.000     1
N[2]        0.000  0.000   0.000   0.000   0.000   0.000    0.000 1.000     1
N[3]        0.000  0.000   0.000   0.000   0.000   0.000    0.000 1.000     1
N[4]        0.000  0.000   0.000   0.000   0.000   0.000    0.000 1.000     1
N[5]        0.000  0.000   0.000   0.000   0.000   0.000    0.000 1.000     1
 [  ...  ]
N[199]      2.160  1.737   0.000   1.000   2.000   3.000    6.000 1.004   600
N[200]      2.108  1.816   0.000   1.000   2.000   3.000    7.000 1.002   530
totalN    894.682 89.817 769.000 829.000 877.000 939.250 1120.050 1.015   120
alpha.lam   1.948  0.058   1.828   1.908   1.951   1.987    2.053 1.004   510
beta1.lam   1.931  0.140   1.654   1.836   1.941   2.027    2.185 1.001   600
beta2.lam  -1.871  0.197  -2.240  -2.008  -1.882  -1.733   -1.501 1.020   170
alpha.p     0.961  0.128   0.724   0.876   0.961   1.053    1.209 1.007   250
beta1.p    -4.016  0.281  -4.596  -4.207  -4.011  -3.829   -3.479 0.999   600
beta2.p    -2.112  0.131  -2.368  -2.190  -2.111  -2.019   -1.855 1.020    96
deviance  659.568 15.873 630.092 647.875 659.250 671.100  691.112 1.011   200


We compare that with the values we used to fake our data:
alpha.lam  ;  beta1.lam  ;  beta2.lam  ;   alpha.p  ;  beta1.p   ;   beta2.p
> alpha.lam  ;  beta1.lam  ;  beta2.lam  ;   alpha.p  ;  beta1.p   ;   beta2.p
[1] 2
[1] 2
[1] -2
[1] 1
[1] -4
[1] -2

And we are quite happy with these estimates -- and with this great model in general; thanks, Andy ;)




2. Prior sensitivity
   -----------------

Exercise:
Play around with different priors and see whether and how this affects the estimates. This is not shown.




3. Hare abundance in 56 Swiss study sites in 2000
   ----------------------------------------------

Exercise:
Fit the binomial mixture model to the hare data from a single year (e.g., 2000) and see whether there is a 
difference in the probability of detection in grassland and arable areas. By what proportion will mean or 
max counts underestimate true population size ?

We do three things in addition to what is asked for in the book:
1. We also fit an offset to account for variable size of the study sites. 
2. Since we strongly suspect density to be different in grassland and arable sites, we also fit that effect
into the abundance part of the model.
3. We compute a derived quantity that tests whether p is different in arable and grassland sites.

This example will then illustrate nicely one of the great advantages of the N-mix model: namely, that the
effects of a single covariate on both abundance and on detection can be teased apart.



Solution:
attach(hares)
y <- cbind(count1[year == 2000], count2[year == 2000])	# These are the counts
LU <- as.numeric(landuse[year == 2000])
A <- as.numeric(area[year == 2000])


# Define model
sink("Nmix.txt")
cat("
model {

# Priors
 for(j in 1:2){

    alpha.lam[j] ~ dunif(-10, 10)
    alpha.p[j] ~ dunif(-10, 10)
 }
 
# Likelihood
# Biological model for true abundance
 for (i in 1:R) {			# Loop over R sites
   N[i] ~ dpois(lambda[i])
   log(lambda[i]) <- log(A[i]) + alpha.lam[LU[i]]

 # Observation model for replicated counts
    for (j in 1:T) {			# Loop over all n observations
      y[i,j] ~ dbin(p[i,j], N[i])
      logit(p[i,j]) <- alpha.p[LU[i]]
    }
 }

# Derived quantities
 totalN <- sum(N[])	# Estimate total population size across all sites
 for(j in 1:2){
    mean.density[j] <- exp(alpha.lam[j])
    mean.p[j] <- exp(alpha.p[j])/(1+exp(alpha.p[j]))
 }
diff.p <- mean.p[1] - mean.p[2] 
 
}
",fill=TRUE)
sink()

# Bundle data
win.data <- list(R = dim(y)[1], T = dim(y)[2], y = y, LU = LU, A = A)

# Inits function
Nst <- apply(y, 1, max, na.rm = TRUE)+1
Nst[Nst == -Inf] <- 2
inits <- function(){list(N = Nst, alpha.lam=rnorm(2, 0, 1), alpha.p=rnorm(2, 0, 1))}

# Parameters to estimate
params <- c("totalN", "alpha.lam", "alpha.p", "mean.density", "mean.p", "diff.p")

# MCMC settings
nc <- 3
ni <- 12000
nb <- 2000
nt <- 10

# Start Gibbs sampler
out <- bugs(win.data, inits, params, "Nmix.txt", n.chains=nc, n.iter=ni, n.burn = nb, 
n.thin=nt, debug = TRUE)
print(out, dig = 3)
> print(out, dig = 3)
Inference for Bugs model at "Nmix.txt", fit using WinBUGS,
 3 chains, each with 12000 iterations (first 2000 discarded), n.thin = 10
 n.sims = 3000 iterations saved
                    mean      sd     2.5%      25%      50%      75%    97.5%  Rhat n.eff
totalN          2351.642 147.300 2112.000 2247.000 2332.000 2444.000 2678.025 1.016   140
alpha.lam[1]       2.070   0.076    1.933    2.016    2.064    2.121    2.227 1.017   150
alpha.lam[2]       1.240   0.078    1.097    1.186    1.237    1.290    1.401 1.004  2700
alpha.p[1]         0.651   0.207    0.249    0.505    0.655    0.786    1.054 1.011   260
alpha.p[2]         1.378   0.277    0.813    1.198    1.386    1.573    1.906 1.008   910
mean.density[1]    7.951   0.615    6.912    7.507    7.877    8.344    9.268 1.016   160
mean.density[2]    3.466   0.273    2.997    3.273    3.445    3.634    4.058 1.004  3000
mean.p[1]          0.656   0.046    0.562    0.624    0.658    0.687    0.742 1.014   160
mean.p[2]          0.795   0.046    0.693    0.768    0.800    0.828    0.871 1.006   770
diff.p            -0.139   0.065   -0.263   -0.182   -0.142   -0.098   -0.006 1.006   420
deviance         596.245  29.338  542.497  575.700  594.300  615.400  657.410 1.008   250

Hence, not only density but also the detection probability of individual hares differed by landuse type
in 56 Swiss study sites with predominantly arable or grassland use. The detection probability in the latter
in 2000 was estimated at about 80% while that in arable areas at about 66%. This difference was 'significant' 
judged by the 95% credible interval for the difference, which does not cover 0.

The total number of hares associated with all 56 study sites in 2000 over the 2-week survey period was 
estimated at 2352 hares. Let's compare this with two naive estimators of population size, the sum 
(over sites) of the mean or the max observed counts:

mean.counts <- apply(y, 1, mean, na.rm = TRUE)
mean.counts[mean.counts == 'NaN'] <- NA
max.counts <- apply(y, 1, max, na.rm = TRUE)
max.counts[max.counts == -Inf] <- NA

sum.mean <- sum(mean.counts, na.rm = TRUE)
sum.max <- sum(max.counts, na.rm = TRUE)

We note that there were no counts available in 2000 for 7 of the 56 sites. The Nmix model can elegantly estimate
abundance also for these sites, but the naive estimators will be biased. As a quick and dirty correction,
we increase the two naive estimators of total abundance proportionally, i.e., by 7/56.

sum.mean <- (1 + 7/56) * sum.mean
sum.max <- (1 + 7/56) * sum.max
sum.mean
sum.max

> sum.mean
[1] 1549.125
> sum.max
[1] 1647

sum.mean / out$mean$totalN
sum.max / out$mean$totalN
> sum.mean / out$mean$totalN
[1] 0.6630308
> sum.max / out$mean$totalN
[1] 0.7049216


Under the N-mix model, we estimate a total abundance of 2352 hares. Hence, the two naive estimators of abundance
underestimate true abundance by about one third. (Interestingly, it does not make a big difference whether
one takes the mean count or the max count ! )




4. Simulation exercise
   -------------------

This is a bigger project that may well be part of a thesis.