This program simulates presence/absence data to be input to programs MARK or PRESENCE. It can be used to get an idea of how precise the estimates are for given sample effort or design, or the bias of estimates when heterogeneity exists. See the following papers for a description of the methods involved in estimating parameters from presence/absence data:
1. MacKenzie, D. I., J. D. Nichols, G. B. Lachman, S. Droege, J. A. Royle and C. A. Langtimm. 2002. Estimating site occupancy rates when detection probabilities are less than one. Ecology 83: 2248-2255.
2. MacKenzie, D. I., J. D. Nichols, J. E. Hines, M. G. Knutson and A. B. Franklin. Estimating site occupancy, colonization and local extinction probabilities when a species is not detected with certainty. (Submitted to Ecology)
3. Bailey, L. L., J. E. Hines, J. D. Nichols and D. I. MacKenzie. 2007. Sampling design trade-offs in occupancy studies with imperfect detection: examples and software. Ecological Applications 17: 281-290.
GENPRES can implement both the single-season and the multi-season survey design. Default input values are provided for the multi-season design; to get the single-season design, set all values of EPS and all values of GAMMA to 0.0. The program determines which surveys belong to which season by examining the values of EPS: whenever EPS is 0.0, the succeeding survey is in the same season as the preceding survey, and whenever EPS is greater than 0.0, the succeeding survey is the first survey of a new season.
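The EPS rule can be written out as a short Python sketch. This only illustrates the rule as described here and is not GENPRES's own code; the function name `assign_seasons` is made up for the example, and it assumes one EPS value between each pair of consecutive surveys.

```python
def assign_seasons(eps):
    """Return the 1-based season number of each survey.

    eps[i] is the EPS value between survey i+1 and survey i+2,
    so T surveys are described by T-1 values (an assumption of this sketch).
    """
    seasons = [1]
    for e in eps:
        # EPS == 0: next survey stays in the current season.
        # EPS  > 0: next survey is the first survey of a new season.
        seasons.append(seasons[-1] + (1 if e > 0.0 else 0))
    return seasons


# Six surveys, two per season, with EPS > 0 only between seasons.
print(assign_seasons([0.0, 0.2, 0.0, 0.2, 0.0]))  # [1, 1, 2, 2, 3, 3]
# All EPS values zero: a single-season design.
print(assign_seasons([0.0, 0.0, 0.0, 0.0]))       # [1, 1, 1, 1, 1]
```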
This situation can be easily modeled in the program. Simply specify the parameters for one of the sub-groups, then click the 'Add Group' button and enter the values for the other group.
This situation can be handled by setting the detection probability to zero for a group of sites in a particular season(s). When data are generated, these sites will contain a '.' corresponding to the surveys when they were not visited.
Design 1                  Design 2                  Design 3
_____________________     _____________________     _____________________
Num                       Num                       Num
of                        of                        of
Sites  1   2   3   4      Sites  1   2   3   4      Sites  1   2   3   4
  12   xx  --  --  --        6   xx  xx  xx  xx        6   xx  --  xx  --
  12   --  xx  --  --        6   xx  --  --  --        6   --  xx  --  xx
  12   --  --  xx  --        6   --  xx  --  --        6   xx  --  --  --
  12   --  --  --  xx        6   --  --  xx  --        6   --  xx  --  --
                             6   --  --  --  xx        6   --  --  xx  --
                                                       6   --  --  --  xx
Total  s=48 sites                s=30 sites                s=36 sites
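The '.' convention for unvisited surveys can be illustrated with a small stochastic simulation in Python. This is a sketch under stated assumptions, not the algorithm GENPRES uses: it covers a single season only (no extinction or colonization), and the function name and the example values (psi = 0.75, p = 0.5) are chosen for illustration.

```python
import random


def simulate_history(psi, p, rng):
    """Simulate one site's detection history for a single season."""
    occupied = rng.random() < psi
    history = []
    for p_j in p:
        if p_j == 0.0:
            history.append(".")   # survey not carried out at this site
        elif occupied and rng.random() < p_j:
            history.append("1")   # species detected
        else:
            history.append("0")   # surveyed, species not detected
    return "".join(history)


rng = random.Random(1)  # fixed seed so repeated runs give the same histories
# Hypothetical group: surveyed on occasions 1-2, skipped on occasions 3-4.
for _ in range(5):
    print(simulate_history(0.75, [0.5, 0.5, 0.0, 0.0], rng))
```

Every history printed for this group ends in '..', because the last two surveys have detection probability zero.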
The default scenario indicates that there are 100 sites which are visited a total of 5 times. Initial occupancy is 75% and detection probability is 50% for each survey.
To simulate heterogeneity among sites, create multiple groups with different occupancy (psi) or detection (p) probabilities. To test design methods (as in the example above), create groups with detection probabilities equal to zero for surveys which are skipped ('--' in the diagram), and the desired detection probability for surveys which are not skipped ('xx' in the diagram).
For example, to simulate 'Design 1' above, change the number of surveys to 8 and the number of sites to 12. Enter the desired occupancy probability and the detection probabilities for surveys 1 and 2, and enter '0.' for the detection probabilities for surveys 3 through 8. Then add a 2nd group and change the detection probabilities to 0. for surveys 1 and 2. Change the detection probabilities to the desired value for surveys 3 and 4, and leave the rest at 0. Add groups 3 and 4 in the same way, changing the detection probabilities to 0. for the surveys which are skipped for that group.
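The per-group detection probabilities for 'Design 1' can be laid out with a few lines of Python. This is only a planning aid for filling in the input table, not part of GENPRES. It assumes each 'xx' in the diagram stands for two surveys (as the 8-survey walkthrough implies), and the function name and the value p = 0.5 are placeholders.

```python
def group_detection_probs(visited, n_columns, p, surveys_per_column=2):
    """Detection probabilities for one group of sites.

    visited: set of diagram columns (1-based) in which the group is surveyed.
    Skipped columns get probability 0.0 for each of their surveys.
    """
    probs = []
    for column in range(1, n_columns + 1):
        value = p if column in visited else 0.0
        probs.extend([value] * surveys_per_column)
    return probs


# Design 1: four groups of 12 sites, each surveyed in exactly one column.
design1 = [group_detection_probs({c}, 4, 0.5) for c in (1, 2, 3, 4)]
for g, probs in enumerate(design1, start=1):
    print(f"group {g} (12 sites): {probs}")
```

The same helper covers the other designs by passing a larger set, for example `{1, 2, 3, 4}` for the first group of 'Design 2'.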
When the 'Analyze w/ expected values' button is clicked, data will be generated for this situation and analyzed with program MARK. The output from program MARK will appear in a new window. If you were to look at the input data file, you would see a sequence of '1's and '0's indicating detection (1) or non-detection (0) for each survey. The number following the detection history is the number of sites which had that exact detection history. (Although in the real world there cannot be fractions of a site, the expected number of sites can be a fraction depending on the input values.)
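To see why expected frequencies are fractional, the following Python sketch computes them for the default scenario (100 sites, 5 surveys, psi = 0.75, p = 0.5) using the standard single-season occupancy model. It assumes a single season with no missing surveys and is not taken from the GENPRES source; the function name is illustrative.

```python
from itertools import product


def expected_frequencies(n_sites, psi, p):
    """Expected number of sites showing each detection history."""
    freqs = {}
    for history in product((0, 1), repeat=len(p)):
        # Probability of this history at an occupied site.
        prob = psi
        for detected, p_j in zip(history, p):
            prob *= p_j if detected else (1.0 - p_j)
        # An all-zero history also arises at every unoccupied site.
        if not any(history):
            prob += 1.0 - psi
        freqs["".join(map(str, history))] = n_sites * prob
    return freqs


freqs = expected_frequencies(100, 0.75, [0.5] * 5)
print(freqs["00000"])        # 27.34375
print(freqs["10100"])        # 2.34375
print(sum(freqs.values()))   # 100.0
```

With equal detection probabilities of 0.5, every history containing at least one detection has the same expected frequency, and the all-zero history absorbs both the unoccupied sites and the occupied sites where the species was never detected.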
The parameter estimates from program MARK appear at the end of the output file (scroll down to the end using Ctrl+End).
There are several models available for each model type. Once a model-type is selected (for generation of data), a specific model must be chosen to analyze the data. This is done by selecting models under the 'Model' menu. You can generate the data by changing the parameter values in the input table, then choose the model to use to compute the estimates. This could be used to investigate the bias of the parameter estimates when data are generated with different values for each occasion, but analyzed with a model assuming constant values over time.
One of the last models in the 'Model' menu, 'user-defined', allows you to analyze the generated data with your own customized model (e.g., model PSI(.),p(T), where p(T) indicates that detection probabilities are forced to a linear trend (logit scale) using the design matrix). Click 'Help' to see a sample of a model using the design matrix.
The menu choice 'Define model by name' allows you to specify models similar to the ones in the menu by entering a model name; GENPRES then creates the MARK input file based on whether '(.)' or '(t)' appears after each of the parameters Psi, Gam, Eps, or p.
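The idea of reading a model's structure from its name can be sketched in Python. This is a hypothetical illustration only: the exact name syntax GENPRES accepts is not given here, so the comma-separated form and the function name `parse_model_name` are assumptions. The '(.)' or '(t)' marker is matched case-sensitively, so a name such as p(T), used above for a linear trend, is not mistaken for p(t).

```python
import re

# Parameter names are matched case-insensitively; the marker is '.' or 't'.
_PARAM = re.compile(r"((?i:psi|gam|eps|p))\((\.|t)\)")


def parse_model_name(name):
    """Map each parameter found in a model name to its structure."""
    kinds = {".": "constant", "t": "time-varying"}
    return {param.lower(): kinds[marker] for param, marker in _PARAM.findall(name)}


print(parse_model_name("Psi(.),Gam(.),Eps(.),p(t)"))
```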
@echo off
REM This file should be saved as simpres2.cmd
REM It should be called by simpres1.cmd with
REM 2 arguments (p,sim#)
REM modelPart1.txt needs to be created beforehand,
REM containing 'PROC TITLE' and 'PROC MODEL' statements
REM needed for MARK input file. Hint: generate a MARK
REM input file using GENPRES4_INT and copy 1st two lines.
REM modelPart2.txt needs to be created containing
REM the 'PROC ESTIMATE', PIM's and Design matrix
REM specifications for MARK. (see previous hint).
copy modelPart1.txt sim%2.mrk
c:\progra~1\presence\genpres4.exe 5 100 .75 %1 %1 %1 %1 %1 0 0 0 0
type genpres.inp >>sim%2.mrk
type modelPart2.txt >> sim%2.mrk
c:\progra~1\mark\mark /MINIMIZE i=sim%2.mrk l=sim%2.out lines=0
REM ------------------------- end of simpres2.cmd file ----------------
@echo off
REM This file should be saved as simpres1.cmd
REM The following lines call simpres2.cmd where
REM the first argument is the value of p,
REM and the 2nd argument is the simulation number
call simpres2 .5 1
call simpres2 .6 2
call simpres2 .7 3
call simpres2 .8 4
call simpres2 .9 5
REM Input and output files will be saved by simpres1.cmd
REM as 'simNNN.mrk' and 'simNNN.out'.

Since Genpres4.exe only generates data, some input must be added to the input file before running MARK. To see what text needs to be added before and after the data, run the Genpres4_int.exe program, and examine the input file (genpres.mrk) it creates. Then, use cut/paste to create the two partial files needed by simpres2.cmd. To do the simulations, open a command window (click 'Start' button, click 'Run', and type 'cmd') and type simpres1.
The format of the genpres4 command can be seen by typing 'genpres4' in a command window. Here is the format of the command:
usage genpres4 T N psi p p p... eps eps eps... gam gam gam ...[opts]
   or genpres4 T N psiA psiB1 psiB2 pA pB rA rB1 rB2 2SP
   or genpres4 T N psi p p p... theta theta theta... METH=M [opts]
   or genpres4 T N psi ssPsi1 ssPsi2 p1 p2 SSOCC=1 [opts]
   or genpres4 T N psi0... psi1... psi2... R... p1... p2... dlta... MULTISTATE
      genpres4 T N psi(0),psi(1),...p(1),p(2),... NSTATES=x MULTISTATE
      genpres4 T N p LAMBDA=x
      genpres4 T N pi psiA psiB etaAA etaBA gamAA gamAB gamBA gamBB epsAA epsAB epsBA epsBB pA pB INTEGRATED
 where T=number of sampling occasions,
       N=total number of surveyed sites,
       psi=probability species is present at a site,
       p=detection probability
       eps=extinction rate from sample i to i+1,
       gam=prop of extinct sites which colonize between sample i and i+1,
 opts: STOCHASTIC, P0MISS, LIST, QUIET NOBS=n SIMTYPE=2
       STOCHASTIC - data simulated instead of expected value data generated,
       P0MISS - missing value in history instead of zero when P=0,
       LIST - list histories on screen,
       QUIET - don't print much on screen,
       NOBS=n - set history frequency = n, instead of 1,
       SIMTYPE=2 - simulate by individual sites, instead of recursively by group.

psi0= prob adults of species are present|not present in i-1
psi1= prob adults of species are present|adults present in i-1
psi2= prob adults of species are present|breeders present in i-1
R   = prob breeders of species are present|adults present
p1  = prob of detecting non-breeding adults
p2  = prob of detecting breeding adults
dlta= prob of detecting young with breeding adults
psiAB= prob of both species present                          N
psiA= prob species A present, regardless of species B       / \psiA
psiB= prob species B present, regardless of species A      x   A
psiB1=prob species B present | species A present          / \ / \psiB1
psiB2=prob species B present | species A not present     0  B A  AB
psiB=psiB2*(1-psiA)+psiAB
pA=prob detect A | only A present
pB=prob detect B | only B present
rA=prob detect A, regardless of B
rB1=prob detect B | detected A
rB2=prob detect B | not detected A                           AB
rAB=prob detect B | both species present                    / \rA
rAb=prob detect A, not B|both species present              0   A
raB=prob detect B, not A|both species present             / \ / \rB1
rab=prob detect neither |both species present            0  2 1  3
psi(0)= vector of psi for just before occasion 1
psi(1)= matrix of psi for just before occasion 2
p(1)= matrix of p at occasion 1
psi(i,r,s)= [in state r @ i] [in state s @ i-1]
p(i,r,s)= [cap in r @ i] [true state=s]

The previous example is very basic and requires no additional software (except MARK). After running the simulations, you will have to decide what you would like to save from each output file.
The other method, which requires installing other software, allows you to do a lot more. The idea is to use a text manipulation/scripting language to create each input file in a loop, run MARK or PRESENCE, and save the desired output in a spreadsheet. I usually use gawk for this, but Perl, R, Python, or others will also work. (All of the ones listed are available for free.) Here's a sample script written in 'awk':
#
#  genpresBatchExample.awk  -  genpres batch example script
#
BEGIN {
  Q = sprintf("%c",34);     ### Q is double-quote character
  ### loop with p=0.5, 0.6, 0.7, 0.8, 0.9...
  for (p=0.5; p<0.99; p=p+0.1) {
    isim++; print "\n\n** SIMULATION ",isim,"\n";
    ### generate data by calling genpres4.exe with 5 surveys, 100 sites, psi=0.75,
    ###   and value of p
    a="cmd /c " Q "c:\\progra~1\\presence\\genpres4" Q " 5 100 .75 " p " " p " " p " " p " " p " 0 0 0 0 ";
    print a; system(a);
    ### next, tack on MARK commands at beginning of file
    ###   and save to new MARK input file (genprestmp.mrk)
    b="proc title simulated data 5 100 .75 .5 .5 .5 .5 .5 0 0 0 0;\n"
    b=b "proc chmatrix occasions=5 groups=1 etype=Occupancy hist=32;\n"
    b=b "glabel(1)=Group 1;"
    print b > "genprestmp.mrk";
    ### copy output from genpres4.exe to end of new MARK input file (genprestmp.mrk)
    ###   (the "> 0" test stops the loop at end of file or if the file is missing)
    while ((getline line < "genpres.inp") > 0) print line > "genprestmp.mrk";
    close("genpres.inp")
    ### next, tack on MARK commands at end of file...
    b="proc estimate link=Sin NOLOOP varest=2ndPart;\n"
    b=b " model={psi,p(.)};\n"
    b=b " group=1 Psi rows=1 cols=1 Square Constant; 1;\n"
    b=b " group=1 p Session 1 rows=1 cols=5 Square Constant=2;\n"
    b=b "design matrix constraints=2 covariates=2 identity;\n"
    b=b " blabel(1)=Psi; rlabel(1)=Psi;\n"
    b=b " blabel(2)=p; rlabel(2)=p;\n"
    b=b "proc stop;"
    print b > "genprestmp.mrk"; close("genprestmp.mrk")
    ### now, run MARK on new MARK input file...
    a="cmd /c " Q "c:\\progra~1\\mark\\mark" Q " /MINIMIZE i=genprestmp.mrk l=mark.out lines=0"
    print a; system(a)
    ### Scan MARK output file for estimates and output
    ###   estimates to a spreadsheet file (genprestmp.csv)
    while ((getline < "mark.out") > 0) {
      if (index($0," 1:Psi")+index($0," 2:p")>0)
        print isim "," p "," $1 "," $2 "," $3 > "genprestmp.csv"
    }
    close("mark.out");
  }
  close("genprestmp.csv");
}

The install program will copy these example files to the "c:\Program Files\Presence" folder.
You will probably have to copy them to your "My Documents" (or some other folder) for them to work, as Windows usually restricts write access to the "Program Files" folders.
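For readers who prefer Python to awk, the two text-handling steps of the batch script above (assembling the genpres4 arguments and picking the estimate lines out of the MARK output) might look like the sketch below. The function names are invented for this example. The sketch does not run genpres4 or MARK; it only builds and prints the argument strings, and the four trailing zeros are copied from the example command without interpreting how genpres4 divides them between eps and gam.

```python
def genpres4_args(n_surveys, n_sites, psi, p, trailing=()):
    """Assemble the argument list 'T N psi p p p... <trailing values>'."""
    return [str(v) for v in (n_surveys, n_sites, psi, *p, *trailing)]


def matching_fields(lines, markers=(" 1:Psi", " 2:p")):
    """First three whitespace-separated fields of each line containing a marker.

    Mirrors the awk test on index($0, ...) and the printed $1, $2, $3.
    """
    return [line.split()[:3] for line in lines if any(m in line for m in markers)]


# One argument string per simulation, as in the awk loop over p.
for sim, p in enumerate([0.5, 0.6, 0.7, 0.8, 0.9], start=1):
    args = genpres4_args(5, 100, 0.75, [p] * 5, trailing=[0, 0, 0, 0])
    print(sim, " ".join(args))
```

Passing the values as a list rather than building them with repeated floating-point addition avoids the `p<0.99` guard that the awk loop needs.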