Lab 23 - PAUP* III - Likelihood, Modeltest

Zool 575 Introduction to Biosystematics, (Sikes) Winter 2006

This lab, and many of the subsequent labs, will not be graded. I will be available during lab to answer questions. You can complete this lab at your leisure but will need to use the computers in BI 182 in order to use PAUP* and Modeltest (although you can download Modeltest for free to any computer).

Introduction to Modeltest3.6

Modeltest is a program that interacts with PAUP* to test the fit of one's data to 56 different models of evolution. The program was written by David Posada and can be downloaded after filling out a quick registration form from the program website.

The basic overview of the procedure is as follows (but see further below for the details on how to do this):

1. Launch PAUP* and open and execute a datafile (without any search commands embedded inside). PAUP* should tell you that it's executed the file and will wait for further instructions.

2. Use PAUP* to open the file modelblockPAUPb10 which is inside the Modeltest3.6\paupblock folder. This file is full of commands to tell PAUP to analyze the loaded dataset with 56 different models of evolution. The scores of each analysis are saved into a file called model.scores that will appear in the same folder as the modelblockPAUPb10 file.

3. Get the program Modeltest to read in the model.scores file and it will spit out the results file and give it a name you specifiy which will contain the AIC and hLRT scores for the different models - telling you how well the different models fit your data. Open this output file in a text editor (PAUP will do) and you will see which model was chosen. Easy, right? In the results file there is also the PAUP commands to tell PAUP to use the best fitting model.

Refer to chapter 10 of your text for more information on this process - it was written by the author of the Modeltest program and is HIGHLY recommended.

How to run Modeltest on the PCs in the lab

1. Download the program:

go to

http://darwin.uvigo.es/software/modeltest.html

register and download the program (click get modeltest) - you will see a dialog box - click the SAVE button

Save it to the MY DOCUMENTS folder

The dialog box will appear when it done downloading - click the OPEN button (it will give you a warning message - click IGNORE)

2. Verify it is downloaded

Leave Internet Explorer and navigate to the MY DOCUMENTS folder

You should see a folder named Modeltest3.6 - open this folder, inside is a folder called Modeltest3.6 folder - open this one also

The application you will use (but via the Command Line - see below - do NOT double-click the modeltest program) is in the folder named BIN.

You don't need to do anything here except verify that the program downloaded OK and to learn where it is (this will be useful later).

3. Open PAUP* and execute the primate-mtDNA.nex file

The primate-mtDNA.nex file is in the PAUP* folder inside the SAMPLE NEXUS FILES folder. First open the file and look at the data - ALWAYS look at the datafiles to see what you are working with... when satisfied, execute the file.

You can use your own dataset instead if you prefer. However, be warned that to run 56 models of evolution on a large dataset can take some time, perhaps an hour or more. Also, this lab is written to work best with the primate datafile.

The primate dataset is too large to run quickly so you will exclude most of the data - do this by typing the command:

exclude 200-898;

This tells PAUP* to ignore the characters 200-898, leaving only 199 characters to be analyzed.

4. Use PAUP* to open in Edit mode the Modeltest PAUP block file

The block file is here:

My Documents\Modeltest3.6\Modeltest3.6 folder\paupblock\modelblockPAUPb10

open this file in PAUP* in edit mode and study it to see what it is - you will see a long series of commands that tell PAUP to conduct successive analyses on the loaded dataset using different models of evolution. Note that they begin with the most restrictive and least realistic (JC) models and proceed to the least restrictive and most realistic (GTR+I+G)

When you are done looking over the file execute it - PAUP will instantly begin running the analyses; for your reduced dataset of only 199 basepairs this should take less than 10 minutes to finish (note that the simple models go very fast and the complex models go much more slowly).

PAUP* will write the scores of each analysis into a file called model.scores that will be inside the same folder as the modelblockPAUPb10 file (inside the modeltest folder).

When PAUP* is done you can hide (minimize PAUP) and go to the paupblock folder inside the modeltest folder.

Move the file model.scores from the paupblock folder into the bin folder.

5. Use the command prompt (DOS) to run modeltest on the scores file you just made

Modeltest is not a graphical user interface (GUI) program on the PC - unfortunately you cannot use the mouse to run the program. But it is not hard to use - the hardest part is navigating through the windows file & directory system to get to the folder you need to be to run modeltest:

Go to the Start menu -> Programs -> Accessories -> Command Prompt

a black window will appear into which you will type commands to first navigate to the BIN folder (inside the modeltest folder) and once there, launch modeltest and have it analyze your scores file do this as follows:

navigate by using these DOS (disk operating system) commands

cd.. to go up (backwards)

cd foldername to go into that folder

dir to list the contents of a folder (see where you are)

See an example of my navigation results below. I have put comments in square brackets [ ].


Microsoft Windows 2000 [Version 5.00.2195]
(C) Copyright 1985-2000 Microsoft Corp.



C:\Documents and Settings\dsikes>cd..

[Navigate from the dsikes folder up one level to the Documents and Settings folder ].


C:\Documents and Settings>cd..

[ Navigate up one more level to the root of the C: drive].

C:\>dir

[ use 'dir' to list the contents of the root to see where you are and what's there].

Volume in drive C is Local Disk
Volume Serial Number is 0041-B557

Directory of C:\

02/02/2005 11:32a 320,518 admin
05/07/2003 03:35p <DIR> Bootpart
09/08/2004 02:48p 0 COMLOG.txt
02/24/2004 02:33p 671 devicetable.log
03/09/2005 03:03p <DIR> Documents and Settings
02/24/2004 09:34a <DIR> drv
02/26/2004 03:04p 151 liprefs.js
02/26/2004 03:02p <DIR> My Music
12/07/2004 09:41a <DIR> PopUp
02/08/2005 11:41a <DIR> Program Files
05/09/2003 02:01p <DIR> QUARANTINE
03/09/2005 03:05p <DIR> Temp
09/09/2004 10:40a <DIR> WUTemp
4 File(s) 321,340 bytes
9 Dir(s) 2,708,164,608 bytes free

[ Use cd to go into the Temp folder & then dir to list the contents of Temp].

C:\>cd Temp

C:\Temp>dir
Volume in drive C is Local Disk
Volume Serial Number is 0041-B557

Directory of C:\Temp

03/09/2005 03:05p <DIR> .
03/09/2005 03:05p <DIR> ..
02/16/2005 03:00p 465,900 10276.pdf
03/09/2005 03:05p <DIR> modeltest3.6
03/09/2005 03:04p 797,456 modeltest3.6.zip
11/17/2004 02:21p <DIR> My eBooks
02/18/2004 09:41a <DIR> My SAS Files
11/17/2004 02:29p <DIR> SAS Temporary Files
08/26/2004 09:27a <DIR> Security
02/18/2005 11:11a 616 z575PAUP1.nex
3 File(s) 1,263,972 bytes
7 Dir(s) 2,708,164,608 bytes free

[ If all has gone well you will see your destination - the Modeltest3.6 folder, cd into this folder].

C:\Temp>cd modeltest3.6

[ and use dir to see the contents].

C:\Temp\modeltest3.6>dir
Volume in drive C is Local Disk
Volume Serial Number is 0041-B557

Directory of C:\Temp\modeltest3.6

03/09/2005 03:05p <DIR> .
03/09/2005 03:05p <DIR> ..
03/09/2005 03:05p 82 ._Modeltest3.6 folder
03/09/2005 03:05p <DIR> Modeltest3.6 folder
03/09/2005 03:05p <DIR> __MACOSX
1 File(s) 82 bytes
4 Dir(s) 2,708,164,608 bytes free

[ cd into the Modeltest3.6 folder].

C:\Temp\modeltest3.6>cd Modeltest3.6 folder

[ and list its contents].

C:\Temp\modeltest3.6\Modeltest3.6 folder>dir
Volume in drive C is Local Disk
Volume Serial Number is 0041-B557

Directory of C:\Temp\modeltest3.6\Modeltest3.6 folder

03/09/2005 03:05p <DIR> .
03/09/2005 03:05p <DIR> ..
03/09/2005 03:05p 6,148 .DS_Store
03/09/2005 03:05p 82 ._bin
03/09/2005 03:05p 82 ._doc
03/09/2005 03:05p 82 ._license
03/09/2005 03:05p 82 ._paupblock
03/09/2005 03:05p 82 ._sample
03/09/2005 03:05p 82 ._source
03/09/2005 03:05p <DIR> bin
03/09/2005 03:05p <DIR> doc
03/09/2005 03:05p 0 Icon
03/09/2005 03:05p <DIR> license
03/09/2005 03:05p <DIR> paupblock
03/09/2005 03:05p 3,045 README.html
03/09/2005 03:05p <DIR> sample
03/09/2005 03:05p <DIR> source
9 File(s) 9,685 bytes
8 Dir(s) 2,708,164,608 bytes free

[ Finally you are where you need to be - cd into the bin folder].

C:\Temp\modeltest3.6\Modeltest3.6 folder>cd bin

[ and list its contents - you should see the model.scores file you moved there].

C:\Temp\modeltest3.6\Modeltest3.6 folder\bin>dir
Volume in drive C is Local Disk
Volume Serial Number is 0041-B557

Directory of C:\Temp\modeltest3.6\Modeltest3.6 folder\bin

03/09/2005 03:05p <DIR> .
03/09/2005 03:05p <DIR> ..
03/09/2005 03:05p 6,148 .DS_Store
03/09/2005 03:05p 126,976 Modeltest3.6.exe
03/09/2005 03:05p 130,954 Modeltest3.6.mac.app
03/09/2005 03:05p 7,929 model.scores
4 File(s) 272,007 bytes
2 Dir(s) 2,708,164,608 bytes free

[ Now that you are in the folder with the windows executable of modeltest you can type modeltest3.6 and the program will launch - see below for what the program will display once it launches].

C:\Temp\modeltest3.6\Modeltest3.6 folder\bin>modeltest3.6

Testing models of evolution - Modeltest 3.6
(c) Copyright, 1998-2004 David Posada (dposada@uvigo.es)
Facultad de Biologia, Universidad de Vigo,
Campus Universitario, 36310 Vigo, Spain
_______________________________________________________________
Wed Mar 9 15:07:52 2005
OS = Windows


No input file


HELP

Modeltest is a program for comparing models of evolution using lik
tests and the AIC criterion. The input are log likelihood scores.
raw scores or a Paup scores file resulting from the execution of t
ock of Paup commands (modelblock).

The program can also enter in a calculator mode for obtaining the
ated with the log likelihood ratio statistic for two given scores
ue for entered scores.

JC: Jukes and Cantor 1969
K80: Kimura 2-parameters, Kimura 1980 (also known as K2P)
TrNef: Tamura-Nei 1993 with equal base frequencies
K81: Kimura 3-parameters, Kimura 1981 (also known as K3ST
TIM: Transitional model with equal base frequencies
TVM: Transversional model with equal base frequencies
SYM: Symmetrical model, Zharkikh 1994
F81: Felsenstein 1981
HKY: Hasegawa-Kishino-Yano 1985
TrN: Tamura-Nei 1993
K81uf: Kimura 3-parameters with unequal base frequencies
TIM: Transitional model
TVM: Transversional model
GTR: General time reversible, Rodriguez et al 1990 (also known a

I: invariable sites G: gamma distribution

Usage: -d : debug level (e.g. -d2)
-a : alpha level (e.g., -a0.01) (default is a=0.01)
-n : sample size (e.g., number of characters). Forces the
e.g., -c345) (default is to use AIC)
-t : number of taxa. Forces to include branch lengths as
g., -t28) (default is not to count them)
-w : confidence interval for averaging (e.g., -w0.95) (de
)
-l : LRT calculator mode (e.g., -l)
-i : AIC calculator mode (e.g., -i
-f : input from a file for obtaining AIC values (e.g., -f
-? : help

UNIX/WIN usage: modeltest3.6 [-d -a -n -t -w -l -i -f -?] < infile >outfile


[ the last line above, called UNIX / WIN usage, tells you how to get modeltest to read in your scores (infile) & save the results to a file of your choosing (the outfile) follow the commands below - I've named the outfile 'myout' in this case ].

C:\Temp\modeltest3.6\Modeltest3.6 folder\bin>modeltest3.6 < model.scores > myout
C:\Temp\modeltest3.6\Modeltest3.6 folder\bin>

[ modeltest won't tell you anything after its done - you simply leave the command prompt window and go look in the bin folder to see if the results outfile (myout) is there].


6. Read the results (outfile) to see what model(s) of evolution were chosen as the best fit for your data

You can do this by opening the outfile (myout - or whatever name you gave it) in any text editor, such as PAUP*, or drop it into an internet explorer window which will display the contents

What model did the hLRT (hierarchical Likelihood Ratio Test) choose? Did the AIC choose the same model? Recent work has pointed out a number of drawbacks to the hLRT relative to the AIC and so when there are differences it is probably best to use the model chosen by the AIC. It would be wise to investigate the cause of the difference & perhaps try other model-testing methods (such as the Bayesian Information Criterion) as well. One might also want to analyze one's data using each of the different models to see if they yield different results (unlikely - similar models are expected to yield similar results). Did the AIC identify Among Site Rate Variation as an important pattern in your dataset? Based on the Gamma distribution shape parameter would you say there is a lot or a little ASRV in these data?

Did the AIC select the most complex model available? What is the log-likelihood score of the most complex model (GTR+I+G) and what is the score of the model chosen by the AIC? Look at the bottom of the outfile for the sectiont titled MODEL SELECTION UNCERTAINTY : Akaike Weights to find these scores. In this list note the column titled K which is the number of parameters estimated by the model. How many parameters are estimated by the best fitting model? How many by the GTR+I+G model?

7. Use PAUP* to conduct an analysis with the best fitting model

Note that in the outfile under the AIC scores there is this text:

PAUP* Commands Block: If you want to implement the previous estimates as likelihod settings in PAUP*, attach the next block of commands after the data in your PAUP file

Modeltest is a wonderful program - it not only finds the best fitting model but also provides the commands to implement that model in PAUP*. However, you do not need to attach the block of commands to your datafile for this lab (although this is a good idea if you are working with your own datafile because the commands will then always be in your datafile).

Note that the commands also include parameter estimates for the model - this will save PAUP* a lot of time in working out the log-likelihood scores of each tree it examines because it doesn't have to estimate the MLEs for these from scratch with each tree. However, to be really rigorous you should test these parameter estimated by letting PAUP estimate them from scratch at some point. Chances are they will be very similar to the ones Modeltest had PAUP estimate (modeltest uses a quick NJ tree to provide PAUP a tree on which to estimate the parameters - if you asked PAUP to estimate them WHILE it searches for a best tree you will get much more accurate estimates but it will be a slow search!)

Copy not the entire block of commands, but just the command line itself (unless you want to save the block into your own datafile) - then go back to PAUP. Type

hsearch

and PAUP* will do a quick parsimony search on your dataset, when done type

describetrees 1/plot=phylogram

to see the parsimony phylogram. What OTU is estimated as the sister lineage to Homo? (if you are analyzing the primate dataset)

Now change the optimality criterion from parsimony to likelihood by typing

set criterion = likelihood

then paste the likelihood settings commands that you copied from the modeltest outfile into the command line of PAUP

type

hsearch

and PAUP will use those likelihood settings to search for an optimal tree under the likelihood criterion using the best fitting model of evolution.

when done (note it takes much longer than Parsimony - but also note that it would be much slower if you asked PAUP to estimate those parameters on the fly) type:

describetrees 1/plot=phylogram

What OTU is estimated as the sister lineage to Homo?

ALL DONE - you have found a best fitting model of evolution and used PAUP to conduct a search using that model. There are many other points to consider and things to try with your data - such as performing a more thorough hsearch, or letting PAUP estimate the parameters on the fly to see if they differ, but this should get you started.