Lab 23b - PAUP* III - Likelihood, Modeltest

Biol 615 Systematics and Comparative Biology, (Sikes)

This lab, and many of the subsequent labs, will not be graded. I will be available during lab to answer questions. You can complete this lab at your leisure but will need to use PAUP* and Modeltest (although you can download Modeltest for free to any computer).

Introduction to Modeltest3.7

Modeltest is a program that interacts with PAUP* to test the fit of one's data to 56 different models of evolution. The program was written by David Posada and can be downloaded after filling out a quick registration form from the program website.

The basic overview of the procedure is as follows (but see further below for the details on how to do this):

1. Launch PAUP* and open and execute a DNA datafile (without any search commands embedded inside). PAUP* should tell you that it's executed the file and will wait for further instructions.

2. Use PAUP* to execute the file modelblockPAUPb10 which is inside the Modeltest3.7\paupblock folder. (You should make a copy of it into your analysis folder) This file is full of commands to tell PAUP to analyze the loaded dataset with 56 different models of evolution. The scores of each analysis are saved into a file called model.scores that will appear in the same folder as the modelblockPAUPb10 file.

3. Get the program Modeltest to read in the model.scores file and it will spit out the results file and give it a name you specifiy which will contain the AIC and hLRT scores for the different models - telling you how well the different models fit your data. Open this output file in a text editor and you will see which model was chosen. Easy, right? In the results file there is also the PAUP commands to tell PAUP to use the best fitting model.

Refer to chapter 10 of your text for more information on this process - it was written by the author of the Modeltest program and is HIGHLY recommended.

How to run Modeltest

1. Download the program:

go to

http://darwin.uvigo.es/software/modeltest.html

register and download the program (click get modeltest) - you will see a dialog box - click the SAVE button

Save it to a newly made work folder.

NOTE: I have a copy if the link is dead. Also NOTE: There is a similar program, MrModelTest, that works in a similar fashion but only tests the 24 models that can be run in MrBayes while ModelTest compares 56 models. This runs faster and is useful if you are doing Bayesian analyses.

2. Verify it is downloaded

Leave your browser and navigate to the folder you made.

You should see a folder named Modeltest3.7 - open this folder, inside is a folder called Modeltest3.7 folder - open this one also

The application you will use (but via the Command Line - see below - do NOT double-click the modeltest program) is in the folder named BIN.

You don't need to do anything here except verify that the program downloaded OK and to learn where it is (this will be useful later).

3. Open PAUP* and execute the primate-mtDNA.nex file

The primate-mtDNA.nex file is in the PAUP* folder inside the SAMPLE NEXUS FILES folder. First open the file and look at the data - ALWAYS look at the datafiles to see what you are working with... when satisfied, execute the file.

You can use your own dataset instead if you prefer. However, be warned that to run 56 models of evolution on a large dataset can take some time, perhaps an hour or more. Also, this lab is written to work best with the primate datafile.

The primate dataset is too large to run quickly so you will exclude most of the data - do this by typing the command:

exclude 200-898;

This tells PAUP* to ignore the characters 200-898, leaving only 199 characters to be analyzed.

4. View the Modeltest PAUP block file

The block file is here:

My Documents\Modeltest3.6\Modeltest3.6 folder\paupblock\modelblockPAUPb10

open this file in a text editor and study it to see what it is - you will see a long series of commands that tell PAUP to conduct successive analyses on the loaded dataset using different models of evolution. Note that they begin with the most restrictive and least realistic (JC) models and proceed to the least restrictive and most realistic (GTR+I+G)

When you are done looking over the file execute it in PAUP which will instantly begin running the analyses; for your reduced dataset of only 199 basepairs this should take less than 10 minutes to finish (note that the simple models go very fast and the complex models go much more slowly).

PAUP* will write the scores of each analysis into a file called model.scores that will be inside the same folder as the modelblockPAUPb10 file and your datafile.

When PAUP* is done move the file model.scores from the paupblock folder into the ModelTest / bin folder.

5. Run modeltest on the scores file you just made

Modeltest on the mac is launched by double-clicking the file named Modeltest3.7.macX.app this will open a window that has two buttons on the lower left and two buttons on the lower right.

Click the button that says 'File' on the lower left & navigate to your model.scores file. Select it & click Open.

Now click the button that says 'File' on the lower right. ModelTest will ask you where you want to save the results output - navigate to a suitable folder for these results to appear (your work folder or the ModelTest Bin folder are good options) and click 'Save'.

A file named Modeltest3.7.macX.app.out will appear.



6. Read the results (outfile) to see what model(s) of evolution were chosen as the best fit for your data

You can do this by opening the outfile (Modeltest3.7.macX.app.out - or whatever name you gave it) in any text editor.

What model did the hLRT (hierarchical Likelihood Ratio Test) choose? Did the AIC choose the same model? Recent work has pointed out a number of drawbacks to the hLRT relative to the AIC and so when there are differences it is probably best to use the model chosen by the AIC. It would be wise to investigate the cause of the difference & perhaps try other model-testing methods (such as the Bayesian Information Criterion) as well. One might also want to analyze one's data using each of the different models to see if they yield different results (unlikely - similar models are expected to yield similar results). Did the AIC identify Among Site Rate Variation as an important pattern in your dataset? Based on the Gamma distribution shape parameter would you say there is a lot or a little ASRV in these data?

Did the AIC select the most complex model available? What is the log-likelihood score of the most complex model (GTR+I+G) and what is the score of the model chosen by the AIC? Look at the bottom of the outfile for the section titled MODEL SELECTION UNCERTAINTY : Akaike Weights to find these scores. In this list note the column titled K which is the number of parameters estimated by the model. How many parameters are estimated by the best fitting model? How many by the GTR+I+G model?

7. Use PAUP* to conduct an analysis with the best fitting model

Note that in the outfile under the AIC scores there is this text:

PAUP* Commands Block: If you want to implement the previous estimates as likelihod settings in PAUP*, attach the next block of commands after the data in your PAUP file

Modeltest is a wonderful program - it not only finds the best fitting model but also provides the commands to implement that model in PAUP*. However, you do not need to attach the block of commands to your datafile for this lab (although this is a good idea if you are working with your own datafile because the commands will then always be in your datafile).

Note that the commands also include parameter estimates for the model - this will save PAUP* a lot of time in working out the log-likelihood scores of each tree it examines because it doesn't have to estimate the MLEs for these from scratch with each tree. However, to be really rigorous you should test these parameter estimated by letting PAUP estimate them from scratch at some point. Chances are they will be very similar to the ones Modeltest had PAUP estimate (modeltest uses a quick NJ tree to provide PAUP a tree on which to estimate the parameters - if you asked PAUP to estimate them WHILE it searches for a best tree you will get much more accurate estimates but it will be a slow search!)

Copy not the entire block of commands, but just the command line itself (unless you want to save the block into your own datafile) - then go back to PAUP. Type

set criterion = parsimony

hsearch

and PAUP* will do a quick parsimony search on your dataset, when done type

describetrees 1/plot=phylogram

to see the parsimony phylogram. What OTU is estimated as the sister lineage to Homo? (if you are analyzing the primate dataset)

Now change the optimality criterion from parsimony to likelihood by typing

set criterion = likelihood

then paste the likelihood settings commands that you copied from the modeltest outfile into the command line of PAUP

type

hsearch

and PAUP will use those likelihood settings to search for an optimal tree under the likelihood criterion using the best fitting model of evolution.

when done (note it takes much longer than Parsimony - but also note that it would be much slower if you asked PAUP to estimate those parameters on the fly) type:

describetrees 1/plot=phylogram

What OTU is estimated as the sister lineage to Homo?

ALL DONE - you have found a best fitting model of evolution and used PAUP to conduct a search using that model. There are many other points to consider and things to try with your data - such as performing a more thorough hsearch, or letting PAUP estimate the parameters on the fly to see if they differ, but this should get you started.