Lab 18 - Introduction to PAUP*: Distance methods & parsimony
Zool 575 Introduction to Biosystematics, (Sikes) Winter 2006
You will learn some basic commands to use the software package PAUP*. You will perform both distance and parsimony analyses using a fabricated dataset that nicely shows some of the differences between these methods.
This lab shouldn't take very long, but you have over a week to complete it. Hand it in on the day of the midterm (Monday, 27 Feb).
NOTE: To distinguish what is typed from what is clicked: 1) menus, menu items, and items contained in dialog boxes or elsewhere on the screen are given in a bold sans serif font. For example, the text > File > Open means click "File" in the main menu and then select "Open" from the menu items under "File." 2) Text that you are meant to type at the command-line prompt or into a dialog box is given in a plain fixed-width font. For example, weights 2:1stpos means that you should type the fixed-width text exactly as it appears. Hit return to tell PAUP to do what you wrote. Questions for you to answer are in red.
Note also that there is a commands manual in PDF form that can be used to learn about the full set of available commands and expand your ability to use PAUP*. The file is in the folder with the program. See also your text pages 182-195 for details on using PAUP*.
1. First start PAUP*
Go to the "start" menu of the Windows operating system and choose 'Biology' -> 'PAUP 4.0' -> 'Win 32' -> 'PAUP 4.0 beta 10 Win'. This will start the program PAUP and it will ask for a datafile to open. Click cancel because you are going to make your own datafile.
2. Make your own datafile using the dataset at the bottom of this page
Copy the dataset at the bottom of this page in its entirety starting from and including the #NEXUS command all the way to the bottom. Go to PAUP and choose the menu
> File > New
to create a new document. Paste the dataset into the new document using the right mouse button options and save it in your documents folder with the name z575PAUP1
Now study the datafile itself. Note there are blocks of information and specifications on how many taxa (OTUs) and how many characters there are. This is a morphological dataset so the characters are numbers which correspond to the character state (e.g. 0 = feathers absent, 1 = feathers present). There is also a specification of what group for PAUP to consider the outgroup.
3. Execute the datafile you made
When PAUP executes a datafile, it loads the data into memory and is then ready to run analyses on them. If there are errors in the file (for example, if you forgot to start the file with #NEXUS), PAUP will complain and require you to fix the errors before proceeding. Execute the file using the menu
> File > Execute filename
4. Use the parsimony optimality criterion to find optimal tree(s)
Recall the 3 ways to search for trees: exhaustive, branch and bound, and heuristic.
You do not need to tell PAUP to use parsimony because that is the default optimality criterion. To do an exhaustive search type into the command line:
How many trees were examined? How many "best" trees were found? What was the length of the best tree(s)?
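The "length" that PAUP reports is the number of character-state changes a tree requires. As a rough illustration only, here is a minimal Python sketch of how the steps for one unordered character can be counted with the Fitch algorithm. The taxon names, the character states, and the two trees are hypothetical and are not taken from the lab dataset, and PAUP's own implementation is certainly more elaborate.

```python
def fitch_length(tree, states):
    """Return (state_set, steps) for one unordered character.

    tree   -- a leaf name (str) or a 2-tuple of subtrees (rooted, binary)
    states -- dict mapping each leaf name to its observed state
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_steps = fitch_length(tree[0], states)
    right_set, right_steps = fitch_length(tree[1], states)
    common = left_set & right_set
    if common:
        # The children agree on at least one state: no extra change needed.
        return common, left_steps + right_steps
    # The children disagree: take the union and count one change.
    return left_set | right_set, left_steps + right_steps + 1


# Hypothetical character scored for four made-up taxa.
feathers = {"A": "0", "B": "0", "C": "1", "D": "1"}
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))

print(fitch_length(tree_1, feathers)[1])  # steps required on tree_1
print(fitch_length(tree_2, feathers)[1])  # steps required on tree_2
```

Summing these step counts over all characters gives the length of a tree. The "best" trees under parsimony are those with the smallest total.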
Now compare that search to a branch and bound search
and a heuristic search (with default settings)
Do all the searches find the same number of "best" trees? How many trees (rearrangements) were examined by the heuristic search? (compare this to the number examined in the exhaustive search).
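To get a feel for why exhaustive searches are only practical for small datasets, the following Python sketch counts the distinct unrooted, fully bifurcating trees for a given number of taxa using the standard formula (2n-5)!!. The function name is made up for this illustration, and the taxon counts in the loop are arbitrary examples, not the size of the lab dataset.

```python
def count_unrooted_trees(n_taxa):
    """Number of unrooted, fully bifurcating trees for n_taxa >= 3: (2n-5)!!"""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n-5.
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count


for n in (4, 6, 8, 10):
    print(n, count_unrooted_trees(n))
```

An exhaustive search must evaluate every one of these trees, whereas branch and bound and heuristic searches try to avoid doing so.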
OK. You probably want to see the trees that were found. Since multiple trees were found, the first thing to do is to view a consensus tree (we'll cover these later in lecture), which summarizes all the trees in a single tree by showing only the nodes that exist in every tree found (a 'strict consensus').
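The idea behind a strict consensus can be sketched in a few lines of Python: list the groups (clades) in each tree and keep only those present in every tree. This is a simplified sketch that treats trees as rooted nested tuples. The three trees and their taxon names are hypothetical, and PAUP itself works with more general tree representations.

```python
def clades(tree, found=None):
    """Return (members, found): the leaves under this node and the set of
    all clades (frozensets of leaf names) seen in the tree."""
    if found is None:
        found = set()
    if isinstance(tree, str):
        return frozenset([tree]), found
    members = frozenset()
    for child in tree:
        child_members, _ = clades(child, found)
        members |= child_members
    found.add(members)
    return members, found


def strict_consensus_clades(trees):
    """Clades that appear in every one of the given trees."""
    clade_sets = [clades(t)[1] for t in trees]
    return set.intersection(*clade_sets)


# Three hypothetical, equally good trees that disagree about A, B and C.
t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "E"))
t3 = ((("B", "C"), "A"), ("D", "E"))

shared = strict_consensus_clades([t1, t2, t3])
for clade in sorted(shared, key=lambda c: (len(c), sorted(c))):
    print(sorted(clade))
```

In this toy case the group A+B+C and the group D+E survive, but the relationships within A+B+C collapse because the three trees disagree about them.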
The consensus tree is a cladogram, i.e., it has no branch length information. You should get in the habit of always looking at branch lengths because they can sometimes be very important (as they are in this case). To see the branch lengths we have to view a phylogram. This can be done by typing:
This tells PAUP to display a phylogram of tree 1. You may notice that the branch leading to the bird is very long relative to the others. This is what is called, not surprisingly, a "long branch." You will learn later how long branches can cause all sorts of problems with phylogenetic analyses.
5. Change the Optimality Criterion to Distance
Do this by typing
set criterion = distance
PAUP should tell you this was successful.
6. Perform a cluster analysis using the UPGMA algorithm
Do this by typing
Recall that UPGMA makes rooted trees and note that it didn't like our chosen outgroup - the croc. What group was made into the outgroup / root of the tree? Why do you think this happened? (Guess)
What are some of the most obvious differences between the UPGMA tree and the parsimony trees above?
Note also that parsimony found 3 equally parsimonious trees, whereas UPGMA produced only a single tree. What is wrong with producing only a single tree for this dataset?
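To make the clustering steps concrete, here is a minimal Python sketch of UPGMA: repeatedly join the two clusters with the smallest average pairwise distance, and place the new node at half that distance. The four taxa and their distances are hypothetical (they are not the lab data), and the function is a teaching sketch, not PAUP's implementation.

```python
from itertools import combinations


def upgma(leaf_dist):
    """leaf_dist maps frozenset({a, b}) to the distance between leaves a and b.
    Returns (tree, joins): a nested tuple of leaf names, plus a list of
    (members, node_height) recorded at each join."""
    leaves = sorted({name for pair in leaf_dist for name in pair})
    # Each cluster is (subtree, tuple_of_member_leaves).
    clusters = [(name, (name,)) for name in leaves]

    def avg_dist(c1, c2):
        # Unweighted average over all leaf pairs between the two clusters.
        total = sum(leaf_dist[frozenset((a, b))] for a in c1[1] for b in c2[1])
        return total / (len(c1[1]) * len(c2[1]))

    joins = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda p: avg_dist(*p))
        joins.append((c1[1] + c2[1], avg_dist(c1, c2) / 2))
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(((c1[0], c2[0]), c1[1] + c2[1]))
    return clusters[0][0], joins


# Hypothetical distances in which taxon D is far from everything else.
pairs = {("A", "B"): 2, ("A", "C"): 4, ("B", "C"): 4,
         ("A", "D"): 10, ("B", "D"): 10, ("C", "D"): 8}
leaf_dist = {frozenset(k): v for k, v in pairs.items()}

tree, joins = upgma(leaf_dist)
print(tree)
for members, height in joins:
    print(members, round(height, 2))
```

Note that nothing in this procedure uses an outgroup: the root is simply the last join made, and every step produces exactly one answer.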
7. Perform a cluster analysis using the neighbor-joining algorithm
Do this by typing
How does the NJ tree compare to the UPGMA tree? Is it more similar to the UPGMA tree or the parsimony trees? What are the differences between the NJ tree and the parsimony trees? Again note that only one tree is produced. Type
to see the OTU x OTU matrix of distances that the NJ and UPGMA algorithms were using. What is the mean character distance between the bird OTU and the titanosaurus OTU?
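As a rough guide to what a mean character distance is, the Python sketch below counts the characters at which two OTUs differ and divides by the number of characters that could be compared. It assumes unordered characters and assumes that positions scored as missing (?) or gap (-) in either OTU are skipped. The two character strings are made up for illustration and are not rows of the lab dataset.

```python
def mean_character_difference(seq_a, seq_b, missing="?", gap="-"):
    """Proportion of comparable characters at which two OTUs differ."""
    compared = 0
    different = 0
    for a, b in zip(seq_a, seq_b):
        # Skip characters that are unscored in either OTU (assumption).
        if a in (missing, gap) or b in (missing, gap):
            continue
        compared += 1
        if a != b:
            different += 1
    if compared == 0:
        raise ValueError("no characters could be compared")
    return different / compared


# Hypothetical scores for two OTUs over six characters.
otu_1 = "0110?1"
otu_2 = "0100-0"
print(mean_character_difference(otu_1, otu_2))
```

Here five characters can be compared and two of them differ, so the distance is 2/5. Repeating this for every pair of OTUs gives the OTU x OTU matrix that the clustering algorithms start from.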
8. Perform an optimality criterion tree search using distances
Do this by typing
dset objective = ME
to set the optimality criterion to minimum evolution
then perform a heuristic search by typing
To view the tree as a phylogram type:
This is a more rigorous method than clustering. Does it find the same three trees as parsimony? What does it find? What results above are most similar to the minimum evolution results? What should one do when different methods give different results from the same dataset?
This has been a brief introduction. You'll learn about bootstrapping and other methods of assessing branch support later. Estimating branch support is important for several reasons; one is that it lets us compare only the strongly supported results from different methods.
FORMAT SYMBOLS= " 0 1 2 3" MISSING=? GAP=- ;
[ 10 20]
[ . .]