Lab 18 - Introduction to PAUP*: Distance methods & Parsimony

Biol 615 Systematics and Comparative Biology (Sikes)

You will learn some of the basic principles of phylogenetic inference by running both distance and parsimony analyses on a fabricated dataset that nicely shows some of the differences between these methods. PAUP* is a powerful, although somewhat outdated, phylogenetic software package, but no alternative is as feature-rich. MEGA has been touted as a PAUP* alternative: it is cross-platform and free, and it has many of PAUP*'s features plus some that PAUP* lacks, but it cannot analyze morphological data.

This shouldn't take very long, but you have a week to complete it. Hand it in on or before next Friday.

NOTE: You will be using the UNIX version of PAUP so there will be very little mouse work.

Typed versus clicked: text that you should type at the command-line prompt is given in a plain fixed-width (Courier) font. For example, weights 2:1stpos means that you should type that text exactly as it appears (copying and pasting is often an option too). Hit return to tell PAUP* to run what you typed. Questions for you to answer are in red.

Note also that there is a commands manual in PDF form that can be used to learn about the full set of available commands and expand your ability to use PAUP*. The file is named Cmd_ref_v2.pdf and is in the folder 'Docs' with the program. See also your text pages 289-312 for details on using PAUP*.

1. First start PAUP*

Open the Mac application 'Terminal' by double-clicking it (find it inside the Utilities folder inside the Applications folder). This gives you access to the UNIX underbelly of the Mac OS.

Now type cd followed by a space. Use your mouse to drag the folder containing PAUP to the terminal window and drop it there. If successful, a string of directory names separated by slashes will appear, ending with the directory name paup4b10-ppc-macosx. This is a quick way to navigate to your directory of interest (rather than trying to type all those names correctly). To check that this worked, hit return, then type ls and hit return again - you should see the contents of the PAUP directory listed.

To start PAUP* type

./paup

And it should spit back:

P A U P *
Portable version 4.0b10 for Unix

2. Make your own datafile using the dataset at the bottom of this page

Copy the dataset at the bottom of this page in its entirety, starting from and including the #NEXUS line, all the way to the bottom. Open the Mac text editor 'TextEdit' (it should be inside the Applications folder). There are much better text editors for data file manipulation (e.g., TextWrangler), but this will do for now. Open the TextEdit Preferences dialog and make sure the Format option is set to 'Plain Text'.

Create a new document, paste the dataset into it, and save it inside the PAUP folder with the name dinos.nex

Now study the datafile itself. Note there are blocks of information and specifications on how many taxa (OTUs) and how many characters there are. This is a morphological dataset so the characters are numbers which correspond to the character state (e.g. 0 = feathers absent, 1 = feathers present). There is also a specification of what group PAUP should consider the outgroup.
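To make the block structure concrete, here is a minimal Python sketch (not part of PAUP*, and not a full NEXUS parser) that pulls the taxon names and character strings out of a MATRIX block. The function name read_matrix and the shortened three-taxon copy of the lab matrix are illustrative assumptions.

```python
import re

# A shortened copy of the lab matrix (three of the nine taxa), included only
# so that this sketch is self-contained.
NEXUS_TEXT = """#NEXUS
BEGIN CHARACTERS;
DIMENSIONS NCHAR=20;
MATRIX
[ a NEXUS comment sits inside square brackets ]
bird 11111111111111122220
trex 00100000000000011110
croc 00000000000000000000
;
END;
"""

def read_matrix(nexus_text):
    """Return {taxon: character string} from the first MATRIX block."""
    # Drop NEXUS comments, which are enclosed in square brackets.
    text = re.sub(r"\[[^\]]*\]", "", nexus_text)
    # Everything between the word MATRIX and the next semicolon is data.
    match = re.search(r"MATRIX(.*?);", text, flags=re.IGNORECASE | re.DOTALL)
    matrix = {}
    for line in match.group(1).splitlines():
        parts = line.split()
        if len(parts) == 2:          # a data row: taxon name, then its states
            taxon, states = parts
            matrix[taxon] = states
    return matrix

matrix = read_matrix(NEXUS_TEXT)
print(len(matrix), "taxa;", len(matrix["bird"]), "characters")
```

Each row is one OTU and each column is one character, which is the layout every analysis in this lab works from.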

3. Execute the datafile you made

When PAUP executes a datafile it loads it into memory and is ready to do analyses on the data. If there are errors in the file (for example you forgot to start the file with #NEXUS) PAUP will complain and require you to fix the errors before proceeding. Execute the file by returning to the terminal and typing

exe dinos.nex

4. Use the parsimony optimality criterion to find optimal tree(s)

Recall the 3 ways to search for trees: exhaustive, branch and bound, and heuristic.
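Exhaustive searches become impractical quickly because the number of possible trees grows very fast with the number of taxa. As a rough illustration, the standard count of unrooted, fully bifurcating trees for n taxa is (2n-5)!!, the product of the odd numbers from 3 to 2n-5. The Python sketch below (the function name is an illustrative assumption, not a PAUP* command) evaluates it for a few small values.

```python
def count_unrooted_trees(n_taxa):
    """Number of unrooted, fully bifurcating trees for n_taxa labeled tips."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5 (empty product when n = 3).
    for k in range(3, 2 * n_taxa - 5 + 1, 2):
        count *= k
    return count

for n in (4, 5, 6):
    print(n, "taxa:", count_unrooted_trees(n), "trees")
```

You can call the same function with the number of taxa in dinos.nex and compare the result with what PAUP* reports for the exhaustive search.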

You do not need to tell PAUP to use parsimony because that is the default optimality criterion. To do an exhaustive search type into the command line:

alltrees

To see these trees type

showtrees 1-5

How many trees were examined? How many "best" trees were found? What was the length of the best tree(s)?
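The "length" of a tree under parsimony is the minimum number of character-state changes the tree requires. The Python sketch below counts changes with Fitch's method, assuming unordered characters and a tree written as nested pairs; it is an illustration of the idea, not PAUP*'s implementation. It uses four of the lab's taxa and compares two arrangements of them.

```python
# Four taxa copied from the lab dataset.
STATES = {
    "croc":          "00000000000000000000",
    "trex":          "00100000000000011110",
    "compsognathus": "01100000000000011110",
    "deinonychus":   "11100000000000011110",
}

def fitch_site(node, states, site):
    """Return (possible state set, changes needed) for one character."""
    if isinstance(node, str):            # a tip: its observed state, no changes
        return {states[node][site]}, 0
    left, right = node
    left_set, left_cost = fitch_site(left, states, site)
    right_set, right_cost = fitch_site(right, states, site)
    shared = left_set & right_set
    if shared:                           # the two sides can agree: no new change
        return shared, left_cost + right_cost
    # The two sides cannot agree: one extra change is needed here.
    return left_set | right_set, left_cost + right_cost + 1

def tree_length(tree, states):
    """Sum the Fitch change counts over all characters."""
    n_sites = len(next(iter(states.values())))
    return sum(fitch_site(tree, states, site)[1] for site in range(n_sites))

tree_a = ("croc", ("trex", ("compsognathus", "deinonychus")))
tree_b = ("croc", ("deinonychus", ("trex", "compsognathus")))
print(tree_length(tree_a, STATES))  # 7
print(tree_length(tree_b, STATES))  # 8
```

The first arrangement needs one fewer change, so parsimony prefers it for these four taxa. A search such as alltrees makes this comparison across every possible tree.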

Now compare that search to a branch and bound search

bandb

and a heuristic search (with default settings)

hsearch

Do all the searches find the same number of "best" trees? How many trees (rearrangements) were examined by the heuristic search? (compare this to the number examined in the exhaustive search).

OK. You probably want to see the trees that were found. Since multiple trees were found, the first thing to do is view a consensus tree (we'll cover these later in lecture), which summarizes all the trees in a single tree. A 'strict consensus' shows only the nodes that exist in all the trees found.

contree
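The idea behind a strict consensus can be sketched in a few lines of Python: list the groups (clades) in each tree and keep only those present in every tree. The sketch treats trees as rooted nested tuples for simplicity, and the two toy trees with taxa A to D are hypothetical, not output from PAUP*.

```python
def clades(tree):
    """Return the set of clades (frozensets of tip names) in a rooted tree."""
    found = set()

    def tips(node):
        if isinstance(node, str):        # a tip contributes only itself
            return frozenset([node])
        members = frozenset().union(*(tips(child) for child in node))
        found.add(members)               # every internal node defines a clade
        return members

    tips(tree)
    return found

def strict_consensus_clades(trees):
    """Keep only the clades that occur in every input tree."""
    shared = clades(trees[0])
    for tree in trees[1:]:
        shared &= clades(tree)
    return shared

tree_1 = ("out", (("A", "B"), ("C", "D")))
tree_2 = ("out", ((("A", "B"), "C"), "D"))
for clade in sorted(strict_consensus_clades([tree_1, tree_2]), key=len):
    print(sorted(clade))
```

Here the group A + B survives because both trees contain it, while C + D and A + B + C are dropped because each appears in only one tree. Dropped groups show up as polytomies in the consensus.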

The consensus tree is a cladogram, i.e., it has no branch length information. You should get in the habit of always looking at branch lengths because they can sometimes be very important (as they are in this case). To see the branch lengths we have to view a phylogram. This can be done by typing:

describetrees 1/plot=phylogram

This tells PAUP to display a phylogram of tree 1. You may notice that the branch leading to the bird is very long relative to the others. This is what is called, not surprisingly, a "long branch." You will learn later how long branches can cause all sorts of problems with phylogenetic analyses.

5. Change the Optimality Criterion to Distance

Do this by typing

set criterion = distance

PAUP should tell you this was successful. (Note: this setting has no bearing on the UPGMA and NJ analyses in steps 6 and 7 below, because they do not use an optimality criterion, but it will matter for step 8.)

6. Perform a cluster analysis using the UPGMA algorithm

Do this by typing

UPGMA

Recall that UPGMA makes rooted trees and note that it didn't like our chosen outgroup - the croc. What group was made into the outgroup / root of the tree? Why do you think this happened? (Guess)

What are some of the most obvious differences between the UPGMA tree and the parsimony trees above?

Note also that Parsimony found 5 equally parsimonious trees. UPGMA produced only a single tree. What is wrong with producing only a single tree for this dataset?
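For reference, the clustering that UPGMA performs can be sketched in Python as follows: repeatedly join the two closest clusters, then average their distances to everything else, weighted by cluster size. This is a bare-bones illustration that returns only the branching order (no branch lengths); the four-taxon distances are hypothetical and are not taken from dinos.nex.

```python
from itertools import combinations

def upgma(names, dist):
    """Cluster with UPGMA; dist maps frozenset({a, b}) to a distance.

    Returns the rooted tree as nested tuples.
    """
    sizes = {name: 1 for name in names}   # cluster -> number of tips in it
    d = dict(dist)                        # working copy of the distances
    while len(sizes) > 1:
        # Join the pair of clusters with the smallest distance.
        a, b = min(combinations(sizes, 2), key=lambda pair: d[frozenset(pair)])
        merged = (a, b)
        for other in sizes:
            if other not in (a, b):
                # New distance = average over all tips in the merged cluster.
                d[frozenset((merged, other))] = (
                    sizes[a] * d[frozenset((a, other))]
                    + sizes[b] * d[frozenset((b, other))]
                ) / (sizes[a] + sizes[b])
        sizes[merged] = sizes.pop(a) + sizes.pop(b)
    return next(iter(sizes))

# Hypothetical toy distances among four taxa.
toy = {
    frozenset("AB"): 2, frozenset("AC"): 6, frozenset("BC"): 6,
    frozenset("AD"): 10, frozenset("BD"): 10, frozenset("CD"): 10,
}
print(upgma(["A", "B", "C", "D"], toy))  # ('D', ('C', ('A', 'B')))
```

Note that the procedure never compares alternative trees: it makes one join at a time and always returns exactly one tree, with the root placed by the last join rather than by any outgroup you specify.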

7. Perform a cluster analysis using the neighbor-joining algorithm

Do this by typing

nj
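Unlike UPGMA, neighbor-joining does not simply join the closest pair. At each step it corrects every distance for how far each taxon is from all the others, using Q(i,j) = (n-2) d(i,j) - sum_k d(i,k) - sum_k d(j,k), and joins the pair with the smallest Q. The Python sketch below computes only this first step on a hypothetical five-taxon distance matrix; it is not a complete NJ implementation.

```python
from itertools import combinations

def q_matrix(names, d):
    """Return {pair: Q value} for one neighbor-joining step.

    d maps frozenset({a, b}) to the distance between taxa a and b.
    """
    n = len(names)
    # Total distance from each taxon to all the others.
    totals = {
        a: sum(d[frozenset((a, b))] for b in names if b != a) for a in names
    }
    return {
        frozenset((a, b)): (n - 2) * d[frozenset((a, b))] - totals[a] - totals[b]
        for a, b in combinations(names, 2)
    }

# Hypothetical toy distances among five taxa a-e.
toy = {
    frozenset("ab"): 5, frozenset("ac"): 9, frozenset("ad"): 9,
    frozenset("ae"): 8, frozenset("bc"): 10, frozenset("bd"): 10,
    frozenset("be"): 9, frozenset("cd"): 8, frozenset("ce"): 7,
    frozenset("de"): 3,
}
q = q_matrix(list("abcde"), toy)
first_pair = min(q, key=q.get)
print(sorted(first_pair), q[first_pair])  # ['a', 'b'] -50
```

In this toy matrix the closest pair is d and e (distance 3), yet NJ joins a and b first. That correction is why NJ and UPGMA can give different trees from the same distances.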

How does the NJ tree compare to the UPGMA tree? Is it more similar to the UPGMA tree or the parsimony trees? What are the differences between the NJ tree and the parsimony trees? Again note that only one tree is produced. Type

showdist

to see the OTU x OTU matrix of distances that the NJ and UPGMA algorithms were using. What is the mean character distance between the bird OTU and the titanosaurus OTU?
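If you want to check an entry in the distance matrix by hand, a mean character distance can be sketched in Python as the fraction of compared characters at which two OTUs differ. This assumes unordered characters and skips positions where either OTU is missing or gapped; the function name is an illustrative assumption. The example uses trex and croc from the lab dataset.

```python
def mean_character_distance(seq_a, seq_b, missing="?", gap="-"):
    """Fraction of comparable characters at which two OTUs differ."""
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a in (missing, gap) or b in (missing, gap):
            continue                      # skip characters that cannot be compared
        compared += 1
        if a != b:
            differing += 1
    return differing / compared

trex = "00100000000000011110"
croc = "00000000000000000000"
print(mean_character_distance(trex, croc))  # 0.25
```

Five of the 20 characters differ between trex and croc, giving 0.25. Compare this with the corresponding cell in the showdist output.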

8. Perform an optimality criterion tree search using distances

Do this by typing

dset objective = ME

to set the objective function of the distance criterion to minimum evolution.

Then perform a heuristic search by typing

hsearch

To view the tree as a phylogram type:

describetrees 1/plot=phylogram

This is a more rigorous method than clustering. Does it find the same five trees as parsimony? What does it find? Which of the results above are most similar to the minimum evolution results? What should one do when different methods give different results from the same dataset?

This has been a brief introduction - you'll learn about bootstrapping and other methods of assessing branch support later. Estimating branch support is important for various reasons but one of them is so that we can compare only strongly supported results from different methods.

Note: FYI, the command pset collapse = no in the PAUP block below turns off PAUP's default behavior of collapsing zero-length branches. With the default setting, only three of the 5 most parsimonious trees for this dataset would be retained, because zero-length branches collapse in two of the trees. To see the difference, delete that command from the dataset, re-execute the file, and run another alltrees search: PAUP retains only 3 trees, and two of them have polytomies.


Dataset:

#NEXUS
BEGIN TAXA;
DIMENSIONS NTAX=9;

TAXLABELS
bird
deinonychus
compsognathus
trex
brontosaurus
brachiosaurus
titanosaurus
diplodocus
croc;
END;

BEGIN CHARACTERS;
DIMENSIONS NCHAR=20;
FORMAT SYMBOLS= " 0 1 2 3" MISSING=? GAP=- ;
MATRIX
[                       10        20]
[                        .         .]

bird            11111111111111122220
deinonychus     11100000000000011110
compsognathus   01100000000000011110
trex            00100000000000011110
brontosaurus    00000000000000000001
brachiosaurus   00000000000000000001
titanosaurus    00000000000000000030
diplodocus      00000000000000000030
croc            00000000000000000000
;
END;

begin paup;
outgroup croc;
pset collapse = no;

end;