Zool 575 Introduction to Biosystematics, (Sikes) Winter 2006

The area below contains questions I was asked during the term and my answers. Good luck studying!

February 21, 2006
1) How might you distinguish a hard polytomy from a soft polytomy in practice?
A soft polytomy is the result of insufficient data. The solution is then to get more data – if adding data makes the polytomy go away it was a soft polytomy. If adding data doesn’t make the polytomy go away, you can’t be sure what kind it is because there is always the hope that it is a soft polytomy and if you can add more data it’ll go away. However, some cases have been found where the investigators have estimated the amount of data needed and discovered that they would need more data than exists in the entire genome! In such a case the polytomy is considered hard.

2) What is corrected distance?
Corrected distances are distances that have been corrected for the departure from linearity that occurs due to saturation (multiple hits). Once corrected, the distances should be closer to the actual distances (recall the difference between actual and observed distances).
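As an illustration only, here is a minimal Python sketch of one common correction. It assumes the Jukes-Cantor (JC69) model, which the answer above does not name; other models give different corrections. The function names p_distance and jc69_distance are made up for this example.

```python
import math

def p_distance(seq_a, seq_b):
    """Observed distance: proportion of aligned sites that differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    diffs = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return diffs / len(seq_a)

def jc69_distance(p):
    """Jukes-Cantor corrected distance (expected substitutions per site).

    Assumption: equal base frequencies and equal rates for all changes.
    """
    if p >= 0.75:
        raise ValueError("p >= 0.75: fully saturated, correction undefined")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)

# The gap between observed and corrected distance grows with saturation.
for p in (0.05, 0.30, 0.60):
    print(f"observed p = {p:.2f}   corrected d = {jc69_distance(p):.3f}")
```

The corrected value is always larger than the observed one, and the difference is small for similar sequences but large for divergent ones, which is the departure from linearity described above.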

March 5, 2006
3) With AIC, lowest score is best & with likelihood, highest score is best? The example in the notes shows the likelihood score of 0.0000782 to be better than a score of 0.0000300, yet on the PAUP2 assignment it told us to choose the likelihood score that was lowest because smaller #'s represent higher likelihoods.
Yes, the lowest AIC is best. The highest likelihood is best. Note that likelihoods are reported as log likelihoods, which are negative numbers, and PAUP* prints them as negative log likelihoods (-ln L), which are positive. So the better score looks smaller until you remember the sign: as the likelihood gets larger, the log likelihood gets closer to zero and its absolute value gets smaller.
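A small Python sketch may make the sign issue concrete. It uses the two likelihood values from the question; the helper name neg_log_likelihood is made up for this example.

```python
import math

def neg_log_likelihood(likelihood):
    """Return -ln L, the form in which likelihood scores are usually printed."""
    return -math.log(likelihood)

likelihoods = [0.0000782, 0.0000300]  # the two values from the question

for lk in likelihoods:
    print(f"L = {lk:.7f}   ln L = {math.log(lk):.3f}   "
          f"-ln L = {neg_log_likelihood(lk):.3f}")

# The highest likelihood and the lowest -ln L pick out the same value.
best_by_likelihood = max(likelihoods)
best_by_neg_log = min(likelihoods, key=neg_log_likelihood)
print(best_by_likelihood == best_by_neg_log)
```

The larger likelihood (0.0000782) has the smaller -ln L, so "highest likelihood" and "lowest reported score" are the same choice.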

March 12, 2006
4) Is this a correct assumption: parsimony will MIX UP LONG BRANCHES WITH SHORT BRANCHES (when there is branch length heterogeneity) & IT WILL THINK ALL BRANCHES ARE HOMOGENEOUS, therefore choosing the incorrect topology.
No. Parsimony ignores branch lengths, so it does not treat branches of different lengths differently as such. But because it minimizes homoplasy, it can fail by bringing long branches together: branches that are long because they have experienced higher rates of evolution (and therefore more homoplasy) than other branches share many convergent changes, and parsimony joins them to minimize the homoplasy.
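The effect can be sketched in Python for the simplest case of four taxa. With four taxa, only sites where two taxa share one state and the other two share another state can favour one unrooted tree over another, so parsimony picks the split supported by the most such sites. The alignment columns below are hypothetical and hand-made: the true tree is assumed to be ((A,B),(C,D)), with A and C as the long branches that have picked up the same states by convergence.

```python
from collections import Counter

def supported_split(column):
    """Return the split a site supports under parsimony, or None.

    `column` holds the states of taxa A, B, C, D in that order.
    Only 2-2 patterns are parsimony-informative for four taxa.
    """
    a, b, c, d = column
    if len(set(column)) != 2:
        return None
    if a == b and c == d:
        return "AB|CD"
    if a == c and b == d:
        return "AC|BD"
    if a == d and b == c:
        return "AD|BC"
    return None  # 3-1 pattern: a singleton, uninformative

# Hypothetical data (an assumption for illustration, not real sequences).
columns = [
    "AAGG", "AAGG", "AAGG",                   # true synapomorphies for A+B
    "TCTC", "TCTC", "TCTC", "TCTC", "TCTC",   # convergence on long branches A, C
    "GTTG",                                   # stray homoplasy
    "CCCC", "ACCC", "AGCT",                   # uninformative sites
]

counts = Counter(s for s in map(supported_split, columns) if s is not None)
print(counts)
print("parsimony prefers:", counts.most_common(1)[0][0])
```

In this made-up data set the convergent sites outnumber the real synapomorphies, so parsimony prefers AC|BD and joins the two long branches even though the assumed true tree is AB|CD.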

5) If parsimony is a fast approximation to ML, what is ME considered relative to parsimony, what is NJ considered relative to ME, and how are these 4 methods related to each other?
ME is parsimony for distance data.
NJ is a fast approximation to ME.
So... NJ is a fast approximation to ME, which is parsimony for distance data, and parsimony is a fast approximation to ML; therefore all these methods are shortcuts, and thus less rigorous ways, to approximate the ML tree.

ME is slow relative to parsimony??

ME is fast because it is a distance method and these methods are much faster than character methods.

NJ is a bad way to estimate a topology, worse than ME??

Yes, NJ is worse than ME because it lacks an optimality criterion.

How do authors decide what specimen to designate as holotype? (Lecture 5, mid January)
They would choose a specimen that has all the characteristics that are shared within the species. These characters would be the ones used in the description of the species and when keying specimens out. The holotype may have characters that others in the species do not have, but it is important that it has all the characters shared by the species.

Some genes appear to produce more consistent trees than other genes. Can only these genes be used for an analysis since they are more reliable? Why bother ever using the rest? From lec 33, pg 6, slide 1 April 3
These genes will be consistent for only these taxa. Someone has to do the initial comparison between many genes in order to find this out. When working with other taxa, there is no guarantee that the same genes will be useful again, so another comparison will need to be made. Subsequent work will be easier, though, if the initial comparison has been done.

With the new methods and technology that come out, do people go back and redo old analyses? (lec 37 April 12)
Most analyses are not redone and many phylogenetic trees exist for the same groups, many of which would now be considered incorrect. For relationships between taxa of high interest to many researchers and/or the public, old data sets tend to be re-analysed. For these relationships of particular interest there is more of a priority to have the most up to date evolutionary analysis.

1. Will the museum pay you to collect specimens?
Many of the specimens within a museum are donated, and many people within the museum also work to collect specimens. There are probably some cases where a particular specimen, of a rare taxon for example, may be purchased by a museum.

2. Why is phenetics used to build phylogenetic trees if it does not infer phylogeny?
Phenetics is used only to compare similarity based upon measurement. Therefore the trees it builds are not phylogenetic; rather, they are termed phenograms. These trees do not follow, or intend to depict, evolution; they only show which taxon is most “similar” to another based on a large collection of measurements.

3. Would it be possible to pick a gene that is so essential that literally no transitions or transversions would be survivable, and would this reduce the need for prediction of such changes?
Not really, because even the most essential genes can survive a certain amount of change. This is due to the wobble position having more than one possible nucleotide to code for the same amino acid, and also due to the fact that many transitions are not severe in effect. It is, however, possible to select a gene that evolves more slowly, in which there is less chance of randomized data or overwritten changes (saturation).

4. Is there a mathematical way that you could derive the maximal point in a tree space, which would allow you to have a gauge as to how close you were to that region?
Not that is known. This is a 3D space rather than a simple function of one variable, where the first derivative would provide you with the global maximum. However, algorithms have been developed which attempt to predict maximal or minimal tree space in the absence of homoplasy. Since homoplasy is present to some degree in almost all data, though, knowing the 'shortest tree without homoplasy' isn't that useful for real data.

5. Are molecular discoveries considered valid enough to allow changing of a taxonomic name, or rearranging of a taxon?
There are no rules. Anybody can change taxonomic names for whatever reasons they feel justify the change. If they did poor work others will ignore it or change it back, hopefully. Typically people describe new species based on morphological data; in question 12 you are given a more modern circumstance in which you have molecular data. The type of data one uses to do taxonomy doesn't matter - it's the conclusions you draw from the analysis of the data that matter.

January 18, 2006 (in-class question)
Would lectotype be considered a duplicate of a holotype?
Not necessarily, because lectotypes can come from a type series, which may consist of more than one species.

January 23, 2006 (in-class question)
Concerning the collections of specimens, does one have to be a professional researcher in order to send the collected samples to museums?
Anybody can send specimens, as long as they ask and contact the museum and tell them how the specimens have been collected (for example, is the specimen for research?) Furthermore, one should know how to send and label the specimens properly in order for the museum to accept the specimen.

Feb 24, 2006
Are ring species the 'overlapping' species that cannot interbreed, or are they the interbreeding populations that occur in the intermediate regions? Also, would you consider these species as a single species, or as separate species, since they do not interbreed?
A ring species is a single species composed of populations most of which can interbreed with each other but some (at the 'ends' of the ring) cannot.

A good write-up can be found at:


March 15, 2006
I was reviewing your notes and study questions and I am having a little trouble with the a priori weighting. Can unequal weighting also be considered an a priori weighting? Also, I don't understand why equal weighting is considered a subjective a priori weighting. Thanks!
'a priori' means essentially "before analysis" - so there are basically two types of weighting - unequal and equal. Both are decided before the analysis so both are types of a priori weighting. The decision is subjective because there is no agreed upon method to objectively determine what weighting scheme to use.

Rarely, people will use 'a posteriori' weighting, in which the final tree is used to identify homoplasious characters that can then be downweighted so another search can be done that will yield a tree with greater support. This is part of a technique called "successive approximations" that I don't think we're covering in class. It's rarely used and for good reason - the outcome is highly dependent on the tree used to determine the weights, thus it's not a very good method, but people use it when their dataset is full of homoplasy and they aren't getting a tree with good support.

Many people, however, would argue that equal weighting is less subjective than unequal. This is more true for morphological data than for molecular however. Molecular data has enough simple patterns of evolution that equal weighting can be considered a 'stronger' and less realistic weighting scheme than some forms of unequal weighting.

April 4, 2006
From what I remember you saying, there is typically no negative consequence in throwing away too many tree samples. However, why would there be negative consequences when not enough samples are thrown away (would it be detrimental to discard less than the 20% cut-off)?
There is one negative to throwing away too many: If your sample contains say, 20,001 trees and you throw away 20,000 trees then your consensus tree will be built from only ONE TREE and all the branch support values will be 100% (I've seen people do this by mistake! - because they've confused steps with samples - see below).

Throwing away too few trees will only be a problem if you (1) didn't run very many steps [say you ran only 50,000 steps] and (2) your burn-in is so small that it includes some of the trees sampled before the chain really burned in. The problem is that those early trees were not sampled from the posterior distribution, so they can bias your estimates of the posterior probabilities.

There are newer ways to assess burn-in. The new MrBayes uses the standard deviation of 'split frequencies' to compare two independent runs - when that value gets below 0.05 the chains of each run are, supposedly, sampling the same space and thus both are burnt in. There are other ways too. It is always a good idea to determine where your burn-in tends to happen and then go beyond it to be safe:

For example, if you run 2 million steps sampling once every 100 steps and your burn-in seems to happen by step 10,000, there is no problem going to step 20,000 for your burn-in (NOTE: step 20,000 is equal to sampled tree number 200, so you would set the burnin = 200, NOT 20,000!!!)
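The step-versus-sample arithmetic can be checked with a short Python sketch. The function name burnin_in_samples is made up for this example, and counting the starting tree as a sample is an assumption chosen to match the 20,001 trees mentioned earlier in this answer.

```python
def burnin_in_samples(burnin_step, sample_every):
    """Convert a burn-in given in MCMC steps into a count of sampled trees."""
    return burnin_step // sample_every

total_steps = 2_000_000
sample_every = 100
# Assumption: the starting tree (step 0) is also saved, giving 20,001 trees.
total_samples = total_steps // sample_every + 1

# Correct: express the burn-in in sampled trees.
correct_burnin = burnin_in_samples(20_000, sample_every)
kept_correct = total_samples - correct_burnin

# The mistake: using the step number itself as the number of trees to discard.
kept_mistake = total_samples - 20_000

print("total sampled trees:", total_samples)
print("burn-in in samples: ", correct_burnin)
print("trees kept (correct):", kept_correct)
print("trees kept (mistake):", kept_mistake)
```

With these numbers the mistaken setting leaves a single tree, which is exactly the situation where every branch in the consensus gets 100% support.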

1. Date: Lecture 20

Q: I asked about the validity of the Likelihood model assumption that the rate of change is symmetrical.

A: It is possible to take this into account with a general 12-parameter model, an extension of GTR that allows different forward and backward rates of change for a character. But here is what is said about these general 12-parameter models:

1. There is a problem in that the predicted base frequencies from the rates of change might not equal the base frequencies estimated from the data (so the model would need to be fixed within itself so this would be avoided)

2. As I said the searching would have to examine rooted trees rather than unrooted and this would consume far more computational time

3. "The model needs more work to determine whether there is any way of using it."


2. Date: Feb 13

Q: I asked about the possibility of concatenating exhaustive searches to eliminate the need for a heuristic search of a large number of taxa.
A: To combine these smaller datasets with different OTUs one would have to create a 'supertree'. Supertrees are made whenever there are taxa (OTUs) missing from different analyses but one wants to combine the trees. To do so one would have to have at least one OTU shared between every pair of subsets (if there were no shared OTUs there would be no way to combine the trees). Supertrees are pretty controversial and definitely a last resort - few people consider them worth pursuing because they tend to be problematic in many ways.

Consensus trees can be made when you get different trees from the same set of OTUs (and these are common and not controversial).


3. Date: Lecture 5

Q: How would it work to choose a lectotype from a type series with more than one species present?
A: Those specimens of a different species are eliminated from the type series.

Q: What is the logic, when investigating possible LBA, behind testing to see if long branches join to different places when analyzed alone?
A: Take long branches out, then reanalyze. If they join to the same place in each other’s absence it can’t be due to LBA. Attraction is when one branch is moving the other branch. Farris zone: the branches actually belong to each other.

Is the answer above correct? I feel it may be incomplete.

Yes it is correct. A branch can't "attract" another branch if it's not there! That said... long branches are troublesome even when alone - we can't assume that simply because the 'other' long branch is absent that the present long branch will stick to the 'correct' spot.

Q: If long branches are found to be sister taxa when analyzed using ML & a best fitting model, what can one say about the possibility of LBA?
As I mentioned in class, one can never be sure if one is suffering from LBA, although some cases have been clearly found (especially when MP joins long branches but ML doesn't). However, ML isn't immune to LBA so it will sometimes fail as well. If you've used the best model and used ML and the long branches are still together you can say either you are suffering from an unusually hard LBA problem (hard enough that ML won't help!) or the long branches belong together - (Farris zone).

Why is AIC better to use than ML to determine the process model? Is it because it decreases variance, due to favouring models with fewer free parameters? Thank you
Yes, log-likelihood scores (the ML optimality criterion) tend to improve as the model increases in complexity - so using this measure alone would lead one to always choose the most complex model. Although this might be wise for Bayesian analysis, for ML analysis it is definitely preferable to use a model that is complex enough but not overly complex and the AIC helps one choose this model.
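The trade-off can be sketched in Python using the standard definition AIC = 2k - 2 ln L, where k is the number of free parameters. The log-likelihood values below are hypothetical, chosen only for illustration, and k counts substitution-model parameters only (branch lengths are assumed to add the same number to every model on the same tree, so they do not change the ranking).

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: lower is better."""
    return 2 * n_params - 2 * log_likelihood

# model name: (hypothetical ln L, free substitution-model parameters)
models = {
    "JC69":  (-3250.0, 0),
    "HKY85": (-3190.0, 4),   # 3 base frequencies + 1 ts/tv ratio
    "GTR":   (-3188.5, 8),   # 3 base frequencies + 5 relative rates
}

for name, (ln_l, k) in models.items():
    print(f"{name:6s} ln L = {ln_l:8.1f}   k = {k}   AIC = {aic(ln_l, k):7.1f}")

best_by_lnl = max(models, key=lambda m: models[m][0])
best_by_aic = min(models, key=lambda m: aic(*models[m]))
print("highest ln L:", best_by_lnl)
print("lowest AIC:  ", best_by_aic)
```

In this made-up example the most complex model has the highest log-likelihood, but its small gain does not pay for the four extra parameters, so the AIC prefers the intermediate model.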

Just a quick question about lecture 22. You asked for the single most important thing you can do to get the true tree. I thought initially that it would be increasing your data set. But then I remembered that parsimony actually begins to fail as you add data (if the data have certain characteristics), so if you were using that model, this would not be beneficial. So is the most important decision choosing a model that correctly fits your data, or something I haven’t even thought of yet? Thank you
Your reasoning is correct - more data will not help if your analysis is suffering from systematic error due to model violation, thus use of a model that best fits the data is of top priority. After that, adding more data is critical, but the model selection is primary.

Just as an aside......Cummings et al. ('03) did a study of LBA & BI & ML.
The following is from the notes:

They concluded that performance difference was probably due to search strategy differences......The simulated data of Cummings et al. (’03) were simulated in perfect accordance w/the ML model (parameter values fixed), but not in perfect accordance w/BI model, which assumes parameters aren’t fixed. BI’s assumptions were not met, so we don’t know if BI would still do worse than ML in the Felsenstein zone if its assumptions had been met.

Have they done this test w/BI assumptions met? Any statistical test is not valid unless the assumptions have been met (as my stat prof has been cementing into our heads this semester). How can you compare these 2 things when one isn't valid, otherwise the support for its result goes out the window?
No, they haven't tested BI by meeting its assumptions. Your comment about a statistical test not being valid unless its assumptions are met may be true - but we can never know with real data if the assumptions are met for our phylogenetic methods. Thus we want to be very careful about model-fitting and use robust methods (that work well even when their assumptions are violated).

The results of their study are still of great interest because they show, for a given dataset, how BI and ML compare when the tree is challenging (e.g., the Felsenstein zone).

I'm reading through lecture 28, & realized that the definitions of Type I and Type II errors are reversed. People fear making a type I error the most because if you reject a true hypothesis, no one is going to go back & re-test it. I've included a chart that helps me remember the errors. The slide following about BS & PP making errors, do they follow the definitions in the previous slides?
The definitions I gave for type I and type II errors are correct - they are not reversed. They are based on their use in the Erixon et al. 2003 Sys Bio paper. This question was raised by others and after some investigation it became clear that there are two valid but opposite uses / definitions of these error types:

Yes, it's nutty but it seems phylogenetics is 'reject-support' research as described below, rather than 'accept-support' research:


To summarize:

In Reject-Support research:

The researcher wants to reject H0.
Society wants to control Type I error.
The researcher must be very concerned about Type II error.
High sample size works for the researcher.
If there is "too much power," trivial effects become "highly significant."

In Accept-Support research:
The researcher wants to accept H0.
"Society" should be worrying about controlling Type II error, although it sometimes gets confused and retains the conventions applicable to RS testing.
The researcher must be very careful to control Type I error.
High sample size works against the researcher.
If there is "too much power," the researcher's theory can be "rejected" by a significance test even though it fits the data almost perfectly.

My concluding slides about Bayesian making more type I errors vs bootstraps making more type II errors were directly from the Erixon paper (which I've triple checked and indeed they are as I said they are). And because Bayesian is less conservative and thus 'more accepting' I concluded that a type I error is 'accepting a falsehood' and a type II error is 'rejecting a truth' - the opposite of what most people learn for these error types.

I found on a statistics textbook webpage the following discussion: (http://www.statsoft.com/textbook/stpowan.html) which agrees with what I thought - false positives, accepting a falsehood, are here listed as type I errors, Type II errors are listed as rejecting a truth.

Personally, I think it's unfortunate that these error types are so complex - but for this course we will stick to the usage of Erixon et al 2003. Sorry if this confuses you for other Stats courses!

Can you let me know if I'm on the right track with these zones and what they mean or if I have oversimplified it.

Felsenstein Zone when you have two unrelated long branches
ML and corrected distance methods have long branch repulsion thus often producing a correct tree

parsimony has long branch attraction thus often producing an incorrect tree

Farris zone when you have two sister taxa with long branches
ML has long branch repulsion thus often producing an incorrect tree

parsimony has long branch attraction thus often producing a correct tree

No, you have very much oversimplified it. There is no long branch repulsion.

Siddall 1998 tried to show repulsion with ML but Swofford et al 2001 demonstrated it was false. ML fails to find the correct tree in the Farris zone because there aren't enough data (synapomorphies) to do so. So it's not repulsion, it's being properly ambiguous /cautious when the data are weak. And Swofford et al showed that if enough data were provided for ML it would succeed in the Farris zone, and it would do so by properly counting synapomorphies (unlike MP).

ML succeeds in the Felsenstein zone because it is properly suspicious of data on long branches - ignoring most of it and relying instead on the few real synapomorphies it can find.

MP fails in the Felsenstein zone because of long branch attraction, yes. It mistakes homoplasy for synapomorphy/homology.

MP succeeds in the Farris zone for the same reason! (long branch attraction) - it mistakes homoplasy for synapomorphy/homology.

To reiterate: Given enough data, ML will succeed in either zone and it will do so by properly distinguishing homoplasy from homology. There is no 'repulsion' for ML the way there is 'attraction' for MP.

MP, on the other hand, will think homoplasy is homology and will fail terribly in the Felsenstein zone but succeed in the Farris zone - and in both cases this will be due to it confusing homoplasy with homology.