Derek Sikes
14 June 1999
This PAUP log demonstrates the consequences of three ways of coding taxa that are inapplicable for a suite of characters belonging to a complex (e.g. a complex structure that has, in this example, 9 characters associated with it).
Circumstances: 22 taxa in 3 genera (indicated by the numbers 1, 2, 3 on the trees below) and a character complex which has 10 characters (#26-36). The first character is simply presence or absence of the character complex itself; the other nine are details of the complex. This complex can be either apomorphic (as in 1. below) or plesiomorphic (as in 2. below).
Results in brief:
1. If the complex is apomorphic (states red & black below) for the entire set of taxa studied (so that its absence is plesiomorphic), then coding the inapplicables as missing (?) versus an extra state has no effect on topology. This is because, algorithmically, there is no difference between plesiomorphic states and missing states: neither can alter the topology of the tree.
2. However, if the complex is plesiomorphic for the set of taxa (blue, black and red states) and some of the taxa lack the complex (green on the tree below), then there is a significant difference between the two ways of handling inapplicables. Imagine, as in this case, 3 independent losses of the complex, so that the nine characters associated with the complex are inapplicable for those taxa. As can be seen in the examples below, the missing (?) coding produces no topological differences from the first two cases (apomorphic complex with missings and with extra states) and correctly reconstructs the three independent loss events. Coding the inapplicables with an extra state, by contrast, greatly alters the topology, in this case reducing two monophyletic taxa to paraphyletic taxa and resulting in 4 shortest trees instead of only 1. Although the trees produced did correctly depict 3 independent events, the consequences for the rest of the tree were quite destabilizing and profound.
Recommendation: use missing codings (?), which are less likely to introduce homoplasy. When the complex is plesiomorphic, the choice is this: if there are independent losses of the complex, then missings will be more likely to reconstruct this than extra states; if there was only a single loss, then missings might weaken the support for the clade relative to extra state codings, but missings cannot alter the topology whereas extra states CAN alter the topology. In other words, regardless of the circumstances (complex plesiomorphic vs apomorphic, one vs many independent losses) it is safer to use missings (?) than extra states.
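The reasoning behind this recommendation can be checked on a toy case. The following Python sketch is not PAUP's implementation. It assumes unordered (Fitch) parsimony on rooted binary trees written as nested tuples, and the four taxa (A1, A2, B1, B2), the three genus-supporting characters and the nine detail characters are all hypothetical. It scores a "true" tree and a tree that unites the two complex-lacking taxa, under both codings.

```python
def fitch_steps(tree, states, missing="?"):
    """Fitch step count for one unordered character on a rooted binary tree."""
    observed = {s for s in states.values() if s != missing}
    if not observed:
        return 0
    steps = 0

    def visit(node):
        nonlocal steps
        if isinstance(node, str):  # leaf: a taxon name
            state = states[node]
            # A missing cell is compatible with every observed state.
            return set(observed) if state == missing else {state}
        left, right = visit(node[0]), visit(node[1])
        if left & right:
            return left & right
        steps += 1  # no shared state: one change is required here
        return left | right

    visit(tree)
    return steps


def tree_length(tree, characters):
    """Sum of Fitch steps over all characters."""
    return sum(fitch_steps(tree, ch) for ch in characters)


# Hypothetical taxa: A1 and B1 retain the complex; A2 and B2 lost it independently.
true_tree = (("A1", "A2"), ("B1", "B2"))
loss_tree = (("A1", "B1"), ("A2", "B2"))  # wrongly unites the two losses

genus = [{"A1": "1", "A2": "1", "B1": "2", "B2": "2"}] * 3
detail_missing = [{"A1": "2", "A2": "?", "B1": "2", "B2": "?"}] * 9
detail_extra = [{"A1": "2", "A2": "5", "B1": "2", "B2": "5"}] * 9

for label, details in (("missing", detail_missing), ("extra state", detail_extra)):
    chars = genus + details
    print(label, tree_length(true_tree, chars), tree_length(loss_tree, chars))
# missing 3 6
# extra state 21 15
```

With missings the true tree stays shortest (3 vs 6 steps). With the extra state the nine detail characters outvote the three genus characters and the tree uniting the losses becomes shorter (15 vs 21 steps). The toy is deliberately extreme; in the real matrix the detail states differ among groups.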
Some might argue that if there was a single loss, having many characters with the extra state (complex absent) will greatly strengthen the support for that clade. However, this is simply a case of character bloating: turning a single character (complex absent) into nine identical characters and thus artificially strengthening the support of the clade.
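One way to see the bloating is to ask which detail characters, under extra-state coding, merely repeat the presence/absence character. The Python sketch below is an illustration only: the function name, the mini-matrix and the column layout (column 0 for presence/absence, columns 1-3 for details, "5" as the extra state) are hypothetical and are not taken from the files in this log.

```python
def redundant_absence_columns(matrix, presence_col, detail_cols, absent, extra):
    """List detail columns whose extra state marks exactly the taxa lacking the complex."""
    lacking = {taxon for taxon, row in matrix.items() if row[presence_col] == absent}
    redundant = []
    for col in detail_cols:
        carriers = {taxon for taxon, row in matrix.items() if row[col] == extra}
        if carriers == lacking:
            redundant.append(col)
    return redundant


# Hypothetical mini-matrix: column 0 is presence (1) / absence (2) of the complex,
# columns 1-3 are detail characters, and "5" is the extra "inapplicable" state.
toy = {
    "w": "1231",
    "x": "1322",
    "y": "2555",
    "z": "2555",
}
print(redundant_absence_columns(toy, 0, [1, 2, 3], absent="2", extra="5"))
# [1, 2, 3]
```

Every column returned groups the same taxa as the presence/absence character itself, so each one counts the single observation "complex absent" again.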
A third method of coding inapplicables, the assignment of unique states (autapomorphies) for each taxon lacking the character complex, has two apparent advantages: search algorithms will not consider the 9 states due to the lack of the character complex to be 9 synapomorphies, and character mappings will not reconstruct impossible ancestral states (i.e. the computer will not 'fill in' states, say for tail color, for taxa that lack tails). The last test presented below shows that this method works for this dataset. A possible disadvantage is the aberrant branch lengths, which will reflect the numerous autapomorphic changes for all the inapplicable taxa (making these into long branches, which cannot possibly attract each other because no two of them share any autapomorphies).
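A minimal sketch of this third coding, in Python, is given here under stated assumptions: the function name, the mini-matrix and the pool of spare symbols are hypothetical, and the sketch gives each taxon one private symbol for all its inapplicable cells, whereas the autapomorphy file in the log varies the symbols within a taxon as well.

```python
def recode_inapplicables(matrix, detail_cols, spare_symbols, missing="?"):
    """Give each taxon with missing detail cells its own, otherwise unused, state."""
    used = {symbol for row in matrix.values() for symbol in row}
    pool = [s for s in spare_symbols if s not in used]
    recoded = {}
    for taxon, row in matrix.items():
        cells = list(row)
        if any(cells[c] == missing for c in detail_cols):
            if not pool:
                raise ValueError("not enough spare symbols")
            symbol = pool.pop(0)  # one private symbol per taxon
            for c in detail_cols:
                if cells[c] == missing:
                    cells[c] = symbol
        recoded[taxon] = "".join(cells)
    return recoded


# Hypothetical mini-matrix: column 0 is presence/absence, columns 1-3 are details.
toy = {"w": "1231", "x": "1322", "y": "2???", "z": "2???"}
print(recode_inapplicables(toy, [1, 2, 3], spare_symbols="56789AB"))
# {'w': '1231', 'x': '1322', 'y': '2555', 'z': '2666'}
```

Because no two taxa share a recoded state, none of the recoded cells can act as a synapomorphy.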
The following is the edited PAUP log of the five file types:
1. complex apomorphic with inapplicables coded as missings
2. complex apomorphic with inapplicables coded as extra states
3. complex plesiomorphic with inapplicables coded as missings
4. complex plesiomorphic with inapplicables coded as extra states
5. complex plesiomorphic with taxa given separate, autapomorphic, states for each inapplicable character.
P A U P *
Version 4.0b2 for Macintosh
Sunday, 13 June 1999 2:15 PM
This copy registered to: Chris Simon
University of Connecticut
Processing of file "complex apo ?" begins...
This file codes the character complex as apomorphic and all species that are inapplicable for the complex are coded as missing (?) for the states of the complex:
MATRIX
a1 11221111111111111111111111??????????
b1 14225111111115511232211111??????????
c1 14325111114115511232211111??????????
d1 14325111114115511232211111??????????
e1 11332114111115511322111111??????????
f1 11332334111115511322111111??????????
g1 11332334111115511322111111??????????
h2 225462633362422224414222322222222222
i2 225462633362422224414222322222222222
j2 225466233352422425545333223223223222
k2 225466233352422425545333223223223222
ja2 225766233352422425545333222322322322
jb2 225766233352422425545333222322322322
l2 224134424332222211111333122232232233
m2 224134424332222231111333122232232233
n2 224134424332222231111333122232232233
o3 33164551211333331111311111??????????
p3 33164551211333331111311111??????????
q3 33154115321333431113311111??????????
r3 33154115321333431113311111??????????
s3 33154115312334331113311111??????????
t3 33154115312334331113311111??????????
;
END;
Data matrix has 22 taxa, 36 characters
Branch-and-bound search settings:
Branch-and-bound search completed:
Tree description:
Unrooted tree(s) rooted using outgroup method
Tree number 1 (rooted using default outgroup)
Tree length = 97
---------------------------------------------------------------------------
Processing of file "complex apo +" begins...
This file codes the character complex as apomorphic and all species that are inapplicable for the complex are coded with an additional state for the states of the complex:
MATRIX
a1 112211111111111111111111111111111111
b1 142251111111155112322111111111111111
c1 143251111141155112322111111111111111
d1 143251111141155112322111111111111111
e1 113321141111155113221111111111111111
f1 113323341111155113221111111111111111
g1 113323341111155113221111111111111111
h2 225462633362422224414222322222222222
i2 225462633362422224414222322222222222
j2 225466233352422425545333223223223222
k2 225466233352422425545333223223223222
ja2 225766233352422425545333222322322322
jb2 225766233352422425545333222322322322
l2 224134424332222211111333122232232233
m2 224134424332222231111333122232232233
n2 224134424332222231111333122232232233
o3 331645512113333311113111111111111111
p3 331645512113333311113111111111111111
q3 331541153213334311133111111111111111
r3 331541153213334311133111111111111111
s3 331541153123343311133111111111111111
t3 331541153123343311133111111111111111
;
END;
Data matrix has 22 taxa, 36 characters
Processing of file "complex apo +" completed.
Branch-and-bound search settings:
Branch-and-bound search completed:
Tree description:
Unrooted tree(s) rooted using outgroup method
Tree number 1 (rooted using default outgroup)
Note that the only difference between these results and those from the (?) coding (prior file) is that the CI is slightly higher; the topology is identical. This is because the lack of the complex is plesiomorphic, and plesiomorphies do not influence tree topology, so they do not differ from missing codings, which also do not influence tree topology.
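The slightly higher CI has a simple arithmetic reading. The ensemble consistency index is the summed minimum number of steps divided by the observed tree length, and for an unordered character the minimum is the number of observed states minus one. The Python sketch below uses hypothetical four-taxon columns and hypothetical tree lengths (4 and 5 steps on the tree ((w,x),(y,z))); it is not PAUP output.

```python
def minimum_steps(column, missing="?"):
    """Fewest changes an unordered character can require: observed states minus one."""
    states = {s for s in column if s != missing}
    return max(len(states) - 1, 0)


def consistency_index(columns, tree_length):
    """Ensemble CI = summed minimum steps / observed tree length."""
    return sum(minimum_steps(col) for col in columns) / tree_length


# Hypothetical columns for taxa w, x, y, z (one string per character).
# Third character: inapplicable in w and x, coded as missing versus as state 1.
with_missing = ["1122", "1212", "??23"]  # hypothetical length: 1 + 2 + 1 = 4 steps
with_extra = ["1122", "1212", "1123"]    # hypothetical length: 1 + 2 + 2 = 5 steps

print(consistency_index(with_missing, 4))  # 0.75
print(consistency_index(with_extra, 5))    # 0.8
```

Replacing the missing cells with a plesiomorphic state adds one perfectly consistent step to both the numerator and the denominator, which nudges the CI upward without touching the topology.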
---------------------------------------------------------------------------
Processing of file "complex plesio ?" begins...
This file codes the character complex as plesiomorphic and all species that are inapplicable for the complex are coded as missing (?) for the states of the complex:
MATRIX
a1 112211111111111111111111111111111111
b1 142251111111155112322111111111111111
c1 14325111114115511232211112??????????
d1 14325111114115511232211112??????????
e1 113321141111155113221111112211222122
f1 113323341111155113221111112211222122
g1 113323341111155113221111112211222122
h2 225462633362422224414222313322333233
i2 225462633362422224414222313322333233
j2 22546623335242242554533322??????????
k2 22546623335242242554533322??????????
ja2 225766233352422425545333213333333333
jb2 225766233352422425545333213333333333
l2 224134424332222211111333114444111444
m2 224134424332222231111333114444111444
n2 224134424332222231111333114444111444
o3 331645512113333311113111111166666611
p3 331645512113333311113111111166666611
q3 33154115321333431113311112??????????
r3 33154115321333431113311112??????????
s3 331541153123343311133111111166666611
t3 331541153123343311133111111166666611
;
END;
Data matrix has 22 taxa, 36 characters
Processing of file "complex plesio ?" completed.
Branch-and-bound search settings:
Branch-and-bound search completed:
Tree description:
Unrooted tree(s) rooted using outgroup method
Tree number 1 (rooted using default outgroup)
Note that the topology is identical to that of the previous two datafiles. As with the previous two files, only 1 MPT is found and the groups 1, 2, 3 are each monophyletic.
---------------------------------------------------------------------------
Processing of file "complex plesio +" begins...
This file codes the character complex as plesiomorphic and all species that are inapplicable for the complex are coded with an additional state for the states (nine characters) of the complex:
MATRIX
a1 112211111111111111111111111111111111
b1 142251111111155112322111111111111111
c1 143251111141155112322111125555555555
d1 143251111141155112322111125555555555
e1 113321141111155113221111112211222122
f1 113323341111155113221111112211222122
g1 113323341111155113221111112211222122
h2 225462633362422224414222313322333233
i2 225462633362422224414222313322333233
j2 225466233352422425545333225555555555
k2 225466233352422425545333225555555555
ja2 225766233352422425545333213333333333
jb2 225766233352422425545333213333333333
l2 224134424332222211111333114444111444
m2 224134424332222231111333114444111444
n2 224134424332222231111333114444111444
o3 331645512113333311113111111166666611
p3 331645512113333311113111111166666611
q3 331541153213334311133111125555555555
r3 331541153213334311133111125555555555
s3 331541153123343311133111111166666611
t3 331541153123343311133111111166666611
;
END;
Data matrix has 22 taxa, 36 characters
Processing of file "complex plesio +" completed.
Branch-and-bound search settings:
Branch-and-bound search completed:
Tree description:
Unrooted tree(s) rooted using outgroup method
Tree length = 147
50% majority rule consensus. Note that instead of 1 MPT there are now 4, and two once-monophyletic groups have broken into paraphyletic groups. This is due to the algorithm considering the absence of the complex an apomorphy: although in this case there were 3 independent events (loss of the complex), the computer wants to put these 3 lineages together because they have apomorphies for all nine characters.
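For readers unfamiliar with the two ideas used in this note, here is a rough Python sketch of a majority-rule summary and a monophyly check. It assumes rooted trees written as nested tuples, and the three four-taxon trees are hypothetical stand-ins, not the 4 MPTs found by PAUP.

```python
from collections import Counter


def clades(tree):
    """Return the set of clades (frozensets of taxon names) in a nested-tuple tree."""
    found = set()

    def visit(node):
        if isinstance(node, str):  # leaf: a taxon name
            return frozenset([node])
        members = frozenset().union(*(visit(child) for child in node))
        found.add(members)
        return members

    visit(tree)
    return found


def majority_rule_clades(trees):
    """Clades present in more than half of the trees."""
    counts = Counter(c for tree in trees for c in clades(tree))
    return {c for c, n in counts.items() if n > len(trees) / 2}


def is_monophyletic(tree, group):
    """A group is monophyletic on a rooted tree if it is exactly one of its clades."""
    return frozenset(group) in clades(tree)


# Hypothetical equally short trees for taxa a, b, c, d.
t1 = ((("a", "b"), "c"), "d")
t2 = ((("a", "b"), "d"), "c")
t3 = ((("a", "c"), "b"), "d")

print(sorted(sorted(c) for c in majority_rule_clades([t1, t2, t3])))
# [['a', 'b'], ['a', 'b', 'c'], ['a', 'b', 'c', 'd']]
print(is_monophyletic(t1, ["a", "b"]), is_monophyletic(t3, ["a", "b"]))
# True False
```

A group that is a clade in one tree but not in another, like (a, b) here, is the kind of group the consensus above shows breaking up.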
---------------------------------------------------------------------------
Processing of file "complex plesio +(autapos)" begins...
This file codes the character complex as plesiomorphic and all species that are inapplicable for the complex are coded with an additional state for the states (nine characters) of the complex; however, each taxon is given an autapomorphic state to prevent the unwanted consideration of the absences as synapomorphies.
MATRIX
BEGIN CHARACTERS;
DIMENSIONS NCHAR=36;
FORMAT SYMBOLS= " 0 1 2 3 4 5 6 7 8 9 A B" MISSING=? GAP=- ;
CHARSTATELABELS
1 c1, 2 c2, 3 c3, 4 c4, 5 c5, 6 c6, 7 c7, 8 c8, 9 c9, 10 c10, 11 c11,;
MATRIX
[ 10 20 30 ]
[ . . . ]
a1 112211111111111111111111111111111111
b1 142251111111155112322111111111111111
c1 143251111141155112322111125555555555
d1 143251111141155112322111126677777766
e1 113321141111155113221111112211222122
f1 113323341111155113221111112211222122
g1 113323341111155113221111112211222122
h2 225462633362422224414222313322333233
i2 225462633362422224414222313322333233
j2 225466233352422425545333227788888877
k2 225466233352422425545333228899999988
ja2 225766233352422425545333213333333333
jb2 225766233352422425545333213333333333
l2 224134424332222211111333114444111444
m2 224134424332222231111333114444111444
n2 224134424332222231111333114444111444
o3 331645512113333311113111111166666611
p3 331645512113333311113111111166666611
q3 3315411532133343111331111299AAAAAA99
r3 33154115321333431113311112AABBBBBBAA
s3 331541153123343311133111111166666611
t3 331541153123343311133111111166666611
;
END;
Processing of file "complex plesio +(autapos)" completed.
Branch-and-bound search completed:
Tree description:
Optimality criterion = maximum parsimony
Of 36 total characters:
Tree number 1 (rooted using default outgroup)
Tree length = 182
Note that the topology is identical to that of the first three datafiles. As with the first three files, only 1 MPT is found and the groups 1, 2, 3 are each monophyletic. However, the branch lengths are now quite different, reflecting the numerous extra steps due to the autapomorphies.
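The extra branch length can be anticipated from the matrix alone. The Python sketch below counts, for each taxon, the characters in which it alone shows its state; each such private state typically maps as one change on that taxon's terminal branch. The function name and the six-taxon mini-matrix are hypothetical (column 0 is presence/absence, columns 1-3 are details).

```python
from collections import Counter


def private_state_counts(matrix, missing="?"):
    """For each taxon, count the characters in which it alone shows its state."""
    taxa = list(matrix)
    n_chars = len(next(iter(matrix.values())))
    counts = {taxon: 0 for taxon in taxa}
    for col in range(n_chars):
        tally = Counter(matrix[t][col] for t in taxa)
        for t in taxa:
            state = matrix[t][col]
            if state != missing and tally[state] == 1:
                counts[t] += 1
    return counts


# Hypothetical mini-matrix: y and z lack the complex and carry private states.
toy = {
    "u": "1222",
    "v": "1222",
    "w": "1333",
    "x": "1333",
    "y": "2555",
    "z": "2666",
}
print(private_state_counts(toy))
# {'u': 0, 'v': 0, 'w': 0, 'x': 0, 'y': 3, 'z': 3}
```

Only the taxa recoded with autapomorphic states accumulate private states, which is why their terminal branches lengthen while the rest of the tree is unaffected.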