Coding inapplicables as missing (?) versus extra states:

Response to Maddison, W. 1993. Missing data versus missing characters in phylogenetic analysis. Syst. Biol. 42: 576-581.

 Derek Sikes

15 Nov 1999


Maddison's paper addresses issues associated with coding inapplicable characters as missing and presents a case in which such coding would have an undesirable effect. Below I argue that the effect is either desirable or, if undesirable, easily avoidable in a fashion that Maddison (1993) does not mention and that retains missing codings for the inapplicable taxa.


Many characters support the topology shown, but the terminal clade on the upper left is unresolved. (Taxa 1-4 and 11-14 have tails; the others do not.) The characters that would resolve the left clade belong to a character complex that is inapplicable to the basal lineages of both the left and right clades. In this case the complex consists of the presence of a tail and the color of the tail.


When the basal lineages are coded as missing for tail color, the computer reconstructs tail color as if the taxa coded as missing (taxa 5-10) were absent from the tree:



In that case the most parsimonious resolution is for the state blue to be plesiomorphic for both clades.
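
The leak can be illustrated with a minimal Fitch parsimony sketch in Python. Everything in it is an assumption made for illustration and is not taken from Maddison (1993): the topology (the original figure is not reproduced here), the taxon names t1-t14, the tail colors assigned to the tailed taxa, and the names fitch, full_tree, LEFT_BLUE_BASAL and LEFT_RED_BASAL. A "?" is treated as compatible with any state, which is the usual handling of missing data under parsimony.

```python
# Minimal Fitch parsimony sketch (unordered states, binary tree as nested tuples).
# Topology, taxon names, and colors are hypothetical; "?" = coded as missing.

ALL_COLORS = frozenset({"red", "blue"})


def fitch(tree, coding):
    """Return (possible ancestral states, number of steps) for `tree`."""
    if isinstance(tree, str):  # terminal taxon
        state = coding[tree]
        return (ALL_COLORS if state == "?" else frozenset({state})), 0
    left_set, left_steps = fitch(tree[0], coding)
    right_set, right_steps = fitch(tree[1], coding)
    shared = left_set & right_set
    if shared:  # the two descendants can share a state: no change needed
        return shared, left_steps + right_steps
    return left_set | right_set, left_steps + right_steps + 1


# Tail color: scored for tailed taxa, "?" for the tailless taxa t5-t10.
COLOR = {
    "t1": "red", "t2": "red", "t3": "blue", "t4": "blue",
    **{f"t{i}": "?" for i in range(5, 11)},
    "t11": "blue", "t12": "blue", "t13": "red", "t14": "red",
}

# Two alternative resolutions of the left clade.
LEFT_BLUE_BASAL = ("t3", ("t4", ("t1", "t2")))
LEFT_RED_BASAL = ("t1", ("t2", ("t3", "t4")))
# Right clade, resolved by other characters, with blue taxa basal.
RIGHT = ("t11", ("t12", ("t13", "t14")))


def full_tree(left):
    """Place a left-clade resolution on the (hypothetical) full topology."""
    return ((left, ("t5", ("t6", "t7"))), (("t8", ("t9", "t10")), RIGHT))


for name, left in [("blue basal", LEFT_BLUE_BASAL), ("red basal", LEFT_RED_BASAL)]:
    alone = fitch(left, COLOR)[1]
    in_tree = fitch(full_tree(left), COLOR)[1]
    print(f"left clade, {name}: {alone} step(s) alone, {in_tree} on the full tree")
```

Scored on the left clade alone, both resolutions cost one step. On the full tree the blue-basal resolution costs two steps and the red-basal one three, so the right clade decides the resolution of the left clade through the taxa coded "?".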

Maddison stated:

"In this example the two tailed clades have influenced each other, even though they are widely separated on the tree, because the intervening taxa with missing data allowed the influence to leak through." (p. 577)

He went on to point out that in this case the ancestor of both clades was presumably tailless, and that the presence of tails in these two clades should be considered in isolation.

Indeed, I agree they should be considered in isolation, because the character state "tail present" isn't homologous between the left and right clades. This is analogous to using the character state "eyes present" for a tree including cephalopods and vertebrates. Clearly there is sufficient evidence that tails evolved twice in this tree, and one should expect that sufficient examination of these taxa would show that the tails differ in some fashion, enough to break this one character complex into two complexes:

tail: present/absent

tail color: red/blue


tailx: present/absent

tailx color: red/blue


taily: present/absent

taily color: red/blue

In this case, in which the state TAIL PRESENT isn't homologous, I argue that the solution is to remove the homoplasy. This in turn solves any problems the missing codings might have introduced and allows each appearance of tails to be treated in isolation (as Dr. Maddison argues they should be). (It will also increase the CI of the tree and improve the bootstrap scores.)
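
A minimal Python sketch of the recoding follows. It compares a single tail complex with the split tailx/taily coding, keeping "?" for every inapplicable taxon. The topology, the taxon names t1-t14, the colors, and the function names (fitch_steps, full_tree, color_length, ci) are hypothetical and chosen only for illustration; CI is computed as the minimum possible number of steps divided by the observed number of steps.

```python
# Sketch: one tail complex versus two (tailx / taily), inapplicables still "?".
# Topology, taxon names, and colors are hypothetical.


def fitch_steps(tree, coding):
    """Fitch parsimony length; taxa absent from `coding` count as "?"."""
    def visit(node):
        if isinstance(node, str):  # terminal taxon
            state = coding.get(node, "?")
            return (None if state == "?" else frozenset({state})), 0
        (a, a_steps), (b, b_steps) = visit(node[0]), visit(node[1])
        steps = a_steps + b_steps
        if a is None or b is None:  # "?" is compatible with any state
            return (b if a is None else a), steps
        shared = a & b
        return (shared, steps) if shared else (a | b, steps + 1)
    return visit(tree)[1]


ALL_TAXA = [f"t{i}" for i in range(1, 15)]
LEFT_COLOR = {"t1": "red", "t2": "red", "t3": "blue", "t4": "blue"}
RIGHT_COLOR = {"t11": "blue", "t12": "blue", "t13": "red", "t14": "red"}


def presence(tailed):
    """Presence/absence character scored for every taxon."""
    return {t: ("present" if t in tailed else "absent") for t in ALL_TAXA}


# Single complex: one presence character and one color character.
tail = presence(set(LEFT_COLOR) | set(RIGHT_COLOR))
tail_color = {**LEFT_COLOR, **RIGHT_COLOR}
# Split complexes: each color character is "?" outside its own clade.
tailx, tailx_color = presence(set(LEFT_COLOR)), LEFT_COLOR
taily, taily_color = presence(set(RIGHT_COLOR)), RIGHT_COLOR

LEFT_BLUE_BASAL = ("t3", ("t4", ("t1", "t2")))
LEFT_RED_BASAL = ("t1", ("t2", ("t3", "t4")))
RIGHT = ("t11", ("t12", ("t13", "t14")))


def full_tree(left):
    return ((left, ("t5", ("t6", "t7"))), (("t8", ("t9", "t10")), RIGHT))


def color_length(left, split):
    """Total steps for the color character(s) given a left-clade resolution."""
    chars = [tailx_color, taily_color] if split else [tail_color]
    return sum(fitch_steps(full_tree(left), c) for c in chars)


def ci(tree, chars):
    """Ensemble consistency index; each binary character needs at least 1 step."""
    return len(chars) / sum(fitch_steps(tree, c) for c in chars)


blue_tree = full_tree(LEFT_BLUE_BASAL)
print("single complex:", color_length(LEFT_BLUE_BASAL, False),
      color_length(LEFT_RED_BASAL, False), ci(blue_tree, [tail, tail_color]))
print("split complexes:", color_length(LEFT_BLUE_BASAL, True),
      color_length(LEFT_RED_BASAL, True),
      ci(blue_tree, [tailx, tailx_color, taily, taily_color]))
```

Under the single complex the two left-clade resolutions differ in length (2 versus 3 steps for tail color). Under the split coding each costs 2 steps in total, so the right clade no longer affects the left. In this toy data set the ensemble CI of the tail characters rises from 0.5 to 1.0.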

What if the ancestor was tailed, i.e., the state TAIL PRESENT was homologous? In this case, the reconstruction of blue as plesiomorphic on the right side would influence the resolution of the left side (with the most parsimonious assignment being blue as plesiomorphic for the left also). This effect is no longer undesirable. In fact, the only evidence we have on the evolution of tail color (the data from the right clade) indicates that the ancestor most parsimoniously had a blue tail, and thus the right clade influences the left, but for the correct reasons.



The case Dr. Maddison presented is both rare and a result of improper assessment of homologies. Correcting that assessment removes the undesirable effects of coding inapplicables as missing.