If species are real, we need a way to draw them. Cladistics is biology's family-tree algorithm — shared derived characters all the way down.
Every sub-topic below feeds at least one of these questions.
What tools are used to classify organisms into taxonomic groups?
How do cladistic methods differ from traditional taxonomic methods?
An extra 9 sub-topics for HL — same syllabus, deeper mechanism.
Need for classification of organisms
Difficulties classifying organisms into the traditional hierarchy of taxa
Advantages of classification corresponding to evolutionary relationships
Clades as groups of organisms with common ancestry and shared characteristics
Gradual accumulation of sequence differences as the basis for estimates of when clades diverged from a common ancestor
Examples can be simple and based on sample data to illustrate the tool.
Analysing cladograms
Using cladistics to investigate whether the classification of groups corresponds to evolutionary relationships
Classification of all organisms into three domains using evidence from rRNA base sequences
Classification is biology's filing system. Once you know an organism's group, you know hundreds of its properties without having to study them individually.
One widely cited estimate puts the number of eukaryotic species on Earth at about 8.7 million. Classification organises this overwhelming diversity into a nested hierarchy, so that any further study can build on what we already know about the group. Identify a newly discovered mammal and you instantly know it has hair, mammary glands, three middle-ear bones and a four-chambered heart, and that it almost certainly gives birth to live young (the egg-laying monotremes are the rare exceptions). That's the leverage classification gives.
Domain → Kingdom → Phylum → Class → Order → Family → Genus → Species. Memorable but problematic: the ranks don't always reflect evolutionary patterns.
"Do Kindly Place Cover On Fresh Green Spring vegetables" — Domain, Kingdom, Phylum, Class, Order, Family, Genus, Species.
A famous example: the traditional class Reptilia (turtles, crocodiles, snakes, lizards) is not a clade. Birds, which Linnaeus put in a separate class, are more closely related to crocodiles than crocodiles are to turtles. To make Reptilia a clade, you'd need to include birds in it. As traditionally defined, Reptilia is paraphyletic (it excludes some descendants of its common ancestor), so under cladistics it is not a valid taxonomic group.
Switching from traditional ranked classification to unranked clades based on common ancestry is a paradigm shift — a fundamental change in how the field organises its knowledge.
If classification follows evolutionary ancestry, members of a group truly share traits because they inherited them from a common ancestor.
The ideal classification follows real evolutionary history. In an evolution-based classification, every taxonomic group is a clade — a common ancestor plus all its descendants. The advantages:
A clade groups organisms (or viruses) that share a common ancestor and the derived characters inherited from it.
The most objective evidence for grouping organisms into clades comes from base sequences of genes or amino acid sequences of proteins. Sequences accumulate mutations as lineages diverge — fewer differences = more recent common ancestor.
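As a minimal sketch of the "fewer differences = more recent common ancestor" idea, the Python below counts the positions at which pairs of aligned sequences differ. The species names and base sequences are invented for illustration, and the sequences are assumed to be already aligned and of equal length.

```python
from itertools import combinations

# Hypothetical aligned base sequences, invented for illustration only.
sequences = {
    "species_A": "ATGCCGTTAGCA",
    "species_B": "ATGCCGTTAGCT",
    "species_C": "ATGACGTAAGCT",
    "species_D": "TTGACGAAAGGT",
}


def count_differences(seq1, seq2):
    """Count the positions at which two aligned sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq1, seq2))


# Number of differences for every pair of species.
differences = {
    (name1, name2): count_differences(sequences[name1], sequences[name2])
    for name1, name2 in combinations(sequences, 2)
}

# The pair with the fewest differences is the best candidate for the
# most recent common ancestor.
closest_pair = min(differences, key=differences.get)

for pair, n in sorted(differences.items(), key=lambda item: item[1]):
    print(pair, n)
print("closest pair:", closest_pair)
```

With this sample data, species_A and species_B differ at only one position, so they would be grouped together first.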
Shared physical traits can also support clade assignment, but watch for convergent evolution: unrelated species evolving similar features for similar functions. The wings of bats and birds are convergent; they were not inherited from a winged common ancestor.
Mutations accumulate at a roughly constant rate. The more sequence differences between two lineages, the longer ago they shared a common ancestor.
Closely related species (recent common ancestor) have few differences in gene/protein sequences. Distantly related species have many. Calibrate the mutation rate (using fossils of known age) and you can convert sequence differences into divergence dates: how many millions of years ago the lineages last shared a common ancestor.
Mutation rates vary between genes, between lineages and over time, so the molecular clock gives only an estimate. To pin dates down as accurately as possible, biologists calibrate clocks against the fossil record wherever they can.
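The calibration step can be sketched with simple arithmetic. The example below assumes a strictly constant rate, ignores repeated mutations at the same site, and uses invented sequences and an invented fossil date of 50 million years; all function names are hypothetical. Because both lineages accumulate changes after they split, the distance between them is divided by two.

```python
def proportion_different(seq1, seq2):
    """Fraction of aligned positions at which two sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq1, seq2)) / len(seq1)


def calibrate_rate(distance, fossil_age_my):
    """Changes per site per million years along one lineage.

    The distance is shared between two lineages, hence the factor of 2.
    """
    return distance / (2 * fossil_age_my)


def estimate_divergence_my(distance, rate):
    """Estimated time since divergence, in millions of years."""
    return distance / (2 * rate)


# Hypothetical calibration pair: fossils date their split to 50 million years ago.
calib_x = "ATGCATGCATGCATGCATGC"
calib_y = "TTGCATGGATGCATCCATGA"  # differs from calib_x at 4 of 20 sites

# Hypothetical pair whose divergence date is unknown.
query_p = "ATGCATGCATGCATGCATGC"
query_q = "ATGTATGCATGTATGCATGC"  # differs from query_p at 2 of 20 sites

calibration_distance = proportion_different(calib_x, calib_y)
rate = calibrate_rate(calibration_distance, fossil_age_my=50.0)

query_distance = proportion_different(query_p, query_q)
estimated_age_my = estimate_divergence_my(query_distance, rate)

print(f"estimated divergence: about {estimated_age_my:.0f} million years ago")
```

Half as many differences gives half the divergence time, which is exactly the proportionality the clock relies on; real analyses use statistical models that correct for rate variation.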
From sequence comparisons, multiple cladograms are mathematically possible. Biologists choose between them using parsimony.
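One way to make the choice concrete is to count, for each candidate cladogram, the minimum number of character changes it requires and keep the tree with the lowest total. The sketch below uses Fitch's small-parsimony counting method, which is a choice made here for illustration and is not named in these notes. The character table is a deliberately simplified example (1 = trait present, 0 = absent), and trees are written as nested tuples.

```python
# Simplified character table: jaws, four limbs, amniotic egg, hair.
characters = {
    "lamprey": "0000",
    "trout":   "1000",
    "lizard":  "1110",
    "mouse":   "1111",
}

# Candidate cladograms as nested tuples (each internal node has two children).
candidates = {
    "tree_1": ("lamprey", ("trout", ("lizard", "mouse"))),
    "tree_2": (("lamprey", "lizard"), ("trout", "mouse")),
    "tree_3": ("lamprey", ("lizard", ("trout", "mouse"))),
}


def fitch(tree, states):
    """Return (possible ancestral states, minimum changes) for one character."""
    if isinstance(tree, str):  # a tip: its state is observed
        return {states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch(left, states)
    right_set, right_changes = fitch(right, states)
    shared = left_set & right_set
    if shared:  # children can agree: no extra change needed here
        return shared, left_changes + right_changes
    # children disagree: one change must have happened on a branch below
    return left_set | right_set, left_changes + right_changes + 1


def parsimony_score(tree, table):
    """Total minimum number of changes over all characters."""
    n_characters = len(next(iter(table.values())))
    total = 0
    for i in range(n_characters):
        states = {species: row[i] for species, row in table.items()}
        total += fitch(tree, states)[1]
    return total


scores = {name: parsimony_score(tree, characters) for name, tree in candidates.items()}
best = min(scores, key=scores.get)

print(scores)
print("most parsimonious:", best)
```

Here tree_1 needs 4 changes while the other two need 6 each, so tree_1 is preferred. With real data, ties between trees are common and the number of possible trees grows very quickly with the number of species.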
A cladogram is a branching tree showing the most probable sequence of divergences. Nodes represent common ancestors; branches are lineages. Construction steps:
"A simple hypothesis with fewer evolutionary changes is more likely than a complex one with many." Doesn't guarantee the truth — but is the most defensible default when multiple histories are consistent with the data.
Three terms you must know — and one rule that controls them all.
Root: the bottommost node. It represents the hypothetical last common ancestor of every species on the cladogram.
Node: a point where one lineage splits into two or more descendant lineages. Each node represents a (hypothetical) common ancestor at the moment of divergence.
Terminal branch: an endpoint, representing one species or group at the tips of the tree.
The further back (closer to the root) the most recent common ancestor of two groups, the more distantly related they are. Two species sharing a node near the tips of the tree are close cousins; two sharing only the root are most distant.
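The rule can be checked mechanically. In the sketch below a cladogram is a nested tuple, the species labels A to E are placeholders, and the function names are hypothetical. The depth of the most recent common ancestor (MRCA) is counted in nodes from the root, so depth 0 means the two species share only the root.

```python
# Hypothetical cladogram: each tuple is a node, each string is a tip.
tree = (("A", "B"), ("C", ("D", "E")))


def tip_paths(node, path=()):
    """Map each tip to the sequence of branch choices leading to it from the root."""
    if isinstance(node, str):
        return {node: path}
    paths = {}
    for index, child in enumerate(node):
        paths.update(tip_paths(child, path + (index,)))
    return paths


def mrca_depth(tree, tip1, tip2):
    """Number of nodes between the root and the MRCA of two tips (root = 0)."""
    paths = tip_paths(tree)
    depth = 0
    for step1, step2 in zip(paths[tip1], paths[tip2]):
        if step1 != step2:  # the two lineages part ways here
            break
        depth += 1
    return depth


for other in ("D", "C", "A"):
    print("E and", other, "share an ancestor at depth", mrca_depth(tree, "E", other))
```

For species E, the MRCA with D is deepest in the tree (depth 2), with C it is one node from the root, and with A it is the root itself, so D is E's closest relative and A its most distant. Because a cladogram shows branching order and not time, these depths are most meaningful when comparing pairs that include the same species.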
A clade = a node plus all its descendants. Cut the tree at any node and everything above the cut is a clade. That's why classification by clades scales: every branch is potentially a named group.
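The definition translates directly into a test: a group is a clade only if it exactly matches the set of tips under some node. The sketch below applies that test to the traditional Reptilia example. The branching order used is a simplified assumption for illustration, and the function names are hypothetical.

```python
# Simplified, assumed branching order for illustration only.
tree = (("lizards", "snakes"), ("turtles", ("crocodiles", "birds")))


def collect_clades(node, found):
    """Return the set of tips under this node and record it in `found`."""
    if isinstance(node, str):
        tips = frozenset([node])
    else:
        tips = frozenset().union(*(collect_clades(child, found) for child in node))
    found.add(tips)
    return tips


def all_clades(tree):
    """Every clade in the tree: one set of tips per node, tips included."""
    found = set()
    collect_clades(tree, found)
    return found


def is_clade(tree, group):
    """True if the group is exactly one node plus all of its descendants."""
    return frozenset(group) in all_clades(tree)


traditional_reptiles = {"lizards", "snakes", "turtles", "crocodiles"}

print(is_clade(tree, traditional_reptiles))              # birds left out
print(is_clade(tree, traditional_reptiles | {"birds"}))  # birds included
print(is_clade(tree, {"crocodiles", "birds"}))
```

On this tree the traditional reptiles fail the test because the node that contains them all also contains birds; adding birds makes the group a clade.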
A classic case: a plant family classified by morphology, then dismantled by molecular evidence. Convergent evolution had fooled the morphologists.
The figwort family (Scrophulariaceae) was named in 1789 by Antoine Laurent de Jussieu, based on shared morphology. Two centuries later, cladistic analysis of three chloroplast genes from a large number of species in the family revealed a much messier picture. The reclassification:
The morphological similarities that grouped these plants were due to convergent evolution — similar floral structures evolved independently in distantly related lineages, probably because the same pollinators favoured the same flower shape.
Scientific theories — including taxonomic classifications — can be falsified by new evidence. The morphological figwort family was falsified by molecular data. Science accepts knowledge claims provisionally, always open to revision.
Carl Woese added a new top-level rank — Domain — after rRNA sequencing revealed that prokaryotes are actually two deeply different groups.
Before 1977, organisms were placed into five kingdoms (Plants, Animals, Fungi, Protists, Bacteria). Woese sequenced ribosomal RNA (rRNA), a molecule found in every living thing whose base sequence changes extremely slowly, and showed that the "bacteria" were actually two distinct groups, as different from each other as either was from eukaryotes. Crucially, archaeal rRNA is more similar to eukaryote rRNA than to bacterial rRNA, which implies that archaea and eukaryotes share a more recent common ancestor with each other than either does with bacteria.
Archaea: Prokaryotic. Many extremophiles (thermophiles, halophiles, methanogens). Unique membrane lipids; cell walls lack peptidoglycan (some have pseudopeptidoglycan instead). Closer to eukaryotes in rRNA.
Bacteria: Prokaryotic. Peptidoglycan cell walls. The "true bacteria" we are most familiar with, such as E. coli, Streptococcus and cyanobacteria.
Eukaryota: Eukaryotic, with membrane-bound nuclei and organelles. Animals, plants, fungi, protists. More closely related to Archaea than to Bacteria.
If you can't define one of these in a sentence, that's where to revise next.
“What mechanisms contribute to convergent evolution?”
“To what extent is the natural history of life characterized by increasing complexity or simplicity?”