Two strands. Four letters. Three billion years of memory. A chemistry that copies itself well enough for life to last.
Every sub-topic below feeds at least one of these questions.
How does the structure of nucleic acids allow hereditary information to be stored?
How does the structure of DNA facilitate accurate replication?
The required syllabus content for A1.2, in order. Each card is one lesson-sized checkpoint.
Some viruses use RNA as their genetic material but viruses are not considered to be living.
In diagrams of nucleotides use circles, pentagons and rectangles to represent relative positions of phosphates, pentose sugars and bases.
Sugar–phosphate bonding and the sugar–phosphate “backbone” of DNA and RNA
Bases in each nucleic acid that form the basis of a code
RNA as a polymer formed by condensation of nucleotide monomers
DNA as a double helix made of two antiparallel strands of nucleotides with two strands linked by hydrogen bonding between complementary base pairs
Differences between DNA and RNA
Role of complementary base pairing in allowing genetic information to be replicated and expressed
Diversity of possible DNA base sequences and the limitless capacity of DNA for storing information
Conservation of the genetic code across all life forms as evidence of universal common ancestry
Every living organism on Earth — from E. coli to a blue whale — stores its genetic information in DNA. The single most universal fact in biology.
DNA (deoxyribonucleic acid) is the genetic material of all living organisms. It is found in chromosomes and contains the instructions for the growth, development and functioning of every cell. The sequence of bases along a strand of DNA is what we mean by genetic information.
Living organisms have at least one cell, carry out metabolism, maintain homeostasis, respond to stimuli, reproduce, grow and develop, and contain genetic information (in the form of DNA). Use this checklist to decide whether something is alive — or, more interestingly, isn't.
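Viruses do carry genetic material (DNA or RNA) surrounded by a protein coat. But they fail almost every other test of life: they are not made of cells, they have no metabolism of their own, and they cannot reproduce outside a host cell.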
This is why the syllabus says "some viruses use RNA as their genetic material but viruses are not considered to be living." A virus is genetic information looking for a cell to hijack.
A nucleotide is a phosphate, a sugar, and a nitrogenous base — held together by covalent bonds.
Both DNA and RNA are polynucleotides — long chains assembled one nucleotide at a time through condensation reactions, building a strong covalent backbone.
Each condensation reaction joins the phosphate of one nucleotide to the sugar of the next, releasing a water molecule. The result is a continuous chain of covalently bonded atoms — sugar, phosphate, sugar, phosphate — which forms a strong backbone for the molecule. The bases stick out sideways from this backbone, where they can pair with bases from another strand or be read by enzymes.
The backbone is strong because every bond in it is a covalent bond — energetically expensive to break. This is why DNA can survive decades inside a cell (and tens of thousands of years inside a frozen mammoth tusk) without degrading.
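The bookkeeping of condensation can be sketched in a few lines of Python. This is only an illustrative count, and the function name `condensation_summary` is a hypothetical helper: joining n nucleotides into one strand takes n − 1 condensation reactions, each forming one sugar–phosphate link and releasing one water molecule.

```python
def condensation_summary(n_nucleotides: int) -> dict:
    """Count the links formed and water released when nucleotides join into one strand."""
    if n_nucleotides < 1:
        raise ValueError("a strand needs at least one nucleotide")
    # Each condensation reaction joins one more nucleotide to the chain,
    # so a chain of n nucleotides needs n - 1 reactions.
    links = n_nucleotides - 1
    return {"sugar_phosphate_links": links, "water_released": links}


print(condensation_summary(4))
# {'sugar_phosphate_links': 3, 'water_released': 3}
```

A four-nucleotide chain, the minimum for an IB-style diagram, therefore contains three backbone links.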
The sequence of bases is the code. The sugar–phosphate backbone is just the rail it sits on.
DNA bases: adenine, thymine, guanine, cytosine. A pairs with T; G pairs with C. The sequence in a gene determines the amino acid sequence in a protein.
RNA bases: adenine, uracil, guanine, cytosine. Uracil replaces thymine. RNA uses U because it's metabolically cheaper to make than T, which is fine for short-lived transcripts.
The flow of information is direct: DNA base sequence → RNA base sequence → amino acid sequence.
Each set of three bases (a codon) specifies one amino acid. With four possible bases at each position, there are 4³ = 64 possible codons — enough to code for the 20 standard amino acids, with redundancy and start/stop signals.
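The codon arithmetic can be checked directly. The sketch below lists every three-base combination of the four RNA bases, then reads a short made-up mRNA using a deliberately tiny, partial codon table. `PARTIAL_CODE` and `translate` are illustrative names, and the table contains only five of the 64 codons.

```python
from itertools import product

RNA_BASES = "UCAG"

# Every ordered choice of 3 bases from 4 options: 4 * 4 * 4 codons.
codons = ["".join(triplet) for triplet in product(RNA_BASES, repeat=3)]
print(len(codons))  # 64

# Partial table for illustration only; the real code assigns all 64 codons.
PARTIAL_CODE = {"AUG": "Met", "UUU": "Phe", "GGC": "Gly", "AAA": "Lys", "UAA": "STOP"}


def translate(mrna: str) -> list:
    """Read an mRNA three bases at a time until a stop codon is reached.

    Raises KeyError for any codon missing from the partial table.
    """
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = PARTIAL_CODE[mrna[i:i + 3]]
        if amino_acid == "STOP":
            break
        peptide.append(amino_acid)
    return peptide


print(translate("AUGUUUGGCAAAUAA"))  # ['Met', 'Phe', 'Gly', 'Lys']
```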
RNA monomers (nucleotides) link by condensation reactions to form an RNA polymer. Usually single-stranded; the same chemistry, different geometry.
To draw RNA, show a vertical column of nucleotides with a covalent bond joining the sugar of one nucleotide to the phosphate of the next, and label the four RNA bases (U, A, C, G). An IB-style diagram needs at least four nucleotides in the chain.
Two polynucleotide strands wind around each other in opposite directions, held together by hydrogen bonds between complementary base pairs. The structure is so elegant it feels inevitable.
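One way to see what "opposite directions" means is to build the partner of a short strand. In this minimal sketch, `partner_strand` is a hypothetical helper and the input is assumed to be written 5' to 3'. Because the partner runs the other way, its sequence has to be reversed before it can be read 5' to 3' as well.

```python
DNA_PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def partner_strand(strand_5_to_3: str) -> str:
    """Return the complementary strand, also written 5' to 3'."""
    # Pair each base in place; this gives the partner read 3' to 5'.
    paired = [DNA_PAIRS[base] for base in strand_5_to_3]
    # Reverse it so the result is read in the conventional 5' to 3' direction.
    return "".join(reversed(paired))


print(partner_strand("ATGCCG"))  # CGGCAT
```

Applying the function twice returns the original strand, which is another way of saying each strand fully specifies the other.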
Three structural differences (strands, sugar, bases), plus what follows from them. Each one matters.
| Feature | DNA | RNA |
|---|---|---|
| Number of strands | 2 (double helix) | 1 (single strand) |
| Pentose sugar | Deoxyribose | Ribose |
| Bases | A, T, G, C | A, U, G, C (U replaces T) |
| Stability | Very stable — built to last | Less stable — built to be disposable |
| Examples | Chromosomes, plasmids | mRNA, tRNA, rRNA |
You should be able to sketch deoxyribose and ribose and label the difference: deoxyribose lacks the –OH on its 2' carbon (the "deoxy" part).
Because the pairings are obligate, each strand is a perfect template for the other. This is what makes replication and gene expression possible.
Adenine pairs only with thymine, held by two hydrogen bonds. In RNA, adenine pairs with uracil instead (also two hydrogen bonds).
Guanine pairs only with cytosine, held by three hydrogen bonds — slightly stronger than A=T. DNA regions rich in G–C are harder to denature.
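The difference in hydrogen bonding can be totalled for any stretch of double-stranded DNA. The sketch below uses only the counts given above (two per A–T pair, three per G–C pair); `hydrogen_bonds` is an illustrative name, and the function takes one strand and assumes it is fully paired.

```python
# Hydrogen bonds contributed by the base pair that each base takes part in.
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def hydrogen_bonds(strand: str) -> int:
    """Total hydrogen bonds holding a fully paired duplex together."""
    return sum(H_BONDS[base] for base in strand)


print(hydrogen_bonds("ATATAT"))  # 12
print(hydrogen_bonds("GCGCGC"))  # 18
```

Two duplexes of the same length can differ by half again in hydrogen bonds, which is consistent with G–C-rich regions being harder to denature.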
Any length is possible. Any sequence is possible. So the number of possible DNA molecules is astronomically large — and so is the amount of information one cell can carry.
Total DNA per cell ≈ 2 metres when stretched out.
Chromosome 1: ~249 million base pairs, the longest in our genome.
Chromosome 21: ~48 million base pairs, the smallest autosome.
The dystrophin gene codes for the muscle protein dystrophin. Mutations in it cause muscular dystrophy.
Genes are sequences of DNA that code for a specific product, usually a protein. The shortest human gene (coding for a tRNA) is only 76 nucleotides long; the average is 10,000 to 15,000 nucleotides. The human genome, about 3 billion base pairs in each set of 23 chromosomes, fits inside every cell of your body and is copied every time a cell divides.
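The scale of these numbers is easy to check. The sketch below counts possible sequences as 4 to the power of the length, then estimates the stretched-out length of the DNA in one cell. Two values are assumptions that do not come from the text above: a spacing of 0.34 nm between adjacent base pairs, and two sets of about 3 billion base pairs per cell.

```python
def possible_sequences(length: int) -> int:
    """Number of different base sequences of a given length (4 choices per position)."""
    return 4 ** length


print(possible_sequences(10))            # 1048576
print(len(str(possible_sequences(76))))  # 46 (digits in the count for a 76-base gene)

# Assumptions, not from the text: 0.34 nm between base pairs, two chromosome sets per cell.
RISE_PER_BASE_PAIR_M = 0.34e-9
BASE_PAIRS_PER_SET = 3_000_000_000
SETS_PER_CELL = 2

total_length_m = SETS_PER_CELL * BASE_PAIRS_PER_SET * RISE_PER_BASE_PAIR_M
print(round(total_length_m, 2))  # 2.04
```

Even the shortest gene could in principle be any one of a 46-digit number of sequences, and the length estimate lands close to the 2 metre figure.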
Bacteria, archaea, plants, fungi, animals — almost without exception, all use the same 64 codons to specify the same 20 amino acids. This shared code is the single strongest piece of evidence that modern life traces back to a single common ancestor.
"Near-universal" because there are tiny exceptions — a few codons differ in some mitochondrial DNAs and in a handful of ciliate protozoans. But the rule holds for >99% of life: AUG = methionine in E. coli, in yeast, in oak trees, in you. Read a human gene with an E. coli ribosome and you get the same protein you'd get reading it with a human ribosome.
This is why we can put a human gene into bacteria and have them make human insulin (the basis of recombinant biotechnology). It's also why we can be confident that life on Earth has a single deep ancestor — LUCA, the Last Universal Common Ancestor — sometime around 3.5–4 Ga.
An extra 5 sub-topics for HL — same syllabus, deeper mechanism.
Directionality of RNA and DNA
Purine-to-pyrimidine bonding as a component of DNA helix stability
Structure of a nucleosome
Evidence from the Hershey–Chase experiment for DNA as the genetic material
Chargaff’s data on the relative amounts of pyrimidine and purine bases across diverse life forms
Both DNA polymerase and RNA polymerase can only add nucleotides in one direction: 5' → 3'. This single constraint shapes everything about how DNA is replicated, transcribed and translated.
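In ribose and deoxyribose, the carbons are numbered 1' through 5' going clockwise from the oxygen atom. The base attaches to the 1' carbon, the 3' carbon carries the –OH that bonds to the next nucleotide's phosphate, and the 5' carbon carries the phosphate.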
The DNA double helix is exactly the same width all the way down the molecule — regardless of the sequence — because every base pair is a purine paired with a pyrimidine.
Purines: adenine and guanine. Larger, two-ring molecules. About 2 nm long.
Pyrimidines: cytosine, thymine (uracil in RNA). Smaller, one-ring molecules. About 1.2 nm long.
A purine paired with a pyrimidine = constant length. A purine paired with another purine would be too wide; a pyrimidine paired with another pyrimidine would be too narrow. This is why A pairs only with T (purine + pyrimidine) and G pairs only with C (purine + pyrimidine) — never A with G or T with C.
The consequence: the helix has the same three-dimensional structure regardless of base sequence. This is what lets the same enzymes replicate and transcribe any stretch of DNA.
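The ring-counting argument above can be written out as a small check. This sketch uses the number of rings in each base as a rough stand-in for its size, which is a simplification chosen for illustration; `rings_across_pair` is a hypothetical helper.

```python
# Purines (A, G) have two rings; pyrimidines (C, T) have one.
RINGS = {"A": 2, "G": 2, "C": 1, "T": 1}


def rings_across_pair(base_1: str, base_2: str) -> int:
    """Total rings spanning the gap between the two backbones."""
    return RINGS[base_1] + RINGS[base_2]


for pair in [("A", "T"), ("G", "C"), ("A", "G"), ("T", "C")]:
    print(pair, rings_across_pair(*pair))
# ('A', 'T') 3
# ('G', 'C') 3
# ('A', 'G') 4
# ('T', 'C') 2
```

Only the purine plus pyrimidine pairs give the same total, matching the constant width of the helix; a purine pair would be wider and a pyrimidine pair narrower.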
The nucleosome is the first level of DNA packaging: about 147 base pairs of DNA wrapped nearly twice around a core of eight histone proteins, with one additional histone (H1) holding the structure together.
In the 1950s, biology was split: was the genetic material protein or DNA? Hershey and Chase used the T2 bacteriophage and two clever radioactive labels to settle it.
The T2 bacteriophage is a virus that infects E. coli. Its structure is simple: a DNA molecule inside a protein coat. When it infects a bacterium, the DNA enters the cell and the protein coat stays outside, attached to the surface.
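Hershey and Chase produced two batches of radioactive viruses: one grown with radioactive sulfur (³⁵S), which labels protein but not DNA, and one grown with radioactive phosphorus (³²P), which labels DNA but not protein.
Each batch was used to infect a separate flask of E. coli. After infection, the cultures were violently agitated in a kitchen blender, which shook the empty protein coats loose from the bacterial surface, and then centrifuged. The bacteria (with whatever had entered them) pelleted at the bottom; the loose protein coats stayed in the supernatant. The radioactive DNA label ended up mainly in the pellet and the radioactive protein label mainly in the supernatant, so it was DNA, not protein, that entered the cells.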
The experiment was only possible because radioisotopes had recently become available to researchers. New tools open new questions — a recurring pattern in the history of biology.
The early "tetranucleotide hypothesis" said DNA was a boring repeating unit. Chargaff's measurements showed it couldn't be — and his pattern eventually let Watson and Crick crack the helix.
In the early 20th century, Phoebus Levene proposed that DNA was a repeating unit of all four bases in equal amounts — A, T, G, C, A, T, G, C, …. If that were true, every DNA sample would contain 25% of each base.
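Erwin Chargaff measured the relative amounts of the four bases in DNA from many different species. Two findings: in every species the amount of adenine equals the amount of thymine and the amount of guanine equals the amount of cytosine; and the proportions of the four bases differ between species rather than sitting at 25% each.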
Watson and Crick used Chargaff's rule directly: if A=T and G=C across all species, the simplest explanation was that the molecule contains paired strands with A always opposite T and G always opposite C. The double helix fell out of that constraint.
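The link between paired strands and Chargaff's pattern can be demonstrated with a toy duplex. In this sketch the sequence is made up and `duplex_composition` is a hypothetical helper: it builds the complementary strand, pools the bases from both strands, and reports each base as a percentage.

```python
from collections import Counter

DNA_PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def duplex_composition(strand: str) -> dict:
    """Percentage of each base in a double-stranded molecule built from one strand."""
    partner = "".join(DNA_PAIRS[base] for base in strand)
    counts = Counter(strand + partner)
    total = len(strand) * 2
    return {base: 100 * counts[base] / total for base in "ATGC"}


composition = duplex_composition("AATATGAATC")
print(composition)
# {'A': 40.0, 'T': 40.0, 'G': 10.0, 'C': 10.0}
```

Whatever sequence is used, pairing forces A to equal T and G to equal C, while the split between the two pairs is free to vary. That matches both of Chargaff's observations and contradicts a fixed 25% for every base.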
Induction is the move from specific observations to a general hypothesis (Levene's tetranucleotide idea was inductive). The problem: induction can never prove a hypothesis — there's always a next observation that might falsify it. Falsification (Karl Popper's principle) reverses the logic: hypotheses are accepted as long as they survive attempts to break them. A single counter-example can disprove a hypothesis decisively. Chargaff's data did exactly that to the tetranucleotide hypothesis.
“What makes RNA more likely to have been the first genetic material, rather than DNA?”
“How can polymerization result in emergent properties?”