We hear a lot about how similar the human genome is to the chimpanzee genome. As I have discussed previously, if we compare the genomes one way, they are 72% identical. If we compare them another way, they are more than 95% identical. If we compare them yet another way, they are 88-89% identical. That’s a wide range of results! Why can’t we say definitively how similar the human genome is to the chimpanzee genome? There are probably several reasons for this, but I want to highlight a basic one. Even though the human and chimpanzee genomes have been sequenced, we still don’t know them as well as you might think.
To understand why we don’t know these sequenced genomes very well, you need to know a bit about how DNA stores information. As most people know, DNA is a double helix. Each strand of this double helix has a sequence of chemical units called nucleotide bases. There are four different nucleotide bases: adenine (A), thymine (T), cytosine (C), and guanine (G). Read three at a time, these nucleotide bases code for chemicals called amino acids, with each group of three bases specifying one particular amino acid. The two strands of the double helix hold together because the nucleotide bases on one strand link up with the nucleotide bases on the other strand.
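For readers who like to see ideas in code, here is a minimal Python sketch of what "three at a time" means. The function names and the example sequence are made up for illustration, and the lookup table holds only five of the 64 three-base groups (usually called codons) in the standard genetic code, so treat it as a toy rather than a real translation tool.

```python
# Partial lookup table: just five of the 64 three-base groups (codons)
# in the standard genetic code, chosen for illustration.
CODON_TABLE = {
    "ATG": "methionine",
    "TGG": "tryptophan",
    "TTT": "phenylalanine",
    "GGC": "glycine",
    "AAA": "lysine",
}


def read_in_threes(sequence: str) -> list[str]:
    """Split a DNA sequence into consecutive groups of three bases.

    Any one or two leftover bases at the end are ignored.
    """
    usable = len(sequence) - len(sequence) % 3
    return [sequence[i:i + 3] for i in range(0, usable, 3)]


def amino_acids(sequence: str) -> list[str]:
    """Look up each group of three bases in the partial table above."""
    return [
        CODON_TABLE.get(codon, "not in this partial table")
        for codon in read_in_threes(sequence)
    ]


example = "ATGTTTGGCAAATGG"  # made-up sequence, not from any real genome
print(read_in_threes(example))
print(amino_acids(example))
```

The 15 bases in the example split into five groups of three, and each group maps to one amino acid.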
As shown in the illustration above, the way the nucleotide bases link up is very specific. Adenine (A) links only to thymine (T), and cytosine (C) links only to guanine (G). Because of this, if you know the sequence on one strand of DNA, you automatically know the sequence on the other strand. After all, A can only link to T, so anywhere one strand has an A, the other strand must have a T. In the same way, C can only link to G, so anywhere one strand has a C, the other strand must have a G. So the two strands of the DNA double helix are held together by pairs of nucleotide bases.
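The pairing rule is simple enough to write down as a short Python sketch. The name `partner_strand` and the example sequence are hypothetical, chosen only for illustration, and the sketch pairs bases position by position without modeling the fact that the two strands of real DNA run in opposite directions.

```python
# The pairing rules described above: A links to T, and C links to G.
PAIRING = {"A": "T", "T": "A", "C": "G", "G": "C"}


def partner_strand(strand: str) -> str:
    """Return, position by position, the base that sits opposite each base."""
    return "".join(PAIRING[base] for base in strand.upper())


one_strand = "GATTACAGATTACA"  # made-up example, not the sequence in the illustration
other_strand = partner_strand(one_strand)

print(one_strand)
print(other_strand)
print(len(one_strand), "base pairs")
```

Knowing one strand is enough to produce the other, and applying the rule twice returns the original strand. The number of base pairs is just the number of bases on either strand.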
As a result, we count the length of a genome in terms of how many base pairs there are. The illustration above, for example, has 14 base pairs (the black G is hiding a C behind it, and the black A is hiding a T behind it). Obviously, then, the larger the number of base pairs in the genome, the longer the genome is. Believe it or not, even though the human and chimpanzee genomes have been sequenced, we don’t know for sure how long either of them is!