Geneticists’ Bias Causes a Big Mistake

This is one way to visualize a coding section of RNA.  It has a start codon that tells the cell to start making a protein, followed by a recipe for that protein.  Then there is a stop codon, to tell the cell that it is done making the protein.

You’ve heard it many times before. The vast majority of DNA is junk. Of course, the ENCODE project showed how wrong that notion is. Now that we know the vast majority of DNA is functional, you might wonder how in the world the idea of “junk DNA” became so popular among scientists. I suspect there are many reasons, but some recent research has revealed one of them – a bias regarding what it means for DNA to be functional. The research was done on molecules called long non-coding RNAs, which are commonly referred to as lncRNAs.

What are lncRNAs? Well, let’s start with what RNA is. The genes that your body uses are in your DNA, most of which is found in the control center of the cell, called the nucleus. In order for your cells to use those genes, they must be copied into another molecule. This process is called transcription, and the molecule that carries the copy is RNA. Once the gene has been transcribed, the RNA leaves the nucleus, at which point it is often referred to as messenger RNA (mRNA), because it is carrying a message to the cell.

What’s the message? It is a recipe for building a protein. That recipe is put together in informational units called codons, and it goes to a ribosome, which is a protein-making factory in the cell. The ribosome reads the codons, translating them one-by-one into a protein. Not surprisingly, this process is called translation. How does the ribosome know when to start building the protein? There is a start codon that tells it to start. How does it know when to stop building the protein? There is a stop codon that tells it to stop. As a result, you can think of messenger RNA in terms of the illustration above – it contains a start codon, a recipe for a protein (the blue bar in the illustration), and a stop codon.
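For readers who like to see ideas as code, here is a minimal sketch of the start-read-stop logic just described. It is only an illustration, not how a ribosome (or any real genetics software) works: the function name `translate` and the example sequence are made up for this sketch, and it simply assumes that translation begins at the first AUG it finds. The codon table is the standard genetic code, with each amino acid written as a one-letter abbreviation and `*` marking a stop codon.

```python
from itertools import product

# Standard genetic code, listed in U, C, A, G order for each codon position.
# "*" marks the three stop codons (UAA, UAG, UGA).
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna):
    """Translate from the first AUG (start codon) up to the first stop codon.

    Assumption for this sketch: the first AUG in the string is the start.
    If the sequence ends before a stop codon is reached, whatever was
    translated so far is returned.
    """
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, so nothing is made
    protein = []
    # Step through the recipe three letters (one codon) at a time.
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break  # stop codon: the protein is finished
        protein.append(amino_acid)
    return "".join(protein)


# A made-up message: two extra letters, a start codon, a two-codon recipe,
# a stop codon, and two trailing letters.
print(translate("GGAUGGCUUGGUAACC"))  # prints MAW
```

The start codon itself codes for an amino acid (M, methionine), which is why the result begins with M.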

So how does this relate to lncRNAs? Well, messenger RNA is referred to as “coding RNA,” because it codes for the production of proteins. LncRNAs are called “non-coding RNAs,” because it was thought that they do not code for proteins. Now, there are lots of RNAs that are thought to be non-coding, but lncRNAs are relatively long. That’s how they get their name. Well, it turns out that for at least some lncRNAs, every part of their name (except RNA) is wrong.

In February of this year, Dr. Alexander Schier and his colleagues were looking at the development of zebrafish embryos. They analyzed the proteins that were present during the process, and they found several that had not been previously identified. One of them was a small protein that had exactly the sequence one would expect if it had been coded for by a lncRNA that has been called Toddler. In their study, they present six lines of evidence that this small protein is made by the cell using Toddler.1 In other words, even though Toddler is known as a long non-coding RNA, it actually does code for a short protein, and during the development of the zebrafish embryo, the cell translates it into that protein. (As a point of terminology, a short protein is often called a peptide.)

How common is this? Well, a more recent study has identified hundreds of short proteins (peptides) in both zebrafish and humans that are produced from what were thought to be long non-coding RNAs. How did they identify them? They identified the signs of ribosomes reading the lncRNAs. In other words, it’s not just that there are proteins which have the sequences you would expect from lncRNAs. The research team actually showed that those lncRNAs went to a ribosome and got translated!2

So…if there really are hundreds of lncRNAs that actually do code for proteins, why did geneticists call them non-coding? When I first started reading about them, I assumed that geneticists thought they were non-coding because they didn’t have the structure of messenger RNA, as shown in the illustration above. Maybe they didn’t have a start codon. Maybe they didn’t have a stop codon. I just assumed that something must be missing from them. Why else would they be called non-coding?

However, with help from a professor who has forgotten more molecular genetics than I will ever learn, I found out that lncRNAs have exactly the structure you find in messenger RNA. They have a start codon, a recipe for a short protein, and a stop codon. Why, then, were they called non-coding? Because the recipe is really short. Geneticists simply assumed that recipes for short proteins don’t get translated, so RNAs that contain only short recipes were thought to be non-coding. Now, of course, there’s more to it than that. Geneticists also tend to compare the RNAs they find in cells to known proteins, and there aren’t that many known short proteins in living organisms.
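To make the effect of that assumption concrete, here is a rough sketch of a length-based rule. It is not taken from either study, and the 100-codon cutoff, the function names, and the test sequences are all assumptions chosen purely for illustration. The point is that any rule of this shape will label an RNA “non-coding” no matter how perfect its start codon, recipe, and stop codon are, as long as the recipe is short.

```python
import re

STOP_CODONS = {"UAA", "UAG", "UGA"}


def longest_recipe_in_codons(rna):
    """Length, in codons, of the longest start-to-stop recipe in the RNA.

    The start codon is counted; the stop codon is not. Only recipes that
    actually end in a stop codon are considered.
    """
    best = 0
    # Try every AUG in the sequence as a possible start codon.
    for match in re.finditer("(?=AUG)", rna):
        start = match.start()
        for i in range(start, len(rna) - 2, 3):
            if rna[i:i + 3] in STOP_CODONS:
                best = max(best, (i - start) // 3)
                break
    return best


def classify_by_length(rna, min_codons=100):
    """Label an RNA using nothing but the length of its longest recipe.

    min_codons=100 is an illustrative cutoff assumed for this sketch.
    """
    if longest_recipe_in_codons(rna) >= min_codons:
        return "coding"
    return "non-coding"


# Two made-up RNAs with the same structure; only the recipe length differs.
short_rna = "AUG" + "GCU" * 20 + "UAA"    # 21-codon recipe
long_rna = "AUG" + "GCU" * 150 + "UAA"    # 151-codon recipe

print(classify_by_length(short_rna))  # prints non-coding
print(classify_by_length(long_rna))   # prints coding
```

Lowering the cutoff (for example, `classify_by_length(short_rna, min_codons=10)`) flips the first answer to “coding,” which shows that the label depends on the rule and not on anything the cell does with the RNA.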

So in the end, we now know that geneticists’ bias towards large proteins led to a big mistake. At least hundreds of long non-coding RNAs are, in fact, short coding RNAs! It will be interesting to see how this line of research progresses. If you look at all the RNA that is produced in animals and people, you find a bewildering number that have start codons, stop codons, and a recipe for a short protein. I suspect that as time goes on, we will find a bewildering number of short proteins that are produced by those RNAs, at least at specific stages in the life of most organisms.


1. Pauli A, Norris ML, Valen E, Chew GL, Gagnon JA, Zimmerman S, Mitchell A, Ma J, Dubrulle J, Reyon D, Tsai SQ, Joung JK, Saghatelian A, and Schier AF, “Toddler: an embryonic signal that promotes cell movement via Apelin receptors,” Science 2014, doi:10.1126/science.1248636

2. Bazzini AA, Johnstone TG, Christiano R, Mackowiak SD, Obermayer B, Fleming ES, Vejnar CE, Lee MT, Rajewsky N, Walther TC, and Giraldez AJ, “Identification of small ORFs in vertebrates using ribosome footprinting and evolutionary conservation,” EMBO Journal, 33(9):981-93, 2014


  1. Rhonda August 2, 2014 4:27 pm

    Hello Dr. Wile, this is off topic, but I didn’t know how to get a hold of you to ask. My daughter is going to be studying anatomy and physiology this year. I noticed that your name is no longer listed on the second edition books. I’ve done a little research and learned that you left Apologia and that since your name no longer appears they must have changed the book. My daughter loved your chemistry book! It was so clear yet very in-depth, and it was also fun and conversational. So enough background. My question, I guess, is your honest opinion of the new book. How is it the same or different? Is it still a good self-teaching book? Is it as engaging? Thank you!

  2. Rhonda August 3, 2014 12:50 am

    Thank you so much for the quick answer! And thank you for helping my daughter love Chemistry.

  3. Kendall August 3, 2014 4:19 pm

    Hi Rhonda. I’d just like to recommend to you what I did. I bought a used version of the human anatomy course from It’s an older version, but has all the basics. If you want Dr. Wile’s version, just go for the used copy! I get a lot of my textbooks used. It’s much cheaper that way, too!

  4. Joe G August 6, 2014 2:08 pm

    Good day- This is off topic but related to another older post of yours pertaining to human/chimp genome similarity. I was told that your article and reference 6 (to the 70%) are now known to be wrong. I was referred to the following site: Ensemble Genome browser pan-trog

    Any thoughts?

    To me it doesn’t matter because we are not our genomes, meaning genomes do not determine the type of organism, they help realize that form and influence its development, but not determine it.

    • jlwile August 6, 2014 6:35 pm

      Thanks for your comment, Joe. I agree that we are not our genomes. However, it is important to know the difference between our genome and that of our supposed closest living relative, because that should at least tell us something about the number of mutations necessary to get from our supposed common ancestor to the genomes we see today. If the difference really is on the order of 30%, then the mutation rate must have been enormous to produce that difference in “only” 6-8 million years.

      The link you gave me is for a tiny section of one chromosome (less than 1 million base pairs on chromosome 17). This, of course, doesn’t tell us anything about the differences between the entire chimp and human genomes! It doesn’t even tell us much about that single chromosome, which has 81 million base pairs. This is a very important thing that you have to watch for when people talk about comparing human and chimp DNA. Sometimes, they compare only the protein-coding genes between the two. Sometimes, they only compare tiny regions of the chromosomes (as your link does). The only studies I have seen on whole-genome comparison indicate the similarity is only about 70%, as the initial sequence of the chimpanzee genome indicated, as the calculation by Richard Buggs suggested, and as the Tomkins study suggested.

  5. Joe G August 7, 2014 8:24 am

    Thank you for your response. I thought the same thing about that reference, but I figured I would run it by you. That said, it would be nice of the evolutionists to tell us how many mutations it took to get from a knuckle-walker/quadruped to an upright biped so we have something to measure. It would also be nice to know what genes were affected so we have something to test.

    Back to the topic! Yes, evolutionary biology has caused us to overlook many of life’s little secrets- all of those micro-RNAs, for example. And it is very telling that the closer we look, the more levels of information we observe. However, I did not know those short RNAs were also translated via the ribosome. Great stuff- thank you again.