As I briefly mentioned in a previous post, miRBase 17 introduced two conceptual changes to the miRNA nomenclature scheme, which deserve further explanation.
The name of a miRNA contains some human-readable information. If you stop reading this post halfway, you’ll probably think this is a good thing, and of course it is, as long as we recognise its limitations. Read to the end and hopefully you’ll see that names can also create problems.
Take, for example, hsa-mir-20b. The “hsa” tells us it is a human miRNA. The “20” tells us that it was discovered early: it was only the 20th family to be named. “20b” tells us that it is related to another miRNA, which we can guess is called hsa-mir-20a. We can go further: the (lack of) capitalisation of “mir” tells us we’re talking about the miRNA precursor. Or maybe the genomic locus, or the primary transcript, or the extended hairpin that includes the precursor. So that’s already less useful.
hsa-mir-20b has two mature products, named hsa-miR-20b and hsa-miR-20b* (at the time of writing; as you’ll see below, this will change). “miR” tells us we’re talking about a mature sequence. In this case miR-20b arises from the 5′ arm of the mir-20b hairpin, and miR-20b* arises from the 3′ arm. The “*” tells us that miR-20b* is considered the “minor” product, meaning it is found in the cell at lower concentration than miR-20b. It is often inferred that miR-20b* is non-functional, and you’ve probably noticed that miR* sequences in general magically disappear from most diagrams of miRNA biogenesis, while the dominant arm is magically loaded into the RISC complex.
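To make this concrete, here is a minimal Python sketch of how much a name like hsa-miR-20b* actually encodes. The regular expression and the `parse_mirna_name` function are hypothetical illustrations, not a miRBase tool, and they assume a simplified grammar (a three- or four-letter species prefix, `mir` or `miR`, a number, an optional letter, then an optional `-5p`/`-3p` or `*`). Real names have further variants, such as numbered suffixes for distinct loci, which this sketch ignores.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Simplified, hypothetical grammar: species-mir|miR-number[letter][-5p|-3p|*]
_NAME_RE = re.compile(
    r"^(?P<species>[a-z]{3,4})-"
    r"(?P<kind>mir|miR)-"
    r"(?P<number>\d+)"
    r"(?P<letter>[a-z]?)"
    r"(?:(?P<arm>-[53]p)|(?P<star>\*))?$"
)


@dataclass
class MirnaName:
    species: str          # e.g. "hsa" for human
    is_mature: bool       # "miR" (mature) versus "mir" (precursor/locus)
    number: int           # family number, e.g. 20
    letter: str           # related-sequence suffix, e.g. "b" (may be "")
    arm: Optional[str]    # "5p" or "3p" if the name states it, else None
    is_star: bool         # True for the old "minor product" marker


def parse_mirna_name(name: str) -> MirnaName:
    """Split a miRNA name into the fields the simplified grammar encodes."""
    m = _NAME_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognised miRNA name: {name!r}")
    arm = m.group("arm")
    return MirnaName(
        species=m.group("species"),
        is_mature=(m.group("kind") == "miR"),
        number=int(m.group("number")),
        letter=m.group("letter"),
        arm=arm[1:] if arm else None,  # strip the leading "-"
        is_star=m.group("star") is not None,
    )
```

Note that `parse_mirna_name("hsa-miR-20b*")` returns `arm=None`: the star says the product is minor, but not which arm it comes from. The retired `-as` suffix (for example dme-mir-307-as) is not accepted by this pattern and raises `ValueError`.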
But hang on a minute: a number of papers now tell us that miR* sequences can be functional (e.g. Yang et al. 2011), perhaps through binding different Argonaute proteins (a glut of papers in the past couple of years, nicely reviewed by Czech and Hannon, 2011). And, of course, the miR* sequence from one hairpin might be expressed at a level orders of magnitude higher than the dominant miR sequence from another hairpin. The arm that produces the dominant product may even change between tissues, developmental stages and species (G-J et al. 2011). Should we rename miR and miR* sequences every time someone produces an ever deeper sequencing dataset? To cap it all, the “*” character causes problems for database searches and the like, since it is widely interpreted as a wildcard.
We therefore intend to retire the miR/miR* nomenclature in favour of the -5p/-3p nomenclature, which has so far been used in parallel for mature products of approximately equal expression and will in future be applied to all sequences. We will make this transition in phases, as and when we can make companion data available showing the expression of the mature products from each arm. In miRBase 17, all Drosophila melanogaster mature sequences have been renamed with -5p/-3p suffixes, and many previously missing second mature products have been added. The available deep sequencing data make clear which of the potential mature products is dominant. Other species will follow in due course.
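The renaming itself is mechanical once you know which arm each mature product comes from; the key point is that this information must come from the hairpin annotation, not from the name. A minimal sketch, assuming a hypothetical `to_arm_name` helper whose `arm` argument is supplied from such external annotation:

```python
def to_arm_name(mature_name: str, arm: str) -> str:
    """Convert an old-style mature name ("xxx-miR-N" or "xxx-miR-N*")
    to the -5p/-3p style.

    `arm` ("5p" or "3p") is a hypothetical input taken from an annotation
    of where the mature sequence sits on the hairpin; it cannot be derived
    from the name itself.
    """
    if arm not in ("5p", "3p"):
        raise ValueError(f"arm must be '5p' or '3p', got {arm!r}")
    # Drop the star: it only meant "minor product", not a particular arm.
    base = mature_name[:-1] if mature_name.endswith("*") else mature_name
    # Names already in the new style are returned unchanged if consistent.
    for suffix in ("-5p", "-3p"):
        if base.endswith(suffix):
            if suffix[1:] != arm:
                raise ValueError(f"{mature_name!r} already names the {suffix[1:]} arm")
            return base
    return f"{base}-{arm}"
```

Using the hsa-mir-20b example above, where miR-20b comes from the 5′ arm and miR-20b* from the 3′ arm, `to_arm_name("hsa-miR-20b", "5p")` gives `hsa-miR-20b-5p` and `to_arm_name("hsa-miR-20b*", "3p")` gives `hsa-miR-20b-3p`.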
The second change in miRBase 17 concerns the small number of pairs of miRNA sequences that are transcribed from the same locus in opposite directions, that is, sense/antisense pairs. For example, the dme-mir-307 locus has been shown to be transcribed in both directions, and both transcripts are processed to produce mature miRNAs. These miRNAs were previously named dme-mir-307 and dme-mir-307-as in miRBase. The -as suffix is confusing, because it is similar to the suffixes used to denote families of related miRNAs. Moreover, the decision about which strand is sense and which is antisense is arbitrary. To confuse matters further, -as and -s were used in the early miRNA literature to refer to mature products from the 5′ and 3′ arms of a hairpin precursor. From miRBase 17 onwards, the -as nomenclature is retired. Sense and antisense miRNAs will be named independently and in the same way as all other sequences: if the sequences are similar, they get letter suffixes (e.g. dme-mir-307a and dme-mir-307b); if they are not deemed similar enough, they get different numbers (e.g. rno-mir-151 and rno-mir-3586).
The combined result of these changes is that the name of a miRNA contains less information than before. This may seem like a retrograde step. However, the problem with encoding information in the name is that people are tempted to rely on it. MicroRNA names are often pragmatic compromises, and have been overloaded with relatively complex meaning, for example about family relationships and expression levels. Names should be useful, but should never be used in place of the correct analysis, for example of sequence relationships or expression. We therefore suggest that your miRNA life will be easier if you bear in mind two simple principles:
1. Be explicit. If you are referring to the mature miR-20b sequence, you could rely on the capitalisation in miR-20b to say that for you. But it is much better to say “the mature miR-20b sequence”. Even better, show the sequence along with the name: names are not formally stable, but quoting the specific sequence you used in your paper will keep the entity traceable forever.
2. Never use the name to encode or derive complex meaning. If you are interested in sequence relationships, do some sequence analysis. If you care about the expression levels of alternative mature miRNAs, look at expression data. If you derive all your information about miRNA sequence relationships from the name, you will miss a great deal. If you rely on the name to tell you about relative expression, all hope is lost.
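In the same spirit, here is a minimal sketch of what “look at expression data” might mean for the question of which arm is dominant. The `dominant_arm` function, its default `min_fold` threshold of 2 and the read counts in the usage note are all hypothetical, chosen purely for illustration; a real analysis would also need to consider normalisation and replicates.

```python
from typing import Mapping, Optional


def dominant_arm(read_counts: Mapping[str, int], min_fold: float = 2.0) -> Optional[str]:
    """Return "5p" or "3p" if that arm has at least `min_fold` times the
    reads of the other arm, otherwise None (no clearly dominant arm).

    `min_fold` is an arbitrary, hypothetical cut-off for illustration.
    """
    five = read_counts.get("5p", 0)
    three = read_counts.get("3p", 0)
    # Treat a zero count on the other arm as 1, so that a single read
    # does not count as dominant and 0 versus 0 returns None.
    if five >= min_fold * max(three, 1):
        return "5p"
    if three >= min_fold * max(five, 1):
        return "3p"
    return None
```

With made-up counts for one hairpin in two tissues, `dominant_arm({"5p": 900, "3p": 100})` returns `"5p"` while `dominant_arm({"5p": 50, "3p": 400})` returns `"3p"`. The answer depends on the dataset, which is exactly why it does not belong in the name.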