What’s in a name?

As I briefly mentioned in a previous post, miRBase 17 included two conceptual changes in the miRNA nomenclature scheme, which deserve further detail and clarification.

The name of a miRNA contains some human-readable information. If you stop reading this post halfway, you’ll likely think this is a good thing. Which of course it is, as long as we recognise the limitations. Hold on to the end and hopefully you’ll see that names can create some issues.

Take, for example, hsa-mir-20b. The “hsa” tells us it is a human miRNA. The “20” tells us that it was discovered early: it’s only the 20th family that was named. “20b” tells us that it is related to another miRNA that we can guess is probably called hsa-mir-20a. We can go further: the (lack of) capitalisation of “mir” tells us we’re talking about the miRNA precursor. Or maybe the genomic locus, or maybe the primary transcript, or maybe the extended hairpin that includes the precursor. So that’s already less useful.

hsa-mir-20b has two mature products, named hsa-miR-20b and hsa-miR-20b* (as of this moment — as you’ll see below, this will change). “miR” tells us we’re talking about a mature sequence. In this case miR-20b arises from the 5′ arm of the mir-20b hairpin, and miR-20b* arises from the 3′ arm. The “*” tells us that miR-20b* is considered a “minor” product. That means miR-20b* is found in the cell at lower concentration than miR-20b. It is often inferred that miR-20b* is non-functional, and you’ve probably noticed that miR* sequences in general magically disappear in most pictures of miRNA biogenesis, while the dominant arm is magically incorporated into the RISC complex.
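To make the pieces of a name concrete, here is a minimal Python sketch that splits a name into the parts described above. The regular expression and the function name `parse_mirna_name` are assumptions made for illustration, not an official miRBase grammar. It handles only the simple forms used in this post (it will reject names such as let-7 or ath-miR161.1), and it tells you nothing about sequence or expression.

```python
import re

# Assumed pattern, covering only names shaped like the examples in this post.
NAME_RE = re.compile(
    r"^(?P<species>[a-z]{3,4})-"   # e.g. "hsa" for human
    r"(?P<kind>mir|miR)-"          # "mir" = precursor/locus, "miR" = mature
    r"(?P<number>\d+)"             # number assigned to the family
    r"(?P<letter>[a-z]*)"          # a, b, ... for similar sequences
    r"(?:-(?P<locus>\d+))?"        # -1, -2, ... for distinct loci
    r"(?:-(?P<arm>[35]p)|(?P<star>\*))?$"  # arm suffix or legacy "*"
)


def parse_mirna_name(name: str) -> dict:
    """Split a miRNA name into labelled parts (hypothetical helper)."""
    match = NAME_RE.match(name)
    if match is None:
        raise ValueError(f"name not covered by this sketch: {name!r}")
    parts = match.groupdict()
    parts["is_mature"] = parts["kind"] == "miR"
    parts["star"] = parts["star"] is not None
    return parts


for example in ("hsa-mir-20b", "hsa-miR-20b*"):
    print(example, parse_mirna_name(example))
```

Note what the parser cannot tell you: whether “mir” means the precursor, the locus or the primary transcript, or how abundant either mature product really is.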

But hang on a minute, a bunch of papers now tell us that miR* sequences can be functional (eg Yang et al. 2011), perhaps through binding different Argonaute proteins (a glut of papers in the past couple of years nicely reviewed by Czech and Hannon, 2011). And, of course, the miR* sequence from one hairpin might be expressed at orders of magnitude higher level than the dominant miR sequence from another hairpin. Perhaps the arm that makes the dominant product can change in different tissues, stages and species (G-J et al. 2011). Should we rename miR and miR* sequences every time someone produces an ever deeper sequencing dataset? To cap it all, the “*” character causes problems for database searches and the like.
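As a small illustration of the search problem, here is a sketch using Python’s standard `re` module. The list of names is made up for the example (it is not a claim about what miRBase contains). Used directly as a pattern, the “*” in the query is read as “zero or more of the preceding character” rather than as a literal star.

```python
import re

# Illustrative strings only; not a statement about miRBase contents.
names = ["hsa-miR-20", "hsa-miR-20b", "hsa-miR-20b*"]
query = "hsa-miR-20b*"

# Naive: the query is used as a pattern, so "b*" means "zero or more b".
naive_hits = [n for n in names if re.fullmatch(query, n)]

# Safer: escape the query so that "*" is matched literally.
literal_hits = [n for n in names if re.fullmatch(re.escape(query), n)]

print(naive_hits)    # ['hsa-miR-20', 'hsa-miR-20b']
print(literal_hits)  # ['hsa-miR-20b*']
```

The naive search returns everything except the name that was asked for, which is the sort of silent failure that makes the character so troublesome.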

We therefore intend to retire the miR/miR* nomenclature, in favour of the -5p/-3p nomenclature (the latter has been used in parallel for mature products of approximately equal expression, and will in future be applied to all sequences). We will make this transition in phases, as we can make companion data available to show the expression of mature products from each arm. In miRBase 17, all Drosophila melanogaster mature sequences are renamed as -5p/-3p, and many previously missing second mature products have been added. The available deep sequencing data makes clear which of the potential mature products is dominant. Other species will follow suit in due course.
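For those who want to update their own lists, the renaming itself is mechanical once you know which arm each mature product comes from. The sketch below assumes you supply the arm yourself (from the miRBase entry, for example); the function name `to_arm_name` is hypothetical, and this is not the script used to update miRBase.

```python
def to_arm_name(name: str, arm: str) -> str:
    """Convert a miR/miR* style mature name to the -5p/-3p style.

    `arm` must be supplied by the caller; it cannot be derived from
    the presence or absence of "*" alone.
    """
    if arm not in ("5p", "3p"):
        raise ValueError(f"arm must be '5p' or '3p', got {arm!r}")
    base = name.rstrip("*")           # drop the legacy star, if any
    if base.endswith(("-5p", "-3p")):
        return base                   # already in the new style
    return f"{base}-{arm}"


# The example from this post: miR-20b is from the 5' arm, miR-20b* from the 3' arm.
print(to_arm_name("hsa-miR-20b", "5p"))   # hsa-miR-20b-5p
print(to_arm_name("hsa-miR-20b*", "3p"))  # hsa-miR-20b-3p
```

The arm is a required argument because the star records only relative abundance: a starred sequence can come from either arm.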

The second change in miRBase 17 concerns the small number of pairs of miRNA sequences that are transcribed from the same locus in opposite directions — that is, sense/antisense pairs. For example, the dme-mir-307 locus has been shown to be transcribed in both directions, and both transcripts are processed to produce mature miRNAs. These miRNAs were previously named dme-mir-307 and dme-mir-307-as in miRBase. The -as is confusing, because it is similar to the suffixes used to denote families of related miRNAs. The classification of sense and antisense is arbitrary. To confuse matters further, -as and -s were used in early miRNA literature to refer to mature products produced from the 5′ and 3′ arms of a hairpin precursor. From miRBase 17 onwards, the -as nomenclature is retired. Sense and antisense miRNAs will be named independently and in the same way as all other sequences: If the sequences are similar then they get a, b suffixes (eg dme-mir-307a and dme-mir-307b), and if they are not deemed similar enough then they get different numbers (eg rno-mir-151 and rno-mir-3586).

The combined result of these changes is that the name of a miRNA contains less information than previously. This may seem like a retrograde step. However, the problem with encoding information in the name is that people are tempted to use it. MicroRNA names are often pragmatic compromises, and have been overloaded with relatively complex meaning, for example, regarding family relationships and expression levels. Names should be useful, but should never be used in place of the correct analysis, for example, of sequence relationships or expression. We therefore suggest that you’ll find your miRNA life easier if you bear in mind some simple concepts:

1. Be explicit. If you are referring to the mature miR-20b sequence, you could rely on the capitalisation in miR-20b to say that for you. But it is much better to say “the mature miR-20b sequence”. Even better, show the sequence along with the name; names are not formally stable, but quoting the specific sequence you’ve used in your paper will ensure the entity is traceable forever.

2. Never use the name to encode or derive complex meaning. If you are interested in sequence relationships, you should do some sequence analysis. If you care about expression levels of alternate mature miRNAs, look at expression data. If you derive all your information about miRNA sequence relationships from the name, you will miss a great deal. If you rely on the name to tell you about relative expression then all hope is lost.

Posted in nomenclature.


27 Responses

  1. Christoph says

    Sam,
    Thanks for the great post.
    “What does the “*” mean?” is probably the question I hear the most.
    Now I know where to send them :-)
    Christoph

  2. JianRong says

    Thank you so much for the “official” clarification. The “*”, “-3p/-5p” and “-as” issues did cause some trouble.

    However, are there any plans about the names of miRNA-offset RNAs?

    jianrong

    • sam says

      Yep, good question. The offset miRNAs are also in our sights. Although, to be honest, I don’t yet have a clear view of the best solution. Currently, miRBase annotates only a very small number of offset miRNAs (sometimes called moRs), and they have names of the form ath-miR161.1 and ath-miR161.2. One option is to use this scheme but roll it out more widely. I don’t like that nomenclature much, because dot-number is used in many sequence databases to indicate versions of sequences. I think it makes sense for us to do the same eventually. Another possible solution is to largely ignore offset sequences. We can dynamically annotate the most abundant mature miRNA from each precursor arm (by automatic analysis of deep seq datasets, for example), and treat offset sequences in the same way that isomiRs are treated — that is, leaving the user to see these data for themselves in the deep seq views. As I say, the correct solution is not yet clear to us, but we’ll be thinking about it, and comments and suggestions are welcome.

  3. Bisrat says

    That was a great clarification. Thanks a lot!

  4. B Smith says

    “*” causes havoc in Excel. Glad to see you go!

  5. Li R.S says

    Thanks a lot for retiring “-as”!
    But in 17.0, I can still see newly added mature sequences named with “*”, for example ssc-miR-545* and hsa-miR-4524*.
    So, only the mature sequences of dme were converted?

    I hate “*” much more than “-as”, and not only because it is confusing.
    It raises errors in most languages and regular expressions, and I always have to replace it first…
    I really want to see it vanish in the next release.
    Best regards!

    • sam says

      In release 17 it was only dme sequences that were converted. (Tribolium data, as it came from our lab, was also added with exclusively -5p, -3p tags.) The rest will follow as soon as pragmatically possible. Model organisms with lots of deep sequencing data can be changed quickly (next release). Some others may take more time, as we want to replace the information that is (foolishly) encoded in the miR/miR* names, for example with sequencing read counts, rather than lose it completely.

      • Li R.S says

        Glad to hear that expression information for miR and miR* can be maintained in some other way…
        Will the miRNA name become something like miR-1-5P-X…?
        er….

  6. sam says

    All mature miRNA names will switch to the form miR-1-5p. What does the -X mean in your example?

    • Li R.S says

      I mean, for example, add an “*”after 5p, or add the copy number directly after “5p”, then we know which form dominates~~
      er…just a joke…

  7. Anelda says

    This is a great review. My biggest question now is actually “What is the definition of a microRNA?”. If you look at reviews written over the last few years, the definition is very flexible, and length has been described as anywhere between 17 and 28 nt (mostly 20-22 nt though). Is it also possible to write a post on exactly what the criteria are for a sequence to end up in miRBase as a microRNA? That would be awesome.

    • sam says

      I don’t think it is the definition of a microRNA exactly that is the issue here — microRNAs are well defined by their biogenesis and mode of action. Rather the problem is how to tell the difference between microRNA sequences and other short or fragmentary RNA sequences. This is the subject of some discussion in the literature, and we tried to sum up the current state in the last miRBase paper:

      miRBase: integrating microRNA annotation and deep-sequencing data.
      Kozomara A, Griffiths-Jones S.
      NAR 2011 39:D152-D157

      I’ll try and write something about this here as well, as it’s clearly the key issue right now, particularly with deep seq datasets.

  8. J.Clancy says

    Looking forward to the end of stars myself. Writing a miRNA profiling paper with some stars and some -5p/-3p and it really doesn’t make much sense (to me or the intended audience). Would be nice to keep a history of the miRNA in there somewhere so us oldies in the field can see which one used to be the star.
    Also, I think the deep sequencing data section of miRBase is great, but I foresee it getting very crowded very soon. Good luck with the curation of that!!

  9. subhashree says

    thats a very very useful post!!! thanks..

  10. George Murphy says

    Sam,

    Thanks very much for your helpful information. I have a couple questions re:naming. I’ve always assumed naming miRNAs was originated by the discovering investigator. Is that true? If so, how do we keep two investigators from naming different miRNAs by the same number? So, who controls the orderly naming?

    Similarly, while family members have similar sequences, how is it decided and a consensus reached that they are indeed part of a “family”?

    Thanks again for your help,
    George

    • sam says

      Names are coordinated and assigned by us. Currently, the correct procedure is to submit sequences to us immediately on acceptance of a manuscript for publication. Then we can assign official names quickly, so that they can be included in the published article. The submitting authors therefore usually submit a paper with preliminary names. Many journal editors are familiar with this procedure, and it mostly works pretty well.

      Unfortunately there isn’t a single definition of a family that is applied uniformly. Names are sometimes a compromise, hence the warning in this post about reading too much into a name. We err towards assigning families based on sequence similarity across the whole hairpin precursor, which is suggestive of a common ancestor (in an analogous way to how gene/protein families are usually defined), rather than relying on similarity in the miRNA seed region (which might arise independently). We get lots of queries about this, so I’ll write a more detailed post some day.

  11. mycobio says

    Sam,

    How do you delimit a miRNA family?

    thank you very much!

  12. Bastian says

    Thank you very much! The miRNA family delimitation question is also very interesting!

  13. Jitendra says

    hey sam,
    I am confused about the difference between miR-21, hsa-miR-21 and rno-miR-21.
    In some papers “miR-21” is used for human experiments, and in the same paper miR-21 is used for rats. So is miR-21 in human hsa-miR-21, and in rat rno-miR-21?
    Or is miR-21 itself a product, like a synthetic microRNA that we can use for any mammalian species?

    • sam says

      rno-miR-21 and hsa-miR-21 are endogenous rat and human miRNAs, and do not necessarily have identical sequences (although, as it happens, in this case, they do!).

  14. Weixin says

    Hi Sam,
    In miR-548q, what does “q” stand for? Thanks.

    • Weixin says

      Hi Sam,
      What is the meaning of “miR-548aj-1”?
      Thanks.

    • sam says

      The lettered and numbered suffixes refer to different loci that produce identical (-1, -2 etc), or similar (a, b etc) mature miRNA sequences. There’s plenty written about this in previous miRBase NAR papers, and elsewhere.

  15. Sarah says

    Great article. I think it would be really useful for the entry page of each miRNA gene to list the previous names of the mature miRNAs, in the same way that HUGO gene names are accompanied by sometimes antiquated synonyms. Even better would be a downloadable list of all previous mature miRNA names and their up-to-date equivalents; that way it would be extremely easy to update a batch of results from an old study in the literature, or to compare experiments performed using different releases of miRBase. I know it is possible to do this with the dead and diff files, but it is very fiddly, especially with 18 releases to contend with and many, many species. Finally, I think it would be useful to have an entry showing in which miRBase release the gene or additional mature miRNAs were added. I think these features would add to an otherwise comprehensive and workable database.

    • sam says

      Yep, these things have been on the list for a while. We’ll try and implement them soon.

  16. Sarkawt says

    Dear Sam I have got 2 questions:
    1) The number of some miRNAs is larger than the number of currently discovered miRNAs. For example: hsa-miR-4524*. The number of miRNAs discovered so far is less than 4524. What does this large number refer to?

    2) What miRNAs can be used as housekeeping miRNAs in miRNA experiments on human breast FFPE tissues?

    Many thanks

    • sam says

      1. The numbers are not incremented in a species-specific way. The highest number in miRBase 18 is mir-5710, which means that we have assigned numbers to around 5710 families across all species.

      2. I’m afraid I don’t know. I guess you want to find a miRNA that is expressed at around the same level in all tissues. I have a vague recollection of seeing a miRNA with a number in the 20s used as a control for expression studies, but I may have dreamt that.


