
What’s in a name?

As I briefly mentioned in a previous post, miRBase 17 included two conceptual changes in the miRNA nomenclature scheme, which deserve further detail and clarification.

The name of a miRNA contains some human-readable information. If you stop reading this post halfway, you’ll likely think this is a good thing. Which of course it is, as long as we recognise the limitations. Hold on to the end and hopefully you’ll see that names can create some issues.

Take, for example, hsa-mir-20b. The “hsa” tells us it is a human miRNA. The “20” tells us that it was discovered early: it’s only the 20th family that was named. “20b” tells us that it is related to another miRNA, which we can guess is probably called hsa-mir-20a. We can go further: the (lack of) capitalisation of “mir” tells us we’re talking about the miRNA precursor. Or maybe the genomic locus, or maybe the primary transcript, or maybe the extended hairpin that includes the precursor. So that’s already less useful.

hsa-mir-20b has two mature products, named hsa-miR-20b and hsa-miR-20b* (as of this moment — as you’ll see below, this will change). “miR” tells us we’re talking about a mature sequence. In this case miR-20b arises from the 5′ arm of the mir-20b hairpin, and miR-20b* arises from the 3′ arm. The “*” tells us that miR-20b* is considered a “minor” product. That means miR-20b* is found in the cell at lower concentration than miR-20b. It is often inferred that miR-20b* is non-functional, and you’ve probably noticed that miR* sequences in general magically disappear in most pictures of miRNA biogenesis, while the dominant arm is magically incorporated into the RISC complex.
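To make the point concrete, here is a minimal Python sketch that pulls apart the pieces of a name described above. The regular expression is an assumption that covers only common animal-style names (for example hsa-mir-20b, hsa-miR-20b*, hsa-miR-138-1-3p). It is not miRBase's formal naming grammar, and it ignores plant forms such as ath-MIR156a or ath-miR161.1. `parse_name` and `MiRNAName` are hypothetical helpers.

```python
import re
from typing import NamedTuple, Optional

# Illustrative pattern for common animal-style names only (an assumption,
# not miRBase's formal grammar): plant forms such as "ath-MIR156a" or
# "ath-miR161.1" are deliberately not handled.
NAME_RE = re.compile(
    r"^(?P<species>[a-z]{3,4})-"   # species prefix, e.g. "hsa"
    r"(?P<kind>mir|miR)-"          # "mir" = hairpin/locus, "miR" = mature product
    r"(?P<number>\d+)"             # family number, e.g. 20
    r"(?P<letter>[a-z]*)"          # letter(s) for related loci, e.g. "b" or "aj"
    r"(?:-(?P<locus>\d+))?"        # locus number for identical products, e.g. "-1"
    r"(?:-(?P<arm>[35]p))?"        # arm of origin, e.g. "-5p"
    r"(?P<star>\*)?$"              # legacy minor-product marker
)


class MiRNAName(NamedTuple):
    species: str
    is_mature: bool
    number: int
    letter: str
    locus: Optional[int]
    arm: Optional[str]
    star: bool


def parse_name(name: str) -> MiRNAName:
    """Split a miRNA name into the human-readable parts it encodes."""
    m = NAME_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognised miRNA name: {name!r}")
    return MiRNAName(
        species=m["species"],
        is_mature=m["kind"] == "miR",
        number=int(m["number"]),
        letter=m["letter"],
        locus=int(m["locus"]) if m["locus"] else None,
        arm=m["arm"],
        star=m["star"] is not None,
    )


print(parse_name("hsa-miR-20b*"))
# MiRNAName(species='hsa', is_mature=True, number=20, letter='b', locus=None, arm=None, star=True)
```

Note what the parser cannot recover: anything reliable about expression or function. The "star" flag only records an annotation decision made at some point in time.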

But hang on a minute: a bunch of papers now tell us that miR* sequences can be functional (eg Yang et al. 2011), perhaps through binding different Argonaute proteins (a glut of papers in the past couple of years, nicely reviewed by Czech and Hannon, 2011). And, of course, the miR* sequence from one hairpin might be expressed at a level orders of magnitude higher than the dominant miR sequence from another hairpin. Perhaps the arm that makes the dominant product can change in different tissues, stages and species (G-J et al. 2011). Should we rename miR and miR* sequences every time someone produces an ever deeper sequencing dataset? To cap it all, the “*” character causes problems for database searches and the like.

We therefore intend to retire the miR/miR* nomenclature, in favour of the -5p/-3p nomenclature (the latter has been used in parallel for mature products of approximately equal expression, and will in future be applied to all sequences). We will make this transition in phases, as we can make companion data available to show the expression of mature products from each arm. In miRBase 17, all Drosophila melanogaster mature sequences are renamed as -5p/-3p, and many previously missing second mature products have been added. The available deep sequencing data makes clear which of the potential mature products is dominant. Other species will follow suit in due course.
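As a rough illustration of the transition, the sketch below derives a -5p/-3p name from a hairpin name and the position of a mature product within the hairpin. Calling the arm from whether the product's midpoint lies in the first or second half of the hairpin is a crude stand-in assumed here for illustration. A more careful version would locate the terminal loop in the folded hairpin. `arm_of` and `mature_name` are hypothetical helpers, and coordinates are 0-based.

```python
def arm_of(mature_start: int, mature_len: int, hairpin_len: int) -> str:
    """Guess the hairpin arm of a mature product (0-based start).

    Crude heuristic: a product whose midpoint falls in the first half
    of the hairpin is called 5p, otherwise 3p.
    """
    midpoint = mature_start + mature_len / 2
    return "5p" if midpoint < hairpin_len / 2 else "3p"


def mature_name(hairpin_name: str, arm: str) -> str:
    """Turn a hairpin name like 'dme-mir-307a' into 'dme-miR-307a-5p'."""
    species, kind, rest = hairpin_name.split("-", 2)
    if kind != "mir" or arm not in ("5p", "3p"):
        raise ValueError(f"unexpected input: {hairpin_name!r}, {arm!r}")
    return f"{species}-miR-{rest}-{arm}"


# A hypothetical 90-nt hairpin with 22-nt products starting at positions 5 and 60.
for start in (5, 60):
    print(mature_name("dme-mir-307a", arm_of(start, 22, 90)))
# dme-miR-307a-5p
# dme-miR-307a-3p
```

This naive version always keeps any -1/-2 locus suffix. When several loci give an identical product, miRBase drops the suffix (hsa-miR-138-5p comes from both hsa-mir-138-1 and hsa-mir-138-2), so a real renaming step would need that check too.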

The second change in miRBase 17 concerns the small number of pairs of miRNA sequences that are transcribed from the same locus in opposite directions — that is, sense/antisense pairs. For example, the dme-mir-307 locus has been shown to be transcribed in both directions, and both transcripts are processed to produce mature miRNAs. These miRNAs were previously named dme-mir-307 and dme-mir-307-as in miRBase. The -as is confusing, because it is similar to the suffixes used to denote families of related miRNAs. The classification of sense and antisense is arbitrary. To confuse matters further, -as and -s were used in early miRNA literature to refer to mature products produced from the 5′ and 3′ arms of a hairpin precursor. From miRBase 17 onwards, the -as nomenclature is retired. Sense and antisense miRNAs will be named independently and in the same way as all other sequences: If the sequences are similar then they get a, b suffixes (eg dme-mir-307a and dme-mir-307b), and if they are not deemed similar enough then they get different numbers (eg rno-mir-151 and rno-mir-3586).

The combined result of these changes is that the name of a miRNA contains less information than previously. This may seem like a retrograde step. However, the problem with encoding information in the name is that people are tempted to use it. MicroRNA names are often pragmatic compromises, and have been overloaded with relatively complex meaning, for example, regarding family relationships and expression levels. Names should be useful, but should never be used in place of the correct analysis, for example, of sequence relationships or expression. We therefore suggest that you’ll find your miRNA life easier if you bear in mind some simple concepts:

1. Be explicit. If you are referring to the mature miR-20b sequence, you could rely on the capitalisation in miR-20b to say that for you. But it is much better to say “the mature miR-20b sequence”. Even better, show the sequence along with the name; names are not formally stable, but quoting the specific sequence you’ve used in your paper will ensure the entity is traceable forever.

2. Never use the name to encode or derive complex meaning. If you are interested in sequence relationships, you should do some sequence analysis. If you care about expression levels of alternate mature miRNAs, look at expression data. If you derive all your information about miRNA sequence relationships from the name, you will miss a great deal. If you rely on the name to tell you about relative expression then all hope is lost.
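Quoting sequences means an entity can be tracked across releases by sequence rather than by name. Below is a minimal sketch, assuming you already have `{name: sequence}` dictionaries for two releases (for example, read from the FASTA files on the FTP site). `map_by_sequence` is a hypothetical helper, and the names and sequences in the usage lines are invented for illustration.

```python
def map_by_sequence(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Map each old name to every new name that carries exactly the same sequence.

    An empty list means the sequence itself changed (or was removed), so the
    entry needs a manual look rather than a silent rename.
    """
    names_by_seq: dict[str, list[str]] = {}
    for name, seq in new.items():
        names_by_seq.setdefault(seq.upper(), []).append(name)
    return {name: names_by_seq.get(seq.upper(), []) for name, seq in old.items()}


# Invented example: "xyz" is a made-up species prefix.
old_release = {"xyz-miR-1": "UGGAAUGUAAAGAAGUAUGUAU", "xyz-miR-1*": "ACAUACUUCUUUACAUUCCA"}
new_release = {"xyz-miR-1-3p": "UGGAAUGUAAAGAAGUAUGUAU", "xyz-miR-1-5p": "ACAUACUUCUUUACAUUCCAUA"}
print(map_by_sequence(old_release, new_release))
# {'xyz-miR-1': ['xyz-miR-1-3p'], 'xyz-miR-1*': []}
```

The second old name maps to nothing because its ends changed between the two invented releases. In exactly that situation, a lookup by name alone would quietly hand you a different sequence.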

Posted in nomenclature.

59 Responses


  1. Christoph says

    Thanks for the great post.
    “What does the “*” mean?” is probably the question I hear the most.
    Now I know where to send them :-)

  2. JianRong says

    Thank you so much for the “official” clarification. The “*”, “-3p/-5p” and “-as” issues did cause some trouble.

    However, are there any plans about the names of miRNA-offset RNAs?


    • sam says

      Yep, good question. The offset miRNAs are also in our sights. Although, to be honest, I don’t yet have a clear view of the best solution. Currently, miRBase annotates only a very small number of offset miRNAs (sometimes called moRs), and they have names of the form ath-miR161.1 and ath-miR161.2. One option is to use this scheme but roll it out more widely. I don’t like that nomenclature much, because dot-number is used in many sequence databases to indicate versions of sequences. I think it makes sense for us to do the same eventually. Another possible solution is to largely ignore offset sequences. We can dynamically annotate the most abundant mature miRNA from each precursor arm (by automatic analysis of deep seq datasets, for example), and treat offset sequences in the same way that isomiRs are treated — that is, leaving the user to see these data for themselves in the deep seq views. As I say, the correct solution is not yet clear to us, but we’ll be thinking about it, and comments and suggestions are welcome.
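The second option described above (dynamically annotating the most abundant product from each arm) could look something like the following sketch. It assumes reads have already been mapped to a hairpin as `(start, sequence, count)` tuples with 0-based starts. It assigns arms with a crude midpoint rule, which is an assumption made for illustration. `dominant_per_arm` is a hypothetical helper. Everything other than the winner on each arm would be treated like an isomiR or offset sequence.

```python
from collections import Counter


def dominant_per_arm(reads, hairpin_len: int) -> dict:
    """Return {arm: (start, sequence)} for the most abundant read on each arm.

    reads: iterable of (start, sequence, count), 0-based start on the hairpin.
    Identical (start, sequence) pairs are summed before choosing the winner.
    """
    totals = {"5p": Counter(), "3p": Counter()}
    for start, seq, count in reads:
        # Crude arm call: compare the read's midpoint with the hairpin's midpoint.
        arm = "5p" if start + len(seq) / 2 < hairpin_len / 2 else "3p"
        totals[arm][(start, seq)] += count
    return {arm: c.most_common(1)[0][0] for arm, c in totals.items() if c}


# Placeholder sequences on a hypothetical 90-nt hairpin.
reads = [
    (5, "A" * 22, 900), (6, "A" * 21, 100),   # 5' arm: one clear winner
    (60, "C" * 22, 40), (60, "C" * 22, 10),   # 3' arm: same read from two libraries
    (62, "C" * 21, 45),                       # 3' arm: an offset/isomiR-like read
]
print(dominant_per_arm(reads, hairpin_len=90))
```

Summing identical reads before picking the winner matters. On the 3' arm, no single row beats the 45-read variant, but the two rows for the same 22-nt read together do.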

  3. Bisrat says

    That was a great clarification. Thanks a lot!

  4. B Smith says

    “*” causes havoc in Excel. Glad to see you go!

  5. Li R.S says

    Thanks a lot for retiring “-as”!!!
    But in 17.0, I can still see newly added mature sequences named with “*”, for example ssc-miR-545* and hsa-miR-4524*.
    So, were only the dme mature sequences converted?

    I hate “*” much more than “-as”, and not only because it is confusing.
    It raises errors in most languages and regular expressions, and I always have to replace it first…
    I really want to see it vanish in the next release.
    Best regards!

    • sam says

      In release 17 it was only dme sequences that were converted. (Tribolium data, as it came from our lab, was also added with exclusively -5p, -3p tags.) The rest will follow as soon as pragmatically possible. Model organisms with lots of deep sequencing data can be changed quickly (next release). Some others may take more time, as we want to replace the information that is (foolishly) encoded in the miR/miR* names, for example with sequencing read counts, rather than lose it completely.

      • Li R.S says

        Glad to hear that the expression information distinguishing miR from miR* can be maintained some other way…
        Will the miRNA names become something like miR-1-5P-X…?
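The problem with regular expressions mentioned above is easy to reproduce. In a pattern, `*` means "repeat the previous character zero or more times", so an unescaped star-name silently matches the wrong thing. Below is a minimal Python sketch of a literal name search. `find_name` is a hypothetical helper. The boundary rule (a name must not touch a word character, `*` or `-` on either side) is an assumption chosen so that hsa-miR-20b does not match inside hsa-miR-20b* or hsa-miR-20b-5p.

```python
import re


def find_name(name: str, text: str) -> list[str]:
    """Find literal occurrences of a miRNA name in free text.

    re.escape turns '*' into a literal character. The lookarounds stop a
    name from matching as part of a longer name such as 'hsa-miR-20b-5p'.
    """
    pattern = re.compile(r"(?<![\w*-])" + re.escape(name) + r"(?![\w*-])")
    return pattern.findall(text)


text = "hsa-miR-20a, hsa-miR-20b* and hsa-miR-20b-5p"
print(re.findall("hsa-miR-20b*", text))  # unescaped: ['hsa-miR-20', 'hsa-miR-20b', 'hsa-miR-20b']
print(find_name("hsa-miR-20b*", text))   # ['hsa-miR-20b*']
print(find_name("hsa-miR-20b", text))    # []
```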

  6. sam says

    All mature miRNA names will switch to the form miR-1-5p. What does the -X mean in your example?

    • Li R.S says

      I mean, for example, adding an “*” after 5p, or adding the copy number directly after “5p”, so that we know which form dominates~~
      er…just a joke…

  7. Anelda says

    This is a great review. My biggest question now is actually “What is the definition of a microRNA?”. If you look at reviews written over the last few years, the definition is very flexible, and lengths from 17 to 28 nt have been described (mostly 20-22 nt, though). Would it also be possible to write a post on exactly what the criteria are for a sequence to end up in miRBase as a microRNA? That would be awesome.

    • sam says

      I don’t think it is the definition of a microRNA exactly that is the issue here — microRNAs are well defined by their biogenesis and mode of action. Rather the problem is how to tell the difference between microRNA sequences and other short or fragmentary RNA sequences. This is the subject of some discussion in the literature, and we tried to sum up the current state in the last miRBase paper:

      miRBase: integrating microRNA annotation and deep-sequencing data.
      Kozomara A, Griffiths-Jones S.
      NAR 2011 39:D152-D157

      I’ll try and write something about this here as well, as it’s clearly the key issue right now, particularly with deep seq datasets.

  8. J.Clancy says

    Looking forward to the end of stars myself. Writing a miRNA profiling paper with some stars and some -5p/-3p and it really doesn’t make much sense (to me or the intended audience). Would be nice to keep a history of the miRNA in there somewhere so us oldies in the field can see which one used to be the star.
    Also, I think the deep sequencing data section of miRbase is great, but I foresee it getting very crowded very soon. Good luck with the curation of that!!

  9. subhashree says

    That's a very very useful post!!! Thanks..

  10. George Murphy says


    Thanks very much for your helpful information. I have a couple of questions re: naming. I’ve always assumed that miRNA names originated with the discovering investigator. Is that true? If so, how do we keep two investigators from giving different miRNAs the same number? Who controls the orderly naming?

    Similarly, while family members have similar sequences, how is it decided and a consensus reached that they are indeed part of a “family”?

    Thanks again for your help,

    • sam says

      Names are coordinated and assigned by us. Currently, the correct procedure is to submit sequences to us immediately on acceptance of a manuscript for publication. Then we can assign official names quickly so that they can be included in the published article. The submitting authors therefore usually submit a paper with preliminary names. Many journal editors are familiar with this procedure, and it mostly works pretty well.

      Unfortunately there isn’t a single definition of a family that is applied uniformly. Names are sometimes a compromise, hence the warning in this post about reading too much into a name. We err towards assigning families based on sequence similarity across the whole hairpin precursor, which is suggestive of a common ancestor (in an analogous way to how gene/protein families are usually defined), rather than relying on similarity in the miRNA seed region (which might arise independently). We get lots of queries about this, so I’ll write a more detailed post some day.

  11. mycobio says


    How do you delimit a miRNA family?

    thank you very much!

  12. Bastian says

    Thank you very much! The miRNA family delimitation question is also very interesting!

  13. Jitendra says

    hey sam,
    I am confused about the difference between miR-21, hsa-miR-21 and rno-miR-21.
    Some papers use ‘miR-21’ in human experiments, and the same name elsewhere in the same paper. Is it right that miR-21 in human is hsa-miR-21, and in rat it is rno-miR-21?
    Or is miR-21 itself a synthetic microRNA that we can use for any mammalian species?

    • sam says

      rno-miR-21 and hsa-miR-21 are endogenous rat and human miRNAs, and do not necessarily have identical sequences (although, as it happens, in this case, they do!).

  14. Weixin says

    Hi Sam,
    In miR-548q, what does “q” stand for? Thanks.

    • Weixin says

      Hi Sam,
      What is the meaning of “miR-548aj-1”?

    • sam says

      The lettered and numbered suffixes refer to different loci that produce identical (-1, -2 etc), or similar (a, b etc) mature miRNA sequences. There’s plenty written about this in previous miRBase NAR papers, and elsewhere.

  15. Sarah says

    Great article. I think it would be really useful for each miRNA gene's entry page to list previous names of the mature miRNAs, in the same way that HUGO gene names are accompanied by sometimes antiquated synonyms. Even better would be a downloadable list of all previous mature miRNA names and their up-to-date equivalents. That way it would be extremely easy to update a batch of results from an old study in the literature, or to compare experiments performed using different releases of miRBase. I know it is possible to do this with the dead and diff files, but it is very fiddly, especially with 18 releases to contend with and many, many species. Finally, I think it would be useful to show the miRBase release at which the gene or additional mature miRNAs were added. I think these features would add to an otherwise comprehensive and workable database.

    • sam says

      Yep, these things have been on the list for a while. We’ll try and implement them soon.

  16. Sarkawt says

    Dear Sam I have got 2 questions:
    1) The numbers of some miRNAs are larger than the number of miRNAs discovered so far. For example: hsa-miR-4524*. Fewer than 4524 miRNAs have been discovered so far. What does this large number refer to?

    2) What miRNAs can be used as housekeeping miRNAs in miRNA experiments on human breast FFPE tissues?

    Many thanks

    • sam says

      1. The numbers are not incremented in a species-specific way. The highest number in miRBase 18 is mir-5710, which means that we have assigned numbers to around 5710 families across all species.

      2. I’m afraid I don’t know. I guess you want to find a miRNA that is expressed at around the same level in all tissues. I have a vague recollection of seeing a miRNA with a number in the 20s used as a control for expression studies, but I may have dreamt that.

  17. Bilge says

    Dear Sam,
    Thanks for this detailed article. I have a question regarding miR-467b and miR-467b*. According to your explanation, these two had to be named as -3p and -5p. However, they are now under a common name as miR-467b-3p. I guess this means that they arise from the same arm. But, how will we distinguish them from now on?

    • sam says

      The previous versions of miRBase had a bug with this entry: as you say, the sequences named miR-467b and miR-467b* were not in fact miR and miR* but rather isomiRs from the same arm. These isomiRs appear to exist for the majority of miRNAs; we define the most abundant of these isomiRs as the mature sequence from a given arm. The deep sequencing data helps to show this, but we don’t intend to assign names to every variant. Unambiguous distinction of these sequences (in a manuscript, for example) is best done using the sequence itself.

  18. sarbani saha says

    Hi Sam
    I am a first-year PhD student and I am going to work on microRNAs. For that I need the microRNA database, but I cannot open the .dat file or the other file formats that I have downloaded from miRBase. Could you kindly tell me how to open these files?

    • sam says

      The .gz files are a zipped file format. On Windows machines, WinZip or similar should be able to unzip them. There are also .zip files available on the FTP site.
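The files can also be read programmatically without unzipping them first. Below is a minimal Python sketch using the standard-library `gzip` module. It assumes a FASTA-formatted file such as `mature.fa.gz` (the filename is an assumption; check the FTP listing). The .dat flat file is a different format and would need its own parser. `read_fasta_gz` is a hypothetical helper.

```python
import gzip


def read_fasta_gz(path) -> dict[str, str]:
    """Read a gzipped FASTA file into {identifier: sequence}.

    The identifier is the first whitespace-separated word of each '>' line;
    sequence lines are joined, and blank lines are skipped.
    """
    records: dict[str, str] = {}
    name = None
    chunks: list[str] = []
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if name is not None:
                    records[name] = "".join(chunks)
                name = line[1:].split()[0]
                chunks = []
            else:
                chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records
```

For example, `read_fasta_gz("mature.fa.gz")` would return a dictionary keyed by the first word of each header line. `gzip.open` also accepts an already-open binary file object in place of a path.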

  19. MiReg says

    Many previous publications mention mir-125b or miR-125b etc. Are the authors trying to refer to miR-125b-5p? When the literature mentions a miRNA without any -5p/-3p, which one is meant, -5p or -3p?

    • sam says

      I’m afraid you can’t tell for sure just from a name like miR-125b which of the two possible mature sequences the authors are referring to. In all likelihood, they are referring to the -5p sequence in this case, but not necessarily in other cases. Each miRNA entry has a list of previous names. You can see from this page:

      …. that miR-125b-5p was previously called miR-125b in miRBase.

  20. math says

    I believed that I understood the nomenclature until I bumped into miR-203a and miR-203b. There is little apparent similarity between miR-203a and miR-203b (previously called hsa-miR-3545). Am I just missing something? How similar two mature miRNAs should be to assign them into one family and to use the suffix ‘a’, ‘b’, etc?

    • sam says

      hsa-mir-203a and hsa-mir-203b are expressed from opposite genomic strands at the same location — i.e. they are a sense/antisense pair. Because of the base-pairing in a hairpin, sense and antisense miRNAs tend to have similar sequences. You can see this on the website — go here:

      Tick the boxes next to hsa-mir-203a and hsa-mir-203b. Then select “stem-loop alignment”, “Clustalw alignment”, and click “fetch”.

  21. math says

    thanks Sam. Now I get it. I think.

    However, I must add that I liked the system better when the ‘a’ and ‘b’ suffixes implied that the two mature miRNAs had identical or very similar seed sequences and hence (presumably) similar or overlapping functions. I am aware that this is an oversimplification; still, in large-scale expression studies it was always reassuring to see how often the ‘a’ and ‘b’ forms are co-regulated. Now the ‘a’ and ‘b’ suffixes may imply nothing other than the physical proximity of the genes encoding them.

    For example, the mature miR-203a and miR-203b do not seem to have similar seed sequences, as far as I can tell. Because of this, I would say that they belong to one miRNA ‘cluster’, because of an overlap between the genes encoding their mature forms, but not to the same family, because otherwise they are quite different from each other. With the new system I am still a bit unsure about the difference between miRNA families and miRNA clusters.

    Anyway, this is only my personal opinion. This is your system, you have the right to decide.

    • sam says

      BTW, I think it’s useful to think about s/as pairs differently from clustered miRNAs, because s/as pairs cannot be processed from the same transcript. In fact, s/as pairs are special cases that are evolutionarily linked, but not transcriptionally linked.

  22. sam says

    There’s no conceptual change in the system here. You can argue this is a marginal case, but the mature sequences do still have significant similarity. From the 5′ arms (even though the miR-203a-5p sequence isn’t annotated in miRBase — it should be!), you get this alignment:


    That’s identical seeds (defined as nts 2-7). The 3′ arms don’t have identical seeds, but they are similar (and I’m afraid I’m one of those nay-sayers who doesn’t buy that the seed is everything):


    I think it’s actually pretty tough for sense/antisense pairs to look different enough from each other to warrant different numbers.
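For anyone who wants to repeat this kind of comparison, here is a small Python sketch. It extracts a seed by 1-based position (nts 2-7 by default, as defined above; pass `last=8` for the 2-8 definition some people use) and counts mismatches between two seeds. `seed` and `seed_mismatches` are hypothetical helpers, and the sequences in the usage lines are invented.

```python
def seed(mature: str, first: int = 2, last: int = 7) -> str:
    """Return nucleotides first..last (1-based, inclusive) of a mature sequence."""
    if len(mature) < last:
        raise ValueError("sequence shorter than the seed window")
    return mature[first - 1:last].upper()


def seed_mismatches(a: str, b: str, first: int = 2, last: int = 7) -> int:
    """Number of positions at which the two seeds differ (no gaps allowed)."""
    return sum(x != y for x, y in zip(seed(a, first, last), seed(b, first, last)))


print(seed("UGGAAUGUAAAGAAGUAUGUAU"))                    # GGAAUG
print(seed("UGGAAUGUAAAGAAGUAUGUAU", last=8))            # GGAAUGU
print(seed_mismatches("UGGAAUGUAAAG", "UGGAAUCUAAAG"))   # 1
```

A count of zero corresponds to the "identical seeds" case. As noted above, similarity outside the seed matters too, so this is one signal among several.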

  23. math says

    many thanks for the clarification!

  24. Masood says

    Hi Mr. Sam.
    In fact, I am a little bit confused: what is the difference between hsa-miR-34b and hsa-miR-34b*? In my study hsa-miR-34b* is much more strongly down-regulated than hsa-miR-34b (about a 10-fold difference). What does that mean? Another question: what is the difference between miR-34b in human and miR-34b in mouse, for example?

    • sam says

      miR-34b and miR-34b* are mature sequences from opposite arms of the hairpin precursor. The sequences are very different, so they will have different targets. (As this article says, these sequences are no longer called miR-34b and miR-34b*, rather miR-34b-5p and miR-34b-3p.) The human and mouse sequences are orthologs, but there may be small differences in the sequences between the two species.

  25. Alexis says

    Hi Sam,
    If miRNAs in miRBase share the same initial name, like miR-166a, b, c, d and so on, can we say they belong to the same family, the miR-166 family? Members of the same family should be identical at nucleotides 2-8; did miRBase follow this when naming each one?

    • sam says

      There are several definitions of a family. You allude to one: mature sequences with the same seed are often called a miRNA seed family. However, we use a broader definition more like a standard view of a gene family — miRNA hairpin sequences that are similar enough to infer that they have evolved from a common ancestor will be members of the same family, and will *usually* get the same number. Since the mature sequences are usually the most conserved parts of a miRNA hairpin, this generally means high sequence similarity between mature sequences. However, sometimes a name choice is a pragmatic call from conflicting evidence. The post in which you’ve commented explicitly advises that you don’t use the names alone to infer complex relationships like this.
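The seed-family definition raised in the question is simple to compute, which is partly why it is tempting. The sketch below groups mature sequences by nts 2-8. This is not how miRBase assigns family names (those follow hairpin similarity, as explained above), so expect the two groupings to disagree sometimes. `seed_families` is a hypothetical helper, and the input names are invented.

```python
from collections import defaultdict


def seed_families(matures: dict[str, str], first: int = 2, last: int = 8) -> dict[str, list[str]]:
    """Group mature sequence names by their seed (1-based positions first..last)."""
    groups = defaultdict(list)
    for name, seq in matures.items():
        groups[seq[first - 1:last].upper()].append(name)
    return dict(groups)


print(seed_families({
    "xyz-miR-A": "UGGAAUGUAAAGAAGUAUGUAU",
    "xyz-miR-B": "UGGAAUGUAAGGAAGUGUGUGG",
    "xyz-miR-C": "UAGCUUAUCAGACUGAUGUUGA",
}))
# {'GGAAUGU': ['xyz-miR-A', 'xyz-miR-B'], 'AGCUUAU': ['xyz-miR-C']}
```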

  26. Mumtaz says

    We write miRs in the 5′ to 3′ direction regardless of whether it is a 3p or 5p miR. Can you please clarify whether all miRs target 3′ UTRs in their 5′-3′ orientation? Why do we see in TargetScan, for example, UTRs written 5′-3′ and miR sequences written 3′-5′, when in reality miRs are 5′-3′? This question is disturbing me! Your reply is appreciated.

    • sam says

      RNAs (like DNA) base-pair in an anti-parallel orientation. So the usual way to draw a miRNA/target pair is something like this:

      UTR    5’      G  C      CC        3’
                      AC UCUUUA  CAUUCCA
                      UG AGAAAU  GUAAGGU
      miRNA  3’ UAUGUA  A                5’
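The convention in this diagram can also be generated programmatically. Reversing the miRNA gives it in 3′ to 5′ order. Complementing that reversed string gives the perfectly paired target site in 5′ to 3′ order, column by column. Below is a minimal Python sketch that handles Watson-Crick pairs only; the G:U wobble pairs that real sites often contain are ignored. `antiparallel` and `seed_site` are hypothetical helpers.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def antiparallel(mirna: str) -> tuple[str, str]:
    """Return (miRNA written 3'->5', fully complementary site written 5'->3').

    Column i of the first string pairs with column i of the second,
    which is how miRNA/target diagrams line the two strands up.
    """
    mirna_3to5 = mirna.upper()[::-1]
    return mirna_3to5, mirna_3to5.translate(COMPLEMENT)


def seed_site(mirna: str, first: int = 2, last: int = 8) -> str:
    """Target-site sequence (5'->3') that pairs with seed positions first..last."""
    return mirna.upper()[first - 1:last][::-1].translate(COMPLEMENT)


mirna_3to5, site_5to3 = antiparallel("UGGAAUGUAAAGAAGUAUGUAU")
print("UTR   5'", site_5to3, "3'")
print("miRNA 3'", mirna_3to5, "5'")
print(seed_site("UGGAAUGUAAAGAAGUAUGUAU"))  # ACAUUCC
```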
  27. Carol says

    I am having some trouble finding a “standardized” way to name mutations that I find and that have not been described yet. I wonder if I should start counting from the beginning of the microRNA or from the beginning of the pre-microRNA. Any explanation, paper or site describing how to name mutations in microRNAs would be extremely helpful.
    Thank you so much.

  28. sam says

    I don’t know of a standard way to do this. As long as it is clear which you are doing, I think it would be OK to number from either the mature sequence or the hairpin. A couple of things to be aware of though:

    1. The hairpin sequences in miRBase are not strictly speaking the precursors. The precursor is defined by Drosha cleavage, which by definition occurs at the outside end of the mature duplex. miRBase hairpins are often a little longer.

    2. The ends of both hairpin and mature sequences in miRBase are subject to change over time. That means that however you define your positions you should probably also show the sequence to be unambiguous.

  29. EZ says

    If I understood correctly, hsa-miR-138-1 and hsa-mir-138-2 mean that the mature sequences are identical for miR-138-1 and miR-138-2 but originate from different precursors. Is that right? If so, why are the mature sequences not identical for this miR?

    hsa-miR-138-1-3p .GCUAcUUCACaACACCAGGGcc
    hsa-miR-138-2-3p .GCUAuUUCACgACACCAGGGuu

    Thank you for your help!

    • sam says

      I’m sorry your comment got missed. In this case, the dominant mature sequence arises from the 5′ arm of the hairpin and is called hsa-miR-138-5p. That product is identical from the two hairpin loci hsa-mir-138-1 and hsa-mir-138-2, and the hairpins were named on that basis. The passenger strand sequence differs between the two loci, so the -1 and -2 suffixes are used in the mature names to track which hairpin produces which sequence.
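The comparison in the question above marks differing positions in lowercase. Below is a short Python sketch that reproduces that style of display for two sequences already aligned to the same length. It does no gap handling, and the lowercase convention is read off the display above rather than taken from any particular tool. `mark_mismatches` is a hypothetical helper.

```python
def mark_mismatches(a: str, b: str) -> tuple[str, str]:
    """Lowercase every position where two equal-length aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    out_a, out_b = [], []
    for x, y in zip(a.upper(), b.upper()):
        same = x == y
        out_a.append(x if same else x.lower())
        out_b.append(y if same else y.lower())
    return "".join(out_a), "".join(out_b)


a, b = mark_mismatches("GCUACUUCACAACACCAGGGCC", "GCUAUUUCACGACACCAGGGUU")
print(a)  # GCUAcUUCACaACACCAGGGcc
print(b)  # GCUAuUUCACgACACCAGGGuu
```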

  30. Kim says

    Is the * the same as the # which is used to name the TaqMan miRNA assays? I have a dataset full of # and I was relabeling with the -3p and -5p but assumed the # was the same as *. Wanted to be sure before I publish!

    • sam says

      Hi Kim — I’m sorry for the delay. Given the possibility of sequence changes between database (and assay) versions, I’m afraid the safest way to be sure that these names are referring to the same sequence is to check the sequence itself.

  31. Tariqul Tareq says

    Dear Sam
    I have two questions.
    1. If we find miRNAs that match (for example) ath-156a, osa-156a-5p and gma-156a-3p, with significant reads for each of them, how should we name them? Should we call them 156a, 156a-5p and 156a-3p?
    2. If we find miRNAs that match mir156b (for example) but in different species, and there are significant sequence variations among them, even in the seed region (2-8), what will their nomenclature be? For your convenience, I have given an example below:
    Some of the sequences matched pta-miR156b and some matched cca-miR156b (deep sequencing data).
    How should we name these?

    Thanks in advance

    • sam says

      Ultimately, we will assign the final names for you when your paper is accepted for publication. In short, the names are mostly driven by the hairpin loci. So if you have a hairpin that is clearly related to ath-MIR156a, the most important thing is that it gets the name MIR156. We can try to make the lettered suffixes match up, but sometimes that’s not possible. The mature sequences that derive from that hairpin will then get the names miR156-5p and miR156-3p, even if they have some sequence differences. Please let us know when you have sequences for naming.

  32. Kyle Caligiuri says

    ‘We can dynamically annotate the most abundant mature miRNA from each precursor arm (by automatic analysis of deep seq datasets, for example), and treat offset sequences in the same way that isomiRs are treated…’

    I think this is probably the best way to go. Also, in regards to species specificity, this may be something to think about as well. In doing my own deep sequencing projects, species specificity and ortholog identification may be based on the parameters set when mapping to miRBase. Comparing the same dataset and allowing 0 mismatches or 1 mismatch can result in quite variable data. For example, with 0 mismatches we didn’t detect mmu-miR-141 in mouse brain tissue, but hsa-miR-141 was identified as a novel ortholog. Allowing 1 mismatch identified the miRNA as mmu-miR-141, and upon looking at the sequence it was due to 1 bp extension at the 3′ end, which favoured the hsa version. I wonder if orthologs should eventually be compiled and only the dominant form be annotated with the species identifier. I guess time will tell as more NGS data is compiled!
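The effect described above can be reproduced with a toy scoring model. The sketch below anchors read and reference at the 5′ end, counts substitutions over the shared length, and charges one mismatch per overhanging 3′ base. That model is an assumption for illustration (no indels, no quality scores), not how any particular mapper scores alignments. The names and sequences are invented, and `mismatches` and `hits` are hypothetical helpers.

```python
def mismatches(read: str, ref: str) -> int:
    """5'-anchored substitutions plus one per unmatched base at the 3' end."""
    subs = sum(a != b for a, b in zip(read, ref))
    return subs + abs(len(read) - len(ref))


def hits(read: str, refs: dict[str, str], max_mm: int) -> list[str]:
    """Names of reference sequences within max_mm mismatches of the read."""
    return sorted(name for name, seq in refs.items() if mismatches(read, seq) <= max_mm)


refs = {
    "sp1-miR-X": "ACGUACGUACGUACGUACGUA",   # 21 nt
    "sp2-miR-X": "ACGUACGUACGUACGUACGUAC",  # same, plus one extra 3' base
}
read = "ACGUACGUACGUACGUACGUAC"
print(hits(read, refs, max_mm=0))  # ['sp2-miR-X']
print(hits(read, refs, max_mm=1))  # ['sp1-miR-X', 'sp2-miR-X']
```

With zero mismatches, the read matches only the entry carrying the extra 3′ base, so a single 3′ extension is enough to flip the apparent species. This is one reason the reply below recommends hairpin-level analysis for orthology.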

    • sam says

      Hi Kyle

      We would like to compile ortholog sets, but the ortholog concept makes sense across the hairpin locus, not for individual mature miRNAs (isomiRs). The correct way to do this kind of thing is therefore with sequence analysis that takes account of more than just the mature sequence — similarity across the hairpin locus, ideally supplemented with synteny analysis.

  33. naman says

    I am still confused about the difference between miRNA clusters and miRNA families.
    For example, mmu-miR-486 and mmu-miR-3107 belong to one family and the same cluster, and share the same sequence (one nucleotide change in the 3p) as well.
    But mmu-miR-92b and mmu-miR-25 belong to the same family yet share no sequence similarity, so how are they related?

  34. naman says

    Is the miRNA with fewer reads (of the 5p and 3p) useless, with no role in gene regulation?
