
miRBase 22 release

After repeated and unreasonable delay, miRBase 22 is finally released. As you might expect with such a long gap, the number of sequences in the database has jumped significantly, by over a third. The vast majority of the increase comes from new microRNA annotations in species not previously represented in the database. Indeed, there are sequences for 48 new species in this release. Still, we know we are missing microRNA annotations that have been published. We apologise for that, and will be working hard to catch up and get back to more timely data releases. Please let us know if we are missing your data.

Other new things:

  • We’ve changed how we collect and manage the deep sequencing datasets shown in the miRBase read views. The number of deep sequencing datasets we have mapped has jumped to 831 in this release. We have around 1000 more datasets mapped and ready to go, but we’ve hit a technical issue with database size and website speed, and we didn’t want to hold up the release any further while we fix it. As soon as that problem is fixed, the deep sequencing data views in miRBase will expand dramatically. With that update, we also expect the number of microRNA annotations classified as “high confidence” to jump significantly.
  • We’re developing interfaces to keep track of the changes in miRBase over time. The first of these views is available in miRBase 22: click the “change log” links on the microRNA entry pages to see it.
  • We’re also developing views of functional data, incorporating both literature mining and the excellent work of Huntley et al. (RNA 2016 22:667-676). The first of these views will appear on the microRNA entry pages shortly.
  • Look out for a programmatic webservice to retrieve sequences, also coming shortly.
  • As always, please let us know if you have comments, questions or suggestions.

    Sam and Ana

Posted in data update, new features, releases.

12 Responses


  1. Paolo Zambonelli says


    I’m working on porcine miRNA detection and analysis.
    We published two papers on the miRNA transcription profile of porcine adipose tissue (Gaffo et al., 2014, doi: 10.1111/age.12192; Davoli et al., 2018, doi: 10.1111/age.12646), linked to the NCBI GEO series GSE47748 and GSE108829, but I didn’t find any reference to these publications or datasets.
    Do you expect to update miRBase with new datasets (including those produced in the lab where I work) in the near future?

    Best regards,

    • sam says

      Hi Paolo

      Sorry for the delayed reply. I have the sequences associated with Gaffo et al, and I’ll make sure we get those into the next release. I can’t see the miRNA annotations (hairpin and mature sequences) associated with Davoli et al. Can you send those by email?

  2. Momchil says

    Dear Sam and Ana,

    Thank you for the great job you’ve done with miRBase 22.
    I was browsing the FTP site and saw that the ‘database_files/’ folder for v22 is missing.
    When can we expect these files to be published?

    • sam says

      Hi Momchil

      I’ve just made this available on the FTP site. Some ancillary tables, particularly for the mapped reads, are too large to dump now, so they aren’t there.

  3. Kun says

    The ‘miFam.dat’ file also seems to be missing. Is it no longer needed?

    • sam says

      Hi Kun

      We haven’t rebuilt the family classification for miRBase 22. Partly this is because it wasn’t very good, and partly because we don’t aim to be a family database. We’d like to utilise Rfam instead, which is an RNA family database. That might take a little while, so we’ll think about an interim solution.

  4. Sean says

    References to mature sequences within the miRNA hairpin structures seem to be inconsistent: sometimes it is specified whether the mature sequence is found on the 5′ or 3′ end, but in other cases it isn’t. Also, some strand names are inconsistently capitalised. Is this due to a lack of data, or is there anything that can be done?

    • sam says

      We’ve added the -5p and -3p suffixes to mature sequences when there are mature sequences from both arms. Where only one mature miRNA has been annotated, there is some inconsistency in the mature name, as you say. We are gradually cleaning this up.

      Inconsistency in capitalisation is a bug. Can you point us to some examples of this? Thanks.

      • Sean says

        Thanks for the reply.

        I found the inconsistent capitalisation on the human miRNA page. An example is hsa-mir-1-1 (accession MI0000651). If you retrieve the stem-loop structure it is named >hsa-mir-1-1, but if you retrieve the mature sequences, each is named with a capital R (“miR”): >hsa-miR-1-5p, >hsa-miR-1-3p. It isn’t a big problem because it’s easy to ignore capitalisation, but it could cause problems later for someone who doesn’t spot the mistake.

        • sam says

          Ah, I misunderstood. That’s entirely intentional. Mature sequences have the capital R, and genes/hairpin precursors have a lowercase r. That’s been the convention since the very first miRNA papers. See here (and the first miRBase papers) for more info:
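The naming convention described in this thread (lowercase “mir” for hairpin precursors, capital “miR” for mature sequences, and -5p/-3p arm suffixes when both arms are annotated) can be checked mechanically. The Python below is a minimal sketch, not a miRBase tool: the names parse_mirbase_name, MirName, NAME_RE and ARM_RE are hypothetical, and it only handles mir/miR-style names. Names such as let-7, which don’t carry the hairpin/mature distinction in their capitalisation, are rejected. Because single-arm mature names may lack a suffix, a missing arm is reported as None rather than guessed. The last line shows why case-folding names before comparing them is risky.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern: species prefix, then "mir" (hairpin) or "miR" (mature),
# then the rest of the identifier. The match is case-sensitive on purpose.
NAME_RE = re.compile(r"^(?P<species>[^-]+)-(?P<kind>mir|miR)-(?P<rest>.+)$")
ARM_RE = re.compile(r"-(5p|3p)$")


class MirName(NamedTuple):
    species: str        # e.g. "hsa" for human
    is_mature: bool     # True for "miR" (mature), False for "mir" (hairpin)
    arm: Optional[str]  # "5p" or "3p" if the name ends in an arm suffix


def parse_mirbase_name(name: str) -> MirName:
    """Split a mir/miR-style name into species, hairpin/mature, and arm."""
    m = NAME_RE.match(name)
    if m is None:
        raise ValueError(f"not a mir/miR-style name: {name!r}")
    arm = ARM_RE.search(m["rest"])
    return MirName(
        species=m["species"],
        is_mature=(m["kind"] == "miR"),
        arm=arm.group(1) if arm else None,
    )


print(parse_mirbase_name("hsa-mir-1-1"))   # MirName(species='hsa', is_mature=False, arm=None)
print(parse_mirbase_name("hsa-miR-1-5p"))  # MirName(species='hsa', is_mature=True, arm='5p')

# Case-folding erases the distinction: the mature name now parses as a hairpin.
print(parse_mirbase_name("hsa-miR-1-3p".lower()).is_mature)  # False
```

Comparing names case-sensitively keeps hsa-mir-1-1 (the precursor) and hsa-miR-1-3p (a mature product) distinct. This is why “cancelling out” capitalisation is not a safe normalisation step.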

  5. Khor says

    Hi, where can I find the total number of identified human miRNAs in the miRBase database? I’d appreciate your help.
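If you have downloaded miRBase’s FASTA files, you can compute a count like this locally. The following is a minimal Python sketch built on several assumptions: the files are FASTA downloads such as hairpin.fa or mature.fa (optionally gzip-compressed), the first token of each header line is the miRBase ID, and human IDs begin with the species prefix “hsa-”. The function names count_species_records and count_species_in_file are hypothetical. Counting a hairpin file gives precursors, while counting a mature file gives mature sequences, so the two totals will differ.

```python
import gzip
from typing import Iterable


def count_species_records(lines: Iterable[str], prefix: str = "hsa-") -> int:
    """Count FASTA records whose ID (first token after '>') starts with prefix."""
    count = 0
    for line in lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        fields = line[1:].split()
        if fields and fields[0].startswith(prefix):
            count += 1
    return count


def count_species_in_file(path: str, prefix: str = "hsa-") -> int:
    """Open a plain or gzip-compressed FASTA file and count matching records."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return count_species_records(handle, prefix)


# Toy input for illustration only; these are not real miRBase records.
sample = """>hsa-mir-1-1 example
ACGU
>hsa-mir-1-2 example
ACGU
>ssc-mir-1 example
ACGU
"""
print(count_species_records(sample.splitlines()))  # 2
```

For a real download you would call something like count_species_in_file("hairpin.fa.gz"), adjusting the path to wherever you saved the file.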
