
miRBase 18 is coming …

If all goes to plan, miRBase 18 should be released this week. The release contains 18226 entries representing hairpin precursor miRNAs, expressing 21643 mature miRNA products, in 168 species. That represents 1488 new hairpin sequences and 1929 novel mature products. We have continued to rename miR/miR* pairs using the -5p/-3p nomenclature (see the previous blog post about this), and to add mature products with deep sequencing evidence. More soon …

Posted in releases.

What’s in a name?

As I briefly mentioned in a previous post, miRBase 17 included two conceptual changes in the miRNA nomenclature scheme, which deserve further detail and clarification.

The name of a miRNA contains some human-readable information. If you stop reading this post halfway, you’ll likely think this is a good thing. Which of course it is, as long as we recognise the limitations. Hold on to the end and hopefully you’ll see that names can create some issues.

Take, for example, hsa-mir-20b. The “hsa” tells us it is a human miRNA. The “20” tells us that it was discovered early: it’s only the 20th family that was named. “20b” tells us that it is related to another miRNA that we can guess is probably called hsa-mir-20a. We can go further: the (lack of) capitalisation of “mir” tells us we’re talking about the miRNA precursor. Or maybe the genomic locus, or maybe the primary transcript, or maybe the extended hairpin that includes the precursor. So that’s already less useful.

hsa-mir-20b has two mature products, named hsa-miR-20b and hsa-miR-20b* (as of this moment — as you’ll see below, this will change). “miR” tells us we’re talking about a mature sequence. In this case miR-20b arises from the 5′ arm of the mir-20b hairpin, and miR-20b* arises from the 3′ arm. The “*” tells us that miR-20b* is considered a “minor” product. That means miR-20b* is found in the cell at lower concentration than miR-20b. It is often inferred that miR-20b* is non-functional, and you’ve probably noticed that miR* sequences in general magically disappear in most pictures of miRNA biogenesis, while the dominant arm is magically incorporated into the RISC complex.
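To make the pieces of a name concrete, here is a minimal Python sketch that splits a name of the form discussed above into its parts. The pattern and the `describe_name` function are illustrative assumptions, not anything miRBase provides. The species prefix is assumed to be 3 or 4 lowercase letters, and many real name variants (let-7, -5p/-3p, numeric locus suffixes) are deliberately ignored. It only shows what is packed into the string, not whether that information can be trusted.

```python
import re

# Hypothetical pattern covering only simple names such as
# hsa-mir-20b, hsa-miR-20b and hsa-miR-20b*.
NAME_RE = re.compile(
    r"^(?P<species>[a-z]{3,4})-(?P<kind>mir|miR)-"
    r"(?P<number>\d+)(?P<letter>[a-z]?)(?P<star>\*?)$"
)


def describe_name(name):
    """Return the human-readable fields encoded in a simple miRNA name."""
    match = NAME_RE.match(name)
    if match is None:
        raise ValueError(f"name not covered by this sketch: {name!r}")
    return {
        "species": match.group("species"),
        # Capitalised "miR" conventionally marks a mature sequence.
        "is_mature": match.group("kind") == "miR",
        "family_number": int(match.group("number")),
        "family_letter": match.group("letter") or None,
        # A trailing "*" marks the product considered "minor".
        "is_star": match.group("star") == "*",
    }


print(describe_name("hsa-miR-20b*"))
```

Note that the pattern is case-sensitive on purpose, because the only difference between the precursor and mature names here is the capital R.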

But hang on a minute, a bunch of papers now tell us that miR* sequences can be functional (eg Yang et al. 2011), perhaps through binding different Argonaute proteins (a glut of papers in the past couple of years nicely reviewed by Czech and Hannon, 2011). And, of course, the miR* sequence from one hairpin might be expressed at a level orders of magnitude higher than the dominant miR sequence from another hairpin. Perhaps the arm that makes the dominant product can change in different tissues, stages and species (G-J et al. 2011). Should we rename miR and miR* sequences every time someone produces an ever deeper sequencing dataset? To cap it all, the “*” character causes problems for database searches and the like.

We therefore intend to retire the miR/miR* nomenclature, in favour of the -5p/-3p nomenclature (the latter has been used in parallel for mature products of approximately equal expression, and will in future be applied to all sequences). We will make this transition in phases, as we can make companion data available to show the expression of mature products from each arm. In miRBase 17, all Drosophila melanogaster mature sequences are renamed as -5p/-3p, and many previously missing second mature products have been added. The available deep sequencing data makes clear which of the potential mature products is dominant. Other species will follow suit in due course.
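As a rough illustration of the renaming, the Python sketch below turns a miR/miR* style name into a -5p/-3p style name. The function `to_arm_name` is hypothetical, and the arm must be supplied by the caller because it cannot be derived from the old name. The example uses the arms given above for mir-20b, and the resulting names are what the scheme implies rather than confirmed database entries.

```python
def to_arm_name(mature_name, arm):
    """Build a -5p/-3p style name from an old mature name and its hairpin arm."""
    if arm not in ("5p", "3p"):
        raise ValueError("arm must be '5p' or '3p'")
    # Drop the "*" that marked the minor product under the old scheme.
    base = mature_name.rstrip("*")
    # Avoid doubling the suffix if the name already carries one.
    if base.endswith(("-5p", "-3p")):
        base = base[:-3]
    return f"{base}-{arm}"


# Arms as described above: miR-20b from the 5' arm, miR-20b* from the 3' arm.
print(to_arm_name("hsa-miR-20b", "5p"))
print(to_arm_name("hsa-miR-20b*", "3p"))
```

The new name says which arm a product comes from and nothing about which product is dominant, so that question has to be answered from expression data.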

The second change in miRBase 17 concerns the small number of pairs of miRNA sequences that are transcribed from the same locus in opposite directions — that is, sense/antisense pairs. For example, the dme-mir-307 locus has been shown to be transcribed in both directions, and both transcripts are processed to produce mature miRNAs. These miRNAs were previously named dme-mir-307 and dme-mir-307-as in miRBase. The -as is confusing, because it is similar to the suffixes used to denote families of related miRNAs. The classification of sense and antisense is arbitrary. To confuse matters further, -as and -s were used in early miRNA literature to refer to mature products produced from the 5′ and 3′ arms of a hairpin precursor. From miRBase 17 onwards, the -as nomenclature is retired. Sense and antisense miRNAs will be named independently and in the same way as all other sequences: If the sequences are similar then they get a, b suffixes (eg dme-mir-307a and dme-mir-307b), and if they are not deemed similar enough then they get different numbers (eg rno-mir-151 and rno-mir-3586).

The combined result of these changes is that the name of a miRNA contains less information than previously. This may seem like a retrograde step. However, the problem with encoding information in the name is that people are tempted to use it. MicroRNA names are often pragmatic compromises, and have been overloaded with relatively complex meaning, for example, regarding family relationships and expression levels. Names should be useful, but should never be used in place of the correct analysis, for example, of sequence relationships or expression. We therefore suggest that you’ll find your miRNA life easier if you bear in mind some simple concepts:

1. Be explicit. If you are referring to the mature miR-20b sequence, you could rely on the capitalisation in miR-20b to say that for you. But it is much better to say “the mature miR-20b sequence”. Even better, show the sequence along with the name; names are not formally stable, but quoting the specific sequence you’ve used in your paper will ensure the entity is traceable forever.

2. Never use the name to encode or derive complex meaning. If you are interested in sequence relationships, you should do some sequence analysis. If you care about expression levels of alternate mature miRNAs, look at expression data. If you derive all your information about miRNA sequence relationships from the name, you will miss a great deal. If you rely on the name to tell you about relative expression then all hope is lost.
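Both suggestions can be sketched in a few lines of Python: carry the sequence with the name, and compare sequences rather than names. Everything here is an assumption for illustration. The `MatureRecord` type, the record names and the sequences are made up, and the ungapped identity score is a deliberately crude stand-in for a proper alignment.

```python
from typing import NamedTuple


class MatureRecord(NamedTuple):
    """A name kept together with the exact sequence it refers to."""
    name: str
    sequence: str


def ungapped_identity(seq_a, seq_b):
    """Fraction of matching positions over the shorter sequence, no gaps."""
    length = min(len(seq_a), len(seq_b))
    if length == 0:
        raise ValueError("cannot compare an empty sequence")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return matches / length


# Made-up records: the names and sequences are placeholders, not real miRNAs.
first = MatureRecord("xxx-miR-0001a", "ACGUACGUACGUACGUACGUAC")
second = MatureRecord("xxx-miR-0002", "ACGUACGUACGAACGUACGUAC")

# The names suggest no relationship; the sequences say otherwise.
print(first.name, second.name, ungapped_identity(first.sequence, second.sequence))
```

For real work a proper alignment tool is the better choice. The point is only that the comparison uses the sequences, not the names.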

Posted in nomenclature.

miRBase 17 released

miRBase 17 is finally released. The README has stats on the numbers of new sequences (of which there are again many). We’ve added a bunch more deep sequencing data, and there’s more on the way. There are also two important changes in nomenclature, about which we’ll write more shortly. From the README:

The -as nomenclature (previously used to designate a miRNA that is
antisense to another) is discontinued, and a small number of sequences
are renamed for this reason.

The first steps to retire the miR* designation are also taken here.
Mature sequence from all Drosophila melanogaster precursors are now
designated -5p and -3p. Many minor mature products have been added,
and others are renamed.

Some people have been asking if the views of deep sequencing data are available in bulk. We’re working on that, and we’ll post here when you can get them.

Comments, questions, abuse, praise all welcome.

Posted in data update, releases.

miRBase 17 is coming

miRBase 17 is coming, perhaps as early as tomorrow. It’s another reasonably big update — a total of 16772 hairpin entries, including more than 400 new human and mouse sequences, new deep seq datasets mapped, and even some new Drosophila melanogaster sequences! More shortly ….

Posted in data update, releases.

miRBase unavailable: Friday 4th Feb 2011, 7am-10am GMT

Due to essential building maintenance works, there will be a power outage affecting the miRBase web server on Friday 4th Feb 2011, 7am-10am GMT. The planned work should take no more than 3 hours, but the web site should be considered “at risk” for the remainder of the day.

Posted in down time.

More next generation sequencing data in miRBase

We have added more next generation sequencing datasets to the read views in miRBase. These include data from 7 metazoan genomes, 5 series of Drosophila sets from GEO and one human GEO series — that’s 56 new next generation sequencing experiments in all. The depth of coverage for some species is now extensive, for example, over 1 million reads map to the mouse mir-17 sequence.

Please send us your feedback.

Posted in data update.

miRBase and deep-seq data: paper published in NAR

A manuscript describing the integration of deep sequencing data in miRBase is now available in Nucleic Acids Research (Database Issue, Advance Access).

Ana Kozomara and Sam Griffiths-Jones.
miRBase: integrating microRNA annotation and deep-sequencing data.
Nucleic Acids Res. (2010)
doi: 10.1093/nar/gkq1027

Posted in papers.

miRBase 16 and deep sequencing data

It’s taken a while.  miRBase is, after all, more than 8 years old. But we’ve dragged ourselves kicking and screaming into the blogosphere, on the off-chance a miRBase blog might be of some minor interest.  So here you will find, in time, news of miRBase releases, discussion of new features, ideas for future development and more.  All, of course, with questions, discussion, feedback and comment from you!

miRBase 16 has been available for a month or so.  You may or may not have noticed the new interface to view short RNA deep sequencing data mapped to miRNA sequences (see, for example, the dme-mir-282 entry). The patterns of mapped reads allow us to see the diversity of mature miRNA sequences (sometimes called isomiRs) expressed from a given locus.  Read counts, summed over many experiments, act as a proxy for miRNA expression.  We can therefore show the relative abundance of miR, miR* and isomiR sequences.  Reads can be filtered by count, experiment, tissue and stage.  We can also use the read data as detailed evidence for miRNA annotations, and so revisit previous dubious annotations.  A new search interface allows us to search for miRNAs by tissue- or stage-specific expression.  We have so far added reads from just a few datasets imported from GEO, but we’ll be adding many more in the coming weeks and months.  A manuscript describing our approaches for dealing with deep sequencing data has been accepted for publication in the NAR 2011 database issue, so look out for that!
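The idea of summing read counts over experiments and reading off relative abundance can be sketched in Python as follows. This is not the miRBase implementation. The function names, the tuple layout and the counts are invented for illustration.

```python
from collections import Counter


def summed_counts(reads):
    """Sum read counts per mature product across all experiments.

    `reads` is an iterable of (experiment, product, count) tuples.
    """
    totals = Counter()
    for _experiment, product, count in reads:
        totals[product] += count
    return totals


def relative_abundance(totals):
    """Convert summed counts into fractions of all reads at the locus."""
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("no reads to compare")
    return {product: count / grand_total for product, count in totals.items()}


# Made-up counts for the two arms of one hairpin in two experiments.
reads = [
    ("exp1", "5p", 90),
    ("exp1", "3p", 10),
    ("exp2", "5p", 60),
    ("exp2", "3p", 40),
]
totals = summed_counts(reads)
print(totals)
print(relative_abundance(totals))
```

Summing raw counts lets deeply sequenced experiments dominate the totals. Whether and how to normalise per experiment first is a separate choice that this sketch does not address.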

We’re keen to hear what you think, and suggestions for improvements are always welcome, here or by email.

Posted in new features, releases.

Hello miRNA world!

The miRBase blog lives!  …. Nearly.

Posted in Uncategorized.