miRBase web site down time, Oct 22nd-23rd

Essential network and electrical work in our server room means that the web site is at risk of intermittent down time on Monday 22nd and Tuesday 23rd October. Apologies for any inconvenience.

Posted in down time.

miRBase 19 released

miRBase 19 is now available, brought to you from the Benasque RNA meeting in the sunny Pyrenees, and with a slightly larger time gap than usual. In that extended time, we have added more than the usual number of new sequences — 3171 new hairpins and 3625 novel mature products, bringing the totals to 21264 and 25141 respectively in 193 species. As always, the full README file is available on the FTP site, along with downloadable files containing all data in various formats.

We have spent some time deleting misannotated sequences, and the deep sequencing read views will allow us to focus more on this: 133 entries are removed in this release, many from the rice miRNA complement. We have also cleaned up a number of cases of duplicate entries mapping to a single genomic locus (some prompted by new genome assembly releases) and rationalised many miRNA names. This is therefore a good time to remind you that the names are meant to be useful, but are not formally stable, and shouldn’t be used to convey complex information. The miRNA accession numbers *do* remain stable between releases, and of course, you can always quote the sequence to be truly unambiguous.

In this release, the miR* nomenclature is finally retired for all species, as previously promised. For every hairpin and mature sequence, all IDs that have previously been used in miRBase are now visible on the entry pages, and are downloadable in bulk from the FTP site.

At the time of writing, we have not added new deep sequencing datasets to the read view pages — however, a decent sized update to that section will be coming along shortly, together with an announcement here.

As always, comments, questions, abuse, praise all welcome here or by email.

Posted in data update, releases.

miRBase 19 is coming …

We’re scrambling to release miRBase 19 in the next few days from the Benasque RNA meeting in the middle of the sunny Pyrenees. We have over 3000 new sequences, including the first entries for 25 new species (mostly plants). We’ve also put some effort into cleaning up some old entries, deleting over 130 misannotated sequences. More soon.

Posted in releases.

Missing comments

A corrupt database has led to the loss of any blog comments left in the past 10 days or so. Please feel free to email or re-post. Apologies for any inconvenience.

Posted in Uncategorized.

miRBase, Wikipedia and community annotation

Many miRBase entry pages have a new “community annotation” section (see, for example, dme-mir-10). This section incorporates information about specific microRNA families and sequences taken directly from the free, online encyclopedia, Wikipedia. In total, over 4500 miRBase entries currently include information from Wikipedia. We show the summary paragraph from the Wikipedia page, the full page, and a link to edit the page in Wikipedia. Any edits will appear in Wikipedia immediately, and in miRBase within 24 hours.

There is already a large amount of information in Wikipedia about specific microRNA sequences and families. We hope that distributing this information in miRBase, and providing links to edit the pages, will encourage miRBase users and microRNA experts to contribute their knowledge in the form of Wikipedia edits and new pages. Textual annotation of microRNAs in miRBase is therefore now firmly in the hands of the microRNA community.

Anyone can edit a Wikipedia page, and editing a page is straightforward. However, Wikipedia has strict policies and guidelines about how to edit and create pages. Adhering to these guidelines makes it much more likely that your contributions will survive. The following help pages on the Wikipedia site provide detailed information about how to keep Wikipedians happy:

The most important thing to remember is that information you add should be substantiated, preferably with literature citations. You’ll see that lots of existing Wikipedia microRNA pages have fairly minimal information. We’re compiling a list of microRNA pages that are in need of some attention, here. Please take a look, and consider adding some information. Pages such as the mir-10 entry, and the mir-2 family page provide excellent models for what makes a great microRNA page.

You can also create new pages at Wikipedia about microRNA sequences and families that have miRBase entries, but don’t currently have Wikipedia entries. Please let us know if you do this, so we can incorporate your annotation into miRBase, and create the appropriate links from miRBase entries to the relevant Wikipedia pages. The most important thing to remember if you’re considering making a new Wikipedia page about a microRNA is that your contribution should be “notable”. A microRNA of completely unknown function is unlikely to be worthy of a Wikipedia page. However, if you’ve just published a paper that describes the evolution of the mir-277646 family, and its function as a core regulator of the cell cycle, then a Wikipedia page is certainly deserved.

Let us know what you think, here or by email to the usual address.

Now go edit!

This effort is building on that of the Rfam database of RNA families, which has paved the way in incorporating RNA information (and biological annotation more generally) into Wikipedia, led by Alex Bateman with all the real work done by Jen Daub, John Tate and Paul Gardner. We are extremely grateful to them for allowing us to steal code and lists of relevant Wikipedia pages.

The following sources provide detailed information about the Rfam/Wikipedia alliance, and its success:

Daub J, Gardner PP, Tate J, Ramsköld D, Manske M, Scott WG, Weinberg Z, Griffiths-Jones S, Bateman A. The RNA WikiProject: community annotation of RNA families. RNA. 2008 14(12):2462-2464.

Logan DW, Sandal M, Gardner PP, Manske M, Bateman A. Ten simple rules for editing Wikipedia. PLoS Comput Biol. 2010 6(9):e1000941.

Bateman A, Logan DW. Time to underpin Wikipedia wisdom. Nature. 2010 468(7325):765.

Posted in community annotation, new features.

MicroRNA Wikipedia pages in need of attention

The following Wikipedia pages about microRNA sequences and families could do with some loving care. Please take a look, and consider adding information about microRNA function, evolution, discovery, and references. Feel free to comment here, or email us at the usual address, if you make changes worthy of removing pages from this list.


mir-92_microRNA_precursor_family (Intro section is out-of-date and needs a re-write)

Posted in community annotation, new features.

miRBase website “at risk”, Thu 10th to Fri 18th Nov

Due to server room refurbishment, the miRBase website may experience some instability between Thu 10th and Fri 18th November 2011. The plan is for just 30 minutes or so down time at either end of that period, but the website should be considered “at risk” throughout. Apologies for any inconvenience.

Posted in down time.

miRBase 18 released

After a little more pain than usual, miRBase 18 is finally released. The database contains 18226 entries representing hairpin precursor miRNAs, expressing 21643 mature miRNA products, in 168 species. That represents 1488 new hairpin sequences and 1929 novel mature products. The full README file is available on the FTP site.

As previously discussed, we have continued to rename mature sequences, phasing out the miR/miR* nomenclature in favour of the -5p/-3p nomenclature. That affects approximately 1400 mature sequences this time, from human, mouse and C. elegans. (We had planned to do rat as well, but decided to hold off until we had incorporated more rat deep sequencing data.)

There are also significant changes to the zebrafish miRNA complement, rationalising the entries with respect to the (now not so new) Zv9 genome assembly. That has led to the deletion of 26 zebrafish entries, and the creation of 12 entries that represent duplicate loci. The full list of changes is itemized in the miRNA.diff file on the FTP site.

The website also shares new deep sequencing data — now approaching 250 datasets from NCBI GEO. In addition to raw read counts, we also show normalized read counts, currently calculated as reads per thousand reads that map to miRNAs (designated RPT on the website). We have also implemented a new feature to allow the comparison of normalized read counts from multiple experiments. For example, from the list of all D. melanogaster datasets (accessible from the “by tissue expression” box on the search page), you can tick up to 5 different experiments to compare read counts. This is getting dangerously close to allowing some really complex and powerful analyses through the website! You can also download the read counts from the results page, for offline processing. This is all Ana Kozomara’s work. As with all new features, it is wise to consider this to be in beta. We’ll be very happy to get your comments, bugs, praise, and abuse, as usual, here or by email.
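To make the RPT normalisation concrete, here is a minimal Python sketch of "reads per thousand reads that map to miRNAs". It is an illustration only, not the website's code: the function name and the example counts are made up, and it assumes the denominator is simply the sum of the raw counts over all miRNAs in one dataset (how multiply-mapped reads are handled is not covered here).

```python
def reads_per_thousand(raw_counts):
    """Scale raw miRNA read counts to reads per thousand miRNA-mapped reads (RPT).

    raw_counts maps a miRNA name to its raw read count in one dataset.
    Assumption: the total is the plain sum of all counts in that dataset.
    """
    total = sum(raw_counts.values())
    if total == 0:
        # No miRNA-mapped reads at all: report zero rather than divide by zero.
        return {name: 0.0 for name in raw_counts}
    return {name: 1000.0 * count / total for name, count in raw_counts.items()}


# Hypothetical counts for illustration only, not real miRBase data.
example = {"mirna_a": 600, "mirna_b": 300, "mirna_c": 100}
rpt = reads_per_thousand(example)
```

Because every dataset is scaled to the same total of 1000, values from experiments with very different sequencing depths can be put side by side, which is what the comparison feature relies on.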

Posted in data update, new features, releases.

miRBase 18 is coming …

If all goes to plan, miRBase 18 should be released this week. The release contains 18226 entries representing hairpin precursor miRNAs, expressing 21643 mature miRNA products, in 168 species. That represents 1488 new hairpin sequences and 1929 novel mature products. We have continued to rename miR/miR* pairs using the -5p/-3p nomenclature (see the previous blog post about this), and to add mature products with deep sequencing evidence. More soon …

Posted in releases.

What’s in a name?

As I briefly mentioned in a previous post, miRBase 17 included two conceptual changes in the miRNA nomenclature scheme, which deserve further detail and clarification.

The name of a miRNA contains some human-readable information. If you stop reading this post halfway, you’ll likely think this is a good thing. Which of course it is, as long as we recognise the limitations. Hold on to the end and hopefully you’ll see that names can create some issues.

Take, for example, hsa-mir-20b. The “hsa” tells us it is a human miRNA. The “20” tells us that it was discovered early: it’s only the 20th family that was named. “20b” tells us that it is related to another miRNA that we can guess is probably called hsa-mir-20a. We can go further: the (lack of) capitalisation of “mir” tells us we’re talking about the miRNA precursor. Or maybe the genomic locus, or maybe the primary transcript, or maybe the extended hairpin that includes the precursor. So that’s already less useful.

hsa-mir-20b has two mature products, named hsa-miR-20b and hsa-miR-20b* (as of this moment — as you’ll see below, this will change). “miR” tells us we’re talking about a mature sequence. In this case miR-20b arises from the 5′ arm of the mir-20b hairpin, and miR-20b* arises from the 3′ arm. The “*” tells us that miR-20b* is considered a “minor” product. That means miR-20b* is found in the cell at lower concentration than miR-20b. It is often inferred that miR-20b* is non-functional, and you’ve probably noticed that miR* sequences in general magically disappear in most pictures of miRNA biogenesis, while the dominant arm is magically incorporated into the RISC complex.
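For the programmatically minded, the pieces of a simple name described so far can be pulled apart with a short Python sketch. This is a toy, not a miRBase tool: the regular expression, the function name and the field names are all hypothetical, it assumes a lowercase species prefix of three or four letters, and it deliberately ignores the many real names that do not fit the simple pattern.

```python
import re

# Hypothetical pattern for the simplest names only, e.g. hsa-mir-20b or hsa-miR-20b*.
# Assumption: species prefix is 3-4 lowercase letters; at most one letter suffix.
NAME_RE = re.compile(
    r"^(?P<species>[a-z]{3,4})-(?P<kind>mir|miR)-(?P<number>\d+)"
    r"(?P<letter>[a-z]?)(?P<star>\*?)$"
)


def parse_simple_name(name):
    """Split a simple miRNA name into its human-readable parts, or return None."""
    match = NAME_RE.match(name)
    if match is None:
        return None
    return {
        "species": match.group("species"),
        # By convention only: "miR" is a mature sequence, "mir" a precursor.
        "is_mature": match.group("kind") == "miR",
        "number": int(match.group("number")),
        "letter": match.group("letter") or None,
        "star": match.group("star") == "*",
    }


precursor = parse_simple_name("hsa-mir-20b")
minor_product = parse_simple_name("hsa-miR-20b*")
```

Everything this returns is a convention baked into a label, not a measurement, which is exactly the limitation the rest of this post is about.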

But hang on a minute, a bunch of papers now tell us that miR* sequences can be functional (eg Yang et al. 2011), perhaps through binding different Argonaute proteins (a glut of papers in the past couple of years nicely reviewed by Czech and Hannon, 2011). And, of course, the miR* sequence from one hairpin might be expressed at a level orders of magnitude higher than the dominant miR sequence from another hairpin. Perhaps the arm that makes the dominant product can change in different tissues, stages and species (G-J et al. 2011). Should we rename miR and miR* sequences every time someone produces an ever deeper sequencing dataset? To cap it all, the “*” character causes problems for database searches and the like.

We therefore intend to retire the miR/miR* nomenclature, in favour of the -5p/-3p nomenclature (the latter has been used in parallel for mature products of approximately equal expression, and will in future be applied to all sequences). We will make this transition in phases, as we can make companion data available to show the expression of mature products from each arm. In miRBase 17, all Drosophila melanogaster mature sequences are renamed as -5p/-3p, and many previously missing second mature products have been added. The available deep sequencing data makes clear which of the potential mature products is dominant. Other species will follow suit in due course.
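As a rough illustration of what the renaming amounts to, here is a small Python sketch that maps an old-style mature name to its -5p/-3p form. The function is hypothetical and is not the script used to build the release. Note that the arm has to be supplied from the hairpin annotation: it cannot be worked out from the presence or absence of the "*", since the minor product can come from either arm.

```python
def to_arm_name(old_name, arm):
    """Return the -5p/-3p style name for a mature miRNA.

    old_name: an old-style mature name, e.g. "hsa-miR-20b" or "hsa-miR-20b*".
    arm: "5p" or "3p", taken from the hairpin annotation (never from the name).
    """
    if arm not in ("5p", "3p"):
        raise ValueError("arm must be '5p' or '3p'")
    base = old_name.rstrip("*")  # drop the retired "*" marker if present
    if base.endswith(("-5p", "-3p")):
        # Already named by arm: leave it alone.
        return base
    return f"{base}-{arm}"


# From the example above: miR-20b arises from the 5' arm, miR-20b* from the 3' arm.
major = to_arm_name("hsa-miR-20b", "5p")
minor = to_arm_name("hsa-miR-20b*", "3p")
```

The new names say where on the hairpin a product comes from and nothing about which product is more abundant, so they do not need to change when a deeper dataset comes along.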

The second change in miRBase 17 concerns the small number of pairs of miRNA sequences that are transcribed from the same locus in opposite directions — that is, sense/antisense pairs. For example, the dme-mir-307 locus has been shown to be transcribed in both directions, and both transcripts are processed to produce mature miRNAs. These miRNAs were previously named dme-mir-307 and dme-mir-307-as in miRBase. The -as is confusing, because it is similar to the suffixes used to denote families of related miRNAs. The classification of sense and antisense is arbitrary. To confuse matters further, -as and -s were used in early miRNA literature to refer to mature products produced from the 5′ and 3′ arms of a hairpin precursor. From miRBase 17 onwards, the -as nomenclature is retired. Sense and antisense miRNAs will be named independently and in the same way as all other sequences: If the sequences are similar then they get a, b suffixes (eg dme-mir-307a and dme-mir-307b), and if they are not deemed similar enough then they get different numbers (eg rno-mir-151 and rno-mir-3586).

The combined result of these changes is that the name of a miRNA contains less information than previously. This may seem like a retrograde step. However, the problem with encoding information in the name is that people are tempted to use it. MicroRNA names are often pragmatic compromises, and have been overloaded with relatively complex meaning, for example, regarding family relationships and expression levels. Names should be useful, but should never be used in place of the correct analysis, for example, of sequence relationships or expression. We therefore suggest that you’ll find your miRNA life easier if you bear in mind some simple concepts:

1. Be explicit. If you are referring to the mature miR-20b sequence, you could rely on the capitalisation in miR-20b to say that for you. But it is much better to say “the mature miR-20b sequence”. Even better, show the sequence along with the name; names are not formally stable, but quoting the specific sequence you’ve used in your paper will ensure the entity is traceable forever.

2. Never use the name to encode or derive complex meaning. If you are interested in sequence relationships, you should do some sequence analysis. If you care about expression levels of alternate mature miRNAs, look at expression data. If you derive all your information about miRNA sequence relationships from the name, you will miss a great deal. If you rely on the name to tell you about relative expression then all hope is lost.

Posted in nomenclature.