You might have noticed some additional information on the mature miRNA pages in the last few weeks. See for example:
http://mirbase.org/cgi-bin/mature.pl?mature_acc=MIMAT0000123
http://mirbase.org/cgi-bin/mature.pl?mature_acc=MIMAT0000069
The new section “QuickGO function” contains a set of high quality manual annotations of Gene Ontology terms for mature miRNAs, the vast majority of which come from the work of Rachael Huntley et al. at the UCL Functional Gene Annotation group. The annotation has been a Herculean biocuration task: more than 4000 GO terms assigned to nearly 400 miRNAs, all from expert reading of primary literature. Human miR-21 is the star, with 244 GO terms:
http://mirbase.org/cgi-bin/mature.pl?mature_acc=MIMAT0000076
We’re pulling these data from the EBI’s QuickGO database. Their webservices make this straightforward (thanks Tony!). It’s also worth noting that the GO terms are actually assigned to RNAcentral IDs. RNAcentral maintains mappings of IDs between RNA sequence databases, including miRBase. Again, this legwork makes the task of providing these annotations much easier than it would otherwise be.
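For anyone curious what such a lookup might look like, here is a minimal sketch in Python that only builds a request URL. It is not the code miRBase runs. The endpoint path and the geneProductId parameter name are assumptions to check against the QuickGO web service documentation, and the RNAcentral ID in the usage line is a made-up placeholder.

```python
from urllib.parse import urlencode

# Assumed QuickGO annotation search endpoint; verify against the QuickGO docs.
QUICKGO_SEARCH = "https://www.ebi.ac.uk/QuickGO/services/annotation/search"


def quickgo_annotation_url(rnacentral_id: str) -> str:
    """Build a URL asking QuickGO for the GO annotations of one gene product.

    rnacentral_id is the RNAcentral ID that the miRBase mature sequence maps to.
    The parameter name "geneProductId" is an assumption, as is the ID format.
    """
    return QUICKGO_SEARCH + "?" + urlencode({"geneProductId": rnacentral_id})


# Placeholder ID for illustration only, not a real miRNA.
url = quickgo_annotation_url("URS0000000000_9606")
```

Fetching that URL and parsing the response is left out here, because the response format is not described in this post.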
Functional information has been generally lacking in miRBase. These GO data have already made a significant difference to this, and we’re planning more. Look out for functional statements from text mining of the primary literature, coming to a web browser near you soon.
Rachael et al.’s latest paper on GO annotation of miRNAs is in the preprint section at RNA:
Expanding the horizons of microRNA bioinformatics.
Rachael P Huntley, Barbara Kramarz, Tony Sawford, Zara Umrao, Anastasia Z Kalea, Vanessa Acquaah, Maria-Jesus Martin, Manuel Mayr and Ruth C Lovering.
RNA 2018
See also:
Guidelines for the functional annotation of microRNAs using the Gene Ontology.
Huntley RP, Sitnikov D, Orlic-Milacic M, Balakrishnan R, D’Eustachio P, Gillespie ME, Howe D, Kalea AZ, Maegdefessel L, Osumi-Sutherland D, Petri V, Smith JR, Van Auken K, Wood V, Zampetaki A, Mayr M, Lovering RC.
RNA 2016 22:667-676.
The GOA database: Gene Ontology annotation updates for 2015.
Huntley RP, Sawford T, Mutowo-Muellenet P, Shypitsyna A, Bonilla C, Martin MJ, O’Donovan C.
Nucleic Acids Research 2014 43:D1057-D1063.
QuickGO: a web-based tool for Gene Ontology searching.
Binns D, Dimmer E, Huntley R, Barrell D, O’Donovan C, Apweiler R.
Bioinformatics 2009 25:3045-3046.
Posted in new features.
By sam
– June 7, 2018
After repeated and unreasonable delay, miRBase 22 is finally released. As you might expect with such a long gap, the number of sequences in the database has jumped significantly, by over a third. The vast majority of the increase comes from new microRNA annotations in species not previously represented in the database. Indeed, there are sequences for 48 new species in this release. Still, we know we are missing microRNA annotations that have been published. We apologise for that, and will be working hard to catch up and get back to more timely data releases. Please let us know if we are missing your data.
Other new things:
- We’ve changed how we collect and manage the deep sequencing datasets that you can see in the miRBase read views. The number of deep sequencing datasets that we have mapped has jumped in this release, to 831. We have around 1000 more datasets mapped and ready to go, but we’ve hit a technical issue with database size and speed for the website, and we didn’t want to hold up the release any further while we fix it. As soon as we’ve fixed that problem, the deep sequencing data views in miRBase will expand dramatically. With that update, we expect the number of microRNA annotations that will be classified as “high confidence” to also jump significantly.
- We’re developing interfaces to keep track of the changes in miRBase over time. The first view of that is available in miRBase 22 — click the “change log” links on the microRNA entry pages to see.
- We’re also developing views of functional data, incorporating both literature mining, and the excellent work of Huntley et al. (RNA 2016 22:667-676). The first views of that will appear on the microRNA entry pages shortly.
- Look out for a programmatic webservice to retrieve sequences, also coming shortly.
As always, please let us know if you have comments, questions, suggestions.
Sam and Ana
Posted in data update, new features, releases.
By sam
– March 12, 2018
As mentioned previously, we briefly held off from releasing the set of “high confidence” miRNAs for miRBase 21, because of a last-gasp bug. Those data are now available, tagged with the label “high confidence” on the entry pages, and for download on the FTP site. The total number of miRNAs labelled “high confidence” has increased by 168, to 1996. That increase is partly due to our incorporation of more deep sequencing datasets, and also because we’ve relaxed one criterion:
In miRBase 20, a high confidence sequence must have at least 10 reads that map to each of the two mature sequences (-5p and -3p). In miRBase 21, high confidence sequences must either (a) have at least 10 reads mapping to each arm, as before, *or* (b) have at least 5 reads mapping to each arm *and* at least 100 reads mapping in total. The latter case helps us to catch some of the well-established, highly expressed miRNAs that have very high arm expression bias, that is, a large number of reads mapping to one arm and a small number to the other.
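The read-count rule above can be sketched in a few lines of Python. This is an illustration only, not the code behind the website: the function and parameter names are made up, and it covers just the read-count criterion described in this post, not any other checks that go into the "high confidence" label.

```python
def passes_read_count_rule(reads_5p: int, reads_3p: int,
                           total_reads: int, release: int = 21) -> bool:
    """Apply the read-count criterion for a "high confidence" hairpin.

    reads_5p / reads_3p: reads mapping to the -5p and -3p mature sequences.
    total_reads: reads mapping in total (hypothetical parameter; how the
    total is counted is not spelled out in the post).
    release: 20 applies the original rule only, 21 adds the relaxed one.
    """
    # (a) at least 10 reads on each arm: the miRBase 20 rule, kept in 21.
    strict = reads_5p >= 10 and reads_3p >= 10
    if release <= 20:
        return strict
    # (b) new in miRBase 21: at least 5 reads on each arm and
    # at least 100 reads in total.
    relaxed = reads_5p >= 5 and reads_3p >= 5 and total_reads >= 100
    return strict or relaxed


# A highly expressed miRNA with strong arm bias: 950 reads on one arm, 6 on the other.
biased_in_20 = passes_read_count_rule(950, 6, 1000, release=20)
biased_in_21 = passes_read_count_rule(950, 6, 1000, release=21)
```

With these numbers the sequence fails the miRBase 20 rule but passes in miRBase 21, which is exactly the arm-bias case the relaxed criterion is meant to catch.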
A few sequences labelled as high confidence in miRBase 20 have disappeared in the miRBase 21 set, some because of the aforementioned bug.
Facilities remain in place for you to vote for whether or not you agree with our high confidence assertions — see individual entry pages, and sequencing read views.
Posted in Uncategorized.
By sam
– July 3, 2014
Apologies for the longer-than-usual wait.
miRBase 21 is now available on the website, and all data available for download on the FTP site. As usual, the release notes describe the major changes. Of particular note this time, the Genome Reference Consortium have released a new human genome assembly, GRCh38. We have therefore remapped the human microRNA dataset to this assembly, which includes the removal of a handful of duplicate entries that now map to a single locus — for example, GRCh37 had 6 loci representing miR-3118, whereas GRCh38 has only 4. In total, there is a small increase in the number of annotated human microRNA loci, to 1881. Elsewhere in the database, the increases have been larger — we have hundreds of new sequences in each of bat, horse, goat, cobra and salmon, amongst others. In total, 4196 new hairpin sequences and 5441 new mature products have been added. The work to clean up dubious and misannotated sequences also goes on, with another 72 entries in total removed from this release.
Unfortunately, at the last moment, we’ve found an issue with the update of the “high confidence” microRNA dataset. Rather than delay the release further, we’ve decided to go ahead without the “high confidence” set for now. That will follow in the next few days, with an announcement here.
As usual, please let us know (use the comments box below, or by email) if you have any questions or comments.
Posted in releases.
By sam
– June 26, 2014
The release of miRBase 21 has taken much longer than we would have liked, but it’s nearly there now. We anticipate making the data available within the next week. Because of the time since the last release, it’s another hefty update, with over 4000 new hairpin sequences, and over 5000 new mature sequences. The new sequences mostly represent organisms that previously had few or no microRNA annotations. More soon ….
Posted in releases.
By sam
– June 16, 2014
Due to some essential network maintenance, the miRBase website is at risk of short periods of down time between 8 and 10am GMT on Tuesday 4th Feb. We apologise for any inconvenience.
Posted in down time.
By sam
– January 30, 2014
The 2014 Database Issue of Nucleic Acids Research includes an update paper about miRBase. In particular, we describe how we are using publicly available deep sequencing data to classify a subset of miRBase microRNA entries as “high confidence”. A post with more details about the associated changes to the website is coming shortly …..
Posted in papers.
By sam
– December 31, 2013
Read no further unless you care about the MySQL database dumps in the database_files directory on the FTP site.
A couple of people (many thanks Jeff and Jakob) found errors in the release 20 MySQL database dumps: a small number of new mature sequences were not linked to their hairpin precursors, and the ends of a smaller number of old mature sequences were off by 1. The table affected was mirna_pre_mature. If you’re using these dumps you will probably want to grab the fixed version from the FTP site (timestamp 17/7/2013). You might notice other files in that directory with the same new timestamp. Feel free to grab those too, but you are much less likely to care about them (the changes are either cosmetic or updated links to other resources). The FASTA format sequence files, the EMBL format data file, and all other data dumps were unaffected by these bugs.
Apologies for any inconvenience.
Posted in bugs, data update, releases.
By sam
– July 17, 2013
Phew. After considerably more pain and tears than usual, miRBase 20 is finally available on the website and for download on the FTP site (see also the README file). The gap between releases has also been longer than usual, which means that the increase in data is greater than usual (probably explaining the increase in pain). In all, we have 3355 new hairpin sequences and 5393 new mature microRNAs from around 40 new publications, increasing the totals to 24521 hairpin sequences and 30424 mature sequences. As always, the full list of additions, deletions and name changes is available in the miRNA.diff file on the FTP site, along with all other miRBase data in various file formats. There are minor changes to the structure of the MySQL database underlying the website, and therefore to the database dumps. As we still don’t have sensible documentation for those dumps, you should ask if you care about this.
Ana has also spent a fair bit of time adding datasets to the deep sequencing section of the site: we have now mapped reads from 306 small RNA deep sequencing experiments to miRBase hairpins, increasing the coverage to 37 species. In all, approximately 25% of all mature microRNAs have at least 10 reads mapping to them across all datasets. As we’ve said before, these data can be used for expression analysis, and for judging the validity of microRNA annotations. We’ve been working on a system to use these aggregated data to assess the confidence in a given microRNA annotation, and allow users to filter the data by this confidence measure. We aim to have something to show on that in the next release or two. Feel free to point us in the direction of publicly available datasets that we don’t already capture, preferably in the form of a GEO or SRA accession.
Comments, criticism, suggestions, abuse to the usual address.
Posted in data update, releases.
By sam
– June 24, 2013