As mentioned previously, we briefly held off from releasing the set of “high confidence” miRNAs for miRBase 21, because of a last-gasp bug. Those data are now available, tagged with the label “high confidence” on the entry pages, and for download on the FTP site. The total number of miRNAs labelled “high confidence” has increased by 168, to 1996. That increase is due partly to our incorporation of more deep sequencing datasets, and partly to the relaxation of one criterion:
In miRBase 20, a high confidence sequence had to have at least 10 reads mapping to each of the two mature sequences (-5p and -3p). In miRBase 21, high confidence sequences must either (a) have at least 10 reads mapping to each arm, as before, *or* (b) have at least 5 reads mapping to each arm *and* at least 100 reads mapping in total. The latter case helps us to catch some of the well-established, highly expressed miRNAs that have a very strong arm expression bias, that is, a large number of reads mapping to one arm and a small number to the other.
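To make the two alternatives concrete, here is a minimal Python sketch of the read-count rule only. It assumes `reads_5p` and `reads_3p` are read counts for the two arms already summed across all datasets, and it interprets “in total” as the sum of the two arm counts. The function name is hypothetical, and the other high confidence criteria are not modelled.

```python
def is_high_confidence(reads_5p: int, reads_3p: int) -> bool:
    """Read-count part of the miRBase 21 high confidence rule (sketch only).

    Assumption: "in total" means reads_5p + reads_3p; other criteria are ignored.
    """
    # (a) the miRBase 20 rule, kept in miRBase 21: at least 10 reads on each arm
    if reads_5p >= 10 and reads_3p >= 10:
        return True
    # (b) the relaxed rule: at least 5 reads on each arm AND at least 100 in total
    return min(reads_5p, reads_3p) >= 5 and reads_5p + reads_3p >= 100


# A strongly arm-biased miRNA passes only under the relaxed rule (b)
print(is_high_confidence(2000, 7))   # True
print(is_high_confidence(2000, 3))   # False: fewer than 5 reads on one arm
print(is_high_confidence(40, 8))     # False: rule (a) fails and only 48 reads in total
```

Rule (b) only matters when one arm has between 5 and 9 reads; anything with 10 or more on both arms already passes under rule (a).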
A few sequences labelled as high confidence in miRBase 20 have disappeared in the miRBase 21 set, some because of the aforementioned bug.
Facilities remain in place for you to vote on whether or not you agree with our high confidence assertions: see individual entry pages and sequencing read views.
Posted in Uncategorized.
– July 3, 2014
Apologies for the longer-than-usual wait.
miRBase 21 is now available on the website, and all data available for download on the FTP site. As usual, the release notes describe the major changes. Of particular note this time, the Genome Reference Consortium have released a new human genome assembly, GRCh38. We have therefore remapped the human microRNA dataset to this assembly, which includes the removal of a handful of duplicate entries that now map to a single locus — for example, GRCh37 had 6 loci representing miR-3118, whereas GRCh38 has only 4. In total, there is a small increase in the number of annotated human microRNA loci, to 1881. Elsewhere in the database, the increases have been larger — we have hundreds of new sequences in each of bat, horse, goat, cobra and salmon, amongst others. In total, 4196 new hairpin sequences and 5441 new mature products have been added. The work to clean up dubious and misannotated sequences also goes on, with another 72 entries in total removed from this release.
Unfortunately, at the last moment, we’ve found an issue with the update of the “high confidence” microRNA dataset. Rather than delay the release further, we’ve decided to go ahead without the “high confidence” set for now. That will follow in the next few days, with an announcement here.
As usual, please let us know (using the comments box below, or by email) if you have any questions or comments.
Posted in releases.
– June 26, 2014
The release of miRBase 21 has taken much longer than we would have liked, but it’s nearly there now. We anticipate making the data available within the next week. Because of the time since the last release, it’s another hefty update, with over 4000 new hairpin sequences, and over 5000 new mature sequences. The new sequences mostly represent organisms that previously had few or no microRNA annotations. More soon ….
Posted in releases.
– June 16, 2014
Due to some essential network maintenance, the miRBase website is at risk of short periods of down time between 8 and 10am GMT on Tuesday 4th Feb. We apologise for any inconvenience.
Posted in down time.
– January 30, 2014
The 2014 Database Issue of Nucleic Acids Research includes an update paper about miRBase. In particular, we describe how we are using publicly available deep sequencing data to classify a subset of miRBase microRNA entries as “high confidence”. A post with more details about the associated changes to the website is coming shortly…
Posted in papers.
– December 31, 2013
Read no further unless you care about the MySQL database dumps in the database_files directory on the FTP site.
A couple of people (many thanks Jeff and Jakob) found errors in the release 20 MySQL database dumps: a small number of new mature sequences were not linked to their hairpin precursors, and the ends of a smaller number of old mature sequences were off by 1. The table affected was mirna_pre_mature. If you’re using these dumps you will probably want to grab the fixed version from the FTP site (timestamp 17/7/2013). You might notice other files in that directory with the same new timestamp; feel free to grab those too, but you are much less likely to care about those (the changes are either cosmetic or updated links to other resources). The FASTA format sequence files, the EMBL format data file, and all other data dumps were unaffected by these bugs.
Apologies for any inconvenience.
Posted in bugs, data update, releases.
– July 17, 2013
Phew. After considerably more pain and tears than usual, miRBase 20 is finally available on the website and for download on the FTP site (see also the README file). The gap between releases has also been longer than usual, which means that the increase in data is greater than usual (probably explaining the increase in pain). In all, we have 3355 new hairpin sequences and 5393 new mature microRNAs from around 40 new publications, increasing the totals to 24521 hairpin sequences and 30424 mature sequences. As always, the full list of additions, deletions and name changes is available in the miRNA.diff file on the FTP site, along with all other miRBase data in various file formats. There are minor changes to the structure of the MySQL database underlying the website, and therefore to the database dumps. As we still don’t have sensible documentation for those dumps, please ask if these changes affect you.
Ana has also spent a fair bit of time adding datasets to the deep sequencing section of the site: we have now mapped reads from 306 small RNA deep sequencing experiments to miRBase hairpins, increasing the coverage to 37 species. In all, approximately 25% of all mature microRNAs have at least 10 reads mapping to them across all datasets. As we’ve said before, these data can be used for expression analysis, and for judging the validity of microRNA annotations. We’ve been working on a system to use these aggregated data to assess the confidence in a given microRNA annotation, and allow users to filter the data by this confidence measure. We aim to have something to show on that in the next release or two. Feel free to point us in the direction of publicly available datasets that we don’t already capture, preferably in the form of a GEO or SRA accession.
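The “at least 10 reads across all datasets” figure depends on summing reads over datasets before applying the threshold, so a mature sequence with only a few reads in each of several experiments can still pass. Below is a minimal Python sketch of that calculation. The input layout (one dict of read counts per dataset) and the IDs are hypothetical, not miRBase’s actual file format or pipeline.

```python
from collections import defaultdict


def fraction_with_min_reads(counts_per_dataset, all_mature_ids, min_reads=10):
    """Fraction of mature miRNAs with >= min_reads reads summed over all datasets.

    counts_per_dataset: iterable of dicts, mature ID -> read count in one dataset
                        (hypothetical layout)
    all_mature_ids: every mature ID in the release, including ones with no reads
    """
    # Aggregate reads per mature sequence across every dataset first
    totals = defaultdict(int)
    for dataset in counts_per_dataset:
        for mature_id, count in dataset.items():
            totals[mature_id] += count
    if not all_mature_ids:
        return 0.0
    # Then apply the threshold to the aggregated totals
    passing = sum(1 for m in all_mature_ids if totals.get(m, 0) >= min_reads)
    return passing / len(all_mature_ids)


# Hypothetical example: "m1" passes only because 6 + 5 reads are summed
datasets = [{"m1": 6, "m2": 1}, {"m1": 5, "m3": 12}]
ids = ["m1", "m2", "m3", "m4"]
print(fraction_with_min_reads(datasets, ids))  # 0.5
```

Counting every mature ID in the release (not just those seen in some dataset) matters here: sequences with no reads at all belong in the denominator.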
Comments, criticism, suggestions, abuse to the usual address.
Posted in data update, releases.
– June 24, 2013
miRBase 20 is long overdue, but should finally make an appearance within the next week. As you might expect, the extended period since the last release means many new entries — over 3000 new stem-loop sequences, and over 5000 new mature sequences. These additions mostly expand the miRNA sets of species already in the database, rather than adding new species. More soon.
Posted in Uncategorized.
– June 12, 2013
The miRBase website may be intermittently inaccessible between 8am and 9am GMT on Tuesday 19th March, and all day on Saturday 23rd March, while some network and electrical maintenance is carried out. Apologies for any inconvenience.
Posted in Uncategorized.
– March 12, 2013