What do the miRNA names/identifiers mean?
The numbering of miRNA genes is simply sequential. For instance, at the time of writing the last published miRNA was mouse mir-352. The next novel published miRNA will get the number 353. However, if you submit, for example, a Xenopus miRNA that is identical to human mir-121, we will suggest that you also name your sequence mir-121.
The names/identifiers in the database are of the form hsa-mir-121. The first three letters signify the organism. The mature miRNA is designated miR-121 in the database and in much of the literature, whilst mir-121 refers to the miRNA gene and also to the predicted stem-loop portion of the primary transcript. Distinct precursor sequences and genomic loci that express identical mature sequences get names of the form hsa-mir-121-1 and hsa-mir-121-2. Lettered suffixes denote closely related mature sequences -- for example hsa-miR-121a and hsa-miR-121b would be expressed from precursors hsa-mir-121a and hsa-mir-121b respectively.
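As an illustration of the name structure described above, the following is a minimal Python sketch that splits a basic identifier into its parts. The function name parse_mirna_name, the MirnaName type and the regular expression are assumptions made for this example and are not part of the database. The sketch covers only the basic form (organism prefix, mir or miR, number, optional letter, optional numeric locus suffix). It deliberately returns None for plant and viral names, for let-7 and lin-4, and for names that carry arm suffixes.

```python
import re
from typing import NamedTuple, Optional


class MirnaName(NamedTuple):
    organism: str           # three-letter organism prefix, e.g. "hsa"
    kind: str               # "precursor" for mir, "mature" for miR
    number: int             # sequential miRNA number
    letter: Optional[str]   # lettered suffix for closely related sequences
    locus: Optional[int]    # numeric suffix for distinct loci/precursors


# Hypothetical pattern for the basic form only, e.g. hsa-mir-121a-1.
_NAME_RE = re.compile(r"^([a-z]{3})-(mir|miR)-(\d+)([a-z]?)(?:-(\d+))?$")


def parse_mirna_name(name: str) -> Optional[MirnaName]:
    """Return the parts of a basic miRNA name, or None if it does not fit."""
    match = _NAME_RE.match(name)
    if match is None:
        return None
    organism, tag, number, letter, locus = match.groups()
    return MirnaName(
        organism=organism,
        kind="mature" if tag == "miR" else "precursor",
        number=int(number),
        letter=letter or None,
        locus=int(locus) if locus else None,
    )
```

Because capitalisation is not reliable across all organisms, the kind field should be treated as a hint and checked against the database annotation.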
miRNA cloning studies sometimes identify two ~22nt miRNA sequences which originate from the same predicted precursor. When the relative abundances clearly indicate which is the predominantly expressed miRNA, the mature sequences are assigned names of the form miR-56 (the predominant product) and miR-56* (from the opposite arm of the precursor). When the data are not sufficient to determine which sequence is the predominant one, the sequences are assigned names like miR-142-5p (from the 5' arm) and miR-142-3p (from the 3' arm). An older convention sometimes used miR-142-s and miR-142-as.
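The choice between the two naming forms can be sketched as a small Python function. This is only an illustration: the function name name_mature_products, the use of clone counts as the abundance measure, and the min_ratio threshold of 10 are all assumptions made for the example. The text above says only that the abundances must clearly indicate the predominant product and does not define a cut-off.

```python
from typing import Dict


def name_mature_products(number: int, count_5p: int, count_3p: int,
                         min_ratio: float = 10.0) -> Dict[str, str]:
    """Suggest names for the two mature products of one precursor.

    count_5p and count_3p are hypothetical abundance measures (for
    example clone counts) for the 5' and 3' arm products. min_ratio is
    an assumed threshold for "clearly predominant", not an official rule.
    """
    base = f"miR-{number}"
    # One arm clearly dominates: it takes the plain name, the other the star.
    if count_5p > 0 and count_5p >= min_ratio * count_3p:
        return {"5p": base, "3p": base + "*"}
    if count_3p > 0 and count_3p >= min_ratio * count_5p:
        return {"5p": base + "*", "3p": base}
    # Data are not sufficient to choose: fall back to arm-based names.
    return {"5p": base + "-5p", "3p": base + "-3p"}
```

With these assumptions, name_mature_products(56, 500, 10) yields miR-56 for the 5' arm and miR-56* for the 3' arm, while name_mature_products(142, 40, 35) yields miR-142-5p and miR-142-3p.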
miRNAs that do not conform to these conventions have in some cases been renamed in the database. There are, however, a few published exceptions to these rules that are accommodated. For example, different organisms have slightly different naming conventions -- in plants, published names are of the form MIR121. Viral miRNAs also adopt a slightly different naming scheme. For this reason it is unwise to rely on capitalisation to confer information, such as the mir/miR precursor/mature convention. let-7 and lin-4 are obvious exceptions to the numbering scheme, and these names are retained for historical reasons. New submissions of homologues of let-7 or lin-4 will also acquire these names.
Please note that miRNA names are able to convey only limited information, and are entirely unsuitable to encode information about complex sequence relationships. You should not therefore rely on the name to tell you all you need to know about the sequence. Sensible database approaches should instead use dedicated fields and annotation to describe such relationships, such as the "family" data provided here.
Criteria and conventions for miRNA identification and naming are described in the following short article:
Victor Ambros, Bonnie Bartel, David P. Bartel, Christopher B. Burge, James C. Carrington, Xuemei Chen, Gideon Dreyfuss, Sean R. Eddy, Sam Griffiths-Jones, Mhairi Marshall, Marjori Matzke, Gary Ruvkun, and Thomas Tuschl. A uniform system for microRNA annotation. RNA 2003 9(3):277-279.
In addition to a name or ID, each miRBase Sequence entry has a unique accession number. The accession number is the only truly stable identifier for an entry -- miRNA names may change from those published as relationships between sequences become clear. The advantage of the accessioned system is that such changes can be tracked in the database, allowing names to evolve to remain consistent, whilst providing the user with full access to the data and history. However, accessions convey little biological meaning, and it is expected that miRNAs are referred to by name in publications.
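For users who keep their own local records, one way to follow this advice is to key entries on the accession and treat the name as a changeable attribute with a history. The Python sketch below is an assumption about how such a store might look and does not describe the database's own implementation. The class names Entry and Registry are hypothetical, and the accession ACC000001 used in the usage note is a made-up placeholder, not a real accession.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Entry:
    accession: str                     # stable identifier, never changes
    name: str                          # current name, may change over time
    previous_names: List[str] = field(default_factory=list)


class Registry:
    """Hypothetical local store keyed on accession rather than name."""

    def __init__(self) -> None:
        self._entries: Dict[str, Entry] = {}

    def add(self, accession: str, name: str) -> Entry:
        if accession in self._entries:
            raise ValueError(f"duplicate accession: {accession}")
        entry = Entry(accession, name)
        self._entries[accession] = entry
        return entry

    def rename(self, accession: str, new_name: str) -> None:
        # Record the old name so that published names stay searchable.
        entry = self._entries[accession]
        if new_name != entry.name:
            entry.previous_names.append(entry.name)
            entry.name = new_name

    def get(self, accession: str) -> Optional[Entry]:
        return self._entries.get(accession)

    def find_by_name(self, name: str) -> List[Entry]:
        # Match on the current name or on any earlier name.
        return [e for e in self._entries.values()
                if name == e.name or name in e.previous_names]
```

For example, after reg.add("ACC000001", "hsa-mir-121") and reg.rename("ACC000001", "hsa-mir-121a"), a lookup by accession returns the entry under its new name, and reg.find_by_name("hsa-mir-121") still finds it through the recorded history.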