
Genomics: The miRNA Genes

For seven years after the discovery of the lin-4 RNA, the genomics of this type of tiny regulatory RNA appeared simple: there was no evidence for lin-4-like RNAs beyond nematodes and no sign of any similar noncoding RNAs within nematodes. This all changed upon the discovery that let-7, another gene in the C. elegans heterochronic pathway, encoded a second ~22 nt regulatory RNA. The let-7 RNA acts to promote the transition from late-larval to adult cell fates in the same way that the lin-4 RNA acts earlier in development to promote the progression from the first larval stage to the second (Reinhart et al., 2000; Slack et al., 2000). Furthermore, homologs of the let-7 gene were soon identified in the human and fly genomes, and let-7 RNA itself was detected in human, Drosophila, and eleven other bilateral animals (Pasquinelli et al., 2000).

Because of their common roles in controlling the timing of developmental transitions, the lin-4 and let-7 RNAs were dubbed small temporal RNAs (stRNAs), with anticipation that additional regulatory RNAs of this type would be discovered (Pasquinelli et al., 2000). Indeed, less than one year later, three labs cloning small RNAs from flies, worms, and human cells reported a total of over one hundred additional genes for tiny noncoding RNAs: approximately 20 new genes in Drosophila, approximately 30 in human, and approximately 60 in worms (Lagos-Quintana et al., 2001; Lau et al., 2001; Lee and Ambros, 2001). The RNA products of these genes resembled the lin-4 and let-7 stRNAs in that they were ~22 nt endogenously expressed RNAs, potentially processed from one arm of a stem loop precursor (Figure 1), and they were generally conserved in evolution, some quite broadly, others only in more closely related species such as C. elegans and C. briggsae. But unlike lin-4 and let-7 RNAs, many of the newly identified ~22 nt RNAs were not expressed in distinct stages of development and instead were more likely to be expressed in particular cell types. Thus the term microRNA was used to refer to the stRNAs and all the other tiny RNAs with similar features but unknown functions (Lagos-Quintana et al., 2001; Lau et al., 2001; Lee and Ambros, 2001). Intensified cloning efforts have revealed numerous additional miRNA genes in mammals, fish, worms, and flies (Lagos-Quintana et al., 2002, 2003; Mourelatos et al., 2002; Ambros et al., 2003b; Aravin et al., 2003; Dostie et al., 2003; Houbaviy et al., 2003; Kim et al., 2003; Lim et al., 2003a, 2003b; Michael et al., 2003). A registry has been set up to catalog the miRNAs and facilitate the naming of newly identified genes (Griffiths-Jones, 2004).

Like C. elegans lin-4 and let-7, most miRNA genes come from regions of the genome quite distant from previously annotated genes, implying that they derive from independent transcription units (Lagos-Quintana et al., 2001; Lau et al., 2001; Lee and Ambros, 2001). Nonetheless, a sizable minority (e.g., about a quarter of the human miRNA genes) are in the introns of pre-mRNAs. These are preferentially in the same orientation as the predicted mRNAs, suggesting that most of these miRNAs are not transcribed from their own promoters but are instead processed from the introns, as seen also for many snoRNAs (Aravin et al., 2003; Lagos-Quintana et al., 2003; Lai et al., 2003; Lim et al., 2003a). This arrangement provides a convenient mechanism for the coordinated expression of a miRNA and a protein. Regulatory scenarios are easy to imagine in which such coordinate expression could be useful, which would explain the conserved relationships between miRNAs and host mRNAs. A striking example of this conservation involves mir-7, found in the intron of hnRNP K in both insects and mammals (Aravin et al., 2003).

Other miRNA genes are clustered in the genome with an arrangement and expression pattern implying transcription as a multi-cistronic primary transcript (Lagos-Quintana et al., 2001; Lau et al., 2001). Although the majority of worm and human miRNA genes are isolated and not clustered (Lim et al., 2003a, 2003b), over half of the known Drosophila miRNAs are clustered (Aravin et al., 2003). The miRNAs within a genomic cluster are often, though not always, related to each other; and related miRNAs are sometimes but not always clustered (Lagos-Quintana et al., 2001; Lau et al., 2001). Orthologs of C. elegans lin-4 and let-7 are clustered in the fly and human genomes and are coexpressed, sometimes from the same primary transcript, leading to the idea that the genomic separation of lin-4 from let-7 in nematodes might be unique to the worm lineage (Aravin et al., 2003; Bashirullah et al., 2003; Sempere et al., 2003). This example illustrates the possibility that even in cases where clustered genes have no apparent sequence homology, they may share functional relationships.

Some of the more interesting genomic locations of miRNA genes include those in the Hox clusters. The mir-10 gene lies in the Antennapedia complex of insects and in the orthologous locations in two Hox clusters of mammals, whereas the mir-iab-4 gene is within the insect Bithorax cluster (Aravin et al., 2003; Lagos-Quintana et al., 2003). In light of the roles of other genes of the Hox clusters, the Hox miRNAs are especially good candidates for having interesting functions in animal development. Other interesting loci include the mir-15a-mir-16 cluster, which falls within a region of human chromosome 13 thought to harbor a tumor suppressor gene because it is the site of the most common structural aberrations in both mantle cell lymphoma and B cell chronic lymphocytic leukemia (Lagos-Quintana et al., 2001; Calin et al., 2002).

Nearly all of the cloned miRNAs are conserved in closely related animals, such as human and mouse, or C. elegans and C. briggsae (Lagos-Quintana et al., 2003; Lim et al., 2003a, 2003b). This statement remains true even when ignoring evolutionary conservation as a criterion for classifying clones as miRNAs. Many are also conserved more broadly among the animal lineages (Ambros et al., 2003b; Aravin et al., 2003; Lagos-Quintana et al., 2003; Lim et al., 2003a). For instance, more than a third of the C. elegans miRNAs have easily recognized homologs among the human miRNAs (Lim et al., 2003a). When comparing distant lineages, considerable expansion or contraction of gene families is apparent, the most striking example being the let-7 family, which has four identified members in C. elegans and at least 15 in human, but only one in Drosophila (Pasquinelli et al., 2000; Aravin et al., 2003; Lai et al., 2003; Lim et al., 2003a).

Genomics: miRNA Expression

Many miRNAs have intriguing expression patterns. For example, paralogs and orthologs of the C. elegans lin-4 and let-7 RNAs have stage-specific expression in development as if they, too, function as stRNAs (Pasquinelli et al., 2000; Lau et al., 2001; Lagos-Quintana et al., 2002; Bashirullah et al., 2003; Lim et al., 2003a). Other interesting examples include miR-1, which is primarily found in the mammalian heart (Lee and Ambros, 2001; Lagos-Quintana et al., 2002); miR-122, which is primarily in the liver (Lagos-Quintana et al., 2002); miR-223, which is primarily in the granulocytes and macrophages of mouse bone marrow (Chen et al., 2004); miRNAs of the mir-35-mir-42 cluster, which are preferentially in the C. elegans embryo (Lau et al., 2001); and those of the mir-290-mir-295 cluster, which are expressed in mouse embryonic stem cells but not in differentiated cells (Houbaviy et al., 2003). Expression array technology has been adapted to examine miRNAs and has revealed distinct expression patterns in different developmental stages or regions of the mammalian brain (Krichevsky et al., 2003). With all the different genes and expression patterns, it is reasonable to propose that every metazoan cell type at each developmental stage might have a distinct miRNA expression profile, providing ample opportunity for "micromanaging" the output of the transcriptome.

Another remarkable aspect of miRNA expression is the sheer abundance of certain miRNAs in the cells. For example, miR-2, miR-52, and miR-58 are each present on average at more than 50,000 molecules per adult worm cell, a greater abundance than the U6 snRNA of the spliceosome (Lim et al., 2003a). Whether this high expression is attributable to very robust transcription or to slow decay is not yet known. Some miRNAs are expressed at much lower levels. For instance, miR-124 is present in the adult worm on average at 800 molecules per cell (Lim et al., 2003a). This lower average level (though still higher than that of the typical mRNA) might be due to low expression in many cells or high expression in just a few cells. The finding that the mouse ortholog of miR-124 is nearly exclusively expressed in the brain supports the latter explanation (Lagos-Quintana et al., 2002).

Genomics: Computational Approaches and Gene Number

There has been some speculation as to why miRNAs were not discovered earlier; the answer is clearly not that they are rare. MicroRNAs and their associated proteins appear to be one of the more abundant ribonucleoprotein complexes in the cell. Nonetheless, miRNAs whose expression is restricted to nonabundant cell types or specific environmental conditions could still be missed in cloning efforts. Thus, computational approaches have been developed to complement experimental approaches to miRNA gene identification. From early on, homology searches have revealed orthologs and paralogs of known miRNA genes (Pasquinelli et al., 2000; Lagos-Quintana et al., 2001; Lau et al., 2001; Lee and Ambros, 2001). Another simple approach has been to search the vicinity of known miRNA genes for other stem loops that might represent additional genes of a genomic cluster (Lau et al., 2001; Aravin et al., 2003; Seitz et al., 2003; Ohler et al., 2004). This strategy is important because some of the most rapidly evolving miRNA genes are present as tandem arrays within operon-like clusters, and the divergent sequences of these genes make them relatively difficult to spot using the more general approaches.

Gene-finding approaches that do not depend on homology or proximity to known genes have also been developed and applied to entire genomes (Ambros et al., 2003b; Grad et al., 2003; Lai et al., 2003; Lim et al., 2003a). They typically start by identifying conserved genomic segments that both fall outside of predicted protein-coding regions and potentially could form stem loops and then score these candidate miRNA stem loops for the patterns of conservation and pairing that characterize known miRNA genes. So far, the two most sensitive computational scoring tools are MiRscan, which has been systematically applied to nematode and vertebrate candidates (Lim et al., 2003a, 2003b), and miRseeker, which has been systematically applied to insect candidates (Lai et al., 2003). Both MiRscan and miRseeker have identified dozens of genes that were subsequently (or concurrently) verified experimentally. Because of their relatively high sensitivity, MiRscan and miRseeker have also enabled reasonably firm estimates of the number of miRNA genes in the genomes of human (200-255 miRNA genes; Lim et al., 2003b), C. elegans (103-120 genes; Lim et al., 2003a; Ohler et al., 2004), and Drosophila (96-124 genes; Lai et al., 2003). In each species, these numbers represent nearly 1% of the predicted genes in the genome, a fraction similar to that of other large gene families with regulatory roles, such as the homeodomain transcription-factor family.

These estimates imply that the majority of miRNA genes have now been found in the mammalian and nematode lineages, particularly in C. elegans, where approximately 100 miRNA genes have been identified. (This tally is conservative in that it excludes some reported genes that appear to be questionable [Ohler et al., 2004].) In Drosophila, 77 genes, representing 71 unique miRNAs, have been reliably identified (Aravin et al., 2003; Lai et al., 2003), and in humans, approximately 175 genes, representing approximately 145 unique miRNAs, have either been validated in human cells or identified based on their homology to genes validated in mouse or zebrafish (miRNA Registry, release 3.0; Griffiths-Jones, 2004). When considering the number of miRNAs remaining to be identified or validated in these species, it is important to remember that gene number estimates by MiRscan and miRseeker rest on the assumption that the stem loops of the rare, difficult-to-clone miRNAs will show patterns of conservation and pairing resembling those of the abundant, easily cloned miRNAs. This assumption appears to hold for C. elegans, for which there was a reassuring lack of correlation between the number of times an miRNA was cloned and its MiRscan score (Lim et al., 2003a).

If instead a disproportionate number of difficult-to-clone miRNAs are also difficult to identify computationally, then estimates of the number of miRNA genes in the genome will be too low. This might be the situation in humans, perhaps because the vertebrate genomes used in the analysis are more highly diverged. Most of the first 109 miRNAs cloned from mammals have readily identifiable homologs in the genome of pufferfish (Fugu rubripes), which enabled MiRscan analysis to identify 81 (74%) of these genes by scoring stem loops conserved in human, mouse, and fish (Lim et al., 2003b). Extrapolating from this sensitivity and the number of additional candidates with scores matching the known miRNAs, an upper bound on the number of human miRNA genes was calculated to be 255 (Lim et al., 2003b). However, more recently identified mammalian miRNA genes appear relatively less likely to be conserved in fish, particularly those genes cloned from embryonic stem cells and mammalian brain and the 14 miRNA candidates residing in a large imprinted cluster (Houbaviy et al., 2003; Kim et al., 2003; Seitz et al., 2003). These recent data suggest that the more difficult-to-clone mammalian miRNAs are less likely to be conserved in fish and thus less likely to have been identified computationally. This implies that a confident upper bound on the number of human genes is difficult to determine using analyses that extended to fish and that 255 is too low a value for this upper bound, although it still might exceed the actual number of human miRNA genes.
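The sensitivity figure quoted above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it does not reproduce the published derivation of the upper bound of 255, because the candidate counts used in that calculation are not given in this section. It shows only the general idea of dividing a detected count by the detection sensitivity.

```python
def sensitivity(n_detected, n_known):
    """Fraction of a known reference set that the method recovers."""
    return n_detected / n_known


def extrapolate_total(n_detected, sens):
    """Estimate the size of the full set from the detected count,
    assuming undetected members are missed at the same rate (an assumption)."""
    return n_detected / sens


# Counts quoted in the text: 81 of the first 109 cloned mammalian miRNAs
# were identified by scoring conserved stem loops.
s = sensitivity(81, 109)
print(f"sensitivity = {s:.0%}")

# Sanity check: extrapolating from the 81 detected genes recovers the 109 known genes.
print(round(extrapolate_total(81, s)))
```

The extrapolation is only as good as its uniform-miss-rate assumption, which is exactly the assumption the paragraph above calls into question for miRNAs that are poorly conserved in fish.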

Genomics: miRNAs in Plants

Cloning of small RNAs from plants has also revealed miRNAs, although the multitude of other 21 to 24 nt RNAs found in plants sometimes complicated their initial classification (Llave et al., 2002a; Mette et al., 2002; Park et al., 2002; Reinhart et al., 2002). Like the metazoan miRNAs, the plant miRNAs (1) are endogenously expressed ~22 nt RNAs potentially processed from one arm of foldback precursors, (2) are generally conserved in evolution, and (3) come from regions of the genome distinct from previously annotated genes (Reinhart et al., 2002). To date, 20 unique Arabidopsis miRNAs have been reported; a few are closely related to each other, and thus the reported genes represent 15 distinct miRNA families (Bartel and Bartel, 2003; Palatnik et al., 2003). Because some could be derived from multiple genomic loci, the 20 miRNAs could represent more than 40 Arabidopsis genes. The homology searches based on the cloned genes also reveal numerous potential paralogs with a point substitution or two in the predicted miRNA. Additional gene families are likely to be found when the cloning of small plant RNAs is scaled up and computational gene-finding methods are extended to plants. It appears that, as in animals, a substantial fraction of the gene regulatory molecules in plants could be RNA rather than protein.

The discovery of miRNAs in both plants and animals suggests that this class of noncoding RNAs has been modulating gene expression since at least the last common ancestor of these lineages (Reinhart et al., 2002). Nonetheless, plant and animal miRNAs differ in some aspects, which appear to be related to differences in their biogenesis. The most notable differences are in the miRNA stem loops; the plant predicted foldbacks are much more variable in size and typically larger than those of animals (Figure 1; for a more comprehensive look at plant miRNA predicted stem loops, see online supplemental material of Reinhart et al., 2002). More subtle differences include somewhat more pairing between the miRNA and the other arm of the stem loop in plants compared to animals, a tighter distribution of plant miRNA lengths that centers on 21 nt rather than the 22-23 nt lengths most often seen in animals, and perhaps a stronger preference for a U at the 5' terminus of the plant miRNAs (Lau et al., 2001; Reinhart et al., 2002; Bartel and Bartel, 2003). These differences, together with the absence of reports that particular miRNA genes are conserved between plants and animals, leave open the prospect that miRNA genes arose independently in each of these multicellular lineages, after their last common ancestor (which is thought to have been unicellular). Even in this scenario of dual origins, the presence of miRNAs in all plant and animal species examined thus far suggests early origins in both lineages, perhaps preceding and facilitating the developmental patterning needed for multicellular body plans.

Biogenesis: miRNA Transcription

A 693 bp genomic fragment rescues the lin-4 deficiency, implying that all the elements required for the regulation and initiation of transcription are located in this short fragment (Lee et al., 1993). However, little is known regarding these transcriptional processes for lin-4 or any other miRNA gene. Some miRNAs residing in introns are likely to share their regulatory elements and primary transcript with their pre-mRNA host genes. For the remaining miRNA genes, presumably transcribed from their own promoters, no primary transcripts have been fully defined. Nonetheless, these primary miRNA transcripts, called pri-miRNAs (Lee et al., 2002), are generally thought to be much longer than the conserved stem loops currently used to define miRNA genes, as suggested by the following: (1) the idea that clustered miRNA stem loops are transcribed from a single primary transcript (Lagos-Quintana et al., 2001; Lau et al., 2001), (2) matches between miRNAs and lengthy ESTs in the databases (Lagos-Quintana et al., 2002; Aukerman and Sakai, 2003), and (3) RT-PCR experiments amplifying large fragments of the pri-miRNAs (Lee et al., 2002; Aravin et al., 2003).

The two candidate RNA polymerases for pri-miRNA transcription are pol II and pol III. Pol II produces the mRNAs and some noncoding RNAs, including the small nucleolar RNAs (snoRNAs) and four of the small nuclear RNAs (snRNAs) of the spliceosome, whereas pol III produces some of the shorter noncoding RNAs, including tRNAs, 5S ribosomal RNA, and the U6 snRNA. The miRNAs processed from the introns of protein-coding host genes are undoubtedly transcribed by pol II. The following observations provide indirect evidence that many of the other miRNAs also are pol II products, even though most of the metazoan miRNA genes do not have the classical signals for polyadenylylation (Ohler et al., 2004): (1) The pri-miRNAs can be quite long, more than 1 kb, which is longer than typical pol III transcripts. (2) These presumed pri-miRNAs often have internal runs of uridine residues, which would be expected to prematurely terminate pol III transcription. (3) Many miRNAs are differentially expressed during development, as is observed often for pol II but not pol III products. (4) Fusions that place the open reading frame of a reporter protein downstream from the 5' portion of miRNA genes lead to robust reporter protein expression, suggesting that miRNA primary transcripts are capped pol II transcripts. Examples of such fusions include artificial reporter constructs designed to investigate the regulation of miRNA expression (Johnson et al., 2003; Johnston and Hobert, 2003) and a natural chromosome translocation linked to an aggressive B cell leukemia, in which a truncated MYC gene is fused to the 5' portion of mir-142 (Gauwerky et al., 1989; Lagos-Quintana et al., 2002). Although these observations indicate that many miRNAs are pol II transcripts, others might still be pol III transcripts, just as most but not all snRNAs are pol II products. Ectopic expression of miR-142 and other miRNAs from a pol III promoter produces efficiently and precisely processed miRNAs that function in vivo (Chen et al., 2004), indicating that there is no obligate link between the identity of the polymerase and downstream miRNA processing or function.

Biogenesis: miRNA Maturation

The current model for maturation of the mammalian miRNAs is shown in Figure 2B. The first step is the nuclear cleavage of the pri-miRNA, which liberates a ~60-70 nt stem loop intermediate, known as the miRNA precursor, or the pre-miRNA (Lee et al., 2002; Zeng and Cullen, 2003). This processing is performed by the Drosha RNase III endonuclease, which cleaves both strands of the stem at sites near the base of the primary stem loop (Lee et al., 2003) (Figure 2B, step 2). Drosha cleaves the RNA duplex with a staggered cut typical of RNase III endonucleases, and thus the base of the pre-miRNA stem loop has a 5' phosphate and ~2 nt 3' overhang (Basyuk et al., 2003; Lee et al., 2003). This pre-miRNA is actively transported from the nucleus to the cytoplasm by Ran-GTP and the export receptor Exportin-5 (Yi et al., 2003; Lund et al., 2004) (Figure 2B, step 3).

The nuclear cut by Drosha defines one end of the mature miRNA. The other end is processed in the cytoplasm by the enzyme Dicer (Lee et al., 2003). Dicer, also an RNase III endonuclease, was first recognized for its role in generating the small interfering RNAs (siRNAs) that mediate RNA interference (RNAi) (Bernstein et al., 2001) and was later shown to play a role in miRNA maturation (Grishok et al., 2001; Hutvágner et al., 2001; Ketting et al., 2001). According to the current model of miRNA maturation, Dicer performs an activity in metazoan miRNA maturation similar to that which it performs when chopping up double-stranded RNA during RNAi: It first recognizes the double-stranded portion of the pre-miRNA, perhaps with particular affinity for a 5' phosphate and 3' overhang at the base of the stem loop. Then, at about two helical turns away from the base of the stem loop, it cuts both strands of the duplex. This cleavage by Dicer lops off the terminal base pairs and loop of the pre-miRNA, leaving the 5' phosphate and ~2 nt 3' overhang characteristic of an RNase III and producing an siRNA-like imperfect duplex that comprises the mature miRNA and a similar-sized fragment derived from the opposing arm of the pre-miRNA (Figure 2B, step 4).

The fragments from the opposing arm, called the miRNA* sequences (Lau et al., 2001), are found in libraries of cloned miRNAs but typically at much lower frequency than are the miRNAs (Lagos-Quintana et al., 2002; Aravin et al., 2003; Lim et al., 2003a). For example, in an effort that identified over 3400 clones representing 80 C. elegans miRNAs, only 38 clones representing 14 miRNAs* were found (Lim et al., 2003a). This approximately 100-fold difference in cloning frequency indicates that the miRNA:miRNA* duplex is generally short-lived compared to the miRNA single strand.
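As a quick check on the fold difference quoted above, the clone counts can be compared directly. This is a back-of-the-envelope sketch, not part of the original analysis; the function name is hypothetical, and because the text gives the miRNA count only as "over 3400", the computed ratio is a lower bound.

```python
def fold_difference(numerator_clones, denominator_clones):
    """Ratio of two clone counts."""
    return numerator_clones / denominator_clones


# Counts quoted in the text (Lim et al., 2003a).
MIRNA_CLONES = 3400       # "over 3400", so treat as a minimum
MIRNA_STAR_CLONES = 38

ratio = fold_difference(MIRNA_CLONES, MIRNA_STAR_CLONES)
print(f"miRNA clones outnumber miRNA* clones by at least {ratio:.0f}-fold")
```

The result, a little under 90-fold, is consistent with the "approximately 100-fold" description given that the miRNA clone count is a minimum.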

According to the current model, the specificity of the initial cleavage mediated by Drosha determines the correct register of cleavage within the miRNA precursor and thus defines both mature ends of the miRNA (Lee et al., 2003). This idea that Drosha, not Dicer, imparts the specificity is appealing because studies have shown that generic double-stranded RNA is refractory to Drosha cleavage and that Dicer progressively chops up an RNA double strand, irrespective of its sequence (Zamore et al., 2000; Bernstein et al., 2001; Elbashir et al., 2001a; Zhang et al., 2002). The determinants of Drosha recognition are largely undefined but include the secondary structure at the base of the primary stem loop as well as some elements flanking the stem loop but generally within 125 nt of the miRNA (Lee et al., 2003; Chen et al., 2004).

This stepwise scenario for miRNA maturation is based primarily on the investigation of mammalian Drosha and Dicer function (Lee et al., 2002, 2003). The notion that it applies to other metazoan species is supported by the identity of the long form of the C. elegans lin-4 RNA, which appears to be an excellent match (within the resolution of nuclease mapping) to that expected for the lin-4 pre-miRNA (Lee et al., 1993). Furthermore, presumed pre-miRNAs for numerous miRNAs can be detected on Northern blots, and when examined in the context of reduced Dicer activity, these pre-miRNAs invariably increase in abundance, as would be expected if Dicer was responsible for their processing (Grishok et al., 2001; Hutvágner et al., 2001; Ketting et al., 2001; Lee and Ambros, 2001; Lim et al., 2003a). Finally, the general existence of the miRNA:miRNA* duplex is supported by the cloning of numerous miRNAs* in nematodes and flies, although for most miRNA genes, an experimentally identified miRNA* has not yet been reported.

The cloning of a few miRNAs* in plants also points to a transient miRNA:miRNA* duplex (Reinhart et al., 2002). However, the biogenesis of this duplex appears to differ in plants (Figure 2A). Most notably, pre-miRNAs have not been compellingly detected in plants, not even in plants with crippled DCL1, a Dicer-like protein known to assist in miRNA maturation (Reinhart et al., 2002). The lack of pre-miRNA in these dcl1-9 plants (formerly known as caf-1 plants), together with the apparent nuclear localization of the DCL1 protein (Papp et al., 2003), suggests that DCL1 provides the Drosha functionality in plants, making the first cut that sets the register for miRNA maturation (Figure 2A, step 2). DCL1 (or another enzyme yet to be identified) then makes the second cut, which corresponds to metazoan Dicer cleavage, before the miRNA leaves the nucleus (Figure 2A, step 3). A coupled second cut in the nucleus would explain why pre-miRNA-like RNAs do not accumulate to detectable levels in plants. It would also explain why ectopic nuclear but not cytoplasmic expression of P19, a plant viral protein that inhibits silencing by sequestering siRNA duplexes, prevents miRNA accumulation (Papp et al., 2003). Perhaps HASTY, the plant ortholog of Exportin-5, is responsible for exporting the miRNA:miRNA* duplex from the nucleus, which would explain the pleiotropic developmental phenotypes of hasty mutants (Bollman et al., 2003; Yi et al., 2003; Lund et al., 2004) (Figure 2A, step 4).

Biogenesis: RISC Assembly

Following cleavage and nucleocytoplasmic export, the miRNA pathway of plants and animals appears to be biochemically indistinguishable from the central steps of RNA silencing pathways known as posttranscriptional gene silencing (PTGS) in plants, quelling in fungi, and RNAi in animals. Indeed, understanding miRNA biogenesis and function has been greatly facilitated by analogy and contrast to the siRNAs of RNAi, and vice versa. In light of these biochemical connections, the discovery of lin-4 and its regulation of lin-14 can be considered in hindsight as the first characterization of an RNAi-like phenomenon in animals.

To illustrate the commonality between miRNAs and siRNAs, the RNAi pathway is briefly outlined here (and depicted in Figure 2C). The pathway begins with long double-stranded RNA, either a bimolecular duplex or an extended hairpin, that either is artificially introduced into the cell or animal during a gene knockdown experiment (Fire et al., 1998) or is naturally generated: from sense and antisense genomic transcripts, or perhaps from the activity of a cellular RNA-dependent RNA polymerase (found in plants, fungi, and nematodes, but not flies or mammals), or as an intermediate of viral replication (Cogoni and Macino, 1999; Ketting et al., 1999; Dalmay et al., 2000; Mourrain et al., 2000; Smardon et al., 2000; Aravin et al., 2001, 2003; Li et al., 2002). The double-stranded RNA is processed by Dicer into many ~22 nt siRNAs (Hamilton and Baulcombe, 1999; Hammond et al., 2000; Parrish et al., 2000; Zamore et al., 2000; Grishok et al., 2001; Ketting et al., 2001; Knight and Bass, 2001) (Figure 2C, steps 2-4). Although these siRNAs are initially short double-stranded species with 5' phosphates and 2 nt 3' overhangs characteristic of RNase III cleavage products, they eventually become incorporated as single-stranded RNAs into a ribonucleoprotein complex, known as the RNA-induced silencing complex (RISC) (Hammond et al., 2000; Elbashir et al., 2001a, 2001b; Nykanen et al., 2001; Martinez et al., 2002; Schwarz et al., 2002) (Figure 2C, step 6). The RISC identifies target messages based on perfect (or nearly perfect) complementarity between the siRNA and the mRNA, and then the endonuclease of the RISC cleaves the mRNA at a site near the middle of the siRNA complementarity, measuring from the 5' end of the siRNA and cutting between the nucleotides pairing to residues 10 and 11 of the siRNA (Elbashir et al., 2001a, 2001b). Similar pathways have been proposed for gene silencing in plants and fungi (Hamilton and Baulcombe, 1999; Vance and Vaucheret, 2001; Pickford et al., 2002).
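The positional rule for target cleavage described above (cutting between the target nucleotides that pair to residues 10 and 11 of the siRNA, counted from the siRNA 5' end) can be made concrete with a short sketch. It is a simplification under stated assumptions: it considers only perfect Watson-Crick complementarity, although the text allows nearly perfect matches, and the function names and the example sequences are hypothetical illustrations rather than data from the cited studies.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence written 5'->3'."""
    return rna.translate(COMPLEMENT)[::-1]


def cut_index(guide, target):
    """Index in `target` (5'->3') where cleavage would occur, or None if there
    is no perfectly complementary site (a simplifying assumption)."""
    start = target.find(reverse_complement(guide))
    if start == -1:
        return None
    # Guide residue i (1-based) pairs with target index start + len(guide) - i,
    # so the bond between the partners of residues 11 and 10 lies just before
    # target index start + len(guide) - 10.
    return start + len(guide) - 10


def cleave(guide, target):
    """Split the target into its 5' and 3' cleavage fragments."""
    cut = cut_index(guide, target)
    if cut is None:
        return None
    return target[:cut], target[cut:]


# Hypothetical 21 nt guide and a target carrying one perfectly matched site.
guide = "UAUAGCAGCAGCAGCGGCCUU"
target = "AAAA" + reverse_complement(guide) + "CCCC"
five_prime_fragment, three_prime_fragment = cleave(guide, target)
print(len(five_prime_fragment), len(three_prime_fragment))
```

In this example the 3' fragment contains the ten target nucleotides that paired with guide residues 1 through 10, plus the four downstream flanking nucleotides.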

The RISC has been purified from fly and human cells and in both cases contains a member of the Argonaute protein family, which is thought to be a core component of the complex (Hammond et al., 2001; Hutvágner and Zamore, 2002; Martinez et al., 2002). This fits nicely with previous genetic data showing that Argonaute proteins RDE-1, QDE-2, and AGO1 are crucial for RNAi and analogous processes in worms, fungi, and plants, respectively (Tabara et al., 1999; Catalanotto et al., 2000; Fagard et al., 2000). Argonaute and its homologs are approximately 100 kDa proteins that are sometimes called PPD proteins because they all share the PAZ and PIWI domains (Cerutti et al., 2000). The PAZ domain (first recognized in Piwi, Argonaute, and Zwille/Pinhead proteins) has a stable fold when isolated from the rest of the protein, which has a β barrel core that together with a side appendage appears to bind weakly to single-stranded RNAs at least 5 nt in length and also to double-stranded RNA (Lingel et al., 2003; Song et al., 2003; Yan et al., 2003). This dual binding ability suggests that the Argonaute protein could be directly associated with the siRNA before and after it recognizes the mRNA target.

Other RISC-associated proteins include the suspected RNA binding proteins VIG and Fragile X-related protein and the nuclease Tudor-SN, none of which have defined roles in the RISC (Caudy et al., 2002, 2003; Ishizuka et al., 2002). These proteins do not copurify with RISC in all purification schemes and their stoichiometry in RISC has not been established. Perhaps they are also core components of the RISC that do not remain associated during some purification methods. Alternatively, they could be accessory factors that modify the specificity or function of the core complex. The notion that RISC comes in different subtypes is already supported by the number of Argonaute family members found in different species, ranging up to 24 in C. elegans, and the preferential genetic or biochemical association of different family members with different types of silencing RNAs (Grishok et al., 2001; Caudy et al., 2002; Zilberman et al., 2003). The RISC endonuclease, known as Slicer, has not been identified, suggesting that it might be present in sub-stoichiometric amounts and only recruited after the other components of RISC have found a suitable match to the siRNA. Another possibility is that one of the identified RISC components provides the Slicer activity by means of an unrecognized nuclease domain.

MicroRNAs were first reported to reside in the miRNA ribonucleoprotein complex (miRNP), which in humans includes the proteins eIF2C2, the helicase Gemin3, and Gemin4 (Mourelatos et al., 2002). eIF2C2 is a human Argonaute homolog and was later found to be a constituent of the human siRNA-programmed RISC (Martinez et al., 2002). Furthermore, the human let-7 miRNA is associated with eIF2C2 and capable of specifying cleavage of an artificial target with perfect complementarity to the miRNA (Hutvagner and Zamore, 2002). Thus the miRNP possesses the salient properties that define the RISC (Hutvagner and Zamore, 2002), and although it might later be shown to represent a particular subtype of RISC, it is referred to as a RISC in this review. This perspective is further supported by the demonstration that plant miRNAs can direct cleavage of their natural targets (Llave et al., 2002b; Tang et al., 2003) and that siRNAs originally designed to specify cleavage can also mediate translational repression (Doench et al., 2003; Zeng et al., 2003).

When the miRNA strand of the miRNA:miRNA* duplex is loaded into the RISC, the miRNA* appears to be peeled away and degraded. What then is the mechanism for choosing which of the two strands enters the RISC? The answer largely lies in the relative stability of the two ends of the duplex: for both siRNA and miRNA duplexes, the strand that enters the RISC is nearly always the one whose 5' end is less tightly paired (Khvorova et al., 2003; Schwarz et al., 2003). This observation suggests that a helicase-like enzyme (yet to be identified) samples the ends of the duplex multiple times, usually releasing the end before beginning to productively unwind the duplex but occasionally unwinding the duplex, resulting in a strong bias for productive unwinding at the easier end (Khvorova et al., 2003; Schwarz et al., 2003) (Figures 2A-2C, steps 5). This elegant rule for predicting which strand of the duplex will enter the RISC was initially formulated based on observations and experiments in animal systems, but it also applies to plant siRNAs (Khvorova et al., 2003) and plant miRNAs. Its predictive value for the vast majority of plant and animal miRNAs strongly implies the existence of the miRNA:miRNA* duplex as a transient intermediate in the biogenesis of all miRNAs, even those for which a miRNA* has not yet been cloned. For a few vertebrate and insect genes, both strands of the miRNA duplex accumulate at frequencies suggesting that both enter the RISC, raising the prospect that either or both might be functional (Lagos-Quintana et al., 2002; Krichevsky et al., 2003; Schwarz et al., 2003). These rare cases can be reconciled with the asymmetric loading of the RISC because the two ends of these duplexes have nearly equivalent stabilities; for each RISC assembled, the helicase loads only one strand of each duplex but chooses each strand with similar frequency (Schwarz et al., 2003).
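The asymmetry rule described above can be illustrated with a toy calculation. The sketch below is not the thermodynamic analysis used in the cited studies; it assumes a duplex with 2 nt 3' overhangs and scores each 5' end with a crude, made-up pairing score (G:C = 3, A:U = 2, anything else = 0) over the four terminal base pairs. The function names, the window of four pairs, and the scores are all illustrative assumptions.

```python
# Crude pairing scores (an assumption for illustration only): G:C counts 3,
# A:U counts 2, and every other combination, including G:U wobbles, counts 0.
PAIR_SCORE = {("G", "C"): 3, ("C", "G"): 3, ("A", "U"): 2, ("U", "A"): 2}


def five_prime_end_score(strand, partner, window=4):
    """Sum the pairing scores of the `window` base pairs at the 5' end of `strand`.

    Both strands are written 5'->3', have equal length, and are assumed to
    form a duplex with 2 nt 3' overhangs, so strand[i] pairs with
    partner[len(partner) - 3 - i].
    """
    n = len(partner)
    return sum(
        PAIR_SCORE.get((strand[i], partner[n - 3 - i]), 0) for i in range(window)
    )


def predicted_guide(strand_a, strand_b):
    """Return the strand whose 5' end is less tightly paired, or None on a tie."""
    score_a = five_prime_end_score(strand_a, strand_b)
    score_b = five_prime_end_score(strand_b, strand_a)
    if score_a < score_b:
        return strand_a
    if score_b < score_a:
        return strand_b
    return None  # Comparable end stabilities: either strand might be loaded.
```

A tie (`None`) corresponds to the rare cases discussed above in which both strands accumulate. A realistic predictor would presumably replace the toy scores with measured nearest-neighbor free energies.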

Mechanism: mRNA Cleavage

MicroRNAs can direct the RISC to downregulate gene expression by either of two posttranscriptional mechanisms: mRNA cleavage or translational repression (Figures 3A and 3B). According to the prevailing model, the choice of posttranscriptional mechanisms is not determined by whether the small silencing RNA originated as an siRNA or a miRNA but instead is determined by the identity of the target: Once incorporated into a cytoplasmic RISC, the miRNA will specify cleavage if the mRNA has sufficient complementarity to the miRNA, or it will repress productive translation if the mRNA does not have sufficient complementarity to be cleaved but does have a suitable constellation of miRNA complementary sites (Hutvagner and Zamore, 2002; Zeng et al., 2002, 2003; Doench et al., 2003). Although this model is generally supported by experimental tests, highly functional siRNAs and metazoan miRNAs have sequence-composition differences centering at positions 12 and 13, which might point to inherent differential sequence preferences for the two respective modes of repression (Khvorova et al., 2003). Furthermore, a perplexing observation has come from the study of a plant miRNA, miR172, which appears to regulate APETALA2 via translational repression despite the near-perfect complementarity between the miRNA and its single complementary site in the APETALA2 ORF (Aukerman and Sakai, 2003; Chen, 2003).

When a miRNA guides cleavage, the cut is at precisely the same site as that seen for siRNA-guided cleavage, i.e., between the nucleotides pairing to residues 10 and 11 of the miRNA (Elbashir et al., 2001a; Hutvagner and Zamore, 2002; Llave et al., 2002b; Kasschau et al., 2003). The register of cleavage does not change when the miRNA is not perfectly paired to the target at its 5' terminus (Kasschau et al., 2003; Palatnik et al., 2003). Therefore, the cut site appears to be determined relative to miRNA residues, not miRNA:target base pairs. After cleavage of the mRNA, the miRNA remains intact and can guide the recognition and destruction of additional messages (Hutvagner and Zamore, 2002; Tang et al., 2003).
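The position of the cut can be made concrete with a short sketch. It assumes, for simplicity, a target containing a perfectly complementary site (the text above notes that the register also holds with imperfect 5' pairing, which this toy version does not handle). The function names are illustrative, and sequences are written 5' to 3' in the RNA alphabet.

```python
# Watson-Crick complement table for RNA.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence written 5'->3'."""
    return rna.translate(COMPLEMENT)[::-1]


def cleavage_fragments(mirna, target):
    """Split `target` where a miRNA-guided cut would be expected.

    Assumes the target contains a perfectly complementary site. Because the
    two strands pair antiparallel, miRNA residue i (1-based, from the 5' end)
    pairs with target index start + len(mirna) - i (0-based). The cut falls
    between the target nucleotides paired to miRNA residues 11 and 10.
    Returns (5' fragment, 3' fragment), or None if no site is found.
    """
    site = reverse_complement(mirna)
    start = target.find(site)
    if start == -1:
        return None
    cut = start + len(mirna) - 10  # index of the nucleotide paired to residue 10
    return target[:cut], target[cut:]
```

With this convention, the 3' fragment begins with the ten target nucleotides that pair to miRNA residues 10 through 1, which is the end that would be recovered when cleavage products are cloned and sequenced.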

Mechanism: Translational Repression

From the beginning, it was proposed that lin-4 RNA specifies the translational repression of C. elegans lin-14 mRNA. This is the simplest interpretation of the observation that lin-4 RNA expression coincides with a drop in LIN-14 protein without a change in lin-14 mRNA (Wightman et al., 1993). The surprise came later, when it was shown that the polysome profile of lin-14 mRNA at the first larval stage is indistinguishable from that at later larval stages, when LIN-14 protein levels have dropped (Olsen and Ambros, 1999). The same is true for lin-28 mRNA, another message targeted by lin-4 RNA (Seggerson et al., 2002). Two possibilities were put forward to explain these results (Olsen and Ambros, 1999). The lin-4 RNA might repress translation at a step after translation initiation, in a manner that does not perceivably alter the density of the ribosomes on the message, e.g., by the slowing or stalling of all the ribosomes on the message. An alternative possibility is that translation continues at the same rate but is nonproductive because the newly synthesized polypeptide is specifically degraded. In this review, both of these mechanistic possibilities are lumped together as translational repression, as is common practice, even though in the second possibility polypeptide synthesis per se is not repressed. A better mechanistic understanding of lin-4-specified translational repression awaits the development of an in vitro system that faithfully recapitulates lin-4 regulation of its targets.

Extending the analysis of polysome profiles beyond C. elegans lin-4 regulation will be important for learning whether the postinitiation mechanism applies more generally to translational repression mediated by other miRNAs. Indeed, evidence for translational repression of any metazoan miRNA targets other than those of lin-4 is scant because the fate of the messenger RNA during miRNA-mediated regulation has not yet been monitored for these non-lin-4 targets. Nonetheless, several indirect lines of evidence support the notion that metazoan miRNAs other than lin-4 RNA typically mediate translational repression rather than mRNA cleavage: First, other metazoan miRNAs, as well as siRNAs, can repress the expression of heterologous reporter transcripts without decreasing mRNA levels, if these messages contain either the natural miRNA complementary sites from the miRNA target (Brennecke et al., 2003) or multiple artificial complementary sites that have bulges or mismatches at their center when paired to the miRNA, such that the pattern of base pairing resembles that found between the let-7 RNA and its natural complementary sites in the C. elegans lin-41 3' UTR (Zeng et al., 2002, 2003; Doench et al., 2003). Second, the let-7-programmed RISC endogenous to human cells does not cleave an RNA fragment containing the let-7 complementary sites found in C. elegans lin-41 (Hutvagner and Zamore, 2002). Third, there is a difference between plants and animals with regard to the extent of complementarity between the miRNAs and mRNAs (Rhoades et al., 2002). Because near-perfect complementarity is thought to be required for RISC-mediated cleavage but not translational repression, the lower degree of complementarity seen in animals suggests that translational repression is more prevalent in animals than in plants. Nonetheless, it would be premature to conclude that more metazoan miRNA regulatory targets are translationally inhibited than are cleaved. Surprisingly little complementarity appears to be needed to specify detectable RISC-mediated cleavage in mammalian cells (Jackson et al., 2003), suggesting that it will not be long before natural examples of miRNA-directed mRNA cleavage will be reported in animals.

The cooperative action of multiple RISCs appears to provide the most efficient translational inhibition (Doench et al., 2003). This explains the presence of multiple miRNA complementary sites in most genetically identified targets of metazoan miRNAs (Lee et al., 1993; Wightman et al., 1993; Reinhart et al., 2000; Abrahante et al., 2003; Lin et al., 2003). The computationally identified metazoan targets also have multiple sites, but this pattern is uninformative because the presence of multiple sites was a criterion for their identification (Brennecke et al., 2003; Lewis et al., 2003; Stark et al., 2003). Although only a small fraction of the miRNA:mRNA regulatory pairs are known in any animal, there are already instances in which different miRNA species have been proposed to regulate the same targets (Reinhart et al., 2000; Abrahante et al., 2003; Lin et al., 2003). These examples, and the analogy to other biological regulatory systems, most notably transcriptional regulation, have led to the general expectation that as the list of known metazoan miRNA:mRNA regulatory interactions becomes more comprehensive, combinatorial control will be seen to be common, if not the norm.

The complementary sites for the known metazoan targets reside in the 3' UTRs. This bias might reflect a mechanistic preference, perhaps enabling the bound complexes to avoid the mRNA-clearing activity of the ribosome. After all, numerous other examples of eukaryotic translation regulation are mediated through 3' UTR elements (Kuersten and Goodwin, 2003). Alternatively, it might reflect a bias in the way that metazoan miRNA targets and complementary sites are discovered: The lin-4:lin-14 precedent might have directed subsequent searches to the 3' UTRs, and conserved complementary sites are easier to distinguish in the UTRs, away from the confounding sequence conservation of the ORFs. The reported siRNA-mediated translational repression from a single imperfect complementary site in the ORF of a mammalian reporter construct (Saxena et al., 2003) illustrates why it would be premature to conclude that most metazoan miRNA regulation is mediated through multiple complementary sites in the 3' UTRs.

Among the dozens of miRNA-target relationships that have been examined, there has been no evidence for miRNAs directing upregulation of gene expression. These findings are consistent with the idea that miRNAs are all acting within a silencing complex, namely the RISC. Even if miRNAs are limited to functioning within RISC complexes, there is still the prospect that some miRNAs might specify more than just posttranscriptional repression; some might target DNA for transcriptional silencing (Figure 3C). Argonaute proteins and siRNAs are associated with DNA methylation and silencing in plants (Mette et al., 2000; Hamilton et al., 2002; Zilberman et al., 2003), heterochromatin formation in fungi (Hall et al., 2002; Reinhart and Bartel, 2002; Volpe et al., 2002), and DNA rearrangements in ciliates (Mochizuki et al., 2002). Each of these examples suggests the existence of a nuclear RISC-like complex. If miRNAs are not involved in DNA silencing, it will be interesting to learn how they avoid entering the nuclear RISC, particularly in plants, where processing appears to be completed in the nucleus.

Mechanism: Target Recognition

The importance of complementarity to the 5' portion of metazoan miRNAs has been suspected since the observation that the lin-14 UTR has "core elements" of complementarity to the 5' region of the lin-4 miRNA (Wightman et al., 1993). More recent observations support this idea: (1) Residues 2-8 of several invertebrate miRNAs are perfectly complementary to 3' UTR elements previously shown to mediate posttranscriptional repression (Lai, 2002). (2) Within the miRNA complementary sites of the first validated targets of invertebrate miRNAs, mRNA residues that pair (sometimes imperfectly) to residues 2-8 of the miRNA are perfectly conserved in orthologous messages of other species, and a contiguous helix of at least six base pairs is nearly always seen in this region (Stark et al., 2003). (3) Residues 2-8 of the miRNA are the most conserved among homologous metazoan miRNAs (Lewis et al., 2003; Lim et al., 2003a). (4) When predicting targets of mammalian miRNAs, requiring perfect pairing to the heptamer spanning residues 2-8 of the miRNA is much more productive than is requiring pairing to any other heptamer of the miRNA (Lewis et al., 2003). Pairing to this 5' core region also appears to disproportionally govern the specificity of siRNA-mediated mRNA cleavage (Jackson et al., 2003; Pusch et al., 2003), and the same is true for a plant miRNA that mediates mRNA cleavage (Reinhart, Mallory, Tang, Zamore, Barton, D.B., unpublished).
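The requirement for perfect pairing to residues 2-8 translates directly into a simple sequence search. The sketch below is a minimal illustration, not the published target-prediction procedure: it only locates perfect Watson-Crick matches to the heptamer and ignores conservation, flanking pairs, and G:U wobbles. The function names are illustrative assumptions.

```python
# Watson-Crick complement table for RNA.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence written 5'->3'."""
    return rna.translate(COMPLEMENT)[::-1]


def core_match_positions(mirna, utr):
    """Return 0-based start positions in `utr` of perfect matches to the
    heptamer formed by miRNA residues 2-8 (counted from the miRNA 5' end).

    Overlapping matches are reported. Only Watson-Crick pairing is considered.
    """
    core = mirna[1:8]                # residues 2-8, 1-based
    site = reverse_complement(core)  # what the mRNA must contain, 5'->3'
    positions = []
    index = utr.find(site)
    while index != -1:
        positions.append(index)
        index = utr.find(site, index + 1)
    return positions
```

Counting the returned positions per UTR gives a crude measure of how many candidate sites a message carries, which is the kind of quantity that multi-site criteria would then filter on.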

Why is complementarity to the 5' end of the small RNA universally important, regardless of the mechanism of gene regulation? One possibility is that the RISC presents only this core region to nucleate pairing to the mRNAs. Presentation of these ~7 nucleotides prearranged in the geometry of an A-form helix would preferentially enhance the affinity with matched mRNA segments. Presentation of a preformed helical segment of this length would be a reasonable compromise between the topological difficulties associated with longer prearranged helical geometry and the drop in initial binding specificity that would result from a shorter core. In this scenario, mismatches with the core region inhibit initial target recognition and thus prevent cleavage or translational repression regardless of the degree of complementarity elsewhere in the complementary site. If there is sufficient additional pairing after the remainder of the miRNA is allowed to participate, cleavage ensues. However, core pairing supplemented by just a few flanking pairs appears to be sufficient to mediate translational repression in cooperation with other RISCs bound to the message (Lewis et al., 2003). Interestingly, the ability of the Argonaute PAZ domain to bind both double- and single-stranded RNAs (Lingel et al., 2003; Song et al., 2003; Yan et al., 2003), mentioned earlier, would make it a suitable candidate for presenting the core and stabilizing the core pairing.

Mechanism: Distinctions between miRNAs and siRNAs

Because miRNAs and endogenous siRNAs have a shared central biogenesis (Figures 2B and 2C, steps 4-6) and can perform interchangeable biochemical functions (Figures 3A and 3B), these two classes of silencing RNAs cannot be distinguished by either their chemical composition or mechanism of action. Nonetheless, important distinctions can be made, particularly in regard to their origin, evolutionary conservation, and the types of genes that they silence (Figures 2B and 2C, steps 1-3 and 7; Bartel and Bartel, 2003): First, miRNAs derive from genomic loci distinct from other recognized genes, whereas siRNAs often derive from mRNAs, transposons, viruses, or heterochromatic DNA (Figure 2, steps 1). Second, miRNAs are processed from transcripts that can form local RNA hairpin structures, whereas siRNAs are processed from long bimolecular RNA duplexes or extended hairpins (Figure 2, steps 2). Third, a single miRNA:miRNA* duplex is generated from each miRNA hairpin precursor molecule, whereas a multitude of siRNA duplexes are generated from each siRNA precursor molecule, leading to many different siRNAs accumulating from both strands of this extended dsRNA (Figure 2). Fourth, miRNA sequences are nearly always conserved in related organisms, whereas endogenous siRNA sequences are rarely conserved. These types of differences are the basis of practical guidelines for distinguishing and annotating newly discovered miRNAs and endogenous siRNAs (Ambros et al., 2003a).

Although much remains to be learned about the biological targets of miRNAs and endogenous siRNAs, a fifth distinction can be made between these two classes of silencing RNAs: endogenous siRNAs typically specify "auto-silencing," in that they specify the silencing of the same locus (or very similar loci) from which they originate, whereas miRNAs specify "hetero-silencing," in that they are produced from genes that specify the silencing of very different genes (Figure 2, steps 7). Natural examples of auto-silencing include the silencing of viruses, transposons, and the heterochromatic outer repeats of centromeres. Another example is the Drosophila Su(Ste) repeats, which generate siRNAs that silence the Su(Ste) repeats themselves as well as the very similar Stellate genes (Aravin et al., 2001). At first glance, miR-127 and miR-136 might seem to be exceptions to this principle because they originate from the antisense strand of their presumptive target, the Rtl1 mRNA (Seitz et al., 2003). However, because these genes lie in an imprinted locus, in which the miRNAs are expressed from the maternal chromosome and the Rtl1 mRNA is expressed from the paternal chromosome, these miRNAs can still be thought of as specifying hetero-silencing. This fifth distinction explains the greater sequence conservation seen for miRNAs. To the extent that the siRNAs come from the same loci that they target, a mutational event that changes the sequence of the siRNA would also change the sequence of its regulatory target, and siRNA regulation would be preserved, an unusual case of maintaining an important function without selective pressure for conserving the sequence. In contrast, a mutation in a miRNA would rarely be accompanied by simultaneous compensatory changes at the loci of its targets, and thus selection pressure would preserve the miRNA sequence.

With these distinctions between the miRNAs and the endogenous siRNAs in mind, it is perhaps worth considering how to classify the small RNAs that arise from constructs introduced into cells for the purpose of gene knockdown experiments. Small RNAs processed from the extended double-stranded regions of long, inverted repeats are clearly siRNAs. At the other extreme are approximately 22 nt RNAs processed from pre-miRNA-like stem loops. For metazoan cases in which these stem loops include the determinants for the sequential processing by Drosha then Dicer, classification is again simple; these would be artificial miRNAs. However, classification is less clear for RNAs deriving from the short hairpin constructs typically used for knockdowns in mammalian cells (Dykxhoorn et al., 2003), whose processing is unlikely to involve Drosha and even might not involve Dicer.

Function: Regulatory Roles of miRNAs

The most pressing question to arise from the discovery of the hundreds of different miRNAs is, what are all these tiny noncoding RNAs doing? For lin-4, let-7, and several other miRNAs identified by forward genetics, crucial clues to their function and regulatory targets came even before their status as noncoding RNA genes was discovered (Meneely and Herman, 1979; Chalfie et al., 1981; Ambros, 1989; Weigel et al., 2000; Hipfner et al., 2002; Aukerman and Sakai, 2003; Brennecke et al., 2003; Johnston and Hobert, 2003; Xu et al., 2003). These and other miRNAs that have reported functions based on in vivo experimentation are listed in Table 1. For some of these cases, function was determined by the phenotypic consequences of a mutated miRNA or an altered miRNA complementary site, either of which can disrupt miRNA regulation. In other cases, function was inferred from the effects of mutations or transgenic constructs that lead to ectopic expression of the miRNA.

For the vast majority of miRNAs, the phenotypic consequences of disrupted or altered miRNA regulation are not known. However, computational approaches are being developed to find the regulatory targets of the miRNAs, providing clues to miRNA function based on the known roles of these targets (Rhoades et al., 2002; Enright et al., 2003; Lewis et al., 2003; Stark et al., 2003). Computationally predicted targets supported by subsequent experiments or independent phylogenetic evidence are listed in Table 2. The experiments supporting the identity of these targets typically fall into two classes. In cases where the miRNA is thought to specify mRNA cleavage, the cleavage products can be reverse-transcribed, cloned, and sequenced; a preponderance of sequences that end precisely at the predicted site of cleavage provides experimental validation that this mRNA is a cleavage target of the complementary miRNA (Llave et al., 2002b; Kasschau et al., 2003; Xie et al., 2003). To enable detection of both translational repression and mRNA cleavage, heterologous reporter assays can be used in which the miRNA complementary sites are fused to a reporter gene and expression is examined relative to control constructs, or in the presence and absence of the miRNA (Lewis et al., 2003; Stark et al., 2003). Caution is warranted when interpreting reporter assays that involve multimerization of the miRNA complementary site(s) because such an assay succeeded in validating a miRNA complementary site that was mistakenly taken from a gene that was unrelated to the intended target but similarly annotated (Kawasaki and Taira, 2003a, 2003b). A positive result in the heterologous reporter assay indicates that determinants needed for miRNA regulation are indeed present within the mRNA fragment fused to the reporter, which together with evolutionary conservation of both the miRNA and its complementary sites can provide reasonable evidence of a regulatory relationship. Of course, such a hypothesis is considerably strengthened with evidence of coincident expression of the miRNA and its target in the animal or plant, or experiments that examine the effects of manipulating the miRNA or its complementary site in its native in vivo context.

Function: Roles of Plant miRNAs

In plants, miRNAs have a propensity to pair to mRNAs with near-perfect complementarity, enabling convincing targets to be readily predicted for most known plant miRNAs (Rhoades et al., 2002; Bartel and Bartel, 2003). Evolutionary conservation of the miRNA:mRNA pairing in Arabidopsis and rice, together with experimental evidence showing that miRNAs can direct cleavage of targeted mRNAs, supports the validity of these predictions (Llave et al., 2002a; Rhoades et al., 2002; Kasschau et al., 2003; Tang et al., 2003). The known plant miRNAs have a remarkable penchant for targeting transcription factor gene families, particularly those with known or suspected roles in developmental patterning or cell differentiation (Rhoades et al., 2002; Tables 1 and 2). This explains the pleiotropic developmental phenotypes of plants mutant in DCL1 (CAF) and HEN1, genes known to influence miRNA accumulation, and AGO1, a gene that might be involved in miRNA function (Bohmert et al., 1998; Jacobsen et al., 1999; Park et al., 2002; Reinhart et al., 2002; Schauer et al., 2002). Of the few predicted plant targets that are not transcription factors, two are DCL1 and AGO1, suggesting a negative feedback mechanism that controls the expression of these genes with known or suspected roles in miRNA biogenesis and function (Rhoades et al., 2002; Bartel and Bartel, 2003; Xie et al., 2003).

Why are so many targets of the plant miRNAs transcription factors that have been implicated in the control of plant development? The model put forward to answer this question proposes that many plant miRNAs function during cellular differentiation by mediating the degradation of key regulatory gene transcripts in specific daughter cell lineages (Rhoades et al., 2002; Figure 4). For example, during differentiation, certain genes specifying a less differentiated state might need to be turned off. This can be achieved by repressing transcription; however, a gene is not fully off until its message stops making protein. Thus, to more quickly stop expression of such a gene, the differentiating cell can deploy a miRNA that specifies the cleavage of that mRNA. The active clearing of the lingering regulatory messages (or of new messages generated by continued transcription) could enable rapid daughter cell differentiation without having to depend on regulatory genes having constitutively unstable messages. In this respect, miRNA regulation would be analogous to ubiquitin-dependent protein degradation, except that specific mRNAs, rather than proteins, are targeted for degradation.

This model concurs with the observation that a mutation disrupting the miRNA complementary site of PHB mRNA leads to a more expansive distribution of the message, as if it were no longer being cleared from cells expressing the miRNA (McConnell et al., 2001; Rhoades et al., 2002). It also explains why so many of the initially identified target genes specify formation and identity of meristem, i.e., plant stem cells (Tables 1 and 2); these are precisely the genes that would need to be turned off during early differentiation. The model also would apply to scenarios later in differentiation or to cases where the daughter cell is choosing among two or more differentiated states, which would explain the targeting of the other transcripts that have regulatory roles later in development. One point of caution in trying to deduce the general roles of plant miRNAs is that the known set of plant miRNAs is enriched in the more abundant miRNAs of plant tissues and organs and thus might not be representative. For example, miRNAs specifying an undifferentiated state would have been less likely to be cloned because most cells of plant organs are typically differentiated.

Function: Roles of Animal miRNAs

Computational methods have recently been developed to identify the targets of Drosophila and mammalian miRNAs (Enright et al., 2003; Lewis et al., 2003; Stark et al., 2003). These methods search for multiple conserved regions of miRNA complementarity within 3' UTRs. Identifying targets in animals has been a more difficult task than in plants because in animals there are far fewer mRNAs with near-perfect complementarity to miRNAs. This makes the analysis noisier, that is, much more prone to false positives. Furthermore, evolutionary conservation was used as a criterion for target identification in animals, and thus it could not be used as a means to independently validate the targets. Nonetheless, the experimental support achieved for a majority of the predictions tested is encouraging (Table 2), and there are compelling reasons to take seriously the remaining untested predictions. For example, in one of the fly studies, there were striking clusters of functionally related genes among the top predictions (Stark et al., 2003). The most notable examples were Notch target genes for miR-7, proapoptotic genes for miR-2, and a set of enzymes involved in branched-chain amino acid degradation for miR-277. In the mammalian study, over 400 regulatory targets were predicted when using parameter cutoffs that gave a signal-to-noise ratio of 3.2:1 (Lewis et al., 2003). This signal-to-noise ratio was seen only when restricting the miRNAs to those most conserved among mammals and fish, and only when demanding perfect complementarity to the most conserved portion of miRNAs (the 7 nt core segment comprising residues 2-8 of the miRNAs), observations that would be exceedingly difficult to explain if most of the identified messages were not relevant targets of the miRNAs. The ability to identify hundreds of miRNA targets with confidence that most of the predicted targets are authentic enables the analysis of the types of genes most commonly targeted by mammalian miRNAs (Lewis et al., 2003). As in plants, the predicted targets are significantly enriched in genes involved in transcriptional regulation, suggesting that the model proposed for the roles of many plant miRNAs (Figure 4) could also be operating in animals. Nonetheless, this enrichment for transcriptional regulators is far less pronounced in mammals, and only a minority of the predicted mammalian targets are involved in development. The predicted targets represent a surprisingly broad diversity of molecular functions and biological processes. Thus, in contrast to the plant miRNAs, most mammalian miRNAs do not appear to be primarily involved at the upper levels of the gene regulatory cascades but instead appear to be operating at many levels to regulate the expression of a diverse set of genes, many of which do not go on to directly influence the expression of other genes (Lewis et al., 2003).

Function: The Question of Specificity

Although current lists of predicted miRNA targets provide insights and hypotheses for thousands of follow-up experiments, they could be far from comprehensive. For example, in the animal studies, the computational methods used evolutionary conservation to distinguish miRNA target sites from the multitude of 3' UTR segments that otherwise would score equally well with regard to the quality and stability of base pairing (Lewis et al., 2003; Stark et al., 2003). The cell, on the other hand, cannot use the filter of evolutionary conservation to choose among the possibilities. Does this mean that many of these other mRNAs would in fact be targeted if expressed in the same cells as the cognate miRNAs? Perhaps not; perhaps miRNA base pairing is not the only major determinant of specificity. Proteins or mRNA structure could restrict miRNP accessibility to the UTRs. But if this were generally true, siRNA knockdown experiments might be expected to have a much lower success rate. Proteins or mRNA structure could also facilitate recognition of the authentic mRNA targets by means of elements in the mRNAs that have thus far escaped detection. One candidate for such a protein is the Fragile X-related protein, a Drosophila RISC component that is related to proteins known to bind specific mRNAs (Caudy et al., 2002; Ishizuka et al., 2002).

The alternative idea, that the quality and stability of base pairing is in fact the primary determinant of specificity, should also be considered. After all, this complementarity requirement includes a 7 nt perfect or near-perfect core match near the 5' terminus of the miRNA (Lai, 2002; Lewis et al., 2003; Stark et al., 2003), which by itself would represent a degree of specificity comparable to that of the DNA sites recognized by many transcription factors. Pairing outside the 7 nt core site, although perhaps less important than once thought, provides a means of conferring added specificity. Just as chromatin structure limits the possibilities for transcription-factor binding, the restricted set of genes transcribed in each cell limits which genes of the genome will be under miRNA control in that cell. And in the same way that the cooperative action of multiple transcription factors increases the specificity of their control, the cooperative action of homotypic and heterotypic miRNA:UTR interactions would provide an additional mechanism to increase specificity of miRNA control. Despite these mechanisms for increasing the regulatory specificity, the notion that target-site recognition is primarily determined by multiple instances of 7 nt core complementarity would imply that miRNAs influence the expression of a remarkably large number of different mRNAs (Lewis et al., 2003).
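A back-of-the-envelope calculation helps convey how much specificity a 7 nt core match provides by itself. The sketch below assumes a random sequence with the four nucleotides equally frequent and independent, which real UTRs are not, so the numbers are only a rough guide. The function name and the example UTR length are illustrative assumptions.

```python
def expected_chance_matches(utr_length, core_length=7):
    """Expected number of chance occurrences of one specific `core_length`-mer
    in a random sequence of `utr_length` nucleotides.

    Assumes equal, independent base frequencies, so any given k-mer occurs at
    a given position with probability 1 / 4**k.
    """
    windows = max(utr_length - core_length + 1, 0)  # possible start positions
    return windows / 4 ** core_length


if __name__ == "__main__":
    # A specific heptamer is one of 4**7 = 16384 possibilities.
    print(4 ** 7)
    # Expected chance matches in a hypothetical 1000 nt UTR.
    print(expected_chance_matches(1000))
```

Under these assumptions a single heptamer match is expected by chance roughly once per 16 kb of sequence, which illustrates why a requirement for multiple core matches per UTR sharpens specificity and why, summed over a whole transcriptome, many messages would still qualify.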

The "many targets" hypothesis is embraced and partially rationalized in a proposal that the miRNA milieu, unique to each cell type, provides important context for the evolution of all mRNA sequences and is productively used to dampen the utilization of thousands of mRNAs (D.B. and C.-Z. Chen, unpublished). For mRNAs that should not be expressed in a particular cell type, miRNAs reduce protein production to inconsequential levels. The result is equivalent to a discrete off switch, and thus these messages, which include the targets of Table 1, can be thought of as "switch targets." In addition to these classical targets, at least three other categories of mRNAs can be imagined. For messages called "tuning targets," miRNAs could adjust protein output in a manner that allows for customized expression in different cell types yet a more uniform level within each cell type. Other mRNAs could be simply bystanders, "neutral targets," for which downregulation by miRNAs is tolerated or is negated by feedback processes. Finally, when thinking about the effects of the miRNA milieu on the evolution of mRNA sequences, it is also useful to consider "antitargets," messages under selective pressure to avoid fortuitous complementarity to the multitude of miRNAs in the cells where they are expressed, either because such complementarity would inappropriately dampen their expression or because it would titrate the miRNAs away from their proper targets.

While molecular biologists will have their hands full identifying and characterizing additional instances where miRNAs are playing the classical role of discrete gene regulatory switches, computational and systems biologists will have to contend with the prospect that a substantial fraction of all animal mRNAs could have their precise level of expression defined by miRNA regulation. To the extent that the miRNAs direct translational repression rather than mRNA cleavage, this regulation will be invisible to the most powerful tool of the systems biologist, microarray analysis of mRNA levels. Nonetheless, in the two years since the abundance of miRNA genes was reported, there has been rapid progress in cataloging the miRNA genes, determining their expression patterns, and identifying their regulatory targets, providing hope that the goal of accurately integrating their function into models of metazoan gene regulatory circuitry can one day be realized.

MicroRNAs: Genomics, Biogenesis, Mechanism, and Function. D.P. Bartel (dbartel@wi.mit.edu). Cell, Vol. 116, No. 2, pp. 281-297, 2004.
