- Rfam: annotating families of non-coding RNA sequences. [PMID: 25577390]
Jennifer Daub, Ruth Y Eberhardt, John G Tate, Sarah W Burge
Methods in molecular biology (Clifton, N.J.) 2015:1269
1 Citation (Google Scholar as of 2015-12-26)
Abstract: The primary task of the Rfam database is to collate experimentally validated noncoding RNA (ncRNA) sequences from the published literature and facilitate the prediction and annotation of new homologues in novel nucleotide sequences. We group homologous ncRNA sequences into "families" and related families are further grouped into "clans." We collate and manually curate data cross-references for these families from other databases and external resources. Our Web site offers researchers a simple interface to Rfam and provides tools with which to annotate their own sequences using our covariance models (CMs), through our tools for searching, browsing, and downloading information on Rfam families. In this chapter, we will work through examples of annotating a query sequence, collating family information, and searching for data.
- Rfam 12.0: updates to the RNA families database. [PMID: 25392425]
Eric P Nawrocki, Sarah W Burge, Alex Bateman, Jennifer Daub, Ruth Y Eberhardt, Sean R Eddy, Evan W Floden, Paul P Gardner, Thomas A Jones, John Tate, Robert D Finn
Nucleic acids research 2015:43(Database issue)
Citations (to be updated)
Abstract: The Rfam database (available at http://rfam.xfam.org) is a collection of non-coding RNA families represented by manually curated sequence alignments, consensus secondary structures and annotation gathered from corresponding Wikipedia, taxonomy and ontology resources. In this article, we detail updates and improvements to the Rfam data and website for the Rfam 12.0 release. We describe the upgrade of our search pipeline to use Infernal 1.1 and demonstrate its improved homology detection ability by comparison with the previous version. The new pipeline is easier for users to apply to their own data sets, and we illustrate its ability to annotate RNAs in genomic and metagenomic data sets of various sizes. Rfam has been expanded to include 260 new families, including the well-studied large subunit ribosomal RNA family, and for the first time includes information on short sequence- and structure-based RNA motifs present within families. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
- Rfam 11.0: 10 years of RNA families. [PMID: 23125362]
Sarah W Burge, Jennifer Daub, Ruth Eberhardt, John Tate, Lars Barquist, Eric P Nawrocki, Sean R Eddy, Paul P Gardner, Alex Bateman
Nucleic acids research 2013:41(Database issue)
414 Citations (Google Scholar as of 2016-04-28)
Abstract: The Rfam database (available via the website at http://rfam.sanger.ac.uk and through our mirror at http://rfam.janelia.org) is a collection of non-coding RNA families, primarily RNAs with a conserved RNA secondary structure, including both RNA genes and mRNA cis-regulatory elements. Each family is represented by a multiple sequence alignment, predicted secondary structure and covariance model. Here we discuss updates to the database in the latest release, Rfam 11.0, including the introduction of genome-based alignments for large families, the introduction of the Rfam Biomart as well as other user interface improvements. Rfam is available under the Creative Commons Zero license.
- Clustering Rfam 10.1: clans, families, and classes. [PMID: 24704975]
Felipe A Lessa, Tainá Raiol, Marcelo M Brigido, Daniele S B Martins Neto, Maria Emília M T Walter, Peter F Stadler
3 Citations (Google Scholar as of 2016-04-27)
Abstract: The Rfam database contains information about non-coding RNAs emphasizing their secondary structures and organizing them into families of homologous RNA genes or functional RNA elements. Recently, a higher-order organization of Rfam in terms of the so-called clans was proposed along with its "decimal release". In this proposition, some of the families have been assigned to clans based on experimental and computational data in order to find related families. In the present work we investigate an alternative classification for the RNA families based on tree edit distance. The resulting clustering recovers some of the Rfam clans. The majority of clans, however, are not recovered by the structural clustering. Instead, they get dispersed into larger clusters, which correspond roughly to well-described RNA classes such as snoRNAs, miRNAs, and CRISPRs. In conclusion, a structure-based clustering can contribute to the elucidation of the relationships among the Rfam families beyond the realm of clans and classes.
- Rfam: Wikipedia, clans and the "decimal" release. [PMID: 21062808]
Paul P Gardner, Jennifer Daub, John Tate, Benjamin L Moore, Isabelle H Osuch, Sam Griffiths-Jones, Robert D Finn, Eric P Nawrocki, Diana L Kolbe, Sean R Eddy, Alex Bateman
Nucleic acids research 2011:39(Database issue)
325 Citations (Google Scholar as of 2016-04-27)
Abstract: The Rfam database aims to catalogue non-coding RNAs through the use of sequence alignments and statistical profile models known as covariance models. In this contribution, we discuss the pros and cons of using the online encyclopedia, Wikipedia, as a source of community-derived annotation. We discuss the addition of groupings of related RNA families into clans and new developments to the website. Rfam is available on the Web at http://rfam.sanger.ac.uk.
- Rfam: updates to the RNA families database. [PMID: 18953034]
Paul P Gardner, Jennifer Daub, John G Tate, Eric P Nawrocki, Diana L Kolbe, Stinus Lindgreen, Adam C Wilkinson, Robert D Finn, Sam Griffiths-Jones, Sean R Eddy, Alex Bateman
Nucleic acids research 2009:37(Database issue)
581 Citations (Google Scholar as of 2016-04-27)
Abstract: Rfam is a collection of RNA sequence families, represented by multiple sequence alignments and covariance models (CMs). The primary aim of Rfam is to annotate new members of known RNA families on nucleotide sequences, particularly complete genomes, using sensitive BLAST filters in combination with CMs. A minority of families with a very broad taxonomic range (e.g. tRNA and rRNA) provide the majority of the sequence annotations, whilst the majority of Rfam families (e.g. snoRNAs and miRNAs) have a limited taxonomic range and provide a limited number of annotations. Recent improvements to the website, methodologies and data used by Rfam are discussed. Rfam is freely available on the Web at http://rfam.sanger.ac.uk/ and http://rfam.janelia.org/.
- Rfam: annotating non-coding RNAs in complete genomes. [PMID: 15608160]
Sam Griffiths-Jones, Simon Moxon, Mhairi Marshall, Ajay Khanna, Sean R Eddy, Alex Bateman
Nucleic acids research 2005:33(Database issue)
1035 Citations (Google Scholar as of 2016-04-27)
Abstract: Rfam is a comprehensive collection of non-coding RNA (ncRNA) families, represented by multiple sequence alignments and profile stochastic context-free grammars. Rfam aims to facilitate the identification and classification of new members of known sequence families, and distributes annotation of ncRNAs in over 200 complete genome sequences. The data provide the first glimpses of conservation of multiple ncRNA families across a wide taxonomic range. A small number of large families are essential in all three kingdoms of life, with large numbers of smaller families specific to certain taxa. Recent improvements in the database are discussed, together with challenges for the future. Rfam is available on the Web at http://www.sanger.ac.uk/Software/Rfam/ and http://rfam.wustl.edu/.
- Rfam: an RNA family database. [PMID: 12520045]
Sam Griffiths-Jones, Alex Bateman, Mhairi Marshall, Ajay Khanna, Sean R Eddy
Nucleic acids research 2003:31(1)
970 Citations (Google Scholar as of 2016-04-27)
Abstract: Rfam is a collection of multiple sequence alignments and covariance models representing non-coding RNA families. Rfam is available on the web in the UK at http://www.sanger.ac.uk/Software/Rfam/ and in the US at http://rfam.wustl.edu/. These websites allow the user to search a query sequence against a library of covariance models, and view multiple sequence alignments and family annotation. The database can also be downloaded in flatfile form and searched locally using the INFERNAL package (http://infernal.wustl.edu/). The first release of Rfam (1.0) contains 25 families, which annotate over 50 000 non-coding RNA genes in the taxonomic divisions of the EMBL nucleotide database.