- Gene3D: expanding the utility of domain assignments. [PMID: 26578585]
Su Datt Lam, Natalie L Dawson, Sayoni Das, Ian Sillitoe, Paul Ashford, David Lee, Sonja Lehtinen, Christine A Orengo, Jonathan G Lees
Nucleic acids research 2016:44(D1)
Citation (to be updated)
Abstract: Gene3D (http://gene3d.biochem.ucl.ac.uk) is a database of domain annotations of Ensembl and UniProtKB protein sequences. Domains are predicted using a library of profile HMMs representing 2737 CATH superfamilies. Gene3D has previously featured in the Database issue of NAR, and here we report updates to the website and database. The current Gene3D (v14) release has expanded its domain assignments to ∼20 000 cellular genomes and over 43 million unique protein sequences, more than doubling the number of protein sequences since our last publication. Amongst other updates, we have improved our Functional Family annotation method. We have also improved the quality and coverage of our 3D homology modelling pipeline of predicted CATH domains. Additionally, the structural models have been expanded to include an extra model organism (Drosophila melanogaster). We also document a number of additional visualization tools on the Gene3D website. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
- Gene3D: Multi-domain annotations for protein sequence and comparative genome analysis. [PMID: 24270792]
Jonathan G Lees, David Lee, Romain A Studer, Natalie L Dawson, Ian Sillitoe, Sayoni Das, Corin Yeats, Benoit H Dessailly, Robert Rentzsch, Christine A Orengo
Nucleic acids research 2014:42(Database issue)
26 Citations (Google Scholar as of 2016-01-14)
Abstract: Gene3D (http://gene3d.biochem.ucl.ac.uk) is a database of protein domain structure annotations for protein sequences. Domains are predicted using a library of profile HMMs from 2738 CATH superfamilies. Gene3D assigns domain annotations to Ensembl and UniProt sequence sets including >6000 cellular genomes and >20 million unique protein sequences. This represents an increase of 45% in the number of protein sequences since our last publication. Thanks to improvements in the underlying data and pipeline, we see large increases in the domain coverage of sequences. We have expanded this coverage by integrating Pfam and SUPERFAMILY domain annotations, and we now resolve domain overlaps to provide highly comprehensive composite multi-domain architectures. To make these data more accessible for comparative genome analyses, we have developed novel search algorithms for searching genomes to identify related multi-domain architectures. In addition to providing domain family annotations, we have now developed a pipeline for 3D homology modelling of domains in Gene3D. This has been applied to the human genome and will be rolled out to other major organisms over the next year.
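The step of resolving domain overlaps into a composite multi-domain architecture can be illustrated with a minimal sketch. It assumes a simple greedy strategy (accept the highest-scoring HMM hit first, then reject any lower-scoring hit that overlaps an accepted one by more than a tolerance) and is not the Gene3D pipeline or DomainFinder 3. The names `DomainHit`, `resolve_overlaps`, `architecture` and the `max_overlap` parameter are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainHit:
    """One HMM match on a protein sequence (hypothetical record type)."""
    family: str   # a family label, e.g. a CATH superfamily or Pfam identifier
    start: int    # 1-based inclusive start residue
    end: int      # 1-based inclusive end residue
    score: float  # higher is better (e.g. an HMM bit score)


def overlap(a: DomainHit, b: DomainHit) -> int:
    """Number of residues shared by two hits (0 if they are disjoint)."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def resolve_overlaps(hits, max_overlap=0):
    """Greedy sketch (assumed strategy): keep the best-scoring hits that
    overlap no already-kept hit by more than `max_overlap` residues,
    then return the kept hits in sequence order."""
    kept = []
    for hit in sorted(hits, key=lambda h: h.score, reverse=True):
        if all(overlap(hit, k) <= max_overlap for k in kept):
            kept.append(hit)
    return sorted(kept, key=lambda h: h.start)


def architecture(hits):
    """Multi-domain architecture as the N- to C-terminal list of family labels."""
    return [h.family for h in hits]


if __name__ == "__main__":
    # Hypothetical hits on one sequence; sfB overlaps sfA by 21 residues.
    hits = [
        DomainHit("sfA", 10, 120, 85.0),
        DomainHit("sfB", 100, 210, 60.0),
        DomainHit("sfC", 130, 250, 70.0),
        DomainHit("sfD", 260, 330, 40.0),
    ]
    print(architecture(resolve_overlaps(hits, max_overlap=10)))
```

Run as a script, this prints `['sfA', 'sfC', 'sfD']`: the weaker sfB hit is dropped because it shares 21 residues with sfA, which exceeds the 10-residue tolerance. A real pipeline would also need to handle discontinuous domains and combine hits from several libraries (CATH, Pfam, SUPERFAMILY), which this sketch does not attempt.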
- Gene3D: a domain-based resource for comparative genomics, functional annotation and protein network analysis. [PMID: 22139938]
Jonathan Lees, Corin Yeats, James Perkins, Ian Sillitoe, Robert Rentzsch, Benoit H Dessailly, Christine Orengo
Nucleic acids research 2012:40(Database issue)
40 Citations (Google Scholar as of 2016-01-14)
Abstract: Gene3D (http://gene3d.biochem.ucl.ac.uk) is a comprehensive database of protein domain assignments for sequences from the major sequence databases. Domains are directly mapped from structures in the CATH database or predicted using a library of representative profile HMMs derived from CATH superfamilies. As previously described, Gene3D integrates many other protein family and function databases. These facilitate complex associations of molecular function, structure and evolution. Gene3D now includes a domain functional family (FunFam) level below the homologous superfamily level assignments. Additions have also been made to the interaction data. More significantly, to help with the visualization and interpretation of multi-genome scale data sets, we have developed a new, revamped website. Searching has been simplified with more sophisticated filtering of results, along with new tools based on Cytoscape Web, for visualizing protein-protein interaction networks, differences in domain composition between genomes and the taxonomic distribution of individual superfamilies.
- The Gene3D Web Services: a platform for identifying, annotating and comparing structural domains in protein sequences. [PMID: 21646335]
Corin Yeats, Jonathan Lees, Phil Carter, Ian Sillitoe, Christine Orengo
Nucleic acids research 2011:39(Web Server issue)
16 Citations (Google Scholar as of 2016-01-14)
Abstract: The Gene3D structural domain database provides domain annotations for 7 million proteins, based on the manually curated structural domain superfamilies in CATH. These annotations are integrated with functional, genomic and molecular information from external resources, such as GO, EC, UniProt and the NCBI Taxonomy database. We have constructed a set of web services that provide programmatic access to this integrated database, as well as the Gene3D domain recognition tool (Gene3DScan) and protein sequence annotation pipeline for analysing novel protein sequences. Example queries include retrieving all curated GO terms for a domain superfamily or all the multi-domain architectures for the human genome. The services can be accessed using simple HTTP calls and are able to return results in a range of formats for quick downloading and easy parsing, graphical rendering and data storage. Hence, they provide a simple, but flexible means of integrating domain annotations and associated data sets into locally run pipelines and analysis software. The services can be found at http://gene3d.biochem.ucl.ac.uk/WebServices/.
- Gene3D: merging structure and function for a Thousand genomes. [PMID: 19906693]
Jonathan Lees, Corin Yeats, Oliver Redfern, Andrew Clegg, Christine Orengo
Nucleic acids research 2010:38(Database issue)
42 Citations (Google Scholar as of 2016-01-14)
Abstract: Over the last 2 years the Gene3D resource has been significantly improved: it is now more accurate and has a much richer interactive display via the Gene3D website (http://gene3d.biochem.ucl.ac.uk/). Gene3D provides accurate structural domain family assignments for over 1100 genomes and nearly 10,000,000 proteins. A hidden Markov model library, constructed from the manually curated CATH structural domain hierarchy, is used to search UniProt, RefSeq and Ensembl protein sequences. The resulting matches are refined into simple multi-domain architectures using a recently developed in-house algorithm, DomainFinder 3 (available at: ftp://ftp.biochem.ucl.ac.uk/pub/gene3d_data/DomainFinder3/). The domain assignments are integrated with multiple external protein function descriptions (e.g. Gene Ontology and KEGG), structural annotations (e.g. coiled coils, disordered regions and sequence polymorphisms) and family resources (e.g. Pfam and eggNOG) and displayed on the Gene3D website. The website allows users to view descriptions for both single proteins and genes and large protein sets, such as superfamilies or genomes. Subsets can then be selected for detailed investigation, or associated functions and interactions can be used to expand explorations to new proteins. Gene3D also provides a set of services, including an interactive genome coverage graph visualizer, DAS annotation resources, sequence search facilities and SOAP services.
- Gene3D: comprehensive structural and functional annotation of genomes. [PMID: 18032434]
Corin Yeats, Jonathan Lees, Adam Reid, Paul Kellam, Nigel Martin, Xinhui Liu, Christine Orengo
Nucleic acids research 2008:36(Database issue)
70 Citations (Google Scholar as of 2016-01-14)
Abstract: Gene3D provides comprehensive structural and functional annotation of most available protein sequences, including the UniProt, RefSeq and Integr8 resources. The main structural annotation is generated through scanning these sequences against the CATH structural domain database profile-HMM library. CATH is a database of manually derived PDB-based structural domains, placed within a hierarchy reflecting topology, homology and conservation and is able to infer more ancient and divergent homology relationships than sequence-based approaches. This data is supplemented with Pfam-A, other non-domain structural predictions (i.e. coiled coils) and experimental data from UniProt. In order to enhance the investigations possible with this data, we have also incorporated a variety of protein annotation resources, including protein-protein interaction data, GO functional assignments, KEGG pathways, FUNCAT functional descriptions and links to microarray expression data. All of this data can be accessed through a newly re-designed website that has a focus on flexibility and clarity, with searches that can be restricted to a single genome or across the entire sequence database. Currently Gene3D contains over 3.5 million domain assignments for nearly 5 million proteins including 527 completed genomes. This is available at: http://gene3d.biochem.ucl.ac.uk/