- In the life sciences, differing data formats and terminologies have hindered the integrated use of multiple databases.
- The first portal site in Japan to provide a variety of life science databases entirely in RDF format, which readily facilitates data integration, has been launched.
- The website is expected to promote multidisciplinary research and to contribute to the advancement of medical applications such as personalized medicine.
The first portal website in Japan that aggregates various life science databases in RDF format (see note 1) has been launched by the Japan Science and Technology Agency (JST) (President: Michinari Hamaguchi) and the Research Organization of Information and Systems (President: Genshiro Kitagawa) (Figure 1). The portal site provides RDF datasets deposited by various research organizations and allows users to browse the descriptions of those datasets, download the data, and search the data with queries written in SPARQL, a standard query language for RDF data.
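As a rough illustration of what searching with SPARQL involves, the sketch below builds an HTTP request in the way the SPARQL 1.1 Protocol describes, with the query text passed in a `query` parameter. The endpoint address `https://example.org/sparql` is a placeholder, since this announcement does not state the portal's endpoint URL, and the query is a generic one that is not tied to any particular dataset.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder endpoint (assumption): the announcement does not give the real address.
ENDPOINT = "https://example.org/sparql"

# Generic query: list ten arbitrary triples from whatever data the endpoint holds.
QUERY = """
SELECT ?subject ?predicate ?object
WHERE { ?subject ?predicate ?object }
LIMIT 10
"""


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build, but do not send, an HTTP GET request carrying a SPARQL query.

    The query text goes in the `query` parameter, and the Accept header
    asks for the standard SPARQL JSON results format.
    """
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(ENDPOINT, QUERY)
print(request.full_url)
```

Against a real endpoint, passing `request` to `urllib.request.urlopen` would be one way to send the query and read back the results.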
A wide variety of life science databases exist, but their diverse terminologies and data formats have hindered their integrated use. To address this problem, RDF, which facilitates interoperability and automated data processing, has been adopted as a new data format around the world, including in Japan (Figure 2).
The National Bioscience Database Center (NBDC) of JST and the Database Center for Life Science (DBCLS) of the Research Organization of Information and Systems have also been encouraging research groups that develop databases, in Japan and around the world, to adopt the RDF format, and have been building a portal site.
The portal site already provides an initial collection of ten RDF datasets (Table). Six or more further RDF datasets are planned to be added soon. In the same period, major life science database centers in the U.S. and Europe have started providing their data in RDF format. The distinctive feature of our portal site is that it aggregates a large variety of databases from multiple organizations, thereby helping researchers from a broad range of domains to collaborate.
Prior to the launch of the portal site, DBCLS developed and released guidelines for generating high-quality RDF data. All the RDF datasets provided by the portal site conform to these guidelines and meet the criteria that make them interoperable.
The RDF datasets provided by the portal are expected to be easy to integrate with other RDF datasets available around the world, which will reduce the cost of data handling. For example, without such an RDF portal, a major challenge in searching distributed databases for potential drug candidates has been aggregating the relevant databases into a unified one, which requires expertise and an enormous amount of time. The RDF data portal can potentially eliminate the time and cost of this process, and it is expected to advance multidisciplinary research in which data coordination is essential. Examples of such research include personalized medicine based on the combined use of genetic mutation and drug activity data, and metagenomics, which needs to deal with information on environmental or intestinal flora. In addition, because RDF data can be easily used by computer programs, the rich data in the portal is expected to contribute to answering complicated questions in the life sciences when incorporated into artificial intelligence systems, a field that has seen major technological advances in recent years.
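The integration benefit described above can be sketched in a few lines. RDF expresses data as subject, predicate, object triples, so two datasets that refer to the same entity by the same URI can be combined with a simple union. Every URI and value below is invented for illustration and does not come from any dataset in the portal; the `match` helper is likewise a hypothetical stand-in for a real RDF library or SPARQL engine.

```python
# Each RDF statement is a (subject, predicate, object) triple.
# All URIs below are invented for illustration only.
GENE = "http://example.org/gene/G1"
HAS_VARIANT = "http://example.org/vocab/hasVariant"
TARGETS = "http://example.org/vocab/targets"

# Two independently produced datasets that happen to use the same gene URI.
mutation_dataset = {
    (GENE, HAS_VARIANT, "http://example.org/variant/V1"),
}
drug_dataset = {
    ("http://example.org/drug/D1", TARGETS, GENE),
}

# Because both datasets identify the gene by the same URI, merging is a set union.
merged = mutation_dataset | drug_dataset


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]


# Which drugs target a gene that also has a recorded variant?
drugs = [
    drug
    for drug, _, gene in match(merged, predicate=TARGETS)
    if match(merged, subject=gene, predicate=HAS_VARIANT)
]
print(drugs)
```

No schema mapping or identifier conversion is needed in this toy case because the shared URI already links the two sources, which is the property the guidelines aim to ensure across real datasets.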
NBDC RDF portal website: http://integbio.jp/rdf/
DBCLS RDF guidelines website: http://wiki.lifesciencedb.jp/mw/RDFizingDatabaseGuideline (In Japanese)
Figures and Table
Figure 1: Database list page of NBDC RDF portal site
A basic description, category information, and a link to the detailed description are listed for each database. The list can be sorted by fields such as name or release date, and narrowed down by category.
Figure 2: Overview of NBDC RDF portal site
Unifying terminology and converting data to RDF format makes the integrated use of a wide variety of life science databases easier. The data in our portal site is therefore expected to advance research that requires the coordination of various data, such as drug development and personalized medicine.
|RDF dataset name|Organization|Contents|
|---|---|---|
|FAMSBASE|Chuo University|Predicted protein structure models for GPCRs|
|GlycoEpitope|Niigata University / Ritsumeikan University|Antibodies for glycans|
|GlyTouCan|Soka University|Structures and compositions of glycans|
|ICGC Linked Data|The University of Tokyo|Cancer genomes|
|Metadata of JCM|RIKEN|Metadata of cultured microbial strains in JCM|
|MBGD|National Institute for Basic Biology|Ortholog information of microbial genes|
|NBDC NikkajiRDF|Japan Science and Technology Agency|Chemical compounds|
|wwPDB|Osaka University|Metadata of 3D protein structures databank|
|RefEx|The Research Organization of Information and Systems|Gene Expression data|
|WURCS|The Noguchi Institute|Description methods for glycans|
Table: List of the RDF datasets in the first release
- 1) RDF (Resource Description Framework) format:
- To make use of the vast range of information available on the Internet, technology that lets computers process it automatically and with high precision is essential. The World Wide Web Consortium, the international standards organization for the Web, recommends RDF as a standard format that allows data on the Internet to be processed easily. When data is stored in RDF format, computers can process it automatically, and researchers from a diverse range of fields can make use of it.
[About Portal Site]
Hiroko Tatesawa and Hideki Hatanaka
National Bioscience Database Center, JST
5-3, Yonbancho, Chiyoda-ku, Tokyo 102-8666, JAPAN
Tel: +81-3-5214-8491 Fax: +81-3-5214-8470
Shuichi Kawashima and Mari Minowa
Database Center for Life Science, Research Organization of Information and Systems
Univ. of Tokyo Kashiwa-no-ha Campus Station Satellite 6F 178-4-4 Wakashiba, Kashiwa-shi, Chiba 277-0871, JAPAN
Tel: +81-4-7135-5508 Fax: +81-4-7135-5534