January 30-February 1, 2017
San Diego, California, USA
From left, Motoki Sekine (Chief), Takahiro Kawamura (Senior Researcher), and Katsuji Matsumura (Manager)
The buzzwords-of-the-year contest held at the end of each year always reminds us that society creates new words every day. The same thing happens in science and technology: as the field develops, new terms are born one after another. For example, the term "iPS cells" (induced pluripotent stem cells) was coined only about 10 years ago.
To help such scientific and technological terms be used accurately and properly, JST has maintained a structured dictionary called the JST thesaurus since 1975. It records hypernym/hyponym (broader/narrower) relations and synonyms for each term. Experts in each academic subject have periodically selected new terms from a large body of literature and manually placed them in the proper positions in the thesaurus. This work, however, is time-consuming and costly.
Three members of the Office of Information Analysis, Department of Information Planning, JST, namely Takahiro Kawamura (Senior Researcher), Motoki Sekine (Chief), and Katsuji Matsumura (Manager), therefore developed a prototype of an automatic thesaurus maintenance system using AI technology. The method proposed in their paper extracts important new terms from scientific and technological literature and then estimates, with high accuracy, how they relate to the terms already in the thesaurus. The paper, presented at the 11th IEEE International Conference on Semantic Computing held in San Diego, California, USA, won the Best Paper Award.
The relation of a new term to a term in the thesaurus is estimated from the distance between the two terms. If the two terms are very close, they are likely to be synonyms. If they are only moderately close, they can be regarded as related terms.
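The distance-based rule can be illustrated with a minimal sketch. It assumes that each term is already represented as a numeric vector, that distance is measured as cosine distance, and that two cut-off values separate synonyms from related terms. The function names and both threshold values are hypothetical, chosen only for illustration; the article does not state which distance measure or thresholds the system actually uses.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


# Hypothetical cut-offs, not values taken from the paper.
SYNONYM_MAX = 0.1   # at or below this distance: treat as synonyms
RELATED_MAX = 0.5   # at or below this distance: treat as related terms


def estimate_relation(new_vec, known_vec):
    """Classify the relation between a new term and a thesaurus term."""
    d = cosine_distance(new_vec, known_vec)
    if d <= SYNONYM_MAX:
        return "synonym"
    if d <= RELATED_MAX:
        return "related"
    return "unrelated"
```

In practice, such thresholds would have to be tuned against relations that experts have already entered in the thesaurus.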
To compute this distance, the system represents the meaning of each term in the literature as a multidimensional vector. To narrow down the candidates for comparison, the team also devised a novel clustering method that partitions the roughly 3,000 terms in the electrical and electronics category of the thesaurus. Whereas conventional methods achieved an estimation accuracy of 30 to 40%, the proposed method achieved more than 70%.
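How clustering narrows down the candidates can be sketched as follows. This is not the novel clustering method from the paper, which the article does not describe. It is a plain nearest-centroid lookup that assumes the thesaurus terms have already been grouped into clusters. The term vectors, cluster names, and function names are all made up for illustration.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def narrow_candidates(new_vec, clusters, term_vectors):
    """Return the terms of the cluster nearest to new_vec, closest first.

    clusters maps a cluster name to its list of terms;
    term_vectors maps each term to its vector.
    """
    centroids = {
        name: centroid([term_vectors[t] for t in terms])
        for name, terms in clusters.items()
    }
    nearest = min(centroids, key=lambda name: euclidean(new_vec, centroids[name]))
    return sorted(clusters[nearest], key=lambda t: euclidean(new_vec, term_vectors[t]))


# Toy two-dimensional data (hypothetical; real term vectors have many dimensions).
term_vectors = {
    "transistor": [0.9, 0.1],
    "diode": [0.8, 0.2],
    "generator": [0.1, 0.9],
    "turbine": [0.2, 0.8],
}
clusters = {
    "semiconductors": ["transistor", "diode"],
    "power": ["generator", "turbine"],
}

candidates = narrow_candidates([0.85, 0.1], clusters, term_vectors)
```

With this arrangement, a new term is compared only with the members of one cluster rather than with all of the roughly 3,000 terms in the category.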
In the future, the proposed method is expected to be applied not only to the JST thesaurus and other dictionaries but also to ontologies, which define the concepts of a knowledge domain and the relationships among them, as well as to machine translation and automatic keyword indexing.