Progress Report

AI & Robots that Harmonize with Humans to Create Knowledge and Cross Its Borders
[2] Claim & Analysis AI

Progress through fiscal year 2023

1. Overview

This research project models the research loop as hypothesis, experiment, analysis, description & dialogue, and new hypothesis. Within this loop, the claim and analysis steps, which include hypothesis generation and verification, require an AI that can understand multimodal scientific data and respond with language-based evidence.
We are therefore developing a "Multimodal XAI Foundation Model" as the Claim & Analysis AI: an AI that can understand the relationships between the claim and analysis parts described in an individual paper, and that can also comprehend summaries of, and similarities across, multiple papers.

2. Progress

As the starting point for the Multimodal XAI Foundation Model developed in this project, we are advancing the initial development of a Multimodal XAI for the mutual understanding of papers. Aiming for the project's 2023 milestone, "AI robots capable of mutual understanding of research described in existing papers through knowledge exploration using literature," we constructed a pipeline that trains the foundation model on multimodal data combining the figures, tables, and text of papers.
We then verified the project's overall 2023 milestone items: "verification of internal consistency understanding in papers," "verification of mutual understanding between papers," "verification of survey generation including mutual understanding between papers," and "verification of similarity understanding between papers." Each verification was performed as a downstream task of the foundation model after fine-tuning.

Multimodal XAI Foundation Model

In this research, we worked toward the technical goals defined by the milestone evaluation items. To reduce the cost of manually creating datasets for paper comprehension, we built a framework that leverages general-domain data together with knowledge extracted from papers. We also conducted research on hypothesis generation and law discovery using generative AI.
For consistency understanding within a paper, we constructed a model, shown in Fig. 1, that uses BERT and GPT-4 to detect and explain consistency between a paper's claims and its experimental results. Pre-training on paper datasets improved the accuracy of consistency detection, and in user studies the system received high satisfaction ratings from researchers.

Fig. 1: Example of consistency (top) and pipeline (bottom)
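The claim-result consistency check can be illustrated with a deliberately simplified sketch. The actual pipeline uses BERT and GPT-4; here a regex-based numeric comparison stands in for the learned model, and all function names and example sentences are invented for illustration:

```python
import re

def extract_scores(text):
    """Pull numeric scores (e.g. '92%' or '0.92') out of a sentence."""
    scores = []
    for match in re.findall(r"\d+(?:\.\d+)?", text):
        value = float(match)
        if value > 1.0:        # treat '92' or '92%' as a percentage
            value /= 100.0
        scores.append(value)
    return scores

def check_consistency(claim, result):
    """Return (is_consistent, explanation) by comparing reported scores.
    A toy stand-in for the report's BERT/GPT-4 consistency model."""
    claimed, reported = extract_scores(claim), extract_scores(result)
    for c in claimed:
        if any(abs(c - r) < 1e-6 for r in reported):
            return True, f"claimed score {c:.2f} appears in the experimental results"
    return False, "no claimed score is supported by the experimental results"
```

For example, a claim of "accuracy reaches 92%" is judged consistent with a results table reporting 0.92, while a claim of 95% against a reported 0.80 is flagged as inconsistent.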

For mutual understanding between papers, we developed a model that learns similarities between papers and estimates the similar parts of new paper pairs, achieving high-accuracy similarity evaluation in both the information-science and chemistry domains.
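As a rough sketch of similarity estimation between papers, the example below embeds abstracts as bag-of-words vectors and compares them with cosine similarity. The actual model learns dense representations; the toy abstracts here are invented for illustration:

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words embedding; the real model learns dense vectors."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def most_similar_pair(papers):
    """Return the names of the most similar pair among the given abstracts."""
    vecs = {name: embed(text) for name, text in papers.items()}
    names = list(papers)
    return max(
        ((a, b) for i, a in enumerate(names) for b in names[i + 1:]),
        key=lambda pair: cosine(vecs[pair[0]], vecs[pair[1]]),
    )

papers = {
    "A": "graph neural networks for molecule property prediction",
    "B": "molecule property prediction with graph networks",
    "C": "a survey of reinforcement learning for robotics",
}
```

Here papers A and B share most of their vocabulary, so they are identified as the most similar pair.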
Furthermore, we automated survey generation that incorporates mutual understanding between papers. Following the pipeline shown in Fig. 2, a retrieval model selects papers relevant to a specified topic, and a survey including figures and tables is generated from them. In user studies, the generated surveys received high evaluations from researchers.

Fig. 2: Paper summary generation based on specified topics
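The retrieval-then-summarize pipeline of Fig. 2 can be sketched minimally as follows. Term-overlap scoring stands in for the trained retrieval model, and the generation step simply lists abstracts rather than producing figures and tables; all data is invented for illustration:

```python
def retrieve(topic, papers, k=2):
    """Rank papers by term overlap with the topic query (toy retrieval model)."""
    query = set(topic.lower().split())
    ranked = sorted(papers,
                    key=lambda name: -len(query & set(papers[name].lower().split())))
    return ranked[:k]

def make_survey(topic, papers, k=2):
    """Assemble a minimal survey from the top-k retrieved papers."""
    lines = [f"Survey on: {topic}"]
    for name in retrieve(topic, papers, k):
        lines.append(f"- [{name}] {papers[name]}")
    return "\n".join(lines)

papers = {
    "A": "graph neural networks for molecular property prediction",
    "B": "reinforcement learning for robot manipulation",
    "C": "scaling graph networks to large molecules",
}
```

A query such as "graph neural networks" retrieves papers A and C, which are then assembled into the survey skeleton.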

Lastly, for similarity understanding between papers, we created pairs of similar papers and automatically generated text explaining their similarity; this also received high evaluations in user studies.
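A template-based caricature of the explanation step might look like the sketch below. The report's model generates this text with a trained language model, whereas here the shared content words are simply listed; the stopword set and examples are invented:

```python
def explain_similarity(text_a, text_b,
                       stopwords=frozenset({"a", "of", "for", "the", "with"})):
    """Generate a template-based explanation from shared content words.
    A toy stand-in for the report's learned explanation generator."""
    shared = (set(text_a.lower().split()) & set(text_b.lower().split())) - stopwords
    if not shared:
        return "The papers share no obvious content words."
    return "Both papers concern " + ", ".join(sorted(shared)) + "."
```
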

3. Future work

We will continue to advance research understanding from the literature while aiming to realize hypothesis generation. Using large language models and multimodal foundation models, we will embed scientific and technological knowledge from papers into a continuous space, explore and reason within that space, and generate scientific hypotheses that balance novelty and validity.
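The novelty-validity balance can be made concrete with a toy scoring rule over an embedding space: novelty rewards distance from known findings, validity rewards staying near the bulk of established knowledge. The vectors, weighting, and bounding function below are invented for illustration, not the project's actual method:

```python
import math

def distance(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def score_candidate(candidate, known, alpha=0.5):
    """Score a candidate hypothesis embedding against known findings.
    Novelty: bounded distance to the nearest known point.
    Validity: inverse distance to the centroid of known knowledge."""
    nearest = min(distance(candidate, k) for k in known)
    novelty = nearest / (1.0 + nearest)            # bounded in [0, 1)
    centroid = [sum(coord) / len(known) for coord in zip(*known)]
    validity = 1.0 / (1.0 + distance(candidate, centroid))
    return alpha * novelty + (1 - alpha) * validity

known = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
candidates = [(0.0, 0.0), (0.5, 0.5), (10.0, 10.0)]
best = max(candidates, key=lambda c: score_candidate(c, known))
```

Under this rule a candidate identical to a known finding scores low on novelty, a far-off outlier scores low on validity, and a nearby-but-new point wins.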
Because scientific and technological knowledge is niche and prone to hallucination, we need retrieval augmentation and stepwise reasoning that preserve creativity while suppressing misinformation. It is also important to realize explainability (XAI) in the multimodal paper-understanding models.
Furthermore, we aim to research and develop foundation models that enable AI to create new knowledge and hypotheses in science and technology, and to make them available to human researchers. Using AI with multimodal input/output and explainability, we will realize the understanding of mutual relationships in research and achieve multimodal hypothesis generation in collaboration with the project's other research and development tasks.