Materials Informatics - Materials Design by Digital Data Driven Method/CRDS-FY2013-SP-01
Executive Summary

The Center for Research and Development Strategy (CRDS) of the Japan Science and Technology Agency (JST) considers the establishment of “design”*1 methods for materials with target properties and functions to be important for the future of materials research. In this proposal, in order to support this activity, our study results concerning the utilization of informatics for materials research are summarized from a strategic standpoint.

As one of the future courses of materials research, the “design” of materials with functions that contribute to solving problems is an essential and important concept. This involves developing methods to rationally search for materials that possess the specific properties desired by society or industry (inverse problem), unlike the traditional research procedure, in which the properties of a material are discovered from a given elemental composition and structure (direct problem). For example, if candidate materials with target properties and functions for lithium-ion batteries, thermoelectric conversion, or superconductivity can be rationally discovered, such a method will contribute greatly to solving energy, resource, and other issues.

In our proposal, as a new methodology necessary for the “design” of materials, materials informatics is defined as a “scientific and technical method of solving problems in materials science using computer science (data science*2 and computational science) in combination with diverse and massive collections of data regarding the physical and chemical properties of materials,” and the importance of this new study area is proposed. Its chief characteristic is that it develops, qualitatively or quantitatively, a theoretical understanding of problems that cannot be solved without the aid of computers. This allows us to deal with a massive amount of data related to materials research in a comprehensive, scientific, and systematic manner, with the aim of both promoting the discovery of hidden rules, new theories, and principles, and shortening the time between theoretical discoveries and the development of practical materials.

The development of advanced materials is essential to revitalize many industries, such as the energy, health-care, materials, and chemical industries. For more than a decade, the Japanese government has invested substantially in research and development for the discovery and design of advanced materials. Nevertheless, a lengthy period of between 10 and 30 years is still generally required before an advanced material newly discovered in a research laboratory is put to practical use and enters the marketplace. Unless this process is accelerated and shortened, it will be difficult to maintain the scientific and technological capabilities and industrial competitiveness of Japan now that many industries are globalized and science and technology are no longer monopolized by certain advanced countries.
Using the traditional (existing) methods of experimental science, many types of substances have been synthesized based on experience and intuition, and their properties have been evaluated. In such methods, multiple sets of experimental data regarding the structure and properties of a material are organized on the basis of various factors, and from this the functional properties of the material are estimated and theoretical models are constructed. This approach has thus far made great contributions to materials research.

Due to recent advancements in computer science and computational power, as well as reductions in computer costs, many laboratories can now routinely analyze experimental results at the level of electronic theory. Even before an experiment is undertaken, they can often estimate electron orbitals and electronic structure in order to predict whether a target function or property will be obtained.

Against this background, a collaborative effort among the three core study areas of materials synthesis/processing, measurement/analysis, and theory/simulation has recently been initiated in order to make materials research more effective, which should result in substantial progress in communication between experimental and computational scientists. Fundamentally, however, such collaborative efforts by themselves do not make it possible to “design” a material that possesses a target property or function. Accordingly, it remains difficult to progress beyond the standard method of obtaining a target function by a trial-and-error approach based on experience and intuition.

As a new means of promoting “design” beyond the existing approach, we propose the introduction of informatics, with the construction of a data-driven model to inductively gain knowledge (rules) from massive, complicated sets of data. Used in parallel with a deductive (principle-driven) approach based on theories or models, this will produce a synergistic effect, which may lead to great advancements in future materials research.

The total number of substances that have been synthesized to date and the amount of information on the properties of these substances are enormous.*3 Indeed, it is becoming increasingly difficult to obtain prospects for the development of novel materials using only traditional guiding attributes, such as crystal structure and molecular formula. Furthermore, recent advances in measurement and simulation techniques have made it possible to generate an enormous volume of data in a short time. Because the volume of this data will doubtless continue to increase steadily, a higher-order concept (new axis) is required in order to extract meaningful information from it, to obtain a holistic view of this massive materials data, and to organize such data in a comprehensive manner.
In recent years, in the area of data (mathematical or information) science, rapid progress has been made in multivariate, high-dimensional data analysis, in terms of methods to extract the information structure inherent in massive or complicated data.
Methods for the scientific analysis of data which are qualitatively different from traditional analysis methods are now at a realizable stage.

“Integration of computational techniques and information analysis techniques that are related to materials research will reduce the materials development cycle from its current 10 to 20 years to 2 to 3 years,” according to a proposal in “Integrated Computational Materials Engineering,” issued in 2008 by the National Research Council of the U.S. National Academies. Having accepted this proposition, the American government launched a national effort called the “Materials Genome Initiative” in 2011. Regarding specific programs, for example, the NSF started “Designing Materials to Revolutionize and Engineer our Future (DMREF),” while the NIST started “Building the Materials Innovation Infrastructure: Data and Standards” in 2012.

Turning to the present state of affairs in Japan, although simulation-based analysis and prediction are in progress in materials research, prediction and visualization by exhaustive computation or machine learning, using data sets and computational capabilities, are not utilized at all. Moving forward, collaboration among experimental scientists, computational scientists, and data scientists is urgently needed.

Major expected research subjects and scientific methods are as follows:
(1) The topics to be solved by materials researchers include: “Extraction of rules for structure and physical property correlation and clarification of complicated phenomena, etc. by the analysis of a wide diversity of massive data,” “Prediction of properties and structure using massive data (design of experiments),” “Search for materials (structure) using optimization techniques, etc.,” “Multi-scale modeling,” “Advancement of mathematical models using high-dimensional data,” and “Visualization of material space or analytical data (Method: holistic visualization or highlight visualization; Target: raw (primary) data, such as images and spectra, and processed (secondary) data, such as numerical groups in papers).”

(2) As data science methods, “machine learning,” “Bayesian inference,” “compressed sensing (sparse modeling),” “data assimilation,” “inverse problems,” “mathematical modeling,” “optimization,” and other interpolation or extrapolation methods should be applied to the topics in (1) above.
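
As a purely illustrative sketch of how the topics in (1) and the methods in (2) fit together, the following Python fragment first fits a data-driven model to descriptor/property pairs (the direct problem) and then searches a list of candidates for the one whose predicted property is closest to a target value (the inverse problem). Everything in it is an assumption made for illustration: the data are synthetic, the single descriptor and the linear model are deliberately simplistic, and the function names fit_line and inverse_search are hypothetical. It is not a method taken from this proposal.

```python
# Toy illustration only: synthetic numbers, not real materials data.

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b (the data-driven 'direct' model)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

def inverse_search(model, candidates, target):
    """Return the candidate descriptor whose predicted property is closest to target."""
    a, b = model
    return min(candidates, key=lambda x: abs(a * x + b - target))

# Hypothetical training set: one descriptor value per material and its measured property.
descriptors = [0.0, 1.0, 2.0, 3.0, 4.0]
properties = [1.0, 3.0, 5.0, 7.0, 9.0]

# Direct problem: learn the descriptor-to-property relationship from data.
model = fit_line(descriptors, properties)

# Inverse problem: among candidate descriptors 0.0, 0.5, ..., 10.0,
# pick the one predicted to come closest to the desired property value.
candidates = [i * 0.5 for i in range(21)]
best = inverse_search(model, candidates, target=12.0)
print(model, best)
```

In practice the descriptor space is high-dimensional and the model would be one of the methods listed in (2), such as sparse modeling or Bayesian inference, but the division into a learned forward model and a search over candidates stays the same.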

It is frequently difficult for a single laboratory to carry out this data-driven approach; thus, the involvement of multiple people and groups with a variety of specializations is essential. In order to promote the above-mentioned research subjects, it is therefore also necessary to build an open platform for materials informatics to serve as a common ground on which everyone involved in materials innovation, from the academic sector to the industrial sector, can easily use all data.

The roles of the Japanese government and research institutes to advance the above-mentioned efforts are as follows:

 Role of the Japanese government
In Japan, most of the measuring equipment and computers used for research, as well as simulation software and databases, are foreign products, and internationally Japan's position is that of a user. The harm caused by this situation is not limited to the economic problem of Japan purchasing products from the countries of manufacture at high prices. Because software is made so that researchers in the countries of manufacture can use it to their advantage, Japanese researchers are at a great disadvantage when using such software to obtain results. For example, consider a situation in which a special analysis can be conducted using certain software, so that special visualization is performed and the efficacy of a given medicine can be determined. Now, further consider that this software may be commercialized and exported without revealing this special analysis method to anyone other than the software developer and the university researchers involved in the collaborative research. In this case, the Japanese researchers who have to use the standardized software to conduct research will be greatly handicapped from the outset.

Infrastructure work, such as database construction and the commercialization of simulation software equipped with new algorithms, tends not to receive suitable support, because competitive funding focuses on promoting basic research and the practical application of research results. The Japanese government should not expect short-term results but should provide continuous support from a long-term standpoint; preparing and sharing not only a hardware infrastructure but also an “intellectual foundation (software infrastructure)” is its urgent duty. Thus far, special attention has been given to developing (large) advanced measurement research facilities and equipment and making them widely available. In addition, the preparation of information infrastructure technology to cope with big data is at an initial stage. However, there is no environment in which researchers can use a wide variety of data on materials, although such an environment is inseparably related to, or even more valuable than, the above-mentioned efforts. Accordingly, the construction of such a system is an urgent task.

 Role of research institutes
To build an open platform for materials informatics, continuous contributions from universities and public research institutes are essential, along with financial support from the Japanese government. With the aim of implementing integrated and comprehensive research on materials informatics, they should build the framework for a research system that promotes close cooperation among researchers in materials processing, measurement, and computation, and that creates a new current of scientific and technological research through cooperation and integration with data science. To accelerate materials development, the materials research community needs to share data sets, simulated and empirical formulae, and advanced algorithms across study fields.

The most important activity is to build an evaluation system that allows academics themselves to discover and encourage newly emerging and integrated areas such as the one described in our proposal, to actively recruit the motivated people who are needed, and to provide incentives.

Concretely speaking, we propose that the following activities should be conducted as related programs.

 Launch of a data-driven research program
A feasibility study (FS) program on methods for data exploration and analysis will be conducted to establish and advance these methods and to create successful case examples. In particular, a program built on basic collaborative proposals by materials scientists and data scientists should be initiated with an annual research fund of approximately 10 to 20 million yen. Because collaborative proposals involving different research fields are required, it will be necessary to provide a longer application period than usual, in order to guarantee a period for cross-boundary exchange through academic conferences and workshops.

 Establishment of a data integration/research & development center
A core site for data integration and research & development will be established to determine a data management policy (i.e., the type and range of data to collect) and to manage data and tools. This site will provide the functions of a service center and will promote the construction of a materials informatics system in which research institutes and universities introduce the concept of Linked Open Data (LOD) and use standard data formats, such as the Resource Description Framework (RDF)/XML, to exchange information for application to materials development.

*1: “Design” in our proposal means the examination of a function or property demanded of a material, and the determination of the composition that realizes it, as a specification. This is an engineering concept, at the opposite end of the spectrum from “analysis” as a scientific means, and is sometimes called “synthesis.” The key factors describing substances in materials R&D are multifaceted, such as texture, crystal structure, chemical composition, electronic structure, and magnetic structure, and these are determined in consideration of the degrees of freedom (such as size, electric charge, spin, and orbital) of each element. “Design” in an analytical manner is impossible; accordingly, a trial-and-error approach based on experience and intuition is usually required. Our proposal brings forward a systematic design method to overcome the dependence on the traditional method based on experience and intuition.

*2: “Data science” in our proposal is a generic name for techniques that discover characteristic patterns in complicated or massive data in order to gain effective knowledge, and it includes engineering techniques. Typical examples of data science include data mining and machine learning. It covers those areas of information science, statistics, and applied mathematics (or computer science or mathematical science) that are related to data processing.

*3: The number of elements dealt with in materials science is approximately 80, yet the number of combinations of these elements is approximately 3,000 in a binary system and approximately 80,000 in a ternary system. Moreover, the number of possible compounds increases greatly if composition ratios are considered. For example, in the case of a compound AxByCz, where x, y, and z are integers and x + y + z = 10, the number of combinations is on the order of 100, giving roughly 8 million for the entire ternary system. Even if the composition is the same, crystal polymorphism frequently occurs to yield many types of structure, so the number of combinations increases further.

The number of entries for inorganic compounds of up to ternary systems listed in the Inorganic Crystal Structure Database (ICSD) is 76,000 at present; if duplicate entries (multiple entries for a single compound) are not counted, the number of inorganic compounds with experimentally known crystal structures is less than 50,000.
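
The order-of-magnitude counts in footnote *3 can be checked with a few lines of Python. This is a sketch under stated assumptions: the figure of 80 elements comes from the footnote, while the two ways of counting composition ratios below (positive integers summing to exactly 10, or to at most 10) are our own readings of the footnote, not something it specifies. The helper name count_ratios is hypothetical.

```python
from math import comb

N_ELEMENTS = 80  # number of elements quoted in footnote *3

# Number of ways to choose 2 or 3 distinct elements out of 80.
binary_systems = comb(N_ELEMENTS, 2)
three_element_systems = comb(N_ELEMENTS, 3)

def count_ratios(limit, exact):
    """Count positive-integer triples (x, y, z) whose sum equals limit
    (exact=True) or is at most limit (exact=False)."""
    count = 0
    for x in range(1, limit + 1):
        for y in range(1, limit + 1):
            for z in range(1, limit + 1):
                total = x + y + z
                if total == limit or (not exact and total < limit):
                    count += 1
    return count

ratios_exact = count_ratios(10, exact=True)     # x + y + z == 10
ratios_at_most = count_ratios(10, exact=False)  # x + y + z <= 10

print(binary_systems, three_element_systems)  # 3160 82160
print(ratios_exact, ratios_at_most)           # 36 120
print(three_element_systems * 100)            # 8216000
```

The element-pair and element-triple counts (3,160 and 82,160) match the footnote's rounded figures of 3,000 and 80,000. Strict counting of ratios gives 36 or 120 depending on the reading, so the footnote's figure of about 100 ratios per three-element system is best taken as an order-of-magnitude estimate; multiplying it by 82,160 gives the quoted figure of roughly 8 million.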