Materials innovation plays a vital role in technological progress and industrial development. Traditionally, however, new materials have been developed by trial and error: the experimental procedures are cumbersome, the research and development cycle is long, and resources are wasted. Experiments often fall short of researchers' expectations and generate large amounts of unsatisfactory data. Although these experiments provide trial-and-error experience, the data from failed experiments are usually set aside and never used again. In addition, the number of materials characterization techniques keeps growing, and so do the volume and dimensionality of the resulting graphical data. These data are complex, and manual analysis alone sometimes cannot uncover the deeper connections between material properties. Moreover, with the development of computers, methods such as first-principles calculations, phase-field simulations, and finite element analysis have emerged for calculating the structure and properties of materials, but they are often computationally expensive. All of these factors limit the pace of materials development.
To address these problems, and in line with current developments in artificial intelligence, researchers have realized that all experimental, computational, and simulation data, whether the results were good or bad, can be integrated into databases. From such a database, machine learning models can be built around specific material properties to predict material performance quickly and even to design new materials, shortening development cycles and reducing costs. In recent years, using machine learning to predict new materials has become increasingly popular among researchers. In 2018, Nature published a review titled “Machine learning for molecular and materials science”, which described in detail the role of machine learning in guiding chemical synthesis, assisting in materials characterization, and enabling new approaches to materials design, and argued that the new generation of computer science will have a transformative effect on materials science.
Against this background, this article briefly introduces machine learning, discusses in detail the research progress of machine learning in the materials field, and, building on previous work, summarizes new trends in machine-learning-based materials design, in the hope that more researchers will pay attention to this direction.
2 Introduction to machine learning
Machine learning is the process of giving computers the ability to acquire knowledge or skills, as humans do, and then to use that knowledge and those skills to solve the problems at hand.
Solving a problem with machine learning follows the sequence: problem definition, data collection, model building, evaluation, and result analysis, as shown in Figure 2-1. A suitable database is first built for the specific problem; then, drawing on computer science, statistics, and other disciplines, a mathematical model is established and repeatedly evaluated and corrected until it can predict accurately.
Figure 2-1 Flow chart of the learning process of machine learning
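The five steps of Figure 2-1 can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not a materials workflow: it assumes a single made-up descriptor x, a synthetic property y = 2x + 1 plus noise, and a straight-line model; the function names are invented for this example, and a real study would draw on a materials database and a more flexible model.

```python
import random

# Step 1 (problem definition): predict a property y from one descriptor x.
# Hypothetical setup for illustration only: y = 2x + 1 plus Gaussian noise.

def collect_data(n, seed=0):
    """Step 2 (data collection): generate synthetic (x, y) pairs."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(0.0, 10.0)
        data.append((x, 2.0 * x + 1.0 + rng.gauss(0.0, 0.5)))
    return data

def fit_line(train):
    """Step 3 (model building): ordinary least squares for y = a*x + b."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mean_absolute_error(model, test):
    """Step 4 (evaluation): error on data the model did not see during fitting."""
    slope, intercept = model
    return sum(abs(slope * x + intercept - y) for x, y in test) / len(test)

data = collect_data(200)
train, test = data[:150], data[150:]  # hold out 50 points for evaluation
model = fit_line(train)
mae = mean_absolute_error(model, test)

# Step 5 (result analysis): inspect the fitted parameters and the held-out error.
print(f"slope={model[0]:.2f}, intercept={model[1]:.2f}, test MAE={mae:.2f}")
```

Holding out part of the data is what makes step 4 meaningful: the error is measured on samples the model was not fitted to.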
To illustrate the general idea of machine learning, consider a simple example:
As young children, we do not clearly understand the concept of gender: is this person a man or a woman? This corresponds to Step 1: problem definition.
As we grow up and meet more and more people, we learn more about the characteristics of men and women, such as voice timbre, dress, appearance, hairstyle, and behavior. This is Step 2: data collection.
Based on these characteristics, our brain automatically builds a model for recognizing gender, so that when we meet a stranger we can immediately tell their gender. This is Step 3: model building.
However, people who have only just formed the concept of gender characteristics often make mistakes; for example, they may take a man with long hair for a woman, or a woman with short hair for a man. To correct such wrong judgments, the brain remembers the misleading feature and rebuilds its model, so that the distinction becomes more accurate. This is Step 4: evaluation.
Finally, we acquire the ability to identify gender and can accurately determine the gender of the other person. This is Step 5: result analysis.
Of course, the learning process in machine learning is not this simple. Depending on whether the training data carry corresponding labels, machine learning can be divided into supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning. These categories and some representative algorithms are shown in Figure 2-2. Note that machine learning is a very broad field: some algorithms are hard to assign to a single category, and algorithms of the same category can address different types of problems, so each practical problem should be analyzed on its own terms. In addition, as machine learning continues to develop, the concept of deep learning appears more and more often. Deep learning is an extension of the neural network algorithms in machine learning and can be regarded as its second stage. The multi-layer perceptrons used in deep learning can make up for the shortcomings of shallow learning. Deep learning algorithms include recurrent neural networks (RNN), convolutional neural networks (CNN), and others. This article does not introduce machine learning and deep learning algorithms in detail; for details, please refer to books on machine learning.
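The claim that multi-layer networks overcome limitations of shallow models can be made concrete with the classic XOR problem. The sketch below is an illustrative toy, far smaller than any real deep network: it brute-forces a grid of weights to show that no single linear threshold unit reproduces XOR, while a two-layer perceptron with hand-chosen (not trained) weights does.

```python
from itertools import product

def step(z):
    """Threshold activation: 1 if z > 0 else 0."""
    return 1 if z > 0 else 0

def single_layer(x1, x2, w1, w2, b):
    """A single linear threshold unit (a one-layer perceptron)."""
    return step(w1 * x1 + w2 * x2 + b)

def two_layer_xor(x1, x2):
    """Two-layer perceptron with hand-chosen weights; nothing here is trained.
    Hidden unit h1 acts as OR, h2 as AND; the output fires for OR-and-not-AND."""
    h1 = step(x1 + x2 - 0.5)  # OR
    h2 = step(x1 + x2 - 1.5)  # AND
    return step(h1 - h2 - 0.5)

XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

# Search a grid of weights and biases: no single linear unit reproduces XOR,
# because the two classes cannot be separated by one straight line.
grid = [v / 2 for v in range(-6, 7)]  # -3.0 ... 3.0 in steps of 0.5
single_solves = any(
    all(single_layer(x1, x2, w1, w2, b) == y for (x1, x2), y in XOR.items())
    for w1, w2, b in product(grid, repeat=3)
)
mlp_solves = all(two_layer_xor(x1, x2) == y for (x1, x2), y in XOR.items())
print("single layer solves XOR:", single_solves)  # False
print("two-layer network solves XOR:", mlp_solves)  # True
```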
3 Application of machine learning algorithms in material design
The concept of “using computational models and machine learning for material prediction and design” was first proposed by Professor Gerbrand Ceder, a materials scientist at the University of California, Berkeley. Professor Ceder pointed out that materials science can borrow from genetics: just as DNA base pairs encode proteins and other biological materials, a “materials genome” can be used to encode various compounds, and the tools for achieving this “encoding” are computational data mining, machine learning algorithms, and the like. The concept attracted wide attention. Subsequently, in the summer of 2011, the Obama administration announced the “Materials Genome Initiative” (MGI), which set off a wave of change in materials science. Machine learning has since made progress in materials science in areas such as the analysis of material structure, phase transitions, and defects [4-6], and assisting materials testing and characterization [7-9].
3.1 Analysis of material structure, phase transition and defects
In June 2017, Isayev et al. linked the AFLOW library with structure-property descriptors to build a database and used machine learning algorithms to predict the properties of thousands of inorganic materials. First, they constructed property-labeled materials fragments (PLMF): the crystal structure of each material is decomposed into interconnected topological fragments that represent the connectivity of the structure, and each vertex of the PLMF graph is labeled with physical and chemical properties of the corresponding atom (such as its position in the periodic table, electronegativity, and molar volume) so that different materials can be distinguished. Then, using the gradient boosting decision tree (GBDT) algorithm, they built eight prediction models (Figure 3-1). One is a binary classification model that predicts whether a material is a metal or an insulator; the other seven are regression models that predict the band gap energy (E_BG) of insulating materials, the bulk modulus (B_VRH), the shear modulus (G_VRH), the Debye temperature (θ_D), the heat capacity at constant pressure (C_P), the heat capacity at constant volume (C_V), and the thermal expansion coefficient (α_V). Validation showed that, across the 26674 materials in the database, the metal/insulator classification reached an accuracy of 86%, with 2414 materials misclassified (Figure 3-2). The study also found that polar inorganic materials have larger band gap energies (Figure 3-3), and that the predicted thermomechanical properties are broadly consistent with experimental and calculated data (Figure 3-4).
Figure 3-1 Machine learning flowchart
Figure 3-2 Data set classification diagram
Figure 3-3 The relationship between band gap energy and ionization potential
Figure 3-4 Comparison curve between model prediction data and calculated data
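The gradient boosting decision tree idea used by Isayev et al. can be sketched as follows: each new tree is fitted to the residuals of the current ensemble, and its prediction is added with a small learning rate. This is a minimal sketch, not the authors' implementation: it assumes squared-error loss, uses depth-one trees (stumps) rather than deeper trees, and trains on a synthetic two-descriptor data set standing in for PLMF descriptors, so its numbers say nothing about the published results. All function names are hypothetical.

```python
# Minimal gradient boosting with decision stumps (squared-error loss).
# Hypothetical data: two made-up descriptors on a grid and a synthetic target.

def fit_stump(X, r):
    """Best single split (feature j, threshold) for residuals r under squared loss."""
    n, total = len(r), sum(r)
    best = None
    for j in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for k in range(1, n):
            left_sum += r[order[k - 1]]
            a, b = X[order[k - 1]][j], X[order[k]][j]
            if a == b:
                continue  # no threshold separates equal feature values
            right_sum = total - left_sum
            # Maximising this quantity minimises the split's sum of squared errors.
            gain = left_sum ** 2 / k + right_sum ** 2 / (n - k)
            if best is None or gain > best[0]:
                best = (gain, j, (a + b) / 2, left_sum / k, right_sum / (n - k))
    return best[1:]  # (feature, threshold, left value, right value)

def predict_stump(stump, x):
    j, threshold, left_value, right_value = stump
    return left_value if x[j] <= threshold else right_value

def fit_gbdt(X, y, n_rounds=200, learning_rate=0.1):
    base = sum(y) / len(y)  # start from the mean prediction
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # For squared loss the negative gradient is simply the residual.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        pred = [p + learning_rate * predict_stump(stump, x) for p, x in zip(pred, X)]
    return base, learning_rate, stumps

def predict_gbdt(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)

# Synthetic target: a step in descriptor 0 plus a linear trend in descriptor 1.
X = [(i / 9, j / 5) for i in range(10) for j in range(6)]
y = [(2.0 if x0 > 0.5 else 0.0) + x1 for x0, x1 in X]
model = fit_gbdt(X, y)
mse = sum((predict_gbdt(model, x) - t) ** 2 for x, t in zip(X, y)) / len(y)
print(f"training MSE: {mse:.4f}")
```

The small learning rate means many weak trees each correct a fraction of the remaining error, which is the trade-off gradient boosting makes between speed and overfitting.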
In 2018, Zong et al. used the random forest algorithm to build classification and regression models for the critical temperature of superconductors. First, they modeled the superconducting transition temperature (Tc) of more than 12,000 known superconductors and candidate materials using information from the SuperCon database. The materials were divided into two classes according to whether Tc is above or below 10 K, and a non-parametric random forest classification model was built to predict the class of each superconductor. The random forest algorithm and a scatter plot of Tc for the superconducting materials are shown in Figures 3-5 and 3-6. Next, regression models were developed to predict Tc for different material families, such as copper-based, iron-based, and low-Tc compounds, again with good results, and data from the AFLOW online repository were used to further improve the accuracy of these models. Finally, the classification and regression models were combined into an integrated pipeline that was used to screen the entire Inorganic Crystal Structure Database, predicting more than 30 new potential superconductors. Applying such ML algorithms can therefore greatly accelerate the search for candidate high-temperature superconductors.
Figure 3-5 Flow chart of random forest algorithm
Figure 3-6 Tc scatter plot of superconducting materials
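A random forest classifier of the kind described above can be sketched in pure Python: each tree is grown on a bootstrap sample, each split considers only a random subset of features, and the forest predicts by majority vote. The 10 K threshold follows the text; everything else is an assumption for illustration: the two descriptors and the function synthetic_tc are hypothetical and not physical, the trees use Gini impurity and a fixed depth, and no SuperCon data are involved.

```python
import random

TC_THRESHOLD_K = 10.0  # class boundary used in the text: Tc above or below 10 K

def synthetic_tc(a, b):
    """Hypothetical Tc (in K) from two made-up descriptors; not a physical model."""
    return 20.0 * a + 5.0 * b

def weighted_gini(pos, m):
    """m times the Gini impurity of a node holding m samples, pos of them in class 1."""
    return 2.0 * pos * (m - pos) / m

def build_tree(X, y, depth, max_features, rng):
    n, pos = len(y), sum(y)
    if depth == 0 or pos in (0, n):
        return int(2 * pos >= n)  # leaf: majority class
    best = None
    # Random forests consider only a random subset of features at each split.
    for j in rng.sample(range(len(X[0])), max_features):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_pos = 0
        for k in range(1, n):
            left_pos += y[order[k - 1]]
            a, b = X[order[k - 1]][j], X[order[k]][j]
            if a == b:
                continue
            score = weighted_gini(left_pos, k) + weighted_gini(pos - left_pos, n - k)
            if best is None or score < best[0]:
                best = (score, j, (a + b) / 2)
    if best is None:  # the sampled feature could not split these samples
        return int(2 * pos >= n)
    _, j, threshold = best
    left = [i for i in range(n) if X[i][j] <= threshold]
    right = [i for i in range(n) if X[i][j] > threshold]
    return (j, threshold,
            build_tree([X[i] for i in left], [y[i] for i in left], depth - 1, max_features, rng),
            build_tree([X[i] for i in right], [y[i] for i in right], depth - 1, max_features, rng))

def predict_tree(node, x):
    while isinstance(node, tuple):
        j, threshold, left, right = node
        node = left if x[j] <= threshold else right
    return node

def fit_forest(X, y, n_trees=25, depth=4, max_features=1, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 depth, max_features, rng))
    return forest

def predict_forest(forest, x):
    votes = sum(predict_tree(tree, x) for tree in forest)
    return int(2 * votes > len(forest))  # majority vote (odd tree count avoids ties)

rng = random.Random(42)
X = [(rng.random(), rng.random()) for _ in range(400)]
y = [int(synthetic_tc(a, b) > TC_THRESHOLD_K) for a, b in X]
forest = fit_forest(X[:300], y[:300])
accuracy = sum(predict_forest(forest, x) == t for x, t in zip(X[300:], y[300:])) / 100
print(f"held-out accuracy: {accuracy:.2f}")
```

Bootstrap sampling and per-split feature sampling make the individual trees differ from one another, and averaging their votes is what gives the forest its robustness compared with a single tree.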
3.2 Assisting materials testing and characterization
In recent years, the emergence of in-situ probes has made it possible to study the mechanism of ferroelectric domain switching under external stimuli. However, the amount, variety, precision, and acquisition speed of the experimental data have increased sharply, making traditional analysis methods difficult to apply. In January 2018, J. C. Agar of the University of California, Berkeley and co-workers therefore designed a machine learning workflow to help understand and design ferroelectric materials. First, principal component analysis (PCA) was used to denoise the piezoelectric hysteresis loops. The denoised curve (black line in Figure 3-7) captures all the structural features of the hysteresis loop well, overcoming the limited fitting accuracy of the conventional 15-parameter function (red line in Figure 3-7). Then, to quantify the concave features of the piezoelectric hysteresis loop, a convex hull curve was constructed, as shown in Figure 3-8, and the k-means clustering algorithm was used to classify the switching behavior of each loop according to the distance between the center of the depression and the red line. Piezoresponse force microscopy (PFM) spectroscopy alone can only characterize transitions between the a1/a2/a1/a2 and c/a/c/a structures and cannot reveal switching within the a1/a2/a1/a2 structure. By combining the denoised data, the convex hull curve, and k-means clustering, the authors uncovered the switching mechanism within the a1/a2/a1/a2 structure. Cross-validation of the classification model gave an accuracy of 92±0.01% (Figure 3-9).
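The denoise-then-cluster part of this workflow can be sketched as PCA followed by k-means. In this minimal sketch, synthetic sine-shaped curves stand in for piezoresponse loops (an assumption; they are not physical hysteresis loops), PCA is computed by power iteration, each curve is rebuilt from its top two principal components, and k-means groups the curves by their PCA scores. The convex-hull feature used by Agar et al. is deliberately omitted, and all names and parameters are hypothetical.

```python
import math
import random

def dist2(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean_vector(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

def top_components(centered, k, n_iter=300, seed=0):
    """Top-k principal directions via power iteration on the covariance, with deflation."""
    n, d = len(centered), len(centered[0])
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)] for a in range(d)]
    rng = random.Random(seed)
    comps = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]  # random start vector
        for _ in range(n_iter):
            w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = sum(v[a] * cov[a][b] * v[b] for a in range(d) for b in range(d))
        comps.append(v)
        # Deflate: remove this component so the next power iteration finds the next one.
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(d)] for a in range(d)]
    return comps

def kmeans(points, k, n_iter=50):
    # Farthest-point initialisation keeps this sketch deterministic.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda c: dist2(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = mean_vector(members)
    return labels

# Hypothetical data: two families of sine-shaped curves (stand-ins for loops) plus noise.
d = 32
t = [2 * math.pi * i / d for i in range(d)]
rng = random.Random(1)
clean, noisy, group = [], [], []
for i in range(100):
    g, amp = i % 2, rng.uniform(0.8, 1.2)
    curve = [amp * (math.sin(x) + 0.8 * g * math.sin(2 * x)) for x in t]
    clean.append(curve)
    noisy.append([v + rng.gauss(0.0, 0.2) for v in curve])
    group.append(g)

mu = mean_vector(noisy)
centered = [[v - m for v, m in zip(row, mu)] for row in noisy]
comps = top_components(centered, k=2)
scores = [[sum(r[j] * c[j] for j in range(d)) for c in comps] for r in centered]
# Denoise: rebuild each curve from the mean plus its top-2 principal components.
denoised = [[mu[j] + sum(s * c[j] for s, c in zip(sc, comps)) for j in range(d)] for sc in scores]
noisy_err = sum(dist2(a, b) for a, b in zip(noisy, clean))
denoised_err = sum(dist2(a, b) for a, b in zip(denoised, clean))
labels = kmeans(scores, k=2)
print(f"squared error vs clean: noisy={noisy_err:.1f}, denoised={denoised_err:.1f}")
```

Clustering on the low-dimensional PCA scores rather than on the raw 32-point curves keeps k-means from being dominated by noise in directions that carry no signal.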
In addition, the authors used Gaussian fitting to quantify the amplitude of the hysteresis switching curves and, combined with machine learning, identified enhanced ferroelasticity at the “peak”/“valley” boundaries between c/a/c/a and a1/a2/a1/a2 domains (Figure 3-10), a feature that could not be found by manual analysis. Once such a feature has been identified, the workflow can quantify its effects with statistical significance and at nanometer resolution.
Figure 3-7 The hysteresis loop of the piezoelectric response at a single pixel: the original data (blue circle), the traditional fitting curve (red line) and the curve after noise reduction (black line).
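The text does not say how the Gaussian fit was performed. One simple option, shown here as an assumption rather than the authors' method, is the log-parabola (Caruana) approach: for a Gaussian peak, ln y is a quadratic in x, so a linear least-squares fit of ln y recovers the amplitude, center, and width. It requires strictly positive data and is sensitive to noise in the tails; a nonlinear least-squares fit is usually preferred for real, noisy spectra. The peak parameters below are hypothetical.

```python
import math

def solve3(M, v):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    A = [row[:] + [rhs] for row, rhs in zip(M, v)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, 3):
            f = A[r][col] / A[col][col]
            A[r] = [x - f * y for x, y in zip(A[r], A[col])]
    sol = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        sol[r] = (A[r][3] - sum(A[r][c] * sol[c] for c in range(r + 1, 3))) / A[r][r]
    return sol

def fit_gaussian(xs, ys):
    """Fit y = A * exp(-(x - mu)^2 / (2 * sigma^2)) via a least-squares parabola on ln(y).

    ln(y) = c0 + c1*x + c2*x^2 with c2 = -1/(2 sigma^2), c1 = mu/sigma^2,
    c0 = ln(A) - mu^2/(2 sigma^2). All ys must be positive.
    """
    basis = [(1.0, x, x * x) for x in xs]
    logs = [math.log(y) for y in ys]
    normal = [[sum(b[i] * b[j] for b in basis) for j in range(3)] for i in range(3)]
    rhs = [sum(b[i] * l for b, l in zip(basis, logs)) for i in range(3)]
    c0, c1, c2 = solve3(normal, rhs)
    sigma2 = -1.0 / (2.0 * c2)  # c2 < 0 for a peak
    mu = c1 * sigma2
    amplitude = math.exp(c0 + mu * mu / (2.0 * sigma2))
    return amplitude, mu, math.sqrt(sigma2)

# Noise-free synthetic peak with known (hypothetical) parameters A=2.5, mu=0.4, sigma=0.7.
xs = [i * 0.1 for i in range(-30, 31)]
ys = [2.5 * math.exp(-(x - 0.4) ** 2 / (2 * 0.7 ** 2)) for x in xs]
amplitude, mu, sigma = fit_gaussian(xs, ys)
print(f"A={amplitude:.3f}, mu={mu:.3f}, sigma={sigma:.3f}")
```

Because the transformed problem is linear, no initial guess is needed, which is the main appeal of this approach over iterative nonlinear fitting.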
The above is my understanding of the role of machine learning in the development of materials; corrections are welcome.
Link to this article: Abandon trial and error and let machine learning teach you to design new materials
Reprint Statement: Unless otherwise stated, all articles on this site are original. Please indicate the source when reprinting: Alloy Wiki, thanks.