
New paradigm of life science research driven by artificial intelligence (China Net)

China Net/China Development Portal News In 2007, Turing Award winner Jim Gray proposed four paradigms of scientific research, which have been broadly recognized by the scientific community. The first paradigm is experimental (empirical) science, which describes natural phenomena and summarizes laws mainly through experiments or experience; the second is theoretical science, in which scientists build mathematical models to summarize and formulate scientific theories; the third is computational science, which uses computers to simulate scientific experiments; and the fourth is data science, which analyzes and extracts knowledge from large amounts of data collected by instruments or generated by simulations. Changes in the paradigm of scientific research reflect how the depth, breadth, methods and efficiency of human exploration of the universe have evolved.

The development of the life sciences has gone through multiple stages, and the evolution of their research paradigms has its own distinct disciplinary characteristics. In the early stages, biologists mainly explored the general principles of biological existence, and the common laws of form and evolution, by observing the morphology and behavior of different organisms. Darwin is the representative figure of this stage: through surveys around the globe, he accumulated a large body of descriptive data on species and proposed the theory of evolution. Since the mid-20th century, marked by the discovery of the double helix structure of DNA, life science research has entered the era of molecular biology, and biologists have begun to study the basic composition and operating laws of life at a deeper level. At this stage, biologists still derive rules and knowledge mainly by observing and experimenting on biological phenomena. With the further development of the life sciences and the rapid emergence of new biotechnologies, scientists can explore life more extensively, at different levels and resolutions, which has led to explosive growth of data in the field. Combining high-throughput, multi-dimensional omics data analysis with experimental science to describe and analyze biological processes more precisely has become the norm in modern life science research.

However, living systems are complex at many levels, spanning molecules, cells and individuals, as well as the population relationships among individuals and the interactions between organisms and their environment; they are multi-level, high-dimensional, highly interconnected and dynamically regulated. When facing such complex living systems, the existing experimental research paradigm can often observe, describe and study only a limited number of objects at specific scales, which makes it difficult to fully understand how biological networks operate. It also depends heavily on human experience and prior knowledge to explore specific biological relationships, making it hard to efficiently extract hidden associations and mechanisms from large-scale, diverse, high-dimensional data. Faced with the complex nonlinear relationships and unpredictable characteristics of life phenomena, artificial intelligence (AI) technology has demonstrated powerful capabilities and has shown disruptive application potential in protein structure prediction and in the simulation and analysis of gene regulatory networks. Life science research has thus moved from the first paradigm, centered on experimental science, to a new paradigm driven by artificial intelligence: the fifth paradigm (Figure 1).

This article systematically discusses three aspects: typical examples of AI-driven life science research; the connotation and key elements of the new paradigm of life science research; and the frontiers of life science research empowered by the new paradigm, together with the challenges faced by our country.

Typical examples of life science research driven by artificial intelligence

Life is a complex system with multiple levels and scales, in which components are dynamically interconnected and influence one another. Faced with the extreme complexity of life phenomena, their span across scales and their dynamic changes in space and time, traditional research paradigms can often start only from a local perspective, establishing relationships between a limited set of biological molecules and phenotypes through experimental verification or omics analysis at a limited number of levels. Even at great cost, such approaches often uncover only a single linear correlation in a specific context, whereas life activities are markedly nonlinear and complex, making it difficult to fully understand how the entire network operates.

AI technologies, especially deep learning and large pre-trained models, offer superior pattern recognition and feature extraction. With enormous numbers of parameters, they can exceed human reasoning in extracting patterns from data and thereby better capture the regularities of complex biological systems. The continuous development of modern biotechnology has produced leapfrog growth in life science data, and the large body of data accumulated through experimental description and verification in past life science research worldwide lays a foundation for AI to decipher the underlying laws of life. Given sufficient high-quality data and algorithms adapted to the life sciences, AI models can use “low-dimensional” data to predict “high-dimensional” information and laws at multiple levels. They can bridge the gap from low-dimensional data, such as gene sequences and expression levels, to the laws governing high-dimensional, complex biological processes in cells and organisms, and can analyze complex nonlinear relationships, such as the laws by which biological macromolecular structures form and the mechanisms regulating gene expression, and even the underlying laws of complex biological systems in which multiple factors interact, such as ontogeny and aging. Following this trend, a number of typical examples of AI-driven life science research, such as protein structure analysis and gene regulation analysis, have emerged in recent years.

Examples of protein structure analysis

As the executors of key functions in organisms, proteins have structures that directly affect important biological processes such as transport, catalysis, binding and immunity. Although sequencing technology can reveal a protein's amino acid sequence, a chain with a known sequence could in principle fold into any of an astronomical number of possible conformations, which has made accurate determination of protein structure a long-standing challenge. Traditional techniques such as nuclear magnetic resonance, X-ray crystallography and cryo-electron microscopy are used to resolve the structures of proteins with known sequences, but these methods can take years to depict the shape of a single protein, are expensive and time-consuming, and do not guarantee success. Capturing the underlying laws of protein folding to predict protein structure accurately has therefore long been one of the most important challenges in structural biology.
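To make “astronomical” concrete, the classic back-of-envelope argument often called Levinthal's paradox can be reproduced in a few lines. The figures are illustrative assumptions, not measurements: 3 backbone states per residue, a 100-residue chain, and a sampling rate of 10^13 conformations per second.

```python
# Back-of-envelope illustration of why exhaustive conformational search is hopeless.
# Assumptions (illustrative only): each residue adopts one of 3 backbone states
# independently, the chain has 100 residues, and 10**13 conformations are tried per second.
STATES_PER_RESIDUE = 3
N_RESIDUES = 100
SAMPLES_PER_SECOND = 10**13
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def conformation_count(states: int, residues: int) -> int:
    """Number of backbone conformations if every residue chooses its state independently."""
    return states ** residues


def search_time_years(count: int, rate: float) -> float:
    """Years needed to enumerate `count` conformations at `rate` conformations per second."""
    return count / rate / SECONDS_PER_YEAR


count = conformation_count(STATES_PER_RESIDUE, N_RESIDUES)
years = search_time_years(count, SAMPLES_PER_SECOND)
print(f"conformations: {count:.2e}")
print(f"exhaustive search: {years:.1e} years")
```

Under these assumptions the chain has roughly 5 × 10^47 conformations, and enumerating them would take on the order of 10^27 years, which is why structure cannot be found by brute-force search and why learning the regularities of folding matters.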

AlphaFold 2 uses a deep learning algorithm based on the attention mechanism, trained on large amounts of protein sequence and structure data, and combines prior knowledge from physics, chemistry and biology to build a protein structure prediction model comprising feature extraction, encoding and decoding modules. In the 2020 community-wide protein structure prediction assessment (CASP14), AlphaFold 2 achieved remarkable results: the accuracy of its three-dimensional structure predictions was comparable, for many targets, even to experimentally determined structures. This breakthrough brings a new perspective and unprecedented opportunities to the life sciences, reflected mainly in three points.
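AlphaFold 2's actual architecture is far more elaborate than anything that fits here. The sketch below shows only the generic scaled dot-product self-attention operation that attention-based models build on, applied to a toy three-residue chain whose feature values are made up; it is an illustration of the mechanism, not AlphaFold 2 code.

```python
import math
from typing import List

Matrix = List[List[float]]  # one row per residue, one column per feature


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Scaled dot-product attention, softmax(Q K^T / sqrt(d)) V.

    Each residue's output is a weighted average of every residue's value vector,
    weighted by how strongly its query matches each key.
    """
    d = len(q[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        weights = softmax(scores)
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(len(v[0]))])
    return out


# Hypothetical 2-dimensional features for a 3-residue toy chain (self-attention: Q = K = V).
residues = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mixed = attention(residues, residues, residues)
for row in mixed:
    print([round(x, 3) for x in row])
```

Because every residue attends to every other residue, information from distant positions in the sequence can reach each residue's representation in a single step, which is one reason attention suits the long-range contacts found in folded proteins.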

First, it has a direct impact on drug discovery. Most drugs act by binding to specific structural domains of proteins in the body and thereby altering protein function. AlphaFold 2 can rapidly compute the structures of vast numbers of target proteins, on which drugs that bind these proteins effectively can then be designed.

Second, it opens new possibilities for rational protein design. Once AI has captured the underlying laws of protein folding, it can use this knowledge to design protein sequences that fold into a desired structure. This allows biologists to design and modify the structures of proteins or enzymes as needed, for example designing more active gene-editing enzymes or even protein structures that do not exist in nature. It also advances our understanding of how genetically encoded information maps onto structure at the protein level, and will greatly improve humanity's ability to engineer life.

Third, AlphaFold 2 has fundamentally changed the research paradigm of protein structure analysis, shifting it from time-consuming, labor-intensive experimental determination to a new paradigm of low-barrier, high-accuracy, high-throughput prediction of three-dimensional structures. It demonstrates that combining protein knowledge with AI technology makes it possible to extract and learn high-dimensional, complex knowledge, promoting a deeper understanding of the physical structure and function of proteins.

Examples of gene regulation analysis

The Human Genome Project, hailed as one of the three great scientific projects of the 20th century, unveiled the mysteries of life. Although the genetic information encoding an individual is stored in its DNA sequence, the fate and phenotype of each cell vary widely depending on its unique spatiotemporal context. These complex life processes are controlled by sophisticated gene expression regulatory systems, and exploring the ubiquitous mechanisms of gene regulation is one of the most important questions in life science after the Human Genome Project. The gene expression profiles of different cells provide an ideal window for understanding gene regulatory activity within biological systems. However, interpreting gene regulation comprehensively through biological experiments alone would require controlled experiments capturing different cell types from different individuals in different environments. Traditional bioinformatics methods can handle only small amounts of data and struggle to capture the complex nonlinear relationships in biological big data that is large-scale, high-dimensional and lacks accurate annotation.

In recent years, continuous breakthroughs in natural language processing, especially the rapid development of large language models, have shown that models trained on text corpora can acquire the ability to understand knowledge described in human language, bringing new ideas to this field. Drawing on the training approach of large language models, many research teams have used tens of millions of human single-cell transcriptome profiles, vast computing resources, advanced architectures such as the Transformer, and diverse biological knowledge to build foundation models of life, such as GeneCompass, scGPT, Geneformer and scFoundation, that can understand the dynamic relationships among genes. These foundation models are trained on underlying life-activity information such as gene expression, and learn the correlations and correspondences between such “low-dimensional” life science data and complex “high-dimensional” mechanisms such as gene expression regulatory networks and cell fate transitions, enabling effective simulation and prediction of high-dimensional information from low-dimensional data. Such simulation of gene expression regulatory networks performs well across a wide range of downstream tasks, providing a new way to understand the laws of gene regulation in depth.
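To show how a cell can be turned into something a language-model-style network can read, here is a minimal sketch loosely modeled on the rank-based input encoding described for Geneformer; the other models named above represent expression values differently. It ranks a cell's expressed genes by expression normalized to each gene's corpus-wide median, then hides a fraction of the resulting gene tokens so that a model can be pre-trained to recover them. The 15% masking rate, the 2,048-token cap, the gene names and all numbers are assumptions for illustration.

```python
import random
from typing import Dict, List, Tuple

MASK = "<mask>"


def rank_encode(expr: Dict[str, float], corpus_median: Dict[str, float],
                max_len: int = 2048) -> List[str]:
    """Turn one cell's expression profile into a sequence of gene tokens.

    Expressed genes are sorted by expression divided by that gene's
    corpus-wide median (highest first) and truncated to max_len tokens.
    """
    scored = [(gene, x / corpus_median[gene]) for gene, x in expr.items() if x > 0]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [gene for gene, _ in scored[:max_len]]


def mask_tokens(tokens: List[str], rate: float = 0.15,
                seed: int = 0) -> Tuple[List[str], Dict[int, str]]:
    """Replace a random fraction of tokens (at least one) with MASK.

    Returns the masked sequence and a {position: original gene} map that
    serves as the self-supervised prediction target.
    """
    if not tokens:
        return [], {}
    rng = random.Random(seed)
    n_mask = max(1, round(rate * len(tokens)))
    positions = rng.sample(range(len(tokens)), n_mask)
    masked = list(tokens)
    targets = {}
    for p in positions:
        targets[p] = masked[p]
        masked[p] = MASK
    return masked, targets


# Illustrative, made-up values; GENE_A ... GENE_D are placeholder names.
cell = {"GENE_A": 10.0, "GENE_B": 1.0, "GENE_C": 0.0, "GENE_D": 4.0}
median = {"GENE_A": 20.0, "GENE_B": 0.5, "GENE_C": 1.0, "GENE_D": 1.0}
tokens = rank_encode(cell, median)
masked, targets = mask_tokens(tokens)
print(tokens)
print(masked, targets)
```

Normalizing by the corpus median lets a modestly expressed but usually silent gene outrank a highly expressed housekeeping gene, so the token order emphasizes what is distinctive about the cell; the unexpressed gene is simply left out of the sequence.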

These successful cases of AI-driven life science research show that, for deeper and more systemic questions in the life sciences, AI is expected to overcome dilemmas that traditional research methods struggle to resolve, to build a theoretical system that extends from the basic biological level to the entire living system, and to push the life sciences to a higher stage of development, opening a new paradigm of life science research.

The connotation and key elements of the new paradigm of life science research

With the continuous progress of biotechnology, the rapid growth of life science data, the rapid development of AI technology and its deep integration with the life sciences, AI has demonstrated a deep understanding of life science knowledge and an ability to generalize it. This not only raises the height and breadth of life science research but also propels it from the first paradigm, centered on experimental science, to a new paradigm of AI-driven life science research (the fifth paradigm, hereinafter the “new paradigm”).

Based on an in-depth analysis of typical examples of AI-driven life science research, the authors believe that the new paradigm resembles an intelligent new energy vehicle. Mirroring the core technologies of such a vehicle (the battery system, electronic control system, motor system, assisted driving system and chassis system), the new paradigm should have five key elements: life science big data, intelligent algorithm models, computing power platforms, expert prior knowledge and cross-disciplinary research teams (Figure 2). Just as the battery system supplies a vehicle with energy, life science big data provides the basic resource for research; the algorithm model, like an intelligent electronic control system, enables a deep understanding of how biological systems operate; the computing platform, likened to the motor system, processes massive scientific data and complex computing tasks; expert prior knowledge, like an assisted driving system, provides scientists with direction and practical experience; and the cross-disciplinary research team, similar to the chassis system, integrates knowledge and skills from different fields, improving research efficiency through interdisciplinary cooperation and advancing the life sciences.

Key element one: life science big data

Life science big data is the “battery” of the new paradigm “car”. With the development of new biotechnologies, life science big data has gradually taken shape, characterized by multiple modalities, high dimensionality, dispersed distribution, hidden correlations and intersection across levels. Only by effectively integrating this data and mining it fully with innovative AI technology can we break through the cognitive limits of human scientists, foster new discoveries and expand the scope of life science exploration. For example, large medical vision models that integrate multi-source, multi-modal and multi-task medical image data can achieve leading performance in a variety of applications under few-shot and zero-shot conditions; and GeneCompass, a cross-species foundation model of life, integrates publicly available single-cell data from around the world into a training set of more than 120 million single cells, enabling panoramic learning of gene expression regulation rules and the analysis of multiple life science questions.

Key element two: intelligent algorithm model

The intelligent algorithm model is the “electronic control” system of the new paradigm “car”. Uncovering new laws and knowledge of life from the vast sea of life science big data requires innovative AI algorithms and models. How to develop and use AI algorithms adapted to the life sciences to extract effective biological features and build dynamic models of large-scale biological processes is a central issue of the new paradigm. For example, the Gerstein team's use of a Bayesian network approach to predict protein interactions, published in Science, laid a foundation for applying classic machine learning in bioinformatics; graph convolutional neural networks applied to biomolecular networks such as protein-protein interaction networks and gene regulatory networks have opened new research directions in the life sciences; and AlphaFold 2 uses Transformer-based models to compute the structures of large numbers of proteins rapidly and with high accuracy. All of these demonstrate the importance of AI algorithm models in the new paradigm of life science research.
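The text does not say which graph convolution variant was used on protein-protein interaction or gene regulatory networks. As an assumption, the sketch below implements the widely cited propagation rule of Kipf and Welling, H' = ReLU(D^{-1/2}(A + I)D^{-1/2} H W), on a hypothetical three-protein network; the weight matrix is fixed here for illustration, whereas in practice it is learned.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of an (n x k) and a (k x m) matrix."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def normalized_adjacency(adj: Matrix) -> Matrix:
    """D^{-1/2} (A + I) D^{-1/2}: add self-loops, then symmetrically normalize by degree."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(adj: Matrix, h: Matrix, w: Matrix) -> Matrix:
    """One graph convolution: average each node with its neighbours, project by W, apply ReLU."""
    agg = matmul(normalized_adjacency(adj), h)
    return [[max(0.0, x) for x in row] for row in matmul(agg, w)]


# Hypothetical interaction network P1 - P2 - P3, one input feature per protein.
adj = [[0.0, 1.0, 0.0],
       [1.0, 0.0, 1.0],
       [0.0, 1.0, 0.0]]
h = [[1.0], [0.0], [2.0]]
w = [[1.0, -1.0]]  # hypothetical 1 -> 2 weight matrix; normally learned
print(gcn_layer(adj, h, w))
```

Note that P2 starts with a zero feature yet ends up with a positive first output, because one layer mixes in its neighbours' features; stacking layers lets information travel further across the network.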

Key element three: computing power platform

The computing power platform is the “motor” of the new paradigm “car”. Computing power is the basis on which AI runs, and the continued development of AI algorithm models suited to the new paradigm, such as deep learning and large models, requires more powerful and efficient computing platforms for model training. Looking ahead, we should build hardware platforms that can support AI-enabled life science research: high-speed, large-capacity storage systems; high-performance, high-throughput supercomputers; chips dedicated to processing life science data; and special-purpose processors that accelerate the training and inference of biological models. Such platforms would provide efficient and reliable computing and processing capacity to cope with the massive data generated in the life sciences, meet the computing demands of building complex models, and underpin the application and innovation of AI in the field.
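A rough estimate shows why large life science models need dedicated computing platforms. The sketch uses the common rule of thumb that training a dense Transformer costs about 6 floating-point operations per parameter per token. The roughly 120 million cells echo the GeneCompass training set cited above; the model size, the 2,048 tokens per cell, the single pass over the data and the sustained throughput of 10^15 FLOP/s are hypothetical values chosen only for illustration.

```python
def training_flops(n_params: int, n_tokens: int) -> int:
    """Rule of thumb for dense Transformers: about 6 FLOPs per parameter per training token."""
    return 6 * n_params * n_tokens


# All values below are hypothetical except the ~120 million cells cited in the text.
N_PARAMS = 100_000_000      # assumed model size (100M parameters)
N_CELLS = 120_000_000       # training cells, as cited for GeneCompass
TOKENS_PER_CELL = 2_048     # assumed sequence length per cell (an upper bound)
SUSTAINED_FLOPS = 1e15      # assumed sustained throughput of one accelerator, FLOP/s

tokens = N_CELLS * TOKENS_PER_CELL          # one pass over the data
flops = training_flops(N_PARAMS, tokens)
hours = flops / SUSTAINED_FLOPS / 3600
print(f"{flops:.3e} FLOPs, about {hours:.0f} hours at the assumed rate")
```

With these assumptions the estimate comes to about 1.5 × 10^20 FLOPs, or roughly 41 hours on one device running at full speed. Real training runs involve several epochs, lower hardware utilization and repeated experiments, so actual demands are typically much larger.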

Key element four: expert prior knowledge

Expert prior knowledge is the “assisted driving” system of the new paradigm “car”. Under the new paradigm, existing life science knowledge supplies AI algorithm models with valuable training constraints, important context and feature relationships; it helps explain and understand the complexity of life science data and helps verify and optimize AI applications in the life sciences. It can also guide algorithm design and model construction, enabling more accurate and efficient solutions to life science problems and pushing research in more in-depth and comprehensive directions. For example, by embedding the prior knowledge of life science experts and encoding human annotation information, new large pre-trained models of gene expression improve the interpretation of complex feature correlations in biological data and achieve better performance.
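The text does not specify how prior knowledge is turned into a training constraint. One simple and common option, shown here as an assumption rather than the method of any model named in this article, is a penalty that pulls together the learned embeddings of gene pairs that prior knowledge links, for example a known regulator and its target. The gene names, embedding values and weighting factor are hypothetical.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two embedding vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def prior_penalty(emb: Dict[str, Vector], prior_edges: List[Tuple[str, str]]) -> float:
    """Sum of squared embedding distances over gene pairs linked by prior knowledge."""
    return sum(squared_distance(emb[u], emb[v]) for u, v in prior_edges)


def total_loss(task_loss: float, emb: Dict[str, Vector],
               prior_edges: List[Tuple[str, str]], lam: float = 0.1) -> float:
    """Task loss plus a weighted penalty that pulls linked genes' embeddings together."""
    return task_loss + lam * prior_penalty(emb, prior_edges)


# Hypothetical embeddings; TF_X is assumed (by prior knowledge) to regulate TARGET_Y.
emb = {"TF_X": [1.0, 0.0], "TARGET_Y": [0.0, 0.0], "GENE_Z": [5.0, 5.0]}
edges = [("TF_X", "TARGET_Y")]
print(total_loss(2.0, emb, edges, lam=0.5))
```

Only linked pairs are penalized, so GENE_Z remains free to sit far away; the weight `lam` controls how strongly the expert prior overrides what the data alone would suggest.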

Key element five: cross-disciplinary research team

The cross-disciplinary research team is the “chassis” of the new paradigm “car”. Under the new paradigm, multidisciplinary teams of AI experts, data scientists, biologists and medical scientists are crucial to achieving leapfrog discoveries in the life sciences. Working closely together, teams with diverse backgrounds can integrate expertise in AI, biology, medicine and other fields and bring diversified perspectives and methods, laying a solid foundation for comprehensively understanding and solving complex mechanistic problems in the life sciences, and opening more possibilities for innovative solutions that drive breakthrough discoveries and progress.

The frontiers of life science research empowered by the new paradigm and the challenges faced by our country

The traditional research paradigm explores life like viewing a leopard through a tube, seeing only one spot at a time, and the subfields of the life sciences each work in isolation. As the new paradigm develops, life science research will usher in new research modes characterized by AI-driven prediction, guidance, hypothesis generation and hypothesis verification, giving rise to a number of rapidly developing frontier research directions and demonstrating the gains brought by the paradigm shift. However, accelerating the establishment and adoption of the new paradigm in China under current conditions still faces a series of major challenges.

The frontier of life science research empowered by new paradigms

Structural biology. At present, AI applications in structural biology, represented by AlphaFold, remain at the stage of “from sequence to structure” protein structure prediction and design, and cannot yet simulate and predict protein structure and function under complex physiological conditions. With the emergence of higher-quality, larger-scale protein data and new algorithms, it is expected that the structure and function of biological macromolecules can be analyzed systematically under different physiological states and spatiotemporal conditions, enabling intelligent structural analysis and refined design of proteins “from sequence to function” and even “from sequence to multi-scale interactions”.

Systems biology. Current omics analysis is still confined to relatively low-dimensional observations and has not yet achieved full-dimensional observation spanning the gene and cell levels up to the individual and population levels. The new paradigm will integrate multi-dimensional, multi-modal biological big data with expert prior knowledge to extract key features of biological phenotypes, build analytical models of multi-scale biological processes, recover the underlying laws governing complex biological systems, and form a broadly applicable new system of systems biology research.

Genetics. With the accumulation of multi-omics data and the emergence of large genetic models, genetics research has entered a stage of rapid development driven by the new paradigm. Self-supervised large models pre-trained on gene expression profile data are expected to become powerful tools for analyzing gene regulation rules and predicting disease targets, expanding the boundaries of genetics research.

Drug design and development. With the emergence of AlphaFold and a number of molecular dynamics models, AI models are already used to predict and screen drug candidate molecules. The new paradigm will further advance this field: an AI-assisted, end-to-end drug design and development system is expected to emerge that can independently optimize the structure and properties of drugs, simulate and predict the efficacy and safety of candidates, and efficiently generate synthesis and production process solutions, greatly accelerating drug development and production.

Precision medicine. AI technologies such as computer vision, natural language processing and machine learning have penetrated widely into precision medicine subfields such as biological imaging, medical imaging, intelligent disease analysis and target prediction. For example, AI-based diagnostic systems already match or even exceed experienced clinicians in accuracy in some respects. However, most existing models are subject to data bias and suffer from poor robustness and limited generality. General-purpose precision medicine models driven by the new paradigm will help diagnose diseases more quickly and accurately, analyze the molecular mechanisms of disease, discover new therapeutic targets and improve human health.

Challenges facing the new paradigm of life science research in China

Faced with the new situation and requirements of the new paradigm of life science research, our country still faces major challenges: the lack of a high-quality life science data resource system, insufficient key AI technologies and infrastructure, and the lack of a new ecosystem for cross-disciplinary innovative research under the new paradigm.

Lack of a high-quality life science data resource system

Although China's investment in life science research continues to grow, Chinese scientists in some frontier fields still rely on high-quality data from abroad, while the construction and use of domestic data lag behind. China's life science data resources are also unevenly distributed and need better overall coordination and integration to achieve efficient aggregation and systematic improvement of high-quality data resources. In addition, data security during the collection, transmission and storage of life science data urgently needs strengthening, and the privacy and security of biological data in particular require attention.

To meet these challenges, our country needs to strengthen the integration and sharing of scientific data resources, promote the sustainable development of life science data resources, improve data quality and security, transform data management and supply models, and enhance services that integrate cross-domain, multi-modal scientific and technological resources, so as to meet the development needs of scientific research under the new paradigm.

Insufficient key AI technologies and infrastructure

China's core technologies for the AI-driven research paradigm are relatively scarce, and independent, original algorithms, models and tools still need to be developed. Given the massive, high-dimensional and sparse nature of life science big data, advanced methods for computing on and analyzing complex data are urgently needed. In the future, hardware, software and new computing media better suited to life science applications should be developed, and new modes of interaction between computing and biology should be explored as the life sciences and computing sciences converge. In short, research under the new paradigm places new demands on the combined capacity of data, networks, computing power and other resources; it is necessary to accelerate the construction of a new generation of information infrastructure and resolve the “chokepoint” problem in computing power.

Lack of a new ecosystem for cross-disciplinary innovative research under the new paradigm

Most existing AI-driven life science research follows a “small workshop” model in which research groups assemble spontaneously, and lacks the cross-disciplinary innovation environment that the new paradigm requires. The 2023 update of the National Artificial Intelligence R&D Strategic Plan released by the United States also emphasized the importance of interdisciplinary AI research. The research ecosystem under the new paradigm should therefore encourage broader multidisciplinary “big crossover” and “big integration”, establish new research models that combine dry-lab and wet-lab methods and integrate theory with practice, and continuously cultivate high-level interdisciplinary talent.

Under the new situation, our country has also begun to deploy and promote interdisciplinary development extensively. The “Outline of the 14th Five-Year Plan for National Economic and Social Development of the People's Republic of China and Long-Range Objectives Through the Year 2035” calls for promoting the deep integration of the Internet, big data and artificial intelligence with various industries. In light of the actual state of China's life sciences, the field's development should focus on integrating the AI-enabled paradigm shift in life science research into the national development blueprint for the new era, using pilot breakthroughs to drive progress across the whole field and establishing a more open research ecosystem and development environment.

In recent years, the life sciences have been undergoing unprecedented change, driven not only by biotechnology and information technology but also by the enormous impact of advances in AI. At the core of this change is the evolution from a traditional paradigm driven by hypotheses and experiments, relying mainly on human experience, to a new paradigm driven by big data and AI. This means we no longer rely only on experiments and hypotheses, but actively reveal the mysteries of life through big data analysis and AI technology. More broadly, this evolution will change or drive change in research activities at different levels, including epistemology, methodology, the organization of research, and economic, ethical and legal aspects.

In sum, we live in an era full of change and hope. Innovation in the life sciences and advances in science and technology are together drawing a blueprint for humanity's deeper exploration of the mysteries of life. It is foreseeable that, with the further development of general AI, life science research will soon achieve a new model of dry-wet integration and human-machine collaboration, ushering in a new era of science in which AI autonomously abstracts new knowledge and laws and “thinks what no one has thought before”.

(Authors: Li Xin, Institute of Zoology, Chinese Academy of Sciences, and Beijing Institute of Cellular and Regenerative Medicine; Yu Hanchao, Bureau of Frontier Science and Education, Chinese Academy of Sciences. Article provided by “Proceedings of the Chinese Academy of Sciences”)