Daily variation in blood glucose levels during continuous enteral nutrition in patients on the intensive care unit: a retrospective observational study.
EBioMedicine
BACKGROUND:The circadian timing system coordinates daily cycles in physiological functions, including glucose metabolism and insulin sensitivity. Here, the aim was to characterise the 24-h variation in glucose levels in critically ill patients during continuous enteral nutrition after controlling for potential sources of bias. METHODS:Time-stamped clinical data from adult patients who stayed in the Intensive Care Unit (ICU) for at least 4 days and received enteral nutrition were extracted from the Medical Information Mart for Intensive Care (MIMIC)-IV database. Linear mixed-effects and XGBoost modelling were used to determine the effect of time of day on blood glucose values. FINDINGS:In total, 207,647 glucose measurements collected during enteral nutrition were available from 6,929 ICU patients (3,948 males and 2,981 females). Using linear mixed-effects modelling, time of day had a significant effect on blood glucose levels (p < 0.001), with a peak of 9.6 [9.5-9.6; estimated marginal means, 95% CI] mmol/L at 10:00 in the morning and a trough of 8.6 [8.5-8.6] mmol/L at 02:00 at night. A similar impact of time of day on glucose levels was found with the XGBoost regression model. INTERPRETATION:These results revealed marked 24-h variation in glucose levels in ICU patients even during continuous enteral nutrition. This 24-h pattern persists after adjustment for potential sources of bias, suggesting it may be the result of endogenous biological rhythmicity. FUNDING:This work was supported by a VENI grant from the Netherlands Organisation for Health Research and Development (ZonMw), an institutional project grant, and by the Dutch Research Council (NWO).
10.1016/j.ebiom.2024.105169
Derivation, external and clinical validation of a deep learning approach for detecting intracranial hypertension.
NPJ digital medicine
Increased intracranial pressure (ICP) ≥15 mmHg is associated with adverse neurological outcomes, but needs invasive intracranial monitoring. Using the publicly available MIMIC-III Waveform Database (2000-2013) from Boston, we developed an artificial intelligence-derived biomarker for elevated ICP (aICP) for adult patients. aICP uses routinely collected extracranial waveform data as input, reducing the need for invasive monitoring. We externally validated aICP with an independent dataset from the Mount Sinai Hospital (2020-2022) in New York City. The AUROC, accuracy, sensitivity, and specificity on the external validation dataset were 0.80 (95% CI, 0.80-0.80), 73.8% (95% CI, 72.0-75.6%), 73.5% (95% CI 72.5-74.5%), and 73.0% (95% CI, 72.0-74.0%), respectively. We also present an exploratory analysis showing aICP predictions are associated with clinical phenotypes. A ten-percentile increment was associated with brain malignancy (OR = 1.68; 95% CI, 1.09-2.60), intracerebral hemorrhage (OR = 1.18; 95% CI, 1.07-1.32), and craniotomy (OR = 1.43; 95% CI, 1.12-1.84; P < 0.05 for all).
10.1038/s41746-024-01227-0
High serum magnesium level is associated with increased mortality in patients with sepsis: an international, multicenter retrospective study.
MedComm
Magnesium imbalances commonly exist in septic patients. However, the association of serum magnesium levels with mortality in septic patients remains uncertain. Herein, we elucidated the association between serum magnesium and all-cause mortality in septic patients from American and Chinese cohorts by analyzing data from 9099 patients in the Medical Information Mart for Intensive Care-IV (MIMIC-IV) database and 1727 patients from a university-affiliated hospital's intensive care unit in China. Patients in both cohorts were categorized into five groups based on serum magnesium quintiles from the MIMIC-IV dataset. Patients with higher serum magnesium levels exhibited an increased risk of 28-day mortality in both cohorts. The restricted cubic spline (RCS) curves revealed a progressively elevated risk of 28-day mortality with increasing serum magnesium in the MIMIC-IV cohort, while a J-shaped correlation was observed in the institutional cohort. Our findings have validated the association between high serum magnesium and high mortality in sepsis across different races and medical conditions. Serum magnesium levels might be useful in identifying septic patients at higher mortality risk.
10.1002/mco2.713
Association between lactate-to-albumin ratio and 28-days all-cause mortality in patients with acute pancreatitis: A retrospective analysis of the MIMIC-IV database.
Frontiers in immunology
Objective:The Lactate-to-Albumin Ratio (LAR) has been applied as a new predictor in sepsis, heart failure, and acute respiratory failure. However, the role of LAR in predicting all-cause mortality in patients with acute pancreatitis has not been evaluated. Therefore, this study aimed to elucidate the correlation between LAR and 28-d all-cause mortality in patients with Acute Pancreatitis (AP). Methods:This study is a retrospective cohort study with data from the MIMIC-IV (v1.0) database. We included adult patients with acute pancreatitis who were admitted to the intensive care unit. The primary outcome was to evaluate the ability of LAR to predict death within 28-d of hospital admission in patients with AP. Results:A total of 539 patients with acute pancreatitis were included in this study. They were divided into a survival group (486 patients) and a death group (53 patients) according to whether they survived within 28-d of admission, and the mortality rate of patients within 28-d of admission was 9.8%. LAR was shown to be an independent predictor of all-cause mortality within 28-d of admission in patients with AP by multivariate Cox regression analysis (HR, 1.59; 95% CI, 1.23 - 2.05; P < 0.001). The Area Under the Curve (AUC) value for LAR was 74.26% (95% CI: 67.02% - 81.50%), which was higher than that for arterial blood lactate (AUC = 71.25%) and serum albumin (AUC = 65.92%) alone. It was not inferior even when compared to SOFA (AUC = 75.15%). The optimal cutoff value for separating the survival and death groups according to the Receiver Operating Characteristic (ROC) curve was found to be 1.1124. Kaplan-Meier analysis with this cutoff value showed that patients with LAR ≥ 1.1124 had significantly higher all-cause mortality within 28-d of admission than those with LAR < 1.1124 (P < 0.001). The final subgroup analysis showed no significant interaction of LAR with each subgroup (P for interaction: 0.06 - 0.974). 
Conclusion:LAR can be used as an independent predictor of all-cause mortality in AP patients within 28-d of admission, with prognostic performance superior to that of arterial blood lactate or serum albumin alone.
10.3389/fimmu.2022.1076121
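The ratio and cutoff described in the abstract above can be illustrated with a minimal Python sketch. The abstract does not state the measurement units or how the ROC cutoff was chosen, so both are assumptions here: the ratio is taken as lactate divided by albumin in whatever units the study used, and the cutoff is picked by maximising Youden's J (sensitivity + specificity - 1), which is one common convention and not necessarily the authors' method. The function names and the toy values are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def lactate_albumin_ratio(lactate: float, albumin: float) -> float:
    """Return lactate divided by albumin (units are an assumption, see text)."""
    if albumin <= 0:
        raise ValueError("albumin must be positive")
    return lactate / albumin


def youden_cutoff(scores: Sequence[float], died: Sequence[bool]) -> Tuple[float, float]:
    """Return (threshold, J) maximising J = sensitivity + specificity - 1.

    A patient counts as high risk when score >= threshold.
    Ties are resolved in favour of the lowest threshold.
    """
    positives = sum(1 for d in died if d)
    negatives = len(died) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both outcomes must be present")
    best_t: Optional[float] = None
    best_j = -1.0
    for t in sorted(set(scores)):
        true_pos = sum(1 for s, d in zip(scores, died) if d and s >= t)
        true_neg = sum(1 for s, d in zip(scores, died) if not d and s < t)
        j = true_pos / positives + true_neg / negatives - 1.0
        if j > best_j:
            best_t, best_j = t, j
    assert best_t is not None
    return best_t, best_j


if __name__ == "__main__":
    # Hypothetical (lactate, albumin, died) triples, for illustration only.
    toy = [(1.2, 4.0, False), (2.0, 4.0, False), (3.2, 4.0, False),
           (2.4, 2.0, False), (2.0, 2.0, True), (3.0, 2.0, True), (4.8, 2.0, True)]
    lars = [lactate_albumin_ratio(l, a) for l, a, _ in toy]
    outcomes = [d for _, _, d in toy]
    print(youden_cutoff(lars, outcomes))
    # Applying the cutoff reported in the abstract to one hypothetical patient:
    print(lactate_albumin_ratio(4.0, 3.0) >= 1.1124)
```

With a cutoff in hand, patients are split into LAR ≥ cutoff and LAR < cutoff groups, which is the grouping the Kaplan-Meier comparison in the abstract relies on.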
Machine Learning Predicts Oxaliplatin Benefit in Early Colon Cancer.
Journal of clinical oncology : official journal of the American Society of Clinical Oncology
PURPOSE:A combination of fluorouracil, leucovorin, and oxaliplatin (FOLFOX) is the standard for adjuvant therapy of resected early-stage colon cancer (CC). Oxaliplatin leads to lasting and disabling neurotoxicity. Reserving the regimen for patients who benefit from oxaliplatin would maximize efficacy and minimize unnecessary adverse side effects. METHODS:We trained a new machine learning model, referred to as the colon oxaliplatin signature (COLOXIS) model, for predicting response to oxaliplatin-containing regimens. We examined whether COLOXIS was predictive of oxaliplatin benefits in the CC adjuvant setting among 1,065 patients treated with 5-fluorouracil plus leucovorin (FULV; n = 421) or FULV + oxaliplatin (FOLFOX; n = 644) from NSABP C-07 and C-08 phase III trials. The COLOXIS model dichotomizes patients into COLOXIS+ (oxaliplatin responder) and COLOXIS- (nonresponder) groups. Eight-year recurrence-free survival was used to evaluate oxaliplatin benefits within each of the groups, and the predictive value of the COLOXIS model was assessed using the P value associated with the interaction term (int P) between the model prediction and the treatment effect. RESULTS:Among 1,065 patients, 526 were predicted as COLOXIS+ and 539 as COLOXIS-. The COLOXIS+ prediction was associated with prognosis for FULV-treated patients (hazard ratio [HR], 1.52 [95% CI, 1.07 to 2.15]; P = .017). The model was predictive of oxaliplatin benefits: COLOXIS+ patients benefited from oxaliplatin (HR, 0.65 [95% CI, 0.48 to 0.89]; P = .0065; int P = .03), but COLOXIS- patients did not (COLOXIS- HR, 1.08 [95% CI, 0.77 to 1.52]; P = .65). CONCLUSION:The COLOXIS model is predictive of oxaliplatin benefits in the CC adjuvant setting. The results provide evidence supporting a change in CC adjuvant therapy: reserve oxaliplatin only for COLOXIS+ patients, but further investigation is warranted.
10.1200/JCO.23.01080
Prediction of post-traumatic stress disorder in family members of ICU patients: a machine learning approach.
Intensive care medicine
PURPOSE:Post-traumatic stress disorder (PTSD) can affect family members of patients admitted to the intensive care unit (ICU). Easily accessible patient's and relative's information may help develop accurate risk stratification tools to direct relatives at higher risk of PTSD toward appropriate management. METHODS:PTSD was measured 90 days after ICU discharge using validated instruments (Impact of Event Scale and Impact of Event Scale-Revised) in 2374 family members. Various supervised machine learning approaches were used to predict PTSD in family members and evaluated on an independent held-out test dataset. To better understand variables' contributions to PTSD predicted probability, we used machine learning interpretability methods on the best predictive algorithm. RESULTS:Non-linear ensemble learning tree-based methods showed better predictive performances (Random Forest-area under curve, AUC = 0.73 [0.68-0.77] and XGBoost-AUC = 0.73 [0.69-0.78]) than regularized linear models, kernel-based models, or deep learning models. In the best performing algorithm, most important features that positively contributed to PTSD's predicted probability were all non-modifiable factors, namely, lower patient's age, longer duration of ICU stay, relative's female sex, lower relative's age, relative being a spouse/child, and patient's death in ICU. A sensitivity analysis in bereaved relatives did not alter the algorithm's predictive performance. CONCLUSION:We propose a machine learning-based approach to predict PTSD in relatives of ICU patients at an individual level. In this model, PTSD is mostly influenced by non-modifiable factors.
10.1007/s00134-023-07288-1
Machine learning and deep learning predictive models for long-term prognosis in patients with chronic obstructive pulmonary disease: a systematic review and meta-analysis.
The Lancet. Digital health
BACKGROUND:Machine learning and deep learning models have been increasingly used to predict long-term disease progression in patients with chronic obstructive pulmonary disease (COPD). We aimed to summarise the performance of such prognostic models for COPD, compare their relative performances, and identify key research gaps. METHODS:We conducted a systematic review and meta-analysis to compare the performance of machine learning and deep learning prognostic models and identify pathways for future research. We searched PubMed, Embase, the Cochrane Library, ProQuest, Scopus, and Web of Science from database inception to April 6, 2023, for studies in English using machine learning or deep learning to predict patient outcomes at least 6 months after initial clinical presentation in those with COPD. We included studies comprising human adults aged 18-90 years and allowed for any input modalities. We reported area under the receiver operator characteristic curve (AUC) with 95% CI for predictions of mortality, exacerbation, and decline in forced expiratory volume in 1 s (FEV₁). We reported the degree of interstudy heterogeneity using Cochran's Q test (significant heterogeneity was defined as p≤0·10 or I²>50%). Reporting quality was assessed using the TRIPOD checklist and a risk-of-bias assessment was done using the PROBAST checklist. This study was registered with PROSPERO (CRD42022323052). FINDINGS:We identified 3620 studies in the initial search. 18 studies were eligible, and, of these, 12 used conventional machine learning and six used deep learning models. Seven models analysed exacerbation risk, with only six reporting AUC and 95% CI on internal validation datasets (pooled AUC 0·77 [95% CI 0·69-0·85]) and there was significant heterogeneity (I² 97%, p<0·0001). 11 models analysed mortality risk, with only six reporting AUC and 95% CI on internal validation datasets (pooled AUC 0·77 [95% CI 0·74-0·80]) with significant degrees of heterogeneity (I² 60%, p=0·027). 
Two studies assessed decline in lung function and were unable to be pooled. Machine learning and deep learning models did not show significant improvement over pre-existing disease severity scores in predicting exacerbations (p=0·24). Three studies directly compared machine learning models against pre-existing severity scores for predicting mortality and pooled performance did not differ (p=0·57). Of the five studies that performed external validation, performance was worse than or equal to regression models. Incorrect handling of missing data, not reporting model uncertainty, and use of datasets that were too small relative to the number of predictive features included provided the largest risks of bias. INTERPRETATION:There is limited evidence that conventional machine learning and deep learning prognostic models demonstrate superior performance to pre-existing disease severity scores. More rigorous adherence to reporting guidelines would reduce the risk of bias in future studies and aid study reproducibility. FUNDING:None.
10.1016/S2589-7500(23)00177-2
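The heterogeneity statistics named in the abstract above follow a standard calculation: Cochran's Q is the inverse-variance-weighted sum of squared deviations from the fixed-effect pooled estimate, and I² = max(0, (Q - df) / Q) × 100% with df = k - 1 for k studies. The Python sketch below shows this arithmetic only. It is not the review's analysis code, the function name is hypothetical, and the AUC values and standard errors in the example are made up for illustration.

```python
from typing import Sequence, Tuple


def cochran_q_i2(estimates: Sequence[float],
                 std_errors: Sequence[float]) -> Tuple[float, int, float]:
    """Return (Q, degrees of freedom, I-squared in percent)."""
    if len(estimates) != len(std_errors) or len(estimates) < 2:
        raise ValueError("need at least two studies with matching standard errors")
    # Inverse-variance weights, as in a fixed-effect meta-analysis.
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    # I-squared is truncated at zero when Q falls below its degrees of freedom.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, df, i2


if __name__ == "__main__":
    # Hypothetical study-level AUCs and standard errors.
    aucs = [0.70, 0.80, 0.90]
    ses = [0.02, 0.02, 0.02]
    q, df, i2 = cochran_q_i2(aucs, ses)
    print(f"Q = {q:.1f} on {df} df, I2 = {i2:.0f}%")
```

Under the abstract's stated rule, a result like this (I² above 50%) would be flagged as significant heterogeneity.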
Integration of 3D bioprinting and multi-algorithm machine learning identified glioma susceptibilities and microenvironment characteristics.
Cell discovery
Glioma, with its heterogeneous microenvironments and genetic subtypes, presents substantial challenges for treatment prediction and development. We integrated 3D bioprinting and multi-algorithm machine learning as a novel approach to enhance the assessment and understanding of glioma treatment responses and microenvironment characteristics. The bioprinted patient-derived glioma tissues successfully recapitulated molecular properties and drug responses of native tumors. We then developed GlioML, a machine learning workflow incorporating nine distinct algorithms and a weighted ensemble model that generated robust gene expression-based predictors, each reflecting the diverse action mechanisms of various compounds and drugs. The ensemble model superseded the performance of all individual algorithms across diverse in vitro systems, including sphere cultures, complex 3D bioprinted multicellular models, and 3D patient-derived tissues. By integrating bioprinting, the evaluative scope of the treatment expanded to T cell-related therapy and anti-angiogenesis targeted therapy. We identified promising compounds and drugs for glioma treatment and revealed distinct immunosuppressive or angiogenic myeloid-infiltrated tumor microenvironments. These insights pave the way for enhanced therapeutic development for glioma and potentially for other cancers, highlighting the broad application potential of this integrative and translational approach.
10.1038/s41421-024-00650-7
Machine Learning to Allocate Palliative Care Consultations During Cancer Treatment.
Journal of clinical oncology : official journal of the American Society of Clinical Oncology
PURPOSE:For patients with advanced cancer, early consultations with palliative care (PC) specialists reduce costs, improve quality of life, and prolong survival. However, capacity limitations prevent all patients from receiving PC shortly after diagnosis. We evaluated whether a prognostic machine learning system could promote early PC, given existing capacity. METHODS:Using population-level administrative data in Ontario, Canada, we assembled a cohort of patients with incurable cancer who received palliative-intent systemic therapy between July 1, 2014, and December 30, 2019. We developed a machine learning system that predicted death within 1 year of each treatment using demographics, cancer characteristics, treatments, symptoms, laboratory values, and history of acute care admissions. We trained the system in patients who started treatment before July 1, 2017, and evaluated the potential impact of the system on PC in subsequent patients. RESULTS:Among 560,210 treatments received by 54,628 patients, death occurred within 1 year of 45.2% of treatments. The machine learning system recommended the same number of PC consultations observed with usual care at the 60.0% 1-year risk of death, with a first-alarm positive predictive value of 69.7% and an outcome-level sensitivity of 74.9%. Compared with usual care, system-guided care could increase early PC by 8.5% overall (95% CI, 7.5 to 9.5; P < .001) and by 15.3% (95% CI, 13.9 to 16.6; P < .001) among patients who live 6 months beyond their first treatment, without requiring more PC consultations in total or substantially increasing PC among patients with a prognosis exceeding 2 years. CONCLUSION:Prognostic machine learning systems could increase early PC despite existing resource constraints. These results demonstrate an urgent need to deploy and evaluate prognostic systems in real-time clinical practice to increase access to early PC.
10.1200/JCO.23.01291
Machine learning models predicts risk of proliferative lupus nephritis.
Frontiers in immunology
Objective:This study aims to develop and validate machine learning models to predict proliferative lupus nephritis (PLN) occurrence, offering a reliable diagnostic alternative when renal biopsy is not feasible or safe. Methods:This study retrospectively analyzed clinical and laboratory data from patients diagnosed with SLE and renal involvement who underwent renal biopsy at West China Hospital of Sichuan University between 2011 and 2021. We randomly assigned 70% of the patients to a training cohort and the remaining 30% to a test cohort. Various machine learning models were constructed on the training cohort, including generalized linear models (e.g., logistic regression, least absolute shrinkage and selection operator, ridge regression, and elastic net), support vector machines (linear and radial basis kernel functions), and decision tree models (e.g., classical decision tree, conditional inference tree, and random forest). Diagnostic performance was evaluated using ROC curves, calibration curves, and DCA for both cohorts. Furthermore, different machine learning models were compared to identify key and shared features, aiming to screen for potential PLN diagnostic markers. Results:The study involved 1312 LN patients, of whom 780 PLN/NPLN cases were analyzed. They were randomly divided into a training group (547 cases) and a testing group (233 cases). We developed nine machine learning models in the training group. Seven models demonstrated excellent discriminatory abilities in the testing cohort; the random forest model showed the highest discriminatory ability (AUC: 0.880, 95% confidence interval (CI): 0.835-0.926). Logistic regression had the best calibration, while random forest exhibited the greatest clinical net benefit. By comparing features across various models, we confirmed the efficacy of traditional indicators like anti-dsDNA antibodies, complement levels, serum creatinine, and urinary red and white blood cells in predicting and distinguishing PLN. 
Additionally, we uncovered the potential value of previously controversial or underutilized indicators such as serum chloride, neutrophil percentage, serum cystatin C, hematocrit, urinary pH, blood routine red blood cells, and immunoglobulin M in predicting PLN. Conclusion:This study provides a comprehensive perspective on incorporating a broader range of biomarkers for diagnosing and predicting PLN. Additionally, it offers an ideal non-invasive diagnostic tool for SLE patients unable to undergo renal biopsy.
10.3389/fimmu.2024.1413569
Machine learning applications in stroke medicine: advancements, challenges, and future prospectives.
Neural regeneration research
Stroke is a leading cause of disability and mortality worldwide, necessitating the development of advanced technologies to improve its diagnosis, treatment, and patient outcomes. In recent years, machine learning techniques have emerged as promising tools in stroke medicine, enabling efficient analysis of large-scale datasets and facilitating personalized and precision medicine approaches. This abstract provides a comprehensive overview of machine learning's applications, challenges, and future directions in stroke medicine. Recently introduced machine learning algorithms have been extensively employed in all the fields of stroke medicine. Machine learning models have demonstrated remarkable accuracy in imaging analysis, diagnosing stroke subtypes, risk stratifications, guiding medical treatment, and predicting patient prognosis. Despite the tremendous potential of machine learning in stroke medicine, several challenges must be addressed. These include the need for standardized and interoperable data collection, robust model validation and generalization, and the ethical considerations surrounding privacy and bias. In addition, integrating machine learning models into clinical workflows and establishing regulatory frameworks are critical for ensuring their widespread adoption and impact in routine stroke care. Machine learning promises to revolutionize stroke medicine by enabling precise diagnosis, tailored treatment selection, and improved prognostication. Continued research and collaboration among clinicians, researchers, and technologists are essential for overcoming challenges and realizing the full potential of machine learning in stroke care, ultimately leading to enhanced patient outcomes and quality of life. This review aims to summarize all the current implications of machine learning in stroke diagnosis, treatment, and prognostic evaluation. 
At the same time, another purpose of this paper is to explore all the future perspectives these techniques can provide in combating this disabling disease.
10.4103/1673-5374.382228
Risk Factors for Perinatal Arterial Ischemic Stroke: A Machine Learning Approach.
Neurology
BACKGROUND AND OBJECTIVES:Perinatal arterial ischemic stroke (PAIS) is a focal vascular brain injury presumed to occur between the fetal period and the first 28 days of life. It is the leading cause of hemiparetic cerebral palsy. Multiple maternal, intrapartum, delivery, and fetal factors have been associated with PAIS, but studies are limited by modest sample sizes and complex interactions between factors. Machine learning approaches use large and complex data sets to enable unbiased identification of clinical predictors but have not yet been applied to PAIS. We combined large PAIS data sets and used machine learning methods to identify clinical PAIS factors and compare this data-driven approach with previously described literature-driven clinical prediction models. METHODS:Common data elements from 3 registries with patients with PAIS, the Alberta Perinatal Stroke Project, Canadian Cerebral Palsy Registry, International Pediatric Stroke Study, and a longitudinal cohort of healthy controls (Alberta Pregnancy Outcomes and Nutrition Study), were used to identify potential predictors of PAIS. Inclusion criteria were term birth and idiopathic PAIS (absence of primary causative medical condition). Data including maternal/pregnancy, intrapartum, and neonatal factors were collected between January 2003 and March 2020. Common data elements were entered into a validated random forest machine learning pipeline to identify the highest predictive features and develop a predictive model. Univariable analyses were completed post hoc to assess the relationship between each predictor and outcome. RESULTS:A machine learning model was developed using data from 2,571 neonates, including 527 cases (20%) and 2,044 controls (80%). With a mean of 21 features selected, the random forest machine learning approach predicted the outcome with approximately 86.5% balanced accuracy. 
Factors that were selected a priori through literature-driven variable selection that were also identified as most important by the machine learning model were maternal age, recreational substance exposure, tobacco exposure, intrapartum maternal fever, and low Apgar score at 5 minutes. Additional variables identified through machine learning included in utero alcohol exposure, infertility, miscarriage, primigravida, meconium, spontaneous vaginal delivery, neonatal head circumference, and 1-minute Apgar score. Overall, the machine learning model performed better (area under the curve [AUC] 0.93) than the literature-driven model (AUC 0.73). DISCUSSION:Machine learning may be an alternative, unbiased method to identify clinical predictors associated with PAIS. Identification of previously suggested and novel clinical factors requires cautious interpretation but supports the multifactorial nature of PAIS pathophysiology. Our results suggest that identification of neonates at risk of PAIS is possible.
10.1212/WNL.0000000000209393
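The abstract above reports balanced accuracy, which matters because the cohort is imbalanced (527 cases versus 2,044 controls). Balanced accuracy is the mean of sensitivity and specificity, so a model that ignores the minority class scores 0.5 even when its plain accuracy looks high. The short Python sketch below illustrates this with the class counts from the abstract and a hypothetical classifier that labels every neonate as a control. The function names are illustrative and this is not the study's pipeline.

```python
def accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Fraction of all predictions that are correct."""
    total = tp + fn + tn + fp
    if total == 0:
        raise ValueError("empty confusion matrix")
    return (tp + tn) / total


def balanced_accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Mean of sensitivity (recall on cases) and specificity (recall on controls)."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2.0


if __name__ == "__main__":
    # Hypothetical "always predict control" classifier on 527 cases and 2,044 controls.
    tp, fn, tn, fp = 0, 527, 2044, 0
    print(f"accuracy          = {accuracy(tp, fn, tn, fp):.3f}")
    print(f"balanced accuracy = {balanced_accuracy(tp, fn, tn, fp):.3f}")
```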
Machine learning-based radiomics in neurodegenerative and cerebrovascular disease.
MedComm
Cognitive impairments, which can be caused by neurodegenerative and cerebrovascular disease, represent a growing global health crisis with far-reaching implications for individuals, families, healthcare systems, and economies worldwide. Notably, neurodegenerative-induced cognitive impairment often presents a different pattern and severity compared to cerebrovascular-induced cognitive impairment. With the development of computational technology, machine learning techniques have developed rapidly, offering a powerful tool for radiomic analysis and allowing more comprehensive models that can handle high-dimensional, multivariate data compared to the traditional approach. Such models allow prediction of disease development as well as accurate classification of diseases with overlapping symptoms, thereby facilitating clinical decision making. This review will focus on the application of machine learning-based radiomics to cognitive impairment caused by neurodegenerative and cerebrovascular disease. Within the neurodegenerative category, this review primarily focuses on Alzheimer's disease, while also covering other conditions such as Parkinson's disease, Lewy body dementia, and Huntington's disease. In the cerebrovascular category, we concentrate on poststroke cognitive impairment, including ischemic and hemorrhagic stroke, with additional attention given to small vessel disease and moyamoya disease. We also review the specific challenges and limitations of applying machine learning radiomics, offer suggestions for overcoming those limitations, and discuss what could be done for future clinical use.
10.1002/mco2.778
Machine learning and preoperative risk prediction: the machines are coming.
British journal of anaesthesia
Preoperative risk prediction is an important component of perioperative medicine. Machine learning is a powerful tool that could lead to increasingly complex risk prediction models with improved predictive performance. Careful consideration is required to guide the machine learning approach to ensure appropriate decisions are made with regard to what we are trying to predict, when we are trying to predict it, and what we seek to do with the results.
10.1016/j.bja.2024.07.015
Prediction and Interpretation Microglia Cytotoxicity by Machine Learning.
Journal of chemical information and modeling
Ameliorating microglia-mediated neuroinflammation is a crucial strategy in developing new drugs for neurodegenerative diseases. Plant compounds are an important screening target for the discovery of drugs for the treatment of neurodegenerative diseases. However, due to the spatial complexity of phytochemicals, it becomes particularly important to evaluate the effectiveness of compounds while avoiding the mixing of cytotoxic substances in the early stages of compound screening. Traditional high-throughput screening methods suffer from high cost and low efficiency. A computational model based on machine learning provides a novel avenue for cytotoxicity determination. In this study, a microglia cytotoxicity classifier was developed using a machine learning approach. First, we proposed a data splitting strategy based on the molecular Murcko generic scaffold. Under this condition, three machine learning approaches were coupled with three kinds of molecular representation methods to construct microglia cytotoxicity classifiers, which were then compared and assessed by predictive accuracy, balanced accuracy, F-score, and Matthews Correlation Coefficient. Then, the recursive feature elimination integrated with support vector machine (RFE-SVC) dimension reduction method was applied to high-dimensional molecular fingerprints to further improve model performance. Among all the microglial cytotoxicity classifiers, the SVM coupled with ECFP4 fingerprint after feature selection (ECFP4-RFE-SVM) obtained the most accurate classification for the test set (ACC of 0.99, BA of 0.99, F-score of 0.99, MCC of 0.97). Finally, the Shapley additive explanations (SHAP) method was used to interpret the microglia cytotoxicity classifier, and key substructure SMARTS were identified as structural alerts. 
Experimental results show that ECFP4-RFE-SVM has reliable classification capability for microglia cytotoxicity, and SHAP can not only provide a rational explanation for microglia cytotoxicity predictions, but also offer a guideline for subsequent molecular cytotoxicity modifications.
10.1021/acs.jcim.4c00366
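The scaffold-based data splitting mentioned in the abstract above can be sketched as a group-wise split: molecules sharing a scaffold are kept together so that no scaffold appears in both the training and test sets. The Python sketch below assumes the scaffold string for each molecule has already been computed with a cheminformatics toolkit (the abstract does not say which one was used), and the greedy largest-group-first assignment is one common convention rather than the authors' documented procedure. The function name and the placeholder scaffold labels are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def scaffold_split(scaffolds: Sequence[str],
                   test_fraction: float = 0.25) -> Tuple[List[int], List[int]]:
    """Split molecule indices so that each scaffold lands entirely in one set.

    scaffolds[i] is the precomputed scaffold string of molecule i.
    Returns (train_indices, test_indices).
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    groups: Dict[str, List[int]] = defaultdict(list)
    for idx, scaffold in enumerate(scaffolds):
        groups[scaffold].append(idx)
    # Largest groups first; ties broken by scaffold string for determinism.
    ordered = sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    train_target = len(scaffolds) * (1.0 - test_fraction)
    train: List[int] = []
    test: List[int] = []
    for _, members in ordered:
        # A whole group goes to train only if it still fits the training budget.
        if len(train) + len(members) <= train_target:
            train.extend(members)
        else:
            test.extend(members)
    return train, test


if __name__ == "__main__":
    # Placeholder scaffold labels standing in for real scaffold strings.
    labels = ["A", "A", "A", "A", "B", "B", "B", "C", "C", "D"]
    train_idx, test_idx = scaffold_split(labels, test_fraction=0.25)
    print("train:", train_idx)
    print("test: ", test_idx)
```

Because rare scaffolds tend to end up in the test set, a split like this usually gives a harder and more realistic estimate of generalisation to new chemotypes than a random split.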
Augmenting genetic algorithms with machine learning for inverse molecular design.
Chemical science
Evolutionary and machine learning methods have been successfully applied to the generation of molecules and materials exhibiting desired properties. The combination of these two paradigms in inverse design tasks can yield powerful methods that explore massive chemical spaces more efficiently, improving the quality of the generated compounds. However, such synergistic approaches are still an incipient area of research and appear underexplored in the literature. This perspective covers different ways of incorporating machine learning approaches into evolutionary learning frameworks, with the overall goal of increasing the optimization efficiency of genetic algorithms. In particular, machine learning surrogate models for faster fitness function evaluation, discriminator models to control population diversity on-the-fly, machine learning based crossover operations, and evolution in latent space are discussed. The further potential of these synergistic approaches in generative tasks is also assessed, outlining promising directions for future developments.
10.1039/d4sc02934h
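One of the strategies listed in the abstract above, a surrogate model for faster fitness evaluation, can be sketched as a prescreening step: many offspring are ranked by a cheap learned predictor, and only the most promising few receive the expensive true evaluation. The Python sketch below is a toy under loud assumptions. Bitstrings stand in for molecules, counting ones stands in for an expensive property calculation, and a one-nearest-neighbour lookup stands in for the machine learning surrogate. None of these choices come from the paper, and all names are hypothetical.

```python
import random
from typing import List, Tuple

Genome = Tuple[int, ...]


def true_fitness(genome: Genome) -> float:
    """Stand-in for an expensive evaluation (assumption: count of ones)."""
    return float(sum(genome))


class NearestNeighbourSurrogate:
    """Toy surrogate: predicts the fitness of the closest evaluated genome."""

    def __init__(self) -> None:
        self.archive: List[Tuple[Genome, float]] = []

    def add(self, genome: Genome, fitness: float) -> None:
        self.archive.append((genome, fitness))

    def predict(self, genome: Genome) -> float:
        nearest = min(self.archive,
                      key=lambda item: sum(a != b for a, b in zip(item[0], genome)))
        return nearest[1]


def evolve(n_bits: int = 20, pop_size: int = 10, generations: int = 15,
           offspring_per_gen: int = 40, evaluated_per_gen: int = 5,
           seed: int = 0) -> Tuple[float, float, int]:
    """Return (best fitness, initial best fitness, number of true evaluations)."""
    rng = random.Random(seed)
    surrogate = NearestNeighbourSurrogate()
    population: List[Tuple[Genome, float]] = []
    n_true_evals = 0
    for _ in range(pop_size):
        genome = tuple(rng.randint(0, 1) for _ in range(n_bits))
        fitness = true_fitness(genome)
        n_true_evals += 1
        surrogate.add(genome, fitness)
        population.append((genome, fitness))
    initial_best = max(f for _, f in population)
    for _ in range(generations):
        offspring: List[Genome] = []
        for _ in range(offspring_per_gen):
            (p1, _f1), (p2, _f2) = rng.sample(population, 2)
            cut = rng.randrange(1, n_bits)          # one-point crossover
            child = list(p1[:cut] + p2[cut:])
            flip = rng.randrange(n_bits)            # single bit-flip mutation
            child[flip] = 1 - child[flip]
            offspring.append(tuple(child))
        # Cheap prescreen: rank all offspring by surrogate prediction.
        offspring.sort(key=surrogate.predict, reverse=True)
        for genome in offspring[:evaluated_per_gen]:
            fitness = true_fitness(genome)          # expensive step, done sparingly
            n_true_evals += 1
            surrogate.add(genome, fitness)
            population.append((genome, fitness))
        # Elitist survivor selection keeps the best pop_size individuals.
        population.sort(key=lambda item: item[1], reverse=True)
        population = population[:pop_size]
    return population[0][1], initial_best, n_true_evals


if __name__ == "__main__":
    best, start, evals = evolve()
    print(f"best={best}, initial best={start}, true evaluations={evals}")
```

With the default settings, 600 offspring are generated but only 75 of them are evaluated with the true fitness function (plus 10 for the initial population), which is the kind of saving a surrogate is meant to provide when the true evaluation is costly.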