Machine learning (ML) holds great promise for impacting healthcare delivery; however, to date most methods are tested in ‘simulated’ environments that cannot recapitulate factors influencing real-world clinical practice. We prospectively deployed and evaluated a random forest algorithm for therapeutic curative-intent radiation therapy (RT) treatment planning for prostate cancer in a blinded, head-to-head study with full integration into the clinical workflow. ML- and human-generated RT treatment plans were directly compared in a retrospective simulation with retesting (n = 50) and a prospective clinical deployment (n = 50) phase. Consistently throughout the study phases, treating physicians assessed ML- and human-generated RT treatment plans in a blinded manner following a priori defined standardized criteria and peer review processes, with the selected RT plan in the prospective phase delivered for patient treatment. Overall, 89% of ML-generated RT plans were considered clinically acceptable and 72% were selected over human-generated RT plans in head-to-head comparisons. RT planning using ML reduced the median time required for the entire RT planning process by 60.1% (118 to 47 h). While ML RT plan acceptability remained stable between the simulation and deployment phases (92 versus 86%), the number of ML RT plans selected for treatment was significantly reduced (83 versus 61%, respectively). These findings highlight that retrospective or simulated evaluation of ML methods, even under expert blinded review, may not be representative of algorithm acceptance in a real-world clinical setting when patient care is at stake.