Integrating Medical AI with Human Workflow

Policy recommendations for the FDA

Keeping Humans in the Loop

Research has shown that humans and machines bring complementary skill sets, and often achieve better results when working together. We therefore need to focus on the interaction between humans and machines.

In terms of capabilities, machines are better at storing, computing, and analyzing data. With AI, machines look at large amounts of data to recognize patterns and determine the significance of different indicators to predict future outcomes. Within specific AI, the branch of AI most developed in medicine, a recommendation is given based on a specific objective function, such as accurately detecting whether a spot on an X-ray is benign or cancerous. Erik Brynjolfsson and Tom Mitchell outline eight criteria for the tasks best suited to specific AI: 1) the task involves learning a function that maps well-defined inputs to well-defined outputs; 2) large (digital) data sets exist or can be created containing input-output pairs; 3) the task provides clear feedback with clearly definable goals and metrics; 4) there are no long chains of logic or reasoning that depend on diverse background knowledge or common sense; 5) there is no need for a detailed explanation of how the decision was made; 6) there is a tolerance for error and no need for provably correct or optimal solutions; 7) the phenomenon or function being learned does not change rapidly over time; and 8) no specialized dexterity, physical skills, or mobility is required. This explains why AI is being discussed in certain clinical areas, such as diagnosing spots on medical images, and not others, such as complex heart surgery.

Humans, however, still have distinct advantages over even the most sophisticated AI systems. Firstly, humans have the responsibility to shape the questions that AI seeks to answer. This is done in three primary ways: setting the algorithm’s objective function, its model type, and its error preferences. The objective function is an attempt to express a business goal in mathematical terms for use in decision analysis. Deciding which objective function to set requires measured consideration of a healthcare organization’s competing priorities. For instance, if using AI to set an operating room schedule, is the AI seeking to maximize efficiency, health outcomes, or profits? Right now, AI systems cannot set their own objective functions, so they rely on humans. A second major decision humans must make in AI design involves the model type used. For instance, in diagnostics, the output could be a continuous probability or a binary classification. A probabilistic output would express someone’s chances of having a disease in percentage terms. For instance, AI could look at a breast PET scan and determine that a woman has a 77.9% chance of having breast cancer. Alternatively, a binary output would produce a yes or no indication that the patient should undergo a biopsy, rather than a percentage likelihood that the spot is cancerous. The last component of human-influenced algorithm design most relevant to healthcare is setting error preferences. When testing AI, there are three main types of accuracy metrics: overall accuracy, sensitivity, and specificity. Overall accuracy is the number of true negatives and true positives divided by the total number of results. Sensitivity, which reflects how often false negatives occur, is true positives divided by the sum of true positives and false negatives. Specificity, which reflects how often false positives occur, is true negatives divided by the sum of true negatives and false positives.
The balance between the two kinds of error, false negatives and false positives, can be controlled in algorithmic design by, for example, raising or lowering the threshold for binary categories. Within healthcare, deciding which errors are preferable is extremely challenging and should be done in consultation with clinicians. For instance, a false negative could be catastrophic for a condition that spreads quickly with devastating consequences and has a relatively simple treatment (e.g., HIV). In contrast, a false positive for a condition that is addressed through a risky surgery could put lives at risk unnecessarily, not to mention the psychological trauma and economic costs it would impose on patients.
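
The three metrics and the effect of the threshold can be illustrated with a short sketch. Everything in it is an assumption made for illustration: the ten labels and scores are invented toy data, not results from any real diagnostic tool, and the names `evaluate`, `LABELS`, and `SCORES` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    accuracy: float
    sensitivity: float
    specificity: float


def evaluate(labels, scores, threshold):
    """Classify each score as positive when it meets the threshold,
    then compute overall accuracy, sensitivity, and specificity."""
    tp = fp = tn = fn = 0
    for has_disease, score in zip(labels, scores):
        predicted_positive = score >= threshold
        if predicted_positive and has_disease:
            tp += 1
        elif predicted_positive and not has_disease:
            fp += 1
        elif not predicted_positive and has_disease:
            fn += 1
        else:
            tn += 1
    total = tp + fp + tn + fn
    return Metrics(
        accuracy=(tp + tn) / total,
        # Sensitivity: share of truly diseased patients the model catches.
        sensitivity=tp / (tp + fn) if (tp + fn) else float("nan"),
        # Specificity: share of healthy patients the model correctly clears.
        specificity=tn / (tn + fp) if (tn + fp) else float("nan"),
    )


# Invented toy data: 1 = disease present, 0 = disease absent.
LABELS = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
# Hypothetical model scores (estimated probability of disease).
SCORES = [0.95, 0.80, 0.60, 0.35, 0.70, 0.40, 0.30, 0.20, 0.10, 0.05]

if __name__ == "__main__":
    for threshold in (0.3, 0.5, 0.75):
        m = evaluate(LABELS, SCORES, threshold)
        print(f"threshold={threshold:.2f}  accuracy={m.accuracy:.2f}  "
              f"sensitivity={m.sensitivity:.2f}  specificity={m.specificity:.2f}")
```

On this toy data, lowering the threshold catches every diseased patient but flags more healthy ones, while raising it does the reverse. Which trade-off is acceptable is the clinical judgment described above, not something the code can decide.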

The human roles above relate to humans designing the AI systems, but what about the comparative advantages of humans using the AI system? Humans are much better at contextualizing decisions using outside factors. The need to contextualize decisions produced by AI differs significantly by profession. Back-office professionals may see that the AI recommends scheduling operations on a day that surgeons will want to protect, such as a child’s birthday or Valentine’s Day, factors that may not be programmed into the system. Nurses may be told to check on patients at hours when their loved ones are visiting and more privacy is therefore required. Doctors may be told to test for one condition based on biometric factors, but the machine may fail to see physical changes that the doctor can, such as yellowing nails or a green hue to the skin.

Humans are also empathetic in a way machines are not. Although there is progress on reading and responding to facial expressions and vocal cues, we are a long way from a machine being able to have the incredibly difficult conversations with patients demanded of doctors every day. Some depict a world in which machines may actually be better than humans at reading emotion. Humans rely on sight and sound, whereas machines could supplement sight and sound with biometric data linked to emotion, such as blood pressure and brain activity. Machine emotional intelligence is being trialed in call centers with emotional customers but, as mentioned, is far from being trialed in medical settings where life-altering information is regularly shared.

Policy questions: Should machines support or supplant humans in clinical settings? What are the malpractice implications?

Fundamentally, human-machine interactions in healthcare can take one of two paths: humans can assist machines, or machines can assist humans. In other words, machines can supplant or support humans. In the former, humans work to train and sustain the accuracy of the predictive models, as well as explain the results produced by these models. This would involve increasing the size of data sets where necessary, labelling those data sets, and comparing how models perform on training data with how they perform on test data.

There is a question as to who within the healthcare ecosystem is responsible for sharing the data. Is it the responsibility of the radiologist, oncologist, lab, or hospital back office? Who is it shared with: an internal data science team at the provider or a third-party developer? How does sharing this data comply with HIPAA and other relevant regulations?

This model, whereby machines supplant doctors, has significant implications for liability. If machines are producing better health outcomes, on average, would it be malpractice to override a machine’s analysis for an individual patient? In some areas, health systems are already forced to confront this issue. Samuel Nussbaum, the former Chief Medical Officer for Anthem/WellPoint, claimed that, in tests, Watson’s successful diagnosis rate for lung cancer was 90 percent, compared to 50 percent for human doctors. Using quality of care standards, the logic would require a shift to a machine-learning-led diagnostic regime. However, this could have unintended consequences. According to Froomkin et al., “If we reach the point where the bulk of clinical outcomes collected in databases are ML-generated diagnoses, this may result in future decisions that are not easily audited or understood by human doctors. Given the well-documented fact that treatment strategies are often not as effective when deployed in clinical practice compared to preliminary evaluation, the lack of transparency introduced by the ML algorithms could lead to a decrease in quality of care.”

Alternatively, machines can support humans. According to a Harvard Business Review article, “Machines can amplify your cognitive strengths; interact with customers and employees to free us for higher-level tasks; and embody human skills to extend our physical capabilities.” In this instance, machines would provide recommendations to clinicians, who have the freedom to override the machine’s recommendation with no malpractice implication. The objective of this AI would not necessarily be to improve outcomes directly, but rather to increase the speed of clinicians’ decision making and thereby indirectly affect quality of care.

Policy recommendation: Our recommendation is to not require the dedicated use of machine-learning diagnostic or prescription tools at this time, even where trials have indicated better results. The field of applied machine learning is still nascent, so charging doctors with malpractice when they do not strictly follow machine learning diagnostic or prescriptive recommendations is premature and could decrease the quality of care for patients in the long term. Furthermore, we recommend that the FDA require disclosures, targeted at licensed professionals, on how the technology was designed and should be used, as medical device and pharmaceutical manufacturers are required to provide.

Effects of AI on healthcare job market

Policy question: How will AI naturally affect the job market in healthcare? Should policy makers change incentives to alter these outcomes?

Healthcare, as of January 2018, is the largest employer in the United States. Over 16 million Americans work in healthcare, and seven million work in hospitals. Of the ten fastest growing careers in the next decade, five are in healthcare, according to the Bureau of Labor Statistics. So it is incredibly important to consider AI’s effects on the healthcare job market and whether or not it will lead to widespread job displacement, as it is expected to do to the three million professional drivers in the U.S. in the next decade. However, when analyzing which full professions or, more accurately, which tasks within professions will be displaced, it is important to remember that healthcare does not always abide by strict free market principles. So while the supply of professionals relative to the demand may increase, this will not necessarily lead to the decline in wages that you would see in other industries.

There are two different types of applications within healthcare whereby AI tools could integrate into human workflow: administrative and clinical.


Healthcare professionals spend between 15 and 70 percent of their working time performing administrative tasks. An estimated 14 percent of wasted health care spending — $91 billion — is the result of inefficient administration.

These administrative tasks relate both to the operations of the hospital and to patient care. Potential applications of AI to improve systems efficiency and productivity of provider operations include optimisation and prediction of patient flow through the hospital; improvements to the current efficacy and efficiency of procurement; and enhancement of workforce logistics and service planning. While the previous era of automation displaced routine, repetitive, and structured tasks, the new wave of AI will displace tasks focused on prediction, or tasks for which humans are unable to articulate a strategy but where statistical regularities in the data imply one.

Administrative tasks related to patient care center on improving communication within care teams. Often, patients are served by a team of specialists working across disciplines. Conversations among specialists or with patients often do not get communicated to the broader care team. Communication across care providers is currently so outdated that many still use fax. Using speech-to-text technology, AI could convert unstructured physical examination notes and clinical laboratory results into structured electronic medical records. These records could be far more comprehensive, and collaboration among specialists could be improved. Commercial products, such as IBM’s Watson for Patient Record Analytics, are already using natural language processing and machine learning to provide intelligent insights from a longitudinal patient record for patient care.

In addition, AI systems will, in the near future, be capable of other administrative functions related to patient care, such as ordering lab tests or proactively encouraging a patient to schedule a visit if certain indicators are met.

How will these new tools interact with human workflow? On the back-office side, it is likely that several positions focused on operations management will be consolidated. On the clinical side, the nurses and physician assistants typically tasked with taking notes, updating electronic medical records, ordering lab tests, and other administrative functions will still be needed within the healthcare system. Instead of performing these more routine tasks, however, they are likely to take on more interpersonal responsibilities. The clinical staff supporting physicians and specialists will likely spend less of their time on administration and more of their time caring for and counseling patients.


AI can also assist in clinical outcomes in healthcare, specifically around aiding doctors in diagnosis and setting a path for prescriptive action. Deep learning technologies have already shown expert-level performance in medical image analysis, in domains such as screening for breast cancer, skin cancer or eye diseases. Whilst radiology, pathology and ophthalmology are frequently cited as the disciplines most likely to be influenced by AI tools, the impact will inevitably affect all specialties and every clinician from doctors to nurses, pharmacists to paramedics and beyond.

In 2017, successful use of deep neural networks was reported for the analysis of skin cancer images with greater accuracy than a dermatologist and for the diagnosis of diabetic retinopathy from retinal images. AI diagnostic tools perform best when they can learn from substantial amounts of data. Specifically, AI tools are most relevant for diagnostic imaging, genetic testing, and electrodiagnosis.

Similarly, prescriptive action is typically guided by clinical pathways: structured multidisciplinary care plans that guide care management for a well-defined group of patients over a well-defined period of time. AI can analyze care plans across many patients to streamline clinical pathways for different conditions and patient characteristics. A pathway can also be optimized for specific objectives, such as reducing the likelihood of hospital readmission or of complicating health factors.

An example from the National Health Service in the UK is detailed below:

“The number of optical coherence tomography (OCT) scans in the UK needing analysis is set to rise dramatically and scans are increasingly being performed by community optometrists. Deep learning algorithms have been trained to identify pathology automatically from OCT scans, enabling robust early detection and triage of sight-threatening diabetic retinopathy and age-related macular degeneration. The use of this technology in clinical practice would greatly enable both ophthalmologists and optometrists, streamlining their capacity and capability to treat the most at-risk patients, and allowing them to enhance their own diagnostic skills through learning from the technology.”

Unlike in administrative functions, where AI could replace back-office staff, there appears to be a consensus in the medical and technology communities that AI will augment, rather than replace, doctors. This is due in large part to AI’s lack of general intelligence and its inability to empathize. While the discussion around AI “replacing” doctors is frequently had at the job level, it is more accurate to have the conversation about which specific tasks AI will automate.

In Prediction Machines, the authors lay out, within medical imaging, five clear roles for humans, even when leveraging cutting-edge AI: choosing the image, using real-time images in medical procedures, interpreting the machine output, training machines on new technologies, and employing judgment that may lead to overriding the prediction machine’s recommendation, perhaps based on information unavailable to the machine. The automation of image interpretation may lead to a decrease in diagnostic radiologists, who specialize in objective identification, but may in fact increase demand for interventional radiologists, who use real-time images to aid medical procedures and frequently relay medical imaging results to primary care doctors. Within the clinical space, the jobs in greatest jeopardy are not those directly related to the fields where AI is supporting human decision making, but rather those in fields that exist to mitigate the risk of a human being wrong. For instance, within medical imaging, if a radiologist is unsure of a patient’s results, the patient is often sent to undergo a biopsy to determine whether or not a tumor is malignant. In the case of the H. pylori detection research going on at Stanford’s AI Lab, as the AI diagnostic tools get more accurate, the need for supplementary lab tests will decrease. So the jobs at risk are less likely to be those of radiologists, and more likely to be those of pathologists.

The implication of certain tasks, specifically those related to prediction, being automated is that the same specialist can see a higher number of patients. For instance, a primary care provider may be able to see 8,000, rather than the typical 1,500, patients per year. As a consequence, either physicians who deal with complicated cases serve a greater number of hospital locations or the number of provider options for specialized care decreases. Either way, the healthcare system as we currently know it contorts to allow specialists to deal with a larger patient base, as the average amount of time required per patient decreases as a result of AI tools.

New employment opportunities:

In terms of changes to the employment structures of providers, there will likely be a new function dedicated to data science or informatics in large provider networks. For instance, in 2016, the NHS appointed its first Chief Clinical Information Officer (CCIO), who focuses on patient outcomes and healthcare worker productivity to continuously improve patient care by implementing technology and monitoring its efficacy.

Policy recommendation: Because AI will, within the highly specialized clinical space, displace only tasks and not full professions, we recommend no action at this time. Workers in professions that will be displaced, namely administrative roles or professions linked to risk mitigation technologies, can be retrained for new functions outside of computation and prediction, the two functions automated by IT and AI, but this is outside the purview of the FDA.
