Sunday 8 July 2018

On the Use of Big Data and AI for Health

Pitfalls of Big Data Analytics

High-precision medicine requires reliable decisions about whom to treat, in what way, when, and with what dose of which medicine, ideally even before a disease breaks out. This challenge, however, can only be met with large amounts of personal and/or group-specific data, which may be extremely sensitive, as such data may be used against the interests of the patients (e.g. in the interest of profit maximization). Consequently, there are plenty of technical, scientific, ethical and political challenges.

This situation makes it particularly important to protect personal data from misuse by means of cybersecurity, to ensure a professional use of the data, and to implement suitable measures that preserve human dignity (including informational self-determination) as far as possible.

In the past, empirical and experimental analyses often suffered from a lack of data or from small samples. In many areas, including medical studies, this has changed or is about to change. Big Data therefore promises to overcome some common limitations of previous medical treatments, which were often not personalized, imprecise, ineffective and associated with many side effects.

In the early days of Big Data, people expected to have found a general-purpose tool, something like a holy grail. It was believed that, if one just had enough data, data quantity would turn into data quality; the truth would basically reveal itself. This idea is probably best expressed by a quote from Chris Anderson, who – back in 2008 – predicted “the end of theory” and wrote in Wired magazine: “The data deluge makes the scientific method obsolete.”

Along these lines, it was claimed that it would now be possible to predict, or at least to “nowcast”, the flu from Google searches, as reflected by the platform Google Flu Trends. The company 23andMe offered to identify ethnic origin, phenotype, and likely diseases. Angelina Jolie said “knowledge is power” and had her breasts removed, because a genetic test indicated a high chance that she would get breast cancer.

Later on, Google Flu Trends was shut down, doctors warned that Angelina Jolie should not be taken as an example, and 23andMe’s genetic test was temporarily taken off the market by the health authority. How could this happen? Google searches were no longer a reliable measurement instrument, as Google had started to manipulate people’s searches with suggestions (both through the autocomplete function and by means of personalized advertisement). Regarding attempts to predict diseases from genetic data, it was discovered that some people were doing very well even though they were predicted to be very ill. Moreover, predictions were sometimes quite sensitive to adding or removing data points, to the choice of the Big Data algorithm, or (in some cases) even to the hardware used for the analysis.

Generally, it was thought that the more data one had, the more accurate the implications of data analyses would be. However, the analyses often mistook correlation for causation, and they did not check for statistical significance – in many cases, it was not even clear what the appropriate null hypothesis was. So, in many cases, Big Data analytics was initially not compatible with established statistical and medical standards.

In fact, the more data one has, the higher the probability of finding patterns in the data just by chance. These patterns will often be neither meaningful nor significant. Spurious correlations are a well-known example of this problem: correlations that do not reflect a causal relationship, for instance because a third factor causes two effects to correlate while neither effect influences the other. In such cases, increasing or decreasing the measured variables would not have the expected effect; it could even be counterproductive. Careful causality analysis (using concepts such as Granger causality) is, therefore, absolutely required.
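
To see how easily patterns arise by chance, consider the following minimal sketch in Python (the data are purely synthetic): among a thousand random variables with no causal relation whatsoever, a fair number will typically show a noticeable correlation with any given target.

    import numpy as np

    rng = np.random.default_rng(42)
    n_samples, n_variables = 100, 1000

    # Purely random data: no variable has any causal link to any other.
    data = rng.normal(size=(n_samples, n_variables))

    # Correlate every variable with the first one.
    target = data[:, 0]
    correlations = np.array([np.corrcoef(target, data[:, j])[0, 1]
                             for j in range(1, n_variables)])

    print("strongest chance correlation:", round(abs(correlations).max(), 2))
    print("variables with |r| > 0.2:", int((abs(correlations) > 0.2).sum()))

For time-series data, a Granger-style check – for instance with grangercausalitytests from the statsmodels package – is one way to go beyond such raw correlations.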

Another problem concerns undesirable discrimination. Suppose a health insurer wants to incentivize certain kinds of “healthy” diets – by reducing tariffs for people who eat more salad and less meat, for example. As a side effect, it would then be likely that men pay different tariffs from women, and that Christians, Jews, and Muslims on average pay different tariffs as well, just because of their different religious and cultural traditions. Such effects are considered discriminatory and need to be avoided. If one furthermore wants to avoid discrimination based on age, sexual orientation and other characteristics that should not be discriminated against, Big Data analytics becomes a quite sophisticated challenge.
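
A simple first check for such proxy discrimination is to compare average outcomes across protected groups. The following sketch illustrates this; the records and column names are invented for illustration.

    import pandas as pd

    # Hypothetical insurance records; "diet_score" is the seemingly
    # neutral criterion used to set the tariff.
    records = pd.DataFrame({
        "diet_score": [0.9, 0.4, 0.8, 0.3, 0.7, 0.2],
        "gender":     ["f", "m", "f", "m", "f", "m"],
        "tariff":     [80, 110, 85, 115, 90, 120],
    })

    # Large gaps between group averages signal proxy discrimination,
    # even though gender never enters the pricing rule directly.
    print(records.groupby("gender")["tariff"].mean())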

Last but not least, even Big Data analytics will produce errors of the first and the second kind, i.e. false alarms and alarms that do not go off. This is a problem for many medical tests. Say a medical test costs x and a correct diagnosis creates a benefit of y, while a wrong one causes a damage of z. Moreover, assume that the test is correct with probability p and incorrect with probability (1-p). Then the overall utility of the test is u = –x + p*y – (1-p)*z, which might be neutral or even negative, depending on the impact of wrong diagnoses. For example, when a disease is rare, even a fairly accurate test produces many false alarms, and it is therefore sometimes advised not to test the entire population for certain kinds of cancer.
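
The following sketch evaluates this utility formula with illustrative, made-up numbers; it shows how the very same test can be worthwhile or harmful, depending on the damage a wrong diagnosis causes.

    def test_utility(x, y, z, p):
        """Expected utility u = -x + p*y - (1-p)*z of a medical test
        with cost x, benefit y of a correct diagnosis, damage z of a
        wrong one, and accuracy p."""
        return -x + p * y - (1 - p) * z

    # Wrong diagnoses are cheap: testing pays off.
    print(test_utility(x=50, y=1000, z=200, p=0.95))    # 890.0

    # Wrong diagnoses are very damaging: utility turns negative,
    # which is why population-wide screening is sometimes discouraged.
    print(test_utility(x=50, y=1000, z=25000, p=0.95))  # -350.0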

In conclusion, the scientific method is absolutely indispensable to make sense of Big Data, i.e. to refine raw data into reliable information and useful knowledge. Hence, Big Data is not the end of theory, but rather the beginning.

A good example to illustrate this is flu prediction. When the spatio-temporal spreading of the flu is studied in geographical space, one often finds widely scattered data and low predictive power. This is because the spreading of the flu is driven by air travel. It is possible, however, to use data on air-travel passenger volumes to define an effective distance between cities, in which cities with high mutual passenger flows are located close to each other. In this effective-distance representation, the spreading pattern becomes circular and predictable. This approach makes it possible to identify the likely city in which a new disease emerged and to forecast the likely order in which cities will suffer from the flu. Hence, it is possible to take proactive measures to fight the disease more effectively.
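
A minimal sketch of the effective-distance idea follows, using the formulation d = 1 – ln P proposed by Brockmann and Helbing (2013), where P is the fraction of travellers leaving one city for another; the passenger flows below are invented for illustration.

    import numpy as np
    import networkx as nx

    cities = ["A", "B", "C", "D"]
    # flux[i, j]: passengers per day travelling from city i to city j
    # (made-up numbers).
    flux = np.array([[  0, 500, 100,  10],
                     [500,   0,  50, 200],
                     [100,  50,   0,   5],
                     [ 10, 200,   5,   0]], dtype=float)

    # Fraction of outgoing travellers, then effective distance per link.
    P = flux / flux.sum(axis=1, keepdims=True)
    graph = nx.DiGraph()
    for i, src in enumerate(cities):
        for j, dst in enumerate(cities):
            if i != j:
                graph.add_edge(src, dst, weight=1 - np.log(P[i, j]))

    # Shortest effective-distance paths from the (assumed) outbreak city
    # "A" predict the order in which the other cities are likely hit.
    dist = nx.single_source_dijkstra_path_length(graph, "A", weight="weight")
    print(sorted(dist.items(), key=lambda kv: kv[1]))

In this metric, a strongly connected hub can be “closer” to the outbreak city than a geographically nearby town, which is exactly what makes the spreading pattern circular and predictable.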

Pitfalls of Machine Learning and Artificial Intelligence

With the rise of machine learning methods, new hopes emerged that the previously mentioned problems could be overcome with Artificial Intelligence (AI). The expectation was that AI systems would sooner or later become superintelligent and capable of performing any task better than humans, at least any specialized task.

In fact, AI systems are now capable of performing many diagnoses more reliably than doctors, e.g. diagnoses of certain kinds of cancer. Such applications can certainly be of tremendous use.

However, AI systems make errors, too, just perhaps at a lower frequency. So, decisions or suggestions of AI systems must be critically questioned, particularly when a decision may have large-scale impact, i.e. when a single mistake can potentially create large damage. This is necessary also because of a serious weakness of most of today’s AI systems: they do not explain how they arrive at their conclusions. For example, they do not tell us how likely it is that a suggestion is based on a spurious correlation. In fact, if AI systems turn correlations into laws (as cybernetic control systems or autonomous systems may do), this could eliminate important freedoms of decision-making.
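
There are at least partial remedies: model-agnostic probes can reveal which inputs a black-box model actually relies on. A minimal sketch, assuming scikit-learn and one of its bundled toy datasets:

    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.inspection import permutation_importance
    from sklearn.model_selection import train_test_split

    X, y = load_breast_cancer(return_X_y=True, as_frame=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

    # Shuffle each feature in turn and measure the drop in accuracy:
    # features whose shuffling hurts most are the ones the model relies on.
    result = permutation_importance(model, X_test, y_test,
                                    n_repeats=10, random_state=0)
    for i in result.importances_mean.argsort()[::-1][:5]:
        print(X.columns[i], round(result.importances_mean[i], 3))

Such probes do not fully explain a model, but they at least indicate whether a suggestion might rest on an implausible feature – a first step toward detecting spurious correlations.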

Last but not least, it has been found that not only humans but also AI systems can be manipulated. Moreover, intelligent machines are not necessarily objective and fair: they may discriminate against people. For example, it has been shown that people of color and women are potential victims of such discrimination, in part because AI systems are typically trained on biased historical data. So, machine bias is a frequent, undesired side effect and a serious risk of machine learning, which must be tested for and properly counteracted.
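
A basic test for machine bias is to compare a model’s decision rates across groups, as in this sketch (the predictions and group labels are synthetic stand-ins for a real model’s output):

    import numpy as np

    rng = np.random.default_rng(0)

    # Stand-ins for a trained model's decisions and a protected attribute;
    # in practice both come from the model and the evaluation data.
    approved = rng.random(10_000) < 0.3
    group = rng.choice(["group_a", "group_b"], size=10_000)

    # Demographic-parity check: strongly diverging approval rates across
    # groups indicate a bias that must be investigated and corrected,
    # e.g. by re-weighting or de-biasing the training data.
    for g in ("group_a", "group_b"):
        print(g, round(approved[group == g].mean(), 3))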