by Dirk Helbing
This is the third in a series of blog posts that form chapters of my forthcoming book Digital Society. Last week's chapter was titled:
COMPLEXITY TIME BOMB: When systems get out of control.
Data sets bigger than the largest library
The idea that we could solve the
problems in the world, if we just had enough data about them, is intriguing. In
fact, we are now entering an era of "Big Data" – masses of
information, mostly in digital form, about all aspects of our lives,
institutions and cultures. It will probably not be long before each newborn
baby will have its genome sequenced at birth. Every purchase we make on the Internet
releases data about our location, preferences, and finances that will be stored
somewhere and quite possibly used without our consent. Cell phones disclose
where we are, and private messages and conversations are not really private at
all. Books dating back to the advent of printing are being digitized and made available
in immense, searchable databases of words that are now being mined in
"culturomics" studies that put history, society, art and cultural
trends under the lens. Aggregated data can be used to reveal unexpected facts,
such as flu epidemics being inferred from Google searches.
This avalanche of data is ever increasing: with the introduction of technologies such as Google Glass, people will have the option of documenting and archiving almost every aspect of their lives. Big Data such as credit-card transactions, communication and mobility data, public news, Google Earth imagery, comments and blogs, are creating an increasingly accurate digital picture of our physical and social world, including all its social and economic activities.
"Big Data" will change our world. The term, coined more than 15 years ago, means data sets so big that one can no longer cope with them with standard computational methods. To benefit from Big Data, we must learn to "drill" and "refine" data, i.e. to transform them into useful information and knowledge. The global data volume doubles every 12 months. Therefore, each year we produce as much data as in all previous years together.
These tremendous amounts of data relate to four important technological innovations: the Internet, which enables our global communication; the World Wide Web (WWW), a network of globally accessible websites that evolved after the invention of the hypertext transfer protocol (HTTP); the emergence of Social Media such as Facebook, Google+, WhatsApp, or Twitter, which have created social communication networks; and the emergence of the "Internet of Things" (IoT), which allows sensors, smartphones, gadgets, and machines ("things") to connect to the Internet. Note that there are already more things connected to the Internet than humans.
Meanwhile, the data sets collected by companies such as eBay, Walmart or Facebook reach the size of petabytes (1 million billion bytes) – one hundred times the information content of the largest library in the world: the U.S. Library of Congress. The mining of Big Data opens up entirely new possibilities for the optimization of processes, the identification of interdependencies, and the support of decisions. However, Big Data also comes with new challenges, which are often characterized by four criteria: volume (the file sizes and number of records are huge), velocity (the data evaluation often has to be done in real time), variety (the data are often very heterogeneous and unstructured), and veracity (the data may be incomplete, unrepresentative, and contain errors).
Gold rush for the 21st century's oil
When the social media portal WhatsApp, with its 450 million users, was recently sold to Facebook for $19 billion, the price amounted to almost half a billion dollars per employee. There's no doubt that Big Data create tremendous business opportunities – not just because of their value for, say, marketing, but because the information itself is becoming monetized.
Technology gurus preach that Big Data is becoming the new oil of the 21st century: a commodity that can be tapped for profit. With the virtual currency Bitcoin temporarily becoming more valuable than gold, one can quite literally say that data can be turned into value to an extent we previously knew only from fairy tales. Even though many sets of Big Data are proprietary, the consultancy company McKinsey recently estimated the potential value of Open Data alone to be 3 to 5 trillion dollars per year. If the worth of this publicly available information were to be evenly apportioned among the public itself, it would bring $500 to each person in the world.
The potential of Big Data spans all
areas of social activity: from natural language processing to financial asset
management, or to a smart management of cities that better balances energy
consumption and production. It could enable better protection of our
environment, risk detection and reduction, and the discovery of opportunities that
would otherwise be missed. And it could make it possible to tailor medicine to
patients, thereby increasing drug effectiveness, accelerating drug discovery and
reducing side effects.
Big Data applications are now spreading very rapidly. They enable personalized services and products, open up entirely new possibilities to optimize production and distribution processes or services, allow us to run "smart cities," and reveal unexpected interconnections between our activities. Big Data also hold great potential for fostering evidence-based decision-making, particularly in big business and politics. Where is all of this leading us?
In this post, I explain what Big Data can and cannot do. In particular, I will show that it will never be enough on its own to avoid crises and catastrophes, or to solve all societal problems. Indeed, the suggestion sometimes heard – that Big Data is the key to the future – can be misleading in pretty dangerous ways. Most obviously, it could precipitate a descent into an authoritarian surveillance state where there is very little personal liberty or autonomy, and where no one can have any more secrets. But even if that were not to happen, Big Data could create a false sense that we can control our own destiny, if only we have enough data. Information is potentially useful, but it can only release its potential if it is coupled to a sound understanding of how complex social systems work.
Big Data fueling super-governments
The development of human
civilization has depended on the establishment of mechanisms that promote
cooperation and social order. One of these is based on the idea that everything
we do is seen and judged by God. Bad deeds will be punished, while good ones
will be rewarded. The age of information has inspired the dream that we might
be able to know and control everything ourselves: to acquire God-like
omniscience and omnipotence. There are now hopes and fears that such power lies
within the reach of huge IT companies such as Google or Facebook and
clandestine secret services such as the CIA or NSA. CIA Chief Technology
Officer Ira "Gus" Hunt[2] has explained how easy it
is for such institutions to gather a great deal of information about each of us:
"You're already a walking sensor platform… You are aware of the fact that somebody can know where you are at all times because you carry a mobile device, even if that mobile device is turned off. You know this, I hope? Yes? Well, you should… Since you can't connect dots you don't have, it drives us into a mode of, we fundamentally try to collect everything and hang on to it forever… It is really very nearly within our grasp to be able to compute on all human generated information."
Could such a massive
data-collection process be good for the world, helping to eliminate terrorism
and crime? Or should we fear that the use of this information will undermine
human rights and the basis of democratic societies and free economies? In this
chapter, I explore the possibilities and limits of such an approach, for better
or worse.
Imagine you are the president of a country, intending to ensure the welfare of all its people. What would you do? You might well wish to prevent wars and financial crises, economic recessions, crime, terror, and the spread of diseases. You may want people to be rich, happy, and healthy. You would like to avoid unhealthy drug consumption, corruption, and perhaps traffic jams as well. You would like to ensure a safe, reliable supply of food, water, and energy, and to keep the environment in good shape. In sum, you would like to create a prosperous, sustainable and resilient society.
What would it take to achieve all this? You would certainly have to take the right decisions and avoid those that would have harmful, unintended side effects. So you should know about alternatives for impending decisions, along with their opportunities and risks. For your country to thrive, you would have to avoid ideological, instinctive or traditional decisions in favor of evidence-based decisions. To have the evidence needed to inform this decision-making, you would need a lot of data about all quantifiable aspects of society, and excellent data analysts to interpret it. You might well decide to collect all the data you can get, just in case it might turn out to be useful one day to counter threats and crises, or to exploit opportunities that might arise.
Previously, rulers
and governments were not in this position: they generally lacked the quality or
quantity of data needed to take well-informed decisions. But that is now
changing. Over the past several decades, the processing power of computers has
exploded, roughly doubling every 18 months. The capacity for data storage is
growing even faster. The amount of data is doubling every year. With the now
emerging "Internet of Things," cell phones, computers and factories
will be connected to the most mundane devices – coffee machines, fridges, shoes
and clothes – creating an overwhelming stream of information that feeds an
ocean of "Big Data."
Humans governed by computers?
The more data are generated, stored and interpreted, the easier it is to find out about each individual's behavior. Everyone's computer use and everyone's device-encoded behavior (such as the record of our movements produced by the cell phones we carry) leaves a unique fingerprint, such that it is possible to know our interests, our thinking, our passions, and our feelings. Some companies analyze "consumer genes" to offer personalized products and services. They have already collected up to 3,000 items of personal data on almost a billion people in the world: names, contact data, incomes, consumer habits, health information, and more. This is pretty much everyone with a certain level of income and Internet connectivity.
Would it be beneficial
if a well-intentioned government had access to all this data? It could help
politicians and administrations to take better informed decisions: to reduce
terrorism and crime, say, and to use energy more efficiently, protect our
environment, improve traffic flows, avoid financial meltdowns, mitigate
recessions, enhance our health system and education, and provide better fitting
services to citizens.
Moreover, could a
government use the information not only to understand but also to predict our
behavior, and map out the course of our society? Could it optimize our social
systems and take the best decisions for everyone?
In the past, we have
used supercomputers for almost everything except understanding our economy,
society, and politics. Every new car or airplane is designed, simulated, and
tested on a computer. Increasingly, so are new drugs. Thus, why shouldn't we use
computers to understand and guide our economy and society too? In fact, we are
slowly moving towards that very situation. As a minor (yet revealing) example,
since their early days computers have been used for traffic control. Today’s economic
production and the management of supply chains would not be conceivable without
computer control as well, and large airplanes are now controlled by a majority
decision among several computers. Computers can already beat the best chess
players, and by now about 70 percent of financial transactions are executed by trading
computers. IBM's Watson computer has started to take care of
some customer hotlines, and computer-driven Google cars will soon move around
without a driver, perhaps picking up the goods we ordered on the Web without us
being present. In all these cases, computers already do a better job than
humans. Why shouldn't they eventually make better policemen, administrators,
lawyers, and politicians?
It no longer seems
unreasonable, then, to imagine a gigantic computer program that could simulate
the actions and interactions of all the humans in the world, perhaps even equipping
these billions of agents with cognitive abilities and intelligence. If we fed
these agents with our own personal data, would they behave as we do? In other
words, would it be possible to create a virtual mirror world? And would machine
learning eventually be able to make these computer agents so similar to us that
they would take decisions indistinguishable from ours? Attempts to construct or
at least envisage such a scheme are already underway. If they were realized,
would they represent a kind of Crystal Ball with which we could predict the
future of society?
The prospect might
sound unnerving to some, but in principle the potential benefits aren’t hard to
see. There are many huge problems that such a predictive capability might help
to solve. The
financial crisis has created global losses of at least 20 trillion US dollars.
Crime and corruption consume about 2-5% of the gross domestic product (GDP) of
all nations on earth – about 2 trillion US dollars each year. The lost output
of the US economy as a result of the 9/11 terror attacks is estimated to be of
the order of 90 billion dollars. A major influenza pandemic infecting 1% of the
world’s population would cause losses of 1-2 trillion dollars per year.
Cybercrime costs Europe alone 750 billion Euros a year. The negative economic impact
of traffic congestion amounts to 7-8 billion British Pounds in the United
Kingdom alone.
If a computer
simulation of the entire global socio-economic system could produce just a 1
percent improvement in dealing with these problems, the benefits to society
would be immense. And in fact, if experiences with managing smaller complex social
systems this way are any guide, an improvement of 10-30 percent seems
conceivable. Overall this would amount to savings of more than 1 trillion
dollars annually. Even if we had to invest billions in creating such a system, the
benefits could exceed the investment a hundred-fold. Even if the success rates
were significantly smaller, this would represent a substantial gain. It would
be hard to see how any responsible politician could decline to support such an
investment.
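As a rough plausibility check, the recurring losses listed above can be added up and multiplied by the assumed improvement rates. Which of the quoted figures count as annual losses is my own reading of the text, so the sketch below is purely illustrative:

```python
# Rough plausibility check of the savings estimate above (all figures in trillion USD).
# Which losses recur annually is my own reading of the text, not the author's breakdown.
annual_losses = {
    "crime and corruption": 2.0,       # about 2 trillion per year
    "major influenza pandemic": 1.5,   # 1-2 trillion per year (midpoint)
    "cybercrime in Europe": 0.75,      # 750 billion euros, treated as roughly 0.75 trillion USD
}
total = sum(annual_losses.values())

for improvement in (0.01, 0.10, 0.30):
    print(f"{improvement:.0%} improvement: ~{total * improvement:.2f} trillion USD saved per year")
```

At the upper end of the 10-30 percent range, the savings would indeed exceed one trillion dollars per year.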
But would such a
system work as one might hope? Is Big Data all a government needs to get our
world under control?
Crystal Ball and Magic Wand
Recent studies using
smartphone data and GPS traces suggest that more than 90 percent of the
mobility of people – where they will be at a certain time – can be forecast,
because of its repetitive nature. If other aspects of our behavior show the
same degree of predictability, it’s not hard to imagine that the trajectory of
society can indeed be mapped out in advance, with all that this entails for
successful social planning. While some people might not like this prospect at
all, many would perhaps appreciate a predictable life.
How far does this
idea extend? If we have enough data about every aspect of life, could we become
omniscient about the future? In order to achieve that, we would need to be able
to manipulate people’s choices using the information provided to them. Personalized
Internet searches, systems such as Google
Now, and personalized advertising are already going in this direction. But given the overwhelming
amount of data available, it needs to be filtered before it can be useful.
Such
filtering will inevitably be done in the interests of those who do the
filtering. For example, companies want potential customers to see their ads and
buy their products. The better people’s characteristics are known, the easier it
becomes to manipulate their choices. A recent, controversial Facebook experiment with about 700,000
users showed that it is possible to manipulate people’s feelings and mood. Therefore,
it’s not hard to imagine that omniscience might indeed imply omnipotence: those
who know everything could control everything. Let's call the hypothetical tool creating
such power a "Magic Wand".
Assuming that we had
a Magic Wand, could we take the right decisions for our society, or even for
every individual? Many people might say that forecasting societal trends is
different from forecasting the weather: the weather does not care about the
forecast, but people will respond to it, and this will defeat the prophecy. That
seems to imply that successful forecasting of societal developments would
require that people don't know about the forecasts, while governments do. This again
suggests that one would need a secretly operating authority advising the
government about the right decisions, and that it would use the Magic Wand
according to the evidence provided by the data-collecting Crystal Ball. Could
such a scheme work?
A New World Order based on information?
Our "wise
king" or "benevolent dictator" would probably see the Crystal Ball
and the Magic Wand as perfect tools to create social order. Singapore is
sometimes seen as an approximation of such a system. The country has indeed
been enormously successful in the past decades, but despite great advances and fast
economic growth, people's satisfaction has decreased. Why?
A wise king would
certainly sometimes have to interfere with our individual freedoms, if we would
otherwise take decisions that would create more damage for the economy and
society than benefits. This might end up in a situation where we would always have
to do what the government wants us to do, pretty much as if they were
commands from God. If we were manipulated in our decision-making, this might
even happen without our knowledge. Although the wise king would not be able to
fulfill our wishes all the time, on average he might create better outcomes for
everyone, as long as we do as we are told. Sure, this sounds dystopian, but let us
nevertheless pursue the concept for a while to see whether it is feasible in
principle. If we obediently followed the dictates of the wise king, could this improve
the state of the world and turn it into a perfectly working clockwork?
Why top-down control is destined to fail
In short, it would
not work. This kind of top-down management, even if guided by comprehensive
information, is destined to fail. This book is, therefore, concerned with elaborating alternative and better ways of using data, which are compatible with constitutional rights and cultural values such as privacy. But let us first examine the reasons why a well-working Crystal Ball and Magic Wand cannot exist.
One of the problems is
statistical in nature. To distinguish “good” from “bad” behavior, we need
criteria that clearly separate the two. In general, however, reliable criteria
of this sort don’t exist. We face the problems of false positive classifications
(false alarms, so-called type I errors) and false negatives (type II errors,
where the alarm is not triggered when it should be).
For example, imagine a population of
500 million people, among which there are 500 terrorists. Let’s assume that we can
identify terrorists with an extremely impressive 99 percent accuracy. Then
there are 1 percent false negatives (type II error), which means that 5 terrorists
are not detected, while 495 will be discovered. It has been revealed that about
50 terror acts were prevented over the past 12 years or so, while a few, such
as the one during the Boston marathon, were not prevented even though the
terrorists were listed in some databases of suspects (in other words, they
turned out to be false negatives).
How many false positives (false alarms)
would the above numbers create? If the type I error is just 1 out of 10,000,
there will be 50,000 wrong suspects, while if it is 1 in one thousand then
there will be 500,000 wrong suspects. If it is 1 percent (which is entirely
plausible), there will be 5 million false suspects! It has been reported that
there are indeed between 1 and 8 million people on lists of suspects in the
USA. If these figures are correct, this would mean that for every genuine terrorist,
up to 10,000 innocent citizens would be wrongly categorized as potential
terrorists. Since the 9/11 attacks, about 40,000 suspects have had to
undergo special questioning and screening procedures at international airports,
even though in 99 percent of these cases it was concluded that the suspects were
innocent. And yet the effort needed to reach even this level of accuracy is considerable
and costly: according to media reports, it involved around a million people who
had a National Security Agency (NSA) clearance on the level of Edward Snowden.
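The arithmetic behind these numbers is easy to reproduce. Here is a minimal sketch in Python, using the population size, number of terrorists, and error rates from the example above (the variable names are of course mine):

```python
# Base-rate arithmetic for mass screening, using the figures from the example above.
population = 500_000_000   # total population
terrorists = 500           # actual terrorists among them
detection_rate = 0.99      # 99% of terrorists are flagged (type II error = 1%)

innocents = population - terrorists
missed = terrorists * (1 - detection_rate)          # false negatives
caught = terrorists * detection_rate

print(f"Missed terrorists (false negatives): {missed:.0f}, detected: {caught:.0f}")

# False positives for different type I error rates (share of innocents wrongly flagged).
for false_positive_rate in (1/10_000, 1/1_000, 1/100):
    false_suspects = innocents * false_positive_rate
    ratio = false_suspects / caught                  # wrongly accused per detected terrorist
    print(f"Type I error {false_positive_rate:.4%}: "
          f"{false_suspects:,.0f} false suspects "
          f"(about {ratio:,.0f} per detected terrorist)")
```

Even with an optimistically small type I error, the false suspects vastly outnumber the real terrorists – the essence of the base-rate problem.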
So, large-scale surveillance is not an
effective means of fighting terrorism. This conclusion has, in fact, been
reached by several independent empirical studies. Applying surveillance to the
whole population is not sensible, for the same reasons why it is not generally
useful to apply prevention-oriented medical tests or medical treatments to the
entire population: since such mass screenings imply large numbers of false
positives, millions of people might be wrongly treated, often with negative
side effects on their health. Thus, for most diseases, patients should be
tested only if they show worrying symptoms.
Besides these errors of the first and second kind, one may face errors of the third kind, namely the use of inappropriate models for separating "good" from "bad" cases. For example, unsuitable risk models have been identified as one reason for the recent financial and economic crisis. The risks of many financial products turned out to be wrongly rated, creating immense losses. Adair Turner, head of the UK Financial Services Authority, has said that there is
“a strong belief ... that bad or rather over-simplistic and overconfident economics helped create the crisis. There was a dominant conventional wisdom that markets were always rational and self-equilibrating, that market completion by itself could ensure economic efficiency and stability, and that financial innovation and increased trading activity were therefore axiomatically beneficial.”
Limitations of the Crystal Ball
One might think that errors of first, second, and third kind could be overcome if only we had enough data. But is this true? There are a number of fundamental scientific factors that will impair the Crystal Ball’s functioning (see Information Box 1). The problem known as "Laplace's Demon" reflects on the history-dependence of future developments, and our inability to ever measure all the past information needed to predict the future, even if the world changed according to deterministic rules (that is, if there were no randomness). This is why we are still influenced by cultural inventions, ideas, and social norms that are hundreds or thousands of years old.
Furthermore, turbulence and chaos are well-known properties of many complex dynamical systems. These factors imply that even the slightest change in the system at a certain point in time may fundamentally change the outcome over a sufficiently long period of time. The phenomenon, also named the "butterfly effect," is well-known to impose limits on the time horizon of weather forecasts.[3] In social systems as in the weather system, this extreme sensitivity to small but unpredictable disturbances arises from the complexity of the system: the existence of many interdependencies between the component parts.
Furthermore, we can determine the parameters of our models only with a finite accuracy. However, even small changes in these parameters may fundamentally change the outcome of the model. There is also a problem of ambiguity: the same information may have several different meanings depending on the respective context, and the particular interpretation we choose may influence the future course of the system. Beyond this, we also know that certain statements are fundamentally "undecidable," in the sense that there are questions that cannot be answered with formal logic. Lastly, too much information may reduce the quality of predictions because of over-fitting, spurious correlations, and herding effects. Information Box 1 at the end of this chapter elaborates these points in more detail. So one can say that Big Data is not the universal tool that it is often claimed to be.[4] Any attempt to predict the future will be limited to probabilistic and mostly short-term forecasts. It is therefore dangerous to suggest that a Crystal Ball could be built that would reliably predict the future.
Limitations of the Magic Wand
If the Crystal Ball is cloudy, it doesn't augur well for the Magic Wand that would depend on it. In fact, top-down control is still very ineffective, as the abundance of problems in our world shows. To control complex systems, i.e. to force them to behave in certain ways, we usually do not understand them well enough, and we lack effective means of intervention. Therefore, in many cases attempting to control a complex dynamical system in a top-down way undermines its functionality. The result is often a broken system, for example an accident or a crisis.
One example of the limits of top-down control: even the most sophisticated technological control mechanisms improved airplane flight safety less than introducing a non-hierarchical culture of collaboration in the cockpit, in which co-pilots are encouraged to question the decisions and actions of the pilot. In another example, the official report on the Fukushima nuclear disaster in Japan stresses that it was not primarily the earthquake and tsunami that were responsible for the nuclear meltdowns, but
“our reflexive obedience; our reluctance to question authority; our devotion to ‘sticking with the program’; our groupism.”
In other words, the problem was too much top-down control. Attempts to control complex systems in a top-down way are also very expensive, and we find it increasingly hard to pay for them: most industrialized countries already have debt levels of at least 100 or 200 percent of their gross domestic product. But do we have any alternatives? In fact, the next chapters of this book will elaborate one.
Complexity is the greatest challenge, but also the greatest opportunity
There are further reasons why the concept of a "super-government", "wise king" or "benevolent dictator" can't really work. These are related to the complexity of socio-economic systems. There are at least four kinds of complexity that matter: dynamic complexity, structural complexity, functional complexity and algorithmic complexity. The problem of complex dynamics has been addressed in the previous chapter. Here, I will focus on the implications of structural, functional and algorithmic complexity. In fact, with a centralized super-computing approach we can only solve those optimization problems that have sufficiently low algorithmic complexity. However, many problems are "NP-hard," i.e. so computationally demanding that they cannot be handled in real time even by supercomputers. This problem is particularly acute in systems that are characterized by a large variability. In such cases, top-down control cannot reach optimal results. In the next chapter, I will illustrate this with the example of traffic light control.
Given the quick increase in computing power, couldn't we overcome this challenge in the future? The surprising answer is "no." While processing power doubles every 18 months (blue curve in the illustration above), the amount of data doubles every year (green curve above). This implies that we are heading from a situation in which we did not have enough data to take good decisions to a situation where we can take evidence-based decisions. However, despite the rising processing power, we will be able to process an ever smaller share of all the data existing in the world, and this gap will widen quickly. So we are moving to a situation where we can shed light only on some things, as with a spotlight, while many things remain unseen in the dark. This creates a new kind of problem: paying too much attention to some problems while neglecting others. In fact, governments didn't see the financial crisis coming, they didn't see the Arab Spring coming, they didn't see the crisis in Ukraine coming, and they didn't see the Islamic State (IS) fighters in Iraq coming. Thus, keeping a well-balanced overview of everything will become progressively more difficult. Instead, politics will be increasingly driven by problems that suddenly happen to gain public attention, i.e. it will be made in a reactive rather than anticipatory way.
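To see how quickly this gap widens, here is a minimal sketch based on the two doubling times quoted above (processing power every 18 months, data volume every 12 months); the starting point, where the two are assumed to be in balance, is an arbitrary normalization:

```python
# Sketch: how the share of data we can process shrinks if processing power
# doubles every 18 months while the data volume doubles every 12 months
# (the doubling times quoted in the text; the starting ratio of 1.0 is arbitrary).
def processable_fraction(years: float) -> float:
    processing_growth = 2 ** (years / 1.5)   # doubling every 18 months
    data_growth = 2 ** (years / 1.0)          # doubling every 12 months
    return processing_growth / data_growth

for years in (0, 5, 10, 20):
    print(f"after {years:2d} years: processable share falls to "
          f"{processable_fraction(years):.1%} of today's level")
```

Under these assumptions, after ten years we could handle only about a tenth of the data relative to today, and after twenty years only about one percent.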
But let’s now have a look at the
question of how the world is expected to change depending on its complexity.
The possibility of networking the components of our world creates ever more options. We have, in fact, a combinatorial number of possibilities to produce new systems and functionalities. If we have two kinds of objects, we can combine them to produce a third one. Three kinds of objects allow us to create six (3! = 6), and those six already 720 (6!). This is mathematically reflected by a
factorial function, which grows much faster than exponentially (see the red
curve above). For example, we will soon have more devices communicating with
the Internet than people. In about 10 years from now, 150 billion (!) things
will be part of the Internet, forming the "Internet of Things." Thus,
even if we realize just every thousandth or millionth of all combinatorial
possibilities, the factorial curve will eventually overtake the exponential
curves representing data volumes and computational power. It has probably
overtaken both curves already some time ago.
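A small numerical sketch makes this comparison concrete. It contrasts factorial growth with the exponential growth of data and processing power discussed above; the time scale of one additional "kind of thing" per year is purely an illustrative assumption:

```python
import math

# Sketch: combinatorial (factorial) growth of possible combinations versus the
# exponential growth of data volume and processing power mentioned in the text.
# The time scale (one new "kind of thing" per year) is an illustrative assumption.
for year in range(1, 21):
    combinations = math.factorial(year)      # possible combinations of `year` kinds of things
    data_volume = 2 ** year                  # data doubling every year (relative units)
    processing = 2 ** (year / 1.5)           # processing power doubling every 18 months
    print(f"year {year:2d}: combinations {combinations:.2e}, "
          f"data {data_volume:.2e}, processing {processing:.2e}")
```

The factorial column overtakes both exponential columns after only a few steps, which is why even realizing a tiny fraction of all combinatorial possibilities is enough for complexity to outgrow computation.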
In other words, attempts to
optimize systems in a top-down way will become less and less effective – and
cannot be done in real time. Paradoxically, as economic diversification and
cultural evolution progress, a "big government", "super-government"
or "benevolent dictator" would increasingly struggle to take good
decisions, as it becomes more difficult to satisfy the diverse local
expectations and demands. This means that centralized governance and top-down
control are destined to fail. Given the situation in Afghanistan and Iraq,
Syria, Ukraine, and the states experiencing the Arab Spring, given the
financial, economic and public debt crisis, and given the quick spreading of
the Ebola disease in Africa, have we perhaps lost control already? Are we
fighting a hopeless battle against complexity?
Simplifying our world by
homogenization and standardization would not fix the problem, as I will
elaborate in the chapter on the Innovation Accelerator. It would undermine
cultural evolution and innovation, thereby causing a failure to adjust to our
ever-changing world. Thus, do we have alternatives? Actually, yes: rather than
fighting the properties of complex systems, we can use them to our advantage, if we learn
to understand their nature. The fact that the complexity of our world has
surpassed our capacity to grasp it, even with all the computers and information
systems assisting us, does not mean that our world must end in chaos. While our
current system is based on administration, planning, and optimization, our
future world will be built on evolutionary principles and collective
intelligence, i.e. intelligence surpassing that of the brightest people and best
expert systems.
How to get there?
In the next
chapters, I will show how the choice of suitable local interaction mechanisms
can, in fact, create desirable outcomes. Information and communication systems
will enable us to let things happen in a favorable way. This is the path we
should take, because we don't have better alternatives. The proposed approach
will create more efficient socio-economic institutions and new opportunities
for everyone: politics, business, science, and citizens alike. As a positive
side effect, our society will become more resilient to the future challenges
and shocks that we will surely face.
Conclusions
"Big Data"
has great potential, in particular for better, evidence-based decision-making. But
it is not a universal solution, as is often suggested. In particular,
data-driven approaches are notoriously bad at predicting systemic shifts, where
the entire way of organizing or doing things changes. Moreover, like any
technology, Big Data can be seriously misused, posing a "dual use
problem" (see the Information Box 2 below). Without suitable precautions – for example, the use of
"data safes," decentralization, encryption, the logging of
large-scale data-mining activities, the limitation of
large processing volumes to qualified and responsible users, the accountability
of Big Data users for damage created by them, and large fines in cases of
damage, misuse, or discrimination – mining Big Data may create massive problems
(intentionally or not). It is, therefore, crucial to design socio-technical
systems in ways that promote their ethical use.
INFORMATION BOX 1: Limitations to Building a Crystal Ball
Sensitivity - When all the data in the world can't help
How close can computer-modeled behavior ever come to real human social behavior? To specify the parameters and starting conditions of a computer model, one varies them in calibration procedures until the difference between measurement data and model predictions becomes as small as possible. However, the best-fitting model parameters are usually not the correct parameters; the true values typically lie somewhere within a certain "confidence interval." But if the parameters are picked at random from this confidence interval, the model predictions may vary a lot. This problem is known as sensitivity.
Turbulence and chaos
Two further problems of somewhat similar nature are "chaos" and "turbulence." Rapid flows of gases or liquids produce swirly patterns – the characteristic forms of turbulence. In chaotically behaving systems, too, the motion becomes unpredictable after a certain time period. Even though the way a "deterministically chaotic" system evolves can be precisely stated in mathematical terms, without random elements, the slightest change in the starting conditions can eventually cause a completely different global state of the system. In such a case, no matter how accurately we measure the initial conditions of the system, we will effectively not be able to predict the later behavior.
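A minimal numerical illustration of this sensitivity uses the logistic map, a standard textbook example of deterministic chaos (the parameter value and initial conditions below are illustrative):

```python
# Two trajectories of the logistic map x -> r*x*(1-x), a standard example of
# deterministic chaos, started from almost identical initial conditions.
r = 4.0                       # parameter value in the chaotic regime
x, y = 0.400000, 0.400001     # initial conditions differing by one part in a million

for step in range(1, 31):
    x = r * x * (1 - x)
    y = r * y * (1 - y)
    if step % 5 == 0:
        print(f"step {step:2d}: x = {x:.6f}, y = {y:.6f}, difference = {abs(x - y):.6f}")
```

After a few dozen steps the two trajectories bear no resemblance to each other, even though the rule is fully deterministic and the initial difference was only one part in a million.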
Ambiguity
Information can have different meanings. In many cases, the correct interpretation can be found only with some additional piece of information: the context. This contextualization is often difficult, and the context is not always available when needed. Different pieces of information can also be inconsistent, without any means of resolving the conflict. A typical problem in "data mining" challenges is that data might be plentiful, but inconsistent, incomplete, and not even representative. Moreover, a lot of it might be wrong, because of measurement errors, misinterpretations, the application of wrong procedures, or manipulation.
Laplace's Demon and measurement problems
Laplace's Demon is a hypothetical being who could calculate all future states of the world, if he knew the exact positions and speeds of all particles and the physical laws governing their motion and interactions. Laplace's Demon cannot exist in reality, not least because of a fundamental limitation: the measurements needed to determine all particle speeds would be impossible, since special relativity restricts all velocities to less than the speed of light. This would prevent one from gathering all the necessary data.
Information overload
Having a lot of data does not necessarily mean that we'll see the world more accurately. A typical problem is that of "over-fitting," where a model characterized by many parameters is fitted to the fine details of a data set in ways that are actually not meaningful. In such a case, a model with fewer parameters might provide better predictions. Spurious correlations are a somewhat similar problem: we tend to see patterns where they actually don't exist (see http://www.tylervigen.com/ for some examples).
Note that we are currently moving from a situation where we had too little data about the world to a situation where we have too much. It's like moving from darkness, where we can't see enough, to a world flooded with light, in which we are blinded. We will need "digital sunglasses": information filters that extract the relevant information for us. But as the gap between the data that exists and the data we can analyze increases, it might become harder to pay attention to those things that really matter. Although computer processing power doubles every 18 months, we will be able to process an ever decreasing fraction of all the data we possess, because data storage capacity doubles every year. In other words, there will be increasing volumes of data that will never be touched.
Herding
When people feel insecure, they tend to follow the decisions and actions of others. This produces undesirable herding effects. The economics Nobel laureates George Akerlof (*1940) and Robert Shiller (*1946) have called this behaviour "animal spirits," but in fact the idea of herding in economics goes back at least to the French mathematician Louis Bachelier (1870-1946). Bubbles and crashes in stock markets are examples of where herding can lead.
Randomness and innovation
Randomness is a ubiquitous feature of socio-economic systems. However, even though we would often like to reduce the risks it generates, we would be unwise to try to eliminate randomness completely. It is an important driver of creativity and innovation; predictability excludes positive surprises and cultural evolution. We will see later that some important and useful social mechanisms can only evolve in the presence of randomness. Although newly emerging behaviors are often costly in the beginning, when they are in a minority position, the random coincidence or accumulation of such behaviors in the same neighborhood can be very beneficial, and such behavior may then eventually succeed and spread.
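The over-fitting problem mentioned above can be made concrete with a small numerical experiment: a sketch using numpy, with synthetic data drawn from a simple straight-line relationship plus noise.

```python
import numpy as np

# Sketch: fit a simple and a very flexible polynomial to noisy data generated
# from a straight line, and compare how well each predicts unseen data.
rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 15)
y_train = 2 * x_train + 1 + rng.normal(0, 0.2, x_train.size)   # true relation: y = 2x + 1 plus noise
x_test = np.linspace(0, 1, 100)
y_test = 2 * x_test + 1 + rng.normal(0, 0.2, x_test.size)

for degree in (1, 12):                       # few vs. many parameters
    coeffs = np.polyfit(x_train, y_train, degree)
    train_error = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_error = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree:2d}: training error {train_error:.4f}, test error {test_error:.4f}")
```

Typically, the flexible model matches the training points more closely but predicts the unseen points worse – exactly the over-fitting effect described above.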
INFORMATION BOX 2: Side effects of massive data collection
Like any technology, Big Data has not only great potential but also harmful side effects. Not all Big Data applications come with these problems, but they are not uncommon. What we need to identify are those problems that can lead to major crises rather than just localized, small-scale defects.
Crime
In recent years, cybercrime has increased exponentially, costing Europe alone around 750 billion euros per year. Some of this has resulted from the undermining of security standards (for example those of financial transactions) for the purpose of surveillance. Other common problems are data theft or identity theft, data manipulation, and the fabrication of false evidence. These crimes are often committed by means of “Trojan horses”, computer codes that can steal passwords and PIN codes. Further problems are caused by computer viruses or worms that damage software or data.
Military risks
Because most of our critical infrastructures are now connected with other systems via information and communications networks, they have become pretty vulnerable to cyber attacks. In principle, malicious intruders can manipulate the production of chemicals, energy (including nuclear power stations), and communication and financial networks. Attacks are sometimes possible even if the computers controlling such critical infrastructures are not connected to the Internet. Given our dependence on electricity, information and money flows as well as other goods and services, this makes our societies vulnerable as never before. Coordinated cyber-attacks could be launched within microseconds and bring the functioning of our economy and societies to a halt.
The US government apparently reserves the right to respond to cyberwar with a nuclear counter-strike. We are now seeing a digital arms race for the most powerful information-based surveillance and manipulation technologies. It is doubtful whether governments will be able to prevent serious misuse of such powerful tools. Just imagine that a Crystal Ball, a Magic Wand, or other powerful digital tools existed. Then, of course, everyone would want to use them, including our enemies and criminals. It is obvious that, sooner or later, these powerful tools would get into the wrong hands and finally out of control. If we don't take suitable precautions, mining massive data may (intentionally or not) create problems of any scale – including digital weapons of mass destruction. Therefore, international efforts towards confidence-building and digital disarmament are crucial and urgent.
Economic risks
Cybercrime poses obvious risks to the economy, as do illicit access to sensitive business secrets and theft of intellectual property. Loss of customer trust in products can cause sales losses of the order of billions of dollars for some companies. Systems that would not work effectively without a sufficient level of trust include electronic banking, sensitive communication by email, eBusiness, eVoting, and social networking. Yet more than two thirds of all Germans say they no longer trust government authorities and Big Data companies not to misuse their personal data. More than 50 percent even feel threatened by the Internet. The success of the digital economy is further threatened by information pollution, for example spam and undesired ads.
Social and societal risks
To contain "societal diseases" such as terrorism and organized crime, it often seems that surveillance is needed. However, the effectiveness of mass surveillance in improving the level of security is questionable and hotly debated: the evidence is missing or weak. At the same time, mass surveillance erodes the trustful relationship between citizens and the state. The perceived loss of privacy is also likely to promote conformism and to endanger diversity and useful criticism. Independent judgments and decision-making could be undermined. Excessive state control of the behavior of citizens would, therefore, impair our society's ability to innovate and adapt.
For such reasons, the constitutions of many countries consider it of fundamental importance to protect privacy, informational self-determination, private communication, and the principle of assumed innocence without proof of guilt. These things are also considered to be essential for human dignity, and elementary preconditions for democracies to function well.
However, today the Internet lacks good mechanisms for forgetting, forgiveness, and re-integration. There are also concerns that the increasing use of Big Data could lead to greater discrimination, which in turn could promote increasing fragmentation of our society into subcultures. For example, it is believed that the spread of social media has promoted the polarization of US society.
Political risks
It is often pointed out that leaking confidential political communication can undermine the success of sensitive negotiations. Moreover, if incumbent governments have better access to Big Data applications than parties in opposition, this could result in unfair competition and non-representative election outcomes. Last but not least, in the hands of extremist political groups or criminals, Big Data could become a dangerous tool for acquiring and exerting power.
[1] Dear
Reader,
thank you for your interest in this chapter, which is intended to stimulate debate.
What you are seeing
here is work in progress, a chapter of a book on the emerging Digital Society
I am currently
writing. My plan was to elaborate and polish this further before sharing it with anybody else. However, I often feel that it is more important to share my thoughts with the public now than to try to perfect the book first while keeping my analysis and insights to myself in times that call for new ideas.
So, please forgive me if this does not look 100% ready. Updates will follow. Your critical thoughts and constructive feedback are very welcome. You can reach me via dhelbing(AT)ethz.ch or @dirkhelbing on Twitter.
I hope these
materials can serve as a stepping stone towards mastering the challenges ahead
of us and towards developing an open and participatory information
infrastructure for the Digital Society of the 21st century that would enable
everyone to take better informed decisions and more effective actions.
I believe that our
society is heading towards a tipping point, and that this creates the
opportunity for a better future.
But it will take many
of us to work it out. Let’s do this together!
Thank you very much,
I wish you an enjoyable reading,
Dirk Helbing
PS: Special thanks go to the FuturICT community and to
Philip Ball.
[2] See: http://www.businessinsider.com/cia-presentation-on-big-data-2013-3?op=1 and http://gigaom.com/2013/03/20/even-the-cia-is-struggling-to-deal-with-the-volume-of-real-time-social-data/2/. For similar recent FBI priorities see http://www.slate.com/blogs/future_tense/2013/03/26/andrew_weissmann_fbi_wants_real_time_gmail_dropbox_spying_power.html
[3] These
prediction limits are not just a matter of getting enough measurement data and
having a sufficiently powerful computer – one cannot get beyond a certain
precision because of the physical nature of the underlying process.
[4] To
convince me of the opposite, in analogy to the "Turing test" checking
whether a computer can communicate indistinguishably from a human, one would
have to demonstrate that a computer system passes the "Helbing test,"
i.e. finds all fundamental laws of physics discovered by scientists so far,
just by mining the experimental data accumulated over time.