by Dirk Helbing
This chapter is a free translation
of an introductory article on "Big Data - Zauberstab und Rohstoff des 21. Jahrhunderts", originally published in Die Volkswirtschaft - Das Magazin für
Wirtschaftspolitik (5/2014).
Abstract
Information and communication
technology (ICT) is the economic sector that is developing most rapidly in the
USA and Asia and generates the greatest value added per employee. Big Data - the
algorithmic discovery of hidden treasures in large data sets - creates new
economic value. The development is increasingly understood as a new
technological revolution. Switzerland could establish itself as a data bank and
Open Data pioneer in Europe and become a leading location for
information technologies.
What is Big Data?
When the social media portal
WhatsApp, with its 450 million users, was recently sold to Facebook for $19 billion,
almost half a billion dollars was made per employee. "Big
Data" is changing our world. The term, coined more than 15 years ago,
refers to data sets so big that standard computational methods can no longer
cope with them. Big Data is increasingly referred to as the oil of the
21st century. To benefit from it, we must learn to "drill" and
"refine" data, i.e. to transform them into useful information and
knowledge. The global data volume doubles every 12 months. Therefore, in just
two years, we produce as much data as in the entire history of humankind.
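The arithmetic behind the doubling claim can be sketched in a few lines. The example assumes idealized exponential growth with a fixed 12-month doubling time; the function name and the default parameter are hypothetical and chosen only for illustration. Under this assumption, the data created in the most recent 12 months already equals everything that existed before, and over two years it is three times the earlier total, so the two-year statement above is the conservative reading.

```python
def new_to_old_ratio(years: float, doubling_time_years: float = 1.0) -> float:
    """Ratio of data created during the last `years` to all data that existed before.

    Assumes the total volume grows as V(t) = V0 * 2 ** (t / doubling_time_years).
    """
    growth_factor = 2.0 ** (years / doubling_time_years)
    # The new data is V(t) - V(t - years); dividing by V(t - years) gives growth - 1.
    return growth_factor - 1.0


one_year_ratio = new_to_old_ratio(1)
two_year_ratio = new_to_old_ratio(2)
print(one_year_ratio)  # 1.0: one year of new data equals all earlier data
print(two_year_ratio)  # 3.0: two years of new data is three times all earlier data
```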
Tremendous amounts of data have
been created by four technological innovations:
- the Internet, which enables global communication,
- the World Wide Web, a network of globally accessible websites that evolved after the invention of the Hypertext Transfer Protocol (HTTP) at CERN in Geneva,
- the emergence of social media such as Facebook, Google+, WhatsApp, or Twitter, which have created social communication networks, and
- the emergence of the "Internet of Things", which also allows sensors and machines to connect to the Internet. Soon there will be more machines than human users on the Internet.
Data sets bigger than the largest library
Meanwhile, the data sets
collected by companies such as eBay, Walmart or Facebook reach the size of
petabytes (1 million billion bytes) - one hundred times the information content
of the largest library in the world: the U.S. Library of Congress. The mining
of Big Data opens up entirely new possibilities for process optimization, identification
of interdependencies, and decision support. However, Big Data also comes with
new challenges, which are often characterized by four criteria:
- volume: the file sizes and number of records are huge,
- velocity: the data often has to be evaluated in real time,
- variety: the data is often very heterogeneous and unstructured,
- veracity: the data is often incomplete, unrepresentative, and error-prone.
Therefore, completely new algorithms,
i.e. new computational methods, had to be developed. Because it is inefficient
for Big Data processing to load all relevant data into a shared memory, the
processing must take place locally, where the data resides, on potentially thousands
of computers. This is accomplished with massively parallel computing approaches
such as MapReduce or Hadoop. Big Data algorithms detect
interesting interdependencies in the data ("correlations"), which may
be of commercial value, for example, between weather and consumption or between
health and credit risks. Today, even the prosecution of crime and terrorism is
based on the analysis of large amounts of behavioral data.
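The idea of processing data where it resides and then merging small summaries can be illustrated with a toy map-and-reduce computation of a correlation. This is a minimal single-machine sketch, not Hadoop code: the partitions, the weather and sales figures, and all function names are invented for illustration. Each partition stands for the records held by one computer; only six numbers per partition would need to travel over the network.

```python
from functools import reduce
from math import sqrt

# Hypothetical records: (temperature in degrees Celsius, units sold).
# Each inner list stands for the data stored on one machine.
partitions = [
    [(10.0, 120.0), (15.0, 150.0)],
    [(20.0, 210.0), (25.0, 240.0)],
    [(30.0, 310.0)],
]


def map_partition(records):
    """Map step: summarize one partition locally as a tuple of sums."""
    n = len(records)
    sx = sum(x for x, _ in records)
    sy = sum(y for _, y in records)
    sxx = sum(x * x for x, _ in records)
    syy = sum(y * y for _, y in records)
    sxy = sum(x * y for x, y in records)
    return (n, sx, sy, sxx, syy, sxy)


def combine(a, b):
    """Reduce step: merge two partial summaries element by element."""
    return tuple(u + v for u, v in zip(a, b))


def pearson(summary):
    """Pearson correlation coefficient computed from the merged sums."""
    n, sx, sy, sxx, syy, sxy = summary
    covariance_term = n * sxy - sx * sy
    variance_x_term = n * sxx - sx * sx
    variance_y_term = n * syy - sy * sy
    return covariance_term / sqrt(variance_x_term * variance_y_term)


summary = reduce(combine, map(map_partition, partitions))
correlation = pearson(summary)
print(round(correlation, 3))
```

The sum-based formula is chosen because the partial sums can be added in any order, which is what makes the computation easy to distribute. For very large or poorly scaled data it can lose numerical precision, so a production system would likely use a more stable merging scheme.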
What do applications look like?
Big Data applications are
spreading like wildfire. They facilitate personalized offers, services and
products. One of the greatest successes of Big Data is automatic speech
recognition and processing. Apple's Siri understands you when you ask for a
Spanish restaurant, and Google Maps can lead you there. Google Translate
interprets foreign languages by comparing them with a huge collection of
translated texts. IBM's Watson computer even understands human language. It can
not only beat experienced quiz show players, but even take care of customer
hotlines - often better than humans. IBM has recently decided to invest $1
billion to further develop and commercialize the system.
Of course, Big Data plays an
important role in the financial sector. Approximately seventy percent of all
financial market transactions are now made by automated trading algorithms. In
just one day, the entire money supply of the world is traded. Such quantities
of money also attract organized crime, so financial transactions are scanned by
Big Data algorithms for anomalies that point to suspicious activities. The
company BlackRock uses similar software, called "Aladdin", to
successfully speculate with funds amounting to multiple times the gross
domestic product (GDP) of Switzerland.
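What scanning transactions for abnormalities might look like in its simplest form can be sketched with a robust outlier test. The example below is an assumption for illustration only: it is not the method of any company named above, the transaction amounts are invented, and real systems use far richer features than the amount alone. It flags values whose modified z-score, based on the median and the median absolute deviation, exceeds a commonly used cutoff of 3.5.

```python
from statistics import median


def flag_anomalies(amounts, threshold=3.5):
    """Return the indices of amounts whose modified z-score exceeds `threshold`."""
    center = median(amounts)
    deviations = [abs(a - center) for a in amounts]
    mad = median(deviations)  # median absolute deviation
    if mad == 0:
        # All typical values are identical; this simple test cannot rank outliers.
        return []
    # 0.6745 rescales the MAD so that scores are comparable to standard z-scores.
    return [i for i, d in enumerate(deviations) if 0.6745 * d / mad > threshold]


# Hypothetical transaction amounts; the sixth one is unusually large.
amounts = [120.0, 95.5, 101.0, 130.25, 99.0, 25000.0, 110.0, 105.0]
suspicious = flag_anomalies(amounts)
print(suspicious)  # [5]
```

The median-based score is used instead of the mean and standard deviation because a single extreme transaction would otherwise inflate the very statistics used to detect it.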
Box 1:
To get an overview of the ICT trends, it is worthwhile to look at Google, which has over 50 software platforms. The company invests nearly $6 billion in research and development annually. Within just one year, Google has introduced self-driving cars, invested heavily in robotics, and started a Google Brain project to add intelligence to the Internet. Through the purchase of Nest Labs, Google has also invested $3.2 billion in the "Internet of Things". Furthermore, Google X has been reported to have around 100 secret projects in the pipeline.
The potential is great...
No country today can afford to
ignore the potential of Big Data. The additional economic potential of Open Data alone - i.e. of data sets that
are made available to everyone - is estimated by McKinsey to be between 3,000
and 5,000 billion dollars globally each year [2]. This can benefit almost all
sectors of society. For example, energy production and consumption can be
better matched with "smart metering",
and energy peaks can be avoided. More generally, new information and
communication technologies allow us to build "smart cities". Resources can be managed more efficiently and
the environment protected better. Risks can be better recognized and avoided,
thereby reducing unintended consequences of decisions and identifying
opportunities that would otherwise have been missed. Medicine can be better
adapted to the patients, and disease prevention may become more important than
curing diseases.
... but also the implicit risks
Like all technologies, Big Data
also carries risks. The security of digital communication has been undermined.
Cyber crime, including data, identity and financial theft, is quickly taking on
ever greater dimensions. Critical infrastructures such as energy, financial and
communication systems are threatened by cyber attacks. They could, in
principle, be made dysfunctional for an extended time period.
Moreover, while common Big Data
algorithms are used to reveal optimization potentials, their results may be
unreliable or may not reflect causal relationships. Therefore, a naive
application of Big Data algorithms can easily lead to wrong conclusions. The
error rate in classification problems (e.g. the distinction between
"good" and "bad" risks) is often significant. Issues such as
wrong decisions or discrimination must be seriously considered. Therefore, one
must find effective procedures for quality control. In this context,
universities will likely play an important role. One must also find effective
mechanisms to protect privacy and the right of informational
self-determination, for example, by applying the Personal Data Purse [1] concept.
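Why the error rate in classification problems matters can be seen from a small worked example. All numbers below are hypothetical: a population of one million cases, of which 0.1% are truly "bad" risks, and a classifier that is right 99% of the time for both "bad" and "good" cases. The function name is likewise invented for illustration.

```python
def classification_outcomes(population, base_rate, sensitivity, specificity):
    """Return (true positives, false positives, precision) for a binary classifier.

    base_rate:   fraction of the population that is truly "bad"
    sensitivity: fraction of "bad" cases correctly flagged
    specificity: fraction of "good" cases correctly left unflagged
    """
    bad_cases = population * base_rate
    good_cases = population - bad_cases
    true_positives = bad_cases * sensitivity
    false_positives = good_cases * (1.0 - specificity)
    precision = true_positives / (true_positives + false_positives)
    return true_positives, false_positives, precision


tp, fp, precision = classification_outcomes(
    population=1_000_000, base_rate=0.001, sensitivity=0.99, specificity=0.99
)
print(round(tp), round(fp), round(precision, 3))  # 990 9990 0.09
```

Under these assumptions, roughly ten "good" cases are wrongly flagged for every "bad" case that is correctly identified, which is why quality control and safeguards against wrong decisions and discrimination are needed even for seemingly accurate algorithms.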
The digital revolution creates an urgency to act
Information and communication
technologies are going to change most of our traditional institutions: our
educational system (personalized learning), science (Data Science), mobility
(self-driving cars), the transport of goods (drones), consumption (see Amazon
and eBay), production (3D printers), the health system (personalized medicine),
politics (more transparency), and the entire economy (with co-producing
consumers, so-called prosumers). Banks are losing more and more ground to algorithmic
trading and to alternative payment systems such as Bitcoin, PayPal and Google
Wallet. Moreover, a substantial part of the insurance business takes place in
financial products such as credit default swaps. For the economic and social
transformation into a "digital society", we may have just 20 years.
This is an extremely short time period, considering that the planning and
construction of a road often requires 30 years or more.
The foregoing implies an urgent
need for action on the technological, legal and socio-economic levels. Some
years ago, the United States started a Big Data research initiative amounting
to 200 million dollars, followed by further substantial investments. In Europe,
the FuturICT project (www.futurict.eu) has developed concepts for
the digital society within the context of the EU flagship competition. Other
countries have already started to implement this concept. For example, Japan has
recently launched a $100 million 10-year project at the Tokyo Institute of Technology. In addition, numerous other projects
exist, particularly in the military and security sector, often with budgets that are
multiples of those mentioned above.
Switzerland can become a European driver of innovation for the digital era
Switzerland is well positioned
to benefit from the digital age. However, it is insufficient to reinvent and
build upon already existing technologies in Switzerland. New technologies that
will shape the digital age must be invented. The World Wide Web was
invented in Switzerland, and the largest civilian Big Data expertise in the world exists
at CERN. However, the USA and Asian countries have so far taken the lead in commercializing
Big Data. With the NSA controversy, the ubiquity of wireless communication
sensors and the "Internet of Things", a new opportunity is emerging.
With targeted support of ICT
activities at its universities, Switzerland could take the lead in Europe's
research and development. Swiss academia has excelled with the scientific
coordination of three out of six finalists of the EU FET flagship competition.
At the moment, however, the focus is
only on the digital modeling of the human brain and on robotics. From
2017 onwards, though, the ETH domain plans to invest increasingly in the area of Data Science, the emerging research
field centered around the scientific analysis of data.
In view of the fast development
of the ICT area, the huge economic potential as well as the transformative
power of these technologies, prioritized, broad and substantial financial
support is a matter of Swiss national interest. With its basic democratic
values, legal framework and ICT focus, Switzerland is well prepared to become
Europe's innovation driver for the digital age.
Box 2:
How will the digital revolution change our economy and society? How can we use this as an opportunity for us and reduce the related risks? For illustration, it is helpful to recall the factors that enabled the success of the automobile age: the invention of cars and of systems of mass production; the construction of public roads, gas stations, and parking lots; the creation of driving schools and driver licenses; and last but not least, the establishment of traffic rules, traffic signs, speed controls, and traffic police.
What are the technological infrastructures and the legal, economic and societal institutions needed to make the digital age a big success? This question would set the agenda of the Innovation Alliance. A partial answer is already clear: we need trustworthy, transparent, open, and participatory ICT systems, which are compatible with our values. For example, it would make sense to establish the emergent "Internet of Things" as a Citizen Web. This would enable self-regulating systems through real-time measurements of the state of the world, which would be possible with a public information platform called the "Planetary Nervous System". It would also facilitate a real-time measurement and search engine: an open and participatory "Google 2.0."
To protect privacy, all data collected about individuals should be stored in a Personal Data Purse and, given informed consent, processed in a decentralized way by third-party Trustable Information Brokers, allowing everyone to control the use of their sensitive data. A Micro-Payment System would allow data providers, intellectual property right holders, and innovators to get rewards for their services. It would also encourage the exploration of new and timely intellectual property right paradigms ("Innovation Accelerator"). A pluralistic, User-centric Reputation System would promote responsible behavior in the virtual (and real) world. It would even enable the establishment of a new value exchange system called "Qualified Money," which would overcome weaknesses of the current financial system by providing additional adaptability.
A Global Participatory Platform would empower everyone to contribute data, computer algorithms and related ratings, and to benefit from the contributions of others (either free of charge or for a fee). It would also enable the generation of Social Capital such as trust and cooperativeness, using next-generation User-controlled Social Media. A Job and Project Platform would support crowdsourcing, collaboration, and socio-economic co-creation. Altogether, this would build a quickly growing Information and Innovation Ecosystem, unleashing the potential of data for everyone: business, politics, science, and citizens alike.
Further Reading
[1] Y.-A. de Montjoye, E. Shmueli, S. S. Wang, and A.
S. Pentland (2014) openPDS: Protecting the Privacy of Metadata through
SafeAnswers,
see also http://newsoffice.mit.edu/2014/own-your-own-data-0709
[2] McKinsey & Company
(2013) Open data: Unlocking innovation and performance with liquid information.