by Dirk Helbing (ETH Zurich, dhelbing@ethz.ch)
(An almost identical version was forwarded to some Members of the European Parliament on April 7, 2013.)
Some serious, fundamental problems to be solved
The first problem is that combining two or more anonymous data sets may allow deanonymization, i.e. the identification of the individuals whose data have been recorded. Mobility data, in particular, can be easily deanonymized.
A second fundamental problem is that the large majority of people in developed countries, including the countries of the European Union, must be assumed to have already been profiled in detail, given that individual devices can be identified with high accuracy through their individual configurations (including the software used and its settings). There are currently about 700 million commercial data sets about users, specifying an estimated 1,500 variables per user.
A third problem is that both the CIA and the FBI have revealed that, besides publicly or semi-publicly available data on the Web or in social media, they are or will be storing and processing private data, including Gmail and Dropbox data. The same applies to many secret services around the world. It has also become public that the NSA seems to collect all the data it can get hold of.
A fourth fundamental problem is that Europe currently does not have the
technical means, algorithms, software, data and laws to counter foreign
dominance regarding Big Data and its potential misuse.
The age of information will only be sustainable if people can trust
that their data are being used in their interest. The spirit and goal of data
regulations should be to ensure this.
Personal data are data characterizing individuals or data derived from them. People should be the primary owners of their personal data. Individuals, companies, or government agencies that gather, produce, process, store, or buy such data should be considered secondary owners. Whenever personal data concern European citizens, or are being stored, processed, or used in a European country or by a company operating in a European country, European law should apply.
Individuals should be allowed to use their own personal data in any way compatible with fundamental rights, including sharing them with others, for free or at least for a small monthly fee covering the use of ALL their personal data (like the radio and TV fee). [Note: This is important to unleash the power of personal data for the benefit of society and to close the data gap that Europe has.]
Individuals should have a right to access a full copy of all their
personal data through a central service and be suitably protected from misuse
of these data.
They should have a right to limit the use of their personal data at any time and to request their correction or deletion in a simple and timely way, free of charge.
Fines should apply to any person, company, or institution gaining financial or other advantages through the misuse of personal data.
Misuse includes, in particular, sensitive uses that carry a certain probability of violating human rights or justified personal interests. Therefore, the error rate of the processing (and, in particular, the classification) of personal data must be recorded, specifying what share of users (in per mille) feel disadvantaged.
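To make this concrete, here is a minimal Python sketch of the kind of bookkeeping this would require; the function name and the example numbers are purely illustrative assumptions:

```python
# A minimal sketch, with purely illustrative numbers, of the bookkeeping
# suggested above: how many users a processing step touched, and how many
# reported feeling disadvantaged, expressed in per mille.
def disadvantage_rate_permille(complaints: int, users_processed: int) -> float:
    """Complaints per thousand users affected by a data-processing step."""
    return 1000 * complaints / users_processed

# Example: 120 complaints among 400,000 classified users -> 0.3 per mille.
print(disadvantage_rate_permille(120, 400_000))
```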
A central institution (which might be an open Web platform) is needed to collect user complaints. Sufficient transparency and decentralized institutions are required to take effective, timely, and affordable action to protect the interests of users.
Exercising user rights must be easy, quick, and cheap (essentially free). For example, users must not be flooded with requests regarding their personal data. They must be able to effectively ensure a self-determined use of their personal data with little individual effort.
To limit misuse, transparency is crucial. For example, large-scale processing of personal data (or at least the queries that were executed) should have to be made public in machine-readable form, so that public institutions and NGOs can determine how dangerous such queries might be for individuals.
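To illustrate, here is a sketch in Python of what a single machine-readable record of a large-scale query might look like; all field names and values are illustrative assumptions, not part of any existing standard:

```python
# Hypothetical sketch of one machine-readable log entry that a data
# processor could be required to publish for each large-scale query over
# personal data. All field names and values are illustrative assumptions.
import json
from datetime import datetime, timezone

entry = {
    "processor": "ExampleCorp Analytics",            # who ran the query
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "query": "SELECT age_band, region, COUNT(*) FROM users "
             "GROUP BY age_band, region",            # the executed query
    "records_touched": 1_250_000,                    # scale of the processing
    "purpose": "aggregate market statistics",        # declared purpose
    "smallest_group_size": 2000,                     # cf. the anonymity threshold below
}

# Publishing such entries as JSON would let public institutions and NGOs
# scan the logs automatically for queries that might endanger individuals.
print(json.dumps(entry, indent=2))
```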
As indicated above, there are practically no data that cannot be deanonymized when combined with other data. However, the following may serve as a practical definition of anonymity: anonymous data are data in which a person of interest can only be identified with a probability smaller than 1/2000, i.e. there is no way to find out which one among two thousand individuals has the property of interest.
Hence, the principle is to dilute persons with a certain property of interest among 2000 persons with significantly different properties, in order to make it unlikely that persons with the property of interest can be identified. This principle is guided by the way election data and other sensitive data are used by public authorities. It also makes sure that private companies do not have a data-processing advantage over public institutions (including research institutions).
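This dilution principle amounts to a k-anonymity test with k = 2000. The following Python sketch, with purely illustrative record and attribute names, shows such a group-size check before an aggregate release:

```python
# A minimal sketch of the dilution principle described above: before an
# aggregate release, verify that every combination of released attributes
# covers at least 2000 individuals (a k-anonymity test with k = 2000).
# The record layout and attribute names are illustrative assumptions.
from collections import Counter

K = 2000  # threshold from the anonymity definition above

def safe_to_release(records, quasi_identifiers):
    """True if every group defined by the quasi-identifiers
    (e.g. age band, region) contains at least K individuals."""
    group_sizes = Counter(
        tuple(record[attr] for attr in quasi_identifiers)
        for record in records
    )
    return all(size >= K for size in group_sizes.values())

# Toy example: a single group of 2500 identical records passes the check.
records = [{"age_band": "30-39", "region": "Zurich"}] * 2500
print(safe_to_release(records, ["age_band", "region"]))  # True
```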
I would propose to characterize pseudonymous data as data not suited to reveal or track the user, or properties correlated with the user that he or she has not explicitly chosen to reveal in the specific context. I would furthermore suggest characterizing pseudonymous transactions as those that process and store only the minimum amount of data required to perform a service requested by a user (which particularly implies not processing or storing technical details that would allow one to identify the device and software of the user). Essentially, pseudonymous transactions should not be suited to identify the user or variables that might identify him or her. A pseudonym is typically a random or user-specified variable that allows one to sell a product or perform a service for a user anonymously, for example in exchange for an anonymous money transfer.
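For illustration, a pseudonym of this kind can be generated as an unguessable random token. The following minimal Python sketch uses the standard secrets module; the token length is an assumption:

```python
# A minimal sketch of generating such a pseudonym as an unguessable random
# token, using Python's standard secrets module. The token length (16 bytes,
# i.e. 32 hex characters) is an assumption.
import secrets

def new_pseudonym() -> str:
    # The token carries no information about the user, device, or software;
    # the service keys the transaction to this value only.
    return secrets.token_hex(16)

print(new_pseudonym())  # e.g. '3f9c2e...' (different on every call)
```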
To allow users to check pseudonymity, the data processed and stored should be fully shared with the user via an encrypted webpage (or similar) that is accessible for a limited but sufficiently long time period through a unique and confidential decryption key made accessible only to the respective user. It should be possible for the user to easily decrypt, view, copy, download, and transfer the data processed and stored by the pseudonymous transaction in a way that is not being tracked.
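As a rough illustration of this mechanism, the following Python sketch uses symmetric encryption from the third-party cryptography package; how the key reaches the user and the record format are assumptions:

```python
# A rough sketch of this sharing mechanism, using symmetric encryption from
# the third-party 'cryptography' package (pip install cryptography). How the
# key reaches the user and the record format are assumptions.
from cryptography.fernet import Fernet

# One unique, confidential key per user and transaction; only the user
# receives it (assumed to happen over a separate, secure channel).
key = Fernet.generate_key()
cipher = Fernet(key)

transaction_record = b'{"service": "example", "data_stored": ["email"]}'
ciphertext = cipher.encrypt(transaction_record)  # what the webpage would serve

# Only the holder of the key can read the record:
print(Fernet(key).decrypt(ciphertext).decode())
```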
Difficulty of anonymizing data
- Researchers reverse Netflix anonymization, see www.securityfocus.com/news/11497
- Unique in the crowd: The privacy bounds of human mobility, see www.nature.com/srep/2013/130325/srep01376/full/srep01376.html
Dangers of a surveillance society
- Google as God? Opportunities and risks of the information age, see www.synthesisips.net/blog/google-as-god/
- Big data is opening doors, but maybe too many, see www.nytimes.com/2013/03/24/technology/big-data-and-a-renewed-debate-over-privacy.html?ref=stevelohr&_r=2&
- Future planet – future of surveillance, see www.international.to/index.php?option=com_content&view=category&id=94&layout=blog&Itemid=104
- CIA and FBI strategies to mine personal data, see www.businessinsider.com/cia-presentation-on-big-data-2013-3?op=1 and www.gigaom.com/2013/03/20/even-the-cia-is-struggling-to-deal-with-the-volume-of-real-time-social-data/2/ and http://www.slate.com/blogs/future_tense/2013/03/26/andrew_weissmann_fbi_wants_real_time_gmail_dropbox_spying_power.html
Regulation and further resources
- US Consumer Privacy Bill of Rights, see www.whitehouse.gov/sites/default/files/privacy-final.pdf
- Personal data: The emergence of a new asset class, see www.weforum.org/reports/personal-data-emergence-new-asset-class
- HP software allowing personalized advertisement without revealing personal data to companies, contact: Prof. Dr. Bernardo Huberman: huberman@hpl.hp.com
- FuturICT – The road towards ethical ICT, see http://link.springer.com/article/10.1140%2Fepjst%2Fe2012-01691-2#page-1
- From social data mining to forecasting socio-economic crises, see http://link.springer.com/article/10.1140%2Fepjst%2Fe2011-01401-8
- FuturICT Facebook page: www.facebook.com/FuturICT
- FuturICT twitter channel: https://twitter.com/FuturICT
Dirk Helbing is Professor of Sociology, in particular of Modeling and Simulation, and a member of the Computer Science Department at ETH Zurich. He is also an elected member of the German Academy of Sciences. He earned a PhD in physics and was Managing Director of the Institute of Transport & Economics at Dresden University of Technology in Germany. He is internationally known for his work on pedestrian crowds, vehicle traffic, and agent-based models of social systems. Furthermore, he coordinates the FuturICT Initiative (www.futurict.eu), which focuses on the understanding of techno-socio-economic systems using Big Data. His work is documented by hundreds of well-cited scientific articles, dozens of keynote talks, and hundreds of media reports in all major languages. Helbing is also chairman of the Physics of Socio-Economic Systems Division of the German Physical Society, co-founder of ETH Zurich's Risk Center, and an elected member of the World Economic Forum's Global Agenda Council on Complex Systems.