Predictive privacy: Collective data protection in the context of artificial intelligence and big data
Abstract
Big data and artificial intelligence pose a new challenge for data protection as these techniques allow predictions to be made about third parties based on the anonymous data of many people. Examples of predicted information include purchasing power, gender, age, health, sexual orientation, ethnicity, etc. The basis for such applications of “predictive analytics” is the comparison between behavioral data (e.g. usage, tracking, or activity data) of the individual in question and the potentially anonymously processed data of many others using machine learning models or simpler statistical methods. The article starts by noting that predictive analytics carries significant potential for abuse, which manifests in the form of social inequality, discrimination, and exclusion. This potential for abuse is not regulated by current data protection law in the EU; indeed, the use of anonymized mass data takes place in a largely unregulated space. Under the term “predictive privacy,” a data protection approach is presented that counters the risks of abuse of predictive analytics. A person's predictive privacy is violated when personal information about them is predicted without their knowledge and against their will based on the data of many other people. Predictive privacy is then formulated as a protected good and improvements to data protection with regard to the regulation of predictive analytics are proposed. Finally, the article points out that the goal of data protection in the context of predictive analytics is the regulation of “prediction power,” which is a new manifestation of informational power asymmetry between platform companies and society.
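To make the mechanism described in the abstract concrete, the following Python sketch uses synthetic data and a scikit-learn classifier (both assumptions for illustration, not taken from the article) to show how a model trained on the behavioral data of many people can predict a sensitive attribute for a third party who never disclosed it.

```python
# Illustrative sketch only: synthetic data, not the article's method.
# A model trained on many people's (behavioral data -> sensitive attribute)
# pairs can predict that attribute for a new person from behavior alone.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Behavioral data of many "training" users (e.g. usage/tracking features),
# paired with a sensitive attribute they happened to disclose.
n_users, n_features = 5_000, 20
X_train = rng.normal(size=(n_users, n_features))
true_weights = rng.normal(size=n_features)
y_sensitive = (X_train @ true_weights + rng.normal(size=n_users)) > 0

model = LogisticRegression(max_iter=1_000).fit(X_train, y_sensitive)

# A third party who never disclosed the attribute: only behavior is known.
x_target = rng.normal(size=(1, n_features))
predicted = model.predict(x_target)[0]
confidence = model.predict_proba(x_target)[0, int(predicted)]
print(f"Predicted sensitive attribute: {predicted} (p={confidence:.2f})")
```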
Related Scientific Articles
Nirmalya Sarkar, Kapil Tiwari, Jossy P George
17 November 2023
Digitalization across all spheres of life has given rise to issues like data ownership and privacy. Privacy-Preserving Machine Learning (PPML), an active area of research, aims to preserve privacy for machine learning (ML) stakeholders like data owners, ML model owners, and inference users. The paper proposes CoTraIn-VPD, a scheme for private ML inference and training of models on vertically partitioned datasets with Secure Multi-Party Computation (SMPC) and Differential Privacy (DP) techniques. The proposed approach addresses complications linked with the privacy of various ML stakeholders dealing with vertically partitioned datasets. This technique is implemented in Python using open-source libraries such as SyMPC (SMPC functions), PyDP (DP aggregations), and CrypTen (secure and private training). The paper uses information privacy measures, including mutual information and KL-Divergence, across different privacy budgets to empirically demonstrate privacy preservation with high ML accuracy and minimal performance cost.
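The paper's implementation relies on SyMPC, PyDP, and CrypTen; the sketch below is only a minimal stand-in that illustrates two of the ingredients named in the abstract — a vertically partitioned dataset (each party holds different columns for the same records) and a differentially private aggregate via the standard Laplace mechanism.

```python
# Minimal stand-in, not the CoTraIn-VPD implementation: illustrates a
# vertically partitioned dataset and a Laplace-mechanism DP aggregate.
import numpy as np

rng = np.random.default_rng(1)
n_records = 1_000

# Vertical partitioning: two parties hold different feature columns
# for the same set of records (joined on a common record id).
party_a = rng.uniform(0, 1, size=(n_records, 3))  # e.g. demographic features
party_b = rng.uniform(0, 1, size=(n_records, 5))  # e.g. transaction features

def laplace_mean(column, lower, upper, epsilon):
    """Differentially private mean of one bounded column (Laplace mechanism)."""
    clipped = np.clip(column, lower, upper)
    sensitivity = (upper - lower) / len(clipped)   # sensitivity of the mean
    noise = rng.laplace(scale=sensitivity / epsilon)
    return clipped.mean() + noise

# Each party releases only a noisy aggregate of its own columns.
for eps in (0.1, 1.0, 10.0):
    print(eps,
          laplace_mean(party_a[:, 0], 0.0, 1.0, eps),
          laplace_mean(party_b[:, 0], 0.0, 1.0, eps))
```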
Oscar Lage, Julen Bernabé-Rodríguez, Albert Garreta
16 March 2024
Big data has proven to be a very useful tool for companies and users, but companies with larger datasets have ended up being more competitive than the others thanks to machine learning or artificial intelligence. Secure multi-party computation (SMPC) allows smaller companies to jointly train arbitrary models on their private data while assuring privacy, and thus gives data owners the ability to perform what are currently known as federated learning algorithms. Besides, with a blockchain it is possible to coordinate and audit those computations in a decentralized way. In this document, we consider a private data marketplace as a space where researchers and data owners meet to agree on the use of private data for statistics or more complex model training. This document presents a candidate architecture for a private data marketplace by combining SMPC and a public, general-purpose blockchain. Such a marketplace is proposed as a smart contract deployed on the blockchain, while the privacy-preserving computation is carried out via SMPC.
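The architecture itself combines a smart contract with an SMPC network; the sketch below only illustrates the core SMPC primitive such a design relies on, additive secret sharing, in plain Python (the modulus and party count are illustrative assumptions, not the document's actual protocol).

```python
# Illustrative additive secret sharing, a basic SMPC building block;
# not the marketplace's actual protocol or smart-contract code.
import secrets

PRIME = 2**61 - 1  # arbitrary public modulus for the arithmetic shares

def share(value, n_parties):
    """Split an integer into n additive shares that sum to value mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

def reconstruct(shares):
    return sum(shares) % PRIME

# Two data owners secret-share their private values among three compute nodes.
alice_shares = share(42, 3)
bob_shares = share(58, 3)

# Each node adds the shares it holds; no single node ever sees 42 or 58.
sum_shares = [(a + b) % PRIME for a, b in zip(alice_shares, bob_shares)]
print(reconstruct(sum_shares))  # 100
```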
Ling Liu, Wenqi Wei
2 February 2024
Emerging Distributed AI systems are revolutionizing big data computing and data processing capabilities with growing economic and societal impact. However, recent studies have identified new attack surfaces and risks caused by security, privacy, and fairness issues in AI systems. In this paper, we review representative techniques, algorithms, and theoretical foundations for trustworthy distributed AI through robustness guarantee, privacy protection, and fairness awareness in distributed learning. We first provide a brief overview of alternative architectures for distributed learning, discuss inherent vulnerabilities for security, privacy, and fairness of AI algorithms in distributed learning, and analyze why these problems are present in distributed learning regardless of specific architectures. Then we provide a unique taxonomy of countermeasures for trustworthy distributed AI, covering (1) robustness to evasion attacks and irregular queries at inference, and robustness to poisoning attacks, Byzantine attacks, and irregular data distribution during training; (2) privacy protection during distributed learning and model inference at deployment; and (3) AI fairness and governance with respect to both data and models. We conclude with a discussion on open challenges and future research directions toward trustworthy distributed AI, such as the need for trustworthy AI policy guidelines, responsibility-utility co-design for AI, and incentives and compliance.
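As a hedged illustration of one countermeasure class in this taxonomy — robustness to poisoning and Byzantine clients during distributed training — the sketch below (synthetic updates, not the paper's algorithms) contrasts naive averaging of client updates with coordinate-wise median aggregation.

```python
# Sketch of one countermeasure class from the taxonomy: Byzantine-robust
# aggregation of client updates by coordinate-wise median instead of a mean.
import numpy as np

rng = np.random.default_rng(2)

# Ten honest clients send similar gradient updates; two Byzantine clients
# send arbitrarily poisoned ones.
honest = rng.normal(loc=1.0, scale=0.1, size=(10, 4))
byzantine = np.full((2, 4), 1_000.0)
updates = np.vstack([honest, byzantine])

naive_avg = updates.mean(axis=0)         # dragged far away by the attackers
robust_agg = np.median(updates, axis=0)  # stays close to the honest updates

print("mean aggregate:  ", naive_avg)
print("median aggregate:", robust_agg)
```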
S. Bouzefrane, S. Banerjee, Thinh Le Vinh + 1 more
13 September 2023
The trend of the next generation of the internet has already been scrutinized by top analytics enterprises. According to Gartner investigations, it is predicted that, by 2024, 75% of the global population will have their personal data covered under privacy regulations. This alarming statistic necessitates the orchestration of several security components to address the enormous challenges posed by federated and distributed learning environments. Federated learning (FL) is a promising technique that allows multiple parties to collaboratively train a model without sharing their data. However, even though FL is seen as a privacy-preserving distributed machine learning method, recent works have demonstrated that FL is vulnerable to some privacy attacks. Homomorphic encryption (HE) and differential privacy (DP) are two promising techniques that can be used to address these privacy concerns. HE allows secure computations on encrypted data, while DP provides strong privacy guarantees by adding noise to the data. This paper first presents known privacy attacks on federated learning and then provides an overview of HE and DP techniques for secure federated learning in next-generation internet applications. It discusses the strengths and weaknesses of these techniques in different settings as described in the literature, with a particular focus on the trade-off between privacy and convergence, as well as the computation overheads involved. The objective of this paper is to analyze the challenges associated with each technique and identify potential opportunities and solutions for designing a more robust, privacy-preserving federated learning framework.
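The following minimal sketch illustrates only the DP side of the trade-off the abstract describes — federated averaging with clipped, Gaussian-noised client updates. The clipping norm and noise multiplier are illustrative assumptions, and the HE component is omitted entirely; this is not the paper's framework.

```python
# Illustrative only: federated averaging with clipped, Gaussian-noised client
# updates, showing the privacy/accuracy trade-off on the DP side (HE omitted).
import numpy as np

rng = np.random.default_rng(3)
dim, n_clients = 8, 20
clip_norm, noise_multiplier = 1.0, 0.8

global_model = np.zeros(dim)
client_updates = rng.normal(scale=0.5, size=(n_clients, dim))

clipped = []
for u in client_updates:
    norm = np.linalg.norm(u)
    # Bound each client's influence on the aggregate.
    clipped.append(u * min(1.0, clip_norm / max(norm, 1e-12)))

# Server averages the clipped updates and adds calibrated Gaussian noise.
noise = rng.normal(scale=noise_multiplier * clip_norm / n_clients, size=dim)
global_model += np.mean(clipped, axis=0) + noise
print(global_model)
```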
T. Margoni, M. Kretschmer
14 July 2021
This paper focuses on the two exceptions for text and data mining (TDM) introduced in the Directive on Copyright in the Digital Single Market (CDSM). While both are mandatory for Member States, Art. 3 is also imperative and finds application in cases of text and data mining for the purpose of scientific research by research and cultural institutions; Art. 4, on the other hand, permits text and data mining by anyone but with rightholders able to ‘contract-out’. We trace the context of using the lever of copyright law to enable emerging technologies such as AI and to support innovation. Within the EU copyright intervention, elements that may underpin a transparent legal framework for AI are identified, such as the possibility of retention of permanent copies for further verification. On the other hand, we identify several pitfalls, including an excessively broad definition of TDM which makes the entire field of data-driven AI development dependent on an exception. We analyse the implications of limiting the scope of the exceptions to the right of reproduction; we argue that the limitation of Art. 3 to certain beneficiaries remains problematic; and that the requirement of lawful access is difficult to operationalize. In conclusion, we argue that there should be no need for a TDM exception for the act of extracting informational value from protected works. The EU’s CDSM provisions paradoxically may favour the development of biased AI systems due to price and accessibility conditions for training data that offer the wrong incentives. To avoid licensing, it may be economically attractive for EU-based developers to train their algorithms on older, less accurate, biased data, or import AI models already trained abroad on unverifiable data.