DOI: 10.1145/3510540

Published 14 June 2020 in ACM Transactions on Intelligent Systems and Technology

The OARF Benchmark Suite: Characterization and Implications for Federated Learning Systems

Q. Li Yuan Li Bingsheng He + 3 authors

Abstract

This article presents and characterizes an Open Application Repository for Federated Learning (OARF), a benchmark suite for federated machine learning systems. Previously available benchmarks for federated learning (FL) have focused mainly on synthetic datasets and used a limited number of applications. OARF mimics more realistic application scenarios with publicly available datasets as different data silos in image, text, and structured data. Our characterization shows that the benchmark suite is diverse in data size, distribution, feature distribution, and learning task complexity. The extensive evaluations with reference implementations show the future research opportunities for important aspects of FL systems. We have developed reference implementations and evaluated the important aspects of FL, including model accuracy, communication cost, throughput, and convergence time. Through these evaluations, we discovered some interesting findings, such as that FL can effectively increase end-to-end throughput. The code of OARF is publicly available on GitHub.

Related Scientific Articles

FedML: A Research Library and Benchmark for Federated Machine Learning

R. Raskar Praneeth Vepakomma Yan Kang + 13 others

27 July 2020

Federated learning is a rapidly growing research field in the machine learning domain. Although considerable research efforts have been made, existing libraries cannot adequately support diverse algorithmic development (e.g., diverse topology and flexible message exchange), and inconsistent dataset and model usage in experiments makes fair comparisons difficult. In this work, we introduce FedML, an open research library and benchmark that facilitates the development of new federated learning algorithms and fair performance comparisons. FedML supports three computing paradigms (distributed training, mobile on-device training, and standalone simulation) for users to conduct experiments in different system environments. FedML also promotes diverse algorithmic research with flexible and generic API design and reference baseline implementations. A curated and comprehensive benchmark dataset for the non-I.I.D. setting aims at making a fair comparison. We believe FedML can provide an efficient and reproducible means of developing and evaluating algorithms for the federated learning research community. We maintain the source code, documents, and user community at this https URL.

Federated Learning Systems: Architecture Alternatives

J. Bosch Hongyi Zhang H. H. Olsson

1 December 2020

Machine Learning (ML) and Artificial Intelligence (AI) have increasingly gained attention in research and industry. Federated Learning, as an approach to distributed learning, shows its potential with the increasing number of devices on the edge and the development of computing power. However, most current Federated Learning systems apply a single-server centralized architecture, which may cause several critical problems, such as a single point of failure as well as scaling and performance problems. In this paper, we propose and compare four architecture alternatives for a Federated Learning system, i.e., centralized, hierarchical, regional, and decentralized architectures. We conduct the study by using two well-known data sets and measuring several system performance metrics for all four alternatives. Our results suggest scenarios and use cases which are suitable for each alternative. In addition, we investigate the trade-off between communication latency, model evolution time, and model classification performance, which is crucial to applying the results to real-world industrial systems.

Federated Machine Learning: Survey, Multi-Level Classification, Desirable Criteria and Future Directions in Communication and Networking Systems

O. A. Wahab A. Mourad T. Taleb + 1 other

10 February 2021

The communication and networking field is hungry for machine learning decision-making solutions to replace the traditional model-driven approaches that have proved not rich enough to capture the ever-growing complexity and heterogeneity of the modern systems in the field. Traditional machine learning solutions assume the existence of (cloud-based) central entities that are in charge of processing the data. Nonetheless, the difficulty of accessing private data, together with the high cost of transmitting raw data to the central entity, gave rise to a decentralized machine learning approach called Federated Learning. The main idea of federated learning is to perform on-device collaborative training of a single machine learning model without having to share the raw training data with any third-party entity. Although a few survey articles on federated learning already exist in the literature, the motivation of this survey stems from three essential observations. The first one is the lack of a fine-grained multi-level classification of the federated learning literature, where the existing surveys base their classification on only one criterion or aspect. The second observation is that the existing surveys focus only on some common challenges, but disregard other essential aspects such as reliable client selection, resource management, and training service pricing. The third observation is the lack of explicit and straightforward directives for researchers to help them design future federated learning solutions that overcome the state-of-the-art research gaps. To address these points, we first provide a comprehensive tutorial on federated learning and its associated concepts, technologies, and learning approaches. We then survey and highlight the applications and future directions of federated learning in the domain of communication and networking. Thereafter, we design a three-level classification scheme that first categorizes the federated learning literature based on the high-level challenge that they tackle. Then, we classify each high-level challenge into a set of specific low-level challenges to foster a better understanding of the topic. Finally, we provide, within each low-level challenge, a fine-grained classification based on the technique used to address this particular challenge. For each category of high-level challenges, we provide a set of desirable criteria and future research directions that are aimed to help the research community design innovative and efficient future solutions. To the best of our knowledge, our survey is the most comprehensive in terms of challenges and techniques it covers and the most fine-grained in terms of the multi-level classification scheme it presents.

Training Heterogeneous Client Models using Knowledge Distillation in Serverless Federated Learning

Mohak Chadha Pulkit Khera Osama Abboud + 2 others

11 February 2024

Federated Learning (FL) is an emerging machine learning paradigm that enables the collaborative training of a shared global model across distributed clients while keeping the data decentralized. Recent works on designing systems for efficient FL have shown that utilizing serverless computing technologies, particularly Function-as-a-Service (FaaS) for FL, can enhance resource efficiency, reduce training costs, and alleviate the complex infrastructure management burden on data holders. However, existing serverless FL systems implicitly assume a uniform global model architecture across all participating clients during training. This assumption fails to address fundamental challenges in practical FL due to the resource and statistical data heterogeneity among FL clients. To address these challenges and enable heterogeneous client models in serverless FL, we utilize Knowledge Distillation (KD) in this paper. Towards this, we propose novel optimized serverless workflows for two popular conventional federated KD techniques, i.e., FedMD and FedDF. We implement these workflows by introducing several extensions to an open-source serverless FL system called FedLess. Moreover, we comprehensively evaluate the two strategies on multiple datasets across varying levels of client data heterogeneity using heterogeneous client models with respect to accuracy, fine-grained training times, and costs. Results from our experiments demonstrate that serverless FedDF is more robust to extreme non-IID data distributions, is faster, and leads to lower costs than serverless FedMD. In addition, compared to the original implementation, our optimizations for particular steps in FedMD and FedDF lead to an average speedup of 3.5x and 1.76x across all datasets.

FedCV: A Federated Learning Framework for Diverse Computer Vision Tasks

S. Avestimehr Zhenheng Tang Chaoyang He + 8 others

22 November 2021

Federated Learning (FL) is a distributed learning paradigm that can learn a global or personalized model from decentralized datasets on edge devices. However, in the computer vision domain, model performance in FL is far behind centralized training due to the lack of exploration in diverse tasks with a unified FL framework. FL has rarely been demonstrated effectively in advanced computer vision tasks such as object detection and image segmentation. To bridge the gap and facilitate the development of FL for computer vision tasks, in this work, we propose a federated learning library and benchmarking framework, named FedCV, to evaluate FL on the three most representative computer vision tasks: image classification, image segmentation, and object detection. We provide non-I.I.D. benchmarking datasets, models, and various reference FL algorithms. Our benchmark study suggests that there are multiple challenges that deserve future exploration: centralized training tricks may not be directly applied to FL; the non-I.I.D. dataset actually downgrades the model accuracy to some degree in different tasks; improving the system efficiency of federated training is challenging given the huge number of parameters and the per-client memory cost. We believe that such a library and benchmark, along with comparable evaluation settings, is necessary to make meaningful progress in FL on computer vision tasks. FedCV is publicly available: https://github.com/FedML-AI/FedCV.

Reference List

1 reference

FedML: A Research Library and Benchmark for Federated Machine Learning

R. Raskar Praneeth Vepakomma + 14 others

27 July 2020

Citing Articles

3 citations

Blockchain for federated learning toward secure distributed machine learning systems: a systemic survey

Dezhi Han Kuan Ching Li + 6 others

20 November 2021

Federated learning (FL) is a promising decentralized deep learning technology, which allows users to update models cooperatively without sharing their data. FL is reshaping existing industry paradigms for mathematical modeling and analysis, enabling an increasing number of industries to build privacy-preserving, secure distributed machine learning models. However, the inherent characteristics of FL have led to problems such as privacy protection, communication cost, systems heterogeneity, and unreliable model uploads in actual operation. Interestingly, the integration with Blockchain technology provides an opportunity to further improve FL security and performance, besides increasing its scope of applications. Therefore, we denote this integration of Blockchain and FL as the Blockchain-based federated learning (BCFL) framework. This paper introduces an in-depth survey of BCFL and discusses the insights of such a new paradigm. In particular, we first briefly introduce the FL technology and discuss the challenges faced by such technology. Then, we summarize the Blockchain ecosystem. Next, we highlight the structural design and platform of BCFL. Furthermore, we present the attempts at improving FL performance with Blockchain and several combined applications of incentive mechanisms in FL. Finally, we summarize the industrial application scenarios of BCFL.

FedBEVT: Federated Learning Bird's Eye View Perception Transformer in Road Traffic Systems

Runsheng Xu Rui Song + 3 others

4 April 2023

Bird's eye view (BEV) perception is becoming increasingly important in the field of autonomous driving. It uses multi-view camera data to learn a transformer model that directly projects the perception of the road environment onto the BEV perspective. However, training a transformer model often requires a large amount of data, and as camera data for road traffic are often private, they are typically not shared. Federated learning offers a solution that enables clients to collaborate and train models by exchanging model parameters instead of data. In this article, we introduce FedBEVT, a federated transformer learning approach for BEV perception. To address two common data heterogeneity issues in FedBEVT, (i) diverse sensor poses and (ii) varying sensor numbers in perception systems, we propose two approaches: Federated Learning with Camera-Attentive Personalization (FedCaP) and Adaptive Multi-Camera Masking (AMCM), respectively. To evaluate our method in real-world settings, we create a dataset consisting of four typical federated use cases. Our findings suggest that FedBEVT outperforms the baseline approaches in all four use cases, demonstrating the potential of our approach for improving BEV perception in autonomous driving.

Towards Personalized Federated Learning

A. Tan Qiang Yang + 2 others

1 March 2021

In parallel with the rapid adoption of artificial intelligence (AI) empowered by advances in AI research, there has been growing awareness and concerns of data privacy. Recent significant developments in the data regulation landscape have prompted a seismic shift in interest toward privacy-preserving AI. This has contributed to the popularity of Federated Learning (FL), the leading paradigm for the training of machine learning models on data silos in a privacy-preserving manner. In this survey, we explore the domain of personalized FL (PFL) to address the fundamental challenges of FL on heterogeneous data, a universal characteristic inherent in all real-world datasets. We analyze the key motivations for PFL and present a unique taxonomy of PFL techniques categorized according to the key challenges and personalization strategies in PFL. We highlight their key ideas, challenges, opportunities, and envision promising future trajectories of research toward a new PFL architectural design, realistic PFL benchmarking, and trustworthy PFL approaches.