A Multi-Core Controller for an Embedded AI System Supporting Parallel Recognition
Abstract
Recent advances in artificial intelligence (AI) technology encourage the adoption of AI systems for various applications. In most deployments, AI-based computing systems adopt an architecture in which a central server processes most of the data. This design consumes a large amount of network bandwidth and can raise security issues. To overcome these issues, a new learning paradigm called federated learning was introduced. In federated learning, clients perform training locally and transmit only the trained results to the central server. Because training on the client abstracts and compresses the original data, the system operates with reduced network resources and strengthened data security. A federated learning system supports a variety of client systems. To build an AI system from resource-limited clients, it is practical to compose each client from multiple embedded AI processors. To realize a system with this architecture, a controller that arbitrates among the AI processors and keeps them utilized is essential. In this paper, we propose an embedded AI system for federated learning that can be flexibly composed of AI cores depending on the application. To realize the proposed system, we designed a controller for multiple AI cores and implemented it on a field-programmable gate array (FPGA). The operation of the designed controller was verified through image and speech applications, and its performance was evaluated through a simulator.
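As a rough illustration of what such a controller does, the C sketch below dispatches inference jobs round-robin across memory-mapped AI cores. The register layout (CORE_BASE, REG_STATUS, REG_START) and the dispatch routine are illustrative assumptions, not the register map described in the paper.

#include <stdint.h>

/* Hypothetical memory map: each AI core exposes a small register file. */
#define NUM_CORES      4
#define CORE_BASE      0x40000000u
#define CORE_STRIDE    0x1000u
#define REG_STATUS     0x00u   /* 0 = idle, 1 = busy */
#define REG_SRC_ADDR   0x04u   /* input buffer address */
#define REG_DST_ADDR   0x08u   /* output buffer address */
#define REG_START      0x0Cu   /* write 1 to launch inference */

static volatile uint32_t *core_reg(int core, uint32_t off)
{
    return (volatile uint32_t *)(CORE_BASE + core * CORE_STRIDE + off);
}

/* Round-robin arbitration: hand the next job to the first idle core. */
int dispatch_job(uint32_t src, uint32_t dst)
{
    static int next = 0;
    for (int i = 0; i < NUM_CORES; i++) {
        int core = (next + i) % NUM_CORES;
        if (*core_reg(core, REG_STATUS) == 0) {
            *core_reg(core, REG_SRC_ADDR) = src;
            *core_reg(core, REG_DST_ADDR) = dst;
            *core_reg(core, REG_START)   = 1;
            next = (core + 1) % NUM_CORES;
            return core;           /* job accepted by this core */
        }
    }
    return -1;                     /* all cores busy; caller retries */
}

A real controller would also handle completion interrupts and per-core error states, but the idle-scan-and-launch loop captures the arbitration role the abstract describes.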
Related Scientific Articles
J. Bosch, Hongyi Zhang, H. H. Olsson
1 December 2020
Machine Learning (ML) and Artificial Intelligence (AI) have increasingly gained attention in research and industry. Federated Learning, as an approach to distributed learning, shows its potential with the increasing number of devices on the edge and the development of computing power. However, most current Federated Learning systems apply a single-server centralized architecture, which may cause several critical problems, such as a single point of failure as well as scaling and performance problems. In this paper, we propose and compare four architecture alternatives for a Federated Learning system, i.e. centralized, hierarchical, regional and decentralized architectures. We conduct the study using two well-known data sets and measuring several system performance metrics for all four alternatives. Our results suggest scenarios and use cases suitable for each alternative. In addition, we investigate the trade-off between communication latency, model evolution time and model classification performance, which is crucial for applying the results to real-world industrial systems.
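Common to all four architecture alternatives is some node that aggregates client updates. The minimal C sketch below shows FedAvg-style sample-weighted averaging at such a node; the structure names and fixed model size are illustrative assumptions, not code from the paper.

#include <stddef.h>

#define MODEL_SIZE 1024  /* number of parameters; illustrative */

typedef struct {
    float  weights[MODEL_SIZE]; /* locally trained parameters */
    size_t num_samples;         /* size of the client's local data set */
} client_update_t;

/* FedAvg: global weights are the sample-weighted mean of client weights. */
void fedavg_aggregate(const client_update_t *updates, size_t n_clients,
                      float *global_weights)
{
    size_t total_samples = 0;
    for (size_t c = 0; c < n_clients; c++)
        total_samples += updates[c].num_samples;

    for (size_t i = 0; i < MODEL_SIZE; i++) {
        float acc = 0.0f;
        for (size_t c = 0; c < n_clients; c++)
            acc += updates[c].weights[i] * (float)updates[c].num_samples;
        global_weights[i] = acc / (float)total_samples;
    }
}

In a hierarchical or regional architecture, the same routine would run at intermediate aggregators before their results are merged upstream.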
Q. Lu, Liming Zhu, Hye-young Paik + 1 more
22 June 2021
Federated learning is an emerging machine learning paradigm that enables multiple devices to train models locally and formulate a global model, without sharing the clients' local data. A federated learning system can be viewed as a large-scale distributed system, involving different components and stakeholders with diverse requirements and constraints. Hence, developing a federated learning system requires both software system design thinking and machine learning knowledge. Although much effort has been put into federated learning from the machine learning perspective, our previous systematic literature review on the area shows a distinct lack of consideration for software architecture design in federated learning. In this paper, we propose FLRA, a reference architecture for federated learning systems, which provides a template design for federated learning-based solutions. The proposed FLRA reference architecture is based on an extensive review of existing patterns of federated learning systems found in the literature and in existing industrial implementations. The FLRA reference architecture consists of a pool of architectural patterns that address the frequently recurring design problems in federated learning architectures. It can serve as a design guideline to assist architects and developers with practical solutions for their problems, which can be further customised.
Kristof Van Beeck, J. Lemeire, N. Mentens + 10 more
1 August 2020
New achievements in Artificial Intelligence (AI) and Machine Learning (ML) are reported almost daily by the big companies. While those achievements are accomplished by fast and massive data processing techniques, the potential of embedded machine learning, where intelligent algorithms run in resource-constrained devices rather than in the cloud, is still not well understood by the majority of industrial players and Small and Medium Enterprises (SMEs). Nevertheless, the potential of embedded machine learning for running high-performance algorithms without relying on expensive cloud solutions is perceived as very high. This potential has led to a broad demand by industry and SMEs for a practical and application-oriented feasibility study, which helps them to understand the potential benefits, but also the limitations, of embedded AI. To address these needs, this paper presents the approach of the AITIA project, a consortium of four universities that aims at developing and demonstrating best practices for embedded AI by means of four industrial case studies of high relevance to European industry and SMEs: sensors, security, automotive and industry 4.0.
Wathiq Mansoor, Yassine Himeur, A. Amira + 5 more
24 August 2023
Computer Vision (CV) is playing a significant role in transforming society by utilizing machine learning (ML) tools for a wide range of tasks. However, the need for large-scale datasets to train ML models creates challenges for centralized ML algorithms. The massive computation loads required for processing, and the potential privacy risks associated with storing and processing data on central cloud servers, put these algorithms under severe strain. To address these issues, federated learning (FL) has emerged as a promising solution, preserving privacy by training models locally and exchanging model updates to improve overall performance. Additionally, the computational load is distributed across multiple clients, reducing the burden on central servers. This paper presents, to the best of the authors' knowledge, the first review discussing recent advancements of FL in CV applications, comparing them to conventional centralized training paradigms. It provides an overview of current FL applications in various CV tasks, emphasizing the advantages of FL and the challenges of implementing it in CV. To facilitate this, the paper proposes a taxonomy of FL techniques in CV, outlining their applications and security threats. It also discusses privacy concerns related to implementing blockchain in FL schemes for CV tasks and summarizes existing privacy preservation methods. Finally, the paper identifies open research challenges and potential future research directions to further exploit the potential of FL and blockchain in CV applications.
L. Krupp, P. Gembaczka, Lars Wulfert + 4 more
18 January 2024
Edge Artificial Intelligence (AI) relies on the integration of Machine Learning (ML) into even the smallest embedded devices, thus enabling local intelligence in real-world applications, e.g. for image or speech processing. Traditional edge AI frameworks lack important aspects required to keep up with recent and upcoming ML innovations, including flexibility concerning the target hardware and support for custom hardware accelerator integration. The Artificial Intelligence for Embedded Systems Framework (AIfES) aims to overcome these challenges faced by traditional edge AI frameworks. In this paper, we give a detailed overview of the architecture of AIfES and the applied design principles. Finally, we compare AIfES with TensorFlow Lite for Microcontrollers (TFLM) on an ARM Cortex-M4-based System-on-Chip (SoC) using fully connected neural networks (FCNNs) and convolutional neural networks (CNNs). AIfES outperforms TFLM in both execution time and memory consumption for the FCNNs. Additionally, using AIfES reduces memory consumption by up to 54% when using CNNs. Furthermore, we show the performance of AIfES during the training of FCNNs as well as CNNs and demonstrate the feasibility of training a CNN on a resource-constrained device with a memory usage of slightly more than 100 kB of RAM.
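To make the memory figures concrete, the sketch below shows a fully connected layer in plain C with statically allocated buffers, the style embedded ML frameworks favor so that RAM usage is known at link time. This is a generic illustration under assumed layer sizes, not AIfES or TFLM code.

#include <stddef.h>

/* A 32-input, 16-output fully connected layer with static buffers.
 * Dimensions are illustrative; static allocation keeps the memory
 * footprint fixed, which matters on devices with ~100 kB of RAM. */
#define IN_DIM  32
#define OUT_DIM 16

static float weights[OUT_DIM][IN_DIM]; /* trained parameters */
static float bias[OUT_DIM];
static float activations[OUT_DIM];     /* output buffer, reusable across layers */

static float relu(float x) { return x > 0.0f ? x : 0.0f; }

void fc_forward(const float *input)
{
    for (size_t o = 0; o < OUT_DIM; o++) {
        float acc = bias[o];
        for (size_t i = 0; i < IN_DIM; i++)
            acc += weights[o][i] * input[i];
        activations[o] = relu(acc);
    }
}

With every buffer sized at compile time, total RAM consumption is the sum of the declared arrays plus stack, which is how frameworks like those compared above stay within a Cortex-M4's budget.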