DOI: 10.1109/TIV.2023.3310674
Published 4 April 2023 in IEEE Transactions on Intelligent Vehicles

FedBEVT: Federated Learning Bird's Eye View Perception Transformer in Road Traffic Systems

Runsheng Xu, Rui Song, Alois Knoll + 2 authors

Abstract

Bird's eye view (BEV) perception is becoming increasingly important in the field of autonomous driving. It uses multi-view camera data to learn a transformer model that directly projects the perception of the road environment onto the BEV perspective. However, training a transformer model often requires a large amount of data, and as camera data for road traffic are often private, they are typically not shared. Federated learning offers a solution: clients collaboratively train a model by exchanging only model parameters rather than raw data. In this article, we introduce FedBEVT, a federated transformer learning approach for BEV perception. To address two common data heterogeneity issues in this setting, (i) diverse sensor poses and (ii) varying numbers of sensors in perception systems, we propose two approaches: Federated Learning with Camera-Attentive Personalization (FedCaP) and Adaptive Multi-Camera Masking (AMCM), respectively. To evaluate our method in real-world settings, we create a dataset consisting of four typical federated use cases. Our findings suggest that FedBEVT outperforms the baseline approaches in all four use cases, demonstrating the potential of our approach for improving BEV perception in autonomous driving.
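To make the federated training scheme concrete, below is a minimal sketch of one server-side aggregation round in the spirit of FedCaP: parameters that depend on a client's camera setup stay local, while the remaining transformer weights are averaged. The parameter naming convention (`cam_embed`) and the weighting scheme are illustrative assumptions, not the authors' released code.

```python
# Minimal sketch of FedAvg-style aggregation with camera-attentive personalization
# (in the spirit of FedCaP). This is NOT the released FedBEVT code; the key name
# "cam_embed" marking camera-pose-dependent parameters is a hypothetical convention.
from typing import Dict, List
import torch


def is_personalized(param_name: str) -> bool:
    """Camera-dependent parameters stay local and are never averaged."""
    return "cam_embed" in param_name  # assumed naming convention


def aggregate(client_states: List[Dict[str, torch.Tensor]],
              client_weights: List[float]) -> Dict[str, torch.Tensor]:
    """Weighted average of the shared (non-personalized) parameters."""
    total = sum(client_weights)
    global_state: Dict[str, torch.Tensor] = {}
    for name in client_states[0]:
        if is_personalized(name):
            continue  # each client keeps its own camera-attentive weights
        global_state[name] = sum(
            w / total * state[name] for state, w in zip(client_states, client_weights)
        )
    return global_state


# Toy usage: two clients, each with one shared and one personalized tensor.
clients = [
    {"backbone.w": torch.ones(2), "cam_embed.0": torch.zeros(2)},
    {"backbone.w": 3 * torch.ones(2), "cam_embed.0": torch.ones(2)},
]
print(aggregate(clients, client_weights=[1.0, 1.0]))  # backbone.w -> tensor([2., 2.])
```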

Related Papers

A Survey of Federated Learning for Connected and Automated Vehicles

S. Żak, Vishnu Pandi Chellapandi, Ziran Wang + 1 more

19 March 2023

Connected and Automated Vehicles (CAVs) represent a rapidly growing technology in the automotive sector, offering promising solutions to challenges such as traffic accidents, congestion, and pollution. By leveraging CAVs, we have the opportunity to create a transportation system that is safe, efficient, and environmentally sustainable. Machine learning-based methods are widely used in CAVs for crucial tasks like perception, planning, and control; however, these models are typically trained only on local vehicle data, so their performance is uncertain when exposed to new environments or unseen conditions. Federated learning (FL) is a decentralized machine learning approach that enables multiple vehicles to develop a collaborative model in a distributed learning framework. FL enables CAVs to learn from a broad range of driving environments and improve their overall performance while ensuring the privacy and security of local vehicle data. In this paper, we review the progress accomplished by researchers in applying FL to CAVs. A broad view of the data modalities and algorithms that have been implemented on CAVs is provided, specific applications of FL are reviewed in detail, and an analysis of research challenges is presented.

BEVDet: High-performance Multi-camera 3D Object Detection in Bird-Eye-View

Zheng Zhu, Guan Huang, Junjie Huang + 1 more

22 December 2021

Autonomous driving perceives its surroundings for decision making, which is one of the most complex scenarios in visual perception. The success of paradigm innovation in solving the 2D object detection task inspires us to seek an elegant, feasible, and scalable paradigm for fundamentally pushing the performance boundary in this area. To this end, we contribute the BEVDet paradigm in this paper. BEVDet performs 3D object detection in Bird-Eye-View (BEV), where most target values are defined and route planning can be handily performed. We merely reuse existing modules to build its framework but substantially improve its performance by constructing an exclusive data augmentation strategy and upgrading the Non-Maximum Suppression strategy. In the experiments, BEVDet offers an excellent trade-off between accuracy and time efficiency. As a fast version, BEVDet-Tiny scores 31.2% mAP and 39.2% NDS on the nuScenes val set. It is comparable with FCOS3D, but requires just 11% of the computational budget (215.3 GFLOPs) and runs 9.2 times faster at 15.6 FPS. Another high-precision version, dubbed BEVDet-Base, scores 39.3% mAP and 47.2% NDS, significantly exceeding all published results. With a comparable inference speed, it surpasses FCOS3D by a large margin of +9.8% mAP and +10.0% NDS. The source code is publicly available for further research at https://github.com/HuangJunJie2017/BEVDet.
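The modular pipeline described above (image-view encoder, view transformer, BEV encoder, detection head) can be pictured with the short skeleton below. It is a hedged sketch with placeholder modules and assumed shapes; in particular, the view transformer here is a trivial stand-in, whereas real BEV detectors use lift-splat or attention-based projection.

```python
# Skeleton of a BEVDet-style modular pipeline: image encoder -> view transform
# -> BEV encoder -> detection head. Shapes and module choices are illustrative
# assumptions, not the released implementation.
import torch
import torch.nn as nn


class SimpleBEVDetector(nn.Module):
    def __init__(self, img_channels: int = 3, feat_dim: int = 64,
                 bev_size: int = 128, num_classes: int = 10):
        super().__init__()
        # Per-camera image-view encoder (stand-in for a real backbone).
        self.img_encoder = nn.Conv2d(img_channels, feat_dim, kernel_size=3, padding=1)
        # View transformer placeholder: collapses image features and broadcasts
        # them onto a BEV grid (real systems use lift-splat or attention here).
        self.bev_size = bev_size
        self.view_transform = nn.Linear(feat_dim, feat_dim)
        # BEV encoder and detection head.
        self.bev_encoder = nn.Conv2d(feat_dim, feat_dim, kernel_size=3, padding=1)
        self.head = nn.Conv2d(feat_dim, num_classes, kernel_size=1)

    def forward(self, multi_view_imgs: torch.Tensor) -> torch.Tensor:
        # multi_view_imgs: (batch, n_cams, C, H, W)
        b, n, c, h, w = multi_view_imgs.shape
        feats = self.img_encoder(multi_view_imgs.flatten(0, 1))     # (b*n, D, H, W)
        pooled = feats.mean(dim=(2, 3)).view(b, n, -1).mean(dim=1)  # (b, D)
        bev = self.view_transform(pooled)                           # (b, D)
        bev = bev[:, :, None, None].expand(-1, -1, self.bev_size, self.bev_size)
        bev = self.bev_encoder(bev.contiguous())
        return self.head(bev)                                       # (b, classes, bev, bev)


out = SimpleBEVDetector()(torch.randn(1, 6, 3, 64, 96))
print(out.shape)  # torch.Size([1, 10, 128, 128])
```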

VISTA 2.0: An Open, Data-driven Simulator for Multimodal Sensing and Policy Learning for Autonomous Vehicles

D. Rus, Tsun-Hsuan Wang, Igor Gilitschenski + 5 more

23 November 2021

Simulation has the potential to transform the development of robust algorithms for mobile agents deployed in safety-critical scenarios. However, the poor photorealism and lack of diverse sensor modalities of existing simulation engines remain key hurdles towards realizing this potential. Here, we present VISTA, an open-source, data-driven simulator that integrates multiple types of sensors for autonomous vehicles (the full code release for the VISTA data-driven simulation engine is available at vista.csail.mit.edu). Using high-fidelity, real-world datasets, VISTA represents and simulates RGB cameras, 3D LiDAR, and event-based cameras, enabling the rapid generation of novel viewpoints in simulation and thereby enriching the data available for policy learning with corner cases that are difficult to capture in the physical world. Using VISTA, we demonstrate the ability to train and test perception-to-control policies across each of the sensor types and showcase the power of this approach via deployment on a full-scale autonomous vehicle. The policies learned in VISTA exhibit sim-to-real transfer without modification and greater robustness than those trained exclusively on real-world data.

Cooperative Perception With V2V Communication for Autonomous Vehicles

H. Ngo, Hua Fang, Honggang Wang

1 September 2023

Occlusion is a critical problem in autonomous driving systems. Solving it requires robust collaboration among autonomous vehicles traveling on the same roads. However, transferring the entirety of raw sensor data between autonomous vehicles is expensive and can cause communication delays. This paper proposes a method called Realtime Collaborative Vehicular Communication based on a Bird's-Eye-View (BEV) map. The BEV map preserves accurate depth information from the point cloud, while its 2D representation enables the method to use a well-trained image-based backbone network. Most importantly, we encode the object detection results into the BEV representation to reduce the volume of data transmission and make real-time collaboration between autonomous vehicles possible. The resulting BEV map can also be used as direct input to most route planning modules. Numerical results show that this method increases object detection accuracy by cross-verifying results from multiple points of view, which in turn reduces the detection challenges that stem from occlusion and partial occlusion. Additionally, unlike many existing methods, it significantly reduces the data that must be transferred between vehicles, achieving a rate of 21.92 Hz for both object detection and data transmission, which is sufficiently fast for a real-time system.
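A back-of-the-envelope sketch of the bandwidth argument: instead of transmitting raw point clouds, each vehicle shares only encoded detections on the BEV grid. The message layout and sizes below are assumptions for illustration, not the paper's actual V2V protocol.

```python
# Back-of-the-envelope sketch: sharing encoded BEV detections instead of raw
# point clouds. Field layout and sizes are illustrative assumptions, not the
# paper's actual V2V message format.
import struct
from dataclasses import dataclass


@dataclass
class BEVDetection:
    cls_id: int      # object class
    x: float         # BEV position (m)
    y: float
    length: float    # box size (m)
    width: float
    yaw: float       # heading (rad)
    score: float     # detection confidence


def encode(detections: list[BEVDetection]) -> bytes:
    """Pack each detection into a fixed 28-byte record (1 int + 6 floats)."""
    return b"".join(
        struct.pack("<i6f", d.cls_id, d.x, d.y, d.length, d.width, d.yaw, d.score)
        for d in detections
    )


dets = [BEVDetection(1, 12.3, -4.1, 4.5, 1.9, 0.2, 0.91) for _ in range(30)]
msg = encode(dets)
raw_lidar_bytes = 120_000 * 4 * 4  # e.g. 120k points x (x, y, z, intensity) float32
print(len(msg), "bytes vs", raw_lidar_bytes, "bytes raw")  # 840 vs 1,920,000
```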

Autonomous Vehicles Perception (AVP) Using Deep Learning: Modeling, Assessment, Challenges

Rasha F. Kashef, Hrag-Harout Jebamikyous

2022

Perception is the fundamental task of any autonomous driving system: it gathers all the necessary information about the surrounding environment of the moving vehicle. The decision-making system takes the perception data as input and makes the optimal decision for that scenario, maximizing the safety of the passengers. This paper surveys recent literature on autonomous vehicle perception (AVP), focusing on two primary tasks: semantic segmentation and object detection. Both tasks are vital components of the vehicle's navigation system. A comprehensive overview of deep learning for perception and its decision-making process based on images and LiDAR point clouds is provided. We also discuss the sensors, benchmark datasets, and simulation tools widely used in semantic segmentation and object detection tasks, especially for autonomous driving. This paper acts as a road map for current and future research in AVP, focusing on models, assessment, and challenges in the field.

References

6 references

HYDRO-3D: Hybrid Object Detection and Tracking for Cooperative Perception Using 3D LiDAR

W. Liu, Zonglin Meng + 3 more

1 August 2023

3D-LiDAR-based cooperative perception has been generating significant interest for its ability to tackle challenges such as occlusion, sparse point clouds, and out-of-range issues that can be problematic for single-vehicle perception. Despite its effectiveness in overcoming various challenges, cooperative perception's performance can still be affected by these issues when Connected Automated Vehicles (CAVs) operate at the edges of their sensing range. Our proposed approach, HYDRO-3D, aims to improve object detection performance by explicitly incorporating historical object tracking information. Specifically, HYDRO-3D combines object detection features from a state-of-the-art object detection algorithm (V2X-ViT) with historical information from the object tracking algorithm to infer objects. Afterward, a novel spatial-temporal 3D neural network performing global and local manipulations of the object-tracking history is applied to generate a feature map that enhances object detection. The proposed HYDRO-3D method is comprehensively evaluated on the state-of-the-art V2XSet dataset. The qualitative and quantitative experimental results demonstrate that HYDRO-3D can effectively utilize object tracking information and achieve robust object detection performance. It outperforms the SOTA V2X-ViT by 3.7% in AP@0.7 for CAV object detection and can also be generalized to single-vehicle object detection with a 4.5% improvement in AP@0.7.
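The fusion idea, combining the current detection feature map with features aggregated from the tracking history, can be sketched as follows. Layer choices, shapes, and names are illustrative assumptions rather than the HYDRO-3D implementation.

```python
# Minimal sketch of fusing current detection features with historical tracking
# features, in the spirit of the detection-plus-tracking fusion described above.
# Layer choices and shapes are illustrative assumptions, not the HYDRO-3D code.
import torch
import torch.nn as nn


class TrackAwareFusion(nn.Module):
    def __init__(self, feat_dim: int = 64, history_len: int = 4):
        super().__init__()
        # Temporal aggregation of the historical BEV feature maps.
        self.temporal = nn.Conv3d(feat_dim, feat_dim,
                                  kernel_size=(history_len, 3, 3),
                                  padding=(0, 1, 1))
        # Fusion of current detection features with the aggregated history.
        self.fuse = nn.Conv2d(2 * feat_dim, feat_dim, kernel_size=1)

    def forward(self, det_feat: torch.Tensor, history: torch.Tensor) -> torch.Tensor:
        # det_feat: (B, D, H, W) current detection feature map
        # history:  (B, D, T, H, W) features from the last T tracked frames
        hist = self.temporal(history).squeeze(2)  # (B, D, H, W)
        return self.fuse(torch.cat([det_feat, hist], dim=1))


fused = TrackAwareFusion()(torch.randn(1, 64, 100, 100),
                           torch.randn(1, 64, 4, 100, 100))
print(fused.shape)  # torch.Size([1, 64, 100, 100])
```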

PETR: Position Embedding Transformation for Multi-View 3D Object Detection

X. Zhang, Tiancai Wang + 2 more

10 March 2022

In this paper, we develop position embedding transformation (PETR) for multi-view 3D object detection. PETR encodes the position information of 3D coordinates into image features, producing 3D position-aware features. Object queries can perceive the 3D position-aware features and perform end-to-end object detection. PETR achieves state-of-the-art performance (50.4% NDS and 44.1% mAP) on the standard nuScenes dataset and ranks 1st on the benchmark. It can serve as a simple yet strong baseline for future research. Code is available at https://github.com/megvii-research/PETR.
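The central mechanism, encoding per-pixel 3D coordinates into a position embedding that is added to the 2D image features so that object queries can attend to 3D position-aware features, can be sketched as below. Shapes, the coordinate source, and the MLP design are assumptions for illustration, not the released PETR code.

```python
# Sketch of the PETR idea: encode per-pixel 3D coordinates into a position
# embedding and add it to 2D image features, giving 3D position-aware features
# that object queries can attend to. Shapes and the coordinate source are
# illustrative assumptions, not the released PETR code.
import torch
import torch.nn as nn


class Position3DEncoder(nn.Module):
    def __init__(self, depth_bins: int = 16, feat_dim: int = 256):
        super().__init__()
        # Maps per-pixel frustum coordinates (depth_bins x 3 values) to feat_dim.
        self.mlp = nn.Sequential(
            nn.Linear(depth_bins * 3, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, feat_dim),
        )

    def forward(self, img_feat: torch.Tensor, coords_3d: torch.Tensor) -> torch.Tensor:
        # img_feat:  (B, N_cams, H*W, feat_dim) flattened 2D image features
        # coords_3d: (B, N_cams, H*W, depth_bins*3) camera-frustum points lifted to 3D
        pos_embed = self.mlp(coords_3d)
        return img_feat + pos_embed  # 3D position-aware features for the decoder


b, n, hw, d = 1, 6, 15 * 25, 256
feats = Position3DEncoder()(torch.randn(b, n, hw, d), torch.randn(b, n, hw, 16 * 3))
print(feats.shape)  # torch.Size([1, 6, 375, 256])
```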

BEVDet: High-performance Multi-camera 3D Object Detection in Bird-Eye-View

Zheng Zhu, Guan Huang + 2 more

22 December 2021
