Subject Area

Computer and Control Systems Engineering

Article Type

Original Study

Abstract

End-to-end object detection is one of the recent trends in object detection; however, it is time- and memory-consuming due to the Transformer encoder-decoder (TED) module. DEtection TRansformer (DETR) is the first end-to-end object detector built on a TED architecture. Despite achieving competitive performance, it suffers from slow convergence because attention is computed over a long sequence spanning the whole image. In this paper, ScaledDETR is proposed to address the slow convergence of DETR and speed up training by building end-to-end detection on a recent efficient backbone with fewer parameters: the ResNet backbone is replaced with EfficientNet, an efficient CNN backbone. The recent Relative Position Encoding (RPE) is adopted in place of standard Position Encoding (PE), which yields a 1.3% AP improvement. ScaledDETR has a simple architecture that runs on a single GPU, making it suitable for autonomous driving applications. The proposed model is trained for 20 epochs, 25x fewer than DETR, and achieves results competitive with state-of-the-art object detection methods, reaching 41.7 AP on the COCO dataset compared with 40.2 AP for Faster R-CNN.
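The core idea behind the Relative Position Encoding mentioned above is that each attention score is offset by a learned bias looked up by the *displacement* between two tokens, rather than adding a fixed absolute encoding to the inputs. A minimal sketch of the 2D relative-position indexing this requires (not the paper's implementation; the function name and grid size are illustrative):

```python
def relative_position_index(h, w):
    """Map every pair of positions in an h x w grid to a bias-table index.

    Returns an (h*w) x (h*w) matrix of indices into a learnable bias
    table of size (2h-1)*(2w-1), one entry per distinct displacement,
    so token pairs with the same relative offset share one parameter.
    """
    coords = [(y, x) for y in range(h) for x in range(w)]
    n = len(coords)
    index = [[0] * n for _ in range(n)]
    for i, (yi, xi) in enumerate(coords):
        for j, (yj, xj) in enumerate(coords):
            dy = yi - yj + (h - 1)   # shift displacement into [0, 2h-2]
            dx = xi - xj + (w - 1)   # shift displacement into [0, 2w-2]
            index[i][j] = dy * (2 * w - 1) + dx
    return index

# For a 2x2 grid there are only (2*2-1)*(2*2-1) = 9 distinct
# displacements, so the bias table needs 9 parameters instead of
# one per token pair.
idx = relative_position_index(2, 2)
```

During attention, `bias_table[index[i][j]]` would be added to the raw score between tokens `i` and `j`, which keeps the encoding translation-invariant.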

Keywords

Object detection; Deep learning; DETR; Computer vision; CNN

Creative Commons License

Creative Commons Attribution 4.0 License
This work is licensed under a Creative Commons Attribution 4.0 License.
