AI-Based Object Detection Architectures for Real-Time Precision Targeting Systems: A Comparative Analysis of CNN and Transformer Models

Keshav Tyagi¹, Priyanka Bhutani²
¹,²Department of Computer Science Engineering,
University School of Information, Communication and Technology

Abstract: AI-based object detection is now a key component of contemporary precision targeting systems used in
unmanned combat aerial vehicles, missile guidance units, and autonomous surveillance platforms. Although convolutional
neural network (CNN) detectors dominate real-time deployments, transformer-based detectors offer global context
modelling, which can improve detection robustness in complex environments. Nonetheless, the trade-off between accuracy
and computational efficiency under deployment constraints remains underexplored. This paper presents a comparative
analysis of a representative CNN-based detector (YOLOv5) and a transformer-based detector (DETR) under controlled
runtime conditions on the COCO 2017 validation set. Both models were run on a GPU-based system and evaluated on
detection accuracy, inference latency, parameter count, and practical applicability. The results show that YOLOv5
achieves much higher real-time throughput and lower memory overhead, whereas DETR shows more consistent localization
at stricter IoU thresholds. These results point to a significant trade-off between efficiency and context modelling
when selecting an architecture for precision targeting systems. Despite the representational advantages of
transformer-based models, convolutional detectors remain more practical in latency-sensitive operational scenarios in
defence tasks. The paper also outlines future research directions in hybrid architectures and hardware-aware
optimization.

Keywords: Artificial Intelligence, Object Detection, Precision Targeting, CNN, Transformers, Real-Time Systems, Edge
Computing, Adversarial Robustness.
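
The abstract compares localization quality at stricter IoU thresholds. As a minimal sketch of what that criterion means (not the evaluation code used in the paper), the following Python computes Intersection over Union for two axis-aligned boxes and checks whether a prediction counts as a match at a given threshold. The box format `(x1, y1, x2, y2)`, the function names `iou` and `is_match`, and the example coordinates are assumptions made for illustration only.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (may be empty).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Clamp to zero so disjoint boxes give zero intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0


def is_match(pred, gt, threshold):
    """A prediction matches the ground truth if IoU meets the threshold."""
    return iou(pred, gt) >= threshold


# Hypothetical boxes: one ground truth and two predictions of differing tightness.
ground_truth = (0, 0, 100, 100)
loose_pred = (20, 0, 120, 100)  # shifted 20 px to the right
tight_pred = (5, 0, 105, 100)   # shifted 5 px to the right

for name, pred in [("loose", loose_pred), ("tight", tight_pred)]:
    print(name, round(iou(pred, ground_truth), 3),
          is_match(pred, ground_truth, 0.5),
          is_match(pred, ground_truth, 0.75))
```

In this illustration the loosely placed box passes at a threshold of 0.5 but fails at 0.75, while the tighter box passes both, which is why accuracy reported at stricter thresholds rewards more precise localization.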


IITM Journal of Information Technology

ISSN (P) 2395-5457 | Single Blind Peer Reviewed Journal

Published By

INSTITUTE OF INNOVATION IN TECHNOLOGY & MANAGEMENT
Affiliated to GGSIPU, NAAC Grade ‘A’, ISO 14001:2015, 17020:2012, 21001:2018 & 50001:2018 Certified,

A Grade by GNCTD, A++ Grade by SFRC
