Abstract:
In autonomous driving, accurately predicting the future trajectories of surrounding vehicles is essential for reliable navigation and planning. Unlike previous approaches that rely on high-definition maps and vehicle coordinates, recent research seeks to predict the future trajectories of both surrounding and ego vehicles from a bird's-eye view (BEV) perspective, leveraging data from multiple sensors on the vehicle in an end-to-end manner. A key challenge in this context is effectively modeling the spatiotemporal interactions between vehicles. In this paper, we propose a multi-scale spatiotemporal Transformer network that extracts multi-scale features from images and aligns them using a dedicated feature alignment module. We develop a divided space-time attention mechanism to capture spatiotemporal dependencies in the feature sequence. Extensive experiments on the nuScenes dataset demonstrate that the proposed framework achieves superior prediction accuracy compared to prior methods, with further performance gains as more historical information is incorporated. © 2025 IEEE.
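The divided space-time attention mentioned in the abstract factorizes attention over a feature sequence into a temporal pass (each token attends across frames) followed by a spatial pass (tokens attend within a frame). Below is a minimal PyTorch sketch of that factorization, in the style popularized by TimeSformer; the paper's exact module is not specified in this record, so the class name, dimensions, and BEV token layout are illustrative assumptions.

# Minimal sketch of divided space-time attention; all names and
# dimensions are illustrative assumptions, not the paper's module.
import torch
import torch.nn as nn

class DividedSpaceTimeAttention(nn.Module):
    """Temporal self-attention across frames, then spatial self-attention
    within each frame, each with a pre-norm residual connection."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.temporal_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.spatial_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_t = nn.LayerNorm(dim)
        self.norm_s = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, tokens, dim) -- a sequence of flattened BEV feature maps
        b, t, n, d = x.shape

        # Temporal pass: each spatial token attends over the t frames.
        xt = x.permute(0, 2, 1, 3).reshape(b * n, t, d)
        qt = self.norm_t(xt)
        xt = xt + self.temporal_attn(qt, qt, qt)[0]
        x = xt.reshape(b, n, t, d).permute(0, 2, 1, 3)

        # Spatial pass: tokens within each frame attend to one another.
        xs = x.reshape(b * t, n, d)
        qs = self.norm_s(xs)
        xs = xs + self.spatial_attn(qs, qs, qs)[0]
        return xs.reshape(b, t, n, d)

if __name__ == "__main__":
    # Example: 2 samples, 4 past frames, a 16x16 BEV grid flattened to 256 tokens.
    feats = torch.randn(2, 4, 256, 64)
    out = DividedSpaceTimeAttention(dim=64)(feats)
    print(out.shape)  # torch.Size([2, 4, 256, 64])

Compared with joint attention over all t * n tokens, this factorization reduces the attention cost from O((t*n)^2) to O(t^2 + n^2) per token, which is the usual motivation for divided space-time designs on frame sequences.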
Year: 2025
Page: 155-160
Language: English