Zhang, Lei
Yu, Xiaohan
Article History
Received: 20 March 2025
Accepted: 10 August 2025
First Online: 3 September 2025
Declarations
Multi-modal Fusion: The proposed framework fuses RGB and depth data using a multi-modal prompt generator (MPG) and a feature adapter (MFA), achieving accurate semantic segmentation with minimal additional parameters.

Adaptive Motion Handling: A novel motion-level initialization strategy, coupled with cross-frame motion propagation, effectively separates dynamic elements from static scene components, thereby reducing dynamic disturbances.

Robust Pose Optimization: Integrating a weighted static constraint into the pose refinement process ensures enhanced localization accuracy even in challenging dynamic environments.

Comprehensive Validation: Extensive experiments on the TUM RGB-D and Bonn RGB-D datasets confirm the system's superior performance in both global trajectory alignment and local motion consistency, paving the way for robust SLAM applications in real-world dynamic scenarios.
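The weighted static constraint in the pose refinement step can be illustrated schematically. The paper's exact formulation is not given here, so the following is a hedged sketch assuming a per-point static weight w_i (e.g., derived from the segmentation and motion cues) applied to a standard reprojection residual over the camera pose T:

```latex
\min_{T \in SE(3)} \; \sum_{i} w_i \, \rho\!\left( \big\| \pi\!\left(T P_i\right) - p_i \big\|^2 \right),
\qquad w_i \in [0, 1],
```

where P_i is a 3D map point, p_i its 2D observation, \pi the camera projection, and \rho a robust kernel (e.g., Huber). Under this reading, points judged dynamic receive weights near zero, so they contribute little to the pose estimate; the symbols and the exact weighting scheme are illustrative assumptions, not the authors' stated formulation.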