Text Detection and Recognition for Robot Localization

Z. Raisi; J. Zelek

Paper title: Text Detection and Recognition for Robot Localization
National paper ID: JR_JECEI-12-1_011
Published in 1403 (Solar Hijri)

Author details:

Z. Raisi - University of Waterloo, Waterloo, Canada and Chabahar Maritime University, Chabahar, Iran.
J. Zelek - Systems Design Engineering Department, University of Waterloo, Canada.

Abstract:

Background and Objectives: Signage is everywhere, and a robot should be able to take advantage of signs to help it localize (including Visual Place Recognition (VPR)) and map. Robust text detection and recognition in the wild is challenging due to pose, irregular text instances, illumination variations, viewpoint changes, and occlusion.

Methods: This paper proposes an end-to-end scene text spotting model that simultaneously outputs the text string and bounding boxes. The proposed model combines a pre-trained Vision Transformer (ViT) based architecture with a multi-task transformer-based text detector that is better suited to the VPR task. Our central contribution is an end-to-end scene text spotting framework that adequately captures irregular and occluded text regions in different challenging places. To address the occlusion problem, we first equip the ViT backbone with a masked autoencoder (MAE) so that it can capture partially occluded characters. Then, we use a multi-task prediction head to handle arbitrarily shaped text instances with polygon bounding boxes.

Results: We evaluated the proposed architecture for VPR through several experiments on the challenging Self-Collected Text Place (SCTP) benchmark dataset, using the well-known precision-recall metrics to measure the performance of the proposed pipeline. The final model achieved Recall = 0.93 and Precision = 0.8 on this benchmark.

Conclusion: The initial experimental results show that the proposed model outperforms state-of-the-art (SOTA) methods on the SCTP dataset, which confirms the robustness of the proposed end-to-end scene text detection and recognition model.
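
For readers unfamiliar with the metrics reported above, the following is a minimal sketch of how precision and recall are computed from match counts. The function name `precision_recall` and the counts in the usage example are hypothetical and chosen only for illustration; they are not taken from the paper or from the SCTP benchmark.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Compute (precision, recall) from raw counts.

    tp: correct matches (true positives)
    fp: incorrect matches (false positives)
    fn: missed matches (false negatives)
    """
    # Precision: fraction of reported matches that are correct.
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    # Recall: fraction of true matches that were actually found.
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall


# Illustrative counts only (not results from the paper).
p, r = precision_recall(tp=3, fp=1, fn=2)
print(f"precision={p:.2f}, recall={r:.2f}")
```

Under this definition, a high recall with a lower precision means most true matches are found, at the cost of some incorrect ones.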

Keywords:

Text Detection, Text Recognition, Robotics Localization, Deep Learning, Visual Place Recognition

Paper page and full-text download: https://civilica.com/doc/1866023/