Method for matching satellite and UAV images for visual place recognition with cross-view color normalization

Volodymyr Vozniak, Oleksander Barmak, Iurii Krak

Abstract


The subject of this article is visual place recognition (VPR), specifically the matching of satellite images with images captured by unmanned aerial vehicles (UAVs). VPR is critical for autonomous UAV navigation, particularly in GPS-denied environments, such as urban canyons or areas with dense infrastructure, where GNSS signals are unreliable. Despite its practical importance, accurately matching UAV images to satellite imagery remains challenging because of significant differences in viewpoint, scale, illumination, and texture. Traditional approaches that rely on handcrafted descriptors or classical local features often fail under such cross-view conditions. This study aims to design a robust VPR method for matching UAV and satellite imagery that employs deep learning-based embeddings and color normalization to improve reliability in cross-view scenarios. The tasks addressed in this article are as follows. First, a YOLO-based method for extracting global image embeddings is designed; it uses YOLO's multi-scale feature extraction to encode semantically significant landmarks in the scene. Second, a novel preprocessing technique that aligns the statistical color distributions of UAV and satellite images is developed and implemented to improve their visual congruence. Finally, these components are integrated into a complete VPR system, whose effectiveness is evaluated on the challenging VPAIR dataset with an emphasis on urban settings. The methods employed include deep learning techniques, in particular fine-tuning a YOLO11 neural network on a dataset annotated specifically for building segmentation. Statistical alignment based on cumulative distribution functions (CDFs) is used to standardize image appearance across the two distinct image domains.

Conclusions. The experiments demonstrate significant improvements in UAV-to-satellite image matching performance with the proposed method.
Fine-tuning YOLO11 specifically for building segmentation yielded a robust embedding-generation method with high segmentation accuracy (an F1-score of 0.722). The color preprocessing technique further improved recognition performance: Recall@1 reached 19.5% for urban terrain within a localization radius of 3, substantially outperforming traditional methods. This study provides an effective solution for UAV localization, particularly in complex urban environments, and highlights the importance of combining embedding extraction with domain-specific image preprocessing in cross-view visual place recognition.
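The abstract states that YOLO's multi-scale features are turned into a global image embedding but does not say how the scales are combined. The sketch below assumes one common choice: global average pooling of each feature map, concatenation, and L2 normalization. The function name `global_embedding`, the three scales, and their channel counts are hypothetical, and extracting the maps from a YOLO11 model is not shown.

```python
import numpy as np


def global_embedding(feature_maps):
    """Pool a list of (C_i, H_i, W_i) feature maps into one L2-normalized vector.

    Assumption: global average pooling per scale, then concatenation.
    """
    # Average each channel over its spatial grid -> one C_i-dim vector per scale.
    pooled = [fm.reshape(fm.shape[0], -1).mean(axis=1) for fm in feature_maps]
    vec = np.concatenate(pooled)
    norm = np.linalg.norm(vec)
    # Unit length lets a dot product act as cosine similarity during retrieval.
    return vec / norm if norm > 0 else vec


# Hypothetical maps at strides 8, 16 and 32 of a 640x640 input.
rng = np.random.default_rng(0)
maps = [rng.random((64, 80, 80)), rng.random((128, 40, 40)), rng.random((256, 20, 20))]
emb = global_embedding(maps)
print(emb.shape)  # (448,)
```

Average pooling keeps the descriptor length independent of image size, so UAV and satellite images of different resolutions map into the same embedding space.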
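CDF-based alignment of color distributions can be illustrated with classical histogram matching: each UAV pixel value is replaced by the satellite value that sits at the same quantile of its channel's cumulative distribution. This is a minimal sketch assuming that RGB channels are matched independently; the article's exact color space and procedure may differ, and the names `match_channel` and `match_colors` are hypothetical.

```python
import numpy as np


def match_channel(source, reference):
    """Remap source values so that their empirical CDF follows the reference CDF."""
    src_values, src_idx, src_counts = np.unique(
        source.ravel(), return_inverse=True, return_counts=True)
    ref_values, ref_counts = np.unique(reference.ravel(), return_counts=True)
    src_cdf = np.cumsum(src_counts) / source.size
    ref_cdf = np.cumsum(ref_counts) / reference.size
    # For each source quantile, interpolate the reference value at that quantile.
    mapped = np.interp(src_cdf, ref_cdf, ref_values)
    return mapped[src_idx].reshape(source.shape)


def match_colors(uav, satellite):
    """Per-channel histogram matching of an HxWx3 uint8 UAV image to a satellite image."""
    out = np.empty(uav.shape, dtype=np.float64)
    for c in range(uav.shape[2]):
        out[..., c] = match_channel(uav[..., c], satellite[..., c])
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)


rng = np.random.default_rng(1)
uav = rng.integers(0, 128, size=(32, 32, 3), dtype=np.uint8)          # darker "UAV" image
sat = rng.integers(128, 256, size=(32, 32, 3), dtype=np.uint8)        # brighter "satellite" image
aligned = match_colors(uav, sat)
```

The same per-channel mapping is available in scikit-image as `skimage.exposure.match_histograms(image, reference, channel_axis=-1)`; the hand-written version only makes the CDF step explicit.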
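Recall@1 within a localization radius can be read as follows: a query counts as correctly localized if its single best-matching satellite tile lies within the given radius of the query's true position. The sketch assumes cosine-similarity retrieval over L2-normalized embeddings and treats positions and radius as sharing one unit, because the abstract does not state the unit of the radius of 3; `recall_at_k` and its parameters are hypothetical.

```python
import numpy as np


def recall_at_k(query_emb, db_emb, query_pos, db_pos, k=1, radius=3.0):
    """Fraction of queries whose top-k retrieved tiles include one within `radius`.

    Embeddings are assumed L2-normalized, so the dot product is cosine similarity.
    query_pos has shape (Q, 2) and db_pos has shape (D, 2), in one shared unit.
    """
    sims = query_emb @ db_emb.T                   # (Q, D) similarity matrix
    topk = np.argsort(-sims, axis=1)[:, :k]       # indices of the k most similar tiles
    # Distance from each query's true position to each retrieved tile's position.
    dists = np.linalg.norm(db_pos[topk] - query_pos[:, None, :], axis=2)
    hits = (dists <= radius).any(axis=1)
    return float(hits.mean())


db_emb = np.eye(4)
db_pos = np.array([[0, 0], [10, 0], [20, 0], [30, 0]], dtype=float)
query_emb = np.array([[1, 0, 0, 0], [0, 0, 1, 0]], dtype=float)
query_pos = np.array([[1, 0], [0, 0]], dtype=float)  # second query is really at tile 0
r1 = recall_at_k(query_emb, db_emb, query_pos, db_pos, k=1, radius=3.0)
```

Here the first query retrieves a tile 1 unit away (a hit) and the second retrieves a tile 20 units away (a miss), so Recall@1 is 0.5; raising `k` to 4 lets the second query's correct tile into the candidate list.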

Keywords


visual place recognition; UAV; YOLO; image preprocessing; deep learning; image segmentation






DOI: https://doi.org/10.32620/reks.2025.3.11
