Method for optimizing descriptor clustering in the feature database of a content-based image retrieval system

Stanislav Danylenko, Kirill Smelyakov

Abstract


The subject of this study is the method of grouping image descriptors stored in the feature database of search systems. The study aims to develop a method for optimizing the clustering of descriptors in Big Data storage represented by the Multidimensional Cube data model, so that the resulting clusters can be used for efficient search in content-based image retrieval (CBIR) systems. The tasks are to: analyze modern approaches and solutions for grouping image descriptors in the feature database; formulate the clustering problem for the Multidimensional Cube and the requirements for its optimization; develop an abstract optimization method; develop specific optimization algorithms for different ways of placing the model in memory; develop evaluation metrics; and perform experiments and compare the results with analogs. The methodology includes analyzing how different methods form groups and identifying their advantages and disadvantages, applying numerical methods to optimize the clustering process, conducting experiments on publicly available data sets, evaluating the effectiveness of the optimization method, and compiling result tables for comparison with analogs. The following results were obtained: a method for optimizing descriptor clustering was developed for the Multidimensional Cube data model, together with optimization algorithms for placing the model in random-access memory and in databases. Clustering results, measured by time spent and cluster filling, were compared with descriptor clustering performed by the k-means algorithm and by the Product Quantization approach implemented with an Inverted Multi-Index. The results show that, with the developed optimizations, the model achieves descriptor clustering quality no worse than that of the Inverted Multi-Index and requires less time than either k-means or the Inverted Multi-Index.
Conclusions: The developed method for optimizing descriptor clustering significantly improves the distribution of descriptors within the Multidimensional Cube model and makes the model a viable alternative for use in content-based image retrieval systems.
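The abstract compares the proposed method against two baselines: flat k-means clustering and Product Quantization with an Inverted Multi-Index. The pure-Python sketch below illustrates how each baseline assigns descriptors to groups and how occupancy of those groups can be summarized. It rests on several assumptions that are not from the article: random Gaussian vectors stand in for image descriptors, the vector is split into exactly two halves (as in the original Inverted Multi-Index), and `fill_stats` is a hypothetical occupancy summary, not the cluster-filling metric defined by the authors. It does not implement the Multidimensional Cube model or its optimizations.

```python
import random
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(v: Sequence[float], centroids: List[Vector]) -> int:
    """Index of the centroid closest to v (ties go to the lower index)."""
    return min(range(len(centroids)), key=lambda c: sq_dist(v, centroids[c]))


def kmeans(data: List[Vector], k: int, iters: int = 20, seed: int = 0) -> List[Vector]:
    """Plain Lloyd's k-means: returns k centroids after a fixed number of iterations."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(data, k)]  # initialize from random points
    for _ in range(iters):
        groups: List[List[Vector]] = [[] for _ in range(k)]
        for v in data:
            groups[nearest(v, centroids)].append(v)
        for c, members in enumerate(groups):
            if members:  # keep the previous centroid if a cluster became empty
                dim = len(members[0])
                centroids[c] = [sum(m[d] for m in members) / len(members) for d in range(dim)]
    return centroids


def imi_cells(data: List[Vector], k: int, seed: int = 0) -> List[Tuple[int, int]]:
    """Inverted Multi-Index style assignment: quantize each half of a vector
    with its own k-centroid codebook; the cell is the pair of indices (k * k cells)."""
    half = len(data[0]) // 2
    left = [v[:half] for v in data]
    right = [v[half:] for v in data]
    cb_left = kmeans(left, k, seed=seed)
    cb_right = kmeans(right, k, seed=seed + 1)
    return [(nearest(l, cb_left), nearest(r, cb_right)) for l, r in zip(left, right)]


def fill_stats(labels: Sequence[Hashable], n_cells: int) -> Dict[str, float]:
    """Hypothetical occupancy summary: share of non-empty cells,
    size of the fullest cell, and mean size of non-empty cells."""
    counts = Counter(labels)
    occupied = len(counts)
    return {
        "occupied_fraction": occupied / n_cells,
        "max_cell": max(counts.values()),
        "mean_nonempty": len(labels) / occupied,
    }


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic stand-ins for 8-dimensional image descriptors (assumption).
    descriptors = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(500)]
    k = 4  # sub-centroids per half, giving k * k = 16 IMI cells
    flat_centroids = kmeans(descriptors, k * k)
    flat_labels = [nearest(v, flat_centroids) for v in descriptors]
    print("k-means:", fill_stats(flat_labels, k * k))
    print("IMI:    ", fill_stats(imi_cells(descriptors, k), k * k))
```

With k sub-centroids per half, the Inverted Multi-Index obtains k² cells from two small codebooks over half-length vectors, whereas flat k-means needs k² full-dimensional centroids to produce the same number of groups. This difference in training and assignment cost is the main design trade-off between the two baselines.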

Keywords


information technology; algorithms and data structures; data storage; big data; search system; multidimensional data model; optimization method; clustering; computational complexity; CBIR

References


Vopson, M. M. The information catastrophe. AIP Advances, 2020, vol. 10, no. 8. DOI: 10.1063/5.0019941.

Badshah, A., Daud, A., Alharbey, R., Banjar, A., Bukhari, A., & Alshemaimri, B. Big data applications: overview, challenges and future. Artificial Intelligence Review, 2025, vol. 57, no. 11. DOI: 10.1007/s10462-024-10938-5.

Clissa, L., Lassnig, M., & Rinaldi, L. How big is Big Data? A comprehensive survey of data production, storage, and streaming in science and industry. Frontiers in Big Data, 2023, vol. 6. DOI: 10.3389/fdata.2023.1271639.

Chembian, W. T., Senthilkumar, G., Prasanth, A., & Subash, R. K-means Pelican Optimization Algorithm based Search Space Reduction for Remote Sensing Image Retrieval. Journal of the Indian Society of Remote Sensing, 2025, vol. 53, no. 1, pp. 101-115. DOI: 10.1007/s12524-024-01994-z.

Jatakia, V., Korlahalli, S., & Deulkar, K. A survey of different search techniques for big data. 2017 International Conference on Innovations in Information, Embedded and Communication Systems (ICIIECS), Coimbatore, India, 2017, pp. 1-4. DOI: 10.1109/iciiecs.2017.8275939.

Abbasifard, M. R., Ghahremani, B., & Naderi, H. A Survey on Nearest Neighbor Search Methods. International Journal of Computer Applications, 2014, vol. 95, no. 25, pp. 39-52. DOI: 10.5120/16754-7073.

Wu, Q., Yu, Y., Zhou, L., Lu, Y., Chen, H., & Qian, X. Storage and Query Indexing Methods on Big Data. Arabian Journal for Science and Engineering, 2023, vol. 49, no. 5, pp. 7359-7374. DOI: 10.1007/s13369-023-08175-z.

Gupta, D., Loane, R., Gayen, S., & Demner-Fushman, D. Medical image retrieval via nearest neighbor search on pre-trained image features. Knowledge-Based Systems, 2023, vol. 278. DOI: 10.1016/j.knosys.2023.110907.

Alsmadi, M. K. Content-Based Image Retrieval Using Color, Shape and Texture Descriptors and Features. Arabian Journal for Science and Engineering, 2020, vol. 45, pp. 3317-3330. DOI: 10.1007/s13369-020-04384-y.

Li, X., Yang, J., & Ma, J. Recent developments of content-based image retrieval (CBIR). Neurocomputing, 2021, vol. 452, pp. 675-689. DOI: 10.1016/j.neucom.2020.07.139.

Tiwari, V. R. Developments in KD Tree and KNN Searches. International Journal of Computer Applications, 2023, vol. 185, no. 17, pp. 17-23. DOI: 10.5120/ijca2023922879.

Malkov, Y. A., & Yashunin, D. A. Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, vol. 42, no. 4, pp. 824-836. DOI: 10.1109/tpami.2018.2889473.

Liu, J., Zhao, M., & Zhan, C. Deep Representation-Based Fuzzy Graph Model for Content-Based Image Retrieval. International Journal of Fuzzy Systems, 2024, vol. 26, no. 6, pp. 2011-2022. DOI: 10.1007/s40815-024-01682-7.

Jiang, X., & Hu, F. Multi-scale Adaptive Feature Fusion Hashing for Image Retrieval. Arabian Journal for Science and Engineering, 2024, vol. 50, pp. 12027-12036. DOI: 10.1007/s13369-024-09627-w.

Chen, Y., Long, Y., Yang, Z., & Long, J. Unsupervised random walk manifold contrastive hashing for multimedia retrieval. Complex & Intelligent Systems, 2025, vol. 11, no. 4. DOI: 10.1007/s40747-025-01814-y.

Bano, S., & Khan, M. N. A. A Survey of Data Clustering Methods. International Journal of Advanced Science and Technology, 2018, vol. 113, pp. 133-142. DOI: 10.14257/ijast.2018.113.14.

Jégou, H., Douze, M., & Schmid, C. Product Quantization for Nearest Neighbor Search. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011, vol. 33, no. 1, pp. 117-128. DOI: 10.1109/tpami.2010.57.

Babenko, A., & Lempitsky, V. The inverted multi-index. 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 2012, pp. 3069-3076. DOI: 10.1109/cvpr.2012.6248038.

Ge, T., He, K., Ke, Q., & Sun, J. Optimized Product Quantization for Approximate Nearest Neighbor Search. 2013 IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA, 2013, pp. 2946-2953. DOI: 10.1109/cvpr.2013.379.

Ai, L., Cheng, H., Wang, X., Chen, C., Liu, D., Zheng, X., & Wang, Y. Approximate Nearest Neighbor Search Using Enhanced Accumulative Quantization. Electronics, 2022, vol. 11, no. 14. DOI: 10.3390/electronics11142236.

Danylenko, S., & Smelyakov, K. Development of a Multidimensional Data Model for Efficient Content-based Image Retrieval in Big Data Storage. Radioelectronic and Computer Systems, 2025, vol. 2025, no. 1, pp. 137-152. DOI: 10.32620/reks.2025.1.10.

GitHub, facebookresearch/faiss: A library for efficient similarity search and clustering of dense vectors. Available at: https://github.com/facebookresearch/faiss (accessed 09.02.2025).

Beckert, B., Bubel, R., Drodt, D., Hähnle, R., Lanzinger, F., Pfeifer, W., Ulbrich, M., & Weigl, A. The Java Verification Tool KeY: A Tutorial. Lecture Notes in Computer Science, Springer Nature Switzerland, Cham, 2024, pp. 597-623. DOI: 10.1007/978-3-031-71177-0_32.

COCO, Common Objects in Context. Available at: https://cocodataset.org/#home (accessed 09.02.2025).




DOI: https://doi.org/10.32620/reks.2025.3.10