FAST SEARCH FOR NEARBY OBJECTS
Abstract
The subject of study. The article proposes a method of fast search for the closest objects in Euclidean space, which is based on the necessary proximity conditions proved for two metrics. The method involves randomly selecting landmarks, projecting objects into a k-dimensional space, and using special hash structures to speed up the search. It has been experimentally proven that the proposed method outperforms known analogs in terms of execution speed. The aim of the work was to develop an efficient algorithm for finding close objects that can be used in intelligent search engines, big data processing technologies, and learning systems for knowledge clustering. Tasks. To develop a method that would allow to quickly find objects that satisfy the proximity conditions in Euclidean space. Prove the theoretical foundations of the method, including the necessary proximity conditions, and experimentally test its effectiveness in comparison with existing approaches. Consider two cases: search for ordered m-ary trees and for strings using the Levenshtein distance. Methods used. The method is based on randomly selecting landmarks, projecting objects into a k-dimensional space, and using hash structures to filter candidates. To evaluate the effectiveness, experiments have been conducted comparing the execution time of the proposed method with the basic and modified analogs. The theoretical foundations include the proof of proximity conditions that reduce the number of computations. Results. Experiments have shown that the proposed method (Nearest Hash) outperforms the known analogs (Basic and Modified) by 1.67-3.80 times in terms of search speed, depending on the size of the dataset. For trees, we use editing metrics that take into account the operations of renaming, deleting, and inserting nodes. For strings, the Levenshtein distance is used. The results confirm that the method effectively eliminates unnecessary computations at the first stage and speeds up the search due to the hash matrix. The scientific novelty of the work lies in the introduction of the necessary proximity conditions that allow to increase the search efficiency. The proposed method can be used in search engines, big data processing, and educational technologies. Further research will be aimed at optimizing the choice of landmarks and expanding the application of the method in practical problems.
Keywords
Full Text:
PDF (Українська)References
Bustos, B., Navarro, G., Chavez, E.: Pivot selection techniques for proximity searching in metric spaces. Pattern Recognition Letters 24(14), 2357-2366 (2003).
Batko, M., Falchi, F., Lucchese, C. and others: Building a web-scale image similarity search system. Multimedia Tools and Applications 3(47), 599 – 629 (2010).
Zezula, P., Amato, G., Dohnal, V., Batko, M.: Similarity search: the metric space approach. Springer, New York (2006).
Chukhray, A. G.: Quick Search Method "similar" relational tuples relations Radioelectronic and Computer Systems 2 (2), 64 – 69 (2003)
Kulik, A., Chukhray, A., Zavgorodniy, A.: Similar strings detecting methods. In: Proceedings of the East-West Fuzzy Colloquium, pp. 38 – 47. IPM, Zittau, Germany (2005)
Selkow, S. M. : The tree-to-tree editing problem. Information Processing Leters 6(6), 184–186 (1977).
Tai, K. C.: The tree-to-tree correction problem. Journal of the ACM 26(3), 422 – 433 (1979).
Zhang, K., Shasha, D.: Simple fast algorithms for the editing distance between trees and related problems. Society for Industrial and AppliedMathematics Journal on Computing 18(6), 1245 – 1262 (1989).
DOI: https://doi.org/10.32620/oikit.2025.105.16
Refbacks
- There are currently no refbacks.