The impact of the number representation format in parameterized text queries on the accuracy of 3D model generation

Olesia Barkovska, Liubov Bukharova, Igor Ruban, Oleksii Liashenko, Andriy Kovalenko, Vitalii Serdechnyi

Abstract


The subject matter of the article is the analysis of artificial intelligence models that generate 3D objects from text queries in which numerical parameters are represented in different formats. The goal of the work is to evaluate and compare the effectiveness of the Hunyuan3D 2.0, Michelangelo and SDFusion models depending on the format of the input text queries, using the following metrics: generation time, CLIP-Similarity and Chamfer Distance. The tasks to be solved are: 1) to conduct a systematic review of modern generative 3D models regarding their ability to process queries with numerical parameters presented in different formats (decimals, verbal representations, fractions); 2) to conduct a quantitative and qualitative evaluation of generative 3D models using the metrics of generation time, CLIP-Similarity and Chamfer Distance, taking into account the visual similarity to text descriptions and the geometric similarity to Ground Truth objects; 3) to analyse the evaluation results of generative text-to-3D models on parameterised 3D object generation tasks for further implementation in user interface applications. The methods used in this work are machine learning methods, methods of vector representation of text and images, statistical methods of evaluating results, and a heuristic method of forming text queries. The results show that no universal generative 3D model is capable of creating objects that fully correspond to a parameterised text query. The proposed methodology enables the formation of input text sequences containing numerical representations such as decimals, verbal representations or fractions, for further analysis of the accuracy of object generation. Conclusions. The study confirmed similar efficiency of training 3D generative models in the joint latent space of text features containing numerical parameters and using datasets with precisely defined geometric characteristics of objects. The results also show that the Hunyuan3D 2.0 model is suitable for further research and modification, adapting its methods to create personalised 3D objects with given numerical parameters, such as a prosthetic cover.
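The formation of input queries in the three number formats named in the abstract (decimal, verbal, fraction) can be illustrated with a minimal Python sketch. It is not the authors' heuristic method: the template text, the unit, the function names and the 0..99 limit on spelled-out integers are all assumptions made for illustration.

```python
from fractions import Fraction

# Word tables for the hypothetical verbal format (integers 0..99 only).
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def int_to_words(n: int) -> str:
    """Spell out an integer in the range 0..99."""
    if not 0 <= n <= 99:
        raise ValueError("this sketch only spells out 0..99")
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def as_verbal(value: str) -> str:
    """'0.75' -> 'zero point seven five' (digits after the point read one by one)."""
    whole, _, frac = value.partition(".")
    words = int_to_words(int(whole))
    if frac:
        words += " point " + " ".join(ONES[int(d)] for d in frac)
    return words


def as_fraction(value: str) -> str:
    """'0.75' -> '3/4'; whole numbers are returned without a denominator."""
    f = Fraction(value)  # exact conversion from the decimal string
    return str(f.numerator) if f.denominator == 1 else f"{f.numerator}/{f.denominator}"


def query_variants(template: str, value: str) -> dict[str, str]:
    """Fill one template with the same value in three number formats."""
    return {
        "decimal": template.format(x=value),
        "verbal": template.format(x=as_verbal(value)),
        "fraction": template.format(x=as_fraction(value)),
    }


# Hypothetical query template; not taken from the article.
variants = query_variants("a table with legs {x} m high", "0.75")
for name, text in variants.items():
    print(f"{name}: {text}")
```

Each of the three resulting strings would then be passed to the model under test, so that generation time, CLIP-Similarity and Chamfer Distance can be compared across formats for the same underlying parameter value.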

Keywords


GenAI; similarity; precision; parameters; model; Chamfer Distance; CLIP-Similarity; text-to-3D

DOI: https://doi.org/10.32620/reks.2025.3.02
