Supervised data extraction from transformer representation of Lambda-terms

Oleksandr Deineha

Abstract


The object of this research is the process of compiler optimization, which is essential in modern software development, particularly for functional programming languages built on Lambda Calculus. Optimization strategies directly affect interpreter and compiler performance, influencing resource efficiency and program execution. Functional programming compilers have received less optimization attention than their object-oriented counterparts, and the complexity of Lambda Calculus poses unique challenges; bridging this gap calls for innovative approaches such as machine learning. This study applies machine learning to extract features from Lambda terms that are relevant to reduction strategies. Previous research has explored several approaches, including analyzing the complexity of reduction steps and using sequence-analysis Artificial Neural Networks (ANNs) over simplified term representations. This research aims to develop a methodology for extracting comprehensive term data and providing insights into optimal reduction priorities by employing Large Language Models (LLMs). The tasks were to generate embeddings of Lambda terms with LLMs, train ANN models to predict the number of reduction steps, and compare the results with those obtained from simplified term representations. The methods combine machine learning algorithms and deep learning models to analyze Lambda Calculus terms and predict optimal reduction paths. The result of this study is a method that uses LLM-generated embeddings and improves the prediction of the number of reduction steps compared with simplified term representations. Conclusions: The findings of this research have significant implications for further advances in compiler and interpreter optimization. By demonstrating the efficacy of LLMs for prioritizing normalization strategies, this study paves the way for future research on compiler efficiency. Applying machine learning to functional programming optimization opens avenues for dynamic optimization strategies and comprehensive analysis of program features.
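
To illustrate the pipeline described in the abstract, the following minimal Python sketch pairs Lambda-term embeddings with an ANN regressor that predicts the number of reduction steps. The embedding function, toy terms, step counts, network shape, and metric are assumptions made for the sketch only; the study's actual LLM embedders, data, and training setup may differ.

    # Minimal sketch, assuming a generic embedding function and a toy dataset;
    # the study's actual LLM embedding models and hyperparameters may differ.
    import numpy as np
    from sklearn.neural_network import MLPRegressor
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import mean_absolute_error

    def embed_terms(terms):
        # Hypothetical placeholder for an LLM embedding call (e.g., a text
        # embedding endpoint); random fixed-size vectors keep the sketch
        # self-contained and runnable.
        rng = np.random.default_rng(0)
        return rng.normal(size=(len(terms), 768))

    # Toy data: Lambda terms (as strings) with illustrative reduction-step counts.
    terms = [r"(\x.x) y", r"(\x.x x) (\y.y)", r"(\f.\x.f (f x)) (\z.z) w", r"(\x.\y.x) a b"]
    steps = np.array([1, 2, 4, 2])

    X = embed_terms(terms)
    X_train, X_test, y_train, y_test = train_test_split(X, steps, test_size=0.25, random_state=0)

    # ANN regressor: predicts the number of reduction steps from term embeddings.
    model = MLPRegressor(hidden_layer_sizes=(128, 64), max_iter=1000, random_state=0)
    model.fit(X_train, y_train)
    print("MAE on held-out terms:", mean_absolute_error(y_test, model.predict(X_test)))

In the study itself, the embeddings are produced by pre-trained LLM embedding models (see the references below), and the comparison baseline is a simplified term representation rather than the random placeholder used here.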


Keywords


Lambda Calculus; functional programming language; strategy optimization; Large Language Model; code embeddings.


References


Tanabe, Y., Lubis, L. A., Aotani, T., & Masuhara, H. A Functional Programming Language with Versions. The Art, Science, and Engineering of Programming, 2021, vol. 6, iss. 1, article no. 5. DOI: 10.22152/programming-journal.org/2022/6/5.

Deineha, O., Donets, V., & Zholtkevych, G. On Randomization of Reduction Strategies for Typeless Lambda Calculus. Communications in Computer and Information Science, 2023, no. 1980, pp. 25–38. Available at: https://icteri.org/icteri-2023/proceedings/preview/01000021.pdf (accessed 08.03.2024).

Deineha, O., Donets, V., & Zholtkevych, G. Estimating Lambda-Term Reduction Complexity with Regression Methods. International Conference "Information Technology and Interactions", 2023, vol. 3624, pp. 147–156. Available at: https://ceur-ws.org/Vol-3624/Paper_13.pdf (accessed 08.03.2024).

Runciman, C., & Wakeling, D. Heap Profiling of a Lazy Functional Compiler. Functional Programming, 1992, pp. 203–214. DOI: 10.1007/978-1-4471-3215-8_18.

Chlipala, A. An optimizing compiler for a purely functional web-application language. Proceedings of the 20th ACM SIGPLAN International Conference on Functional Programming, 2015, pp. 10-21. DOI: 10.1145/2784731.2784741.

Deineha, O., Donets, V., & Zholtkevych, G. Unsupervised Data Extraction from Transformer Representation of Lambda-Terms. Eastern-European Journal of Enterprise Technologies, 2024. In press.

Grabmayer, C. Linear Depth Increase of Lambda Terms along Leftmost-Outermost Beta-Reduction. ArXiv: Computer Science, 2019. DOI: 10.48550/arXiv.1604.07030.

Asada, K., Kobayashi, N., Sin'ya, R., & Tsukada, T. Almost Every Simply Typed Lambda-Term Has a Long Beta-Reduction Sequence. Logical Methods in Computer Science, 2019, vol. 15, iss. 1. DOI: 10.23638/LMCS-15(1:16)2019.

Lago, U. D., & Vanoni, G. On randomised strategies in the λ-calculus. Theoretical Computer Science, 2020, vol. 813, pp. 100-116. DOI: 10.1016/j.tcs.2019.09.033.

Qi, X. Reduction Strategies in Lambda Term Normalization and their Effects on Heap Usage. ArXiv: Computer Science, 2004. Available at: https://arxiv.org/abs/cs/0405075 (accessed 08.03.2024).

Deineha, O., Donets, V., & Zholtkevych, G. Deep Learning Models for Estimating Number of Lambda-Term Reduction Steps. 3rd International Workshop of IT-professionals on Artificial Intelligence (ProfIT AI 2023), 2023, vol. 3641, pp. 147-156. Available at: https://ceur-ws.org/Vol-3641/paper12.pdf (accessed 08.03.2024).

Yang, Z., Ding, M., Lv, Q., Jiang, Z., He, Z., Guo, Y., Bai, J., & Tang, J. GPT Can Solve Mathematical Problems Without a Calculator. ArXiv: Computer science, Machine Learning, 2023. DOI: 10.48550/arXiv.2309.03241.

Liu, C., Lu, S., Chen, W., Jiang, D., Svyatkovskiy, A., Fu, S., Sundaresan, N., & Duan, N. Code Execution with Pre-trained Language Models. Findings of the Association for Computational Linguistics: ACL 2023, 2023, pp. 4984-4999. DOI: 10.48550/arXiv.2305.05383.

Cummins, C., Seeker, V., Grubisic, D., Elhoushi, M., Liang, Y., Roziere, B., Gehring, J., Gloeckle, F., Hazelwood, K., Synnaeve, G., & Leather, H. Large Language Models for Compiler Optimization. ArXiv: Computer science, 2023. DOI: 10.48550/arXiv.2309.07062.

Miranda, B., Shinnar, A., Pestun, V., & Trager, B. Transformer Models for Type Inference in the Simply Typed Lambda Calculus: A Case Study in Deep Learning for Code. ArXiv: Computer Science, 2023. DOI: 10.48550/arXiv.2304.10500.

LeCun, Y., Bengio, Y., & Hinton, G. Deep Learning. Nature, 2015, vol. 521, pp. 436-444. DOI: 10.1038/nature14539.

Li, Y., Zhang, Y., & Sun, L. MetaAgents: Simulating Interactions of Human Behaviors for LLM-based Task-oriented Coordination via Collaborative Generative Agents. ArXiv, 2023. DOI: 10.48550/arXiv.2310.06500.

Asudani, D. S., Nagwani, N. K., & Singh, P. Impact of word embedding models on text analytics in deep learning environment: a review. Artificial Intelligence Review, 2023, vol. 56, pp. 10345-10425. DOI: 10.1007/s10462-023-10419-1.

Turing, A. M. Computability and λ-Definability. The Journal of Symbolic Logic, 1937, vol. 2, iss. 4, pp. 153-163. DOI: 10.2307/2268280.

Vaswani, A., Shazeer, N. M., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. Attention Is All You Need. Neural Information Processing Systems, 2017, vol. 30. DOI: 10.48550/arXiv.1706.03762.

Zhao, W.X., Zhou, K., Li, J., Tang, T., Wang, X., Hou, Y., Min, Y., Zhang, B., Zhang, J., Dong, Z., Du, Y., Yang, C., Chen, Y., Chen, Z., Jiang, J., Ren, R., Li, Y., Tang, X., Liu, Z., Liu, P., Nie, J., & Wen, J. A Survey of Large Language Models. ArXiv: Computer science, 2023. DOI: 10.48550/arXiv.2303.18223.

Ormerod, M., Del Rincón, M. J., & Devereux, B. How is a “Kitchen Chair” like a “Farm Horse”? Exploring the Representation of Noun-Noun Compound Semantics in Transformer-based Language Models. Computational Linguistics, 2024, vol. 50, iss. 1, pp. 49-81. DOI: 10.1162/coli_a_00495.

Kowsari, K., Meimandi, K. J., Heidarysafa, M., Mendu, S., Barnes, L., & Brown, D. Text Classification Algorithms: A Survey. Information, 2019, vol. 10, iss. 4, article no. 150. DOI: 10.3390/info10040150.

Compare the different Sequence models (RNN, LSTM, GRU, and Transformers). Machine Learning Resources, March 2024. Available at: https://aiml.com/compare-the-different-sequence-models-rnn-lstm-gru-and-transformers/ (accessed 08.03.2024).

Feng, Z., Guo, D., Tang, D., Duan, N., Feng, X., Gong, M., Shou, L., Qin, B., Liu, T., Jiang, D., & Zhou, M. CodeBERT: A Pre-Trained Model for Programming and Natural Languages. Findings of the Association for Computational Linguistics: EMNLP 2020, 2020, pp. 1536-1547. DOI: 10.18653/v1/2020.findings-emnlp.139.

New and improved embedding model. OpenAI Blog, Dec. 2022. Available at: https://openai.com/blog/new-and-improved-embedding-model (accessed 08.03.2024).

Nussbaum, Z., Morris, J.X., Duderstadt, B., & Mulyar, A. Nomic Embed: Training a Reproducible Long Context Text Embedder. ArXiv: Computer science, 2024. DOI: 10.48550/arXiv.2402.01613.

New embedding models and API updates. OpenAI Blog, Jan. 2024. Available at: https://openai.com/blog/new-embedding-models-and-api-updates (accessed 08.03.2024).

Krell, M. M., & Wehbe, B. A First Step Towards Distribution Invariant Regression Metrics. ArXiv: Computer science, 2020. DOI: 10.48550/arXiv.2009.05176.




DOI: https://doi.org/10.32620/reks.2024.2.02
