Performance prediction for chosen types of loops over one-dimensional arrays with embedding-driven intermediate representations analysis

 pdf (1431K)

The method for mapping of intermediate representations (IR) set of C, C++ programs to vector embedding space is considered to create an empirical estimation framework for static performance prediction using LLVM compiler infrastructure. The usage of embeddings makes programs easier to compare due to avoiding Control Flow Graphs (CFG) and Data Flow Graphs (DFG) direct comparison. This method is based on transformation series of the initial IR such as: instrumentation — injection of artificial instructions in an instrumentation compiler’s pass depending on load offset delta in the current instruction compared to the previous one, mapping of instrumented IR into multidimensional vector with IR2Vec and dimension reduction with t-SNE (t-distributed stochastic neighbor embedding) method. The D1 cache miss ratio measured with perf stat tool is considered as performance metric. A heuristic criterion of programs having more or less cache miss ratio is given. This criterion is based on embeddings of programs in 2D-space. The instrumentation compiler’s pass developed in this work is described: how it generates and injects artificial instructions into IR within the used memory model. The software pipeline that implements the performance estimation based on LLVM compiler infrastructure is given. Computational experiments are performed on synthetic tests which are the sets of programs with the same CFGs but with different sequences of offsets used when accessing the one-dimensional array of a given size. The correlation coefficient between performance metric and distance to the worst program’s embedding is measured and proved to be negative regardless of t-SNE initialization. This fact proves the heuristic criterion to be true. The process of such synthetic tests generation is also considered. Moreover, the variety of performance metric in programs set in such a test is proposed as a metric to be improved with exploration of more tests generators.

Keywords: mathematical modeling, compilers, intermediate representation, embeddings, performance analysis, static analysis
Citation in English: Zavodskikh R.K., Efanov N.N. Performance prediction for chosen types of loops over one-dimensional arrays with embedding-driven intermediate representations analysis // Computer Research and Modeling, 2023, vol. 15, no. 1, pp. 211-224
Citation in English: Zavodskikh R.K., Efanov N.N. Performance prediction for chosen types of loops over one-dimensional arrays with embedding-driven intermediate representations analysis // Computer Research and Modeling, 2023, vol. 15, no. 1, pp. 211-224
DOI: 10.20537/2076-7633-2023-15-1-211-224

Indexed in Scopus

Full-text version of the journal is also available on the web site of the scientific electronic library eLIBRARY.RU

The journal is included in the Russian Science Citation Index

The journal is included in the RSCI

International Interdisciplinary Conference "Mathematics. Computing. Education"