Sparse Computation Data Dependence Simplification for Efficient Compiler-Generated Inspectors
This paper presents a combined compile-time and runtime loop-carried dependence analysis of sparse matrix codes and evaluates its performance in the context of wavefront parallelism. Sparse computations incorporate indirect memory accesses such as x[col[j]] whose memory locations cannot be determined until runtime. The key contributions of this paper are two compile-time techniques for significantly reducing the overhead of runtime dependence testing: (1) identifying new equality constraints that result in more efficient runtime inspectors, and (2) identifying subset relations between dependence constraints such that one dependence test subsumes another, which can therefore be eliminated. The discovery of new equality constraints is enabled by domain-specific knowledge about index arrays, such as col[j]. These simplifications lead to automatically generated inspectors that make it practical to parallelize such computations. We analyze our simplification methods for a collection of seven sparse computations. The evaluation shows that our methods significantly reduce the complexity of the runtime inspectors. Experimental results for a collection of five large matrices show parallel speedups ranging from 2x to more than 8x on an 8-core CPU.
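To make the terms "runtime inspector" and "wavefront parallelism" concrete, the following is a minimal sketch of a hand-written inspector for forward substitution on a lower-triangular matrix stored in CSR form. It is an illustration of the general idea only, not the compiler-generated code described in the paper. The names `wavefront_levels`, `group_by_level`, `rowptr`, and the example matrix are assumptions introduced here; only `col` appears in the abstract.

```python
def wavefront_levels(rowptr, col):
    """Inspector: scan the index arrays at runtime and assign each row
    of a lower-triangular CSR matrix to a wavefront (level)."""
    n = len(rowptr) - 1
    level = [0] * n
    for i in range(n):
        for k in range(rowptr[i], rowptr[i + 1]):
            j = col[k]
            if j < i:
                # Row i reads x[col[k]] = x[j], which row j writes,
                # so row i must run in a later wavefront than row j.
                level[i] = max(level[i], level[j] + 1)
    return level


def group_by_level(level):
    """Group row indices by wavefront; rows in one group are independent."""
    waves = [[] for _ in range(max(level, default=-1) + 1)]
    for i, l in enumerate(level):
        waves[l].append(i)
    return waves


# Hypothetical 4x4 lower-triangular pattern (column indices per row):
# row 0: [0]   row 1: [1]   row 2: [0, 2]   row 3: [1, 2, 3]
rowptr = [0, 1, 2, 4, 7]
col = [0, 1, 0, 2, 1, 2, 3]

levels = wavefront_levels(rowptr, col)
waves = group_by_level(levels)
print(levels)  # [0, 0, 1, 2]
print(waves)   # [[0, 1], [2], [3]]
```

An executor would then process the wavefronts in order, running the rows within each wavefront in parallel. The inspector above visits every nonzero once; the simplifications the paper describes aim to reduce the cost of inspectors of this kind when they are derived automatically from dependence constraints.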
Mohammadi, Mahdi Soltan; Yuki, Tomofumi; Cheshmi, Kazem; Davis, Eddie C.; Hall, Mary; Dehnavi, Maryam Mehri; . . . and Mills Strout, Michelle. (2019). "Sparse Computation Data Dependence Simplification for Efficient Compiler-Generated Inspectors". In K.S. McKinley and K. Fisher (Eds.), PLDI 2019: Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation (pp. 594-609). Association for Computing Machinery. https://doi.org/10.1145/3314221.3314646