\documentstyle[12pt,a4]{article}

\title{Parallelization Techniques for the Improved Conjugate Gradient Method}

\author{\em Tianruo Yang $\dagger$, Hai-Xiang Lin $\ddagger$, Man Lin $\dagger$ \\
\em $\dagger$ Department of Computer and Information Science \\
\em Link\"oping University, S-581 83, Link\"oping, Sweden \\
\em $\ddagger$ Department of Technical Mathematics and Computer Science \\
\em TU Delft, P.O. Box 356, 2600 GA Delft, The Netherlands}

\date{}

\begin{document}

\maketitle

\begin{abstract}
In this paper, we study parallelization techniques for the improved
conjugate gradient (ICG) method. The ICG method reorganizes the conjugate
gradient algorithm without changing its numerical stability, so that all
inner products and matrix-vector multiplications of a single iteration step
are independent and the communication time required for the inner products
can be overlapped efficiently with computation time. For large, sparse
applications, the method is difficult to parallelize efficiently on
massively parallel distributed memory computers, because the data are often
accessed through indirect addressing and the standard approach to
parallelizing such loops is the inspector-executor strategy. Such approaches
incur substantial execution-time preprocessing overheads when multiple
levels of indirection are encountered, a frequent occurrence in large,
sparse matrix-based applications. Here we investigate the performance of the
newly proposed sparse-array rolling (SAR) technique, which significantly
reduces these preprocessing overheads, and discuss its execution support,
accompanied by a detailed performance evaluation. Results demonstrating its
significant advantage over the conventional approaches are reported as well.
\end{abstract}

\end{document}