\documentstyle[12pt,a4]{article}

\title{Parallelization Techniques for the Improved Conjugate Gradient Method}

\author{\em Tianruo Yang $\dagger$, Hai-Xiang Lin $\ddagger$, Man Lin
$\dagger$ \\ \em $\dagger$  Department of Computer and Information
Science \\ \em  Link\"oping  University,  S-581 83, Link\"oping, Sweden \\
  \em $\ddagger$ Department of Technical Mathematics and Computer
  Science \\ \em TU Delft, P.O. Box 356, 2600 GA Delft, The Netherlands}
\date{}

\begin{document}
\maketitle

\begin{abstract}
In this paper we study parallelization techniques for an improved
version of the conjugate gradient (ICG) method. The method is
reorganized without changing its numerical stability so that all inner
products and matrix-vector multiplications of a single iteration step
are independent, and the communication time required for the inner
products can be overlapped efficiently with computation.
For large sparse problems, the method is difficult to parallelize
efficiently on massively parallel distributed-memory computers, because
the standard way of parallelizing loops that access data through
indirect addressing is the inspector-executor strategy. Such
approaches incur substantial run-time preprocessing overheads
when multiple levels of indirection are encountered, a frequent
occurrence in large sparse matrix applications. Here we
investigate the performance of the newly proposed sparse-array
rolling (SAR) technique, which significantly reduces these
preprocessing overheads, and discuss its execution support together
with a detailed performance evaluation. Results demonstrating its
significant advantage over the conventional approaches are also
reported.
\end{abstract}
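
For orientation, the dependency that motivates reorganizing the method
can be seen in the standard conjugate gradient iteration for a
symmetric positive definite system $Ax = b$. This is a sketch of the
textbook method only, not of the ICG reorganization itself; the symbols
$x_k$, $r_k$, $p_k$, $\alpha_k$ and $\beta_k$ are the usual textbook
notation and are assumed here. Starting from $r_0 = b - A x_0$ and
$p_0 = r_0$, each step computes
\begin{eqnarray*}
\alpha_k & = & \frac{(r_k, r_k)}{(p_k, A p_k)}, \\
x_{k+1}  & = & x_k + \alpha_k p_k, \\
r_{k+1}  & = & r_k - \alpha_k A p_k, \\
\beta_k  & = & \frac{(r_{k+1}, r_{k+1})}{(r_k, r_k)}, \\
p_{k+1}  & = & r_{k+1} + \beta_k p_k.
\end{eqnarray*}
In this form the inner product $(r_{k+1}, r_{k+1})$ cannot be started
until $r_{k+1}$ is available, which in turn requires $\alpha_k$ and
hence the inner product $(p_k, A p_k)$. On a distributed-memory machine
each inner product is a global reduction, so the two reductions of a
step are separated by a data dependency and cannot simply be merged or
hidden behind the matrix-vector product.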
\end{document}

