\documentstyle[12pt,a4]{article} \include{/home/tiaya/latex/yanglic/defs} \title{The Improved Biconjugate Gradient Method on Massively Parallel Distributed Memory Computers} \author{\em Tianruo Yang $\dagger$, Hai-Xiang Lin$\ddagger$ \\ \em $\dagger$ Department of Computer and Information Science \\ \em Link\"oping University, S-581 83, Link\"oping, Sweden \\ \em $\ddagger$ Department of Technical Mathematics and Computer Science \\ \em TU Delft, P.O. Box 356, 2600 GA Delft, The Netherlands} \date{} \begin{document} \maketitle \begin{abstract} For the solutions of linear systems of equations with unsymmetric coefficient sparse matrices, we propose an improved version of the biconjugate gradient (IBiCG) method by using the Lanczos process as a major component combining elements of numerical stability and parallel algorithm design. For Lanczos process, stability is obtained by a coupled two-term procedure that generates Lanczos vectors scaled to unit length. The algorithm, which is shown to be mathematically equivalent to the original biconjugate gradient method, is derived such that all inner products and matrix-vector multiplications of a single iteration step are independent and communication time required for inner product can be overlapped efficiently with computation time. Therefore, the cost of global communication on parallel distributed memory computers can be significantly reduced. The resulting IBiCG algorithm maintains the favorable properties of the Lanczos process while not increasing computational costs. Additionally, the data distribution and the communication scheme for the sparse operations are designed particularly based on the analysis of the non-zero elements which are crucial for efficient execution. The efficiency of this method is demonstrated by numerical experimental results with regards to both scaling behavior and execution time from real applications carried out on a massively parallel distributed memory computer, the Parsytec GC/PowerPlus. \end{abstract} \end{document}