Preconditioning Large Indefinite Linear Systems

After briefly recalling some relevant approaches for preconditioning large symmetric linear systems, we describe a novel class of preconditioners. Our proposal is tailored to large indefinite linear systems, which arise very frequently in many contexts of numerical analysis and nonlinear optimization. Our preconditioners are built as a by-product of the Krylov subspace method used to solve the system. We describe theoretical properties of the proposed class of preconditioners, namely their capability of both shifting some eigenvalues of the system matrix to controlled values and reducing the modulus of the others. The results of a numerical experimentation give evidence of the performance of our proposal.


Introduction
The efficient solution of large linear systems (or of a sequence of slowly varying linear systems) is of fundamental importance in many contexts of numerical analysis and nonlinear optimization. In this paper we first recall a few relevant approaches for preconditioning large indefinite linear systems. Observe that in many contexts of numerical analysis (see e.g. http://math.nist.gov/MatrixMarket) and nonlinear optimization, the efficient iterative solution of linear systems and of sequences of linear systems is sought. Truncated Newton methods in unconstrained optimization, KKT systems arising in constrained optimization, interior point methods, and PDE-constrained optimization are just some examples from optimization.
We first show that using information from quasi-Newton updates may often provide effective preconditioners. The latter are sometimes endowed with theoretical properties related to the spectrum and the condition number of the preconditioned matrix. Then, we describe a novel class of preconditioners for the solution of large indefinite linear systems, without assuming any sparsity pattern of the system matrix.
In particular, the class of preconditioners we propose uses information collected from Krylov subspace methods in order to assess the structural properties of the system matrix. We iteratively construct our preconditioners either indirectly, using (but not performing) a factorization of the system matrix (see, e.g., (Fasano and Roma, 2007; Golub and Van Loan, 1996; Stoer, 1983)) obtained as a by-product of Krylov subspace methods, or by computing the Jordan canonical form of a very small matrix. We describe our preconditioners for a general Krylov subspace method; then we prove theoretical properties for such preconditioners, and describe results indicating how to select the parameters which characterize their definition.
The basic idea of our approach is to apply a Krylov subspace method to generate a positive definite approximation of the inverse of the system matrix. The latter is then used to build our preconditioners, which require the storage of just a few vectors and no products of matrices. We assume that the entries of the system matrix are not known, and that the information necessary to build the preconditioner is gained through a routine which computes the product of the system matrix times a vector.
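As a minimal illustration of this matrix-free setting (the function names and the hidden matrix below are our own, not from the paper), the only access to A is a routine computing the product Av:

```python
import numpy as np

def make_matvec(n, seed=0):
    """Hypothetical oracle: returns a routine computing A @ v, without
    exposing the entries of the (hidden, symmetric, indefinite) matrix A."""
    rng = np.random.default_rng(seed)
    B = rng.standard_normal((n, n))
    A = (B + B.T) / 2.0          # symmetric, in general indefinite
    return lambda v: A @ v

matvec = make_matvec(50)
w = matvec(np.ones(50))          # the only operation a preconditioner builder may use
```

Note that symmetry of the hidden matrix can still be verified through the oracle alone, since ⟨u, Av⟩ = ⟨Au, v⟩ for all u, v.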
In the paper (Fasano and Roma, 2011b) we test our preconditioners in both numerical analysis and nonconvex optimization frameworks. In particular, we test our proposal on significant linear systems from the literature. Then we focus on the so-called Newton-Krylov methods, also known as truncated Newton methods (see (Nash, 2000) for a survey). In these contexts, both positive definite and indefinite linear systems are considered.
We recall that in case the optimization problem at hand is nonconvex, i.e. the Hessian matrix of the objective function is possibly indefinite with at least one negative eigenvalue, the solution of Newton's equation within truncated Newton schemes may require some care. Indeed, the Krylov subspace method used to solve Newton's equation should be suitably applied, considering that optimization frameworks require the computation of descent directions, which have to satisfy additional properties (Dennis and Schnabel, 1983; Nocedal and Wright, 2000). In this regard, our proposal provides a tool to preserve the latter properties.
The paper is organized as follows. In Section 2 we briefly recall relevant approaches from the literature. Then, in Section 3 we describe our class of preconditioners for large indefinite linear systems, by using a general Krylov subspace method. We detail some theoretical properties of our proposal, along with some hints on its calculation. Finally, a section of conclusions and future work completes the paper.
Regarding the notation, for an n × n real matrix M we denote by σ[M] the spectrum of M; I_k is the identity matrix of order k, and we use the capital letter T to indicate a tridiagonal matrix. Finally, with C ≻ 0 we indicate that the matrix C is positive definite, and ‖·‖ denotes the Euclidean norm.

Some approaches for preconditioning large symmetric systems
Let us consider the following linear system

Ax = b,    x, b ∈ ℝⁿ,    (1)

where A ∈ ℝ^(n×n) is symmetric and nonsingular, and n is large. We assume that the structure of the matrix A is unknown, as is its sparsity pattern. We recall that in case of special structures of the matrix A, suitable preconditioners may be built for solving (1) (see (Higham, 2002)).
As is well known, the main rationale behind using a preconditioner to solve the linear system (1) consists in introducing a nonsingular matrix M, such that solving

MAx = Mb    (2)

is possibly simpler, in some sense, than solving (1). Of course, to the latter purpose the extreme choices for M are M = I and M = A⁻¹ (the latter being the ideal choice). In most cases (always when n is large) there is no chance to compute A⁻¹ (or computing A⁻¹ is no cheaper than solving (1)). Notwithstanding this difficulty, the preconditioner M can be chosen according to the following alternative guidelines:
• the linear system (2) should be easy to solve thanks to the structure of the matrix M (e.g. preconditioners for linear systems from PDE discretizations often have a suitable block structure which is suggested by the problem at hand);
• the condition number κ(MA) should be relatively small. The latter fact may be helpful when attempting to solve the preconditioned system (2) with a technique sensitive to κ(MA) (e.g. the Krylov subspace methods);
• the eigenvalues in the spectrum σ[MA] should be as clustered as possible (see e.g. (Nocedal and Wright, 2000)). The latter fact may again be helpful whenever Krylov subspace methods are adopted to solve (2).
Since we want to deal with the large scale case, without any assumption on the sparsity pattern of A, the main approaches in the literature for building and applying a preconditioner to (1) often gain information on the system matrix by computing the matrix-vector product Av, with v ∈ ℝⁿ, or rely on numerical differentiation. In particular, among the approaches proposed we have the following:
• Approximate Inverse Preconditioners based on the BFGS update method (see also (Nash, 1985; Benzi et al., 2000), and references therein). Here, the main idea is that a BFGS update may be suitably used to compute an approximate inverse A# of the matrix A. Then, the matrix A# is applied as a preconditioner.
• Preconditioners based on the L-BFGS method (see also (Morales and Nocedal, 2000; 2001), and the class of Limited Memory Preconditioners (LMPs) by Giraud and Gratton (2006)), which pursue an idea similar to the Approximate Inverse Preconditioners in the previous item.
• Approximate Inverse Preconditioners based on the use of Krylov subspace methods (see also (Simoncini and Szyld, 2007; Fasano and Roma, 2009)), where a Krylov subspace method is used to determine the solution of (1) and to provide information in order to build a preconditioner for (1).
• Band Preconditioners based on matrix scaling/balancing (see also (Roma, 2005)).
• Preconditioners based on numerical differentiation (see e.g. (Higham, 2002)).
• Band Preconditioners based on the BFGS method (see e.g. (Luksan et al., 2010)), where a BFGS update is partially modified, so that suitable band preconditioners are defined for linear systems. This approach was mainly tested within truncated Newton frameworks.
For each of the preconditioning strategies mentioned above, both a theoretical analysis and a numerical experience are available for validation. In this paper we want to follow and generalize the approach proposed in Fasano and Roma (2009), where at any step an iterative Krylov subspace method is used to compute the approximate inverse A# over a subspace of increasing dimension. Note that, as detailed in Section 3, our approach also encompasses diagonal and block-diagonal preconditioners.
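A minimal sketch of the BFGS-based approximate inverse idea in the first item above (our own illustration, with invented names; the cited papers give the actual schemes): each pair (s, y) with y = As refines an approximation H ≈ A⁻¹ through the standard inverse-BFGS update, and the final H plays the role of the preconditioner A#.

```python
import numpy as np

def bfgs_inverse_update(H, s, y):
    """One inverse-BFGS update; by the secant condition, H_new @ y = s."""
    rho = 1.0 / (y @ s)
    I = np.eye(len(s))
    V = I - rho * np.outer(s, y)
    return V @ H @ V.T + rho * np.outer(s, s)

rng = np.random.default_rng(0)
n = 30
B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)      # SPD, so that y @ s > 0 for every pair
H = np.eye(n)                    # initial approximation of A^{-1}
for _ in range(60):
    s = rng.standard_normal(n)
    H = bfgs_inverse_update(H, s, A @ s)
# H is now used as the preconditioner A# for A x = b
```

The update preserves symmetry and, since y @ s > 0 for SPD A, positive definiteness of H; in the indefinite case extra safeguards are needed, which is one motivation for the band-type modifications cited above.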

Our class of preconditioners
In this section we first introduce some preliminaries, then we propose our class of preconditioners. Consider the indefinite linear system

Ax = b,    (3)

where A ∈ ℝ^(n×n) is symmetric, n is large and b ∈ ℝⁿ. Suppose any Krylov subspace method is used for the solution of (3), e.g. the Lanczos process or the CG method (Golub and Van Loan, 1996) (but MINRES (Saad, 2003) or planar-CG methods (Hestenes, 1980; Fasano, 2005) may also be alternative choices). They are equivalent as long as A ≻ 0, whereas the CG, though cheaper, in principle may not cope with the indefinite case. In the next Assumption 3.1 we suppose that a finite number of steps, say h ≤ n, of the adopted Krylov subspace method are performed.
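To fix ideas, the following sketch (our own code; the variable names are assumptions, but the factorization is the standard one behind relations (4)-(5)) performs h Lanczos steps, returning an orthonormal basis V_h of the Krylov subspace and the tridiagonal matrix T_h = V_hᵀ A V_h, using only matrix-vector products with A:

```python
import numpy as np

def lanczos(matvec, b, h):
    """h steps of the Lanczos process: V has orthonormal columns and
    T = V.T @ A @ V is tridiagonal (full reorthogonalization for stability)."""
    n = b.shape[0]
    V = np.zeros((n, h))
    alpha = np.zeros(h)
    beta = np.zeros(h)
    v = b / np.linalg.norm(b)
    for j in range(h):
        V[:, j] = v
        w = matvec(v)
        alpha[j] = v @ w
        # orthogonalize against all previous Lanczos vectors
        w = w - V[:, :j + 1] @ (V[:, :j + 1].T @ w)
        beta[j] = np.linalg.norm(w)
        if beta[j] < 1e-12:          # breakdown: invariant subspace found
            break
        v = w / beta[j]
    T = np.diag(alpha) + np.diag(beta[:h - 1], 1) + np.diag(beta[:h - 1], -1)
    return V, T

rng = np.random.default_rng(2)
n, h = 40, 10
B = rng.standard_normal((n, n))
A = (B + B.T) / 2.0                  # symmetric indefinite test matrix
V, T = lanczos(lambda x: A @ x, rng.standard_normal(n), h)
```

Irreducibility of T_h in Assumption 3.1 corresponds to the off-diagonal entries beta being nonzero, i.e. no breakdown occurring before step h.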
In particular, Assumption 3.1 requires that the tridiagonal matrix T_h generated by the Krylov subspace method, satisfying relation (4), is irreducible and nonsingular, with eigenvalues λ₁, ..., λ_h not all coincident.
Remark 3.1 Note that most of the common Krylov subspace methods for the solution of symmetric linear systems may easily satisfy Assumption 3.1 at iteration h (Saad, 2003; Stoer, 1983; see also (Fasano and Roma, 2007; Baglama et al., 1998)). In (Fasano and Roma, 2009) we consider a special case where the request (5) on T_h may be considerably weakened under mild assumptions. Moreover, in the paper (Fasano and Roma, 2011b) we also prove that for a special choice of the parameter 'a' used in our class of preconditioners (see below), strong theoretical properties may be stated.
On the basis of the latter assumption, we can now define our preconditioners and show their properties. To this aim, considering for the matrix T_h the expression (5), we define our preconditioners (see also (Gill et al., 1992)); for their computation it suffices to use (13). Finally, a proper setting of the parameter 'a' allows one to easily control the condition number of the matrix in (12).
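Since the formulas (8)-(9) and (12)-(13) are not reproduced in this excerpt, the following toy only illustrates the kind of spectral effect claimed for the class (our own construction, under the assumption that the preconditioner inverts |A| on a Krylov-type subspace and acts as the identity on its orthogonal complement; for exactness we use an invariant subspace in place of a genuine Krylov basis):

```python
import numpy as np

rng = np.random.default_rng(3)
n, h = 60, 8
B = rng.standard_normal((n, n))
A = (B + B.T) / 2.0                        # symmetric indefinite

# Invariant subspace of the h eigenvalues of largest modulus (a stand-in
# for the subspace produced by h Krylov iterations).
lam, U = np.linalg.eigh(A)
idx = np.argsort(-np.abs(lam))[:h]
V, d = U[:, idx], lam[idx]

# Invert |A| on span(V), identity on the orthogonal complement.
M = V @ np.diag(1.0 / np.abs(d)) @ V.T + (np.eye(n) - V @ V.T)

mu = np.linalg.eigvalsh(M @ A)             # M @ A is symmetric here
```

In this idealized setting the h eigenvalues of largest modulus are shifted exactly to ±1, while the remaining ones are left unchanged: precisely the shifting and modulus-reduction behavior stated for the class.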

Preliminary numerical results
In order to preliminarily test our proposal in a general framework, without any assumption on the sparsity pattern of A, we first considered a set of symmetric linear systems as in (3), where the number of unknowns is set as n = 1000 and the matrix A has a moderate condition number. We simply wanted to observe how our class of preconditioners affects the condition number of A. In particular (see also (Geman, 1980)), a possible choice for the latter class of matrices is given by (15), and the vector b in (3) is computed randomly, with entries uniformly distributed in [-10, 10]. We computed the preconditioners (8)-(9) by using the Conjugate Gradient (CG) method (Higham, 2002), which is one of the most popular Krylov subspace iterative methods to solve (3) (Golub and Van Loan, 1996). We remark that the CG is also often used in case the matrix A is indefinite, though it can prematurely stop. As an alternative choice, in order to satisfy Assumption 3.1 with A indefinite, we can use the Lanczos (1950) process, the MINRES method (Paige and Saunders, 1975) or planar-CG methods (Fasano, 2005). In (8) we set the parameter h in the range h ∈ {20, 30, 40, 50, 60, 70, 80, 90}, and we preliminarily chose a = 0 (though other choices of the parameter 'a' yield similar results), which satisfied items a) and c) of Theorem 3.1. We generated several matrices like (15), obtaining very similar results. In particular, the results for one instance of A as in (15) are reported in Figure 2. The latter property is an appealing result, as described in Section 1, since the eigenvalues of the preconditioned matrix will be 'more clustered'. This phenomenon was better investigated by introducing other sets of test problems, described hereafter: we computed the preconditioners (8)-(9) by using the CG, setting the starting point x₀ so that the initial residual b - Ax₀ was a linear combination (with coefficients randomly chosen in {-1, +1}) of all the n eigenvectors of A.
We strongly highlight that the latter choice of x₀ is expected to be unfavorable when applying the CG in order to build our preconditioners. In this case the CG method is indeed expected to perform exactly n iterations before stopping (see also (Nocedal and Wright, 2000; Saad, 2003)), so that the matrices in (4.2) may be significant to test the effectiveness of our preconditioners for small values of h (broadly speaking, a small h implies that the preconditioner contains correspondingly little information on the inverse matrix), and to verify again how the preconditioners (8) are able to cluster the eigenvalues of A. Following the guidelines in (Morales and Nocedal, 2001), in order to test our proposal also on a different range of values for the parameter h, we set h ∈ {4, 8, 12, 16, 20}. The results are given in Figure 3 (full comparisons), which includes all the 500 eigenvalues, and Figure 4 (details), which includes only the eigenvalues from the 410-th to the 450-th. Observe that our preconditioners are able to shift the eigenvalues of A of largest absolute value towards -1 or +1, so that the clustering of the eigenvalues is enhanced when the parameter h increases. For each value of h the matrix A is (randomly) recomputed from scratch, according to relation (16). This explains why the spectra differ among the five plots of Figures 3-4; in any case, the comparison indicates that the eigenvalues tend to cluster around -1 and +1, in accordance with the theory.
To complete our preliminary experience, we tested our class of preconditioners in optimization frameworks. In particular, we considered a standard linesearch-based truncated Newton method, reported in Table 1, where for any k ≥ 0 the solution of the symmetric linear system given by Newton's equation is required. We considered several unconstrained optimization problems from the CUTEr collection (Gould et al., 2003), and for each problem we applied the truncated Newton method in Table 1. For simplicity we just report the results on two test problems, with n = 1000, out of the set of all the optimization problems considered; very similar results were obtained for almost all the test problems.
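The scheme of Table 1 is not reproduced in this excerpt; the following skeleton (our own minimal version, with an Armijo linesearch and a CG inner solver that exits on negative curvature, as is standard in truncated Newton methods) conveys the structure:

```python
import numpy as np

def cg(matvec, b, tol=1e-8, maxit=200):
    """Plain CG for A d = b; exits on nonpositive curvature (indefinite case)."""
    d = np.zeros_like(b)
    r = b.copy()
    p = r.copy()
    rs = r @ r
    for _ in range(maxit):
        Ap = matvec(p)
        curv = p @ Ap
        if curv <= 0:            # negative curvature direction encountered
            break
        alpha = rs / curv
        d += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return d if d.any() else b   # fall back to the steepest descent direction

def truncated_newton(f, grad, hess_matvec, x, iters=50):
    """Linesearch-based truncated Newton (Newton-Krylov) iteration."""
    for _ in range(iters):
        g = grad(x)
        if np.linalg.norm(g) < 1e-8:
            break
        d = cg(lambda v: hess_matvec(x, v), -g)   # approx. Newton's equation
        t = 1.0
        while f(x + t * d) > f(x) + 1e-4 * t * (g @ d) and t > 1e-10:
            t *= 0.5             # Armijo backtracking linesearch
        x = x + t * d
    return x
```

The negative-curvature exit in the inner CG is what keeps the computed d a descent direction when the Hessian is indefinite, which is the care referred to in the introduction.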

Conclusions
We have given theoretical and numerical results for a class of parameter-dependent preconditioners. The preconditioners in our proposal can be built by using any Krylov method for the symmetric linear system (3), provided that it is able to satisfy the general conditions (4)-(5) in Assumption 3.1. The latter property may be appealing in several real problems, where a few iterations of the adopted Krylov subspace method may suffice to compute an effective preconditioner. Our proposal also seems tailored for those cases where a sequence of linear systems of the form A_k x = b_k, k = 1, 2, ..., requires a solution (e.g., see (Morales and Nocedal, 2001)). On this guideline, in a future work we are going to compare our proposal with other preconditioners described in Section 2. In particular, we think that a comparison with the proposals in (Gratton et al., 2011; Morales and Nocedal, 2000) could be noteworthy.
Finally, the class of preconditioners in this paper also seems to be an interesting tool for the solution of linear systems in financial frameworks. In particular, in future works we want to focus on symmetric linear systems arising when imposing the KKT conditions in portfolio selection problems with a large number of assets in the portfolio, along with linear equality constraints (see also (Al-Jeiroudi et al., 2008)).

Acknowledgment
The first author wishes to thank the Italian Ship Model Basin (INSEAN), a CNR institute, for its support.