A Superlinearly Convergent Penalty Method with Nonsmooth Line Search for Constrained Nonlinear Least Squares

Recently, we presented a projected structured algorithm for solving constrained nonlinear least squares problems and established its local two-step Q-superlinear convergence. The approach is based on an adaptive structured variant, due to Mahdavi-Amiri and Bartels, of the exact penalty method. The structured adaptation also makes use of the ideas of Nocedal and Overton for handling quasi-Newton updates of projected Hessians and appropriates the structuring scheme of Dennis, Martinez and Tapia. Here, for robustness, we present a specific nonsmooth line search strategy that takes account of the least squares objective. We also discuss the details of our new nonsmooth line search strategy and the implementation of the algorithm, and provide comparative results obtained by testing our program and three nonlinear programming codes from KNITRO on test problems (with both small and large residuals) from Hock and Schittkowski, Lukšan and Vlček, and some randomly generated ones due to Bartels and Mahdavi-Amiri. The results affirm the practical relevance of our special considerations for the inherent structure of the least squares objective.


Introduction
An exact penalty method is a sequential unconstrained minimization approach for solving general nonlinear programming (NLP) problems. Some attractive features of exact penalty methods are: they overcome the difficulties posed by inconsistent constraint linearizations (Fletcher, 1987); they are successful in solving certain classes of problems in which standard constraint qualifications are not satisfied (Anitescu, 2005; Benson et al., 2006; Leyffer et al., 2006); and they are useful for enhancing robustness and ensuring the feasibility of the subproblems produced in the iterations of an algorithm (Byrd et al., 2004; Gill et al., 2002). Coleman and Conn (1982a, 1982b, 1984) presented an effective exact penalty algorithm for solving generally constrained nonlinear minimization problems. Mahdavi-Amiri and Bartels (1989) adapted the approach to develop an algorithm using a projected structured Hessian for solving constrained least squares problems. Although computational results given in (Mahdavi-Amiri and Bartels, 1989) showed a promising two-step superlinear asymptotic convergence, the authors did not provide an analytical proof. We presented a variant of the approach in (Mahdavi-Amiri and Bartels, 1989) and established both the global and local superlinear convergence (Mahdavi-Amiri and Ansari, 2011a, 2011b). Our algorithm appropriates the ideas of Dennis et al. (1989) in the unconstrained case to the projected structured Hessian updating. Here, we present the details of our new structurally adapted nonsmooth line search strategy, implementation details and more extensive computational results.
The remainder of our work is organized as follows. Section 2 gives the notation and relevant background on constrained nonlinear least squares problems, the exact penalty approach and the projected structured Hessian updating scheme for constrained nonlinear least squares problems. Section 3 discusses the new nonsmooth line search procedure accounting for the structure in the nonlinear least squares objective. Section 4 is devoted to the global and the local two-step superlinear convergence results. The practical issues relating to implementation are discussed in Section 5. Computational results in Section 6 substantiate the theoretical results and show that the new algorithm is competitive as compared to the best methods tested in the collection of Hock and Schittkowski (1981) and three algorithms developed in the KNITRO (Byrd et al., 2006) integrated package. There, we also show the effectiveness of the algorithm on some larger test problems taken from (Lukšan and Vlček, 1999) and randomly generated test problems due to Bartels and Mahdavi-Amiri (1986). Finally, we conclude in Section 7.
Throughout, $\|\cdot\|$ denotes the $\ell_2$ norm for vectors or matrices.

Constrained nonlinear least squares
The problem to be solved is:

$$\min_x \ F(x) = \frac{1}{2}\sum_{l} f_l(x)^2 \quad \text{subject to} \quad c_i(x) = 0,\ i \in E, \qquad c_j(x) \ge 0,\ j \in I, \tag{1}$$

where $x$ is an $n$-vector and, for each $l$, $i$ and $j$, the $f_l$, $c_i$ and $c_j$ are functions from $R^n$ to $R^1$, all assumed to be twice continuously differentiable. We refer to each $f_l$ as a residual. There are good reasons for numerical analysts to study least squares problems, mainly because of their applicability to many practical problems. Almost anyone who formulates a parameterized model for a chemical, physical, financial, social or economic application uses a problem of the form (1) to select values for the parameters that best match the model to the data. Although problem (1) can be solved by a general constrained nonlinear optimization method, in most circumstances the inherent structure of the objective function in (1) makes it worthwhile to use specialized techniques. Notice that if $G(x)$ is the matrix whose columns are the gradients $\nabla f_l(x)$, then the gradient of the objective is $\nabla F(x) = G(x)f(x)$ and its Hessian is

$$\nabla^2 F(x) = G(x)G(x)^T + S(x), \qquad S(x) = \sum_l f_l(x)\,\nabla^2 f_l(x). \tag{2}$$

In many applications, it is possible to calculate the first partial derivatives that make up the matrix $G(x)$ explicitly. These can be used to calculate the gradient $\nabla F(x)$. Moreover, the term $G(x)G(x)^T$ is often more important than the second summation term in (2), either because of near-linearity of the model near the solution (that is, the $\nabla^2 f_l$ being small) or because of small residuals (that is, the $f_l$ being small). Most algorithms for unconstrained nonlinear least squares exploit these structural properties of the Hessian. Newton's method uses the search direction $p$ found by solving

$$\left[G(x)G(x)^T + S(x)\right] p = -G(x)f(x). \tag{3}$$

For problems where $S(x)$ is small compared to $G(x)G(x)^T$ as $x$ approaches a minimizer $x^*$, the term $S(x)$ is dropped in (3) to yield the Gauss-Newton method. This method works well when $G(x)$ has full row rank and the residuals are small near the solution.

Approximating $S(x)$ in cases where it cannot be neglected is a recurring theme in the literature. In the unconstrained case, quasi-Newton approximations to only the second term $S(x)$ of the Hessian matrix (2) have been developed (Dennis et al., 1989; Dennis and Walker, 1981; Engels and Martinez, 1991). Pertinent to our proposal is the work by Dennis et al. (1981), who investigated quasi-Newton updates to a matrix $B \approx S(x)$, so that the full Hessian is approximated by $G(x)G(x)^T + B$. This additive structure was analyzed by Dennis et al. (1989), was used by Al-Baali and Fletcher (1985) and was extended to an infinite dimensional setting by Griewank (1987). These strategies are called "structured quasi-Newton" methods. Tapia (1988) extended the class of secant updates to equality constrained optimization utilizing the structure present in the Hessian of the augmented Lagrangian. Local Q-superlinear convergence of these structured secant methods was established by Dennis and Walker (1981). Based on Tapia's approach, Huschens (1993) also presented an approach to exploit the nonlinear least squares structure. A hybrid method, linking the reduced successive quadratic programming (SQP) method with the ideas of Fletcher and Xu (1987), was proposed by Tjoa and Biegler (1991). Mahdavi-Amiri and Bartels (1989) adjoined the reduced SQP method with the ideas developed by Dennis et al. (1981) for the unconstrained structured quasi-Newton BFGS update to develop a projected structured approach for solving problem (1).
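For concreteness, a Gauss-Newton iteration of the kind just described can be sketched as follows. The one-parameter exponential model, its data and the function names are illustrative assumptions, not one of the paper's test problems; the sketch only mirrors the structure of equation (3) with $S(x)$ dropped.

```python
import numpy as np

def gauss_newton_step(f, J, x):
    """One Gauss-Newton step: solve (G G^T) p = -G f(x), with G = J(x)^T.

    J(x) returns the Jacobian of the residuals (rows are residual
    gradients), so G = J^T has the gradients as columns, matching the
    text's convention. The second-order term S(x) is simply dropped.
    """
    r = f(x)
    G = J(x).T                                  # columns: residual gradients
    return np.linalg.solve(G @ G.T, -G @ r)

# Illustrative zero-residual example: fit y = exp(a*t) at a few points.
t = np.array([0.0, 1.0, 2.0])
y = np.exp(0.5 * t)

f = lambda a: np.exp(a[0] * t) - y                    # residual vector
J = lambda a: (t * np.exp(a[0] * t)).reshape(-1, 1)   # 3x1 Jacobian

a = np.array([0.2])
for _ in range(20):
    a = a + gauss_newton_step(f, J, a)
```

For this small-residual problem the iteration converges rapidly to $a = 0.5$, illustrating why dropping $S(x)$ is harmless when the residuals vanish at the solution.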

An exact penalty function
Here $\phi(x, \mu)$ denotes the $\ell_1$ exact penalty function,

$$\phi(x, \mu) = \mu F(x) + \sum_{i \in E} |c_i(x)| - \sum_{j \in I} \min(c_j(x), 0),$$

where the penalty parameter $\mu$ is a positive number. It is well known that, for appropriate values of the penalty parameter $\mu$, minimizers of $\phi$ are either KKT points of the nonlinear program (1) or infeasible stationary points (Byrd et al., 2005). Pietrzykowski (1969) showed this penalty function to be exact in the sense that if $x^*$ is an isolated minimizer of (1) and the gradients of the active (binding) constraints at $x^*$ are linearly independent, then there exists a real number $\mu^* > 0$ such that $x^*$ is also an isolated local minimizer of $\phi(\cdot, \mu)$ for each $\mu$, $0 < \mu \le \mu^*$.
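As a concrete illustration, the $\ell_1$ penalty can be evaluated as below. The sign conventions here (equalities $c_i(x) = 0$, inequalities $c_j(x) \ge 0$, with $\mu$ weighting the least squares term) are our reading of the Coleman-Conn formulation, so treat the sketch, and its tiny example, as assumption-laden.

```python
import numpy as np

def l1_penalty(f, c_eq, c_ineq, x, mu):
    """phi(x, mu) = mu * 0.5*||f(x)||^2 + sum |c_i(x)| - sum min(c_j(x), 0).

    Assumed conventions: equalities c_i(x) = 0, inequalities c_j(x) >= 0.
    Decreasing mu weights feasibility more heavily, matching the text's
    strategy of reducing mu when an infeasible minimizer is found.
    """
    r = f(x)
    return (mu * 0.5 * (r @ r)
            + np.sum(np.abs(c_eq(x)))
            - np.sum(np.minimum(c_ineq(x), 0.0)))

# Tiny example: one residual, one equality, one inequality.
f = lambda x: np.array([x[0] - 1.0])
c_eq = lambda x: np.array([x[0] + x[1]])     # want x0 + x1 = 0
c_ineq = lambda x: np.array([x[1]])          # want x1 >= 0

x = np.array([2.0, -1.0])
val = l1_penalty(f, c_eq, c_ineq, x, mu=1.0)  # 0.5 + |1| + |-1| = 2.5
```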


Luenberger (1970) determined, in the convex case, the threshold value of $\mu$.


The parameter $\epsilon > 0$ is used to determine the closeness of any constraint to zero, and hence to judge its activity. Using the approach of Mahdavi-Amiri and Bartels (1989), we define the index set of $\epsilon$-active constraints together with the index sets of violated equality constraints and of violated inequality constraints. Thus, the minimization of $\phi$ is carried out with the aid of an $\epsilon$-active merit function $\phi_\epsilon$, obtained from $\phi$ by summing the constraint terms only over the violated constraints. This provides a differentiable approximation to the true merit function $\phi$ in a neighborhood of $x$. Loosely speaking, step directions are determined using $\phi_\epsilon$, but line searches and optimality tests use $\phi$. A necessary condition for $x^*$ to be an isolated local minimizer of $\phi$, under the assumptions made above on the $f_l$, the $c_i$ and the $c_j$, is that there exist multipliers $\lambda_r^*$ satisfying the stationarity condition (4) together with the multiplier bound conditions (5) and (6), where $A(x)$ is the matrix with its columns being the gradients of the active constraints at $x$; for more details, see (Coleman and Conn, 1980). A point $x$ for which only (4) above is satisfied is said to be a stationary point of $\phi$.

If $x^*$ is an isolated minimizer, then it is necessary for $x^*$ to be a stationary point and to satisfy (5) and (6). Note that stationarity and optimality are determined using $\phi_0$, which is $\phi_\epsilon$ with $\epsilon = 0$. We estimate the numbers $\lambda_r^*$ by $\lambda_r$ only in the neighborhoods of stationary points. In such neighborhoods, the numbers $\lambda_r$ are taken to be the least squares solution of (7), and in practice the QR decomposition of $A(x)$ is used to solve this least squares problem. If the columns of $A(x)$ are linearly independent, then the columns of $Y$ and $Z$, respectively, serve as bases for the range space of $A(x)$ and the null space of $A(x)^T$; $Y^T Y$ and $Z^T Z$ are the identities. Nearness to a stationary point is governed by the stationarity tolerance $\delta > 0$.


The $\lambda_r$ are computed only if the projected gradient $Z^T \nabla \phi(x, \mu)$ is deemed "small enough" according to this tolerance.
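A least squares multiplier estimate of the kind just described might look like the following sketch; the names and the tiny data are illustrative assumptions. NumPy's `lstsq` solves the problem through an orthogonal factorization, standing in for the explicit QR-based solve described in the text.

```python
import numpy as np

def multiplier_estimate(A, g):
    """Least squares multipliers: minimize ||A @ lam - g|| over lam,
    where the columns of A are the active constraint gradients and g is
    the gradient of the penalized objective at the current point."""
    lam, *_ = np.linalg.lstsq(A, g, rcond=None)
    return lam

# Illustrative data: two active constraints in R^3.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])
g = np.array([2.0, -3.0, 0.0])
lam = multiplier_estimate(A, g)
```

With these columns orthonormal, the estimate reduces to `A.T @ g`, i.e. `[2, -3]`.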

The quadratic subproblem
Fundamental to the Mahdavi-Amiri and Bartels (1989) approach is a particular unconstrained quadratic problem:

$$\min_w \ (Z^T \nabla \phi_\epsilon(x, \mu))^T w + \frac{1}{2} w^T H_z w. \tag{9}$$

The matrix $H_z$ is a positive definite approximation of one of two projected Hessian matrices, as explained next. When the algorithm is in its "local" state, that is, in the proximity of a stationary point, the dual variables $\lambda_r$ are computed, and in its "global" state, that is, far from a stationary point, the values of the $\lambda_r$ are taken to be zero. For general nonlinear programming problems, a variety of projected quasi-Newton updates is available. For the general constrained nonlinear least squares problems, we make use of the ideas in (Dennis et al., 1989), in the unconstrained case, for structuring and in (Nocedal and Overton, 1985) for projecting.
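Since $H_z$ is kept positive definite, the unconstrained quadratic subproblem has the closed-form solution $H_z w = -Z^T \nabla \phi_\epsilon$. A minimal Cholesky-based sketch, with illustrative data, might be:

```python
import numpy as np

def reduced_qp_step(Hz, gz):
    """Solve min_w gz^T w + 0.5 w^T Hz w, whose minimizer satisfies
    Hz w = -gz, with gz the projected gradient Z^T grad(phi_eps).
    Hz is assumed symmetric positive definite, so Cholesky applies."""
    L = np.linalg.cholesky(Hz)
    # Forward then backward substitution: L (L^T w) = -gz.
    y = np.linalg.solve(L, -gz)
    return np.linalg.solve(L.T, y)

# Illustrative 2x2 reduced system.
Hz = np.array([[4.0, 1.0],
               [1.0, 3.0]])
gz = np.array([1.0, 2.0])
w = reduced_qp_step(Hz, gz)
```

The returned `w` satisfies `Hz @ w == -gz` up to round-off; the global horizontal step would then be $h_G = Zw$.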

The step directions
The minimization of $\phi$ is carried out using several alternative step directions derived from $\phi_\epsilon$: (1) the global horizontal step direction; (2) the dropping step direction; (3) the Newton step direction, composed of (3a) the asymptotic horizontal step direction and (3b) the vertical step direction. The global horizontal step direction is $h_G := Zw$, where $w$ is the solution of the quadratic problem (9). The matrix $H_z$, of course, approximates the global projected Hessian. The asymptotic horizontal step direction $h_A$ is a component of the Newton step direction, $h_A + v$. The Newton steps are only attempted in close neighborhoods of stationary points that are expected to be minimizers.

The step direction $h_A$ is computed as in the global case above. In this case, the matrix $H_z$ is to be an approximation of the local projected Hessian.

At $x + h_A$, the constraints of $AC(x, \epsilon)$ may no longer be within $\epsilon$ of zero. The vertical step is derived by taking a single Newton step toward the value of $v$ that solves the system of equations $C_{AC}(x + h_A + Yv) = 0$, where $C_{AC}(x, \epsilon)$ is the vector of the active constraint functions, ordered in accordance with the columns of $A(x)$. The dropping step direction tries to drop a function $c_i$ from the equalities or $c_j$ from the inequalities of the collection of active constraints and provides a direction that gives a local first-order decrease in the penalty function value. It is the step direction $d$ that satisfies the system of equations $A(x)^T d = \mathrm{sgn}(\lambda_r)\, e_r$, where $r \in AC(x, \epsilon)$ is chosen for whichever one of (5) or (6) is violated, and $e_r$ is the $r$th unit vector. The steps described above are used, broadly speaking, as follows: (1) when $\|Z^T \nabla \phi_\epsilon(x, \mu)\| > \delta$, we set $\bar{x} = x + \alpha h_G$, where a line search is used to determine $\alpha > 0$; (2) otherwise, (a) if (5) or (6) is not satisfied, then an index $r \in AC(x, \epsilon)$ is chosen for whichever one of (5) or (6) is violated and we set $\bar{x} = x + \alpha d$, where a line search is used to determine $\alpha > 0$, and (b) if (5) and (6) are satisfied, then we set $\bar{x} = x + h_A + v$. The steps incurred by the $\epsilon$-approximation to the penalty function produce descent for the penalty function if $\epsilon$ and $\mu$ are "correctly set". Algorithm 1 below gives an outline of the classical $\ell_1$-penalty method.
Given $\mu_0 > 0$ and a starting point $x_0$, Algorithm 1 minimizes $\phi(\cdot, \mu)$ and, whenever necessary, sets a new starting point. Note that care must be taken to detect infeasibility or unboundedness. Also, when the value of $\mu$ is too large, in which case the optimal point for $\phi$ may be infeasible, Algorithm 1 reduces $\mu$ and the minimization of $\phi$ is repeated; for more details, see the first two algorithms presented by Mahdavi-Amiri and Bartels (1989). The details of how $B_z$ is to be managed will be given in Section 2.4.
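The outer loop of the classical $\ell_1$-penalty method can be outlined as below. Only the repeat-on-infeasibility logic and the later-described division of $\mu$ by 8 come from the text; the function names, tolerances and the one-dimensional toy instance are our assumptions.

```python
def exact_penalty_outer_loop(minimize_phi, is_feasible, x0, mu0=1.0,
                             mu_min=1e-12, max_outer=50):
    """Classical l1-penalty outer loop: minimize phi(., mu) and, if the
    minimizer is infeasible for the original problem, reduce mu and
    repeat the minimization from the new point."""
    x, mu = x0, mu0
    for _ in range(max_outer):
        x = minimize_phi(x, mu)     # inner unconstrained minimization
        if is_feasible(x):
            return x, mu
        mu /= 8.0                   # weight feasibility more heavily
        if mu < mu_min:
            break                   # likely an infeasible problem
    return x, mu

# Toy instance: minimize (x-2)^2 subject to x <= 1,
# via phi(x, mu) = mu*(x-2)^2 + max(x - 1, 0).
def minimize_phi(x, mu):
    xm = 2.0 - 1.0 / (2.0 * mu)     # stationary point on the x > 1 branch
    return xm if xm > 1.0 else 1.0  # otherwise the kink x = 1 is optimal

is_feasible = lambda x: x <= 1.0 + 1e-10
x_opt, mu_final = exact_penalty_outer_loop(minimize_phi, is_feasible, x0=0.0)
```

Here $\mu = 1$ yields the infeasible minimizer $x = 1.5$; after one reduction ($\mu = 0.125$) the penalty becomes exact and the loop returns the constrained solution $x = 1$, illustrating the exactness property discussed above.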

The projected least squares structure
For computing the steps using (9), we intend to provide a quasi-Newton approximation $H_z$ for the projected Hessian $Z^T \nabla^2 \phi(x, \mu) Z$. In the rest of our work, in order to simplify the notation, the presence of a bar above a quantity indicates that it is taken at iteration $k+1$, and the absence of a bar indicates iteration $k$.

We interpret the multipliers $\lambda_r$ to be zero or estimated according to (7), whichever is appropriate at the $k$th iteration of the algorithm. Following (Mahdavi-Amiri and Bartels, 1989), the matrix $B_z$ is updated such that the new matrix $\bar{B}_z$ satisfies the secant condition $\bar{B}_z s = y$, where $s$ is the projected step and $y$ is an appropriate approximation to the action of the projected structured part of the Hessian on $s$.


Based on the structure principle given by Dennis et al. (1989), Engels and Martinez (1991) derived the structured Broyden family. Following this, we presented a scheme (Mahdavi-Amiri and Ansari, 2011a) for updating the matrix $B_z$ and derived both a structured BFGS update and a structured DFP update formula for $B_z$. Later, we use the BFGS scheme for testing the problems, because of its generally satisfactory numerical behavior in other contexts. We also impose positive definiteness of $H_z$, if necessary, to guarantee descent, by use of the modified Cholesky factorization.
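The paper's structured BFGS formula itself is not reproduced in this excerpt; as a generic stand-in, a standard BFGS update enforcing the secant condition $\bar{B}s = y$ can be sketched as follows. The curvature safeguard is our addition, a common practical device rather than part of the scheme described above.

```python
import numpy as np

def bfgs_update(B, s, y):
    """Standard BFGS update of a symmetric positive definite B so that
    the updated matrix satisfies the secant condition B_new @ s = y.
    A generic stand-in for a structured update; the update is skipped
    when the curvature s^T y is not safely positive."""
    sy = s @ y
    if sy <= 1e-12 * np.linalg.norm(s) * np.linalg.norm(y):
        return B                    # skip: curvature condition fails
    Bs = B @ s
    return (B
            - np.outer(Bs, Bs) / (s @ Bs)
            + np.outer(y, y) / sy)

B = np.eye(2)
s = np.array([1.0, 0.0])
y = np.array([2.0, 1.0])
B_new = bfgs_update(B, s, y)
```

The result satisfies `B_new @ s == y` exactly and stays symmetric positive definite whenever `s @ y > 0`, which is the property the text relies on to guarantee descent directions.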

The line search strategy
Here, we propose a nonsmooth line search strategy that makes use of the structure of least squares and is specially designed for the exact $\ell_1$-penalty function $\phi(x, \mu)$. In a close neighborhood of $x^*$ (a point satisfying the second-order sufficiency conditions), a step size of unity is always taken. Therefore, the procedure to be explained here is to be used when 'far' from a solution.

Let $x$ be the current iterate and $h$ be the direction of search along which the step $\alpha$ is to be taken. We are interested in finding $\alpha > 0$ producing a sufficient decrease in $\phi$, as specified by (13), where $\sigma$ is a given small positive constant to be specified by the following assumption (Assumption 3.1).
We note that Assumption 3.1 has also been used by Coleman and Conn (1980). A direct extension of Proposition 1 of Conn and Pietrzykowski (1977) shows that the above condition can be satisfied.
For a function $p(\cdot)$, it should always be clear from the context whether we are thinking of $p$ as a function of a vector or of a step length value $\alpha$. Also, we denote the derivative of $p(\alpha)$ by $p'(\alpha)$. A point where the derivative of $\phi(\alpha)$ does not exist will be referred to as a breakpoint. Note that the breakpoints occur at the zeros of the $c_r(\alpha)$.
We gain our motivation from the case when the $f_l$ and the $c_i$ are taken to be linear. A similar idea was suggested by Coleman and Conn (1982b) for general nonlinear programming problems, but our approach differs in its consideration of the objective function. They make a linear approximation to the objective function and the constraints; we, instead, make a quadratic (sum of squares of linear functions) approximation of the objective function and a linear approximation of the constraints. Clearly, this should provide a better approximation. In the new scheme, we try out a sequence of candidate values $\alpha$, stopping to accept one of these values when certain conditions are satisfied. The numerical results show the new strategy to do better than the general line search in (Murray and Overton, 1978) considered by Mahdavi-Amiri and Bartels (1989).
Suppose that $h$ is a descent direction for $\phi$, so that $\phi'(0) < 0$. As an approximation of $\phi(\alpha)$, we use the piecewise quadratic model $l(\alpha)$ obtained by linearizing the residuals and the constraints along $h$. Note that the breakpoints of $l$ occur at the zeros of the linearized constraint functions, and that $l(\alpha)$ and $l'(\alpha)$ can be evaluated for all $\alpha$ without any new objective or constraint function evaluation, if needed. Let $\alpha_1, \ldots, \alpha_q$ be a sorted list of these breakpoints along $h$. Now, an algorithm to determine a minimizer of $l(\alpha)$ is straightforward: examining the breakpoints sequentially, we find either a breakpoint that is a minimizer (that is, a breakpoint at which $l'$ changes sign from negative to nonnegative) or an interior minimizer in some interval, found by setting $l'(\alpha) = 0$.


Next, we explain this strategy in more detail. We consider moving along the positive direction of $h$ as long as the corresponding directional derivative of $l$ along $h$ is non-positive (in doing this, we may encounter breakpoints). Later, we discuss how to compute $l'$. Now, suppose we are on the interval $[\alpha_i, \alpha_{i+1}]$ and a minimizer of $l$ is yet to be determined. We compute $l'(\alpha_i^+)$. If $l'(\alpha_i^+) \ge 0$, then clearly the breakpoint $\alpha_i$ is the desired step length; otherwise, we determine the step length $\alpha$ within the interval for which $l'(\alpha) = 0$, if one exists. In cases where no such $\alpha$ exists in the interval, the next interval is examined and the process is repeated. If we progress through the entire sorted list of breakpoints and $l$ continues to decrease past the last breakpoint $\alpha = \alpha_q$, the minimizer beyond $\alpha_q$ is found by setting $l'(\alpha) = 0$.

Now, we discuss the computational details. We compute the breakpoints of $l$ as the positive values $\alpha_r = -c_r(x) / (\nabla c_r(x)^T h) > 0$. For computing the derivative and the directional derivative of $l$, we note that, for any $\alpha > 0$ between breakpoints, $l'$ changes only through the quadratic (least squares) term. Since at $\alpha_i$ the change in the directional derivative of $l$ is determined by the change in the directional derivative of the crossing constraint only, $l'(\alpha_i^+)$ is obtained from $l'(\alpha_i^-)$ by updating the single term corresponding to that constraint. Finally, by (14), the solution of $l'(\alpha) = 0$ on an interval is available in closed form; in the special case where the quadratic term vanishes, $l'$ is constant along each interval and therefore we are certain that the minimizer is a breakpoint.
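For linear residuals and constraints, the model $l(\alpha)$ is exactly the penalty function and the breakpoint scan can be implemented directly. The sketch below is our own minimal rendering, under the assumed conventions equalities $|\cdot|$, inequalities $-\min(\cdot, 0)$ and weight $\mu$ on the least squares term; names and the example data are illustrative.

```python
import numpy as np

def piecewise_line_search(mu, f0, gf, ce0, ae, ci0, ai):
    """Minimize, over a >= 0, the piecewise quadratic model
        l(a) = mu*0.5*||f0 + a*gf||^2 + sum|ce0 + a*ae| - sum min(ci0 + a*ai, 0)
    by scanning breakpoints (zeros of the linearized constraints) and
    solving l'(a) = 0 in closed form on the accepted interval. l' has
    constant slope q = mu*||gf||^2 inside each interval and only jumps
    upward at breakpoints, so the first interval containing a root of l'
    (or whose left kink has nonnegative right derivative) is optimal."""
    def lprime(a):
        d = mu * (f0 + a * gf) @ gf
        d += np.sum(np.sign(ce0 + a * ae) * ae)
        d -= np.sum(np.where(ci0 + a * ai < 0.0, ai, 0.0))
        return d

    q = mu * gf @ gf                          # curvature, constant everywhere
    bps = sorted(-c0 / a for c0, a in zip(np.r_[ce0, ci0], np.r_[ae, ai])
                 if a != 0.0 and -c0 / a > 0.0)
    lo = 0.0
    for hi in bps + [np.inf]:
        mid = (lo + hi) / 2.0 if np.isfinite(hi) else lo + 1.0
        d = lprime(mid)                       # sample l' inside the interval
        if q > 0.0:
            a_star = mid - d / q              # root of the linear l' here
            if a_star < hi:
                return max(a_star, lo)        # interior root, or kink at lo
        elif d >= 0.0:
            return lo                         # flat model: stop at the kink
        lo = hi
    return lo                                 # model decreases without bound

# Example: residual (a - 2) with mu = 1 and one inequality 1 - a >= 0;
# the kink of the penalty at a = 1 is the minimizer.
alpha = piecewise_line_search(
    1.0,
    np.array([-2.0]), np.array([1.0]),
    np.array([]), np.array([]),
    np.array([1.0]), np.array([-1.0]))
```

Note how the scan skips the first interval (the model still decreases there) and then returns the breakpoint $\alpha = 1$, exactly the nonsmooth minimizer; with no constraints the same routine returns the unconstrained quadratic minimizer $\alpha = 2$.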
Algorithm 2. Line search procedure.
Input: $x$, the current point; $h$, a descent direction for $\phi$ at $x$; $\alpha_1 > 0$ and $\sigma > 0$, where $\sigma$ is a given small positive constant specified by (13).

If the scan brackets a minimizer, then we have a good interval containing the minimizer and thus the approach of Murray and Overton (1978) can be performed conveniently using their successive polynomial interpolation with a safeguard strategy (using the available information at the last point $x$ and at $x + \alpha h$, their algorithm works out more favorably; for details, see (Murray and Overton, 1978)). Our numerical testing showed that an initial large value of $\alpha$ may harm the performance of the line search. Thus, for the first trial, we adjust the choice of $\alpha$ by setting $\alpha = \min(1, \alpha)$, to ensure that the Newton-like step is tried (if it gives enough reduction). For subsequent values of $\alpha$, when the first trial does not provide the sufficient decrease specified by (13), we adjust $\alpha$ by at least $\alpha_1 > 0$ to ensure enough change in the value of $\alpha$ and to admit a finite termination of the algorithm.
We are now ready to give our line search algorithm. In Algorithm 2, the parameter $\alpha_1$ (whose value in our experiments is given in Section 6) is used to avoid acceptance of a too small correction to $\alpha$. Our experiments showed that the usage of this parameter accelerates the procedure. The parameter $\sigma$ is used to determine a sufficient decrease in $\phi$ and is characterized by the line search condition (13) of Assumption 3.1.
It should be pointed out that our numerical experiments in Section 6 showed that Algorithm 2 often achieved adequate reductions inside the loop without needing to perform the Murray and Overton line search.

Global and local convergence results
Here, we briefly state the global and the local two-step Q-superlinear convergence results. The analysis of global convergence is based on the results of Byrd and Nocedal (1991), who studied convergence of the Coleman and Conn (1984) approach, with our necessary adjustments for the nonlinear least squares. Let $D$ denote a compact set such that $x_k \in D$ for all $k$. Let $C_{AC}(x)$ denote the vector of active constraints at $x$ and $\hat{S}$ denote the set of first-order points of $\phi$ in $D$; see (Coleman and Conn, 1980) for a justification of this terminology. Among Assumptions 4.1, condition (C) requires that the gradients of the active constraints at any point in $D$ be linearly independent.
Theorem 4.2. Suppose that Assumptions 4.1 hold and assume that (i) the number of stationary points of $\phi$ in $D$ is finite; (ii) all first-order points of $\phi$ in $D$ are strict second-order points of $\phi$; (iii) whenever a subsequence of $\{x_k\}$ converges to a first-order point $\bar{x}$, there exists a sufficiently small positive constant $\beta$ bounding the steps $s_k$ defined by (10); (iv) the proposed line search strategy produces a step length $\alpha$ satisfying Assumption 3.1. Then the sequence $\{x_k\}$ converges to a strict second-order point of $\phi$.
Remark 4.3. It should be noted that a similar result for the general nonlinear case (see Coleman and Conn, 1982b, Theorem 1) makes use of a stronger condition involving $G_L$, the Hessian of the Lagrangian function, while Theorem 4.2 is proved under the weaker condition specified by (iii). Considering Theorem 4.13 of Mahdavi-Amiri and Ansari (2011a), our condition in (iii) is reasonable. Now, we state the local two-step superlinear convergence of our method. The asymptotic analysis is developed by adapting the approach of Nocedal and Overton (1985) to the nonlinear least squares. We denote by $x^*$ a local optimal solution of problem (1), suppose the active set at $x^*$ is known, and assume that we are sufficiently close to $x^*$ so that the proper active set has been identified. Under the corresponding assumptions, the sequence $\{x_k\}$ converges two-step Q-superlinearly to $x^*$; for the proof, see Theorem 4.17 in (Mahdavi-Amiri and Ansari, 2011a).

Computational considerations
Here, we indicate the measures taken to implement Algorithm 1.The reader is referred to (Mahdavi-Amiri and Bartels, 1989) for more details.
The test for nearness to stationarity has the form $\|Z^T \nabla \phi(x, \mu)\| \le \delta \cdot \mathrm{reference}$, where, for given scalars, we define reference to be a composite absolute-relative quantity. Tests for feasibility are carried out exactly as for activity, except that a much smaller user-defined tolerance must be used in place of $\epsilon$. The main difficulty in the context of an exact penalty method is the strategy for choosing and updating the penalty parameter $\mu$.
The initial value $\mu = 1$ was used for most test problems, though we sometimes found that other values were more advantageous to use; these are noted in the reported results. Only naive changes are made to $\mu$ during the running of the program. The value of $\mu$ is reduced through division by 8 each time a minimizer $x$ of $\phi(x, \mu)$ is found that is not feasible for (1). Recently, Byrd et al. (2008) proposed an effective penalty parameter update strategy appropriated for trust region approaches; of course, a possible adaptation to line search strategies is welcome. We have work in progress to consider a new penalty parameter updating strategy in the context of combined line search and trust region exact penalty methods. The value of $\lambda_r$ must be tested to determine whether it satisfies (5) or (6). We carried this out relative to a boundary tolerance, in analogy with the corresponding tests in (Mahdavi-Amiri and Bartels, 1989).
The value of $\mu$ is regarded as being too small when it falls below a threshold defined in terms of macheps, the unit round-off error; $\epsilon$ and $\delta$ are likewise regarded as being too small when they fall below their respective thresholds.
The difference in penalty function values, $\phi(x, \mu) - \phi(\bar{x}, \mu)$, has to be tested to determine whether a sufficient decrease has been obtained; a separate condition is used for a sufficient decrease on the Newton step. A sufficient decrease for the other two steps (global and dropping) is demanded by setting the parameter $\sigma$ in the line search condition (13). The numerical positive definiteness of the matrix $H_z$ is enforced, if necessary, by using the modified Cholesky factorization described in (Fang and O'Leary, 2008) during the process of solving (9) to guarantee descent. The requirements for optimality are that, for reasonable choices of the reference values, the stationarity and multiplier conditions hold to within their tolerances. One might also be prepared to accept as a case of optimal termination the event that one or more of the above fail to be satisfied, but the user should be notified whenever this happens; for details, see (Mahdavi-Amiri and Bartels, 1989).
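The Fang-O'Leary modified Cholesky factorization is not reproduced here; a simpler diagonal-shift fallback, in the same spirit of enforcing numerical positive definiteness before solving (9), can be sketched as below. Names and tolerances are our assumptions.

```python
import numpy as np

def make_positive_definite(H, tau0=1e-3, max_tries=60):
    """Add a growing multiple of the identity until Cholesky succeeds.
    A simple stand-in for a modified Cholesky factorization: the result
    H + tau*I is positive definite and close to H when H nearly is."""
    tau = 0.0
    for _ in range(max_tries):
        try:
            np.linalg.cholesky(H + tau * np.eye(H.shape[0]))
            return H + tau * np.eye(H.shape[0])
        except np.linalg.LinAlgError:
            tau = max(2.0 * tau, tau0)
    raise RuntimeError("could not make the matrix positive definite")

H = np.array([[1.0, 2.0],
              [2.0, 1.0]])          # indefinite: eigenvalues 3 and -1
H_pd = make_positive_definite(H)
```

Unlike a true modified Cholesky, the shift perturbs all eigenvalues equally, but it preserves symmetry and guarantees that the subsequent solve of (9) yields a descent direction.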

Computational results
Here, we report the comparative results obtained on three sets of problems. First, thirty CNLLS problems were taken from Hock and Schittkowski (1981): 1, 2, 6, 13, 14, 15, 16, 17, 18, 20, 22, 23, 26, 27, 28, 30, 31, 32, 42, 46, 48, 49, 50, 51, 52, 53, 60, 65, 77 and 79. These were chosen because they were least squares problems or could be conveniently cast in that form. The number of variables in these problems varies from 1 to 5, and the number of constraints varies from 1 to 13. Problems 28, 48, 50, 51, 52 and 53 have both linear constraints and linear residual functions, and the other problems have at least one nonlinear constraint or residual function. Second, we considered seven problems generated from (Bartels and Mahdavi-Amiri, 1986), who presented a program for random generation of general nonlinearly constrained nonlinear programming problems and also generated an example of a least squares problem in which the dimensions and the curvature of the problem could be varied. These random test problems have both nonlinear constraints and nonlinear residual functions. Third, we used twelve larger test problems in the constrained nonlinear least squares form from (Lukšan and Vlček, 1999), as listed below:
1. Chained Rosenbrock function with trigonometric-exponential constraints.
2. Chained Wood function with Broyden banded constraints.
3. Chained Powell singular function with simplified trigonometric-exponential constraints.
4. Chained Cragg-Levy function with tridiagonal constraints.
5. Chained HS46 problem (problem 46 of Hock and Schittkowski, 1981).
6. Chained HS47 problem.
7. Chained modified HS48 problem.
8. Chained modified HS49 problem.
9. Chained modified HS50 problem.
10. Chained modified HS51 problem.
11. Chained modified HS52 problem.
12.
Chained modified HS53 problem. These problems were modified in such a way that a variable number $n$ of variables may be used. In our tests we take $n \le 300$ (we were not able to consider larger $n$, since the free KNITRO package we are using is limited to problems with at most 300 variables). In all of these test problems, the constraints and the residual functions are both nonlinear. The algorithm successfully solved all of the considered problems quite efficiently. The local convergence on all problems clearly showed a two-step superlinear rate. Moreover, the significant reduction in the number of global steps confirms the effectiveness of our proposed line search procedure (Algorithm 2), as compared to the results obtained by Mahdavi-Amiri and Bartels (1989), the methods reported by Hock and Schittkowski (1981), and the results obtained by our testing of the three algorithms (Interior/Direct, Interior/CG and Active-Set) recently developed in the KNITRO 6.0 package. Our calls to programs in KNITRO are managed by a C++ program, so as to have the same testing environment as for our program.
Our algorithm was coded in a portable subset of C++ and run in double precision arithmetic, compiled with Microsoft Visual C++ 6.0, on an x86-based PC with an AMD 1667 MHz processor. The parameters of Sections 2 and 5 were set as follows. Tables 1 and 2 present the results on problems from (Hock and Schittkowski, 1981). The same initial points as in this reference were used. The function value obtained by our algorithm is found in the column headed by "FV" for comparison with those listed. It should be noted that our objective functions correspond to those reported in (Hock and Schittkowski, 1981) multiplied by $\frac{1}{2}$. This was simply a matter of convenience for us in arranging the test problems in a least squares format. The number given in the "NI" column counts the cumulative number of steps through the main loop of the minimization for $\phi$.
The "NAI" column counts the number of steps needed for termination after the identification of the correct active set at the solution and execution of the Newton steps. The "NFE" column counts the number of times the value of the objective function was computed. The spread between NI and NFE provides an impression of how much work was required by the line search. Indeed, the spread was not found to be significant, and this points to the efficiency of our proposed line search scheme. The numbers listed in the column headed by "H&SH" give the number of function evaluations required by the best method tested in (Hock and Schittkowski, 1981) for attaining a function value at least as good as the one we attained. We list the number of function evaluations needed by the method of Mahdavi-Amiri and Bartels (1989) in the column headed by "MA&B". For the three algorithms provided by the KNITRO 6.0 (Byrd et al., 2006) integrated package for general constrained nonlinear problems, we used the default parameter values without defining any special case for the objective function or constraints, and used the dense quasi-Newton BFGS approximation of the Hessians. In Tables 1 and 2, the columns headed by "D", "CG" and "AS" respectively give the number of function evaluations required by the Interior/Direct (barrier), Interior/CG (barrier) and Active-Set algorithms in this package. Our method of counting function evaluations was consistent with (Hock and Schittkowski, 1981; Mahdavi-Amiri and Bartels, 1989; and KNITRO), so that these columns serve as a basis for comparison. Obviously, the function evaluations are compared only on cases where the algorithms converged to the same solution point. The results for initialization of the projected matrix with the zero matrix are given in Table 1, and those for initialization with the identity matrix are given in Table 2. There is no entry in the "H&SH" column for problem 65, which serves to indicate that our program reached a minimum value significantly lower than that of any of the methods reported on in (Hock and Schittkowski, 1981). In the first column of Tables 1 and 2, the superscript (1) or (4) indicates that the initial value of $\mu$ was set to 100 or 10, respectively, rather than 1. The problems in question, i.e. 6, 26, 60, 65 and 77, had constraint function values at some points whose magnitudes dominated the magnitude of the penalty function, forcing the minimization process to take an unusually long time in attempting to maintain the constraints within feasibility. The larger value of $\mu$ compensated for the imbalance in scale between the objective function and the constraints. The superscript (2) or (3) indicates that the initial value of $\mu$ was set to 0.01 or 0.001, respectively, instead of 1. In the problems for which this was done, i.e. 13, 15, 16 and 20, there was a reverse imbalance of scale: the magnitude of the objective function at some point was far larger than that of the constraints. This caused the minimization to waste time in finding a sequence of infeasible stationary points for a sequence of decreasing values of $\mu$.
The smaller value of $\mu$ compensated for the imbalance in scale between the objective function and the constraints.
It should be pointed out that we have work in progress to come up with a new automatic penalty parameter updating strategy in the context of combined line search and trust region exact penalty methods.Despite this, the method we have described shows itself competitive in comparison with those tested by Hock and Schittkowski (1981) and those solved by KNITRO.It did, respectively, as well as, better than and poorer than the best of the methods in Hock and Schittkowski on approximately 40%, 50% and 10% of the test problems and in comparison with the method of Mahdavi-Amiri and Bartels (1989), the corresponding results are 30%, 70% and 0%, respectively.In comparison with Interior/Direct algorithm, our algorithm did as well as, better and poorer on 13%, 70% and 17%, respectively.In comparison with Interior/CG algorithm, the corresponding results are 13%, 74% and 13%, respectively.Finally, in comparison with Active-Set algorithm, results are 37%, 50% and 13%, respectively.Thus, our new code here with the modifications proposed along with the new line search strategy significantly outperforms the ones reported in (Bartels and Mahdavi-Amiri, 1986;Mahdavi-Amiri and Bartels, 1989), as well as the three general nonlinear algorithms of KNITRO (Byrd et al., 2006), on the test problems.This should justify the appropriateness of our structural approach to nonlinear least squares.In Tables 3 and 4, we report the results obtained by our program on three sets of random problems (problems 1, 2 and 3 as set 1, problems 4 and 5 as set 2, and finally problems 6 and 7 as set 3) generated and tested by Bartels and Mahdavi-Amiri (1986) -Amiri, 1986).The numbers of the equality constraints, the inequality constraints and the active constraints at solution for the three sets respectively are (2, 3, 4), (5, 5, 7) and (8,12,10).The results obtained by each algorithm provided by KNITRO 6.0 on these random problems are separately reported in Table 5.In Tables 3 and 4, the "SP" column shows the 
value to which all components of the starting point are set. The second column shows the value considered for the penalty parameter.
The "NCE" column gives the count of the number of times the constraint function is computed. We compute a composite absolute-relative feasibility error rfe(x) at the final point x, where C(x) denotes the vector of constraints evaluated at x; the rounded value of rfe(x) is shown in the column headed by "RFE". The other column headlines have the same meanings as those given for Tables 1 and 2. In Table 5, for each algorithm of KNITRO we have three columns, and all column headlines have the same meanings as those given for the other tables.
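The precise formula for rfe(x) did not survive extraction, so the sketch below uses one common composite form, scaling the infinity norm of the constraint violation by 1 + ‖x‖∞; this choice of combination is an illustrative assumption, not the paper's definition.

```python
import numpy as np

def rfe(C, x):
    """Composite absolute-relative feasibility error at x.

    C : callable returning the vector of constraint violations C(x).
    The division by 1 + ||x||_inf blends an absolute test (when x is
    small) with a relative one (when x is large); this exact scaling
    is an assumption standing in for the paper's garbled formula.
    """
    x = np.asarray(x, dtype=float)
    cx = np.asarray(C(x), dtype=float)
    return np.linalg.norm(cx, np.inf) / (1.0 + np.linalg.norm(x, np.inf))

# Example: x = (1, 2) against the single equality constraint x1 + x2 - 3 = 0,
# which is exactly satisfied, so the feasibility error is zero.
err = rfe(lambda x: [x[0] + x[1] - 3.0], [1.0, 2.0])
```

A point violating the constraint, e.g. rfe(lambda x: [x[0] - 1.0], [3.0]), yields a positive error reduced by the relative scaling.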
Note that on problems 4 (with starting point 10), 5 (with starting point 1), 6 and 7, our program reached a locally optimal solution having significantly smaller objective values than those obtained by the other three algorithms. At these solutions, our program terminated with the active constraint set {1, …, 7, 9} on problems 4 and 5, and with the active constraint set {1, …, 15, 17, 18, 19} on problems 6 and 7.
In Table 6, we report the results obtained by our program and the three programs of KNITRO 6.0 on problems taken from (Lukšan and Vlček, 1999). The same initial points as in this reference were used. The columns headed by "n" and "m" respectively give the number of variables and constraints. The superscript (in the first column) and the other column headlines have the same meanings as those given for Table 1. The Interior/Direct program reached the iteration limit (of 10000) or failed to solve the problem on problems 2, 3, 4, 7 and 9, and on problem 8 it found a locally optimal solution having objective value 1.67453E+02, which is significantly smaller than the values obtained by the other three programs. On problem 3, the Interior/CG program and the Active-Set program terminated with the objective values 3.40390E+02 and 7.08426E+00, respectively. On problem 1, the Active-Set program found a locally optimal solution having the objective value 1.92827E+00, which is significantly larger than the values obtained by the other three programs.
The results in Tables 3, 4, 5 and 6 show the effectiveness of the proposed algorithm as compared to the three algorithms recently developed in KNITRO (Byrd et al., 2006) on constrained nonlinear least squares problems. Observing the low values of the number of local iterations under the column "NAI" in all the tables, the local two-step superlinear convergence of our proposed algorithm is well supported. Moreover, the overall performance of the proposed algorithm on the test problems attests to its robustness. To assess the update rule (11) proposed here, we used alternative update rules instead of (11) in the local steps and ran the algorithm on all the test problems again. The asymptotic results showed a marginal outperformance of (11). Thus, the overall clear outperformance of our new code over the one in (Mahdavi-Amiri and Bartels, 1989), as we mentioned before, should be attributed to our global steps and the use of our new nonsmooth line search strategy. With this observation, it is remarkable that the two-step superlinear convergence of the method in (Mahdavi-Amiri and Bartels, 1989) is yet to be proved.
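The pairwise "as well as / better than / poorer than" percentages quoted earlier can be tallied mechanically from per-problem final objective values. The sketch below uses hypothetical values and a relative tolerance of our own choosing, not the paper's data or its actual comparison criterion.

```python
from collections import Counter

def compare(ours, theirs, tol=1e-6):
    """Classify each problem as 'as well', 'better', or 'poorer'
    by comparing final objective values (lower is better).
    The relative tolerance tol is an assumption for illustration."""
    tally = Counter()
    for f_ours, f_theirs in zip(ours, theirs):
        if abs(f_ours - f_theirs) <= tol * (1.0 + abs(f_theirs)):
            tally["as well"] += 1      # objectives essentially equal
        elif f_ours < f_theirs:
            tally["better"] += 1       # our objective strictly smaller
        else:
            tally["poorer"] += 1       # our objective strictly larger
    n = len(ours)
    return {k: 100.0 * v / n for k, v in tally.items()}

# Hypothetical final objective values on five problems:
pct = compare([1.0, 0.5, 2.0, 3.0, 1.0], [1.0, 0.7, 1.5, 3.0, 2.0])
# pct -> {'as well': 40.0, 'better': 40.0, 'poorer': 20.0}
```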

Conclusions
We presented a structurally projected exact penalty method adapted for solving constrained nonlinear least squares problems. The projected structure made appropriate use of the ideas of Nocedal and Overton (1985) for general nonlinear programs. Numerical results showed our proposed algorithm to be efficient and reliable. For robustness, we devised a nonsmooth line search procedure accounting for the structure in the nonlinear least squares objective. We implemented the proposed algorithm and tested the program and three nonlinear programming codes from KNITRO on both small and large residual problems (Hock and Schittkowski, 1981; Lukšan and Vlček, 1999) and some randomly generated ones due to Bartels and Mahdavi-Amiri (1986). The results showed the practical relevance of our special line search and projected considerations for the sum of squares objective.

Acknowledgment
The first author thanks the Research Council of Sharif University of Technology for supporting this work.

References

Table 1 .
Initializing with the zero matrix

Table 3 .
Results on randomly generated test problems initializing with the zero matrix. Each set of random problems is generated with all quantities specified; for details see (Bartels and Mahdavi-Amiri, 1986).

Table 4 .
Results on randomly generated test problems initializing with the identity matrix

Table 5 .
Results of KNITRO on randomly generated test problems