Least-squares Problems

The multilinear least-squares (MLLS) problem is an extension of the linear least-squares problem. The difference is that a multilinearoperator is used in place of a matrix-vector product. The MLLS ...


Introduction
Consider the following multilinear least-squares (MLLS) problem in which u • v denotes the component-wise product of vectors u and v.Given a vector b ∈ R m and matrices A i ∈ R m×n i , i = 1, 2, . . ., L, find x * ∈ R N that solves the problem where x i ∈ R n i , i = 1, 2, . . ., L, N = n 1 + n 2 + . . .+ n L and x = (x T 1 , x T 2 . . . ., x T L ) T .For the sake of simplicity, we consider here the standard Euclidean norm, although the subsequent reasoning holds also for the weighted Euclidean norm.Note that if L = 1, then (1) is a linear least-squares problem.
The MLLS problem occurs, for instance, in factor analysis, chemometrics, psychometrics [7,8,9,13,15].We will study this problem in relation to the design of filter networks [1,11,14], specifically the sequential connection of sparse sub-filters presented by Fig. 1.In this case, x i stands for individual characteristics (design parameters) of sub-filter i, whose frequency response is A i x i .The ideal (desired) frequency response of the sub-filter sequence and the actual one are represented in (1) by b and (A 1 x 1 ) • (A 2 x 2 ) • . . .• (A L x L ), respectively.It is common for the design of filter networks that N << m.
MLLS is a non-convex, typically large-scale, optimization problem with a very large number of local minimizers.Each of the local minimizers is singular and non-isolated.The most typical approach to solving the MLLS problem consists in generating randomly a number of starting points for their further refinement with the use of local optimization methods.One major shortcoming of this approach is that a very large number of starting points is required to be generated in order to find a reasonably good fit in problem (1).Another major shortcoming is that the convergence of local methods is too slow in this problem.
The most popular of the local algorithms used for solving the MLLS problem is called alternating least squares (ALS).It exploits the feature of problem (1) that, if to fix all the vectors x T 1 , . . ., x T L but one, say x i , then the resulting sub-problem of minimizing over x i is a linear least-squares problem.In the ALS algorithm, the linear least-squares sub-problems are solved for the alternating index i.This algorithm is also known as block-coordinate relaxation or nonlinear Gauss-Seidel algorithm [12].The mentioned major shortcomings of the local search algorithms are inherent in ALS.
The main aim of our work here is to develop an effective global optimization approach to solving the MLLS problem and justify it theoretically.
Our work is organized as follows.In Section 2, we consider a new constrained optimization problem introduced in [11].It is similar, in some sense, to the MLLS problem, and its solution gives a good starting point for running the local search in the MLLS problem.Global optimality conditions for the new problem are derived in Section 3.These conditions are then used in Section 4 for constructing a global search algorithm.In Section 5, we report and discuss results of applying our global search algorithm to solving MLLS problems related to the design of filter networks.In Section 6, we draw conclusions and discuss future work.

Problem reformulation
Problem (1) can be written in the equivalent form where Following [11], we consider a similar, conceptually close, problem in which b = y 1 • . . .• y L , and A i x i ≈ y i , i = 1, . . ., L.
We formulate it as: After solving this problem in x, we obtain min where P i is the matrix of orthogonal projection defined by A i .In the numerical implementation, it may not be reasonable to compute P i explicitly, but instead, it can be treated as a linear operator defined by A i in one of the standard ways [2].Moreover, since this problem may admit trivial asymptotic solutions, it must be regularized.This can be done by adding µ y i 2 with a small µ to each term in the objective function.We assume further that all matrices P i have been slightly perturbed in this way, and hence they are positively definite.
Observe that the regularized objective function in ( 2) is strictly convex with a unique minimizer in the origin.Unlike (1), this problem does not suffer from the bad property of having non-isolated minimizers.However, it inherits the multi-extremal nature of problem (1).
Without loss of generality, we can assume that b ≥ 0 in (1) and (2).Indeed, if any component of b is negative, the change of its sign to positive can be compensated by the change of sign in the corresponding row of, for instance, A 1 .In [11], it is discussed how to treat the case of zero components.From now on, we assume that b > 0.
Note that the feasible set in problem (2) consists of disjoint subsets.Each of these subsets is connected.It is characterized by a certain feasible combination of signs of y 1 , . . ., y L .The total number of the subsets is determined by the number of the feasible combinations of signs that equals 2 m(L−1) .
Consider how to solve problem (2) on a given isolated subset of the feasible set, for instance, the subset associated with the positive orthant R mL ++ = {y ∈ R mL : y > 0}.The problem in this case takes the form min The substitution y i = exp(w i ) reduces this problem to: min where exp(•) and ln(•) are component-wise operations.This linear equality constrained problem can be efficiently solved by the conventional methods [10] that are able to take advantage of using the easily available derivatives of the objective function and the simple structure of the linear constraints in (4).In [11], the computational time for solving this problem was approximately half the time for one run of the ALS algorithm on problem (1).
To study the general case of sign combinations, we divide problem (2) into an outer binary problem to deal with the signs of y and an inner subproblem, similar to (4), in which the minimization is performed on the corresponding subset of the feasible set.Notice that the feasible vectors y 1 , . . ., y L in (2) have no zero components, because b > 0. Following [11], we present problem (2) as a specially enumerated set of subproblems of the form (3). We will use the following notations: ( Let S m be the set of all vectors in R m whose elements equal +1 or −1.If y i is feasible, then s i ∈ S m and ȳi > 0. Furthermore, for all feasible vectors y 1 , . . ., y L in (2) we have with the objective function Here, the dependence of Pi on s i is given by (5).Note that the substitution s L = s 1 • . . .• s L−1 is able to eliminate the equality constraint in outer problem (6), which is a binary problem with 2 m(L−1) feasible points.This number of feasible points defines the number of all inner problems (7).
The important feature of problem ( 6) is that it performs a partitioning of the feasible set in (2) and reduces this problem to a finite number of easy-to-solve inner problems (7) of the same form as (3).This allows us to capture the nature of the local minimizers of problem (2) and to enumerate them efficiently by combining the signs.
Any optimal or close to optimal solution y to problem (6), or equivalently problem (2), can be used as an initial point in problem (1), to be further refined by local search algorithms like ALS.Given y, the initial point x is computed by the formula where . In our numerical experiments [11], we compared the performance of the ALS for the initial point generated by our approach and for randomly generated points.It was required to run the ALS from at least 500 random points in order to get a local minimizer in (1) with the approximation error comparable with only one run of the ALS from the point generated by solving problem (6).Thus, the approach introduced in [11] achieved the overall network design speedup factor of several hundreds.Moreover, the randomly generated initial points did not guarantee any success.This speaks for the robustness of the approach.
It should be emphasized that binary problems are, in general, difficult to solve, but fortunately, the nature of signs in the sub-filter outputs are often well understood.Prior knowledge of the filter characteristics and its structure helps to facilitate substantially the solution process of the outer problem by focusing on a relatively small number of sign combinations (see [11] for details).

Theoretical background for global search
Given an approximate solution to problem (2), our global search is aimed at finding a new combination of signs in (6) with a better value of the objective function defined by (7).It is based on solving problem (2) under the assumption that all components of given feasible vectors y 1 , . . ., y L are fixed, except for their kth components denoted here by u 1 , . . ., u L , respectively.The value of k changes in the process of global minimization.
To justify our approach, we will consider problem (2) rewritten in terms of these components.Let ŷi coincide with y i in all the components, but the kth one which equals zero in ŷi .Let (P i ) k and (P i ) kk stand for the kth column and diagonal element of the matrix P i , respectively.It can be easily verified, for i = 1, . . ., L, that Thus, the minimization over u = (u 1 , . . ., u L ) T in (2) results in the problem: where c denotes the kth component of b.It is worth noting that this problem has at least one global minimizer, because the level sets of the objective function are compact and the function defining the constraint is smooth.Let U * stand for the set of all global minimizers in problem (9).For this problem, the following notations will be used: Let the multivariate function σ(•) be defined as the product of the signs of all the variables, for instance, Note that the feasible set in problem (9) consists of disjoint subsets.Each of these connected subsets belongs to the corresponding orthant R L s determined by sign(u).Since such subsets are characterized by σ(u) = 1, their total number is 2 L−1 .It grows exponentially with L. This is indicative of a highly multi-extremal nature of problem (9).The result presented in Theorem 1 allows one to effectively locate the optimal combination of signs S * or, equivalently, to find the orthants that contain the connected subsets of the feasible set on which the global optimum of problem ( 9) is attained.
Theorem 1 Let the coefficients c and α in problem (9) be positive.Then, s * ∈ S * if and only if σ(s * ) = 1 and one of the following conditions holds: (i) σ(β) ≥ 0 and s * i = sign(β i ), for all i such that β i = 0; (ii) σ(β) = −1 and there exists i * ∈ I such that with the set Proof.We start by proving the "if" part.Suppose that s * ∈ S * .Let u * ∈ U * be such that sign(u * ) = s * .The feasibility of u * implies that σ(s * ) = 1.
Consider the linear space transformation given by the formula This nonsingular transformation is aimed at easing our analysis because, in the new space, the objective function takes the form of a squared Euclidean distance between the two points v = (v 1 , . . ., v L ) T and a = (a 1 , . . ., a L ) T = s * • β.Note that Arg min Another important feature of the transformation is that it does not change the multiplicative type of the constraint.The problem in the new space takes the form: where c = c • α 1 • . . .• α L .Thus, the reformulated problem (10) is to find the shortest distance from a to the feasible set.Let v * = (v * 1 , . . ., v * L ) T be the image of u * in the new space, i.e., v * = α • s * • u * .Clearly, v * is a global minimizer for problem (10).Then, in the view of the fact that v * > 0, conditions (i) and (ii) can be reformulated in the new space as follows: (ii ′ ) if σ(a) < 0, then there exists i * ∈ I such that a i > 0, for all i = i * , and a i * < 0.
We first show that there is no more than one negative component of a. Suppose, to the contrary, that at least two components of a are negative, say, a i and a j .It can be verified easily that the open linear segment (v * , a) intersects the hyperplane where λ ∈ (0, 1) is given by the formula Consider the point v ′′ = (v ′′ 1 , . . ., v ′′ L ) T defined as follows: This point is obviously feasible.The triangle inequality gives since v ′ ∈ (v * , a).Hence, the feasible point v ′′ gives a better objective function value in (10) than v * .This contradicts the assumption that v * is a global minimizer for problem (10) and proves that a can have at most one negative component.This result immediately proves (i ′ ) for σ(a) = 1.For σ(a) = 0, suppose, contrary to (i ′ ), that there exists a i < 0. Such component must be unique.There must exist index j such that a j = 0.For these indices i and j, consider the points v ′ and v ′′ defined by formulas ( 11), ( 12) and ( 13).One can show, as above, that ( 14) holds for the two points.This contradicts the assumption that v * is a global minimizer.Thus, statement (i ′ ), and consequently part (i) of the theorem, hold.
Consider now the case σ(a) < 0. As shown above, exactly one component of a must be negative, say, a i < 0. Suppose, contrary to (ii ′ ), that i / ∈ I.For this i and any j ∈ I, consider the point v ′′ defined by (13).For the point v ′ defined by ( 11) and ( 12), the condition λ ∈ (0, 1) is satisfied, because a i + a j < 0. For v ′ and v ′′ one can show, as above, that ( 14) holds, which contradicts that v * is a global minimizer.This proves (ii ′ ) and accomplishes the proof of the "if" part of the theorem.
For the "only if" part, let s * satisfy the sufficient conditions.We choose any u ∈ U * and construct a point u * = (u * 1 , . . ., u * L ) T individually for each of the cases (i) and (ii).Suppose that (i) holds.Consider u * defined as follows: Obviously, sign(u * ) = s * and u * is a feasible point.As proved in the "if" part, sign(u) must satisfy (i).Thus, s * i = sign(u i ), for all i such that β i = 0. Therefore, u * has the same objective function value in (9) as u.
Suppose now that (ii) holds.Let i * ∈ I be such that s * i * = −sign(β i * ).It must satisfy (ii).Suppose j ∈ I is the index for which sign(u j ) = −sign(β j ).This means that then we define u * = u.Otherwise, we define u * as follows: It can be easily seen that sign(u * ) = s * and also that u * is feasible and has the same objective function value in (9) as u.
In each of the two cases, u * ∈ U * , and consequently s * ∈ S * .This completes the proof of the theorem.This result suggests ways in which the sign combinations intrinsic in the global minimizers of problem ( 9) can be effectively constructed for any given β.Our algorithm presented in the next section is based on this result.

Global search algorithm
Theorem 1 implies that s * is not unique when any of the following two cases occurs: • β has more than one zero component.
• σ(β) = −1 and the set I consists of more than one element.Given β, the set S * can be constructed based on the optimality conditions as follows.
• If σ(β) = 0, then S * is composed of all vectors s * ∈ S L whose components s * i = sign(β i ), for all i such that β i = 0, and the rest of the components ensure σ(s * ) = 1.
• If σ(β) = −1, then S * is composed of the same number of elements as I.Each i * ∈ I determines s * ∈ S * in such a way that s * i * = −sign(β i * ), and the remaining components of s * are the same as in sign(β).Note that it is not necessary to construct the whole set S * when it is required to find only one s * ∈ S * .The same principles as above can be employed in this case.
We propose below a global search algorithm.It uses procedures optsign, local and als.Procedure optsign(y, k) computes β by formula (8), and then it returns an s * arbitrarily chosen from the set S * .Another task of this procedure is to verify if a given s ∈ S L is optimal for problem (9).The optimality conditions given by Theorem 1 can be used for checking if s ∈ optsign(y, k) holds.Procedure local(s 1 , . . ., s L ) returns y that solves problem (7) for a given sign combination s 1 , . . ., s L .Procedure als(x 0 ) returns the result of running the ALS algorithm from a given starting point x 0 .
The derived optimality conditions open the way to a successive improvement of the sign combination in outer problem (6).The resulting global search strategies admit various implementations.
The one that we present below is based on a sequential checking of the components in s 1 , . . ., s L for a possible improvement.It starts from a given sign combination s 1 , . . ., s L , and it returns an approximate solution for problem (1) In Algorithm 1, an initial sign combination s 1 , . . ., s L is required to be given.For this purpose, the choice of signs proposed in [11] can be used.An alternative is to choose as the initial sign combination the best one produced by ALS, starting from a number of randomly generated points.

Numerical experiments
For generating MLLS test problems of the form (1), we considered two-dimensional filters of the monomial class [6] with the lognormal [3,4] and logerf [5,11] radial parts.We shall use the abbreviations LP, BP and HP standing for the Low-Pass, first-order Band-Pass and first-order High-Pass filters, respectively.They were approximated by a sequence of L = 5 sub-filters.The total number of coefficients N of the sub-filters and the number of components m of the discretized ideal frequency responses b are specified in Table 1 for each filter.
Our numerical experiments were performed on a PC with a 2.27GHz Intel Xeon E5520 processor and 32-bit Windows XP operating system.The results are shown in Table 1.The Matlab routine fmincon was used to solve problem (4) which is a reformulation of (7).As mentioned earlier, the objective function in (2) is required to be regularized.For the regularization parameter value, we used µ = 0.5.We shall use the term approximation error to refer to the objective function value in (1) and denote it by ε.In Table 1, min(ε j ) stands for the best approximation error obtained by running ALS from 500 randomly generated starting points.The CPU time (in seconds) spent on performing these 500 runs is denoted by t als .We shall use the term local search to refer to solving problem (7) only once for the sign combination chosen as proposed in [11].The approximation error ε loc is the result of one run of ALS from the starting point produced by the local search.Our global search strategy is aimed at improving the local search results.To initialize it, we used the same sign combination as in the local search.The set of filters used in our experiments included also zero-and second-order bandpass and high-pass filters.For these filters, the initial sign combination proposed in [11] was nearly optimal in the sense that there was practically no difference between the approximation errors ε loc and min(ε j ).For this reason, our global search strategy was unable to improve the initial sign combination.
The results presented in Table 1 refer to the filters for which the global search strategy was able to improve local search solutions in terms of objective function values in problems (7) and (1).In the case of high-pass and band-pass filters, the solution produced for problem (1) was as good as the best of those produced by 500 runs of ALS, but for achieving this, the global search required a CPU time that was over 50 times shorter.These results demonstrate the efficiency of our global search strategy and its capability for substantially speeding up the filter design process.

Conclusions
The derived optimality conditions open up possibilities to perform a global search for a better sign combination.The implemented global search strategy is not a very computationally demanding procedure.Its efficiency was demonstrated by the results of numerical experiments.For some filters, our global search ensured a faster process of optimizing sub-filter parameters with an overall speedup factor of over fifty.
We plan to extend our approach to solving optimal filter design problems having more general sub-filter network structures.