Experimental Design for Nonlinear Problems

Experimental designs for nonlinear problems have to a large extent relied on optimality criteria originally proposed for linear models. Optimal designs obtained for nonlinear models are functions of the unknown model parameters. They cannot, therefore, be directly implemented without some knowledge of the very parameters whose estimation is sought. The natural way is to adopt a sequential or Bayesian approach. Another is to utilize available estimates or guesses. In this article we provide a brief historical account of the subject, discuss optimality criteria commonly used for nonlinear models, the associated problems and ways of overcoming them. We also discuss issues of robustness of locally optimal designs. A brief review of sequential and Bayesian procedures is given. Finally we discuss alternative design criteria of constant information and minimum bias and pose some problems for future work.


Introduction
Experiments are investigations in which the investigator has some control over the system under study. Such investigations are common in many fields where the effect of some factors on a specified response variable is to be investigated. Design objectives include:
1. Reduction of systematic errors due to controllable factors that are of no interest to the experimenter but may influence the response. Such factors are commonly known as nuisance factors.
2. Minimization of experimental error due to random variability and/or uncontrollable and unknown factors that may influence the response.
3. Determination of the appropriate number of experimental units. It is generally desirable to have an experiment large enough for effects of practical significance to be detected, but not so large that experimental material is wasted detecting small effects of no practical significance.
Most of the early developments of the subject were closely associated with agricultural field experiments, and with the names of Fisher (1935) and Yates (1935, 1936, 1937). This remains an active research area today. Most of the designs developed then were informal, emphasizing the key concepts of blocking and randomization to meet the first two of the objectives stated above. The models used are linear, with a continuous response variable and categorical explanatory variables. These models can generally be written as:
$$E(Y) = \theta^T X \qquad (1)$$
where $Y$ is the observed response, $\theta$ is a $p \times 1$ vector of unknown parameters and $X$ is a $p \times 1$ indicator vector whose elements are zeros and ones. The variance of $Y$ is assumed constant and denoted by $\sigma^2$. When $N$ observations are made, the observation $Y$ is an $N \times 1$ vector whose $i$th element is $Y_i$, where $E(Y_i) = \theta^T X_i$.
Equation (1) can now be written as:
$$E(Y) = X\theta \qquad (2)$$
where $X$ is an $N \times p$ matrix whose $i$th row is $X_i^T$. The matrix $X$ is called the design matrix. The normal equations for estimating $\theta$ are given by:
$$X^T X \hat{\theta} = X^T Y \qquad (3)$$
The matrix $X^T X$ is usually singular, and some conditions, known as estimability conditions, are imposed to make it nonsingular. The estimator $\hat{\theta}$ of $\theta$ is then
$$\hat{\theta} = (X^T X)^{-1} X^T Y$$
The experimental design model in (1) is a special case of the polynomial regression model, usually written in the form:
$$E(Y) = \theta^T f(X) \qquad (4)$$
where $f(X)$ is a function of $X$ only. All the equations above apply with $X$ replaced by $f(X)$. Polynomial regression models do not usually suffer from the singularity problem of the experimental design model, and no estimability conditions are needed. Models are said to be non-linear if they are not linear in the parameters and, therefore, cannot be written in the form of equation (4). Such models are written as:
$$E(Y) = f(X, \theta) \qquad (5)$$
where $f(X, \theta)$ is nonlinear in $\theta$. Consideration of design issues for polynomial regression models led to the development of the notion of optimality, initiated by Elfving (1952, 1959) and Kiefer (1959). For models of the form (5) the Fisher information matrix is
$$I(\theta) = \sum_{i=1}^{N} \frac{\partial f(X_i, \theta)}{\partial \theta} \, \frac{\partial f(X_i, \theta)}{\partial \theta^T} \qquad (6)$$
For linear models as in equations (2) and (4), $I(\theta)$ is independent of $\theta$ and the variance of the maximum likelihood estimator $\hat{\theta}$ is $\sigma^2 I^{-1}(\theta)$. Optimality criteria call for the maximization of some real valued function of Fisher's information matrix. This is equivalent to the minimization of a function of $I^{-1}(\theta)$, which is proportional to the asymptotic covariance matrix of the maximum likelihood estimator. Thus optimality criteria are variance based.
When the model is nonlinear, $I(\theta)$ is a function of the unknown parameters $\theta$, and so are the optimal designs. An account of the history of the development of experimental designs can be found, for instance, in Atkinson and Bailey (2001). We introduce below the commonly used optimality criteria.

Optimality criteria
The optimality criterion adopted for a given problem naturally depends on the main objective of the experiment. This has led to several optimality criteria being proposed and studied, such as:
i. D-optimality: This is by far the most commonly adopted optimality criterion. A design is called D-optimum if it minimizes $\det I^{-1}(\theta)$. The criterion treats all $p$ parameters as of equal interest, which makes it the most appealing when estimation of the parameters is the main objective. It guarantees confidence ellipsoids of the smallest volume. When interest is in a subset of the parameters or in some specific linear combinations $A^T\theta$ of the parameters, modifications known as $D_s$- and $D_A$-optimality are used (see Atkinson and Donev, 1992, chapters 10 and 11).
ii. G-optimality: A design is called G-optimum if it minimizes the maximum standardized variance of the predicted response. An equivalence theorem (Kiefer and Wolfowitz, 1960) proves the asymptotic equivalence of D- and G-optimality.
iii. A-optimality: A design is called A-optimum if it minimizes the average variance of the maximum likelihood estimators of the parameters. This is equivalent to the minimization of the trace of $I^{-1}(\theta)$.
iv. E-optimality: A design is called E-optimum if it minimizes the variance of the least well estimated contrast $a^T\theta$ ($a^T a = 1$). This is equivalent to the minimization of the largest eigenvalue of $I^{-1}(\theta)$.
D-, A- and E-optimality can thus all be defined in terms of the eigenvalues of $I^{-1}(\theta)$, through their product, their sum and their maximum respectively. An appealing property of D- and G-optimality that is not shared by A- and E-optimality is the invariance of optimal designs under reparametrization. In addition to the above most commonly used criteria, there are several others (see Atkinson and Donev, 1992). These include:
v. c-optimality: Minimize the variance of the estimate of $c^T\theta$, where $c^T\theta$ is the linear combination of parameters of main interest.
vi. Q-optimality: Minimize the average prediction variance over a specified design region (see Myers et al, 1994).
vii. F-optimality: Minimize the width of Fieller's fiducial interval (Finney, 1971).
Owing to the use of the alphabet in the naming of these criteria, we now see reference to alphabetic optimal designs (e.g. Myers et al, 1994).
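As a small numerical illustration of how the D-, A- and E-criteria relate to the eigenvalues of the inverse information matrix, the sketch below evaluates them for a straight-line model. The regressors f(x) = (1, x), the two candidate designs and all function names are assumptions made for illustration only; they are not taken from the cited literature.

```python
import math

def information_matrix(points, weights):
    """Return M = sum_i w_i f(x_i) f(x_i)^T for the regressors f(x) = (1, x)."""
    m00 = sum(weights)
    m01 = sum(w * x for x, w in zip(points, weights))
    m11 = sum(w * x * x for x, w in zip(points, weights))
    return [[m00, m01], [m01, m11]]

def alphabetic_criteria(m):
    """Return D-, A- and E-criterion values of a nonsingular symmetric 2x2 matrix.

    All three are functions of the inverse of m, so smaller is better.
    """
    (a, b), (_, d) = m
    det = a * d - b * b
    inv00, inv01, inv11 = d / det, -b / det, a / det
    trace = inv00 + inv11
    # Closed-form eigenvalues of the symmetric 2x2 inverse.
    half_gap = math.sqrt(((inv00 - inv11) / 2.0) ** 2 + inv01 ** 2)
    eig_small, eig_large = trace / 2.0 - half_gap, trace / 2.0 + half_gap
    return {
        "D": eig_small * eig_large,  # determinant of the inverse
        "A": eig_small + eig_large,  # trace of the inverse
        "E": eig_large,              # largest eigenvalue of the inverse
    }

# Two hypothetical equal-weight designs on the interval [-1, 1].
wide = alphabetic_criteria(information_matrix([-1.0, 1.0], [0.5, 0.5]))
narrow = alphabetic_criteria(information_matrix([-0.5, 0.5], [0.5, 0.5]))
print("points at -1, 1:    ", wide)
print("points at -0.5, 0.5:", narrow)
```

In this linear example the design at the ends of the interval is better on all three criteria, and the comparison does not involve the parameter values at all.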
The theory of optimal designs and methods of construction of optimal designs are the subjects of several books including Silvey (1980), Atkinson and Donev (1992), Pukelsheim (1993) and Cox and Reid (2000).For reviews and sample applications of optimal designs to applied problems in education, business, marketing, epidemiology, microbiology, environmental science, pharmaceutical and medical research and manufacturing industry, see Berger and Wong (editors, 2005).

Non-linear models
When optimality criteria are used for linear models, the optimal designs have attractive properties and can be constructed and used directly (see Silvey, 1980). Most of the literature on nonlinear problems has adopted the same criteria. Since the information matrix depends on the unknown model parameters for nonlinear models, so do the resulting optimal designs. This presents a serious hurdle to the implementation of these designs in practice.
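A minimal worked illustration of this dependence is given below for a one-parameter exponential decay mean function with constant variance and a single design point $x$. The model, the symbol $x^{*}$ for the best design point and the use of the squared sensitivity as the information are assumptions chosen for illustration; they are not taken from any of the cited papers.

```latex
\begin{align*}
f(x,\theta) &= e^{-\theta x}, \qquad \theta > 0,\ x \ge 0, \\
\frac{\partial f}{\partial \theta} &= -x\, e^{-\theta x}, \\
I(\theta; x) &= \left(\frac{\partial f}{\partial \theta}\right)^{2} = x^{2} e^{-2\theta x}, \\
\frac{\partial}{\partial x} \log I(\theta; x) &= \frac{2}{x} - 2\theta = 0
\quad\Longrightarrow\quad x^{*} = \frac{1}{\theta}.
\end{align*}
```

Under these assumptions the best single design point is $x^{*} = 1/\theta$, which cannot be located without already knowing $\theta$. For the linear analogue $f(x,\theta) = \theta x$ the information is $x^{2}$, free of $\theta$, and the best point is simply the largest admissible $x$.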
The first non-linear design problem was the dilution series introduced by Fisher (1922), well before his later foundational work on linear problems. The problem he considered involved a single-parameter exponential model. Fisher argued that the magnitude of the variance relative to the parameter should be minimized, rather than the variance in isolation. He thus considered minimizing the coefficient of variation, whose square is approximately $I^{-1}(\log\theta)$. He noticed that $I(\log\theta)$ is almost independent of $\theta$ and utilized this property to construct a design that provides a specified proportion of the total information. This strategy was generalized by Abdelbasit and Plackett (1981, 1983) and studied in greater detail by Abdelbasit (1998). Stallard and Gravenor (2006) discuss further design issues for the dilution series.
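The near-constancy of the information about $\log\theta$ can be checked numerically. The sketch below assumes the exponential model takes the form P(no response at dose x) = exp(-θx), which the text above does not state explicitly; under that assumption one binary observation at dose x carries information u²/(exp(u) - 1) about log θ, with u = θx. The twofold dose grid, the range of θ values and the function names are hypothetical.

```python
import math

def info_log_theta(theta, dose):
    """Information about log(theta) from one binary observation at `dose`.

    Assumes P(no response) = exp(-theta * dose); with u = theta * dose the
    information is u**2 / (exp(u) - 1).
    """
    u = theta * dose
    if u > 700.0:  # exp(u) would overflow; the information is effectively 0
        return 0.0
    return u * u / math.expm1(u)

def total_info(theta, doses):
    """Total information about log(theta) from one observation at each dose."""
    return sum(info_log_theta(theta, x) for x in doses)

# Hypothetical twofold dilution series covering a very wide range of doses.
doses = [2.0 ** k for k in range(-20, 21)]
# Parameter values spanning a 64-fold range.
thetas = [2.0 ** (j / 4.0) for j in range(-12, 13)]

series = [total_info(t, doses) for t in thetas]
single = [info_log_theta(t, 1.0) for t in thetas]

series_ratio = max(series) / min(series)
single_ratio = max(single) / min(single)
print(f"dilution series: max/min information = {series_ratio:.5f}")
print(f"single dose:     max/min information = {single_ratio:.1f}")
```

With the series, the ratio of the largest to the smallest total information stays very close to 1 over the whole range of θ, whereas a single fixed dose gives information that varies many-fold. This is the sense in which such a design is insensitive to the unknown parameter.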
Later work on designs for non-linear problems did not, however, follow Fisher's approach. Instead it focused on optimality criteria originally proposed for linear models, resulting in parameter-dependent optimal designs. Despite their little practical value, these designs provide useful reference points for designs that can be implemented (Ford et al, 1992). This is probably the main justification for the extensive literature on such designs. Dette et al (2004) investigate E- and c-optimal designs for a broad class of non-linear regression models. Dette and Sahm (1998) consider the maximum variance optimality criterion of Elfving (1959) in the context of non-linear response models and construct minimax optimal designs. Dette and Haines (1994) develop a procedure for constructing E-optimal designs for a broad class of two-parameter models, and illustrate its main features with examples. Hedayat et al (2004) identify classes of two-parameter non-linear models for which D-optimal designs are supported on exactly two points. They also obtain some efficient designs that allow for model checking.
Dependence of optimal designs on unknown parameters is the major problem limiting implementation. A number of methods are used in the literature to overcome this problem. Among these are:
a. proceeding sequentially;
b. adopting a Bayesian approach;
c. using the best available estimates or guesses, leading to what are called locally optimal designs.
In all three approaches, either preliminary estimates of the parameters or prior distributions have to be used. An immediate question is how robust (or sensitive) the resulting designs are to poor initial estimates and priors.

Sequential designs
This is the natural approach when optimal designs depend on the unknown model parameters. The experimenter runs the first experiment using the best available estimate or guess, and then uses the estimates obtained from each experiment to design the next. Abdelbasit and Plackett (1983) obtained sequential designs for one-parameter exponential and two-parameter logistic models. They concluded that:
a. The better the initial estimate, the more experimental subjects should be used in the first stage.
b. With good initial estimates and a relatively small number of subjects, it may not be worthwhile to go beyond one experiment.
c. Underestimating the variance is more serious than overestimating it.
d. The smaller the variance, the more sensitive the design becomes to poor initial estimates.
Among the earliest problems considered in this area are the dose-response problems, where interest lies in the estimation of the parameters of the response curve or its percentiles. Most of the interest was initially in the median of the response curve, commonly known as the median effective dose, ED50. Later, extreme percentiles ED100p, where p is close to 1 or 0, were considered. Dixon and Mood (1948) proposed the up and down method for estimating the ED50. Subjects are tested one at a time at equally spaced doses; the experimenter performs the next experiment at the next higher dose if there is no response and at the next lower dose if there is a response. Bortot and Giovagnoli (2005) proposed a second order up and down method, where the next step is based on the outcome of the last two. Robbins and Monro (1951) extended the up and down method to a variable step size, where the doses get closer to each other as they approach the ED50. A discussion of these methods is given in Wetherill and Glazebrook (1986). A modification to improve the performance of the Robbins-Monro procedure at extreme percentiles is given by Joseph (2004). A summary of developments in sequential designs for estimating the ED50 is given by Wu (1985), who proposes new designs and compares their small sample performances via simulation.
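The up and down rule and a variable step size procedure of the Robbins-Monro type can be sketched in a few lines of simulation. Everything specific here is an assumption made for illustration: the logistic response curve, the true ED50 of 0, the step size 0.5, the gain constant c = 4, the use of the mean dose as a crude up and down estimate (this is not Dixon and Mood's own estimator), and the common textbook step size c / i.

```python
import math
import random

def response_prob(dose, ed50, slope=1.0):
    """Assumed logistic dose-response curve, used only to simulate outcomes."""
    return 1.0 / (1.0 + math.exp(-slope * (dose - ed50)))

def up_and_down(n, start_level, step, ed50, rng):
    """Up and down rule on the equally spaced grid dose = level * step.

    After a response the next subject is tested one level lower; after no
    response, one level higher. Returns the visited levels as integers.
    """
    levels = [start_level]
    for _ in range(n - 1):
        dose = levels[-1] * step
        responded = rng.random() < response_prob(dose, ed50)
        levels.append(levels[-1] - 1 if responded else levels[-1] + 1)
    return levels

def robbins_monro(n, start_dose, c, ed50, rng, target=0.5):
    """Stochastic approximation with step size c / i shrinking over trials."""
    dose = start_dose
    for i in range(1, n + 1):
        y = 1.0 if rng.random() < response_prob(dose, ed50) else 0.0
        dose -= (c / i) * (y - target)
    return dose

rng = random.Random(2024)
true_ed50 = 0.0  # hypothetical value, unknown to the experimenter
step = 0.5

levels = up_and_down(2000, start_level=6, step=step, ed50=true_ed50, rng=rng)
ud_estimate = step * sum(levels) / len(levels)  # crude estimate: mean dose used
rm_estimate = robbins_monro(5000, start_dose=3.0, c=4.0, ed50=true_ed50, rng=rng)
print(f"up and down mean dose:    {ud_estimate:.3f}")
print(f"Robbins-Monro final dose: {rm_estimate:.3f}")
```

Both procedures concentrate the doses around the ED50 without requiring a good initial guess, but each new dose depends on the previous outcome, so the response must be available before the next subject is tested.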
More generally, Sitter and Forbes (1997) consider a class of symmetric binary response models and show that for many of the optimality criteria (e.g. A-, D-, E-, F- and G-optimality) the optimal second stage design consists of two points symmetrically placed about the ED50, with possibly different weights at each point. Sinha and Wiens (2002) investigate sequential design methodologies when the fitted model is possibly of an incorrect parametric form. Their small sample simulation results indicate that their designs reduce the mean squared error due to model misspecification and heteroscedastic variation. Hu (1998) studied the consistency of parameter estimators in sequential non-linear cases and established the consistency of Bayes estimators in stochastic regression models.
Sequential procedures are practical only in situations where the response is available immediately, and their properties are hard to explore analytically. Hence their use in practice has remained limited.

Bayesian designs
Since optimum designs for nonlinear models depend on the values of the unknown parameters $\theta$, a Bayesian approach to the design seems natural. Assuming an initial estimate or guess is effectively an assumption of prior knowledge about $\theta$. If such knowledge can be expressed in the form of a prior probability distribution, the posterior expected information can be obtained. The optimality criteria above can then be applied to the posterior expected information. Chaloner and Lantz (1989) derived a general Bayesian theory for non-linear models, applied it to logistic regression and numerically obtained optimal designs for the one- and two-parameter cases. Chaloner (1993) also considered Bayesian designs for one-parameter models with a single explanatory variable. Consistency issues are addressed by Hu (1998). Dette and Neugebauer (1997) obtained Bayesian D-optimal designs for exponential growth models with up to three explanatory variables.
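The idea of averaging a design criterion over a prior can be sketched for a one-parameter logistic model with unknown location μ. The sketch is not the procedure of any of the papers cited above. It assumes a unit slope, a normal prior replaced by a discrete grid, equal-weight designs at two doses placed symmetrically about the prior mean, and the prior expectation of the log information as the criterion, which is one common choice. All names are hypothetical.

```python
import math

def info(dose, mu):
    """Information about mu from one binary response at `dose`.

    Assumes P(response) = 1 / (1 + exp(-(dose - mu))), giving p * (1 - p),
    which equals 1 / (2 + 2 * cosh(dose - mu)).
    """
    return 1.0 / (2.0 + 2.0 * math.cosh(dose - mu))

def prior_grid(mean, sd, half_width=4.0, n=81):
    """Discrete stand-in for a normal prior: (value, normalized weight) pairs."""
    qs = [-half_width + 2.0 * half_width * i / (n - 1) for i in range(n)]
    raw = [math.exp(-0.5 * q * q) for q in qs]
    total = sum(raw)
    return [(mean + sd * q, w / total) for q, w in zip(qs, raw)]

def expected_log_info(spread, prior, center):
    """Prior expectation of the log information for doses center +/- spread."""
    value = 0.0
    for mu, weight in prior:
        design_info = 0.5 * (info(center - spread, mu) + info(center + spread, mu))
        value += weight * math.log(design_info)
    return value

def best_spread(prior_sd, center=0.0):
    """Grid search for the spread that maximizes the expected log information."""
    prior = prior_grid(center, prior_sd)
    candidates = [0.25 * i for i in range(21)]  # spreads 0, 0.25, ..., 5
    return max(candidates, key=lambda d: expected_log_info(d, prior, center))

for sd in (0.5, 3.0):
    print(f"prior sd {sd}: best half-distance between doses = {best_spread(sd)}")
```

Under these assumptions a tight prior leads back to a single dose at the prior mean, which is the locally optimal design, while a diffuse prior favours two separated doses. The design adapts to the stated uncertainty instead of relying on one guessed value.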

Locally optimal designs
These are probably the most commonly used designs and the most extensively studied in the literature. Issues associated with such designs are robustness to poor initial estimates and efficiency relative to an optimal or another design. In the context of dose-response problems, and when interest is in a single parameter, typically the ED50, Finney (1971) obtained symmetrical two- and three-point F-optimal designs. Abdelbasit and Plackett (1983) used simulation to compare the use of fiducial intervals and asymptotic intervals. They concluded that asymptotic intervals are not inferior to fiducial intervals. Their results were re-examined by Sitter and Wu (1993a), who supported Finney's proposal and concluded that Abdelbasit and Plackett's conclusion resulted from their large sample size. Sitter and Wu (1993b) also considered F-optimality together with other alphabetic criteria. For further comparisons of the fiducial and asymptotic methods see Faraggi et al (2003) and Yangxin (2005).
For the standard two-parameter logistic model, Abdelbasit and Plackett (1983) derived D-optimal designs. Their work was generalized by Minkin (1987) and Khan and Yazidi (1988). Myers et al (1994) developed optimal designs for the logistic model using several alphabetic criteria. Sitter and Fainaru (1997) obtained alphabetic optimal designs for a class of symmetric models that includes the probit and logistic models. Dette and Sahm (1997) obtained standardized A- and E-optimal designs for the probit and logit models. The reason for standardization of the information matrix is the scale dependency of the A- and E-optimality criteria. Ford et al (1992) used canonical forms in the construction of locally D- and c-optimal designs for various non-linear problems. Dette and Sahm (1998) considered minimax designs, and Fandom and Seidel (2000) gave a minimax algorithm that works efficiently for constructing optimal symmetric balanced designs.
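A minimal numerical sketch of a locally D-optimal design for the two-parameter logistic model follows. It assumes the parametrization P(response at dose x) = 1 / (1 + exp(-β(x - μ))), works in the canonical scale z = β(x - μ), and restricts the search to equal-weight two-point designs symmetric about the ED50. The function names, the search grid and the guessed parameter values are hypothetical.

```python
import math

def weight(z):
    """Binomial weight p * (1 - p) at logit z for the logistic model."""
    return 1.0 / (2.0 + 2.0 * math.cosh(z))

def det_info(zs, ws):
    """Determinant of the 2x2 information matrix in the canonical scale z."""
    m00 = sum(w * weight(z) for z, w in zip(zs, ws))
    m01 = sum(w * weight(z) * z for z, w in zip(zs, ws))
    m11 = sum(w * weight(z) * z * z for z, w in zip(zs, ws))
    return m00 * m11 - m01 * m01

# Grid search over symmetric equal-weight two-point designs at logits -z, +z.
grid = [i / 10000.0 for i in range(1, 50001)]
z_opt = max(grid, key=lambda z: det_info([-z, z], [0.5, 0.5]))
p_opt = 1.0 / (1.0 + math.exp(-z_opt))
det_opt = det_info([-z_opt, z_opt], [0.5, 0.5])

def design_from_guess(mu_guess, beta_guess):
    """Doses of the locally optimal design built from guessed parameters."""
    return [mu_guess - z_opt / beta_guess, mu_guess + z_opt / beta_guess]

def d_efficiency(doses, mu_true, beta_true):
    """D-efficiency of an equal-weight design at the true parameter values."""
    zs = [beta_true * (x - mu_true) for x in doses]
    return math.sqrt(det_info(zs, [0.5, 0.5]) / det_opt)

eff_good = d_efficiency(design_from_guess(0.0, 1.0), 0.0, 1.0)
eff_poor = d_efficiency(design_from_guess(1.0, 2.0), 0.0, 1.0)
print(f"optimal logit: +/-{z_opt:.4f}, response probability {p_opt:.3f}")
print(f"D-efficiency with correct guesses: {eff_good:.3f}")
print(f"D-efficiency with poor guesses:    {eff_poor:.3f}")
```

The optimal doses are fixed in the canonical scale but translate to μ ± z/β in the dose scale, so they can only be placed by guessing μ and β. The efficiency drops when those guesses are poor, which is the robustness issue attached to every locally optimal design.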
The question of robustness remains a key question for all locally optimal designs. Sitter (1992) used minimax procedures to obtain designs that are robust to poor initial estimates. The procedure yields designs with more design points and a larger spread. The greater the uncertainty about the parameters, the more spread out the design is and the more points it is supported on. The issue of robustness is also addressed by many authors, including Abdelbasit and Plackett (1983), Myers et al (1994), Kalish (1990), Hedayat et al (1997), Moerbeek (2005) and Melas (2005). Cox (1998) suggests that problems caused by non-linearity of the model may not be that serious. He considered the case of estimating small treatment differences, and showed that the problem caused by nonlinearity of exponential family distributions is not severe, and that the usual normal theory applies well if the data are not very heterogeneous.
Situations with more than one explanatory variable are studied by Abdelbasit and Plackett (1982) in relation to the joint action of stimuli (see also Antonello and Raghavaroo, 2000). Atkinson et al (1995) introduced gender as a second explanatory variable, and Sitter and Torsney (1995) extend D- and c-optimal designs for binary response data to the case of two design variables.
Design problems for bivariate response cases are investigated by Heisi and Myers (1996), who consider the bivariate response (efficacy, toxicity) modeled by a bivariate logistic and develop D- and Q-optimal designs. They also discuss the robustness of the designs obtained. Dragalin and Fedorov (2005) developed an adaptive design for an efficacy-toxicity response with bivariate correlated binary outcomes. The main problems of all optimal designs are that they:
a. depend heavily on the precise specification of the model;
b. usually offer no possibility of checking the assumed model, since they invariably have as many points as the number of parameters to be estimated;
c. cannot be implemented when the model is non-linear, because they are functions of the unknown model parameters.
In the remainder of this article we introduce alternative design strategies that try to avoid the above problems.

Fisher's constant information
Surprisingly, the first design criterion proposed for non-linear models (Fisher, 1922) was hardly pursued further in the literature. Abdelbasit and Plackett (1981, 1983) showed that Fisher's results for a single scale parameter hold for any model and not just the exponential. They also showed that uniform designs are constant information designs for a single location parameter. They used the word reliable to describe designs that make the information function independent of the parameter. A design is called D-reliable if it makes the determinant of the information matrix independent of the parameters. They also showed that uniform designs are D-reliable for two-parameter models when the probability of response at dose x can be written as: for any specified distribution function $F(\cdot)$. A numerical investigation by Abdelbasit (1998) examined the use of the criterion for one- and two-parameter logistic models. His results indicate that the resulting designs have far more points than the number of parameters.

Bias-based designs
The minimum bias (all bias) criterion was introduced by Box and Draper (1959) to rival design criteria based solely on variance. They noted that unless the variance contribution to the mean squared error is many times greater than the bias contribution, the designs that minimize the mean squared error are very similar to minimum bias designs. Abdelbasit and Butler (2006) extended the Box and Draper criteria to generalized linear models of the form
$$f_Y(y, \theta, \phi) = \exp\{(y\theta - b(\theta))/a(\phi) + c(y, \phi)\} \qquad (8)$$
This family includes most of the standard probability distributions. The parameter $\theta$ is called the canonical parameter. Abdelbasit and Butler (2006) derived minimum bias designs for the canonical and non-canonical cases, and presented the binary, Poisson and exponential cases as examples. Their results suggest that equally spaced designs minimize the ratio of bias to standard error and the ratio of mean squared error to variance.

Concluding remarks
The main hurdle, the dependence of designs on unknown model parameters, is yet to be resolved. Variance-based optimal design criteria seem to have matured, but offer no practical solution. Alternative design criteria other than optimality need to be sought. The constant information and minimum bias criteria discussed above are examples of such alternatives. At present the scarce results available indicate that these criteria yield designs with too many design points, as opposed to the too few of optimal designs. A search for a compromise that combines the merits of the various approaches is worth pursuing.
Very little is known, however, about the statistical properties of the designs based on either of the two alternatives suggested above. Further work in this area is needed, and the efficiency of these designs relative to optimal or other designs needs to be investigated. A comparative analysis of the alternative approaches (variance based, bias based and constant information) is also needed to provide useful guidance to the practitioner.