ON REGRESSION ESTIMATORS USING EXTREME RANKED SET SAMPLES

Abstract: The regression method is used to estimate the population mean of the dependent variable (Y) in two cases. In the first case the mean of the independent auxiliary variable (X) is known, and in the second case it is unknown. In the second case we also use the double sampling method to estimate the mean of the independent variable (X). We examine the performance of the two methods using extreme ranked set samples as introduced by Samawi et al. (1996). The theoretical and numerical aspects are presented in this paper through simulation and an application. The results show that, for symmetric distributions, using extreme ranked set samples for regression estimation is more efficient than using ordinary ranked set samples or simple random samples.


Introduction
In many experimental situations the response variable $Y$ is related to a non-stochastic concomitant variable $X$. For instance, let $Y$ be the Bilirubin level in jaundice babies who stay in neonatal intensive care and let $X$ be the weight of the baby at birth. By obtaining simultaneous observations on $X$ and $Y$, we can use the information contained in the $X$-measurements to estimate the mean value of $Y$. This can be done by using either ratio estimation or regression estimation. Herein, we are interested in the regression estimation method, which is used to obtain increased precision in estimating the population mean or total of the variable of interest, $Y$, by taking advantage of its correlation with the auxiliary variable $X$. The two cases where the mean, $\mu_x$, of $X$ is known and where it is unknown are considered.

In many cases the sampling units in a study are more easily ranked than actually quantified. McIntyre (1952) proposed to use the mean of $n$ units obtained from a ranked set sample (RSS) to estimate a population mean. Patil et al. (1993) compared the precision of ranked set sampling with that of the regression estimator. They showed that using RSS is superior to the regression estimator under simple random sampling (SRS) in most cases. Yu and Lam (1997) used the RSS regression estimation method to estimate the population mean and showed that using RSS provides a more efficient estimator than using SRS. For more details on RSS see, for example, Kaur et al. (1995) and Patil et al. (1999).

Samawi et al. (1996) investigated the use of extreme ranked set sampling (ERSS) in reducing the ranking error and in improving the precision in estimating the population mean in the case of a symmetric underlying distribution. They showed that if the underlying distribution is uniform, then the highest relative savings occur when only the extreme ordered units are measured in equal proportion. However, in the case of other unimodal symmetric distributions the highest gain is achieved when the units possessing the middle rank are measured. For this reason, Yanagawa and Chen (1980) did not consider the uniform distribution while investigating various symmetric distributions to develop a better ranked set sample estimator of the population mean.
As in Samawi et al. (1996), we obtain an extreme ranked set sample by first choosing $r$ independent sets, each of which contains $r$ bivariate elements drawn randomly from an infinite population. Rank the elements in each set with respect to one of the variables $Y$ or $X$. Suppose that the ranking is done on the variable $X$. From the first set an actual measurement is taken of the $X$ element with the smallest rank, together with the value of $Y$ associated with this smallest element of $X$. From the second set an actual measurement is taken of the element with the largest rank of $X$, together with the associated $Y$ value. From the third set an actual measurement is taken of the element with the smallest rank of $X$, together with the associated $Y$ value, and so on. In this way we obtain the first $r-1$ measured elements using the first $r-1$ sets, together with the associated values of the $Y$ variable. The choice of the $r$th element from the $r$th (i.e., the last) set depends on whether $r$ is even or odd:

(a) If $r$ is even, the largest ranked $X$ element is measured, together with the value of the associated variable $Y$. ERSSa will denote such a sample.
(b) If $r$ is odd, we measure the median of $X$, together with the value of the variable $Y$ associated with the median of $X$. ERSSb will denote such a sample.

The cycle may be repeated $m$ times until $n = rm$ bivariate elements have been measured.

In this paper we propose to use ERSS to improve the precision of the two methods of regression estimation. We study the properties of these estimators and compare them under different settings. In Section 2, we obtain the regression estimator of the mean of $Y$ using extreme ranked set sampling when $\mu_x$ is known. The mean and variance of the estimator are derived, and the various estimators are compared in terms of efficiency. In Section 3, we obtain the regression estimator using extreme ranked set sampling when $\mu_x$ is unknown, using a double sampling method. Again, we derive the mean and variance of the estimator and compare the various estimators in terms of efficiency. An illustration of the methods using real data on the Bilirubin level in jaundice babies is given in Section 4.
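The selection scheme described above can be sketched in code. The following is a minimal illustration, not the authors' implementation: the function names `erss_sample` and `draw_bivariate_normal`, the `draw_pairs` callback, and the bivariate normal population with correlation 0.8 are all assumptions made for the example.

```python
import numpy as np

def erss_sample(draw_pairs, r, m, rng):
    """Draw an extreme ranked set sample of n = r*m (x, y) pairs, ranking on X.

    draw_pairs(size, rng) is a hypothetical user-supplied function that
    returns an array of shape (size, 2) whose rows are (x, y) pairs.
    """
    measured = []
    for _ in range(m):                       # m cycles
        for j in range(1, r + 1):            # r sets per cycle
            s = draw_pairs(r, rng)           # one set of r bivariate units
            s = s[np.argsort(s[:, 0])]       # rank the set on X
            if j == r and r % 2 == 1:
                pick = s[r // 2]             # ERSSb: median of the last set (r odd)
            elif j % 2 == 1:
                pick = s[0]                  # odd-numbered set: smallest X
            else:
                pick = s[-1]                 # even-numbered set: largest X
            measured.append(pick)
    return np.array(measured)

def draw_bivariate_normal(size, rng, rho=0.8):
    """Illustrative population: standard bivariate normal with correlation rho."""
    x = rng.standard_normal(size)
    y = rho * x + np.sqrt(1.0 - rho**2) * rng.standard_normal(size)
    return np.column_stack([x, y])

sample = erss_sample(draw_bivariate_normal, r=4, m=3, rng=np.random.default_rng(1))
print(sample.shape)  # one row per measured (x, y) pair
```

Only the selected unit in each set needs its $Y$ value quantified; the sketch generates all $Y$ values purely for convenience.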

Regression Estimators when $\mu_x$ is Known
Like ratio estimation, linear regression estimation of the mean is designed to increase the precision of the estimator by using an auxiliary variable $X$ that is correlated with $Y$. When the relationship between $Y$ and $X$ is examined, it may be found that although the relation is approximately linear, the line does not go through the origin. This suggests that an estimator based on the linear regression of $Y$ on $X$ is better than an estimator that is based on the ratio of the two variables.

Suppose that $(X_i, Y_i)$, $i = 1, 2, \ldots, n$, is a simple random sample and that

$$Y_i = \mu_y + \beta (X_i - \mu_x) + \varepsilon_i, \tag{2.1}$$

with $\beta = \rho\,\sigma_y/\sigma_x$, where $\rho$ is the correlation coefficient between $X$ and $Y$ and $\varepsilon_i$ is a random error with mean zero.

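When the population mean $\mu_x$ is known, the regression estimator of the mean of $Y$ is given by

$$\bar Y_{reg} = \bar Y + \hat\beta\,(\mu_x - \bar X), \tag{2.2}$$

where $\bar X$ and $\bar Y$ are the sample means of $X$ and $Y$, $\hat\beta = \sum_{i=1}^{n}(X_i - \bar X)(Y_i - \bar Y) \big/ \sum_{i=1}^{n}(X_i - \bar X)^2$, and $n = mr$.

When the joint underlying distribution of $(X, Y)$ is assumed to be bivariate normal, the regression estimator $\bar Y_{reg}$ is an unbiased estimator of $\mu_y$ and its variance is given by

$$Var(\bar Y_{reg}) = \frac{\sigma_y^2 (1 - \rho^2)}{n}\left[1 + \frac{1}{n-3}\right] \tag{2.3}$$

(see Tikkiwal (1960) or Sukhatme and Sukhatme (1970)). However, if the assumption of the linear relationship in (2.1) is invalid, then the SRS regression estimator in (2.2) is in general a biased estimator of $\mu_y$.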
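As a small worked illustration of the regression estimator with known $\mu_x$, the sketch below computes the least squares slope and applies the mean correction. The function name `regression_estimate` and the toy data are assumptions made for the example; the same computation applies to any sample of $(x, y)$ pairs, whether drawn by SRS, RSS, or ERSS.

```python
import numpy as np

def regression_estimate(x, y, mu_x):
    """Regression estimate of the mean of Y when the mean of X (mu_x) is known."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_bar, y_bar = x.mean(), y.mean()
    # Least squares slope of Y on X
    beta_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    # Shift the sample mean of Y by the slope times the known error in x_bar
    return y_bar + beta_hat * (mu_x - x_bar)

# Toy data lying exactly on the line y = 2 + 3x, with an assumed mu_x of 5
print(regression_estimate([1.0, 2.0, 4.0], [5.0, 8.0, 14.0], mu_x=5.0))
```

When the data are exactly linear, the estimate equals the line evaluated at $\mu_x$, which is a quick sanity check on the formula.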

Regression Estimator Using RSS
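Consider a bivariate RSS in which $X_{(i)k}$ is the $i$th smallest value of $X$ in the $i$th set of the $k$th cycle and $Y_{[i]k}$ is the corresponding value of $Y$, $i = 1, 2, \ldots, r$ and $k = 1, 2, \ldots, m$. The relationship between $Y_{[i]k}$ and $X_{(i)k}$ is given by

$$Y_{[i]k} = \mu_y + \beta (X_{(i)k} - \mu_x) + \varepsilon_{ik}, \tag{2.4}$$

and the RSS regression estimator of Yu and Lam (1997) is given by

$$\bar Y_{Reg} = \bar Y_{RSS} + \hat\beta_{RSS}\,(\mu_x - \bar X_{RSS}), \tag{2.5}$$

where $\hat\beta_{RSS}$ is the least squares slope computed from the RSS pairs. Using basic properties of conditional moments, Yu and Lam (1997) showed that under (2.4), $\bar Y_{Reg}$ is an unbiased estimator of $\mu_y$ and its variance is

$$Var(\bar Y_{Reg}) = \frac{\sigma_y^2 (1 - \rho^2)}{n}\left[1 + E\!\left(\frac{\bar Z_{RSS}^2}{S_z^2}\right)\right],$$

where $Z_{(i)k} = (X_{(i)k} - \mu_x)/\sigma_x$, $\bar Z_{RSS} = \frac{1}{n}\sum_{k}\sum_{i} Z_{(i)k}$, and $S_z^2 = \frac{1}{n}\sum_{k}\sum_{i}(Z_{(i)k} - \bar Z_{RSS})^2$. Again, if the assumption of the linear relationship is invalid, the RSS regression estimator in (2.5) is in general a biased estimator of $\mu_y$.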

Regression Estimator Using ERSS
Assuming that both variables, $X$ and $Y$, have symmetric underlying distributions, let $(X_{(i)jk}, Y_{[i]jk})$ be, respectively, the $i$th smallest value of $X$ and the corresponding value of $Y$ obtained from the $j$th sample and the $k$th cycle. Assume that

$$Y_{[i]jk} = \mu_y + \beta (X_{(i)jk} - \mu_x) + \varepsilon_{ijk}, \tag{2.7}$$

where $i = 1, r$; $j = 1, 2, \ldots, r/2$ and $k = 1, 2, \ldots, m$ when $r$ is even; $i = 1, r$; $j = 1, 2, \ldots, (r-1)/2$ and $k = 1, 2, \ldots, m$ when $r$ is odd; and $\varepsilon_{ijk}$ has the same distributional assumptions as in (2.1). In what follows we discuss in detail the case when $r$ is even. The case when $r$ is odd is similar and will only be presented in the numerical results.
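When the population mean $\mu_x$ is known, we have the difference estimator

$$\bar Y_{Da} = \bar Y_{ERSSa} + \beta\,(\mu_x - \bar X_{ERSSa}), \tag{2.8}$$

where $\bar Y_{ERSSa}$ and $\bar X_{ERSSa}$ are the ERSSa estimators of $\mu_y$ and $\mu_x$, respectively; see Samawi et al. (1996). Therefore, it can easily be shown that $\bar Y_{Da}$ is an unbiased estimator of $\mu_y$. Replacing $\beta$ by its least squares estimator $\hat\beta$ computed from the ERSS pairs gives the regression estimator

$$\bar Y_{Ereg} = \bar Y_{ERSSa} + \hat\beta\,(\mu_x - \bar X_{ERSSa}). \tag{2.9}$$

Then, using basic properties of conditional moments, we have the following theorem.

Theorem 2.1: Under (2.7) and assuming that the underlying marginal distributions of $X$ and of $Y$ are symmetric, the regression estimator of $\mu_y$ as defined in (2.9) has the following properties:

(1) $E(\bar Y_{Ereg}) = \mu_y$, and
(2) $Var(\bar Y_{Ereg}) = \dfrac{\sigma_y^2 (1 - \rho^2)}{n}\left[1 + E\!\left(\dfrac{\bar Z_{ERSSa}^2}{S_z^2}\right)\right]$,

where $Z_{(i)jk} = (X_{(i)jk} - \mu_x)/\sigma_x$, $\bar Z_{ERSSa}$ is the mean of the $n$ values $Z_{(i)jk}$, and $S_z^2 = \frac{1}{n}\sum (Z_{(i)jk} - \bar Z_{ERSSa})^2$.

Proof of Theorem 2.1: (1) Using (2.7) and conditioning on the $X$ values, $E(\hat\beta \mid X) = \beta$ and $E(\bar Y_{ERSSa} \mid X) = \mu_y + \beta(\bar X_{ERSSa} - \mu_x)$, so that $E(\bar Y_{Ereg} \mid X) = \mu_y$. (2) From the proof of (1), $Var(\bar Y_{Ereg}) = E[Var(\bar Y_{Ereg} \mid X)]$, which gives the stated variance.

For the variances of the naive RSS and ERSS estimators, see, for example, Samawi et al. (1996). Therefore, the regression method of estimating $\mu_y$ based on ERSS is most preferable if $\rho$ is large.

Similarly, from (2.11), $\bar Y_{Ereg}$ has a greater precision than $\bar Y_{RSS}$ whenever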

Comparisons with SRS Regression Estimator
We consider the relative precision of our proposed ERSS regression estimator relative to the SRS regression estimator.
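The comparison with the SRS regression estimator can be explored numerically. The sketch below is a rough Monte Carlo illustration under assumed settings (standard bivariate normal data with $\mu_x = \mu_y = 0$, $r = 4$, $m = 2$, $\rho = 0.5$, 20,000 replications); the function names are hypothetical and the code is not the authors' simulation program. It handles only the case of even $r$ (ERSSa).

```python
import numpy as np

def reg_estimates(x, y, mu_x=0.0):
    """Regression estimate of the mean of Y for each row (one row = one sample)."""
    xb = x.mean(axis=1, keepdims=True)
    yb = y.mean(axis=1, keepdims=True)
    beta = ((x - xb) * (y - yb)).sum(axis=1) / ((x - xb) ** 2).sum(axis=1)
    return yb[:, 0] + beta * (mu_x - xb[:, 0])

def bivariate_normal(rng, shape, rho):
    """Standard bivariate normal draws with correlation rho."""
    x = rng.standard_normal(shape)
    y = rho * x + np.sqrt(1.0 - rho**2) * rng.standard_normal(shape)
    return x, y

def erss_even(rng, reps, r, m, rho):
    """ERSSa samples (r even), ranking on X; returns arrays of shape (reps, r*m)."""
    # Axes: replication, cycle, set within cycle, unit within set
    x, y = bivariate_normal(rng, (reps, m, r, r), rho)
    # Sets 1, 3, ... contribute the smallest X; sets 2, 4, ... the largest X
    pick = np.where(np.arange(r) % 2 == 0,
                    x.argmin(axis=-1), x.argmax(axis=-1))[..., None]
    xs = np.take_along_axis(x, pick, axis=-1)[..., 0]
    ys = np.take_along_axis(y, pick, axis=-1)[..., 0]
    return xs.reshape(reps, -1), ys.reshape(reps, -1)

def relative_precision(r=4, m=2, rho=0.5, reps=20000, seed=0):
    """Estimated Var(SRS regression estimator) / Var(ERSS regression estimator)."""
    rng = np.random.default_rng(seed)
    x_srs, y_srs = bivariate_normal(rng, (reps, r * m), rho)   # SRS of size n = r*m
    x_erss, y_erss = erss_even(rng, reps, r, m, rho)
    return reg_estimates(x_srs, y_srs).var() / reg_estimates(x_erss, y_erss).var()

print(relative_precision())
```

A value above 1 indicates that the ERSS regression estimator has the smaller variance for the chosen settings. Changing `rho` gives a quick check of how sensitive the ratio is to the correlation.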

Comparisons with RSS Regression Estimator
Finally, we consider the relative precision of our proposed ERSS regression estimator relative to the RSS regression estimator presented by Yu and Lam (1997).

Evaluation of Departure from the Linearity Assumption
Generally, if the assumption of the linear relationship in (2.7) is invalid, the ERSS regression estimator is a biased estimator. In such a case, we define the relative precision to be the ratio of the MSEs of the estimators compared. As in Yu and Lam (1997), we evaluate the performance of the regression estimator under departure from the linearity assumption by using Plackett's class of bivariate distributions with fixed marginal distribution functions $F(x)$ and $G(y)$, in which a parameter $\psi$ governs the dependence between $X$ and $Y$. The reason for choosing this class of bivariate distributions is that it covers the full range of dependence, from strongly negative dependence ($\psi$ close to 0) through independence ($\psi = 1$) to strongly positive dependence ($\psi$ large).

In general, the relationship between $X$ and $Y$ is not linear. However, their relationship might be close to linear when $\psi$ is close to 0 or $\infty$ and their marginal distributions are the same (and symmetric, if $\psi$ is close to 0). For a more detailed description of Plackett's distribution and its random generation, see Johnson (1987, pp. 191-197).
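For readers who want to experiment with this class of distributions, the following sketch generates pairs with uniform(0, 1) marginals from Plackett's distribution by inverting the conditional distribution of the second coordinate given the first. The function name `plackett_uniform_pairs` is hypothetical, and the closed-form inversion used here is the commonly quoted one; it should be checked against Johnson (1987) before serious use. Other marginals are obtained by applying the inverse distribution functions of $F$ and $G$ to the two coordinates.

```python
import numpy as np

def plackett_uniform_pairs(psi, size, rng):
    """Draw (u, v) pairs with uniform(0, 1) marginals and Plackett dependence psi > 0."""
    u = rng.uniform(size=size)
    t = rng.uniform(size=size)          # target value of the conditional cdf of V given U = u
    a = t * (1.0 - t)
    b = psi + a * (psi - 1.0) ** 2
    c = 2.0 * a * (u * psi**2 + 1.0 - u) + psi * (1.0 - 2.0 * a)
    d = np.sqrt(psi) * np.sqrt(psi + 4.0 * a * u * (1.0 - u) * (1.0 - psi) ** 2)
    v = (c - (1.0 - 2.0 * t) * d) / (2.0 * b)
    return u, v

rng = np.random.default_rng(0)
for psi in (0.05, 1.0, 10.0):
    u, v = plackett_uniform_pairs(psi, 20000, rng)
    print(psi, np.corrcoef(u, v)[0, 1])   # negative, near zero, positive
```

With $\psi = 1$ the formula reduces to $v = t$, so the two coordinates are independent, which matches the role of $\psi = 1$ in the text.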
First, we fix the set size $r$ to be 4 and 5, and examine $m$ = 1, 4, 8. Five types of dependence, from strongly negative to strongly positive, corresponding to $\psi$ = 0.05, 0.3, 1, 3, 10, and two marginal distributions, normal ($\mu$, 1) and uniform (0, 1), are considered here. Table 2.3 gives the relative precision of the ERSS regression estimator relative to the ERSS naive estimator based on simulations of size 100,000.

The main conclusions from Table 2.3 are:

1. If both $X$ and $Y$ have symmetric marginal distributions and $\psi$ is 0.05 or 10, the ERSS regression estimator is superior to the ERSS naive estimator, since Plackett's distribution in these cases is close to a bivariate distribution with linearly related marginals.
2. The efficiency decreases as the value of $\psi$ increases from 0.05 to 1, and starts to increase as $\psi$ increases from 1 to 10, for any given value of $m$ and for $r$ = 4 and 5.
3. For any fixed $\psi$ and any value of $r$, the efficiency increases as $m$ increases.

In general, when $\psi$ is close to 1, the performance of the ERSS regression estimator is poor. This may be due to the fact that when $\psi$ is close to 1, the two variables $X$ and $Y$ are nearly independent.

Regression Estimators when $\mu_x$ is Unknown
In this section, we discuss how to obtain the extreme ranked set sample regression estimator by using the method of double sampling (or two-phase sampling) when $\mu_x$ is unknown.

The regression estimators
where $\hat\beta$ is as in (2.9) and $n = mr$.
Again, using basic properties of conditional moments, we have the following theorem.
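Proof of Theorem 3.1 (A): By the proof of part (1) of Theorem 2.1, we have that

$$E(\bar Y_{Eds} \mid X) = \mu_y + \beta(\bar X_{ERSSa} - \mu_x) + \beta(\bar X' - \bar X_{ERSSa}).$$

Since $\bar X_{ERSSa}$ is an unbiased estimator of $\mu_x$ (under the symmetry assumption, see Samawi et al. (1996)) and $\bar X'$ is also an unbiased estimator of $\mu_x$, then $E[E(\bar Y_{Eds} \mid X)] = \mu_y$, and hence $\bar Y_{Eds}$ is an unbiased estimator of $\mu_y$.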
Proof of Theorem 3.1 (B): The proof is similar to the proof of Theorem 2.1 and uses the result of part (1).

Numerical Comparison
Assuming that $(X, Y)$ has a bivariate normal distribution, we compute the various expressions for the relative efficiencies obtained in the previous section. The set sizes examined are $r$ = 4, 5, 6, 7 and 8 with cycles of $m$ = 1, 4, 8 and $\infty$. A simulation of size 100,000 is used to evaluate the expectations involved. In the case of double sampling, note that the relative precision is less than the relative precision in the case when $\mu_x$ is known. This is due to the extra variation introduced when estimating the mean $\mu_x$.

The main conclusions from Table 3.1 are:

1. When ranking is done on the variable $X$, the relative precision is best at $\rho = 0$. The efficiency increases as the value of $\rho$ decreases from 0.99 to 0.
2. For a fixed value of the set size $r$, the efficiency converges rapidly to 1 as $m$ increases.
3. The efficiency decreases with increasing set size $r$, for any given value of $m$.
4. For a given value of $r$, there is no change in the efficiency when the cycle is repeated more than 8 times (efficiency stability). This may be due to the fact that when the sample size is large enough to represent the population, the ranking has less impact on the regression estimator.
5. The double sampling ERSS regression estimator is always superior to the double sampling SRS regression estimator, no matter how large the correlation coefficient $\rho$ is.
Table 3.2 presents the relative precision under the assumption of an underlying bivariate normal distribution. Again, the table shows that the relative precisions are all at least 1. We also note that the double sampling ERSS regression estimator is always slightly better than the double sampling RSS regression estimator, no matter how large the correlation coefficient $\rho$ is.

Application to Bilirubin level in Jaundice Babies
We illustrate the methods discussed above using real data on the Bilirubin level in jaundice babies who stay in neonatal intensive care. Hyperbilirubinemia is defined as a total serum Bilirubin level above 1.5 mg/dl, while neonatal jaundice is defined as a yellowish discoloration of the skin and sclera that occurs if the Bilirubin level is more than 5 mg/dl (see Nelson et al., 1994). Jaundice is observed during the first week of life in approximately 60% of term infants (from 37 to less than 42 completed weeks) and 80% of pre-term infants (less than 37 completed weeks) (see Nelson et al., 1994).
Neonatal jaundice is a common problem in full-term infants (42 completed weeks or more (294 days or more)) and pre-term babies. It is possible that the generally accepted levels are too high and may produce some high tone hearing loss. Most experts accept that 18.82 mg/dl to 20 mg/dl should not be exceeded in full-term babies who are less than three days of age, but that a mature baby can tolerate levels of up to 21.18 mg/dl or 22.35 mg/dl by the fifth day without evidence of damage. Pre-mature babies are probably more susceptible, and 17.64 mg/dl should not be exceeded. Since most cases of neonatal jaundice appear on the second day of life and most normal newborn babies leave the hospital after 24 hours of life, our primary concern will be babies staying in neonatal intensive care.
Physicians are interested in jaundice because of the risk it poses to hearing and the brain, and the risk of death. It would be really helpful to physicians if we could estimate the population mean of the amount of Bilirubin in the blood for jaundiced pre-term, mature, and full-term babies. However, estimating the population mean can be expensive and time consuming. Therefore, there is a need for a sampling scheme that can give more accurate estimates of the population mean with a smaller sample size, and hence save money and time.
All babies who appear significantly jaundiced on clinical examination should have their plasma Bilirubin estimated. This is done by a laboratory test that needs about half an hour or more to find the level of Bilirubin in the blood. This test is expensive and time consuming. However, by using the regression estimator calculated from an extreme ranked set sample, we will show that the population mean of plasma Bilirubin for babies who stay in neonatal intensive care can be estimated with more precision without measuring all units.

Data Collection
The data were collected by Samawi and Al-Sagheer (2001) from five hospitals in Jordan.These hospitals are Al-Qawasmeh Hospital, Prince Rahma Hospital, Irbid Specialty Hospital, Ibin al-Nafies Hospital, and Queen Zein Al-Sharaf Hospital.
The data were limited to deliveries in the first six months of 1997. Herein, we estimate the population mean of the Bilirubin level for neonatal jaundice. Jaundice is measured by the level of Bilirubin in the blood. This level is determined via a blood test (TSB). The unit of measurement is mg/dl. The test is conducted on neonatal infants twice daily during their stay in intensive care. One hundred and twenty cases are included in the study. The weight at birth is taken as the concomitant variable.
Since ranking on the concomitant variable $X$ (weight) is easier and measuring $X$ is less expensive than ranking and measuring $Y$ (TSB), we will rank on the variable $X$.

Parameters
The following are the exact population values of the data:

For the data at hand, the naive estimators do better than the regression estimators. This may be due to the fact that the correlation between the weight and TSB is very small. Although this is only an illustration of the computations, the results confirm our earlier conclusions:

The regression estimators of Section 2 require knowledge of the mean $\mu_x$ of the concomitant variable $X$, which is usually unknown in practical settings. If $\mu_x$ is unknown, the method of double sampling can be used to obtain an estimate of $\mu_x$. This involves drawing a large random sample of size $n'$, which is used to estimate $\mu_x$. A sub-sample of size $n$ is then selected from the $n'$ originally selected units to study the primary characteristic $Y$. Under an extreme ranked set sampling setting, note that the first-phase sampling is SRS and the second-phase sampling is ERSS. Let $\bar X'$ be the sample mean of $X$ based on the $mr^2$ observations of $X$ in the first phase. Clearly, $\bar X'$ is an unbiased estimator of $\mu_x$. If ERSS is the second-phase sampling, the double sampling regression estimator of the population mean $\mu_y$ is defined as

$$\bar Y_{Eds} = \bar Y_{ERSSa} + \hat\beta\,(\bar X' - \bar X_{ERSSa}).$$

and SRS sampling methods are used to obtain the samples shown in
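The double sampling ERSS regression estimator can be sketched as follows for even $r$. This is a minimal illustration with hypothetical names (`erss_double_sampling_estimate`, `x_sets`, `y_sets`): the first-phase mean uses all $mr^2$ values of $X$, while the second phase keeps one extreme-ranked unit per set. The arrays hold one row per set; in practice only the selected units would have $Y$ measured, and the full `y_sets` array is used here purely for convenience.

```python
import numpy as np

def erss_double_sampling_estimate(x_sets, y_sets):
    """Double sampling ERSS regression estimate of the mean of Y (r even).

    x_sets, y_sets: arrays of shape (m*r, r); row s holds the r units of set s,
    with sets ordered cycle by cycle.
    """
    x_sets = np.asarray(x_sets, dtype=float)
    y_sets = np.asarray(y_sets, dtype=float)
    n_sets, r = x_sets.shape
    if r % 2 != 0 or n_sets % r != 0:
        raise ValueError("expected m*r sets of r units each, with r even")

    # First phase: SRS mean of X over all m*r^2 ranked units
    x_first_phase_mean = x_sets.mean()

    # Second phase: within each cycle, sets 1, 3, ... give the smallest X
    # and sets 2, 4, ... give the largest X
    rows = np.arange(n_sets)
    take_min = (rows % r) % 2 == 0
    cols = np.where(take_min, x_sets.argmin(axis=1), x_sets.argmax(axis=1))
    x = x_sets[rows, cols]
    y = y_sets[rows, cols]

    beta_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    return y.mean() + beta_hat * (x_first_phase_mean - x.mean())

# Illustrative data only: m = 2 cycles, r = 4, exactly linear Y = 1 + 2X
rng = np.random.default_rng(0)
x_demo = rng.normal(size=(8, 4))
print(erss_double_sampling_estimate(x_demo, 1.0 + 2.0 * x_demo))
```

For exactly linear data the estimate reduces to the line evaluated at the first-phase mean of $X$, which gives a simple check of the implementation.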

Estimator Using SRS
Here $\bar X$ and $\bar Y$ are the sample means of $X$ and $Y$, respectively.

Table 2.1 presents the relative precision when $(X, Y)$ has a bivariate normal distribution with a correlation coefficient of zero. From the table we see that the relative precision is always greater than 1 when $\rho = 0$. Since the relative precision as given in (2.12) is independent of $\rho$, the ERSS regression estimator is always superior to the SRS regression estimator, regardless of the value of $\rho$.

Table 2.1. Relative precision of ERSS regression estimator relative to the SRS regression estimator.

Following Yu and Lam (1997), since $\bar Y_{ERSS}$ does not utilize any information on the concomitant variable $X$, it is fair to compare the ERSS regression estimator with the RSS regression estimator. Table 2.2 presents the relative precision for a bivariate normal distribution with zero correlation coefficient. The table shows that the relative precision is always greater than 1 when $\rho = 0$. Since the relative precision given in (2.13) is independent of $\rho$, we can again conclude that the ERSS regression estimator is always superior to the RSS regression estimator, regardless of the value of $\rho$.

Table 2.2. Relative precision of ERSS regression estimator relative to the RSS regression estimator.

In Plackett's class, the parameter $\psi$ governs the dependence between $X$ and $Y$.

Table 2.3. Relative precision of ERSS regression estimator relative to ERSS naive estimator when the linearity assumption is violated (bold numbers indicate RP < 1).
Theorem 3.1: Assume that the model in (2.7) is satisfied and that the underlying marginal distribution functions of $Y$ and $X$ are symmetric. Then the double sampling regression estimator for $\mu_y$

Table 3.1 shows the relative precision of $\bar Y_{Eds}$ relative to $\bar Y_{ds}$ for an underlying bivariate normal distribution. From the table we see that all the relative precision values are at least 1, indicating a gain in precision when using ERSS instead of SRS.

Table 3.1. The relative precision of double sampling ERSS regression estimator relative to double sampling SRS regression estimator.

Table 3.2. The relative precision of double sampling ERSS regression estimator relative to double sampling RSS regression estimator.
Table 4.1. The following results are obtained from the samples: 1) Based on the ERSS sample, the regression estimate is .