Regression Estimator Using Double Ranked Set Sampling

Abstract: This paper investigates the performance of double (two-stage) ranked set sampling for regression estimation when the mean of the auxiliary variable is unknown. Preliminary analyses and simulation indicate that the regression estimator based on double ranked set sampling is more efficient than the regression estimators based on any of the other ranked or unranked sampling methods. It is also better and more efficient than the naive estimators that use the sample means from any of the ranked or unranked sampling methods, whether single-stage or double-stage.


Introduction
In many applications, considerable cost savings can be achieved if the number of quantifications is only a small fraction of the number of available units, although all units contribute to the information content of the quantification. Ranked set sampling (RSS) is a method of sampling that can achieve this goal. RSS was first introduced by McIntyre (1952). For estimating some population parameters, it is considerably more efficient than standard simple random sampling (SRS).
RSS can be applied to agricultural, environmental and human populations. For example, the level of bilirubin in the blood of infants can be ranked visually by observing: (i) the color of the face, (ii) the color of the chest, (iii) the color of the lower part of the body, and (iv) the color of the terminal parts of the whole body. As the yellowing progresses from (i) to (iv), the level of bilirubin in the blood is higher (see Samawi and Al-Sakeer 2001).
Al-Saleh and Al-Kadiri (2000) showed that the efficiency of estimating the population mean could be improved even more by using double ranked set sampling (DRSS). They also proved that ranking in the second stage is easier than in the first stage. Moreover, as a variation of RSS, Samawi et al. (1996) investigated extreme ranked set sampling (ERSS), and Samawi (2002) suggested double extreme ranked set sampling (DERSS). More details about RSS can be found in Kaur et al. (1995) and Patil et al. (1999).
In this paper, we investigate the performance of DRSS for estimating the population mean using the regression estimator. Theoretical and numerical comparisons with other estimators are considered. In section 2, notation, definitions and some basic results are introduced. The regression estimator using SRS and the RSS regression estimator (Yu and Lam, 1997) are introduced in section 3. Our proposed regression estimator using DRSS and its properties are given in section 4. In section 5, we illustrate the theory using a set of data representing a real-life situation.

Univariate Population
RSS involves selecting $r$ random sets, each of size $r$, from the target population. In most practical situations, the size $r$ will be 2, 3 or 4. Rank each set by a suitable method of ranking, for example, by using prior information or visual inspection. In sampling notation, ranking the $j$-th set $X_{1j}, X_{2j}, \ldots, X_{rj}$ gives the order statistics $X_{j(1)} \le X_{j(2)} \le \cdots \le X_{j(r)}$, $j = 1, 2, \ldots, r$, where $X_{ij}$ denotes the $i$-th observation in the $j$-th set and $X_{j(i)}$ is the $i$-th order statistic in the $j$-th set. Only the elements $X_{1(1)}, X_{2(2)}, \ldots, X_{r(r)}$ are quantified, i.e. the element with the smallest rank from the first set, the second smallest from the second set, and so on until the largest unit from the $r$-th set is measured. This represents one cycle of RSS. We can repeat the whole procedure $m$ times to get an RSS of size $n = mr$ (Takahasi and Wakimoto, 1968).
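The selection step described above can be sketched in Python. This is a minimal illustration, not code from the cited papers: it assumes perfect ranking (each set is sorted by its true values), and the function name `ranked_set_sample` and the `draw` callback are hypothetical.

```python
import random

def ranked_set_sample(draw, r, m, rng):
    """Return an RSS of size n = m * r.

    draw(rng) is a hypothetical callback returning one population unit.
    Ranking is assumed perfect: each set is sorted by its true values.
    """
    sample = []
    for _ in range(m):                 # m cycles
        for i in range(r):             # i-th set within the cycle
            ranked = sorted(draw(rng) for _ in range(r))
            sample.append(ranked[i])   # quantify only the (i + 1)-th smallest unit
    return sample

rng = random.Random(2024)
rss = ranked_set_sample(lambda g: g.gauss(0.0, 1.0), r=3, m=4, rng=rng)
print(len(rss))  # 12 quantified units out of 36 drawn
```

With r = 3 and m = 4, 36 units are drawn and ranked but only 12 are quantified, which is where the cost saving comes from.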

Bivariate Population
Samawi and Muttlak (1996) modified the above procedure for bivariate distributions in order to estimate the population ratio $R = \mu_Y / \mu_X$. The procedure is as follows. First choose $r^2$ independent bivariate elements from a population with bivariate distribution function $F(x, y)$. Rank each set with respect to one of the variables, Y or X. Suppose ranking is on the variable X. Apply the same procedure as in the univariate case, but for each measured unit from the X's, measure the associated unit from the Y's too. This may be repeated $m$ times to get a bivariate sample of size $n = rm$. In sample notation, the sample $\{(X_{i(i)k}, Y_{i[i]k}),\ i = 1, 2, \ldots, r;\ k = 1, 2, \ldots, m\}$ will denote the bivariate RSS, where $X_{i(i)k}$ is the $i$-th smallest X in the $i$-th set of the $k$-th cycle and $Y_{i[i]k}$ is the Y value measured on the same unit.
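A short Python sketch of the bivariate procedure follows. It is illustrative only: the bivariate normal population with unit variances, the parameter values, and the names `draw_pair` and `bivariate_rss` are assumptions made for the example, and ranking on X is taken to be perfect.

```python
import math
import random

def draw_pair(rng, rho, mu_x, mu_y):
    """One (X, Y) unit from a bivariate normal with unit variances (assumed)."""
    z = rng.gauss(0.0, 1.0)
    e = rng.gauss(0.0, 1.0)
    return (mu_x + z, mu_y + rho * z + math.sqrt(1.0 - rho * rho) * e)

def bivariate_rss(r, m, rng, rho=0.9, mu_x=10.0, mu_y=20.0):
    """Bivariate RSS of size r * m, ranking on X and carrying Y along."""
    pairs = []
    for _ in range(m):
        for i in range(r):
            units = [draw_pair(rng, rho, mu_x, mu_y) for _ in range(r)]
            units.sort(key=lambda p: p[0])   # rank the set on X only
            pairs.append(units[i])           # i-th smallest X with its own Y
    return pairs

pairs = bivariate_rss(r=3, m=4, rng=random.Random(7))
x_bar = sum(x for x, _ in pairs) / len(pairs)
y_bar = sum(y for _, y in pairs) / len(pairs)
ratio_hat = y_bar / x_bar   # estimate of R = mu_Y / mu_X (2.0 in this setup)
```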

Double Ranked Samples (Two stage sampling)
As a variation of RSS, Al-Saleh and Al-Kadiri (2000) introduced the DRSS procedure as follows:
1. Identify $r^3$ elements from the target population and divide these elements randomly into $r$ sets, each of size $r^2$ elements.
2. Apply the usual RSS procedure to each set to obtain $r$ ranked set samples, each of size $r$.
3. Apply the RSS procedure again to the $r$ ranked set samples from Step 2 to obtain a DRSS of size $r$.
4. Steps 1-3 may be repeated $m$ times to obtain a sample of size $n = rm$.
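In sampling notation, after ranking each sample separately in each subset and applying the RSS scheme to each subset separately, the first stage of the $k$-th cycle, $k = 1, 2, \ldots, m$, yields the ranked set samples $\{X^{(l)}_{1(1)k}, X^{(l)}_{2(2)k}, \ldots, X^{(l)}_{r(r)k}\}$, $l = 1, 2, \ldots, r$, where $X^{(l)}_{i(i)k}$ is the $i$-th ordered observation in the $i$-th sample of the $l$-th set in the $k$-th cycle. Then, in the second stage, let $W_{i(i)k}$ be the $i$-th smallest observation in the $i$-th of these ranked set samples; then $\{W_{i(i)k},\ i = 1, 2, \ldots, r;\ k = 1, 2, \ldots, m\}$ is the DRSS. Al-Saleh and Al-Kadiri (2000) showed that $E(\bar{W}_{DRSS}) = \mu$ and $\mathrm{Var}(\bar{W}_{DRSS}) \le \mathrm{Var}(\bar{X}_{RSS}) \le \sigma^2 / n$, where $\bar{W}_{DRSS}$ and $\bar{X}_{RSS}$ are the DRSS and RSS sample means based on $n$ quantified units, and $\mu$ and $\sigma^2$ are the mean and the variance of the population, respectively. It was also shown that ranking in the second stage is easier than in the first stage.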
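The four DRSS steps can be sketched as follows. This is a hedged illustration under perfect ranking at both stages; the helper names `rss_cycle` and `double_ranked_set_sample` and the `draw` callback are hypothetical and do not come from Al-Saleh and Al-Kadiri (2000).

```python
import random

def rss_cycle(units, r):
    """One RSS cycle on r * r units: the i-th smallest of the i-th group of r."""
    groups = [sorted(units[i * r:(i + 1) * r]) for i in range(r)]
    return [groups[i][i] for i in range(r)]

def double_ranked_set_sample(draw, r, m, rng):
    """Return a DRSS of size n = r * m; draw(rng) yields one population unit."""
    sample = []
    for _ in range(m):
        units = [draw(rng) for _ in range(r ** 3)]            # Step 1: r^3 units
        rng.shuffle(units)                                    # random division
        sets = [units[l * r * r:(l + 1) * r * r] for l in range(r)]
        first_stage = [rss_cycle(s, r) for s in sets]         # Step 2: r RSS of size r
        # Step 3: i-th smallest value of the i-th first-stage RSS
        sample.extend(sorted(first_stage[i])[i] for i in range(r))
    return sample

drss = double_ranked_set_sample(lambda g: g.gauss(0.0, 1.0), r=3, m=2,
                                rng=random.Random(11))
print(len(drss))  # 6 quantified units from 2 * 27 = 54 drawn
```

Note that the second stage re-sorts each first-stage sample, because a ranked set sample is not itself ordered.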

Regression Estimators Using SRS and RSS
As in ratio estimation, the linear regression estimator is used to increase the precision of estimating the population mean by using extra information in an auxiliary variable X that is correlated with the survey variable Y. When the relation is approximately linear and the line does not go through the origin, an estimate of the population mean based on the linear regression of Y on X is suggested rather than using the ratio of the two variables.

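Let $(X_i, Y_i)$, i = 1, 2, …, r, be a bivariate sample from $F(x, y)$, and assume that
$$Y_i = \mu_y + \beta (X_i - \mu_x) + \varepsilon_i, \qquad (3.1)$$
where $\beta = \rho \sigma_y / \sigma_x$, $\mu_x$ and $\mu_y$ are the means of X and Y respectively, and, for fixed $X_i$, the $\varepsilon_i$'s, i = 1, 2, …, r, are i.i.d. (independent and identically distributed) with mean zero and variance $\sigma_y^2 (1 - \rho^2)$.
Consider the case where $\mu_x$ is unknown. The method of double sampling can be used to obtain an estimate of $\mu_x$. This involves drawing a large random sample of size $n'$, which is used to estimate $\mu_x$. Then a subsample of size $n$ is selected from the originally selected units to study the primary characteristic Y. Here both the first-phase and the second-phase samples are simple random samples. The double-sampling regression estimator is then
$$\bar{Y}_{ds} = \bar{Y} + \hat{\beta} (\bar{X}' - \bar{X}), \qquad (3.2)$$
where $\bar{X}'$ is the mean of X in the first-phase sample, $\bar{X}$ and $\bar{Y}$ are the means of the subsample, and $\hat{\beta} = \sum_i (X_i - \bar{X})(Y_i - \bar{Y}) / \sum_i (X_i - \bar{X})^2$ is the least-squares slope computed from the subsample.
When the underlying distribution of (X, Y) is assumed to be bivariate normal, the regression estimator $\bar{Y}_{ds}$ is an unbiased estimator of $\mu_y$ and its variance is given by
$$\mathrm{Var}(\bar{Y}_{ds}) = \frac{\sigma_y^2 (1 - \rho^2)}{n} \left[ 1 + \frac{n' - n}{n'} \cdot \frac{1}{n - 3} \right] + \frac{\rho^2 \sigma_y^2}{n'}$$
(see Sukhatme and Sukhatme, 1970). If the assumption of the linear relationship in (3.1) is invalid, then the SRS regression estimator in (3.2) is in general a biased estimator of $\mu_y$.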
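As a small worked illustration of the double-sampling regression estimator, the Python sketch below computes the estimate from a first-phase list of X values and a subsample of (X, Y) pairs. It assumes the slope is the ordinary least-squares slope from the subsample; the function name and the toy data are hypothetical.

```python
def double_sampling_regression(first_phase_x, sub_x, sub_y):
    """Y-bar + beta-hat * (X-bar' - X-bar), with a least-squares slope (assumed)."""
    x_prime = sum(first_phase_x) / len(first_phase_x)   # first-phase mean of X
    x_bar = sum(sub_x) / len(sub_x)
    y_bar = sum(sub_y) / len(sub_y)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(sub_x, sub_y))
    sxx = sum((x - x_bar) ** 2 for x in sub_x)
    beta_hat = sxy / sxx
    return y_bar + beta_hat * (x_prime - x_bar)

# Toy data lying exactly on the line y = 3 + 2x.
first_phase_x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]   # n' = 6, mean 3.5
sub_x = [1.0, 2.0, 3.0]                          # n = 3 subsample
sub_y = [5.0, 7.0, 9.0]
estimate = double_sampling_regression(first_phase_x, sub_x, sub_y)
print(estimate)  # 10.0, i.e. 3 + 2 * 3.5
```

Because the toy data are exactly linear, the estimator moves the subsample mean of Y (7.0) along the fitted line to the first-phase mean of X, giving 10.0.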

Regression estimator using RSS
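Consider the bivariate RSS $\{(X_{i(i)k}, Y_{i[i]k}),\ i = 1, 2, \ldots, r;\ k = 1, 2, \ldots, m\}$. From (3.1), the relationship between $Y_{i[i]k}$ and $X_{i(i)k}$ is
$$Y_{i[i]k} = \mu_y + \beta (X_{i(i)k} - \mu_x) + \varepsilon_{i[i]k}. \qquad (3.4)$$
Again, when $\mu_x$ is unknown, the method of double sampling (two-phase sampling) can be used to obtain an estimate of $\mu_x$. Note that the first-phase sample is a simple random sample and the second-phase sample is a ranked set sample. The double-sampling regression estimator $\bar{Y}_{Rds}$ based on RSS, as given by Yu and Lam (1997), is
$$\bar{Y}_{Rds} = \bar{Y}_{RSS} + \hat{\beta}_{RSS} (\bar{X}' - \bar{X}_{RSS}), \qquad (3.5)$$
where $\bar{X}'$ is the sample mean of X based on the $r^2 m$ observations of the first phase, $\bar{X}_{RSS}$ and $\bar{Y}_{RSS}$ are the means of the ranked set sample, and $\hat{\beta}_{RSS}$ is the least-squares slope computed from the ranked set sample. Furthermore, using the basic properties of conditional moments, Yu and Lam (1997) showed that $\bar{Y}_{Rds}$ is an unbiased estimator of $\mu_y$ under (3.4), with variance
$$\mathrm{Var}(\bar{Y}_{Rds}) = \frac{\sigma_y^2 (1 - \rho^2)}{rm} \left[ 1 + E\left( \frac{(\bar{X}' - \bar{X}_{RSS})^2}{S^2_{RSS}} \right) \right] + \frac{\rho^2 \sigma_y^2}{r^2 m},$$
where $S^2_{RSS} = \frac{1}{rm} \sum_{k} \sum_{i} (X_{i(i)k} - \bar{X}_{RSS})^2$. Again, if the assumption of a linear relationship is invalid, the RSS regression estimator in (3.5) is in general a biased estimator of $\mu_y$. Next we propose our approach to estimating $\mu_y$ with a regression estimator based on DRSS.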
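The two-phase RSS estimator can be sketched in Python as below. Several choices are assumptions made for the illustration rather than details confirmed by the text: the first-phase sample is taken to be the $r^2 m$ units drawn for ranking, the population is bivariate normal with unit variances, ranking on X is perfect, and all function names and parameter values are hypothetical.

```python
import math
import random

def draw_pair(rng, rho, mu_x, mu_y):
    """One (X, Y) unit from a bivariate normal with unit variances (assumed)."""
    z = rng.gauss(0.0, 1.0)
    e = rng.gauss(0.0, 1.0)
    return (mu_x + z, mu_y + rho * z + math.sqrt(1.0 - rho * rho) * e)

def rss_double_sampling_estimate(r, m, rng, rho=0.9, mu_x=10.0, mu_y=20.0):
    first_phase_x, xs, ys = [], [], []
    for _ in range(m):
        for i in range(r):
            units = [draw_pair(rng, rho, mu_x, mu_y) for _ in range(r)]
            first_phase_x.extend(x for x, _ in units)   # X known for every drawn unit
            x, y = sorted(units, key=lambda p: p[0])[i]  # Y measured on one unit only
            xs.append(x)
            ys.append(y)
    x_prime = sum(first_phase_x) / len(first_phase_x)    # based on r*r*m values
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return y_bar + (sxy / sxx) * (x_prime - x_bar)

estimate = rss_double_sampling_estimate(r=3, m=4, rng=random.Random(3))
# The estimate should land near mu_y = 20 for this simulated population.
```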

DRSS for regression estimator
In the two-phase regression estimator using DRSS, for the $k$-th cycle, $r$ quantified RSS samples, each of size $r$, are considered in the first stage. The following will denote the first-stage sampling: $\{X^{(l)}_{i(i)k},\ i = 1, 2, \ldots, r;\ l = 1, 2, \ldots, r\}$, $k = 1, 2, \ldots, m$, where $X^{(l)}_{i(i)k}$ is the $i$-th ordered observation in the $i$-th sample of the $l$-th set in the $k$-th cycle. These sets of quantified observations, of total size $mr^2$, are used to estimate $\mu_x$, the population mean of the variable X, which is assumed to be unknown. In the second stage, a bivariate DRSS of size $n = rm$ is selected, namely $\{(W_{i(i)k}, Y_{i[i]k}),\ i = 1, 2, \ldots, r;\ k = 1, 2, \ldots, m\}$. Note that rankings in the second stage on the variable X are based on the exact measurements, i.e. perfect ranking. Also, we do not use all $mr^3$ first-stage units to estimate $\mu_x$, because only $mr^2$ of them are quantified; this reduces the sampling cost of the study.

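Regression Estimator of $\mu_y$
If $W_{i(i)k}$ and $Y_{i[i]k}$ are, respectively, the $i$-th smallest value of X (from the second stage of DRSS) and the corresponding value of Y obtained from the $i$-th sample in the $k$-th cycle, then from (3.1)
$$Y_{i[i]k} = \mu_y + \beta (W_{i(i)k} - \mu_x) + \varepsilon_{i[i]k}, \qquad (4.1)$$
where $\beta$ and $\varepsilon$ are as in (3.1). Let $\bar{X}^*_{RSS}$ be the sample mean based on the $r$ RSS samples of size $r$ in each cycle, i.e., on the $r^2 m$ first-stage observations. Note that $E(\bar{X}^*_{RSS}) = \mu_x$ (see Dell and Clutter (1972)). Under DRSS, the regression estimator of the population mean $\mu_y$ can be defined as
$$\bar{Y}_{RegD} = \bar{Y}_{DRSS} + \hat{\beta}_{DRSS} (\bar{X}^*_{RSS} - \bar{W}_{DRSS}),$$
where $\bar{W}_{DRSS}$ and $\bar{Y}_{DRSS}$ are the means of the $W_{i(i)k}$ and the $Y_{i[i]k}$, $\hat{\beta}_{DRSS} = \sum_{k} \sum_{i} (W_{i(i)k} - \bar{W}_{DRSS})(Y_{i[i]k} - \bar{Y}_{DRSS}) / \sum_{k} \sum_{i} (W_{i(i)k} - \bar{W}_{DRSS})^2$, and $\bar{X}^*_{RSS}$ is as above.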
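The whole DRSS regression procedure can be sketched in Python as follows. This is a hedged illustration only: the least-squares form of the slope, the bivariate normal population with unit variances, perfect ranking on X at both stages, and every function name and parameter value are assumptions made for the example.

```python
import math
import random

def draw_pair(rng, rho, mu_x, mu_y):
    """One (X, Y) unit from a bivariate normal with unit variances (assumed)."""
    z = rng.gauss(0.0, 1.0)
    e = rng.gauss(0.0, 1.0)
    return (mu_x + z, mu_y + rho * z + math.sqrt(1.0 - rho * rho) * e)

def rss_by_x(units, r):
    """One RSS cycle on r * r pairs: i-th smallest X from the i-th group of r."""
    groups = [sorted(units[i * r:(i + 1) * r], key=lambda p: p[0]) for i in range(r)]
    return [groups[i][i] for i in range(r)]

def drss_regression_estimate(r, m, rng, rho=0.9, mu_x=10.0, mu_y=20.0):
    first_stage_x, w, y = [], [], []
    for _ in range(m):
        # Units are i.i.d. draws, so consecutive blocks form a random division.
        units = [draw_pair(rng, rho, mu_x, mu_y) for _ in range(r ** 3)]
        stage1 = [rss_by_x(units[l * r * r:(l + 1) * r * r], r) for l in range(r)]
        first_stage_x.extend(p[0] for s in stage1 for p in s)   # r * r quantified X's
        for i, s in enumerate(stage1):
            wi, yi = sorted(s, key=lambda p: p[0])[i]   # second stage, exact ranking
            w.append(wi)
            y.append(yi)                                # Y measured on r units only
    x_star = sum(first_stage_x) / len(first_stage_x)    # mean of the m * r^2 X values
    w_bar = sum(w) / len(w)
    y_bar = sum(y) / len(y)
    swy = sum((a - w_bar) * (b - y_bar) for a, b in zip(w, y))
    sww = sum((a - w_bar) ** 2 for a in w)
    return y_bar + (swy / sww) * (x_star - w_bar)

estimate = drss_regression_estimate(r=3, m=4, rng=random.Random(5))
# The estimate should land near mu_y = 20 for this simulated population.
```

Only $rm$ values of Y are measured per run, while the auxiliary mean is estimated from the $r^2 m$ values of X already quantified in the first stage.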

Properties of the estimator
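Again, using the basic properties of conditional moments and the above results, the following theorem will be proved.
Theorem 4.1: Under the assumptions of (4.1):
(1) $E(\bar{Y}_{RegD}) = \mu_y$.
(2) $\mathrm{Var}(\bar{Y}_{RegD}) = \frac{\sigma_y^2 (1 - \rho^2)}{rm} \left[ 1 + E\left( \frac{(\bar{X}^*_{RSS} - \bar{W}_{DRSS})^2}{S_W^2} \right) \right] + \beta^2\, \mathrm{Var}(\bar{X}^*_{RSS})$, where $\bar{W}_{DRSS}$ is the mean of the second-stage values $W_{i(i)k}$ and $S_W^2 = \frac{1}{rm} \sum_{k} \sum_{i} (W_{i(i)k} - \bar{W}_{DRSS})^2$.
The proof of this theorem and two required propositions are in the Appendix.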

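Performance of $\bar{Y}_{RegD}$ relative to the naive estimators
The relative precisions of $\bar{Y}_{RegD}$ with respect to the naive estimators (the RSS and DRSS sample means of Y) are based on the variances of the estimators.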

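Performance of $\bar{Y}_{RegD}$ relative to the regression estimators based on SRS and RSS
The relative precisions of $\bar{Y}_{RegD}$ with respect to $\bar{Y}_{ds}$ and $\bar{Y}_{Rds}$ (see section 3) are likewise based on the variances of the estimators. Since it is not easy to find the values of these expressions analytically, simulation is used to calculate them.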

Design of the Simulation
A computer simulation is conducted to study the efficiency of the regression estimator. Using SRS, RSS, and DRSS, bivariate normal random samples were generated for several values of the correlation coefficient ρ. The performance of the regression estimators is investigated for r = 4, 5, 6, 7 and 8 and m = 1, 4 and 8. Using 5000 replications, estimates of the means and the mean square errors of the regression estimators were computed.
The efficiency of the regression estimator is defined as the ratio of the mean square errors of the estimators under sampling methods i and j, where i and j represent any of the above sampling methods. The results of the simulation are in Tables 1 and 2. Notice that ρ takes only high positive values because the regression estimator for the population mean is used only when the correlation between the two variables is high. Also, negative values are not considered since, from (4.6) and (4.7), the relative precision depends only on the absolute values of ρ and β, since they are squared.
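The Monte Carlo efficiency calculation can be sketched as below. To keep the example short it compares only two SRS-based estimators (the naive sample mean and the double-sampling regression estimator), not the RSS or DRSS designs of the study. The direction of the ratio (naive MSE over regression MSE), the 2000 replications, the sample sizes, and all function names are assumptions made for the illustration.

```python
import math
import random

def draw_pair(rng, rho):
    """One (X, Y) unit: standard bivariate normal with correlation rho (assumed)."""
    x = rng.gauss(0.0, 1.0)
    return (x, rho * x + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0))

def naive_mean(rng, n, n_prime, rho):
    """SRS sample mean of Y from n units (n_prime is unused here)."""
    return sum(draw_pair(rng, rho)[1] for _ in range(n)) / n

def regression_ds(rng, n, n_prime, rho):
    """Double-sampling regression estimator under SRS with a least-squares slope."""
    first = [draw_pair(rng, rho) for _ in range(n_prime)]
    x_prime = sum(x for x, _ in first) / n_prime
    sub = first[:n]   # units are i.i.d., so the first n act as a random subsample
    x_bar = sum(x for x, _ in sub) / n
    y_bar = sum(y for _, y in sub) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in sub)
    sxx = sum((x - x_bar) ** 2 for x, _ in sub)
    return y_bar + (sxy / sxx) * (x_prime - x_bar)

def mse(estimator, true_mean, reps, rng, **kwargs):
    """Monte Carlo mean square error of an estimator around the true mean."""
    errors = ((estimator(rng, **kwargs) - true_mean) ** 2 for _ in range(reps))
    return sum(errors) / reps

rng = random.Random(12345)
settings = dict(n=16, n_prime=64, rho=0.9)
mse_naive = mse(naive_mean, 0.0, 2000, rng, **settings)
mse_reg = mse(regression_ds, 0.0, 2000, rng, **settings)
eff = mse_naive / mse_reg   # values above 1 favour the regression estimator
```

The same `mse` helper could be reused with estimators built on RSS or DRSS samples to fill in a table of efficiencies over r, m and ρ.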

Results of the Simulation
Our simulation (Table 1) shows that the efficiency is affected by the value of ρ. The regression estimator based on DRSS is more efficient than the naive estimator using RSS whenever the absolute value of the correlation coefficient between X and Y (ρ) is more than 0.40. Moreover, this efficiency increases as the set size or the cycle size increases. Also, the regression estimator based on DRSS is more efficient than the naive estimator using DRSS whenever |ρ| > 0.90. However, when |ρ| < 0.98, the efficiency decreased as the set size increased, and increased otherwise. Moreover, in this case the efficiency is not affected by the cycle size.
Table 2 shows that the double-sampling regression estimator using DRSS was always superior to the double-sampling regression estimators using SRS and RSS. However, the efficiency was affected by the value of ρ: it increased as ρ increased. Also, the efficiency decreases as the set size or the cycle size increases for small values of ρ. Nevertheless, $\bar{Y}_{RegD}$ was still found to be more efficient than the estimators based on the other sampling methods.

Applications to Real Data Set
We illustrate the double ranked set sample mean estimation procedure using a real data set which consists of the height (Y) and the diameter (X) at breast height of 399 trees. See Platt et al. (1988) for a detailed description of the data set. The summary statistics for the data are reported in Table 3. The population size is N = 399 and the correlation coefficient between X and Y is ρ = 0.908.
Using a set size r = 3 and a cycle size m = 3, we draw bivariate SRS and DRSS samples of size 9. Table 4 contains all the above proposed estimators and their estimated variances using the drawn samples. Although Table 4 confirms our simulation results, it should be emphasized that the example is used only as an illustration of the applicability of our proposed estimators.

Conclusions
In conclusion, the DRSS regression estimator is recommended for improving estimation of the population mean whenever DRSS can be conducted.

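Appendix
Throughout, $W_{i(i)k}$ denotes the $i$-th smallest value of X from the second stage of DRSS, $Y_{i[i]k}$ the corresponding value of Y, $\bar{W}_{DRSS}$ and $\bar{Y}_{DRSS}$ their sample means, $\hat{\beta}_{DRSS}$ the least-squares slope of the $Y_{i[i]k}$ on the $W_{i(i)k}$, and $\bar{X}^*_{RSS}$ the sample mean based on the $r$ RSS samples of size $r$ in each cycle, i.e., on $r^2 m$ observations.
Next we prove Theorem 4.1.
Proof of Theorem 4.1:
(1) Conditional on all quantified values of X, (4.1) gives $E(\hat{\beta}_{DRSS} \mid X) = \beta$ and $E(\bar{Y}_{DRSS} \mid X) = \mu_y + \beta (\bar{W}_{DRSS} - \mu_x)$, so that $E(\bar{Y}_{RegD} \mid X) = \mu_y + \beta (\bar{X}^*_{RSS} - \mu_x)$. Since $E(\bar{X}^*_{RSS}) = \mu_x$, taking expectations gives $E(\bar{Y}_{RegD}) = \mu_y$. Hence $\bar{Y}_{RegD}$ is an unbiased estimator of $\mu_y$.
(2) From (1), we have $\mathrm{Var}[E(\bar{Y}_{RegD} \mid X)] = \beta^2\, \mathrm{Var}(\bar{X}^*_{RSS})$. Under (4.1), $\mathrm{Var}(\bar{Y}_{RegD} \mid X) = \sigma_y^2 (1 - \rho^2) \left[ \frac{1}{rm} + \frac{(\bar{X}^*_{RSS} - \bar{W}_{DRSS})^2}{\sum_{k} \sum_{i} (W_{i(i)k} - \bar{W}_{DRSS})^2} \right]$. Then, $\mathrm{Var}(\bar{Y}_{RegD}) = E[\mathrm{Var}(\bar{Y}_{RegD} \mid X)] + \mathrm{Var}[E(\bar{Y}_{RegD} \mid X)] = \frac{\sigma_y^2 (1 - \rho^2)}{rm} \left[ 1 + E\left( \frac{(\bar{X}^*_{RSS} - \bar{W}_{DRSS})^2}{S_W^2} \right) \right] + \beta^2\, \mathrm{Var}(\bar{X}^*_{RSS})$, where $S_W^2 = \frac{1}{rm} \sum_{k} \sum_{i} (W_{i(i)k} - \bar{W}_{DRSS})^2$.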

Table 1: The efficiency of $\bar{Y}_{RegD}$ with respect to the naive estimators based on RSS and DRSS.

Table 2: The efficiency of $\bar{Y}_{RegD}$ with respect to the regression estimators based on SRS and RSS.

Table 3: Summary statistics of the trees data.

Table 4: Results from the drawn samples.