A Generalization of the Hypergeometric Distribution

In this paper we introduce a modification of the hypergeometric distribution that caters for the case when the sampling scheme favours the inclusion of units of one of the two types involved, as opposed to the hypergeometric distribution under which all samples are equally likely. The properties of the resulting distribution, termed the generalized hypergeometric, are studied, including the derivation and numerical assessment of a normal approximation of the distribution.

Consider N sites, each occupied by a unit of one of two competing species, A and B, of which S and M units are available, respectively. More specifically, denoting by X(t) the number of sites occupied by species A at time t, we assume that:
(i) the probability that X(t) increases by one in (t, t+h) is K_1 (S - X(t))(N - X(t)) h + o(h);
(ii) the probability that X(t) decreases by one in (t, t+h) is K_2 X(t)(M - N + X(t)) h + o(h);
(iii) the probability of more than one change in (t, t+h) is o(h),
where K_1 and K_2 are positive constants. It might help the argument to think, tentatively, of the two species as political parties and the N sites as seats of a parliament that is being continuously updated according to the rules specified in (i), (ii), and (iii). The parameters K_1 and K_2 are then measures of the competitiveness of the two parties.
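As a quick numerical check of the model, the following sketch simulates the chain with exponential holding times and compares the long-run frequencies with the equilibrium distribution derived in the next section. The transition rates, parameter values, and helper names (`gh_pmf`, `simulate_state`) are illustrative assumptions, not part of the original text.

```python
import math
import random

def gh_pmf(S, M, N, K):
    """Equilibrium distribution: p_r proportional to K^r C(S,r) C(M,N-r)."""
    w = [K**r * math.comb(S, r) * math.comb(M, N - r) for r in range(N + 1)]
    tot = sum(w)
    return [x / tot for x in w]

def simulate_state(S, M, N, K1, K2, r0, t_end, seed):
    """Run the birth-death chain to time t_end and return X(t_end).
    Up-rate K1*(S-r)*(N-r); down-rate K2*r*(M-N+r)."""
    rng = random.Random(seed)
    r, t = r0, 0.0
    while True:
        up = K1 * (S - r) * (N - r)
        down = K2 * r * (M - N + r)
        t += rng.expovariate(up + down)
        if t > t_end:
            return r
        r += 1 if rng.random() * (up + down) < up else -1

S, M, N, K1, K2 = 5, 6, 4, 2.0, 1.0
samples = [simulate_state(S, M, N, K1, K2, 0, 50.0, seed) for seed in range(2000)]
freq = [samples.count(r) / len(samples) for r in range(N + 1)]
pmf = gh_pmf(S, M, N, K1 / K2)
print([round(f, 3) for f in freq])
print([round(p, 3) for p in pmf])
```

With 2000 independent runs the empirical frequencies should agree with the equilibrium probabilities to within sampling error.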

Differential-Difference Equation and Equilibrium Solution
Let p_r(t) = P(X(t) = r | X(0) = r_0). It follows easily from the assumptions that p_r satisfies the differential-difference equation

p_r'(t) = K_1 (S - r + 1)(N - r + 1) p_{r-1}(t) + K_2 (r + 1)(M - N + r + 1) p_{r+1}(t) - [K_1 (S - r)(N - r) + K_2 r (M - N + r)] p_r(t).   (2.1)

The equilibrium solution is obtained by setting the derivative equal to zero and solving for p_r = p_r(∞). It is easy to see that it satisfies

K (S - r)(N - r) p_r = (r + 1)(M - N + r + 1) p_{r+1},   (2.2)

where K = K_1/K_2.
Equivalently,

p_{r+1} = K (S - r)(N - r) p_r / [(r + 1)(M - N + r + 1)].   (2.3)

Putting r = 0, 1, ..., and making successive substitutions leads to the equilibrium solution

p_r = p_0 K^r C(S, r) C(M, N - r) / C(M, N),   r = 0, 1, ..., N,   (2.4)

where C(n, k) denotes the binomial coefficient. The constant p_0 can be obtained by noting that the summation of p_r over all values of r is equal to 1. This distribution is a generalization of the hypergeometric distribution and can justifiably be called a generalized hypergeometric (gh) distribution. However, the term generalized distribution may mean a different thing in the literature; see, for example, Johnson and Kotz (1969), pp. 158-60.

The hypergeometric distribution corresponds to the special case K = 1. In fact, stressing the dependence on K, p_r can be written as

p_r(K) = p_0(K) K^r C(S, r) C(M, N - r) / C(M, N).   (2.5)

When K = 1, p_0(1) = C(M, N)/C(S + M, N), so that p_r(1) = C(S, r) C(M, N - r)/C(S + M, N); that is, the distribution reduces to the hypergeometric distribution. Thus, if K = 1, the number of sites occupied by species A is effectively determined by taking a simple random sample of size N from the S + M units in the two species and counting the units that belong to species A. In this case all samples of size N are equally likely.

The case K > 1 is more favorable to the inclusion of units of type A than the case K = 1. One should expect that situations in which few sites are occupied by A are less likely, and those in which many sites are occupied by A are more likely, compared to the hypergeometric case. The reverse should be expected when K < 1. It is reasonable to interpret the number of sites r occupied by species A as few or many according as r is less than or greater than the average number of sites, µ, occupied by species A. One obvious result to expect is that µ(K) should be an increasing function of K.
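The reduction to the hypergeometric case at K = 1, and the upward shift of probability mass when K > 1, can be verified directly from the equilibrium probabilities. A minimal sketch (helper names and parameter values are ours):

```python
import math

def gh_pmf(S, M, N, K):
    """gh pmf: p_r proportional to K^r C(S,r) C(M,N-r), r = 0..N."""
    w = [K**r * math.comb(S, r) * math.comb(M, N - r) for r in range(N + 1)]
    tot = sum(w)
    return [x / tot for x in w]

def mean(p):
    return sum(r * pr for r, pr in enumerate(p))

S, M, N = 5, 6, 4
hyper = [math.comb(S, r) * math.comb(M, N - r) / math.comb(S + M, N)
         for r in range(N + 1)]

# K = 1 recovers the ordinary hypergeometric distribution
assert all(abs(a - b) < 1e-12 for a, b in zip(gh_pmf(S, M, N, 1.0), hyper))

# K > 1 favours species A: mass shifts toward large r and the mean rises
for K in (0.5, 1.0, 2.0, 5.0):
    print(K, round(mean(gh_pmf(S, M, N, K)), 4))
```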
We now see how these expectations are met by the gh distribution.
Let G(x, K) and H(x) denote, respectively, the probability generating functions (pgf) of the gh distribution and the hypergeometric distribution; thus H(x) = G(x, 1). Writing h_r = C(S, r) C(M, N - r)/C(S + M, N) for the hypergeometric probabilities, we have, from (2.5),

h_0 p_r(K) = p_0(K) K^r h_r.   (2.7)

Summing over all r gives

p_0(K) = h_0 / H(K).   (2.8)

Likewise, multiplying both sides of (2.7) by r, summing over all r, and using the above result gives

µ(K) = K H'(K) / H(K).   (2.9)

Note also that

p_r(K) = K^r h_r / H(K).   (2.10)

Multiplying both sides by x^r and summing over all r expresses the pgf G in terms of H as

G(x, K) = H(xK) / H(K).   (2.11)

Using (2.11), or multiplying both sides of (2.2) by x^r and summing over all r, leads to the following differential equation in x:

K [SN G - (S + N - 1) x ∂G/∂x + x² ∂²G/∂x²] = (M - N + 1) ∂G/∂x + x ∂²G/∂x².   (2.12)

We can now prove the following theorem.
Theorem 1: Keeping the other parameters fixed:
(i) p_r(K) increases or decreases with K according as r > µ(K) or r < µ(K);
(ii) µ(K) is an increasing function of K.

Proof: The first assertion follows by differentiating (2.10) with respect to K, noting where the derivative is positive or negative, and expressing the condition in terms of µ(K) using (2.9):

∂ log p_r(K)/∂K = r/K - H'(K)/H(K) = (r - µ(K))/K.   (2.13)

The assertion concerning µ(K) follows if we can establish that the derivative with respect to K is positive. It follows from (2.9) that

K µ'(K) = µ(K) + K² H''(K)/H(K) - µ(K)².   (2.14)

Differentiating both sides of (2.11) twice with respect to x and putting x = 1 gives

K² H''(K)/H(K) = σ²(K) + µ(K)² - µ(K).   (2.15)

Substituting (2.15) in (2.14) leads to

σ²(K) = K µ'(K),   (2.16)

which is positive, completing the proof.

The essence of the comparison between the gh and the hypergeometric distributions is captured in the following corollary, which easily follows from Theorem 1.

Corollary: For K > 1, µ(K) > µ(1), p_r(K) < p_r(1) for r < µ(1), and p_r(K) > p_r(1) for r > µ(K). For K < 1, µ(K) < µ(1), p_r(K) > p_r(1) for r < µ(K), and p_r(K) < p_r(1) for r > µ(1).
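Both assertions of Theorem 1 can be checked numerically: µ(K) should increase with K, and the identity σ²(K) = Kµ'(K) can be tested with a central finite difference. A sketch, with illustrative parameter values:

```python
import math

def gh_moments(S, M, N, K):
    """Mean and variance computed directly from the equilibrium pmf."""
    w = [K**r * math.comb(S, r) * math.comb(M, N - r) for r in range(N + 1)]
    tot = sum(w)
    mu = sum(r * x for r, x in enumerate(w)) / tot
    var = sum(r * r * x for r, x in enumerate(w)) / tot - mu * mu
    return mu, var

S, M, N = 5, 6, 4

# (ii) mu(K) is increasing in K
Ks = [0.25, 0.5, 1.0, 2.0, 4.0]
mus = [gh_moments(S, M, N, K)[0] for K in Ks]
assert all(a < b for a, b in zip(mus, mus[1:]))

# sigma^2(K) = K * mu'(K); check mu' by central difference
K, h = 2.0, 1e-6
slope = (gh_moments(S, M, N, K + h)[0] - gh_moments(S, M, N, K - h)[0]) / (2 * h)
print(K * slope, gh_moments(S, M, N, K)[1])  # the two should agree
```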

Moments of the gh Distribution
It is difficult to express the moments of the gh distribution, including the mean and variance, as simple algebraic functions of the parameters. In this section we develop relations that provide convenient means of evaluating the mean and variance without the need to compute the probabilities in (2.4). Note that putting x = 1 in (2.12) leads to the relationship

K [(S - µ)(N - µ) + σ²] = µ (M - N + µ) + σ².   (3.1)

We need to eliminate σ² in order to get a relation involving µ alone. This can be achieved by investigating the variation of the mean with each of the four parameters of the distribution, keeping the others fixed. In the case of K this leads to a first-order differential equation, while for M, S, and N it leads to recurrence relations. As for the other moments, it can be shown, by differentiating (2.12) r times and putting x = 1, that the factorial moments µ_(r) = E[X(X - 1)⋯(X - r + 1)] satisfy the relations

K [µ_(r+2) - (S + N - 2r - 1) µ_(r+1) + (S - r)(N - r) µ_(r)] = µ_(r+2) + (M - N + r + 1) µ_(r+1),   r = 0, 1, 2, ....   (3.2)
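Relation (3.1) is an exact identity for the distribution and is easy to verify numerically from the pmf; a short sketch (the parameter sets are chosen arbitrarily):

```python
import math

def gh_moments(S, M, N, K):
    w = [K**r * math.comb(S, r) * math.comb(M, N - r) for r in range(N + 1)]
    tot = sum(w)
    mu = sum(r * x for r, x in enumerate(w)) / tot
    var = sum(r * r * x for r, x in enumerate(w)) / tot - mu * mu
    return mu, var

# (3.1): K[(S - mu)(N - mu) + var] = mu(M - N + mu) + var, exactly
for (S, M, N, K) in [(5, 6, 4, 2.0), (10, 12, 7, 0.5), (8, 8, 6, 3.0)]:
    mu, var = gh_moments(S, M, N, K)
    lhs = K * ((S - mu) * (N - mu) + var)
    rhs = mu * (M - N + mu) + var
    assert abs(lhs - rhs) < 1e-9
    print((S, M, N, K), round(lhs, 6), round(rhs, 6))
```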

Variation of the Mean with K
Substituting (2.16) in (3.1), and noting that when K = 1 the mean reduces to that of the corresponding hypergeometric distribution, we get the following theorem.

Theorem 2: The mean of the gh distribution varies with K according to the differential equation

K (K - 1) µ'(K) = µ (M - N + µ) - K (S - µ)(N - µ),   µ(1) = NS/(S + M).

It is difficult to solve this differential equation analytically, but a numerical solution should be possible.
Once the mean is computed, σ²(K) can be evaluated from (3.1). Note also, from (2.16), that σ²(K) can be obtained from the slope of the curve of µ(K) against K.

Recurrence Relations for the Mean
First note that, considering the dependence of p_r on S, for fixed K, M, and N, we have from (2.4) that (S - r) p_r(S) = c S p_r(S - 1), where the constant c = p_0(S)/p_0(S - 1) does not depend on r. Summing this identity over r, and the identity multiplied by r, leads to the recurrence relations below.

Example
Consider a gh distribution with parameters M = 6, S = 5, N = 4, and K = 2. The distribution can be worked out using (2.2). It turns out that it has mean and variance given by µ = 632/275 and σ² = 51576/275². Using relation (3.14), the most feasible in this case, with the given starting value, one can directly verify that µ(1) = 5/8, µ(2) = 28/23, µ(3) = 87/49, and finally µ(4) = 632/275, in agreement with the correct value; here µ(N) denotes the mean when the other parameters are held fixed and the number of sites is N. The variance can now be obtained from (3.1). The other relations given by Theorem 3 give exactly the same values. Note that for the latter relations the variance can also be obtained from (3.8), (3.9), or (3.11), according to the relation that was used to obtain the mean.
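The example can be verified with exact rational arithmetic; the sketch below recomputes the distribution from (2.4) and checks µ = 632/275 and σ² = 51576/275²:

```python
from fractions import Fraction
from math import comb

M, S, N, K = 6, 5, 4, Fraction(2)

# Equilibrium probabilities (2.4): p_r proportional to K^r C(S,r) C(M,N-r)
w = [K**r * comb(S, r) * comb(M, N - r) for r in range(N + 1)]
tot = sum(w)
p = [x / tot for x in w]

mean = sum(r * pr for r, pr in enumerate(p))
var = sum(r * r * pr for r, pr in enumerate(p)) - mean**2

print(mean)  # 632/275
print(var)   # 51576/75625, i.e. 51576/275**2
```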

Approximating the Mean and Variance
Note that when the minimum of N, S, and M is fairly large, µ(S) should not differ much from µ(S - 1). Using this in (3.12), the mean can be expected to be well approximated by the root of the equation

K (S - µ)(N - µ) = µ (M - N + µ)   (3.16)

lying in [0, min(N, S)]. Differentiating (3.16) with respect to K gives

µ'(K) ≈ (S - µ)(N - µ) / [K (S + N - 2µ) + M - N + 2µ].   (3.17)

Using (2.16), we get

σ² ≈ K (S - µ)(N - µ) / [K (S + N - 2µ) + M - N + 2µ].   (3.18)

Numerical computation using the recurrence relations derived in section 3 indicates that the error in these approximations does not exceed 1, irrespective of the values of the parameters. Since the random variable is integer-valued, the approximations are thus very good.
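A sketch of the approximations: the mean as the admissible root of the quadratic form of (3.16), the variance from (3.18), compared with the exact values. The branch selection of the quadratic root is our reading of the text (the root in [0, min(N, S)]), and the parameter sets are illustrative:

```python
import math

def gh_moments(S, M, N, K):
    """Exact mean and variance from the pmf (2.4)."""
    w = [K**r * math.comb(S, r) * math.comb(M, N - r) for r in range(N + 1)]
    tot = sum(w)
    mu = sum(r * x for r, x in enumerate(w)) / tot
    var = sum(r * r * x for r, x in enumerate(w)) / tot - mu * mu
    return mu, var

def approx_moments(S, M, N, K):
    """Mean from (3.16): root of (K-1)mu^2 - [K(S+N)+M-N]mu + KSN = 0
    lying in [0, min(N, S)]; variance from (3.18)."""
    a, b, c = K - 1.0, -(K * (S + N) + M - N), K * S * N
    if abs(a) < 1e-12:              # K = 1: hypergeometric mean NS/(S+M)
        mu = -c / b
    else:
        mu = (-b - math.sqrt(b * b - 4 * a * c)) / (2 * a)
    var = K * (S - mu) * (N - mu) / (K * (S + N - 2 * mu) + M - N + 2 * mu)
    return mu, var

for (S, M, N, K) in [(5, 6, 4, 2.0), (50, 60, 40, 2.0), (50, 60, 40, 0.5)]:
    print((S, M, N, K), gh_moments(S, M, N, K), approx_moments(S, M, N, K))
```

For the small example above the approximate mean is about 2.254 against the exact 632/275 ≈ 2.298, consistent with the stated error bound.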

Maximum Likelihood Estimation
First assume that all parameters of the distribution are known except K. This situation may arise when we suspect that the sampling scheme is favorable to one of the two types concerned and we want to quantify the extent of that. Given a single observation r, the likelihood function is conveniently expressed by (2.10). Differentiating with respect to K and equating the derivative to zero gives the maximum likelihood estimate (MLE) of K as the solution of the equation

H'(K)/H(K) = r/K.   (4.1)

Using (2.9), we have the MLE of K as the solution of

µ(K) = r.   (4.2)

Using (4.2) with (3.16), we get an approximate expression for the MLE of K as

K̂ ≈ r (M - N + r) / [(S - r)(N - r)].   (4.3)

The second situation to consider is when all parameters are known except M. We think of M here as the number of unmarked fish, and we assume that K is known from previous experience. It is straightforward from (2.4) that (4.4) holds; summing over all r gives (4.5). Using (4.5) in (4.4) we can express the likelihood ratio (LR) as in (4.6). The MLE of M is given by (4.9). Using (4.9) with (3.16) gives an approximate expression for M̂ as

M̂ ≈ [K (S - r)(N - r)/r + N - r],

where [x] is the integer part of x. It is interesting to note that when K = 1 the approximate MLE of M reduces to [S(N - r)/r], that given by the hypergeometric distribution.
It is to be noticed from the preceding derivation that the same likelihood equation arises in the estimation of K and of M. If both of them are unknown, it is necessary to take more than one observation from the distribution.
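The likelihood equations can be illustrated numerically: the exact MLE of K solves µ(K) = r, which can be found by bisection since µ(K) is increasing (Theorem 1), while the closed forms derived from (3.16) approximate the MLEs of K and M. The function names and the search bracket are illustrative assumptions:

```python
import math

def gh_mean(S, M, N, K):
    w = [K**r * math.comb(S, r) * math.comb(M, N - r) for r in range(N + 1)]
    return sum(r * x for r, x in enumerate(w)) / sum(w)

def mle_K(S, M, N, r, lo=1e-6, hi=1e6):
    """Exact MLE of K from one observation r: solve mu(K) = r (4.2) by
    bisection on a log scale (mu(K) is increasing in K)."""
    for _ in range(200):
        mid = math.sqrt(lo * hi)
        if gh_mean(S, M, N, mid) < r:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)

def mle_K_approx(S, M, N, r):
    """Approximate MLE of K from (4.2) with (3.16), setting mu = r."""
    return r * (M - N + r) / ((S - r) * (N - r))

def mle_M_approx(S, N, r, K):
    """Approximate MLE of M from (3.16) with mu = r; integer part."""
    return int(K * (S - r) * (N - r) / r + N - r)

S, M, N, r = 5, 6, 4, 3
print(mle_K(S, M, N, r), mle_K_approx(S, M, N, r))
print(mle_M_approx(100, 50, 10, 1.0))  # K = 1: reduces to [S(N - r)/r] = 400
```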

Binomial and Normal Approximations of the Distribution
Suppose that S and M tend to infinity, with S/M → λ and N fixed. Then, from (2.4),

p_r → C(N, r) [Kλ/(1 + Kλ)]^r [1/(1 + Kλ)]^{N-r},   r = 0, 1, ..., N.   (5.1)

This corresponds to a binomial distribution with parameters N and p = Kλ/(1 + Kλ). If N is large, the normal approximation of the binomial distribution should be in effect.
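The binomial limit can be observed numerically by letting S and M grow with S/M held at λ; a sketch (parameter values illustrative):

```python
import math

def gh_pmf(S, M, N, K):
    w = [K**r * math.comb(S, r) * math.comb(M, N - r) for r in range(N + 1)]
    tot = sum(w)
    return [x / tot for x in w]

# S, M -> infinity with S/M -> lam, N fixed: gh -> Binomial(N, K*lam/(1+K*lam))
N, K, lam = 5, 2.0, 0.5
p = K * lam / (1 + K * lam)
binom = [math.comb(N, r) * p**r * (1 - p) ** (N - r) for r in range(N + 1)]

diffs = []
for M in (10, 100, 1000):
    S = int(lam * M)
    diffs.append(max(abs(a - b) for a, b in zip(gh_pmf(S, M, N, K), binom)))
print(diffs)  # maximum pointwise difference shrinks as S, M grow
```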
A normal approximation of the distribution, not necessarily through the binomial, is also possible. The derivation that follows is essentially due to Dunstan and Reynolds (1981), and proceeds from the ratio p_{r+1}/p_r given by (2.2) in (5.5)-(5.7). It follows that, when the minimum of N, S, and M is large, the distribution can be approximated by a normal distribution with mean µ and variance σ² as given by (5.4) and (5.9), where the mean is essentially the mode m of the distribution. As the mean of the distribution is well approximated by the solution of (3.16), the approximate mean can be used in place of the mode.
As noticed by Hall (1983), the derivation above is rather vague about the relative sizes of the parameters for which the approximation holds, apart from the requirement that the values of M, S, and N should be large, which is almost always met as far as chemical reactions are concerned.Hall has provided conditions on the parameters under which the distribution converges to normality.The approach we adopt here is to resort to numerical computations.

Numerical Assessment of the Normal Approximation
The analysis is restricted to N ≤ S, M. We considered values of N from as small as 10. The cumulative distribution function corresponding to (2.4) was computed at each possible integral value and compared with the value given by the normal distribution function with mean and variance given by (3.16) and (3.18), respectively; a continuity correction was used. For each set of parameters the maximum difference was recorded, and the approximation was taken as satisfactory so long as the maximum absolute difference did not exceed 0.001. Initially, extensive exploratory computations were carried out by varying the parameters N, M, S, and K. It was noticed that the quality of the approximation was closely related to the values of N and µ/N, where µ is the approximate mean given by the positive solution of (3.16). A program was then written to facilitate the investigation of the range of values of µ/N, for each N, over which the approximation is satisfactory. The calculations were run separately for K < 1 and K > 1. It was found that the approximation can be good for values of N as small as 30. The results can be summarized for the two cases of K as follows.

Case of K > 1:
(i) For N ≥ 30 the approximation is satisfactory for µ/N ∈ [0.23, 0.59], and it gets better as K approaches 1.
(ii) For N ≥ 50 the approximation is satisfactory for µ/N ∈ [0.15, 0.69], and it gets better as K approaches 1.
(iii) For N ≥ 100 the approximation is satisfactory for µ/N ∈ [0.086, 1.0] and for all values of K.

Case of K < 1:
(i) For N ≥ 30 the approximation is satisfactory for µ/N ∈ [0.399, 1.0), and it gets better as K approaches 1.
(ii) For N ≥ 50 the approximation is satisfactory for µ/N ∈ [0.282, 1.0), and it gets better as K approaches 1.
(iii) For N ≥ 100 the approximation is satisfactory for µ/N ∈ [0.092, 1.0) and for all values of K.
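A sketch of the assessment described above: the exact CDF from (2.4) is compared with the continuity-corrected normal CDF whose moments come from (3.16) and (3.18). The parameter grid is our own, and the tolerances asserted are looser than the paper's 0.001 criterion since the moment formulas here are reconstructions:

```python
import math
from statistics import NormalDist

def gh_pmf(S, M, N, K):
    w = [K**r * math.comb(S, r) * math.comb(M, N - r) for r in range(N + 1)]
    tot = sum(w)
    return [x / tot for x in w]

def max_cdf_diff(S, M, N, K):
    """Maximum absolute difference between the exact CDF and the
    continuity-corrected normal CDF with moments from (3.16) and (3.18)."""
    a, b, c = K - 1.0, -(K * (S + N) + M - N), K * S * N
    mu = -c / b if abs(a) < 1e-12 else (-b - math.sqrt(b * b - 4 * a * c)) / (2 * a)
    var = K * (S - mu) * (N - mu) / (K * (S + N - 2 * mu) + M - N + 2 * mu)
    nd = NormalDist(mu, math.sqrt(var))
    cdf, worst = 0.0, 0.0
    for r, pr in enumerate(gh_pmf(S, M, N, K)):
        cdf += pr
        worst = max(worst, abs(cdf - nd.cdf(r + 0.5)))  # continuity correction
    return worst

for N in (30, 50, 100):
    S, M = 2 * N, 2 * N + 20
    print(N, max_cdf_diff(S, M, N, 2.0), max_cdf_diff(S, M, N, 0.5))
```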