
$$ L_N(\theta; w) \le \sum_{n=1}^{N} \ln P(i_n \mid z_n, \theta) + \ln w_0 - N \ln p_0, \tag{2.21} $$

follows from the regularity conditions assumed for the probability functions (see assumptions 2.2 and 2.4 in appendix 2.26). In equation (2.21), w_0 is the smallest component of w, and p_0 is a positive lower bound on the probabilities P(i | z, θ). Because the right-hand side of (2.21) tends to −∞ as w_0 → 0, that is, as w approaches the boundary of W, the maximum is attained in int W. Since L_N(θ; w) is continuous and differentiable for w ∈ int W, the maximum is given by a solution of the equations for a stationary point.¹⁴
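The stationarity equations themselves do not survive in this excerpt. A hedged reconstruction follows. It assumes (the defining equation precedes this section) that the log-likelihood places mass w_n on each observed z_n and has the form shown in the first line, where N_s = N H_s is the size of subsample s and P(s | z, θ) is the probability that the chosen alternative belongs to subsample s. This form satisfies the bound (2.21) when Σ_n w_n = 1, and it is homogeneous of degree zero in w, as footnote 14 requires. Setting ∂L_N/∂w_n = 0 gives the second line.

$$ L_N(\theta; w) = \sum_{n=1}^{N} \ln P(i_n \mid z_n, \theta) + \sum_{n=1}^{N} \ln w_n - \sum_{s=1}^{S} N_s \ln \sum_{m=1}^{N} w_m P(s \mid z_m, \theta) $$

$$ \frac{\partial L_N}{\partial w_n} = \frac{1}{w_n} - \sum_{s=1}^{S} \frac{N_s P(s \mid z_n, \theta)}{\sum_{m=1}^{N} w_m P(s \mid z_m, \theta)} = 0, \qquad n = 1, \ldots, N. $$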

At any solution of equation (2.22), the matrix of second derivatives ∂²L_N/∂w_n∂w_m is negative definite when restricted to W; that is, every stationary point is a local maximum. Because the bound (2.21) tends to −∞ at the boundary of W, there cannot be two or more maxima in int W without an intervening saddle point, which would be a stationary point that is not a maximum; thus there is only one maximum. As a result, the required maximum in w is given by the unique solution of equation (2.22). Making the substitution

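$$ \lambda(s) = \frac{H_s}{\sum_{n=1}^{N} w_n P(s \mid z_n, \theta)}, \qquad s = 1, \ldots, S, \tag{2.23} $$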

we obtain the concentrated likelihood function
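Equation (2.24) is also missing from this excerpt. As a hedged reconstruction, assume L_N(θ; w) = Σ_n ln P(i_n | z_n, θ) + Σ_n ln w_n − Σ_s N_s ln Σ_m w_m P(s | z_m, θ) with N_s = N H_s, and read the substitution (2.23) as λ(s) = H_s / Σ_n w_n P(s | z_n, θ). The stationarity conditions (2.22) then give w_n = 1/(N Σ_t λ(t) P(t | z_n, θ)), and substituting back yields

$$ L_N(\theta) = \sum_{n=1}^{N} \ln \frac{\lambda(s_n) P(i_n \mid z_n, \theta)}{\sum_{t=1}^{S} \lambda(t) P(t \mid z_n, \theta)} - \sum_{s=1}^{S} N_s \ln N_s , $$

where λ depends on θ through the constraint equations (2.25) and (2.26), and the last term is a constant independent of θ.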

14. Since L_N(θ; w) is homogeneous of degree zero in w, the additional constraint Σ_n w_n = 1 does not affect the first-order conditions in equation (2.22).

where s_n is the subsample containing case n. In equation (2.24), the weight factors λ are the solution of the constraint equations

$$ \frac{1}{N} \sum_{n=1}^{N} \frac{P(s \mid z_n, \theta)}{\sum_{t=1}^{S} \lambda(t) P(t \mid z_n, \theta)} = \frac{H_s}{\lambda(s)}, \qquad s = 1, \ldots, S \tag{2.25} $$

(obtained by substituting for w_n from equation 2.22 into equation 2.23), together with the normalization condition

$$ \frac{1}{N} \sum_{n=1}^{N} \frac{1}{\sum_{t=1}^{S} \lambda(t) P(t \mid z_n, \theta)} = 1 \tag{2.26} $$

(obtained by substituting for w_n from equation 2.22 in the condition Σ_n w_n = 1). The weight factors w have now disappeared from the problem. Because equation (2.22) has a unique solution for w ∈ W, it follows that equation (2.25) likewise has a unique solution for λ ∈ Λ_θ, where Λ_θ is the set of weight factors λ > 0 that also satisfy equation (2.26).
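As a numerical illustration of this step, the sketch below solves the constraint equations at a fixed θ and then rescales λ so that the normalization condition holds. It rests on several assumptions that are not in the text: S = 2 subsamples that coincide with the two alternatives of a binary logit model, the constraint in the form (1/N) Σ_n P(s | z_n, θ)/Σ_t λ(t) P(t | z_n, θ) = H_s/λ(s), and mass points w_n = 1/(N Σ_t λ(t) P(t | z_n, θ)). The names `p_choice` and `solve_weights` are hypothetical. With two subsamples, fixing λ(2) leaves one equation that is strictly monotone in ln λ(1), so bisection finds its unique root.

```python
import math


def p_choice(s, z, theta):
    """Hypothetical binary logit: probability of alternative s (1 or 2) given z."""
    p1 = 1.0 / (1.0 + math.exp(-(theta[0] + theta[1] * z)))
    return p1 if s == 1 else 1.0 - p1


def solve_weights(theta, data, iters=100):
    """Solve the constraint equations at fixed theta, rescale lambda, return (lam, w).

    data is a list of (z_n, s_n) with s_n in {1, 2}; both subsamples must be non-empty
    and the logit probabilities must not underflow to zero.
    """
    n = len(data)
    h1 = sum(1 for _, s in data if s == 1) / n
    lam2 = 1.0 - h1                       # arbitrary normalization lambda(2) = H_2
    p1 = [p_choice(1, z, theta) for z, _ in data]

    def excess(u):
        # Constraint for s = 1 multiplied by lambda(1); strictly decreasing in u = ln lambda(1)
        lam1 = math.exp(u)
        return h1 - sum(lam1 * p / (lam1 * p + lam2 * (1.0 - p)) for p in p1) / n

    lo, hi = -50.0, 50.0                  # excess(lo) > 0 > excess(hi)
    for _ in range(iters):                # bisection for the unique root
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    lam1 = math.exp(0.5 * (lo + hi))
    denom = [lam1 * p + lam2 * (1.0 - p) for p in p1]   # D_n = sum_t lambda(t) P(t|z_n)
    kappa = sum(1.0 / d for d in denom) / n             # common factor enforcing the normalization
    lam = {1: kappa * lam1, 2: kappa * lam2}
    w = [1.0 / (n * kappa * d) for d in denom]          # w_n = 1 / (N sum_t lambda(t) P(t|z_n))
    return lam, w


data = [(-2.0, 2), (-1.0, 2), (-0.5, 2), (0.0, 1), (0.3, 2), (0.8, 1), (1.5, 1), (2.0, 1)]
lam, w = solve_weights((0.2, 1.1), data)
print(lam, sum(w))   # sum(w) is 1 up to rounding
```

The constraint for s = 2 needs no separate treatment here: once the s = 1 equation holds, the s = 2 equation follows because the two terms add to one.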

This can be reformulated in a much more convenient form, as follows. We maximize the "pseudolikelihood" function

over λ ∈ Λ_θ, where λ is now treated as a vector of M independent variables rather than as a function of θ. This reformulation is valid because the first-order conditions for a stationary point of L_N(θ, λ) are the same as equation (2.25), and the matrix of second derivatives ∂²L_N(θ, λ)/∂λ(s)∂λ(t) is negative definite at any stationary point when restricted to λ ∈ Λ_θ. Thus L_N(θ, λ) has a unique maximum in λ ∈ Λ_θ, at which point it equals the concentrated likelihood of equation (2.24), apart from a constant term independent of θ. Note that the number of weight factors is now M (the number of alternatives) instead of N (the number of observations).
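Equation (2.27) is likewise missing from this excerpt. A hedged reconstruction that matches its description is given below: a function of θ and of λ as free variables, homogeneous of degree zero in λ, whose first-order conditions in λ reproduce the constraint equations. It assumes N_s = N H_s is the size of subsample s and P(s | z, θ) is the probability of the subsample's alternatives.

$$ L_N(\theta, \lambda) = \sum_{n=1}^{N} \ln \frac{\lambda(s_n) P(i_n \mid z_n, \theta)}{\sum_{t=1}^{S} \lambda(t) P(t \mid z_n, \theta)}, \qquad \frac{\partial L_N}{\partial \lambda(s)} = \frac{N_s}{\lambda(s)} - \sum_{n=1}^{N} \frac{P(s \mid z_n, \theta)}{\sum_{t=1}^{S} \lambda(t) P(t \mid z_n, \theta)} = 0 . $$

Multiplying every λ(t) by a common factor c multiplies numerator and denominator by c, which shows the degree-zero homogeneity. Dividing the second equation by N turns it into (1/N) Σ_n P(s | z_n, θ)/Σ_t λ(t) P(t | z_n, θ) = H_s/λ(s).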

Maximum likelihood estimation for a choice-based sample therefore reduces to the problem of finding θ_N and λ_N such that

$$ L_N(\theta_N, \lambda_N) = \max_{\theta \in \Theta} \, \max_{\lambda \in \Lambda_\theta} L_N(\theta, \lambda), \tag{2.28} $$

where the pseudolikelihood L_N(θ, λ) is given by equation (2.27). L_N(θ, λ) is called a pseudolikelihood because in general it is not equal to the likelihood L_N(θ; w); the only equality that holds between them is max_w L_N(θ; w) = max_λ L_N(θ, λ), apart from the constant term independent of θ noted above.

The subsidiary condition λ ∈ Λ_θ is inconvenient because the normalization condition, equation (2.26), depends on θ. But since L_N(θ, λ) is homogeneous of degree zero in λ, the normalization condition has no effect on the maximization problem. In practice, therefore, one can impose an arbitrary normalization. A convenient one is to fix a single weight factor, say λ(S) = H_S, and then maximize L_N(θ, λ) over θ ∈ Θ and the remaining weight factors λ(1), …, λ(S − 1) > 0.

If only estimates of θ are required, this is all that is needed. If estimates of the aggregate shares Q_s are also wanted, then the weight factors λ_N have to be rescaled by a factor κ_N to satisfy the normalization condition, equation (2.26); see section 2.13.
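A small sketch of both points follows: the degree-zero homogeneity that makes the normalization λ(S) = H_S harmless, and the rescaling needed before shares can be read off. It assumes the pseudolikelihood form Σ_n ln[λ(s_n) P(i_n | z_n, θ)/Σ_t λ(t) P(t | z_n, θ)] and the normalization (1/N) Σ_n 1/Σ_t λ(t) P(t | z_n, θ) = 1. Under that normalization, κ_N is the sample average of 1/Σ_t λ_N(t) P(t | z_n, θ_N). The binary logit and the helper `rescale` are hypothetical, and section 2.13 remains the authoritative treatment.

```python
import math


def p_choice(s, z, theta):
    """Hypothetical binary logit: probability of alternative s (1 or 2) given z."""
    p1 = 1.0 / (1.0 + math.exp(-(theta[0] + theta[1] * z)))
    return p1 if s == 1 else 1.0 - p1


def pseudo_loglik(theta, lam, data):
    """Assumed pseudolikelihood; each subsample is one alternative."""
    return sum(math.log(lam[s] * p_choice(s, z, theta)
                        / sum(lam[t] * p_choice(t, z, theta) for t in lam))
               for z, s in data)


def rescale(theta, lam, data):
    """Rescale lambda by kappa so the assumed normalization holds; return implied shares."""
    n = len(data)
    denom = [sum(lam[t] * p_choice(t, z, theta) for t in lam) for z, _ in data]
    kappa = sum(1.0 / d for d in denom) / n
    lam_scaled = {s: kappa * v for s, v in lam.items()}
    # Q_s = sum_n w_n P(s|z_n) with w_n = 1 / (N * kappa * D_n)
    shares = {s: sum(p_choice(s, z, theta) / (kappa * d)
                     for (z, _), d in zip(data, denom)) / n
              for s in lam}
    return kappa, lam_scaled, shares


data = [(-2.0, 2), (-1.0, 2), (-0.5, 2), (0.0, 1), (0.3, 2), (0.8, 1), (1.5, 1), (2.0, 1)]
theta = (0.2, 1.1)
lam = {1: 0.7, 2: 0.5}   # placeholder; in practice lambda_N from (2.28) with lambda(2) = H_2
# A common factor (here 5) leaves the pseudolikelihood unchanged
print(pseudo_loglik(theta, lam, data), pseudo_loglik(theta, {1: 3.5, 2: 2.5}, data))
kappa, lam_scaled, shares = rescale(theta, lam, data)
print(kappa, lam_scaled, shares)
```

At an arbitrary λ such as the placeholder, the shares merely add up to one. Only at the maximizing λ_N do they also satisfy Q_s = H_s/(κ_N λ_N(s)).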

2.12 Asymptotic Properties of the Unconstrained Estimator

If the exogenous space Z is discrete with a finite set of values, then θ_N as given by equation (2.28) is the classical maximum likelihood estimator, and its consistency is assured by assumptions 2.1 through 2.5 given in appendix 2.26. In fact, even if Z consists of a countable (rather than finite) discrete set of points, the results of Kiefer and Wolfowitz (1956) establish the consistency of θ_N. A continuous distribution can be approximated arbitrarily well by a discrete distribution, and the pseudolikelihood (2.27) is a function only of the observations and of the parameters of the choice model. This suggests that the result should also hold for continuous Z. However, the usual proofs of consistency of the maximum likelihood estimator require assumptions which, although very generally applicable, do not hold in the present case. In particular, while the estimated empirical distribution of z converges weakly to the true distribution, the pseudolikelihood in equation (2.27) does not converge to the expectation of the true likelihood.

It is therefore necessary to establish the consistency of estimators obtained from equation (2.28) directly. The proof follows a method due to Manski and Lerman (1977), who used it to prove consistency of the weighted exogenous sample maximum likelihood estimator for choice-based sampling.¹⁵ A few technical modifications are needed to apply the proof here (for details see Cosslett 1978, 1981). One finds that θ_N is consistent and that, with λ_N normalized by equation (2.26),

$$ \operatorname{plim}_{N \to \infty} \lambda_N(s) = \frac{H_s}{Q_s}, \qquad s = 1, \ldots, S. $$

This provides an interpretation of the parameters λ: the weights λ_N are estimates of the ratios of the sample choice proportions to the population choice proportions. The weights λ may thus be viewed as correction factors applied to the probabilities that hold under random sampling. With the normalization condition λ(S) = H_S, we also have
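The interpretation of λ_N as sample-to-population share ratios can be illustrated by simulation. The design is entirely hypothetical. It uses a binary logit with true θ = (−1, 1), z ~ N(0, 1), and a choice-based sample of 5000 cases from each alternative, so that H_1 = H_2 = 0.5. To isolate the behaviour of λ, the weights are computed at the true θ rather than at an estimate. The sketch assumes the constraint and normalization forms (1/N) Σ_n P(s | z_n, θ)/Σ_t λ(t) P(t | z_n, θ) = H_s/λ(s) and (1/N) Σ_n 1/Σ_t λ(t) P(t | z_n, θ) = 1. The population share Q_1 = E[P(1 | z, θ)] is computed by the trapezoid rule.

```python
import math
import random


def p1(z, theta):
    """Hypothetical binary logit probability of alternative 1."""
    return 1.0 / (1.0 + math.exp(-(theta[0] + theta[1] * z)))


def population_share(theta, lo=-10.0, hi=10.0, steps=20000):
    """Q_1 = E[P(1|z)] for z ~ N(0, 1), by the trapezoid rule."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        z = lo + k * h
        f = p1(z, theta) * math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        total += f if 0 < k < steps else 0.5 * f
    return total * h


def choice_based_sample(theta, n_per_alt, rng):
    """Draw (z, choice) from the population; keep draws until each alternative has n_per_alt."""
    counts, data = {1: 0, 2: 0}, []
    while min(counts.values()) < n_per_alt:
        z = rng.gauss(0.0, 1.0)
        s = 1 if rng.random() < p1(z, theta) else 2
        if counts[s] < n_per_alt:
            counts[s] += 1
            data.append((z, s))
    return data


def normalized_lambda(theta, data):
    """Solve the constraint for lambda(1) by bisection, then rescale to the normalization."""
    n = len(data)
    h1 = sum(1 for _, s in data if s == 1) / n
    probs = [p1(z, theta) for z, _ in data]
    lam2 = 1.0 - h1

    def excess(u):  # strictly decreasing in u = ln lambda(1)
        lam1 = math.exp(u)
        return h1 - sum(lam1 * p / (lam1 * p + lam2 * (1.0 - p)) for p in probs) / n

    lo, hi = -50.0, 50.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if excess(mid) > 0.0 else (lo, mid)
    lam1 = math.exp(0.5 * (lo + hi))
    kappa = sum(1.0 / (lam1 * p + lam2 * (1.0 - p)) for p in probs) / n
    return {1: kappa * lam1, 2: kappa * lam2}


theta_true = (-1.0, 1.0)
q1 = population_share(theta_true)
data = choice_based_sample(theta_true, 5000, random.Random(12345))
lam = normalized_lambda(theta_true, data)
print(lam[1], 0.5 / q1)          # lambda_N(1) against H_1 / Q_1
print(lam[2], 0.5 / (1.0 - q1))  # lambda_N(2) against H_2 / Q_2
```

With this design Q_1 is roughly 0.3, so λ(1) should be near 1.7 and λ(2) near 0.7, up to sampling error of a few percent.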