
for all $p \in [0,1]$. For the choice of parameters in our example, $\varepsilon = 1/100$ and $\alpha = .01$, the worst-case scenario sample size is $n > 40{,}000$.

Sometimes, with particularly computer-intensive applications, instead of asking how many observations are needed to achieve a desired level of accuracy, the researcher will be constrained to ask the methodologically less satisfactory "inverse" question, i.e., how accurate the results are given the number of observations used in the Monte Carlo study. The problem of feasibility becomes central in this case.

4. Reproducibility of Monte Carlo Results

Claerbout (see, e.g., Buckheit and Donoho, 1995) has recently championed the issue of reproducibility in the computational sciences. Reproducing computational results from published work often proves to be a difficult and daunting task. Reproducibility relies on a plethora of implementation details that are difficult to communicate through conventional printed publications. Buckheit and Donoho (1995) point out that in the field of computational experiments:

■ researchers often cannot reproduce their own work, even a few months after the study has been completed,

■ research students have difficulties in presenting their problems to their academic advisers, and

■ researchers cannot reproduce computational results of other researchers and other published work.

Reproducibility implies that, ideally, identical results should be obtainable in a short amount of time, without requiring expensive computational resources, proprietary data, licensed software, or application-specific knowledge. Moreover, for reproducibility to be of practical use, code and data should be carefully organized and documented.

Schwab et al. (2003) classify their computational results according to their degree of reproducibility as follows:

■ Easily reproducible result files can be regenerated within ten minutes on a standard workstation.

■ Non-reproducible result files, such as hand-drawn illustrations or scanned figures, cannot be recalculated by the reader.

■ Conditionally reproducible result files require proprietary data, licensed software, or more than 10 minutes for their re-computation. The author nevertheless supplies a complete set of source files and rules to ensure that readers can reproduce the results if they possess the necessary resources.

Based on these stringent requirements, most computational results in economics and in environmental economics would be classified, at best, as "conditionally reproducible." In a recent investigation, Vinod (2001) found that approximately 70 per cent of articles from prestigious economics journals were not reproducible. He attributed this problem to sloppy record keeping, inaccurate software, and the lack of maintenance of software and data, in particular after publication.

Environmental economics does not fare much better in this respect. In my experience, obtaining the datasets and software code needed to reproduce published work proves to be a difficult task at best. Recently, two attempts on my part to obtain data and code for the purpose of reproducing published computational results have failed. The authors blamed a computer virus and a computer crash, respectively, for the loss of data and code. In one instance, I was able to obtain the data used for a paper in a leading journal in environmental economics, but was unable to reproduce the computational results exactly. Similar experiences have been reported by the editors of this volume (personal communication), and indeed seem to be frequent in the profession.

Of course, insisting on exact and easily reproducible results is not always practical. In applied work, it is quite common for particular commercial software, datasets, or expensive equipment to make research results difficult to reproduce.

A particularly frequent problem is that in many environmental economic applications data used in published work is considered confidential and not made available. Researchers might, in fact, collect data themselves at a considerable cost, pay other institutions to analyze the data, etc. Other times the data is provided, but it is in a format that makes it difficult to use or is insufficiently documented.

Environmental economics journals do not maintain archives of the data and code of published papers. Typically, a much milder policy is implemented. For instance, consider the Journal of Environmental Economics and Management's policy for replication as stated in its "Guide for Authors." According to the current policy, all data must be clearly documented and computational methods must be explained in sufficient detail to enable replication by other researchers. The only requirement concerning the dataset is that it must be made available on request. The findings of Dewald, Thursby, and Anderson (1986) suggest that this type of policy is not adequate to guarantee reproducibility of computer-based results. In their Journal of Money, Credit and Banking project, they attempted to replicate computational results published in or submitted to the journal. Of the 92 authors asked to supply data according to the journal policy, 75 responded and 68 submitted something. The first 35 datasets were examined and only 7 were judged to be free of problems. The authors attempted to replicate the results of 9 papers for which they had obtained data and software code; only four of the computational results could be reproduced closely. Based on their findings, Dewald et al. (1986) recommended that journals require the submission of data and programs from authors at the time empirical papers are submitted.

5. Reporting Monte Carlo Results

Results based on Monte Carlo experiments should be reported as carefully as any other scientific experiment. Hoaglin and Andrews (1975) provided a list of items, by now slightly dated, that should accompany any Monte Carlo-based result. In principle, any information useful for assessing the accuracy of the results and for facilitating their reproduction should be supplied. As a minimum, taking recent developments into account, the study should provide:

■ information on the simulation, including the uniform random number generator used and the method used to generate non-uniform variates, both of which should be fully adequate for the needs of the study,

■ details on any measure employed to reduce variance,

■ a justification for the sample size chosen, possibly in terms of the standard deviation of the estimates obtained in the study,

■ detailed information on the programming languages or software applications used (vendor, version, serial number, alternative platforms on which they run, etc.), and

■ information on the computer used, including details on the CPU and operating system.5 (One way to record much of this session information in R is sketched after this list.)
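For studies carried out in R, much of the software and platform information requested in the last three items can be captured directly from the running session. The following minimal sketch uses standard R utilities; exactly what to report remains the author's decision.

    sessionInfo()                                    # R version, operating system, attached packages and versions
    RNGkind()                                        # uniform and normal generators currently in use
    Sys.info()[c("sysname", "release", "machine")]   # OS name, release, and CPU architecture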

Geweke (1996) also suggests that any published result should be checked for robustness to the choice of generator. All the items listed above provide information that helps assess the accuracy of Monte Carlo computer-based results. It is assumed that computations follow the current state of the art. Preference should be given to well-known, good algorithms and to software available in the public domain.

5It is worth remembering that in the fall of 1994, a serious design flaw was discovered in the Intel Pentium processor, commonly referred to as the "Pentium floating-point-division bug" or "Pentium bug" for short. As a consequence, certain floating-point division operations performed by the Pentium processor produced incorrect results.

Random number generators that are not in the public domain and that have not been tested before should be assessed both theoretically and empirically before use (see Section 7).
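As a minimal illustration of such an empirical assessment (formal test batteries are the subject of Section 7), the output of a candidate generator can be subjected to standard goodness-of-fit tests for uniformity; in the sketch below, my_rng is a hypothetical placeholder for the generator under scrutiny.

    my_rng <- function(n) runif(n)      # placeholder: substitute the candidate generator here
    u <- my_rng(10000)
    ks.test(u, "punif")                 # Kolmogorov-Smirnov test against U(0,1)
    chisq.test(table(cut(u, breaks = seq(0, 1, by = 0.1))))   # chi-square test over 10 equal bins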

Typically, Monte Carlo results are presented in tabular form. However, other forms can sometimes convey the results of Monte Carlo experiments more effectively. For instance, when the distributional characteristics of the sampling distribution of a test statistic are of interest, graphical methods, such as histograms and density estimates, can be used. When a large number of Monte Carlo experiments are performed, other methods, such as estimating a response surface, have been used effectively in the past to summarize the results (see Davidson and MacKinnon, 1993, for more details on the use of response surfaces in relation to Monte Carlo experiments).
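As an illustration of the graphical alternative, the following sketch (my own example, not drawn from the studies discussed here) summarizes the simulated sampling distribution of a t statistic with a histogram, a kernel density estimate, and the reference t density.

    set.seed(1)
    R <- 10000                                    # number of Monte Carlo replications
    tstat <- replicate(R, {
      x <- rnorm(25)                              # one simulated sample under the null hypothesis
      t.test(x, mu = 0)$statistic                 # the statistic of interest
    })
    hist(tstat, breaks = 50, freq = FALSE, main = "", xlab = "t statistic")
    lines(density(tstat))                         # kernel density estimate
    curve(dt(x, df = 24), add = TRUE, lty = 2)    # reference t(24) density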

6. Random Number Generation

As we noted, a Monte Carlo method is a controlled statistical experiment executed on a computer using algorithms that produce deterministic, repeating sequences of computer numbers, referred to as pseudo-random numbers, that "appear" as random samples drawn from a known distribution, typically, samples of independent and identically distributed U(0,1) random variables. An algorithm that generates such sequences of pseudo-random numbers is commonly known as a random number generator (RNG).

Many programming languages adopt the so-called linear congruential generator (LCG) introduced by Lehmer (1949). It is obvious that the pseudo-random number sequences produced by such a generator can be considered "random" only in some limited sense. Nonetheless, their imitation of "truly" random behavior is often good enough for our purposes. The LCG is defined by the difference equation:

$$X_{n+1} = (aX_n + c) \bmod m, \qquad X_0 = X^0, \qquad n \ge 0, \qquad (6.1)$$

for a multiplier $a$, $0 < a < m$, shift (or increment) $c$, $0 < c < m$, and a modulus $m$, $0 < m$, all integers. The sequence of pseudo-random numbers $U_n$ is determined by equation (6.1) and by the normalization

$U_n = X_n / m$, once the seed $X^0$ is given. See Section 7.1 for more information about these parameters. For a quick "back-of-the-envelope" Monte Carlo experiment, non-uniform variates can be generated by a direct application of standard theorems from mathematical statistics (see, e.g., Hogg, McKean, and Craig, 2005), summarized in Figure 16.2.

Figure 16.2. Relationships between the Standard Normal and related distributions.

For instance, a well-known method to generate pseudo-random normal numbers is the Box-Muller method. It exploits the fact that, given two independent, uniformly distributed random variables $U_1$ and $U_2$, the random variables $N_1$ and $N_2$ obtained from the transformation:

$$N_1 = \sqrt{-2\log(U_1)}\,\cos(2\pi U_2), \qquad N_2 = \sqrt{-2\log(U_1)}\,\sin(2\pi U_2)$$

are independent standard normal random variables (see, e.g., Hogg, McKean, and Craig, 2005, pages 290-291). To generate non-uniform variates related to the standard normal, the "composition" theorems from mathematical statistics summarized in Figure 16.2 can be used.
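To fix ideas, the following sketch implements the LCG of equation (6.1) and the Box-Muller transformation in R; the parameter values a = 69069, c = 1, and m = 2^32 are only a classic illustrative choice, not a recommendation, and the code is meant for exposition rather than for use.

    lcg <- function(n, seed, a = 69069, c = 1, m = 2^32) {
      x <- numeric(n)
      xn <- seed
      for (i in 1:n) {
        xn <- (a * xn + c) %% m      # X_{n+1} = (a X_n + c) mod m
        x[i] <- xn
      }
      x / m                          # U_n = X_n / m
    }

    box_muller <- function(u1, u2) {
      # two independent U(0,1) vectors -> two independent N(0,1) vectors
      # (a production version would guard against u1 == 0)
      n1 <- sqrt(-2 * log(u1)) * cos(2 * pi * u2)
      n2 <- sqrt(-2 * log(u1)) * sin(2 * pi * u2)
      c(n1, n2)
    }

    u <- lcg(2000, seed = 12345)
    z <- box_muller(u[1:1000], u[1001:2000])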

These approaches are simple to implement when feasible; however, they are generally very inefficient and should not be used for serious research.

In general, it is preferable to avoid "reinventing the wheel" by re-writing the code implementing a random number generator when well-known good code is already available. Many useful generators are coded in languages such as FORTRAN and C (see, e.g., Gentle, 2003).

Modern statistical and econometric software applications provide many useful functions for random number generation. R, for instance, offers a variety of uniform and non-uniform random number generators. The function RNGkind can be used to select among various uniform and normal generators. It also allows user-defined functions to be used. For instance, the command

RNGkind( kind = "Knuth-TAOCP", normal.kind = "Box-Muller" )
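selects Knuth's TAOCP generator for the uniform variates and the Box-Muller method for the normal variates. A brief usage sketch (my own illustration of the workflow) is:

    RNGkind(kind = "Knuth-TAOCP", normal.kind = "Box-Muller")
    set.seed(42)                                          # fix and report the seed for reproducibility
    u <- runif(1000)                                      # U(0,1) variates from the Knuth-TAOCP generator
    z <- rnorm(1000)                                      # N(0,1) variates via Box-Muller
    RNGkind(kind = "default", normal.kind = "default")    # restore the default generators

Table 16.1 lists a selection of the R functions for generating variates from common distributions, together with their parameters and default values.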

Table 16.1. R Functions for Random Number Generation.

    Name       Distribution   Parameters         Defaults
    rbeta      beta           shape1, shape2     -, -
    rbinom     Binomial       size, prob         -, -
    rcauchy    Cauchy         location, scale    0, 1
    rchisq     chi-square     df                 -
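As a brief, hypothetical illustration of the calling conventions in Table 16.1 (parameters without defaults must be supplied, the rest may be omitted):

    rbeta(5, shape1 = 2, shape2 = 3)    # both shape parameters are required
    rbinom(5, size = 10, prob = 0.4)    # size and prob are required
    rcauchy(5)                          # location = 0 and scale = 1 by default
    rchisq(5, df = 4)                   # df is required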
