Published On: Tuesday, 3 January 2012
Posted by Muhammad Atif Saeed
Monte Carlo Method
Many practitioners have some intuitive familiarity with the Monte Carlo method from their work. At an elementary level, it is a surprisingly simple concept, but it can be computationally expensive to use. It is easy to code Monte Carlo analyses that take hours or even days to run. To speed up analyses—to make them run in minutes as opposed to days—users need to employ techniques such as variance reduction. These techniques are easy to learn, but they are NOT intuitive. To use them, users need a sophisticated understanding of how and why the Monte Carlo method works. In this article, I introduce the Monte Carlo method from such a standpoint. I also close the article with some recommended books.
Credit for inventing the Monte Carlo method often goes to Stanislaw Ulam, a Polish-born mathematician who worked for John von Neumann on the United States' Manhattan Project during World War II. Ulam is primarily known for designing the hydrogen bomb with Edward Teller in 1951. He invented the Monte Carlo method in 1946 while pondering the probabilities of winning a card game of solitaire. Quoted in Eckhardt (1987), Ulam described the incident as follows:
The first thoughts and attempts I made to practice [the Monte Carlo Method] were suggested by a question which occurred to me in 1946 as I was convalescing from an illness and playing solitaires. The question was what are the chances that a Canfield solitaire laid out with 52 cards will come out successfully? After spending a lot of time trying to estimate them by pure combinatorial calculations, I wondered whether a more practical method than "abstract thinking" might not be to lay it out say one hundred times and simply observe and count the number of successful plays. This was already possible to envisage with the beginning of the new era of fast computers, and I immediately thought of problems of neutron diffusion and other questions of mathematical physics, and more generally how to change processes described by certain differential equations into an equivalent form interpretable as a succession of random operations. Later [in 1946, I] described the idea to John von Neumann, and we began to plan actual calculations.
The Monte Carlo method, as it is understood today, encompasses any technique of statistical sampling employed to approximate solutions to quantitative problems. Ulam did not invent statistical sampling. It had been employed to solve quantitative problems before, with physical processes such as dice tosses or card draws being used to generate samples. W. S. Gosset, who published under the pen name "Student," randomly sampled from height and middle-finger measurements of 3,000 criminals to simulate two correlated normal distributions. He discusses this methodology in both Student (1908a) and Student (1908b). Ulam's contribution was to recognize the potential for the newly invented electronic computer to automate such sampling. Working with John von Neumann and Nicholas Metropolis, he developed algorithms for computer implementations and explored means of transforming non-random problems into random forms that would facilitate their solution via statistical sampling. This work transformed statistical sampling from a mathematical curiosity into a formal methodology applicable to a wide variety of problems. It was Metropolis who named the new methodology after the casinos of Monte Carlo. Ulam and Metropolis published the first paper on the Monte Carlo method in 1949.
A standard way to introduce the Monte Carlo method is with the problem of Buffon's needle. This is a charming problem that leads directly to some important (not all intuitive!) conclusions.
More than 200 years before Metropolis coined the name "Monte Carlo method," Georges-Louis Leclerc, Comte de Buffon, proposed the following problem: if a needle of length l is dropped at random on the middle of a horizontal surface ruled with parallel lines a distance d > l apart, what is the probability that the needle will cross one of the lines?
To solve the problem, consider Exhibit 1.
Exhibit 1: Problem of Buffon's Needle
Buffon asked what was the probability that the needle would fall across one of the lines, marked here in green. That outcome will occur only if A < l sin Θ.
The positioning of the needle relative to the nearby lines can be described with a random vector having components A and Θ. The random vector (A, Θ) is uniformly distributed on the region [0, d) × [0, π). Accordingly, it has probability density function φ(a, θ) = 1/(dπ). The probability that the needle will cross one of the lines is given by the integral

$$p = \int_0^{\pi} \int_0^{l \sin\theta} \frac{1}{d\pi}\, da\, d\theta \qquad [1]$$
This integral is easily solved to obtain the probability

$$p = \frac{2l}{d\pi} \qquad [2]$$
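Filling in the intermediate step: integrating out a leaves a single integral in θ, which evaluates directly:

$$p = \int_0^{\pi} \frac{l \sin\theta}{d\pi}\, d\theta = \frac{l}{d\pi}\Big[-\cos\theta\Big]_0^{\pi} = \frac{2l}{d\pi}$$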
In this solution, the great mathematician Laplace perceived an amusing, if inefficient, means of approximating the number π. Suppose Buffon's experiment is performed with the needle being dropped n times. Let M be the random variable for the number of times the needle crosses a line. Then

$$p = E\!\left[\frac{M}{n}\right] = \frac{E[M]}{n} \qquad [3]$$

where E indicates an expectation. Note that formulas [2] and [3] represent the same probability. Setting them equal to each other and rearranging, we obtain an expression for π:

$$\pi = \frac{2ln}{d\,E[M]} \qquad [4]$$

This means that

$$\frac{2ln}{dM} \qquad [5]$$

is a statistical estimator for π. If we drop the needle n times, and it crosses a line m times, we can substitute the result m for the random variable M in [5]. The number we obtain will be an estimate for π.
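To make estimator [5] concrete, here is a minimal Python sketch of the needle-dropping experiment. The function name, default parameters, and seed are my own illustrative choices:

```python
import math
import random

def estimate_pi_buffon(n, l=3.0, d=4.0, seed=42):
    """Estimate pi by simulating n needle drops (needle length l,
    line spacing d > l) and applying estimator [5]."""
    rng = random.Random(seed)
    m = 0  # number of drops that cross a line
    for _ in range(n):
        a = rng.uniform(0.0, d)            # position relative to the lines
        theta = rng.uniform(0.0, math.pi)  # orientation of the needle
        if a < l * math.sin(theta):        # crossing condition from Exhibit 1
            m += 1
    return 2.0 * l * n / (d * m)           # estimator [5] with M realized as m

print(estimate_pi_buffon(100_000))  # typically prints a value near 3.14
```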
In 1864, Captain O. C. Fox performed this experiment three times to occupy himself while recovering from wounds. His results are shown in Exhibit 2:
Exhibit 2: Results of Fox's Needle-Dropping Experiments

| n | m | l (inches) | d (inches) | surface | π estimate |
|-----|-----|------------|------------|------------|------------|
| 500 | 236 | 3 | 4 | stationary | 3.1780 |
| 530 | 253 | 3 | 4 | rotated | 3.1423 |
| 590 | 939 | 5 | 2 | rotated | 3.1416 |

Results of Fox's three needle-dropping experiments are indicated. In the second two experiments, Fox rotated the surface between drops.
Fox's results illustrate two important issues for the Monte Carlo method.
After obtaining a poor approximation for π in his first experiment, Fox modified his subsequent experiments by applying a slight rotary motion to the ruled surface between drops. He did this to eliminate any bias arising from his position while dropping the needle. His approximations in the subsequent experiments improved. The same need to eliminate bias exists today with computer implementations of the Monte Carlo method, although such biases now arise from inappropriate pseudorandom number generators instead of posture.
In his third experiment, Fox used a five-inch needle and made the lines on the surface just two inches apart. Now the needle could cross as many as three lines in a single toss. In 590 tosses, Fox obtained 939 line crossings. Doing so entailed little more effort than his first two experiments, but it yielded a better approximation for π. Fox's innovation was a precursor of today's variance reduction techniques, as the sketch below suggests.
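Here is a hedged Python sketch of Fox's third setup. I assume crossings are counted by projecting the needle's endpoints perpendicular to the lines; the expected number of crossings per drop remains 2l/(dπ) even when l > d, so estimator [5] stays valid with m counting total crossings:

```python
import math
import random

def estimate_pi_fox(n, l=5.0, d=2.0, seed=3):
    """Estimate pi with a needle longer than the line spacing,
    counting every line crossed on each drop (as Fox did)."""
    rng = random.Random(seed)
    crossings = 0
    for _ in range(n):
        a = rng.uniform(0.0, d)            # height of the lower endpoint above a line
        theta = rng.uniform(0.0, math.pi)  # needle orientation
        span = l * math.sin(theta)         # perpendicular extent of the needle
        crossings += int((a + span) // d)  # lines falling within [a, a + span]
    return 2.0 * l * n / (d * crossings)

print(estimate_pi_fox(100_000))  # typically prints a value near 3.14
```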
This example also illustrates another point. Many practitioners who first experience the Monte Carlo method on the job see it being used to solve inherently probabilistic problems, such as derivatives pricing or weather forecasting. They assume the use of random sampling makes it applicable only to probabilistic problems. As our example illustrates, this is not true. There is nothing probabilistic about estimating the number π!
To understand the Monte Carlo method theoretically, it is useful to think of it as a general technique of numerical integration. It can be shown, at least in a trivial sense, that every application of the Monte Carlo method can be represented as a definite integral. As one example, consider our needle dropping experiment. It is nothing more than an elaborate means of estimating the integral [1].
Suppose we need to evaluate a multi-dimensional definite integral of the form

$$\Psi = \int_{[0,1]^n} f(u)\, du \qquad [6]$$
Most integrals can be converted to this form with a suitable change of variables, so we can treat [6] as a general form to which the Monte Carlo method applies.
The integral represents a non-random problem, but the Monte Carlo method approximates a solution by introducing a random vector U that is uniformly distributed on the region of integration. Applying the function f to U, we obtain a random variable f(U). This has expectation:
$$E[f(U)] = \int_{[0,1]^n} f(u)\, \varphi(u)\, du \qquad [7]$$

where φ is the probability density function of U. Because φ equals 1 on the region of integration, [7] becomes

$$E[f(U)] = \int_{[0,1]^n} f(u)\, du \qquad [8]$$

Comparing [6] and [8], we obtain a probabilistic expression for the integral Ψ:

$$\Psi = E[f(U)] \qquad [9]$$
so the random variable f(U) has mean Ψ and some standard deviation σ. We define

$$H = f(U^{[1]}) \qquad [10]$$

as an unbiased estimator for Ψ with standard error σ. This is a little unconventional, since [10] is an estimator that depends upon a sample {U[1]} of size one, but it is a valid estimator nonetheless.
To estimate Ψ with a standard error lower than σ, let's generalize our estimator to accommodate a larger sample {U[1], U[2], … , U[m]}. Applying the function f to each of these yields m independent and identically distributed (IID) random variables f(U[1]), f(U[2]), … , f(U[m]), each with expectation Ψ and standard deviation σ. The generalization of [10],

$$H = \frac{1}{m} \sum_{i=1}^{m} f(U^{[i]}) \qquad [11]$$

is an unbiased estimator for Ψ with standard error

$$\frac{\sigma}{\sqrt{m}} \qquad [12]$$
If we have a realization {u[1], u[2], … , u[m]} for our sample, we may estimate Ψ as

$$h = \frac{1}{m} \sum_{i=1}^{m} f(u^{[i]}) \qquad [13]$$
We call [11] the crude Monte Carlo estimator. Formula [12] for its standard error is important for two reasons. First, it tells us that the standard error of a Monte Carlo analysis decreases with the square root of the sample size: if we quadruple the number of realizations used, we halve the standard error. Second, the standard error does not depend upon the dimensionality of the integral [6]. Most techniques of numerical integration, such as the trapezoidal rule or Simpson's method, suffer from the curse of dimensionality: when generalized to multiple dimensions, the number of computations they require increases exponentially with the dimensionality of the integral. For this reason, such methods cannot be applied to integrals of more than a few dimensions. The Monte Carlo method does not suffer from the curse of dimensionality. It is as applicable to a 1000-dimensional integral as it is to a one-dimensional integral.
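As an illustration of the crude estimator [13], the following Python sketch estimates a five-dimensional integral over the unit hypercube and reports the estimated standard error from [12]. The integrand, sample size, and function names are my own illustrative choices:

```python
import math
import random

def crude_monte_carlo(f, dim, m, seed=1):
    """Crude Monte Carlo estimate of the integral of f over [0,1]^dim,
    per formula [13], with the standard error estimated per [12]."""
    rng = random.Random(seed)
    values = [f([rng.random() for _ in range(dim)]) for _ in range(m)]
    mean = sum(values) / m
    # Sample variance of f(U) stands in for sigma^2 in formula [12].
    var = sum((v - mean) ** 2 for v in values) / (m - 1)
    return mean, math.sqrt(var / m)

# Example integrand f(u) = exp(u_1 + ... + u_5); the exact integral is (e - 1)^5.
f = lambda u: math.exp(sum(u))
estimate, se = crude_monte_carlo(f, dim=5, m=100_000)
print(estimate, se, (math.e - 1) ** 5)  # estimate should land within a few se of the exact value
```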
While increasing the sample size is one technique for reducing the standard error of a Monte Carlo analysis, doing so can be computationally expensive. A better solution is to employ some technique of variance reduction. These techniques incorporate additional information about the analysis directly into the estimator, making the estimator more deterministic and hence giving it a lower standard error.
Standard techniques of variance reduction include control variates and importance sampling, among others. Let's describe the method of control variates.
Consider the crude Monte Carlo estimator

$$H = \frac{1}{m} \sum_{i=1}^{m} f(U^{[i]}) \qquad [14]$$
Let g be a real-valued function for which the mean

$$\mu = E[g(U)] \qquad [15]$$

is known. We shall refer to the random variable g(U) as a control variate. Consider the random variable f*(U) based upon this control variate:

$$f^{*}(U) = f(U) + c\,[\mu - g(U)] \qquad [16]$$

for some constant c. By construction,

$$E[f^{*}(U)] = \Psi \qquad [17]$$
so we can estimate Ψ with the Monte Carlo estimator

$$H^{*} = \frac{1}{m} \sum_{i=1}^{m} f^{*}(U^{[i]}) \qquad [18]$$

This will have a lower standard error than the crude estimator [14] if the standard deviation σ* of f*(U) is smaller than the standard deviation σ of f(U). This will happen if g(U) has a high correlation with the random variable f(U), in which case the random variables c·g(U) and f(U) will tend to offset each other in [18]. We formalize this observation by calculating
$$\sigma^{*2} = \operatorname{Var}[f^{*}(U)] \qquad [19]$$

$$= \operatorname{Var}[f(U)] + c^{2}\operatorname{Var}[g(U)] - 2c\operatorname{Cov}[f(U), g(U)] \qquad [20]$$

$$= \sigma^{2} + c^{2}\sigma_{g}^{2} - 2c\rho\sigma\sigma_{g} \qquad [21]$$

where σ_g is the standard deviation of g(U) and ρ is the correlation between f(U) and g(U). Obviously, σ* will be smaller than σ if

$$0 < c < \frac{2\rho\sigma}{\sigma_{g}} \qquad [22]$$
It can be shown that σ* is minimized by setting

$$c = \frac{\rho\sigma}{\sigma_{g}} \qquad [23]$$

in which case, from [21],

$$\sigma^{*2} = \sigma^{2}\,(1 - \rho^{2}) \qquad [24]$$
Often, ρ and σ are unknown, which makes determining the optimal value [23] for c problematic. We can estimate ρ and σ with a separate Monte Carlo analysis. Alternatively, if g closely approximates f, c might simply be set equal to 1.
As the above description indicates, the key to the method of control variates is finding a function g that closely approximates f and for which E[g(U)] is easy to calculate.
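As a minimal sketch of the method, the following Python example estimates Ψ = E[e^U] for U uniform on [0,1] (exact value e - 1), using g(U) = U as the control variate; its mean μ = 1/2 is known exactly. The integrand and the choice to estimate the optimal c of [23] from the sample are my own illustration:

```python
import math
import random

def control_variate_estimate(m, seed=7):
    """Estimate Psi = E[exp(U)] for U ~ Uniform(0,1), exact value e - 1,
    using g(U) = U as a control variate with known mean mu = 1/2."""
    rng = random.Random(seed)
    u = [rng.random() for _ in range(m)]
    fs = [math.exp(x) for x in u]  # realizations of f(U)
    mu = 0.5                       # E[g(U)], known analytically

    f_bar = sum(fs) / m
    g_bar = sum(u) / m
    # Estimate the optimal c = rho*sigma/sigma_g of [23] as Cov(f, g)/Var(g):
    cov = sum((fi - f_bar) * (gi - g_bar) for fi, gi in zip(fs, u)) / (m - 1)
    var_g = sum((gi - g_bar) ** 2 for gi in u) / (m - 1)
    c = cov / var_g

    # Estimator [18]: average of f(U) + c*(mu - g(U)).
    return sum(fi + c * (mu - gi) for fi, gi in zip(fs, u)) / m

print(control_variate_estimate(10_000), math.e - 1)  # both near 1.71828
```

Because g(u) = u tracks e^u closely on [0,1] (correlation about 0.99), [24] implies the control-variate estimator's standard error is a small fraction of the crude estimator's for the same sample size.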