2018-03-18
Random Stuff for Monte Carlo
A colleague mentioned the problem of users running multiple Monte Carlo jobs
and getting the same results each time due to not setting the seed of the
pseudo-random number generator.I assume you know that
pseudorandom number generators generate a reproducible stream of numbers
starting from some seed.
Here’s a brief collection of information
for avoiding that problem with some standards and implementations I know of,
since I haven’t found one elsewhere.You’re not likely to be doing
monte carlo in bash or awk, for instance, but it may be useful to know about
their random numbers.
Given a method of setting the seed, the question arises ‘from what?’.
Using real time isn’t useful for independent threads or processes
starting together or within the time resolution. Process numbers
may work if you don’t have the same one for any independent threads,
i.e. you want the pid of the task on Linux. Anyhow, at least on
Linux, you might as well use values read from /dev/urandom
if
you can, probably via getrandom(2)
if the language
implementation provides a way to call it.You may need a generator with multiple independent streams for parallel
programs. I can’t vouch for their quality, but one such is
SPRNG, I happen to maintain the Fedora package for
an R interface to
one
from L’Ecuyer et al, and there’s support in
Python NumPy.
In the descriptions below, ‘reproducible’ means the generator has the same seed every time you start a program using it unless you set it somehow.
You’re unlikely to need them, but there are sources of ‘true’ random numbers
available on the net, generated from physical sources.I recall that
Turing’s ACE design included such a source, but I don’t have a reference.
The pseudo-random generators covered here won’t be
cryptographically
secure if that matters — which it obviously doesn’t for Monte Carlo.
Bash
$RANDOM
is non-reproducible; assign to the variable to set the
seed.
C/POSIX
You may want to use a library implementing another generator, but see the man
pages for drand48(3)
and random(3)
. The generator used by
rand(3)
is unspecified, but glibc
uses the same one as
random(3)
.
Common Lisp
random
is non-reproducible, and a seed can’t be set. See
CLTL2
for using a saved random state.
Emacs Lisp
random
is non-reproducible, set from some entropy source, or seeded
based on a string argument’s contents (see the Elisp manual).
Fortran
Use RANDOM_SEED
, perhaps with a C interface to
getrandom
. The Gfortran manual describes the implementation and
has an example using the getpid
extension.
Gawk (and maybe other awks)
rand()
is reproducible; use srandom()
to seed it.
Go
rand.Seed
sets the seed in the math/rand
package, which is
reproducible.
Haskell
Use getStdRandom
from the
random
package,
though I’ve seen a suggestion that’s slow; there are
other
implementations of Random
.
Julia
See the documentation.
MATLAB
The random stream is reproducible by default. rng sets the seed based on the current time; I don’t know how to do better.
Octave
rand
is non-reproducible by default,See the suggestion in the doc about
getting the same behaviour as MATLAB.
using urandom on Linux, or time
fallbacks otherwise: note the warnings in the manual.
Python
random.seed()
will use
urandom
on Linux.
R
Non-reproducible by default, but only set from the time and process number;
use set.seed
if necessary. Note the warning about low-order bits.
For parallel streams maybe see the
rlecuyer
package (used by pdbR), or
rTRNG
.
Rust
random
seems to be non-reproducible, but see the
documentation.
Scheme
There is no standard random number generator, so consult your implementation.
Scilab
rand
is reproducible. Set a seed with rand("seed",n)
.