2018-03-18 random numbers, R, Python, MATLAB, Octave, Haskell, Fortran, C, bash, awk, programming

Random Stuff for Monte Carlo

A colleague mentioned the problem of users running multiple Monte Carlo jobs and getting the same results each time due to not setting the seed of the pseudo-random number generator.I assume you know that pseudorandom number generators generate a reproducible stream of numbers starting from some seed.

Here’s a brief collection of information for avoiding that problem with some standards and implementations I know of, since I haven’t found one elsewhere.You’re not likely to be doing monte carlo in bash or awk, for instance, but it may be useful to know about their random numbers.

Given a method of setting the seed, the question arises ‘from what?’. Using real time isn’t useful for independent threads or processes starting together or within the time resolution. Process numbers may work if you don’t have the same one for any independent threads, i.e. you want the pid of the task on Linux. Anyhow, at least on Linux, you might as well use values read from /dev/urandom if you can, probably via getrandom(2) if the language implementation provides a way to call it.You may need a generator with multiple independent streams for parallel programs. I can’t vouch for their quality, but one such is SPRNG, I happen to maintain the Fedora package for an R interface to one from L’Ecuyer et al, and there’s support in Python NumPy.

In the descriptions below, ‘reproducible’ means the generator has the same seed every time you start a program using it unless you set it somehow.

You’re unlikely to need them, but there are sources of ‘true’ random numbers available on the net, generated from physical sources.I recall that Turing’s ACE design included such a source, but I don’t have a reference.

The pseudo-random generators covered here won’t be cryptographically secure if that matters — which it obviously doesn’t for Monte Carlo.

Bash

$RANDOM is non-reproducible; assign to the variable to set the seed.

C/POSIX

You may want to use a library implementing another generator, but see the man pages for drand48(3) and random(3). The generator used by rand(3) is unspecified, but glibc uses the same one as random(3).

Common Lisp

random is non-reproducible, and a seed can’t be set. See CLTL2 for using a saved random state.

Emacs Lisp

random is non-reproducible, set from some entropy source, or seeded based on a string argument’s contents (see the Elisp manual).

Fortran

Use RANDOM_SEED, perhaps with a C interface to getrandom. The Gfortran manual describes the implementation and has an example using the getpid extension.

Gawk (and maybe other awks)

rand() is reproducible; use srandom() to seed it.

Go

rand.Seed sets the seed in the math/rand package, which is reproducible.

Haskell

Use getStdRandom from the random package, though I’ve seen a suggestion that’s slow; there are other implementations of Random.

Julia

See the documentation.

MATLAB

The random stream is reproducible by default. rng sets the seed based on the current time; I don’t know how to do better.

Octave

rand is non-reproducible by default,See the suggestion in the doc about getting the same behaviour as MATLAB.

using urandom on Linux, or time fallbacks otherwise: note the warnings in the manual.

Python

random.seed() will use urandom on Linux.

R

Non-reproducible by default, but only set from the time and process number; use set.seed if necessary. Note the warning about low-order bits. For parallel streams maybe see the rlecuyer package (used by pdbR), or rTRNG.

Rust

random seems to be non-reproducible, but see the documentation.

Scheme

There is no standard random number generator, so consult your implementation.

Scilab

rand is reproducible. Set a seed with rand("seed",n).