Monday, June 4, 2012

The importance of knowing your random number generator

I recently gained access to the Stokes computing cluster at UCF and ran some long-running simulations that I had written in Matlab. The code itself was not parallelized; instead, each node ran its own independent simulation. Each simulation took a few days to run.

I was a bit irritated to find that the output from every simulation was exactly the same, despite each having run on a separate node. The reason, I discovered, is that Matlab generates the same sequence of random numbers every time you start a new instance, because the generator is reset to the same default state at startup.

I tried this experiment with Matlab R2011a. After opening a fresh instance of Matlab, I entered and received the following in the Command Window:

>> randn(1,5)

ans =

    0.5377    1.8339   -2.2588    0.8622    0.3188


I then closed Matlab and reopened it. Entering the same command gives the same output as before:

>> randn(1,5)

ans =

    0.5377    1.8339   -2.2588    0.8622    0.3188


Oh, boy. I performed the same procedure on a different model of computer in our lab and received the same output. I also tried this with Python 2.7 and NumPy's random.rand() function. The output is different each time I start an IPython session, so NumPy must seed its generator differently on every startup.
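The two behaviors can be reproduced side by side in a few lines. This is only a minimal sketch: it uses Python's standard-library random module rather than NumPy or Matlab, the helper name first_draws is made up for illustration, and each newly constructed generator stands in for a "fresh session".

```python
import random


def first_draws(seed=None, n=5):
    """Return the first n Gaussian draws from a newly constructed generator.

    Hypothetical helper for illustration only. With seed=None, Python seeds
    the generator from OS entropy (falling back to the clock).
    """
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


# Two "fresh sessions" that start from the same fixed seed:
# identical sequences, like Matlab at startup.
fixed_a = first_draws(seed=0)
fixed_b = first_draws(seed=0)

# Two "fresh sessions" with no explicit seed:
# different sequences, like the NumPy behavior described above.
auto_a = first_draws()
auto_b = first_draws()

print("fixed seed, same output:", fixed_a == fixed_b)
print("auto seed, same output: ", auto_a == auto_b)
```

The numbers themselves will not match the Matlab output above, since the generators and transforms differ; only the repeat-or-not behavior is the point.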

Lesson learned. If you're working with random numbers in Matlab (at least with R2011a), be sure to randomize the seed with the command rng shuffle, which seeds the generator from the current time. Now to go redo a few weeks' worth of work...
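On a cluster, an alternative to a time-based seed is to give every job its own explicit seed, which also makes each run repeatable later. The following is a rough sketch of that idea in Python, not what I ran on Stokes: the helper names seed_for_job and run_simulation, the base seed value, and the assumption that each job knows its own integer index are all hypothetical.

```python
import hashlib
import random


def seed_for_job(base_seed, job_index):
    """Derive a distinct, reproducible integer seed for one job.

    Hypothetical helper: hashes the (base_seed, job_index) pair so that
    neighboring job indices do not end up with neighboring seeds.
    """
    digest = hashlib.sha256(f"{base_seed}:{job_index}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def run_simulation(job_index, base_seed=20120604, n=5):
    """Stand-in for a real simulation: draws n Gaussian samples."""
    rng = random.Random(seed_for_job(base_seed, job_index))
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


# One "simulation" per node, each with its own seed (job indices assumed).
results = [run_simulation(i) for i in range(4)]
for i, draws in enumerate(results):
    print(i, [round(x, 4) for x in draws])
```

Recording the base seed and job index alongside each result would let any single run be regenerated exactly, which a purely time-based seed does not allow unless the seed is saved.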