01 Jan

# More R Random

## Generation of normals

If you want to generate 200 standard normals, then do:

> xn <- rnorm(200)

You will get different numbers in xn if you do the command again.

The mean and sd arguments control the mean and standard deviation.
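As a small sketch of those arguments (the object name xn2 and the values 10 and 2 are chosen here purely for illustration), note that sd is the standard deviation, not the variance:

```r
# 200 normals with mean 10 and standard deviation 2
xn2 <- rnorm(200, mean = 10, sd = 2)

# the sample mean and sd should be close to, but not exactly, 10 and 2
mean(xn2)
sd(xn2)
```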

## Two types of uniform

A distribution in which all numbers in some range are equally likely is a continuous uniform. A distribution in which each member of some finite set of objects, such as a range of integers, is equally likely is a discrete uniform.

### Continuous uniform

You can generate 100 numbers that are continuously uniform between 0 and 1 with:

> xcontu <- runif(100)

You will get different numbers in xcontu if you do the command again.

The min and max arguments change the range.
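For instance, a sketch that draws from the range -5 to 5 (the name xcontu2 and the endpoints are illustrative only):

```r
# 100 values uniform between -5 and 5
xcontu2 <- runif(100, min = -5, max = 5)

# the smallest and largest values drawn; both lie inside the requested range
range(xcontu2)
```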

### Discrete uniform

Use the sample function to generate uniformly from some set of integers (or other types of objects). For example:

> xdiscu <- sample(1:100, 4, replace=TRUE)

selects 4 numbers between 1 and 100, inclusive, with replacement.

You will get different numbers in xdiscu if you do the command again.

You can get a random color from among the named colors with the command:

> sample(colors(), 1)

The prob argument to sample allows you to give different probabilities to the elements of the vector that is being selected from. Thus sample will perform non-uniform sampling as well.
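A minimal sketch of non-uniform sampling follows. The three outcome labels and their probabilities are made up for illustration; prob needs one weight per element of the vector being sampled.

```r
# three hypothetical outcomes with unequal probabilities
outcomes <- c("low", "medium", "high")
xnonu <- sample(outcomes, 1000, replace = TRUE, prob = c(0.7, 0.2, 0.1))

# observed proportions should be roughly 0.7, 0.2 and 0.1
table(xnonu) / length(xnonu)
```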

## Random permutations

The sample function also does random permutations. In fact, that is its default behavior:

> xpermute <- sample(x)
> sample(1:9)
[1] 1 9 3 8 5 2 6 4 7

You will get a different order in xpermute if you do the command again. (We are assuming here that x is a vector with more than one element.)
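The assumption about x matters because sample treats a single number n (at least 1) as shorthand for 1:n. A short sketch, using a made-up vector x:

```r
x <- c("a", "b", "c", "d", "e")   # hypothetical example vector
xpermute <- sample(x)

# same elements, usually in a different order
setequal(x, xpermute)

# a single number is treated as 1:n, so this permutes 1, 2, 3, 4, 5
xfive <- sample(5)
```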

## Seed setting

In all of the commands above, you get different answers as you repeat them. That is pretty much the point of them. However, it can be useful to know that you will get the same answers again even though you are generating random numbers. You can do that by setting the random seed.

In R there is an object called .Random.seed that holds the state of the random number generator. Once you have generated something random, there will be a .Random.seed object in your global environment. (It doesn’t show up in ls() because the name starts with a dot; you can see such objects by saying: ls(all.names=TRUE).)

Calls to random functions change the value of .Random.seed. That is, these calls not only return a value, they also have the side effect of changing .Random.seed.

But if the random seed is the same at the start of a call, then the results will be the same. There are two ways of setting the seed: you can save the seed and then assign it, or you can use set.seed.
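The save-and-assign approach can be sketched as follows, assuming the default generator settings. The names saved_seed, first and second are illustrative; the assignment has to be made in the global environment, because that is where R looks for .Random.seed.

```r
# generating anything random guarantees that .Random.seed exists
invisible(runif(1))

# save the current state of the generator
saved_seed <- .Random.seed
first <- rnorm(3)

# restore the saved state in the global environment
assign(".Random.seed", saved_seed, envir = globalenv())
second <- rnorm(3)

# the two sets of numbers should be identical
identical(first, second)
```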

The preferred method is to use set.seed. You can just give a number as the first argument:

> set.seed(123)
> rnorm(4)
[1] -0.56047565 -0.23017749  1.55870831  0.07050839
> rnorm(4)
[1]  0.1292877  1.7150650  0.4609162 -1.2650612
> set.seed(123)
> rnorm(4)
[1] -0.56047565 -0.23017749  1.55870831  0.07050839

## Probability distributions

R has functions for a number of probability distributions. In general, there are four functions for each distribution as shown in Table 1.

Table 1

| Function name | Description |
| --- | --- |
| rxxx | random generation |
| dxxx | density function |
| pxxx | cumulative probability function |
| qxxx | quantile function |

For example, rnorm is the random generation function for the normal distribution, and dnorm is the density for the normal. pnorm is the cumulative probability function for the normal: it gives the probability of being less than or equal to a given quantile. qnorm is the quantile function, the inverse of the probability function: it returns a quantile given a probability.
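The relationship between these functions can be checked directly for the standard normal. This is only a sketch; the names p, q and d0 are illustrative.

```r
# cumulative probability at 1.96 for a standard normal (about 0.975)
p <- pnorm(1.96)

# qnorm inverts pnorm, so this recovers 1.96
q <- qnorm(p)

# the density at 0 equals 1 / sqrt(2 * pi)
d0 <- dnorm(0)

all.equal(q, 1.96)
all.equal(d0, 1 / sqrt(2 * pi))
```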

Table 2 shows a few of the distributions that are available in R.

Table 2

| Distribution | Functions |
| --- | --- |
| Uniform | runif dunif punif qunif |
| Normal | rnorm dnorm pnorm qnorm |
| Student’s t | rt dt pt qt |
| F | rf df pf qf |
| Exponential | rexp dexp pexp qexp |
| Log normal | rlnorm dlnorm plnorm qlnorm |
| Beta | rbeta dbeta pbeta qbeta |
| Binomial | rbinom dbinom pbinom qbinom |
| Poisson | rpois dpois ppois qpois |

You can see a more complete list with the command:

> ??distribution
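The same four prefixes work for the other distributions. As a sketch with the binomial (10 trials with success probability 0.5, values picked only for illustration), note that for a discrete distribution the d function gives a probability rather than a density:

```r
# probability of exactly 3 successes in 10 trials
d3 <- dbinom(3, size = 10, prob = 0.5)

# probability of 3 or fewer successes
p3 <- pbinom(3, size = 10, prob = 0.5)

# smallest count whose cumulative probability is at least 0.5
q50 <- qbinom(0.5, size = 10, prob = 0.5)

# five random draws from the same distribution
xb <- rbinom(5, size = 10, prob = 0.5)
```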

The ecdf function takes a data vector as an argument and returns a function that is the empirical cumulative distribution function of the data.
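A brief sketch of how the returned function is used, with a made-up data vector xdata and the illustrative name Fn:

```r
xdata <- c(3, 1, 4, 1, 5, 9, 2, 6)   # hypothetical data vector
Fn <- ecdf(xdata)

# proportion of the data less than or equal to 4 (5 of the 8 values)
Fn(4)
```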

Many contributed packages contain functions for additional distributions.

## Pseudorandomness

In a certain sense most of what is said on this page is a lie. When you use a function like rnorm or sample, you are not generating randomness at all. These are pseudorandom functions. Technically you are generating chaos when you use them, not randomness. There are two main reasons to use pseudorandomness rather than randomness.

The first is convenience. In the early days of computing there was no way to actually get true random values, so they had to invent pseudorandom methods. Now there is the possibility of using truly random values, but it is generally harder to do and seldom offers an advantage.

The second reason to prefer pseudorandomness is reproducibility. Random numbers (by definition) are not reproducible. A program without reproducible results is a program that cannot be debugged.

It is largely accidental that we have pseudorandom functions and not truly random functions. It’s a happy accident.

## Resources

This includes a discussion of probability distributions.