More R Random
This page has the following sections:
Generation of normals
Two types of uniform
Random permutations
Seed setting
Probability distributions
Pseudorandomness
Resources
Generation of normals
If you want to generate 200 standard normals,
then do:
> xn <- rnorm(200)
You will get different numbers in xn if you do the
command again.
The mean and sd arguments control the mean and standard
deviation; the defaults are 0 and 1.
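As a minimal sketch of those arguments, the call below asks for normals with a mean of 10 and a standard deviation of 2. The values 10 and 2 and the name xn2 are made up for illustration.

```r
# 200 normals with mean 10 and standard deviation 2
# (xn2 and the parameter values are illustrative only)
xn2 <- rnorm(200, mean = 10, sd = 2)

# the sample statistics will be near, but not exactly, 10 and 2
mean(xn2)
sd(xn2)
```

The sample mean and standard deviation will differ a little from the requested values each time, because the numbers are random draws.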
Two types of uniform
A continuous uniform distribution makes all numbers in some range
equally likely.
A discrete uniform distribution makes each member of some
finite set of objects, such as a range of integers, equally
likely.
Continuous uniform
You can generate 100 numbers that are continuously uniform between 0 and 1 with:
> xcontu <- runif(100)
You will get different numbers in xcontu if you do the
command again.
The min and max arguments change the range.
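For instance, here is a small sketch that uses the min and max arguments of runif to generate uniforms between -5 and 5. The endpoints and the name xcontu2 are arbitrary choices for illustration.

```r
# 100 continuous uniforms between -5 and 5
# (xcontu2 and the endpoints are illustrative only)
xcontu2 <- runif(100, min = -5, max = 5)

# smallest and largest values generated; both lie inside the range
range(xcontu2)
```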
Discrete uniform
Use the sample function to generate uniformly from some
set of integers (or other types of objects).
For example:
> xdiscu <- sample(1:100, 4, replace=TRUE)
selects 4 numbers between 1 and 100, inclusive, with replacement.
You will get different numbers in xdiscu if you do the
command again.
You can get a random color from among the named colors with the command:
> sample(colors(), 1)
The prob argument to sample allows you to give
different probabilities to the elements of the vector that is being
selected from.
Thus sample will perform non-uniform sampling as well.
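A rough sketch of non-uniform sampling with prob follows. It simulates a biased coin that lands heads about 80% of the time; the names faces and flips and the probabilities are invented for this example.

```r
# a biased coin: heads with probability 0.8, tails with probability 0.2
# (faces, flips and the probabilities are illustrative only)
faces <- c("heads", "tails")
flips <- sample(faces, 1000, replace = TRUE, prob = c(0.8, 0.2))

# observed proportions; heads should be near 0.8
table(flips) / length(flips)
```

The probabilities are matched to the elements by position, so the first value in prob goes with the first element of faces.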
Random permutations
The sample function also does random permutations.
In fact, that is its default behavior:
> xpermute <- sample(x)
You will get a different order in xpermute if you do the
command again.
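(We are assuming here that x is a vector with more than one element.
If x is a single number that is at least 1, then sample(x) permutes the integers from 1 to x instead.)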
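Here is a small sketch of a permutation, assuming a made-up character vector for x. It also shows what happens when sample is given a single number.

```r
# x is an illustrative vector; any vector with more than one element works
x <- c("a", "b", "c", "d", "e")
xpermute <- sample(x)

# the permutation holds the same elements as x, usually in another order
identical(sort(xpermute), sort(x))  # TRUE

# a single number is treated differently:
# this is a permutation of 1:10, not the value 10
sample(10)
```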
Seed setting
In all of the commands above, you get different answers as you
repeat them.
That is pretty much the point of them.
However, it can be useful to know that you will get the same answers
again even though you are generating random numbers.
You can do that by setting the random seed.
In R there is an object called .Random.seed that controls
random generation.
Once you have generated something random, there will be a
.Random.seed object in your global environment.
(It doesn't show up in ls() because the name starts with a dot, but ls(all.names=TRUE) will list it.)
Calls to random functions change the value of .Random.seed.
That is, these calls not only return a value, they also have the side
effect of changing .Random.seed.
But if the random seed is the same at the start of a call, then the
results will be the same.
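The side effect can be seen with a sketch like the one below, which compares .Random.seed before and after a call to rnorm. The names seed_before and seed_after are invented here, and the sketch assumes R's default generator settings.

```r
# generate something first so that .Random.seed is sure to exist
invisible(runif(1))

seed_before <- .Random.seed   # copy of the current state
xside <- rnorm(3)             # this call changes .Random.seed
seed_after <- .Random.seed

# the state is no longer what it was before the call
identical(seed_before, seed_after)  # FALSE
```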
There are two ways of setting the seed: you can save the seed and then
assign it, or you can use set.seed.
The preferred method is to use set.seed.
You can just give a number as the first argument:
> set.seed(123)
> rnorm(10)   # some set of 10 numbers
> rnorm(10)   # a different set of 10 numbers
> set.seed(123)
> rnorm(10)   # the same numbers as the first call
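The other method, saving the seed and assigning it back, might look like the sketch below. The names saved_seed, first and second are made up, and assign is used here so that the seed lands in the global environment even if the code is run inside a function. It assumes R's default generator settings.

```r
set.seed(42)                 # also guarantees that .Random.seed exists
saved_seed <- .Random.seed   # save the current state

first <- rnorm(5)

# put the saved state back into the global environment
assign(".Random.seed", saved_seed, envir = globalenv())

second <- rnorm(5)

identical(first, second)  # TRUE
```

Saving the seed records the exact state at that moment, which can be handy partway through a long computation. For most uses set.seed is simpler.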
Probability distributions
R includes functions for a number of probability distributions.
In general, there are four functions for each distribution as shown
in Table 1.
Table 1
| Function name | Description |
|---|---|
| rxxx | random generation |
| dxxx | density function |
| pxxx | cumulative probability function |
| qxxx | quantile function |
For example, rnorm is the random generation function for
the normal distribution.
dnorm is the density for the normal.
pnorm is the cumulative probability function for the normal -- that is,
this gives the probability of being less than or equal to a given quantile.
qnorm is the quantile function -- the inverse of the probability
function (that is, it returns a quantile given a probability).
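As a quick sketch, the calls below use the standard normal to show how the density, probability and quantile functions relate. The particular values 0, 1.96 and 0.975 are just convenient choices.

```r
# density of the standard normal at 0, which equals 1/sqrt(2*pi)
dnorm(0)       # about 0.399

# probability of being less than or equal to 1.96
pnorm(1.96)    # about 0.975

# the quantile function undoes the probability function
qnorm(0.975)   # about 1.96
```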
Table 2 shows a few of the distributions that are available in R.
Table 2
| Distribution | Functions |
|---|---|
| Uniform | runif dunif punif qunif |
| Normal | rnorm dnorm pnorm qnorm |
| Student's t | rt dt pt qt |
| F | rf df pf qf |
| Exponential | rexp dexp pexp qexp |
| Log normal | rlnorm dlnorm plnorm qlnorm |
| Beta | rbeta dbeta pbeta qbeta |
| Binomial | rbinom dbinom pbinom qbinom |
| Poisson | rpois dpois ppois qpois |
You can see a more complete list with the command:
> ??distribution
The ecdf function takes a data vector as an argument and returns
a function that is the empirical cumulative probability function of the data.
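A small sketch of ecdf with made-up data follows. The names dat and Fhat are invented for this example.

```r
# illustrative data vector
dat <- c(3, 1, 4, 1, 5, 9, 2, 6)

# Fhat is itself a function
Fhat <- ecdf(dat)

# proportion of the data less than or equal to 4:
# five of the eight values (3, 1, 4, 1, 2) qualify
Fhat(4)   # 0.625

Fhat(0)   # 0, since no data value is that small
Fhat(9)   # 1, since every data value is at most 9
```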
Many contributed packages contain functions for additional distributions.
Pseudorandomness
In a certain sense most of what is said on this page is a lie.
When you use a function like rnorm or sample,
you are not generating randomness at all.
These are pseudorandom functions.
Technically you are generating chaos when you use them, not randomness.
There are two main reasons to use pseudorandomness rather than randomness.
The first is convenience.
In the early days of computing there was no way to actually get
true random values, so pseudorandom methods had to be invented.
Now there is the possibility of using truly random values, but
it is generally harder to do and seldom offers an advantage.
The second reason to prefer pseudorandomness is reproducibility.
Random numbers (by definition) are not reproducible.
A program without reproducible results is a program that cannot be
debugged.
It is largely accidental that we have pseudorandom functions and
not truly random functions.
It's a happy accident.
Resources
An Introduction to R
This includes a discussion of probability distributions.
First Version: 2010 June 27
Last Modified: 2010 June 27
Direct access to this page is via
http://www.burns-stat.com/pages/Tutor/more_R_random.html