More R Numbers
This page has the following sections:
Scientific Notation
Two Kinds of Numbers
Numerical Error
Printing
Commas
Special Values
Two Kinds Revisited
Resources
Scientific Notation
A number like 2.3 will be familiar.
A number like 2.3e6 may be confusing to the uninitiated.
This latter number has three parts:
a value
the letter "e"
the number of places to shift the decimal point.
So 2.3e6 is the same as 2300000 (that is, 2.3 million), and 2.3e-4 is the same as 0.00023.
Some languages allow the "e" to be a "d".
R does not.
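A short sketch of the notation at work in R may help. The name d_form is only for illustration; the functions used are from base R.

```r
# Scientific notation is just another way of typing the same number.
2.3e6 == 2300000                      # TRUE
2.3e-4 == 0.00023                     # TRUE

# format() can switch between the two ways of writing a number.
format(2300000, scientific = TRUE)    # "2.3e+06"
format(2.3e6, scientific = FALSE)     # "2300000"

# A "d" in place of the "e" is not valid R syntax, so parsing it fails.
d_form <- tryCatch(parse(text = "2.3d6"),
                   error = function(e) "not valid R")
d_form                                # "not valid R"
```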
Two Kinds of Numbers
R has two types of number:
integer
floating point
Most of the time you don't care.
R will automatically move between the two types as the situation demands
in almost all cases.
Types of numbers only become important at the point that your expectations
are violated.
It is almost surely your expectations, and not R, that need to change
-- we'll get to that shortly.
An integer in R is pretty much like the integers in your head -- whole
numbers that can be positive or negative.
A difference is that integers in R only go to about plus or minus two billion.
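The limit on integers can be seen directly. This is a small sketch using the base R object .Machine; the name too_big is just for illustration.

```r
# The largest integer R can store (2^31 - 1).
.Machine$integer.max                  # 2147483647

# Going past the limit gives NA (with a warning), not a bigger integer.
too_big <- suppressWarnings(.Machine$integer.max + 1L)
too_big                               # NA

# The same sum done in floating point (no "L" on the 1) is fine.
.Machine$integer.max + 1              # 2147483648
```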
A floating point number is sort of like the real numbers in your head.
Here's the rub: floating point numbers are significantly different from
the real numbers in your head.
Floating point numbers are in scientific notation -- they have a value part
and a part that says how far to move the "decimal" point.
The "decimal" is in quotes because it really should be "binary point".
Computers think in binary.
A floating point number has a value written in binary, and an indication
of how many binary places to move over (by the way, that's where "floating"
comes in).
There are two important things to remember about floating point numbers:
they can represent only a limited set of the real numbers.
the numbers they do represent are binary.
Let's look at some examples.
Floating point numbers can exactly represent one-half and one-quarter.
Floating point numbers cannot exactly represent one-third or one-tenth.
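Here is a minimal sketch of those examples in R. The digits shown by the last command may vary slightly between systems, so no output is given for it.

```r
# Halves and quarters are exact in binary, so this arithmetic is exact.
0.5 + 0.25 == 0.75        # TRUE

# Tenths are not exact in binary, so tiny errors creep in.
0.1 + 0.2 == 0.3          # FALSE

# Printing 20 decimal places shows the approximations that are stored.
sprintf("%.20f", c(0.5, 0.25, 0.1, 1/3))
```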
Numerical Error
Table 1
| Item  | Fraction | Percent |
| A     | 2/7      | 29      |
| B     | 2/7      | 29      |
| C     | 3/7      | 43      |
| total | 1        | 101     |
Table 1 exhibits numerical error in the familiar world.
The percentages don't really add up to 101%, but that is as close as
we can come when we are only using whole percentages.
One-seventh is not a whole-number percentage.
There is error.
But not error in the sense of a mistake.
There is error in the sense of inaccuracy.
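The arithmetic behind Table 1 can be reproduced in R. This is only a sketch; the names fractions and percents are made up for the example.

```r
# The three fractions from Table 1.
fractions <- c(A = 2/7, B = 2/7, C = 3/7)

# Rounding each one to a whole percentage loses a little accuracy.
percents <- round(100 * fractions)
percents              # A is 29, B is 29, C is 43

# The rounded values no longer add up to 100.
sum(percents)         # 101
```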
A common message to R-help -- probably the most common -- goes something like:
R is wrong because
> seq(0, 1, by=.1)[4] == .3
[1] FALSE
To expect that result to be TRUE is absolutely the same thing
as always expecting the total percentage in cases like Table 1 to be 100.
Do not expect that.
What you should expect is that you get numerical error whenever
there are operations with floating point numbers.
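If you do need to compare floating point numbers, one common approach is to compare with a tolerance rather than exactly. The sketch below uses all.equal from base R; the tolerance of 1e-8 in the last line is an arbitrary choice for illustration, not a recommendation.

```r
x <- seq(0, 1, by = 0.1)[4]

# Exact comparison is tripped up by numerical error.
x == 0.3                          # FALSE

# all.equal() allows for a small difference; wrap it in isTRUE()
# because it returns a description rather than FALSE when values differ.
isTRUE(all.equal(x, 0.3))         # TRUE

# The same idea written out by hand with an explicit tolerance.
abs(x - 0.3) < 1e-8               # TRUE
```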
Printing
Part of the problem with the sequence example is that R is too nice.
R prints the expected result:
> seq(0, 1, by=.1)[4]
[1] 0.3
R, by default, prints only about seven significant digits of those that it has available.
This makes a lot of numerical error invisible.
The advantage is simpler looking objects.
The disadvantage is that you tend to forget about numerical error.
We can look at what R really thinks the answer is by printing all of
the digits that it has.
On one system this is:
> print(seq(0, 1, by=.1)[4], digits=20)
[1] 0.30000000000000004
On another it is:
> print(seq(0, 1, by=.1)[4], digits=20)
[1] 0.3
In the second case you are probably expecting the equality test to be TRUE.
Do not expect that.
Remember that we are translating from binary to decimal.
There is more than one binary number that prints as 0.3 in decimal.
We can see how far apart the numbers are by subtracting:
> seq(0, 1, by=.1)[4] - 0.3
[1] 5.551115e-17
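The number of digits printed is controlled by the digits option and by the digits argument of functions like format. The following is a sketch of how to look behind the default; as seen above, the exact digits printed at high precision can vary between systems, so outputs are only given where they should not vary.

```r
x <- seq(0, 1, by = 0.1)[4]

# The number of significant digits printed by default.
getOption("digits")                   # 7 unless it has been changed

# With 7 digits the numerical error is hidden.
format(x, digits = 7)                 # "0.3"

# Asking for more digits reveals what is stored.
format(x, digits = 17)
sprintf("%.17g", x)

# The error is smaller than the machine precision for numbers near 1.
abs(x - 0.3) < .Machine$double.eps    # TRUE
```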
Commas
If you want to represent numbers using a comma to mark the decimal point,
then you can use format with the decimal.mark argument:
> format(seq(0, 1, by=.5), decimal.mark=",")
[1] "0,0" "0,5" "1,0"
A more general way of getting commas as the decimal mark is to set
the OutDec option:
> options(OutDec=",")
> seq(0, 1, by=.5)
[1] 0,0 0,5 1,0
See the options help file for more information.
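Setting an option changes it for the rest of the session, so you may want to put it back afterwards. A minimal sketch, assuming the session starts with the usual "." as the decimal mark (the names old, with_comma and with_point are just for illustration):

```r
# options() returns the previous settings, so they can be saved.
old <- options(OutDec = ",")
with_comma <- format(0.5)
with_comma                        # "0,5"

# Restore whatever the setting was before.
options(old)
with_point <- format(0.5)
with_point                        # "0.5" if the original mark was "."
```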
If you want to go the other way, then the sub function works:
> nums <- format(seq(0, 1, by=.5), decimal.mark=",")
> as.numeric(sub(",", ".", nums))
[1] 0.0 0.5 1.0
Alternatively, if you have commas in your numbers that are marking
orders of magnitude, then use gsub:
> as.numeric(gsub(",", "", "2,300,000"))
[1] 2300000
The difference between sub and gsub is that sub only does one
substitution (per element) while gsub does all of the substitutions possible.
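The difference is easy to see on a number with two commas. The second half of this sketch combines the two functions for a made-up number written with "." marking thousands and "," as the decimal mark; fixed = TRUE is used there because "." otherwise means "any character" in a pattern.

```r
x <- "2,300,000"

# sub() removes only the first comma, gsub() removes them all.
sub(",", "", x)                   # "2300,000"
gsub(",", "", x)                  # "2300000"

# A hypothetical number in the "1.234,5" style.
euro <- "1.234,5"
no_dots <- gsub(".", "", euro, fixed = TRUE)
no_dots                           # "1234,5"
as.numeric(sub(",", ".", no_dots, fixed = TRUE))   # 1234.5
```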
Special Values
R has plus and minus infinity as numbers.
We can see once again that floating point numbers are substantially
different from the real numbers in our heads:
> 1e308
[1] 1e+308
> 1e308 * 10
[1] Inf
Infinity starts just after the largest possible floating point number.
Another special value is NaN which means Not-a-Number.
This is created from operations that have no definite mathematical limit.
Here are two examples:
> 0/0
[1] NaN
> Inf - Inf
[1] NaN
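Base R has functions to test for these special values, which is safer than comparing with ==. A brief sketch (the name big is just for illustration):

```r
# The largest floating point number; anything beyond it is Inf.
.Machine$double.xmax              # about 1.797693e+308
big <- .Machine$double.xmax * 2
big                               # Inf
-1 / 0                            # -Inf

# Tests for the special values.
is.finite(c(1, Inf, -Inf, NaN))   # TRUE FALSE FALSE FALSE
is.nan(c(1, Inf, NaN))            # FALSE FALSE TRUE

# Comparing NaN with == does not give TRUE, so use is.nan() instead.
NaN == NaN                        # NA
```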
Two Kinds Revisited
Remember there are two types of numbers in R -- floating point and integer.
Let's experiment some:
> mode(1)
[1] "numeric"
> typeof(1)
[1] "double"
> mode(1:2)
[1] "numeric"
> typeof(1:2)
[1] "integer"
Almost always numbers are floating point.
In particular, they are double precision floating point numbers.
The colon operator used with logically integer values is one of the
few places where the result is integer by default.
Remember that is.integer is testing the storage type of the
object; it is not testing if the values are logically integer:
> is.integer(1)
[1] FALSE
> is.integer(1:2)
[1] TRUE
> is.integer(as.integer(1))
[1] TRUE
> is.integer(as.double(1:2))
[1] FALSE
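If what you actually want to know is whether the values are whole numbers, you need a different test. The function is_whole below is a hypothetical helper written for this example (it is not part of base R), and its tolerance is an assumption you may want to change.

```r
# Hypothetical helper: TRUE where a value is within a small
# tolerance of a whole number, whatever its storage type.
is_whole <- function(x, tol = sqrt(.Machine$double.eps)) {
  abs(x - round(x)) < tol
}

is.integer(2)         # FALSE: stored as floating point
is_whole(2)           # TRUE: but the value is a whole number
is_whole(2.5)         # FALSE
is_whole(1:2)         # TRUE TRUE
```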
We just saw that 1 is interpreted as a floating point number
rather than an integer.
Put a capital "L" (as in "long") after a number to make R create it as an integer:
> is.integer(1L)
[1] TRUE
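Once you have integers, it is worth seeing how R moves between the two types during arithmetic. A small sketch using only base R:

```r
# Integer with integer stays integer for addition.
typeof(1L + 1L)           # "integer"

# Mixing in a floating point number gives floating point.
typeof(1L + 1)            # "double"

# Ordinary division always gives floating point ...
typeof(1L / 2L)           # "double"

# ... but integer division of integers gives an integer.
typeof(5L %/% 2L)         # "integer"

# as.integer() drops the fractional part rather than rounding.
as.integer(3.9)           # 3
```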
Resources
Circle 1 of The R Inferno contains more on numerical error.
First Version: 2010 July 11
Last Modified: 2010 July 25
Direct access to this page is via
http://www.burns-stat.com/pages/Tutor/more_R_numbers.html