More R Numbers
This page has the following sections:
Two Kinds of Numbers
Two Kinds Revisited
A number like 2.3 will be familiar. A number like 2.3e6 may be confusing to the uninitiated.
This latter number has three parts:
- a value
- the letter “e”
- the number of places to shift the decimal point.
2.3e6 is the same as 2300000 (that is, 2.3 million), and 2.3e-4 is the same as 0.00023.
Some languages allow the “e” to be a “d”. R does not.
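A quick check at the R prompt can make the notation concrete. This is only a small sketch; the variable names big and small are made up for the illustration.

```r
# Scientific notation is just another way to type the same number.
big   <- 2.3e6    # shift the decimal point 6 places to the right
small <- 2.3e-4   # shift the decimal point 4 places to the left

big == 2300000                      # TRUE: both are exactly 2.3 million
isTRUE(all.equal(small, 0.00023))   # TRUE

# An upper case "E" is accepted as well.
2.3E6 == big                        # TRUE
```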
Two Kinds of Numbers
R has two types of number:
- integer
- floating point
Most of the time you don’t care.
R will automatically move between the two types as the situation demands in almost all cases. Types of numbers only become important at the point that your expectations are violated. It is almost surely your expectations, and not R, that need to change; we’ll get to that shortly.
An integer in R is pretty much like the integers in your head — whole numbers that can be positive or negative. A difference is that integers in R only go to about plus or minus two billion.
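The "about two billion" limit can be seen directly. The sketch below assumes the usual 32-bit integer storage that R reports in .Machine; the name too_big is just for illustration.

```r
# Largest value that R's integer type can hold.
.Machine$integer.max                 # 2147483647

# Going past the limit does not wrap around: R gives NA (with a warning,
# silenced here so the example runs quietly).
too_big <- suppressWarnings(.Machine$integer.max + 1L)
is.na(too_big)                       # TRUE

# Adding a floating point 1 instead moves the result to floating point.
typeof(.Machine$integer.max + 1)     # "double"
```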
A floating point number is sort of like the real numbers in your head. Here’s the rub: floating point numbers are significantly different from the real numbers in your head.
Floating point numbers are in scientific notation — they have a value part and a part that says how far to move the “decimal” point. The “decimal” is in quotes because it really should be “binary point”. Computers think in binary.
A floating point number has a value written in binary, and an indication of how many binary places to move over (by the way, that’s where “floating” comes in).
There are two important things to remember about floating point numbers:
- they are limited in the number of real numbers they represent.
- the numbers they do represent are binary.
Let’s look at some examples. Floating point numbers can exactly represent one-half and one-quarter. Floating points do not exactly represent one-third or one-tenth.
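Here is one way to see the difference at the prompt. It is a sketch rather than a proof: it simply compares sums that are exact in binary with sums that are not.

```r
# Halves and quarters are exact in binary, so arithmetic on them is exact.
0.5 + 0.25 == 0.75    # TRUE

# Tenths are not exact in binary, so small errors creep in.
0.1 + 0.2 == 0.3      # FALSE

# Asking for many decimal places shows what is actually stored.
sprintf("%.20f", 0.5)
sprintf("%.20f", 0.1)
```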
Table 1 exhibits numerical error in the familiar world. The percentages don’t really add up to 101%, but that is as close as we can come when we are only using whole percentages. One-seventh is not an even percentage.
There is error. But not error in the sense of a mistake. There is error in the sense of inaccuracy.
A common message to R-help — probably the most common — goes something like:
R is wrong because
> seq(0, 1, by=.1)[4] == .3
[1] FALSE
To expect that result to be TRUE is absolutely the same thing as always expecting the total percentage in cases like Table 1 to be 100.
Do not expect that.
What you should expect is that you get numerical error whenever there are operations with floating point numbers.
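If you do need to test whether two floating point numbers are "the same", one common approach is to compare within a tolerance instead of using ==. The sketch below uses all.equal from base R; the hand-written tolerance of 1e-8 is an arbitrary choice for illustration.

```r
x <- seq(0, 1, by = 0.1)[4]   # the fourth element, nominally 0.3

x == 0.3                      # FALSE: exact comparison sees the numerical error

# all.equal compares within a small tolerance. Wrap it in isTRUE because it
# returns a character description, not FALSE, when the values differ.
isTRUE(all.equal(x, 0.3))     # TRUE

# The same idea written out by hand.
abs(x - 0.3) < 1e-8           # TRUE
```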
Part of the problem with the sequence example is that R is too nice. R prints the expected result:
> seq(0, 1, by=.1)[4]
[1] 0.3
R, by default, prints only some of the decimal places that it has available. This makes a lot of numerical error invisible. The advantage is simpler looking objects. The disadvantage is that you tend to forget about numerical error.
We can look at what R really thinks the answer is by printing all of the digits that it has. On one system this is:
> print(seq(0, 1, by=.1)[4], digits=20)
[1] 0.30000000000000004
On another it is:
> print(seq(0, 1, by=.1)[4], digits=20)
[1] 0.3
In the second case you are probably expecting the equality test to be TRUE. Do not expect that. Remember that we are translating from binary to decimal. There is more than one binary number that translates to 0.3.
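The point about translation can be checked with a short sketch: two numbers that print identically need not be equal. The names a and b are made up for the example.

```r
a <- 0.3
b <- seq(0, 1, by = 0.1)[4]

# With the default number of printed digits the two look identical ...
format(a) == format(b)   # TRUE: both are shown as "0.3"

# ... yet they are different floating point numbers.
a == b                   # FALSE
```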
We can see how far apart the numbers are by subtracting:
> seq(0, 1, by=.1)[4] - 0.3
[1] 5.551115e-17
If you want to represent numbers using a comma to mark the decimal point, then you can use format with the decimal.mark argument:
> format(seq(0, 1, by=.5), decimal.mark=",")
[1] "0,0" "0,5" "1,0"
A more general way of getting commas as the decimal mark is to set the OutDec option:
> options(OutDec=",")
> seq(0, 1, by=.5)
[1] 0,0 0,5 1,0
See the options help file for more information.
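Because an option stays set for the rest of the session, you may want to put it back when you are done. A minimal sketch, assuming the session starts with the default decimal mark of "." (the names old, with_comma and with_point are illustrative):

```r
# options() returns the previous settings, so they can be saved and restored.
old <- options(OutDec = ",")
with_comma <- format(0.5)     # "0,5"

options(old)                  # restore the earlier decimal mark
with_point <- format(0.5)     # "0.5"
```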
If you want to go the other way, then the sub function works:
> nums <- format(seq(0, 1, by=.5), decimal.mark=",")
> as.numeric(sub(",", ".", nums))
[1] 0.0 0.5 1.0
Alternatively, if you have commas in your numbers that are marking orders of magnitude, then use gsub:
> as.numeric(gsub(",", "", "2,300,000"))
[1] 2300000
The difference between sub and gsub is that sub only does one substitution (per element) while gsub does all of the substitutions possible.
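Running both functions on the same string shows the difference. The last line is an extra caution that is not from the text above: the first argument is a regular expression, so a pattern of "." needs fixed = TRUE to be treated as a literal period.

```r
x <- "2,300,000"

sub(",", "", x)                  # "2300,000": only the first comma is removed
gsub(",", "", x)                 # "2300000": every comma is removed
as.numeric(gsub(",", "", x))     # 2300000

# Replacing a period with a comma: "." is a regular expression wildcard
# unless fixed = TRUE is given.
sub(".", ",", "0.5", fixed = TRUE)   # "0,5"
```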
R has plus and minus infinity as numbers. We can see once again that floating point numbers are substantially different from the real numbers in our heads:
> 1e308
[1] 1e+308
> 1e308 * 10
[1] Inf
Infinity starts just after the largest possible floating point number.
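R reports the largest finite floating point number in .Machine, so the boundary can be probed directly. This is a small sketch; the name big is just for illustration.

```r
big <- .Machine$double.xmax   # largest finite double, roughly 1.8e308

is.finite(big)                # TRUE
big * 2                       # Inf
is.infinite(big * 2)          # TRUE
-big * 2                      # -Inf

# Infinity behaves like a number in arithmetic.
1 / Inf                       # 0
```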
Another special value is NaN, which means Not-a-Number. This is created from operations that have no definite mathematical limit. Here are two examples:
> 0/0
[1] NaN
> Inf - Inf
[1] NaN
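Since NaN is not equal to anything, including itself, testing for it needs a dedicated function. The sketch below uses is.nan and is.na from base R.

```r
x <- 0/0

is.nan(x)     # TRUE
x == x        # NA: comparing with NaN gives neither TRUE nor FALSE
is.na(x)      # TRUE: is.na also catches NaN
is.nan(NA)    # FALSE: but a missing value is not NaN
```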
Two Kinds Revisited
Remember there are two types of numbers in R — floating point and integer. Let’s experiment some:
> mode(1)
[1] "numeric"
> typeof(1)
[1] "double"
> mode(1:2)
[1] "numeric"
> typeof(1:2)
[1] "integer"
Almost always numbers are floating point. In particular, they are double precision floating point numbers.
The colon operator used with logically integer values is one of the few places where the result is integer by default.
is.integer tests the storage type of the object; it does not test whether the values are logically integer:
> is.integer(1)
[1] FALSE
> is.integer(1:2)
[1] TRUE
> is.integer(as.integer(1))
[1] TRUE
> is.integer(as.double(1:2))
[1] FALSE
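If what you actually want is to test whether values are logically integer, one possibility is a small helper that compares each value with its rounded version. The function is_whole below is hypothetical (it is not part of R), and its tolerance is an assumption chosen to allow for a little numerical error.

```r
# Hypothetical helper: TRUE where a value is a whole number, give or take
# a small tolerance for numerical error.
is_whole <- function(x, tol = sqrt(.Machine$double.eps)) {
  abs(x - round(x)) < tol
}

is.integer(2)    # FALSE: stored as floating point
is_whole(2)      # TRUE:  but the value is a whole number
is_whole(2.5)    # FALSE
is_whole(1:3)    # TRUE TRUE TRUE
```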
We just saw that 1 is interpreted as a floating point number rather than an integer. Put capital “L” (as in “long”) after a number to make R create it as an integer.
> is.integer(1L)
[1] TRUE
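A few more experiments with typeof show how R moves between the two types during arithmetic. This is a sketch of typical cases, not a complete list of the rules.

```r
typeof(1L + 2L)      # "integer": integer arithmetic stays integer
typeof(1L + 2)       # "double":  mixing in a floating point number converts
typeof(4L / 2L)      # "double":  ordinary division always gives floating point
typeof(5L %/% 2L)    # "integer": integer division of integers stays integer
typeof(length(1:3))  # "integer": some functions return integers
```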
Circle 1 of The R Inferno contains more on numerical error.