02 Jan

More R Numbers

This page has the following sections:

Scientific Notation
Two Kinds of Numbers
Numerical Error
Printing
Commas
Special Values
Two Kinds Revisited
Resources

Scientific Notation

A number like 2.3 will be familiar. A number like 2.3e6 may be confusing to the uninitiated.

This latter number has three parts:

  • a value
  • the letter “e”
  • the number of places to shift the decimal point.

So 2.3e6 is the same as 2300000 (that is, 2.3 million), and 2.3e-4 is the same as .00023.

Some languages allow the “e” to be a “d”. R does not.
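You can confirm this reading of e-notation at the R prompt. The sketch below uses `format()` with its `scientific` argument, which is one convenient way (of several) to switch between the two notations.

```r
# 2.3e6 means 2.3 times 10 to the 6th: the decimal point moves 6 places right
big <- 2.3e6
print(big == 2300000)                      # TRUE

# A negative exponent moves the decimal point to the left
small <- 2.3e-4
print(format(small, scientific = FALSE))   # "0.00023"

# format() can also force scientific notation
print(format(2300000, scientific = TRUE))  # "2.3e+06"

# An upper-case "E" is accepted as well
print(2.3E6 == big)                        # TRUE
```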

Two Kinds of Numbers

Leaving aside complex numbers, R has two types of number:

  • integer
  • floating point

Most of the time you don’t care.

In almost all cases, R automatically moves between the two types as the situation demands. The type of a number only becomes important when your expectations are violated. When that happens, it is almost surely your expectations, and not R, that need to change; we’ll get to that shortly.

An integer in R is pretty much like the integers in your head — whole numbers that can be positive or negative. A difference is that integers in R only go to about plus or minus two billion.
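The exact limit is stored in `.Machine$integer.max`. The sketch below shows what happens at the boundary. Integer arithmetic that goes past the limit gives NA (with a warning) rather than a larger number, while the same sum done in floating point works fine.

```r
# The largest integer R can store (2^31 - 1)
x <- .Machine$integer.max
print(x)                 # 2147483647

# Going one past it overflows: NA, with an integer-overflow warning
y <- x + 1L
print(y)                 # NA

# Adding a double (1 rather than 1L) works, because doubles reach much higher
print(x + 1)             # 2147483648
```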

A floating point number is sort of like the real numbers in your head. Here’s the rub: floating point numbers are significantly different from the real numbers in your head.

Floating point numbers are in scientific notation — they have a value part and a part that says how far to move the “decimal” point. The “decimal” is in quotes because it really should be “binary point”. Computers think in binary.

A floating point number has a value written in binary, and an indication of how many binary places to move over (by the way, that’s where “floating” comes in).

There are two important things to remember about floating point numbers:

  • they can represent only a limited set of real numbers.
  • the numbers they do represent are binary fractions, not decimal ones.

Let’s look at some examples. Floating point numbers can exactly represent one-half and one-quarter. They cannot exactly represent one-third or one-tenth, because neither of those has a binary expansion that terminates.
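The sketch below illustrates the difference. Sums built from halves and quarters come out exact, while the classic 0.1 + 0.2 does not. Asking `sprintf()` for 20 significant digits shows what is actually stored. The digit string for 0.1 comes from the platform's C library, so it is not written out here.

```r
# One-half and one-quarter are exact in binary, so this sum is exact
print(0.5 + 0.25 == 0.75)       # TRUE

# One-tenth and two-tenths are not exact, so a small error appears
print(0.1 + 0.2 == 0.3)         # FALSE

# Twenty significant digits expose the value actually stored for 0.1 ...
print(sprintf("%.20g", 0.1))
# ... while 0.5 has nothing hidden
print(sprintf("%.20g", 0.5))    # "0.5"
```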

Numerical Error

Table 1

Item    Fraction   Percent
A       2/7        29
B       2/7        29
C       3/7        43
total   1          101

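Table 1 exhibits numerical error in the familiar world. The true fractions add up to exactly 1, that is, 100%. Once each one is rounded to a whole percentage, however, the total comes to 101%, and that is as close as we can come when we use only whole percentages. One-seventh is not a whole number of percent (it is about 14.29%).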
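You can reproduce the same rounding in R. This sketch rounds each fraction in Table 1 to a whole percentage and then adds up the results.

```r
# The fractions from Table 1 (they sum to 1 in exact arithmetic)
fractions <- c(A = 2/7, B = 2/7, C = 3/7)

# Round each to a whole percentage, as in the table
percent <- round(100 * fractions)
print(percent)        # A 29, B 29, C 43
print(sum(percent))   # 101
```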

There is error. But not error in the sense of a mistake. There is error in the sense of inaccuracy.

A common message to R-help — probably the most common — goes something like:

R is wrong because

> seq(0, 1, by=.1)[4] == .3
[1] FALSE

To expect that result to be TRUE is absolutely the same thing as always expecting the total percentage in cases like Table 1 to be 100.

Do not expect that.

What you should expect is numerical error whenever you perform operations on floating point numbers.
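One common way to live with this is to compare with a tolerance instead of with `==`, as sketched below. By default, the base function `all.equal()` uses a relative tolerance of about 1.5e-8. The absolute tolerance of 1e-9 in the last line is an arbitrary value chosen for illustration.

```r
x <- seq(0, 1, by = 0.1)[4]

# An exact comparison fails because of numerical error
print(x == 0.3)                     # FALSE

# all.equal() compares up to a tolerance. Wrap it in isTRUE() because,
# on a mismatch, it returns a character description rather than FALSE
print(isTRUE(all.equal(x, 0.3)))    # TRUE

# An explicit absolute tolerance (1e-9 is an arbitrary choice here)
print(abs(x - 0.3) < 1e-9)          # TRUE
```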

Printing

Part of the problem with the sequence example is that R is too nice. R prints the expected result:

> seq(0, 1, by=.1)[4]
[1] 0.3

By default, R prints only 7 significant digits (set by the digits option), which is fewer than it has available. This makes a lot of numerical error invisible. The advantage is simpler-looking output. The disadvantage is that you tend to forget about numerical error.

We can look at what R really thinks the answer is by printing all of the digits that it has. On one system this is:

> print(seq(0, 1, by=.1)[4], digits=20)
[1] 0.30000000000000004

On another it is:

> print(seq(0, 1, by=.1)[4], digits=20)
[1] 0.3

In the second case you are probably expecting the equality test to be TRUE. Do not expect that. Remember that we are translating from binary to decimal. There is more than one binary number that translates to 0.3.

We can see how far apart the numbers are by subtracting:

> seq(0, 1, by=.1)[4] - 0.3
[1] 5.551115e-17
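Another way to see that two different doubles can both print as 0.3 is to ask for more digits. Seventeen significant digits are always enough to tell two distinct doubles apart, but fifteen are not. The sketch below uses `format()` and `sprintf()`. The 17-digit strings come from the C library, so they are not written out here.

```r
x <- seq(0, 1, by = 0.1)[4]

# At the default 7 significant digits, both values print as 0.3
print(c(x, 0.3))

# 15 significant digits still hide the difference: both are "0.3"
print(format(c(x, 0.3), digits = 15))

# 17 significant digits are enough to distinguish any two doubles
print(sprintf("%.17g", c(x, 0.3)))
```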

Commas

If you want to represent numbers using a comma to mark the decimal point, then you can use format with the decimal.mark argument:

> format(seq(0, 1, by=.5), decimal.mark=",")
[1] "0,0" "0,5" "1,0"

A more general way of getting commas as the decimal mark is to set the OutDec option:

> options(OutDec=",")
> seq(0, 1, by=.5)
[1] 0,0 0,5 1,0
> options(OutDec=".")   # restore the default

See the options help file for more information.

If you want to go the other way, then the sub function works:

> nums <- format(seq(0, 1, by=.5), decimal.mark=",")
> as.numeric(sub(",", ".", nums))
[1] 0.0 0.5 1.0

Alternatively, if your numbers contain commas as thousands separators (as in 2,300,000), use gsub to remove them:

> as.numeric(gsub(",", "", "2,300,000"))
[1] 2300000

The difference between sub and gsub is that sub only does one substitution (per element) while gsub does all of the substitutions possible.
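The sketch below shows that difference on the string above. It also shows the reverse direction: `format()` has a `big.mark` argument that inserts a thousands separator.

```r
s <- "2,300,000"

# sub() replaces only the first comma
print(sub(",", "", s))                    # "2300,000"

# gsub() replaces every comma
print(gsub(",", "", s))                   # "2300000"

# Going the other way: format() can insert thousands separators
print(format(2300000, big.mark = ","))    # "2,300,000"
```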

Special Values

R has plus and minus infinity as numbers. We can see once again that floating point numbers are substantially different from the real numbers in our heads:

> 1e308
[1] 1e+308
> 1e308 * 10
[1] Inf

Infinity starts just past the largest finite floating point number, which is about 1.8e308. Any result bigger than that overflows to Inf.
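The largest finite double is available as `.Machine$double.xmax`. The sketch below shows it overflowing to Inf, along with a couple of ways that infinities behave in ordinary arithmetic.

```r
# The largest finite double
big <- .Machine$double.xmax
print(big)                  # 1.797693e+308

# Doubling it overflows to Inf
print(big * 2)              # Inf
print(is.finite(big * 2))   # FALSE

# Infinities behave like limits in ordinary arithmetic
print(1 / Inf)              # 0
print(-1 / 0)               # -Inf
```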

Another special value is NaN which means Not-a-Number. This is created from operations that have no definite mathematical limit. Here are two examples:

> 0/0
[1] NaN
> Inf - Inf
[1] NaN
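NaN counts as missing for `is.na()`, but only `is.nan()` singles it out. Any comparison with NaN gives NA, so testing for it with `==` does not work. A short sketch:

```r
z <- 0/0

# NaN is a missing value as far as is.na() is concerned ...
print(is.na(z))      # TRUE
# ... but is.nan() distinguishes it from an ordinary NA
print(is.nan(z))     # TRUE
print(is.nan(NA))    # FALSE

# Comparisons with NaN give NA, not TRUE or FALSE
print(z == z)        # NA
```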

Two Kinds Revisited

Remember there are two types of numbers in R — floating point and integer. Let’s experiment some:

> mode(1)
[1] "numeric"
> typeof(1)
[1] "double"
> mode(1:2)
[1] "numeric"
> typeof(1:2)
[1] "integer"

Almost always numbers are floating point. In particular, they are double precision floating point numbers.

The colon operator used with logically integer values is one of the few places where the result is integer by default.
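A few quick checks with `typeof()` show when the colon operator produces integers. They also show how R quietly moves from integer to double when an operation mixes the two types.

```r
# The colon operator gives integers when the start is a whole number
print(typeof(1:3))          # "integer"
# ... but doubles when it is not
print(typeof(1.5:3))        # "double"

# c() of plain numbers gives doubles
print(typeof(c(1, 2, 3)))   # "double"

# Mixing types: R moves to double when a double is involved
print(typeof(1:3 + 1L))     # "integer"
print(typeof(1:3 + 1))      # "double"
```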

Remember that is.integer tests the storage type of the object; it does not test whether the values are logically integers:

> is.integer(1)
[1] FALSE
> is.integer(1:2)
[1] TRUE
> is.integer(as.integer(1))
[1] TRUE
> is.integer(as.double(1:2))
[1] FALSE
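To ask whether values are logically whole numbers regardless of their storage type, one option is to compare each value with its rounded version while allowing a small tolerance for numerical error. The name `is_whole` and its default tolerance are hypothetical choices of ours, similar in spirit to an example on the is.integer help page. This is not a base R function.

```r
# Hypothetical helper: TRUE where x is within tol of a whole number
is_whole <- function(x, tol = sqrt(.Machine$double.eps)) {
  abs(x - round(x)) < tol
}

print(is_whole(1))                # TRUE, although is.integer(1) is FALSE
print(is_whole(as.double(1:2)))   # TRUE TRUE
print(is_whole(2.5))              # FALSE

# (0.1 + 0.2) * 10 is not exactly 3, but it is whole up to the tolerance
x <- (0.1 + 0.2) * 10
print(x == 3)                     # FALSE
print(is_whole(x))                # TRUE
```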

We just saw that 1 is interpreted as a floating point number rather than an integer. Put capital “L” (as in “long”) after a number to make R create it as an integer.

> is.integer(1L)
[1] TRUE
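Integer literals made with L usually stay integer under arithmetic, but not always. In this sketch, integer division with `%/%` keeps the integer type, while ordinary division with `/` always returns a double.

```r
# The L suffix makes integer literals
print(typeof(1L))           # "integer"

# Integer arithmetic mostly stays integer ...
print(typeof(2L + 3L))      # "integer"
print(typeof(7L %/% 2L))    # "integer"

# ... but "/" always returns a double
print(7L / 2L)              # 3.5
print(typeof(7L / 2L))      # "double"
```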

Resources

Circle 1 of The R Inferno contains more on numerical error.

© Copyright - Burns Statistics