02 Jan

# More R Numbers

## Scientific Notation

A number like `2.3` will be familiar. A number like `2.3e6` may be confusing to the uninitiated.

This latter number has three parts:

• a value
• the letter “e”
• the number of places to shift the decimal point.

So `2.3e6` is the same as `2300000` (that is, 2.3 million), and `2.3e-4` is the same as `.00023`.

Some languages allow the “e” to be a “d”. R does not.
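A quick check at the R prompt. This is only a sketch. `format()` with `scientific = TRUE` is used to show R writing a number in e-notation, and `parse()` is used to show that a `d` exponent is rejected.

```r
# "e" notation and ordinary notation give the same number
2.3e6 == 2300000     # TRUE
2.3e-4 == 0.00023    # TRUE

# Ask R to write a number in scientific notation
format(2300000, scientific = TRUE)   # "2.3e+06"

# A "d" exponent (as in Fortran) is not valid R syntax
d_ok <- !inherits(try(parse(text = "2.3d6"), silent = TRUE), "try-error")
d_ok                 # FALSE
```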

## Two Kinds of Numbers

R has two types of number:

• integer
• floating point

Most of the time you don’t care.

R will automatically move between the two types as the situation demands in almost all cases. Types of numbers only become important at the point that your expectations are violated. It is almost surely your expectations, not R, that need to change; we'll get to that shortly.
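A small illustration of R moving between the two types on its own. `typeof()` reports the storage type; these few cases are illustrative, not a complete statement of the coercion rules.

```r
typeof(2L * 3L)   # "integer": integer times integer stays integer
typeof(2L / 3L)   # "double": division returns a double
typeof(2L + 0.5)  # "double": mixing with a double promotes to double

x <- 1:3          # an integer vector
x[2] <- 2.5       # assigning a double converts the whole vector
typeof(x)         # "double"
x                 # 1.0 2.5 3.0
```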

An integer in R is pretty much like the integers in your head: whole numbers that can be positive or negative. One difference is that integers in R only go to plus or minus 2,147,483,647 (about two billion).
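The exact limit is stored in `.Machine$integer.max`. In this sketch, integer arithmetic that goes past it gives `NA` with a warning, while the same sum done in floating point is fine.

```r
.Machine$integer.max              # 2147483647

# Integer overflow: R returns NA and warns
big <- .Machine$integer.max
over <- suppressWarnings(big + 1L)
over                              # NA

# Adding a double (1 rather than 1L) switches to floating point
big + 1                           # 2147483648
```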

A floating point number is sort of like the real numbers in your head. Here's the rub: floating point numbers are significantly different from the real numbers in your head.

Floating point numbers are in scientific notation — they have a value part and a part that says how far to move the “decimal” point. The “decimal” is in quotes because it really should be “binary point”. Computers think in binary.

A floating point number has a value written in binary, and an indication of how many binary places to move over (by the way, that’s where “floating” comes in).
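One way to peek at the binary form is the `%a` format of `sprintf()`, which writes a double as a hexadecimal value (four binary digits per hex character) followed by `p` and a power of two. The exact strings come from the platform's C library, so treat the outputs in the comments as typical of a Linux build rather than guaranteed.

```r
# Value in hexadecimal, then "p" and the binary exponent
sprintf("%a", 0.5)    # typically "0x1p-1": 1 moved one binary place
sprintf("%a", 0.1)    # typically "0x1.999999999999ap-4": a repeating
                      # binary fraction, cut off and rounded

# Number of binary digits in the value part of a double
.Machine$double.digits  # 53
```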

There are two important things to remember about floating point numbers:

• they can represent only a finite set of real numbers.
• the numbers they do represent are binary.

Let’s look at some examples. Floating point numbers can exactly represent one-half and one-quarter. Floating point numbers cannot exactly represent one-third or one-tenth.
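Equality tests make the difference visible. Sums of halves and quarters come out exact, while sums of tenths do not; `sprintf()` with many decimal places reveals the value actually stored for 0.1 (the last comment assumes a C library that prints exact decimal expansions, as glibc does).

```r
0.5 + 0.25 == 0.75      # TRUE: all three are exact in binary
0.1 + 0.2 == 0.3        # FALSE: none of the three is exact
sprintf("%.20f", 0.5)   # "0.50000000000000000000"
sprintf("%.20f", 0.1)   # "0.10000000000000000555"
```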

## Numerical Error

Table 1

| Item  | Fraction | Percent |
|-------|----------|---------|
| A     | 2/7      | 29      |
| B     | 2/7      | 29      |
| C     | 3/7      | 43      |
| total | 1        | 101     |

Table 1 exhibits numerical error in the familiar world. The percentages don’t really add up to 101%, but that is as close as we can come when we are only using whole percentages. One-seventh is not a whole percentage (it is about 14.29%).
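The same effect can be reproduced in R: round each share in Table 1 to a whole percentage and the total drifts to 101.

```r
fractions <- c(A = 2, B = 2, C = 3) / 7
pct <- round(100 * fractions)
pct        # A 29, B 29, C 43
sum(pct)   # 101, not 100
```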

There is error. But not error in the sense of a mistake. There is error in the sense of inaccuracy.

A common message to R-help — probably the most common — goes something like:

R is wrong because

```
> seq(0, 1, by=.1)[4] == .3
[1] FALSE
```

To expect that result to be `TRUE` is absolutely the same thing as always expecting the total percentage in cases like Table 1 to be 100.

Do not expect that.

What you should expect is that you get numerical error whenever there are operations with floating point numbers.
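Because some error is to be expected, computed floating point values are usually better compared with a tolerance. One base R option is `all.equal()`, wrapped in `isTRUE()` because `all.equal()` returns a character description rather than `FALSE` when the values differ. Its default tolerance (about 1.5e-8) and the hand-picked `1e-9` below are assumptions that may not suit every problem.

```r
x <- seq(0, 1, by = .1)[4]

x == .3                     # FALSE: exact comparison
isTRUE(all.equal(x, .3))    # TRUE: equal within the default tolerance

# The same idea with an explicit tolerance
abs(x - .3) < 1e-9          # TRUE
```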

## Printing

Part of the problem with the sequence example is that R is too nice. R prints the expected result:

```
> seq(0, 1, by=.1)[4]
[1] 0.3
```

By default R prints only 7 significant digits (set by the `digits` option), not every digit it has available. This makes a lot of numerical error invisible. The advantage is simpler-looking objects; the disadvantage is that you tend to forget about numerical error.

We can look at what R really thinks the answer is by printing all of the digits that it has. On one system this is:

```
> print(seq(0, 1, by=.1)[4], digits=20)
[1] 0.30000000000000004
```

On another it is:

```
> print(seq(0, 1, by=.1)[4], digits=20)
[1] 0.3
```

In the second case you are probably expecting the equality test to be `TRUE`. Do not expect that. Remember that we are translating from binary to decimal. There is more than one binary number that translates to 0.3.
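Printing both numbers at 15 and then 17 significant digits with `sprintf()` makes this concrete: at 15 digits the two look identical, at 17 they differ. Seventeen significant digits are always enough to tell two doubles apart. The comments assume a correctly rounding C library.

```r
a <- seq(0, 1, by = .1)[4]   # computed as 0 + 3 * 0.1
b <- 0.3                     # the double nearest to 0.3

sprintf("%.15g", c(a, b))    # "0.3" "0.3"
sprintf("%.17g", c(a, b))    # "0.30000000000000004" "0.29999999999999999"
a == b                       # FALSE
```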

We can see how far apart the numbers are by subtracting:

```
> seq(0, 1, by=.1)[4] - 0.3
[1] 5.551115e-17
```
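The gap is not arbitrary. Doubles between 0.25 and 0.5 are spaced 2^-54 apart, so the difference is exactly one step: the computed value is the next double above the one nearest to 0.3. A sketch that checks this against `.Machine$double.eps` (2^-52, the spacing just above 1):

```r
d <- seq(0, 1, by = .1)[4] - 0.3
d == 2^-54                      # TRUE
d == .Machine$double.eps / 4    # TRUE: numbers in [0.25, 0.5) are
                                # four times closer together than near 1
```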

## Commas

If you want to represent numbers using a comma to mark the decimal point, then you can use `format` with the `decimal.mark` argument:

```
> format(seq(0, 1, by=.5), decimal.mark=",")
[1] "0,0" "0,5" "1,0"
```

A more general way of getting commas as the decimal mark is to set the `OutDec` option:

```
> options(OutDec=",")
> seq(0, 1, by=.5)
[1] 0,0 0,5 1,0
```

See the `options` help file for more information.
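Because `OutDec` is a session-wide setting, it can be worth restoring the old value afterwards. `options()` returns the previous settings, so one common pattern is sketched below (the last comment assumes the decimal mark was "." to begin with).

```r
old <- options(OutDec = ",")   # set, and remember the previous value
seq(0, 1, by = .5)             # printed as 0,0 0,5 1,0
options(old)                   # restore the previous decimal mark
getOption("OutDec")            # "."
```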

To go the other way, from text with comma decimal marks back to numbers, the `sub` function works:

```
> nums <- format(seq(0, 1, by=.5), decimal.mark=",")
> as.numeric(sub(",", ".", nums))
[1] 0.0 0.5 1.0
```

Alternatively, if your numbers contain commas that separate groups of thousands, use `gsub` to remove them:

```
> as.numeric(gsub(",", "", "2,300,000"))
[1] 2300000
```

The difference between `sub` and `gsub` is that `sub` only does one substitution (per element) while `gsub` does all of the substitutions possible.
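Side by side on the same strings, the difference shows up as soon as there is more than one comma:

```r
x <- "2,300,000"
sub(",", "", x)                # "2300,000": only the first comma removed
gsub(",", "", x)               # "2300000": every comma removed
as.numeric(sub(",", "", x))    # NA, with a warning

# "Per element": each string gets its own first substitution
sub(",", "", c("1,000", "2,000,000"))   # "1000" "2000,000"
```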

## Special Values

R has plus and minus infinity as numbers. We can see once again that floating point numbers are substantially different from the real numbers in our heads:

```
> 1e308
[1] 1e+308
> 1e308 * 10
[1] Inf
```

Infinity starts just past the largest finite floating point number, which is about 1.8e308. A result larger than that overflows to `Inf`.
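The largest finite double is stored in `.Machine$double.xmax`. A sketch of what happens around it, plus two other common ways to reach infinity:

```r
.Machine$double.xmax                # 1.797693e+308
.Machine$double.xmax * 2            # Inf: the result overflows
-.Machine$double.xmax * 2           # -Inf
1 / 0                               # Inf
-1 / 0                              # -Inf
is.infinite(c(1e308, 1e308 * 10))   # FALSE TRUE
```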

Another special value is `NaN` which means Not-a-Number. This is created from operations that have no definite mathematical limit. Here are two examples:

```
> 0/0
[1] NaN
> Inf - Inf
[1] NaN
```
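`NaN` does not compare equal to anything, itself included, so test for it with `is.nan()`. Note that `is.na()` is also `TRUE` for `NaN`, while `is.nan()` is `FALSE` for a plain `NA`.

```r
x <- c(0/0, Inf - Inf, NA, 1)
x == NaN        # NA NA NA NA: comparing with NaN tells you nothing
is.nan(x)       # TRUE TRUE FALSE FALSE
is.na(x)        # TRUE TRUE TRUE FALSE
```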

## Two Kinds Revisited

Remember there are two types of numbers in R — floating point and integer. Let’s experiment some:

```
> mode(1)
[1] "numeric"
> typeof(1)
[1] "double"
> mode(1:2)
[1] "numeric"
> typeof(1:2)
[1] "integer"
```

Numbers in R are almost always floating point. In particular, they are double precision floating point numbers.

The colon operator used with logically integer values is one of the few places where the result is integer by default.
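A quick check with `typeof()`. `seq_len()` is included as one example of another base function that returns integers, not as a complete list.

```r
typeof(1:3)          # "integer": whole-number endpoints
typeof(1.5:3)        # "double": the start is not a whole number
typeof(c(1, 2, 3))   # "double": typed-in numbers are double
typeof(seq_len(3))   # "integer"
```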

Remember that `is.integer` tests the storage type of the object; it does not test whether the values are logically integer:

```
> is.integer(1)
[1] FALSE
> is.integer(1:2)
[1] TRUE
> is.integer(as.integer(1))
[1] TRUE
> is.integer(as.double(1:2))
[1] FALSE
```
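To ask whether values are logically whole numbers, compare them with their rounded versions. The helper below follows the `is.wholenumber` example in the `is.integer` help page; the tolerance lets values that are whole up to numerical error pass.

```r
is.wholenumber <- function(x, tol = .Machine$double.eps^0.5) {
  # TRUE where x is within tol of a whole number, whatever its storage type
  abs(x - round(x)) < tol
}

is.integer(1)                   # FALSE: stored as double
is.wholenumber(1)               # TRUE: logically a whole number
is.wholenumber(c(1, 1.5, 2L))   # TRUE FALSE TRUE
sqrt(2)^2 == 2                  # FALSE: numerical error
is.wholenumber(sqrt(2)^2)       # TRUE: within the tolerance
```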

We just saw that `1` is interpreted as a floating point number rather than an integer. Put capital “L” (as in “long”) after a number to make R create it as an integer.

```
> is.integer(1L)
[1] TRUE
```
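The suffix changes only the storage type, not the value. A sketch comparing `1L` with `1`:

```r
typeof(1L)         # "integer"
typeof(1)          # "double"
1L == 1            # TRUE: the values are equal
identical(1L, 1)   # FALSE: identical() also compares the storage type
```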

## Resources

Circle 1 of The R Inferno contains more on numerical error.