An xts R Inferno-ism

07 Feb 2014

Another of the all ye entering here.

Issue

When subscripting an xts object, columns that don’t exist in the object are silently ignored.

Example

First, create an xts object:

library(xts)
xtx <- xts(cbind(a=1:4, b=11:14, c=21:24), order.by=Sys.Date() + 1:4)

which looks like:

> xtx
           a  b  c
2014-02-07 1 11 21
2014-02-08 2 12 22
2014-02-09 3 13 23
2014-02-10 4 14 24

Using this, we see that there is no error and no warning when asking for a non-existent column:

> xtx[, c("b", "notHere", "a")]
            b a
2014-02-07 11 1
2014-02-08 12 2
2014-02-09 13 3
2014-02-10 14 4

In contrast, an error is thrown when this is done with a matrix:

> as.matrix(xtx)[, c("b", "notHere", "a")]
Error in as.matrix(xtx)[, c("b", "notHere", "a")] : 
  subscript out of bounds

Big danger

I discovered this when testing some new code.  I observed non-missing values in a column that should have had all missing values.  It would have been easy to miss this if that hadn’t been the case.

Consider:

newmat <- array(NA, c(4, 4), list(NULL, c("b", "notHere", "a", "Z")))
newmat[] <- xtx[, c("b", "notHere", "a", "Z")]

And the result is:

> newmat
      b notHere  a Z
[1,] 11       1 11 1
[2,] 12       2 12 2
[3,] 13       3 13 3
[4,] 14       4 14 4

No error, no warning (in this case), but wrong.
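
The mechanism behind the wrong values is ordinary recycling, and it can be reproduced without xts. The sketch below assumes that the subscript handed back only the two columns that exist (b and a), as in the earlier output; the name `returned` is made up for the illustration.

```r
# What the xts subscript effectively returned: only the columns that exist.
returned <- cbind(b = 11:14, a = 1:4)

newmat <- array(NA, c(4, 4), list(NULL, c("b", "notHere", "a", "Z")))

# 8 values are recycled into 16 slots, column by column.
# 16 is a multiple of 8, so the recycling is silent.
newmat[] <- returned

# The third and fourth columns are copies of the first and second.
identical(newmat[, "a"], newmat[, "b"])        # TRUE
identical(newmat[, "Z"], newmat[, "notHere"])  # TRUE
```

When the number of slots is not a multiple of the number of values returned, R does complain, which is why the silence is only "in this case".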

The very worst possibility in software is getting wrong results with no indication of a problem (Chapter 50 of Tao Te Programming).

Desiderata

I’d like to lobby for a warning if columns are requested that don’t exist.

There’s the question of whether an error should be thrown or not.  At this point throwing an error would break backward compatibility (Chapter 72 of Tao Te Programming).  I doubt that there’s much code that depends on the current behavior, but presumably there was some reason not to throw an error.

A compromise would be to add an argument to subscripting that would force an error.  For example:

xtx[, c("b", "notHere", "a"), strict=TRUE]

The default value for strict would be FALSE in order to preserve the current behavior.  However, there is an argument for making the default TRUE — this forces people to truly decide that the current behavior is what they want.
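
In the meantime the check can be done by hand. Here is a minimal sketch of a wrapper, under the assumption that the object has column names and accepts drop=FALSE in subscripting. The name `strictCols` and its `strict` argument are hypothetical, not anything in xts, and the example uses a plain matrix so that it runs without xts loaded.

```r
# Hypothetical helper (not part of xts): subscript columns by name,
# complaining about any requested names that are absent.
strictCols <- function(x, cols, strict = TRUE) {
  missingCols <- setdiff(cols, colnames(x))
  if (length(missingCols)) {
    msg <- paste("columns not found:", paste(missingCols, collapse = ", "))
    if (strict) stop(msg) else warning(msg)
  }
  # Keep only the names that exist, in the order requested.
  x[, intersect(cols, colnames(x)), drop = FALSE]
}

m <- cbind(a = 1:4, b = 11:14, c = 21:24)

strictCols(m, c("b", "a"))                    # all present: no complaint
try(strictCols(m, c("b", "notHere", "a")))    # strict: an error

# Not strict: a warning, and the existing columns are returned.
res <- suppressWarnings(strictCols(m, c("b", "notHere", "a"), strict = FALSE))
colnames(res)
```

Whether the helper should default to an error or a warning is the same question as above; the sketch defaults to an error.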

See also

There are some related items in The R Inferno.

The closest in spirit is probably Circle 8.2.13.

The most troublesome is Circle 8.1.44: not saying drop=FALSE in subscripting a matrix when it is appropriate.

Circle 8.1.46 is also related, but my favorite in this genre is Circle 8.3.25.

Epilogue

Written by an Italian poet
From the thirteenth century
And every one of them words rang true
And glowed like burning coal

from “Tangled Up in Blue” by Bob Dylan

10 replies
  1. Carl Witthoft says:

    Michael’s proposal (issue a warning rather than an error) might be a good compromise. I checked: if one creates a `ts` object with the same names, errors are properly thrown. Here’s a quote from the xts vignette:
    “The generic methods currently extended to xts include "[", cbind, rbind, c, str, Ops, print, na.omit, time, index, plot and coredata. In addition, most methods that can accept zoo or matrix objects will simply work as expected.”
    Given that, I would definitely label the behavior you found as a bug and ask the maintainer to fix it. Commonality of method behavior is more important than back-compatibility with a mistake (IMHO)

    • Patrick Burns says:

      Carl,

      I favor an error as well, but don’t feel it is my decision to make. Josh pointed out that zoo throws an error.

      • Joshua Ulrich says:

        I plan on looking into this. My guess is that it’s unintended. It happens because the heavy-lifting of xts subsetting is done in C, which needs integer vectors of row/column locations. The xts subsetting R code finds these integer locations, but doesn’t currently check that all requested columns are matched.

  2. Giles Heywood says:

    Wow. Much inferno.

    Add instance:

    > require(xts)
    > require(zoo)
    > x <- xts(1:3, as.Date(1:3))
    > z <- zoo(as.matrix(1:3), as.Date(1:3))
    > lag(x,1)
    [,1]
    1970-01-02 NA
    1970-01-03 1
    1970-01-04 2
    > lag(z,1,na=T)
    xx
    1970-01-02 2
    1970-01-03 3
    1970-01-04 NA

    Many definition. Such gotcha. So scare.

  3. Giles Heywood says:

    Here is a further example of inferno-ism, this time definition of lag:

    > require(xts)
    > x <- xts(1:3, as.Date(1:3))
    > z <- zoo(as.matrix(1:3), as.Date(1:3))
    > lag(x,1)
    [,1]
    1970-01-02 NA
    1970-01-03 1
    1970-01-04 2
    > lag(z,1,na=TRUE)
    xx
    1970-01-02 2
    1970-01-03 3
    1970-01-04 NA
    > is(x,'zoo')
    [1] TRUE

    So in a nutshell:
    There are two possible definitions of lag
    It is vital to know which one is being used
    xts went for the *other* definition (not the one used by zoo)
    xts extends zoo, so an xts object ‘is’ a zoo, but lags the other way
    This is a potential ‘gotcha’ in any application mixing the two classes

    • Joshua Ulrich says:

      Yes, xts purposefully breaks the “convention” of ts and zoo that lag(x,1) means to “lag” the series 1 observation into the future. This is explained in ?lag.xts, which also tells you how to set a global option to make lag.xts behave like lag.ts and lag.zoo. The xts and zoo authors have discussed adding a lagts generic that would follow the convention that a lag of 1 means the series is lagged 1 observation into the past.

  4. Giles Heywood says:

    My code example gets munched! The following assignments are supposed to be in there (they could be done more conventionally with the usual assignment operator, but WordPress is taking a dislike to *something* in the code example, so I’ll post this):

    assign('x',xts(1:3,as.Date(1:3)))
    assign('z',zoo(as.matrix(1:3),as.Date(1:3)))


© Copyright - Burns Statistics