An xts R Inferno-ism
Another trap for all ye entering here.
Issue
When subscripting an xts object, columns that don’t exist in the object are silently ignored.
Example
First, create an xts object:
library(xts)
xtx <- xts(cbind(a=1:4, b=11:14, c=21:24), order.by=Sys.Date() + 1:4)
which looks like:
> xtx
           a  b  c
2014-02-07 1 11 21
2014-02-08 2 12 22
2014-02-09 3 13 23
2014-02-10 4 14 24
Using this, we see that there is no error and no warning when asking for a non-existent column:
> xtx[, c("b", "notHere", "a")]
            b a
2014-02-07 11 1
2014-02-08 12 2
2014-02-09 13 3
2014-02-10 14 4
In contrast, an error is thrown when this is done with a matrix:
> as.matrix(xtx)[, c("b", "notHere", "a")]
Error in as.matrix(xtx)[, c("b", "notHere", "a")] : subscript out of bounds
Big danger
I discovered this when testing some new code. I observed non-missing values in a column that should have had all missing values. It would have been easy to miss this if that hadn’t been the case.
Consider:
newmat <- array(NA, c(4, 4), list(NULL, c("b", "notHere", "a", "Z")))
newmat[] <- xtx[, c("b", "notHere", "a", "Z")]
And the result is:
> newmat
      b notHere  a Z
[1,] 11       1 11 1
[2,] 12       2 12 2
[3,] 13       3 13 3
[4,] 14       4 14 4
No error, no warning (in this case), but wrong.
The very worst possibility in software is getting wrong results with no indication of a problem (Chapter 50 of Tao Te Programming).
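One way to sidestep the recycling trap above is to assign by column name rather than by position. The following is a minimal sketch, not code from xts: it uses a plain matrix `src` as a stand-in for the data in `xtx` (an assumption, so that the example runs without any packages) and fills only the columns that actually exist in the source.

```r
# Stand-in for the data in xtx (a plain matrix, so no packages are needed)
src <- cbind(a = 1:4, b = 11:14, c = 21:24)

wanted <- c("b", "notHere", "a", "Z")
newmat <- array(NA, c(4, 4), list(NULL, wanted))

# Fill only the columns present in both objects; the rest stay NA
common <- intersect(wanted, colnames(src))
newmat[, common] <- src[, common]

# Columns that were requested but not found
absent <- setdiff(wanted, colnames(src))
```

Here the "notHere" and "Z" columns remain entirely missing, and `absent` records which names had no match. With an xts source, something like `coredata(xtx[, common])` would presumably play the role of `src[, common]`.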
Desiderata
I’d like to lobby for a warning if columns are requested that don’t exist.
There’s the question of whether an error should be thrown or not. At this point throwing an error would break backward compatibility (Chapter 72 of Tao Te Programming). I doubt that there’s much code that depends on the current behavior, but presumably there was some reason not to throw an error.
A compromise would be to add an argument to subscripting that would force an error. For example:
xtx[, c("b", "notHere", "a"), strict=TRUE]
The default value for strict would be FALSE in order to preserve the current behavior. However, there is an argument for making the default TRUE: this forces people to truly decide that the current behavior is what they want.
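In the meantime, the effect of such an argument can be approximated with a small wrapper. This is only a sketch: `select_cols` is a hypothetical helper, not part of xts, and it is shown on a plain matrix so that it runs without any packages. It should behave the same way on any object that has `colnames` and a two-index `[` method.

```r
# Hypothetical helper (not part of xts): column subsetting that complains
# about names that are not present.
select_cols <- function(x, j, strict = TRUE) {
  absent <- setdiff(j, colnames(x))
  if (length(absent) > 0) {
    msg <- paste("columns not found:", paste(absent, collapse = ", "))
    if (strict) stop(msg) else warning(msg)
  }
  # Keep only the names that exist, in the order requested
  x[, intersect(j, colnames(x)), drop = FALSE]
}

m <- cbind(a = 1:4, b = 11:14, c = 21:24)
ok <- select_cols(m, c("b", "a"))   # both columns exist
res <- tryCatch(select_cols(m, c("b", "notHere", "a")),
                error = function(e) conditionMessage(e))
```

With `strict = FALSE` the helper returns the matching columns and issues a warning instead of stopping, which mirrors the warning-only variant lobbied for above.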
See also
There are some related items in The R Inferno.
The closest in spirit is probably Circle 8.2.13.
The most troublesome is Circle 8.1.44: not saying drop=FALSE in subscripting a matrix when it is appropriate.
Circle 8.1.46 is also related, but my favorite in this genre is Circle 8.3.25.
Epilogue
Written by an Italian poet
From the thirteenth century
And every one of them words rang true
And glowed like burning coal
from “Tangled Up in Blue” by Bob Dylan
Looks like an easy enough fix. One would just have something like
if(!all(j %in% colnames(x))) warning("Attempting to subscript by non-existent columns")
or similar inside
if(is.character(j)) {
  j <- match(j, colnames(x), nomatch=0L)
}
Michael’s proposal, to issue a warning rather than an error, might be a good compromise. I checked: if one creates a `ts` object with the same names, errors are properly thrown. Here’s a quote from the xts vignette:
“The generic methods currently extended to xts include "[", cbind, rbind, c, str, Ops, print, na.omit, time, index, plot and coredata. In addition, most methods that can accept zoo or matrix objects will simply work as expected.”
Given that, I would definitely label the behavior you found as a bug and ask the maintainer to fix it. Commonality of method behavior is more important than back-compatibility with a mistake (IMHO)
Carl,
I favor an error as well, but don’t feel it is my decision to make. Josh pointed out that zoo throws an error.
I plan on looking into this. My guess is that it’s unintended. It happens because the heavy-lifting of xts subsetting is done in C, which needs integer vectors of row/column locations. The xts subsetting R code finds these integer locations, but doesn’t currently check that all requested columns are matched.
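The mechanism described here can be reproduced in base R. The sketch below is not the xts source; it only illustrates, on a plain matrix, how `match` with `nomatch = 0L` turns unknown names into index 0, which `[` then drops without complaint.

```r
m <- cbind(a = 1:4, b = 11:14, c = 21:24)
j <- c("b", "notHere", "a")

# Unknown names become 0, and a zero index selects nothing
idx <- match(j, colnames(m), nomatch = 0L)   # 2 0 1
dropped <- m[, idx]                          # columns b and a only

# A check of the kind suggested in the comments above
unmatched <- j[idx == 0L]
if (length(unmatched) > 0) {
  message("non-existent columns requested: ", paste(unmatched, collapse = ", "))
}
```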
This has been unferno’d in version 0.10-0, which is now on CRAN.
Thank you Josh and team.
Wow. Much inferno.
Add instance:
> require(xts)
> require(zoo)
> x z lag(x,1)
[,1]
1970-01-02 NA
1970-01-03 1
1970-01-04 2
> lag(z,1,na=T)
xx
1970-01-02 2
1970-01-03 3
1970-01-04 NA
Many definition. Such gotcha. So scare.
Here is a further example of inferno-ism, this time definition of lag:
> require(xts)
> x z lag(x,1)
[,1]
1970-01-02 NA
1970-01-03 1
1970-01-04 2
> lag(z,1,na=TRUE)
xx
1970-01-02 2
1970-01-03 3
1970-01-04 NA
> is(x,'zoo')
[1] TRUE
So in a nutshell:
There are two possible definitions of lag
It is vital to know which one is being used
xts went for the *other* definition (not the one used by zoo)
xts extends zoo, so an xts object ‘is’ a zoo, but lags the other way
This is a potential ‘gotcha’ in any application mixing the two classes
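The two conventions can be seen side by side without either package. This is a sketch in base R only: the `ts` method from stats stands in for the ts/zoo convention, and the hand-rolled `shift_back` (a hypothetical helper) mimics what the xts output above shows.

```r
x <- ts(1:3)                               # observations at times 1, 2, 3

# stats::lag moves the time base back by k, so the values appear earlier
lagged_ts <- stats::lag(x, 1)
times_ts <- as.numeric(time(lagged_ts))    # 0 1 2

# Hypothetical helper: pair each time with the previous observation
shift_back <- function(v, k = 1) c(rep(NA, k), head(v, -k))
prev_obs <- shift_back(1:3)                # NA 1 2
```

Under the ts convention the value 2 ends up at time 1, whereas `shift_back` leaves an NA in the first position, matching `lag(x,1)` on the xts object above.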
Yes, xts purposefully breaks the “convention” of ts and zoo that lag(x,1) means to “lag” the series 1 observation into the future. This is explained in ?lag.xts, which also tells you how to set a global option to make lag.xts behave like lag.ts and lag.zoo. The xts and zoo authors have discussed adding a lagts generic that would follow the convention that a lag of 1 means the series is lagged 1 observation into the past.
My code example gets munched! The following assignments are supposed to be in there. (They could be done more conventionally with the usual assignment operator… but WordPress is taking a dislike to *something* in the code example, so I’ll post this.)
assign('x',xts(1:3,as.Date(1:3)))
assign('z',zoo(as.matrix(1:3),as.Date(1:3)))