Here's a simple data frame with a missing value:
m = data.frame(name = c('name','name'), col1 = c(NA,1), col2 = c(1,1))
When I apply aggregate to m this way:
aggregate(. ~ name, m, FUN=sum, na.rm=TRUE)
the result is:
  name col1 col2
1 name    1    1
So the entire first row is ignored. But if I do
aggregate(m[,2:3], by=list(m$name), FUN=sum, na.rm=TRUE)
the result is
  Group.1 col1 col2
1    name    1    2
So only the (1,1) entry is ignored.
This caused a major debugging headache in one of my codes, since I thought these two calls were equivalent. Is there a reason why the "formula" entry method is treated differently?
Thanks.
Good question, but in my opinion this shouldn't have caused a major debugging headache, because it is documented quite clearly in multiple places in the manual page for aggregate.
First, in the usage section:
## S3 method for class 'formula'
aggregate(formula, data, FUN, ..., subset, na.action = na.omit)
Later, in the description:
na.action: a function which indicates what should happen when the data contain NA values. The default is to ignore missing values in the given variables.
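To make the effect of that default concrete, here is a minimal sketch, assuming the data frame m from the question. It suggests that the formula method behaves as if na.omit() had been applied to the whole data frame before FUN ever runs, which would explain why na.rm=TRUE has nothing left to remove. The names m_complete, by_default and by_formula are only illustrative.

```r
# Data frame from the question
m <- data.frame(name = c("name", "name"), col1 = c(NA, 1), col2 = c(1, 1))

# na.omit() drops every row that contains at least one NA,
# so the first row disappears entirely (including its valid col2 value)
m_complete <- na.omit(m)

# Default (data frame) method applied to the already-filtered data
by_default <- aggregate(m_complete[, 2:3],
                        by = list(name = m_complete$name),
                        FUN = sum)

# Formula method with its default na.action = na.omit
by_formula <- aggregate(. ~ name, m, FUN = sum, na.rm = TRUE)

print(by_default)
print(by_formula)
```

Both calls should report the same sums, since the row containing the NA is discarded before the summing happens.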
I can't answer why the formula mode was written differently (that's something the function authors would have to answer), but using the above information, you can probably use the following:
aggregate(. ~ name, m, FUN=sum, na.rm=TRUE, na.action=NULL)
#   name col1 col2
# 1 name    1    2
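As a possible alternative that is not part of the original answer, na.action can also be set to na.pass, which hands the rows containing NA through to FUN unchanged so that na.rm=TRUE can deal with them per column. This is only a sketch, assuming the same data frame m as in the question; the name with_pass is illustrative.

```r
# Data frame from the question
m <- data.frame(name = c("name", "name"), col1 = c(NA, 1), col2 = c(1, 1))

# na.pass keeps rows with missing values in the model frame;
# sum(..., na.rm = TRUE) then skips only the individual NA entries
with_pass <- aggregate(. ~ name, m, FUN = sum, na.rm = TRUE,
                       na.action = na.pass)

print(with_pass)
```

With the NA left in place, col2 should be summed over both rows, as it is in the by=list(...) call from the question.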