similar unanswered question here: merge 2 dataframes condition on timestamp question more general , not time-series specific
i have 2 dataframes want merge proximity condition on 2 columns both dataframes. in sql have done like
select * (select * a) t1 inner join (select * b) t2 on t1.user = t2.user , t1.label - t2.label < 2
what i'm looking canincal way above in r, like-
merge(x,y,by='user', condition = x$label - y$label <=2 )
so following should without rows 3,7,11,12,13,15 etc....
set.seed(1212) <- data.frame(user=rep(paste("u",1:3,sep=''),4),label=sample.int(10,12,t)) b <- data.frame(user=rep(paste("u",1:3,sep=''),4),label=sample.int(10,12,t)) merge(a,b,by='user') user label.x label.y 1 u1 3 1 2 u1 3 3 3 u1 3 10 4 u1 3 5 5 u1 3 1 6 u1 3 3 7 u1 3 10 8 u1 3 5 9 u1 1 1 10 u1 1 3 11 u1 1 10 12 u1 1 5 13 u1 4 1 14 u1 4 3 15 u1 4 10 16 u1 4 5 17 u2 7 1 18 u2 7 7 19 u2 7 4 20 u2 7 2 21 u2 2 1 22 u2 2 7 23 u2 2 4 24 u2 2 2 25 u2 6 1 26 u2 6 7 27 u2 6 4 28 u2 6 2 29 u2 1 1 30 u2 1 7 31 u2 1 4 32 u2 1 2 33 u3 8 7 34 u3 8 1 35 u3 8 7 36 u3 8 4 37 u3 1 7 38 u3 1 1 39 u3 1 7 40 u3 1 4 41 u3 10 7 42 u3 10 1 43 u3 10 7 44 u3 10 4 45 u3 9 7 46 u3 9 1 47 u3 9 7 48 u3 9 4
with base r, try:
subset(merge(a, b, = 'user'), label.x - label.y <= 2)
if data huge, can try non-equi join data.table
:
library(data.table) setdt(a) setdt(b) a[, label.a := label - 2] b[, label.b := label] y <- a[b, on = .(user, label.a <= label.b), allow.cartesian=true][, label.a := null]
> head(y) user label i.label 1: u1 1 1 2: u1 3 1 3: u1 3 1 4: u2 1 7 5: u2 2 7 6: u2 6 7
here, label
in y
label
a
, , label.i
label
b
.
or more explictly, use j
expression:
library(data.table) setdt(a) setdt(b) a[, label2 := label - 2] y <- a[ b, .(user, label.a = label, label.b = i.label), on = .(user, label2 <= label), allow.cartesian=true]
Comments
Post a Comment