One to many, left, outer join with pandas (Python) -


i'm trying join 3 tables using python 2.7 , pandas. tables ones below:

table 1 id  |  test 1   |  ss 2   |  sb 3   |  sc  table 2 id  |  tested  |  value1  |  value2  |  id2 1   |        |  e       |  o       |  1 1   |  axe     |  ee      |  e       |  1 1   |  bce     |  io      |  p       |  3 2   |  bee     |  kd      |  …       |  2 2   |  bdd     |        |  fff     |  3 3   |  db      |  f       |  yiueie  |  2  table 3 id2  |  type 1    |  1    |  d 1    |  h 3    |  e 1    |  o 2    |  ou 2    |  oui 3    |  op 

the code i'm using below:

import pandas pd  xl = pd.excelfile(r'c:\users\joe\desktop\project1\xlfiles\test1.xlsx') xl.sheet_names df = xl.parse("sheet1") df.head()  xl2 = pd.excelfile(r'c:\users\joe\desktop\project1\xlfiles\test2.xlsx') xl2.sheet_names df2 = xl2.parse("sheet1") df2.head()  xl3 = pd.excelfile(r'c:\users\joe\desktop\project1\xlfiles\test3.xlsx') xl3.sheet_names df3 = xl3.parse("sheet1") df3.head()  df3 = df3.groupby('id2')['type'].apply(','.join).reset_index()  s1 = pd.merge(df2, df3, how='left', on=['id2']) 

the code joins table 3 table table 2 how like. but, can't figure out how group multiple columns join s1 table 1. need information every column in s1 added table 1, want 1 row each id value (3 rows total). know how this?

my expected output, reference, below:

id  |  test  |  type     |  tested     |  value1   |  id2   1   |  ss    |  i,d,h,o  |  a,axe,bce  |  e,ee,io  |  1,1,3 2   |  sb    |  ou,oui   |  bee,bdd    |  kd,a     |  2,3 3   |  sc    |  e,op     |  db         |  f        |  2 

thanks in advance help.

you can use cumcount count id2 in both df2 , df3 merge unique pairs. groupby , aggregate join.

last use join:

df2['g'] = df2.groupby('id2').cumcount() df3['g'] = df3.groupby('id2').cumcount() df23 = pd.merge(df2, df3, how='left', on=['g','id2']).astype(str).groupby('id').agg(','.join) #for same dtype match - int df23.index = df23.index.astype(int) print (df23)        tested   value1   value2    id2      g   type id                                                   1   a,axe,bce  e,ee,io    o,e,p  1,1,3  0,1,0  i,d,e 2     bee,bdd     kd,a  ...,fff    2,3    0,1  ou,op 3          db        f   yiueie      2      1    oui  df = df1.join(df23, on='id') #subset , desired order of output columns cols = ['id','test','type','tested','value1','id2'] df = df[cols] print (df)    id test   type     tested   value1    id2 0   1   ss  i,d,e  a,axe,bce  e,ee,io  1,1,3 1   2   sb  ou,op    bee,bdd     kd,a    2,3 2   3  sci    oui         db        f      2 

Comments