regex - How to efficiently replace partial strings in pandas?


Objective: reformat the contents of a pandas DataFrame based on what has been provided to me.

I have the following DataFrame: [image: example dataframe]

I am looking to change each column to the following style:

[image]

I am using the following code to produce the style I need, but it is not efficient:

    lt = []
    for i in patterns['components'][0]:
        for x in i.split('__'):
            lt.append(x)
    lt[1].replace('(','').replace(', ',' < '+str(lt[0])+' ≤ ').replace(']','')

I have attempted pandas replace to no avail: it throws no errors, but it seems to ignore what I am aiming to do.
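The attempted call is not shown, so this is only one possible explanation: by default `DataFrame.replace` swaps a cell only when its entire value equals the search string, so a partial match is left alone without any error. A minimal sketch with made-up data, assuming the attempt omitted `regex=True`:

```python
import pandas as pd

# Hypothetical one-column frame, used only for illustration.
df = pd.DataFrame({"components": ["(quantity__(0.0, 16199.0])"]})

# Without regex=True, replace() compares whole cell values, so nothing changes.
unchanged = df.replace("__", " ")

# With regex=True, the pattern is applied to substrings inside each cell.
changed = df.replace("__", " ", regex=True)

print(unchanged["components"][0])
print(changed["components"][0])
```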

Source DF:

    In [37]: df
    Out[37]:
                               components                             outcome
    0          (quantity__(0.0, 16199.0])  (unitprice__(-1055.648, 3947.558])
    1  (unitprice__(-1055.648, 3947.558])          (quantity__(0.0, 16199.0])

Solution:

    In [38]: cols = ['components','outcome']
        ...: df[cols] = df[cols].replace(r'\(([^_]*)__\(([^,\s]+),\s*([^\]]+)\]\).*',
        ...:                             r'\2 < \1 <= \3',
        ...:                             regex=True)

Result:

    In [39]: df
    Out[39]:
                              components                            outcome
    0          0.0 < quantity <= 16199.0  -1055.648 < unitprice <= 3947.558
    1  -1055.648 < unitprice <= 3947.558          0.0 < quantity <= 16199.0
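To see what each part of the pattern captures, here is the same regular expression written in verbose form and applied to single strings with Python's `re` module. The names `PATTERN` and `reformat_cell` are made up for this sketch:

```python
import re

# Same pattern as above, spread over several lines with re.VERBOSE.
PATTERN = re.compile(
    r"""
    \(           # opening parenthesis of the whole cell
    ([^_]*)      # group 1: column name, e.g. quantity
    __           # double-underscore separator
    \(           # opening parenthesis of the interval
    ([^,\s]+)    # group 2: lower bound
    ,\s*         # comma followed by optional whitespace
    ([^\]]+)     # group 3: upper bound
    \]\)         # closing bracket and closing parenthesis
    .*           # anything that trails the cell
    """,
    re.VERBOSE,
)


def reformat_cell(cell: str) -> str:
    """Rewrite '(name__(low, high])' as 'low < name <= high'."""
    return PATTERN.sub(r"\2 < \1 <= \3", cell)


print(reformat_cell("(quantity__(0.0, 16199.0])"))
print(reformat_cell("(unitprice__(-1055.648, 3947.558])"))
```

`DataFrame.replace(..., regex=True)` applies the same kind of substitution to every cell of the selected columns.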

Update:

    In [113]: df
    Out[113]:
                                    components                               outcome
    0             (quantity__(0.0, 16199.0])     (unitprice__(-1055.648, 3947.558])
    1    (unitprice__(-1055.648, 3947.558])             (quantity__(0.0, 16199.0])

    In [114]: cols = ['components','outcome']

    In [115]: pat = r'\s*\(([^_]*)__\(([^,\s]+),\s*([^\]]+)\]\)\s*'

    In [116]: df[cols] = df[cols].replace(pat, r'\2 < \1 <= \3', regex=True)

    In [117]: df
    Out[117]:
                              components                            outcome
    0          0.0 < quantity <= 16199.0  -1055.648 < unitprice <= 3947.558
    1  -1055.648 < unitprice <= 3947.558          0.0 < quantity <= 16199.0

Or without parentheses:

    In [119]: df
    Out[119]:
                             components                           outcome
    0         quantity__(0.0, 16199.0])  unitprice__(-1055.648, 3947.558]
    1  unitprice__(-1055.648, 3947.558]          quantity__(0.0, 16199.0]

    In [120]: pat = r'([^_]*)__\(([^,\s]+),\s*([^\]]+)\]'

    In [121]: df[cols] = df[cols].replace(pat, r'\2 < \1 <= \3', regex=True)

    In [122]: df
    Out[122]:
                              components                            outcome
    0         0.0 < quantity <= 16199.0)  -1055.648 < unitprice <= 3947.558
    1  -1055.648 < unitprice <= 3947.558          0.0 < quantity <= 16199.0
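If some cells carry the outer parentheses and others do not, one option is to make them optional in the pattern so that a stray closing `)` is consumed as well. This variant is not part of the answer above; it is a sketch on made-up data, and `pat_optional` is a hypothetical name:

```python
import pandas as pd

# Illustrative data mixing both cell shapes, including a stray trailing ')'.
df = pd.DataFrame({
    "components": ["(quantity__(0.0, 16199.0])", "unitprice__(-1055.648, 3947.558]"],
    "outcome": ["unitprice__(-1055.648, 3947.558]", "quantity__(0.0, 16199.0])"],
})
cols = ["components", "outcome"]

# \(? and \)? make the outer parentheses optional on either side.
pat_optional = r"\(?([^_]*)__\(([^,\s]+),\s*([^\]]+)\]\)?"

out = df[cols].replace(pat_optional, r"\2 < \1 <= \3", regex=True)
print(out)
```

Swapping `<=` for `≤` in the replacement string would give the symbol used in the question's own code.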
