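I have a DataFrame column that contains lists (stored as comma-separated strings). I want to a) find the unique values across the lists and b) build a dictionary of the form {unique_value: [index_a, index_b, ...]}, where the indices are the indexes of the DataFrame rows that contain unique_value.
I have done a), but my code for b) creates a dictionary in which every key has all the row indexes, regardless of whether the value is contained in that row or not. Can you please help?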
    import pandas as pd

    df = pd.read_excel(io = 'links.xlsx')

    unique_list = []
    for row in df['relevant_links']:
        row_list = row.split(sep = ', ')
        unique_list.extend(row_list)
    unique_set = set(unique_list)

    unique_dict = dict.fromkeys(unique_set, [])
    print(unique_dict.keys())

    row_idx = 0
    for row in df['relevant_links']:
        [unique_dict[i].append(row_idx) for i in str(row).split(', ') if i in unique_dict]
        row_idx += 1
I think you can use:
    df = pd.DataFrame({'relevant_links':['a, c, v','a, r, e','e, t','e, r']})
    print (df)
      relevant_links
    0        a, c, v
    1        a, r, e
    2           e, t
    3           e, r

    # create a Series with one link per entry; the first index level is the original row index
    s = df['relevant_links'].str.split(', ', expand=True).stack()

    # group by unique link, collect the row indices into lists, then convert to a dict
    unique_dict = s.reset_index(name='val').groupby('val')['level_0'].apply(list).to_dict()
    print (unique_dict)
    {'v': [0], 't': [2], 'r': [1, 3], 'e': [1, 2, 3], 'a': [0, 1], 'c': [0]}

    unique_set = s.unique().tolist()
    print (unique_set)
    ['a', 'c', 'v', 'r', 'e', 't']

The order of the keys in the printed dictionary may differ on your machine.
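As for why the loop in the question puts every index under every key: dict.fromkeys(unique_set, []) stores the same list object as the value for all keys, so each append shows up everywhere. Here is a minimal sketch of that effect and of one possible fix using collections.defaultdict. It is only an illustration: `rows` and `shared` are made-up names, and a plain list stands in for the 'relevant_links' column so it runs without pandas.

```python
from collections import defaultdict

# Sample data from the answer; a plain list stands in for the
# 'relevant_links' column (an assumption made for this sketch).
rows = ['a, c, v', 'a, r, e', 'e, t', 'e, r']

# The problem: dict.fromkeys reuses ONE list object for every key.
shared = dict.fromkeys(['a', 'c'], [])
shared['a'].append(0)
print(shared)  # {'a': [0], 'c': [0]}

# One possible fix: give each key its own list the first time it is seen.
unique_dict = defaultdict(list)
for row_idx, row in enumerate(rows):
    for link in str(row).split(', '):
        unique_dict[link].append(row_idx)

unique_dict = dict(unique_dict)
print(unique_dict)
# {'a': [0, 1], 'c': [0], 'v': [0], 'r': [1, 3], 'e': [1, 2, 3], 't': [2]}
```

With the real DataFrame you could loop over df['relevant_links'].items() instead of enumerate(rows), which gives the actual index labels rather than positions.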