this question has answer here:
i'd count multiple values (contained in list per cell) on groupby object.
i have following dataframe:
| | record respondent’s sex | 7. use phone for? | |---|-----------------------------|---------------------------------------------| | 0 | male | sending texts;calls;receiving sending texts | | 1 | female | sending texts;calls;whatsapp;facebook | | 2 | male | sending texts;calls;receiving texts | | 3 | female | sending texts;calls |
i count every value in column 7. use phone for?
, after grouping on record respondent’s sex
.
i have no problem doing when there 1 value per cell.
grouped = df.groupby(['record respondent’s sex'], sort=true) question_counts = grouped['2. teacher, caregiver, or young adult ?'].value_counts(normalize=false, sort=true) question_data = [ {'2. teacher, caregiver, or young adult ?': question, 'record respondent’s sex': group, 'count': count*100} (group, question), count in dict(question_counts).items()] df_question = pd.dataframe(question_data)
gives me table looks this:
| 7. use phone for? | record respondent's sex | count | |-----------------------------------|-----------------------------|-------| | sending texts | male | 2 | | calls | male | 2 | | receiving texts | male | 2 | | sending texts | female | 2 | | calls | female | 2 | | whatsapp | female | 1 | | facebook | female | 1 |
if working multiple values!
value_counts()
doesn't work on lists multiple values, throws typeerror: unhashable type: 'list'
error. question counting occurrence of values in panda series? shows how deal in various ways, can't seem work on groupby object.
# initialize sample data. df = pd.dataframe({'record respondent’s sex': ['male', 'female'] * 2, '7. use phone for?': [ "sending texts;calls;receiving sending texts", "sending texts;calls;whatsapp;facebook", "sending texts;calls;receiving texts", "sending texts;calls" ]}) # split values on ';' , separate columns. melt result. df2 = pd.melt( pd.concat([df['record respondent’s sex'], df.loc[:, "7. use phone for?"].apply( lambda series: series.split(';')).apply(pd.series)], axis=1), id_vars='record respondent’s sex')[['record respondent’s sex', 'value']] # group on gender , rename columns. result = df2.groupby('record respondent’s sex')['value'].value_counts().reset_index() result.columns = ['record respondent’s sex', '7. use phone for?', 'count'] # reorder columns. >>> result[['7. use phone for?', 'record respondent’s sex', 'count']] 7. use phone for? record respondent’s sex count 0 calls female 2 1 sending texts female 2 2 facebook female 1 3 whatsapp female 1 4 calls male 2 5 sending texts male 2 6 receiving sending texts male 1 7 receiving texts male 1
Comments
Post a Comment