I am trying to evaluate the relevance of features, using DecisionTreeRegressor().
The related part of the code is presented below:
    # TODO: Make a copy of the DataFrame, using the 'drop' function to drop the given feature
    new_data = data.drop(['frozen'], axis=1)

    # TODO: Split the data into training and testing sets (0.25) using the given feature as the target
    # TODO: Set a random state.
    from sklearn.model_selection import train_test_split
    x_train, x_test, y_train, y_test = train_test_split(
        new_data, data['frozen'], test_size=0.25, random_state=1)

    # TODO: Create a decision tree regressor and fit it to the training set
    from sklearn.tree import DecisionTreeRegressor
    regressor = DecisionTreeRegressor(random_state=1)
    regressor.fit(x_train, y_train)

    # TODO: Report the score of the prediction using the testing set
    from sklearn.model_selection import cross_val_score
    # score = cross_val_score(regressor, x_test, y_test)
    score = regressor.score(x_test, y_test)
    print score  # Python 2.x
When I run the print statement, it outputs the following score:
-0.649574327334
You can find the score function implementation and some explanation here, and quoted below:
"Returns the coefficient of determination R^2 of the prediction. ... The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse)."
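I do not grasp the whole concept yet, so this explanation is not very helpful to me. For instance, I do not understand why the score can be negative and what exactly that indicates (if something is squared, I would expect it to be positive).
What does the score indicate, and why can it be negative?
If you know of any article (for starters), it might be helpful as well!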
You can find the rest of the code here.
You can find the dataset here.
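The article executes cross_val_score, in which DecisionTreeRegressor is used. You may take a look at the scikit-learn documentation for DecisionTreeRegressor. Basically, the score you see is R^2, or (1 - u/v). Here u is the residual sum of squares of your prediction (the sum of squared differences between the true and predicted values), and v is the total sum of squares (the sum of squared differences between the true values and their mean).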
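u/v can be arbitrarily large when you make a really bad prediction, while it can only be as small as 0, given that u and v are both sums of squares (>= 0). So R^2 = 1 - u/v is at most 1 but has no lower bound: it is negative whenever u > v, that is, whenever the model's predictions are worse than simply predicting the mean of the true values. The name R^2 is a little misleading here; the score is not the square of anything, so nothing forces it to be positive.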
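To make the 1 - u/v formula concrete, here is a minimal sketch in plain Python (no scikit-learn needed). The helper name r_squared and the toy numbers are made up for illustration; they are not from the question's dataset, and this is not scikit-learn's own implementation, only the same formula written out by hand.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination, computed as 1 - u/v.

    Hypothetical helper for illustration. It assumes y_true is not
    constant, otherwise v would be 0 and the division would fail.
    """
    mean_true = sum(y_true) / float(len(y_true))
    # u: residual sum of squares (true value minus prediction)
    u = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # v: total sum of squares (true value minus mean of true values)
    v = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - u / v


y_true = [1.0, 2.0, 3.0, 4.0]  # toy targets: mean is 2.5, so v = 5.0

# Perfect predictions: u = 0, so the score is the maximum, 1.0
perfect = r_squared(y_true, [1.0, 2.0, 3.0, 4.0])

# Always predicting the mean: u = v, so the score is 0.0
mean_only = r_squared(y_true, [2.5, 2.5, 2.5, 2.5])

# Predictions in reversed order: u = 20 > v = 5, so the score is -3.0
bad = r_squared(y_true, [4.0, 3.0, 2.0, 1.0])

print(perfect)
print(mean_only)
print(bad)
```

Read this way, a score of about -0.65 means that on the test set the tree's squared error is roughly 1.65 times the squared error you would get by always predicting the mean of the target.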