r - What are the steps to perform logistic regression? - Cross Validated


    str(newshad2)
    'data.frame':   462 obs. of  10 variables:
     $ sbp      : int  160 144 118 170 134 132 142 114 114 132 ...
     $ tobacco  : num  12 0.01 0.08 7.5 13.6 6.2 4.05 4.08 0 0 ...
     $ ldl      : num  5.73 4.41 3.48 6.41 3.5 6.47 3.38 4.59 3.83 5.8 ...
     $ adiposity: num  23.1 28.6 32.3 38 27.8 ...
     $ typea    : int  49 55 52 51 60 62 59 62 49 69 ...
     $ obesity  : num  25.3 28.9 29.1 32 26 ...
     $ alcohol  : num  97.2 2.06 3.81 24.26 57.34 ...
     $ age      : int  52 63 46 58 49 45 38 58 29 53 ...
     $ chd      : int  1 1 0 1 1 0 0 1 0 1 ...
     $ famhist  : num  2 1 2 2 2 2 1 2 2 2 ...

I want to predict heart disease (`chd`) from the other variables.

    myfit <- glm(as.factor(chd) ~ ., data = newshad2, family = binomial(link = 'logit'))
    summary(myfit)

    Call:
    glm(formula = as.factor(chd) ~ ., family = binomial(link = "logit"),
        data = newshad2)

    Deviance Residuals:
       Min      1Q  Median      3Q     Max
    -1.778  -0.821  -0.439   0.889   2.543

    Coefficients:
                 Estimate Std. Error z value   Pr(>|z|)
    (Intercept) -7.076091   1.340486   -5.28 0.00000013 ***
    sbp          0.006504   0.005730    1.14    0.25637
    tobacco      0.079376   0.026603    2.98    0.00285 **
    ldl          0.173924   0.059662    2.92    0.00355 **
    adiposity    0.018587   0.029289    0.63    0.52570
    typea        0.039595   0.012320    3.21    0.00131 **
    obesity     -0.062910   0.044248   -1.42    0.15509
    alcohol      0.000122   0.004483    0.03    0.97835
    age          0.045225   0.012130    3.73    0.00019 ***
    famhist      0.925370   0.227894    4.06 0.00004896 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    (Dispersion parameter for binomial family taken to be 1)

        Null deviance: 596.11  on 461  degrees of freedom
    Residual deviance: 472.14  on 452  degrees of freedom
    AIC: 492.1

    Number of Fisher Scoring iterations: 5
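The estimates in this summary are on the log-odds (logit) scale. One common way to read them is to exponentiate them into odds ratios. The sketch below does this by hand for a few of the printed estimates; with the fitted object itself, `exp(coef(myfit))` does it for every term and `exp(confint.default(myfit))` gives Wald intervals. Note that `famhist` is stored as numeric 1/2 in the `str()` output above, so its odds ratio compares level 2 with level 1.

```r
# Estimates copied from the summary(myfit) output above (log-odds scale)
est <- c(tobacco = 0.079376, ldl = 0.173924, typea = 0.039595,
         age = 0.045225, famhist = 0.925370)

# Odds ratio = exp(coefficient): the multiplicative change in the odds of chd = 1
# per one-unit increase in the predictor, holding the other predictors fixed
odds_ratios <- exp(est)
print(round(odds_ratios, 3))

# Larger steps scale the log-odds first: a 10-year increase in age
# multiplies the odds by exp(10 * 0.045225)
or_age_10 <- exp(10 * est[["age"]])
print(round(or_age_10, 3))
```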

How should I interpret these results, and what should I do next? Should I remove the statistically insignificant variables one at a time and run the glm model again?

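If your main objective is to "predict" heart disease, split your data into a training set and a development/test set, fit the model on the training set, and check the overall accuracy on the test set. One way to get predictions from the fitted model is shown below.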

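    model <- glm(y ~ x1 + x2 + ..., data = train_data, family = binomial)  # family = binomial makes this a logistic regression
    predict(model, newdata = new_data, type = "response")                 # predicted probabilities P(y = 1)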
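A minimal end-to-end sketch of that workflow, assuming a random 70/30 split and a 0.5 probability cutoff (both are choices, not rules). Because `newshad2` isn't available here, the example simulates a hypothetical stand-in data set `sim` with three of the same column names; the coefficients used to simulate it are made up for illustration.

```r
set.seed(42)

# Hypothetical stand-in for newshad2: simulated predictors and a 0/1 outcome
n <- 462
sim <- data.frame(age     = round(runif(n, 15, 64)),
                  tobacco = rexp(n, rate = 0.25),
                  famhist = sample(1:2, n, replace = TRUE))
lin <- -5 + 0.06 * sim$age + 0.08 * sim$tobacco + 0.9 * sim$famhist
sim$chd <- rbinom(n, size = 1, prob = plogis(lin))

# Random 70/30 split into training and test sets (the ratio is an assumption)
train_idx <- sample(seq_len(n), size = round(0.7 * n))
train <- sim[train_idx, ]
test  <- sim[-train_idx, ]

# Fit on the training data only
fit <- glm(chd ~ ., data = train, family = binomial(link = "logit"))

# Predicted probabilities for the held-out rows, then class labels at a 0.5 cutoff
p_test <- predict(fit, newdata = test, type = "response")
pred_class <- ifelse(p_test > 0.5, 1, 0)

# Confusion matrix and overall accuracy on the test set
conf_mat <- table(predicted = pred_class, actual = test$chd)
accuracy <- mean(pred_class == test$chd)
print(conf_mat)
print(accuracy)
```

To run it on the real data, replace `sim` with `newshad2`. Since `chd` is already stored as 0/1, `glm()` with `family = binomial` accepts it directly, so the `as.factor()` wrapper in the question is optional. The 0.5 cutoff can be tuned on the development set if false negatives and false positives have different costs.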

Additionally, you can calculate other metrics such as precision, recall, and average precision to further test the generalizability of the model. For more information on these metrics, refer here.
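These metrics can be computed in base R without extra packages. The sketch below uses small hypothetical vectors of predicted probabilities `p` and true labels `y`; in practice these would be the `predict(..., type = "response")` output on the test set and the observed `chd` values. `average_precision()` is a helper written for this example: it computes the non-interpolated average precision (the mean of the precision at each rank where a true case occurs) and does not treat tied scores specially.

```r
# Hypothetical predicted probabilities and true 0/1 labels for a small test set
p <- c(0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2)
y <- c(1,   1,   0,   1,   0,    1,   0,   0)

# Precision and recall at a 0.5 cutoff (the cutoff is an assumption)
threshold <- 0.5
pred <- as.integer(p > threshold)
tp <- sum(pred == 1 & y == 1)   # true positives
fp <- sum(pred == 1 & y == 0)   # false positives
fn <- sum(pred == 0 & y == 1)   # false negatives
precision <- tp / (tp + fp)
recall    <- tp / (tp + fn)

# Hypothetical helper: non-interpolated average precision over the ranking by p
average_precision <- function(p, y) {
  ord <- order(p, decreasing = TRUE)        # rank cases from highest to lowest score
  y_sorted <- y[ord]
  prec_at_k <- cumsum(y_sorted) / seq_along(y_sorted)  # precision among the top k
  # average the precision at each rank where a true positive occurs
  sum(prec_at_k[y_sorted == 1]) / sum(y_sorted == 1)
}
ap <- average_precision(p, y)

print(c(precision = precision, recall = recall, average_precision = ap))
```

Unlike precision and recall, average precision does not depend on the cutoff, because it summarizes the whole ranking produced by the predicted probabilities.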

