I want to create a function that uses the quantile function (`quantile`) to produce quantile values at given cut-points, in a dplyr environment.
For example, I want a function that produces the result below.
```r
# Load libraries and data
library(dplyr); library(rlang)
iris <- iris

cut_points_1 <- c(0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 1)
quantile(iris$Sepal.Length, cut_points_1)
#>    0%   10%   20%   30%   40%   50%   60%   70%   80%   90%   95%  100%
#> 4.300 4.800 5.000 5.270 5.600 5.800 6.100 6.300 6.520 6.900 7.255 7.900
```
However, I cannot work out how to handle the `iris$Sepal.Length` part inside a function. Specifically, I do not know how to refer to a variable in a data.frame by name when using non-dplyr functions (e.g., `quantile`). In other words, when `data` and `var_name` are function arguments that change from call to call, I do not know how to write `data$var_name` inside the function.
Below are my code and function.
```r
# Load libraries and data
library(dplyr); library(rlang)
iris <- iris

# Cut-points (percentage)
cut_points_1 <- c(0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 1)
cut_points_2 <- c(0, 0.2, 0.4, 0.6, 0.8, 1)

# Function
cut <- function(data, var_name, cut_points) {
  data <- enquo(data)
  cut_points <- enquo(cut_points)
  varname_cut <- paste0(substitute(var_name), "_cut")
  # Different variable name: source(https://stackoverflow.com/questions/46131829/unquote-the-variable-name-on-the-right-side-of-mutate-function-in-dplyr/46132317?noredirect=1#comment79234301_46132317)
  !!varname_cut := quantile(!!data$!!var_name, cut_points) # <- problem!
}

# Run
cut(iris, Sepal.Length, cut_points_1)
cut(iris, Sepal.Length, cut_points_2)
```
Here is a solution, adapting your function to make it work:
```r
# Load library and data
library(dplyr, warn.conflicts = FALSE)
iris <- iris

# Cut-points (percentage)
cut_points_1 <- c(0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 1)
cut_points_2 <- c(0, 0.2, 0.4, 0.6, 0.8, 1)

# Function (note: the name `cut` masks base::cut in this session)
cut <- function(data, var_name, cut_points) {
  # Capture the bare column name as a quosure
  var_name <- enquo(var_name)
  # Build the output column name, e.g. "Sepal.Length_cut"
  varname_cut <- paste0(quo_name(var_name), "_cut")
  tibble(cut_points = cut_points,
         !!varname_cut := data %>% pull(!!var_name) %>% quantile(cut_points))
}

# Run
cut(iris, Sepal.Length, cut_points_1)
#> # tibble: 12 x 2
#>    cut_points Sepal.Length_cut
#>         <dbl>            <dbl>
#>  1       0.00            4.300
#>  2       0.10            4.800
#>  3       0.20            5.000
#>  4       0.30            5.270
#>  5       0.40            5.600
#>  6       0.50            5.800
#>  7       0.60            6.100
#>  8       0.70            6.300
#>  9       0.80            6.520
#> 10       0.90            6.900
#> 11       0.95            7.255
#> 12       1.00            7.900
cut(iris, Sepal.Length, cut_points_2)
#> # tibble: 6 x 2
#>   cut_points Sepal.Length_cut
#>        <dbl>            <dbl>
#> 1        0.0             4.30
#> 2        0.2             5.00
#> 3        0.4             5.60
#> 4        0.6             6.10
#> 5        0.8             6.52
#> 6        1.0             7.90
```
I added a `cut_points` column next to the quantile result. You can format it as a percentage if needed.
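One way the percentage formatting could look is sketched below. It is only an illustration: the names `quantiles` and `quantiles_pct` and the shortened `cut_points` vector are made up for this example, and the labels are built with base R's `paste0` rather than any dedicated formatting package.

```r
library(dplyr, warn.conflicts = FALSE)

# Shortened cut-point vector, for illustration only
cut_points <- c(0, 0.5, 0.95, 1)

# Same shape as the function's result: one row per cut-point
quantiles <- tibble(
  cut_points = cut_points,
  Sepal.Length_cut = unname(quantile(iris$Sepal.Length, cut_points))
)

# Replace the proportions with percentage labels such as "95%"
quantiles_pct <- quantiles %>%
  mutate(cut_points = paste0(cut_points * 100, "%"))

quantiles_pct
```

After this step `cut_points` is a character column, so it is best applied last, once no further numeric work on the cut-points is needed.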
Some explanation:
- You don't need to use `enquo` on `data` and `cut_points`, because the function does not need quosures for them. They are passed as ordinary objects.
- You can use `quo_name` to get the name of a quosure as a string and paste it.
- You can use `dplyr::pull` to get a column of the data as a vector rather than as a one-column tibble.
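The points above can be checked with a small sketch. `inspect_column` is a hypothetical helper written only for this illustration (it is not part of dplyr); it returns what `quo_name`, `pull` and `select` each produce for the same captured column.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical helper, only to show what each call returns
inspect_column <- function(data, var_name) {
  var_name <- enquo(var_name)                   # capture the bare column name
  list(
    name      = quo_name(var_name),             # column name as a string
    as_vector = data %>% pull(!!var_name),      # plain vector
    as_frame  = data %>% select(!!var_name)     # one-column data frame
  )
}

res <- inspect_column(iris, Sepal.Length)
res$name                    # the string "Sepal.Length"
is.numeric(res$as_vector)   # pull() gave a numeric vector
is.data.frame(res$as_frame) # select() kept a data frame
```

`quantile` expects a numeric vector, which is why the solution uses `pull` and not `select` before calling it.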