R - When the data and variable names change in my function, how should I write them?


I want to create a function that uses the quantile function (quantile) to produce the quantile values at given cut-points, in a dplyr environment.

For example, I want to create a function that produces the result below.

    # load library and data
    library(dplyr); library(rlang)
    iris <- iris

    cut_points_1 <- c(0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 1)

    quantile(iris$Sepal.Length, cut_points_1)

       0%   10%   20%   30%   40%   50%   60%   70%   80%   90%   95%  100%
    4.300 4.800 5.000 5.270 5.600 5.800 6.100 6.300 6.520 6.900 7.255 7.900

But I cannot understand how to handle the (iris$Sepal.Length) part inside a function. Specifically, I do not know how to refer to a variable name in a data.frame when using non-dplyr functions (e.g., quantile). In other words, when the names of data and var_name change between calls, I do not know how to write data$var_name in the function.

Below are my code and function.

    # load library and data
    library(dplyr); library(rlang)
    iris <- iris

    # cut-points (percentage)
    cut_points_1 <- c(0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 1)
    cut_points_2 <- c(0, 0.2, 0.4, 0.6, 0.8, 1)

    # function
    cut <- function(data, var_name, cut_points) {
      data <- enquo(data)
      cut_points <- enquo(cut_points)

      # different variable name; source:
      # https://stackoverflow.com/questions/46131829/unquote-the-variable-name-on-the-right-side-of-mutate-function-in-dplyr/46132317?noredirect=1#comment79234301_46132317
      varname_cut <- paste0(substitute(var_name), "_cut")

      !!varname_cut := quantile(!!data$!!var_name, cut_points) # <- problem!
    }

    # run
    cut(iris, Sepal.Length, cut_points_1)
    cut(iris, Sepal.Length, cut_points_2)

Here is a solution, adapting your function to make it work:

    # load library and data
    library(dplyr, warn.conflicts = FALSE)
    iris <- iris

    # cut-points (percentage)
    cut_points_1 <- c(0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 1)
    cut_points_2 <- c(0, 0.2, 0.4, 0.6, 0.8, 1)

    # function
    cut <- function(data, var_name, cut_points) {
      # capture only the column name; data and cut_points are ordinary objects
      var_name <- enquo(var_name)
      # build the output column name, e.g. "Sepal.Length_cut"
      varname_cut <- paste0(quo_name(var_name), "_cut")
      tibble(cut_points = cut_points,
             !!varname_cut := data %>% pull(!!var_name) %>% quantile(cut_points))
    }

    # run
    cut(iris, Sepal.Length, cut_points_1)
    #> # tibble: 12 x 2
    #>    cut_points Sepal.Length_cut
    #>         <dbl>            <dbl>
    #>  1       0.00            4.300
    #>  2       0.10            4.800
    #>  3       0.20            5.000
    #>  4       0.30            5.270
    #>  5       0.40            5.600
    #>  6       0.50            5.800
    #>  7       0.60            6.100
    #>  8       0.70            6.300
    #>  9       0.80            6.520
    #> 10       0.90            6.900
    #> 11       0.95            7.255
    #> 12       1.00            7.900
    cut(iris, Sepal.Length, cut_points_2)
    #> # tibble: 6 x 2
    #>   cut_points Sepal.Length_cut
    #>        <dbl>            <dbl>
    #> 1        0.0             4.30
    #> 2        0.2             5.00
    #> 3        0.4             5.60
    #> 4        0.6             6.10
    #> 5        0.8             6.52
    #> 6        1.0             7.90

I added a cut_points column alongside the quantile result. You can format it as a percentage if needed.
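A minimal sketch of the percentage formatting, in case it helps. The names `quantile_tbl` and `quantile_tbl_pct` are made up for this illustration, and `sprintf` is just one way to build the labels. Note that the column becomes character, so it no longer sorts numerically.

```r
library(dplyr, warn.conflicts = FALSE)

cut_points_2 <- c(0, 0.2, 0.4, 0.6, 0.8, 1)

# Same shape as the function's result, built directly for illustration
# (quantile_tbl is a hypothetical name, not part of the original answer).
quantile_tbl <- tibble(
  cut_points = cut_points_2,
  Sepal.Length_cut = unname(quantile(iris$Sepal.Length, cut_points_2))
)

# Turn the numeric probabilities into percentage labels such as "20%".
quantile_tbl_pct <- quantile_tbl %>%
  mutate(cut_points = sprintf("%g%%", cut_points * 100))

quantile_tbl_pct
```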

Some explanation:

  • You don't need to use enquo on data and cut_points, because you don't need a quosure for them in the function. They are passed as ordinary objects.
  • You can use quo_name to get the name of the quosure as a string and paste onto it.
  • You can use dplyr::pull to get a column of the data as a vector rather than as a one-column tibble.
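To make these points concrete, here is a small sketch that returns each piece separately. `show_pieces` and `pieces` are hypothetical names used only for this illustration. Because iris is a plain data.frame, `select` returns a one-column data.frame here rather than a tibble.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical helper: only var_name is captured with enquo();
# data is used as a normal object.
show_pieces <- function(data, var_name) {
  var_name <- enquo(var_name)
  list(
    name      = quo_name(var_name),             # column name as a string
    as_vector = data %>% pull(!!var_name),      # plain vector, usable by quantile()
    as_table  = data %>% select(!!var_name)     # still a one-column table
  )
}

pieces <- show_pieces(iris, Sepal.Length)

pieces$name
class(pieces$as_vector)
class(pieces$as_table)
```

The vector from `pull` is what a non-dplyr function such as `quantile` expects, which is why the answer uses it instead of `select`.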
