I'm trying to build a function that can import/read several data tables stored in .csv files, and then compute statistics on the selected files. Each of the 332 .csv files contains a table with the same column names: date, pollutant, and id. There are a lot of missing values.
This is the function I have written so far, to compute the mean of the values for a pollutant:
pollutantmean <- function(directory, pollutant, id = 1:332) {
  library(dplyr)
  setwd(directory)
  good <- c()
  for (i in (id)) {
    task1 <- read.csv(sprintf("%03d.csv", i))
  }
  p <- select(task1, pollutant)
  good <- c(good, complete.cases(p))
  mean(p[good, ])
}
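The problem I have is that each time it goes through the loop a new file is read, and the data already read are replaced by the data from the new file. So I end up with a function that works fine for one single file, but not when I want to select multiple files. For example, if I ask for id=10:20, I end up with the mean calculated only on file 20.
How can I change the code so that I can select multiple files?
Thank you!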
My answer offers a way of doing what you want (if I understood everything correctly) without using a loop. My two assumptions are: (1) you have 332 *.csv files with the same header (column names), so every file has the same structure, and (2) you can combine the tables into one big data frame.
If these two assumptions are correct, I would use a list of your files to import them as data frames (so this answer does not contain a loop function!).
library(data.table)  # provides rbindlist()

# Creates a list with the names of all files. You have to provide the path to the folder.
file_list <- list.files(path = [your path where the *.csv files are saved in], full.names = TRUE)

# Create a list of data frames.
mylist <- lapply(file_list, read.csv)

# 'Row-bind' all data frames of the list into one big data frame.
mydata <- rbindlist(mylist)

# Now you can perform your calculation on this big data frame, using your column
# information to filter or subset it and get information on just a subset of the table (if necessary).
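To tie this back to the original function, here is a minimal sketch of how the list-based approach could be folded into `pollutantmean` so that only the requested ids are read. It is an illustration under assumptions rather than a definitive version: the demo folder, the three small files, and the column name `sulfate` are all hypothetical, and I use base R's `do.call(rbind, ...)` in place of `rbindlist()` so the example runs without any extra package.

```r
# Hypothetical demo data: three small files named like the originals (001.csv, ...).
demo_dir <- file.path(tempdir(), "pollutant_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(date = "2020-01-01", sulfate = c(1, NA), id = 1),
          file.path(demo_dir, "001.csv"), row.names = FALSE)
write.csv(data.frame(date = "2020-01-01", sulfate = c(2, 3), id = 2),
          file.path(demo_dir, "002.csv"), row.names = FALSE)
write.csv(data.frame(date = "2020-01-01", sulfate = c(NA, 6), id = 3),
          file.path(demo_dir, "003.csv"), row.names = FALSE)

pollutantmean <- function(directory, pollutant, id = 1:332) {
  # Build the paths for the requested ids only, e.g. "010.csv" ... "020.csv".
  files <- file.path(directory, sprintf("%03d.csv", id))
  # Read every requested file into a list, then row-bind the list into one data frame.
  tables <- lapply(files, read.csv)
  all_data <- do.call(rbind, tables)
  # Average the chosen column, ignoring missing values.
  mean(all_data[[pollutant]], na.rm = TRUE)
}

pollutantmean(demo_dir, "sulfate", id = 1:3)  # mean of 1, 2, 3, 6
pollutantmean(demo_dir, "sulfate", id = 2:3)  # mean of 2, 3, 6
```

Building the file names from `id` (instead of listing the whole folder) keeps the `id = 10:20` style of selection from the question, and `file.path()` avoids changing the working directory with `setwd()`.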
I hope this helps.