R: reshape data from long to wide and vice versa
I wrote two wrapper functions around cast and melt to bring data from long to wide form and vice versa. However, I still struggle with the function reshape_wide, which brings the data from long form to wide form.
Here is an example with the functions plus the code to run it. I created a dummy data.frame in wide format, reshape it into long format using my reshape_long function, and then transform it back to the original wide form using the reshape_wide function. However, the reshaping fails for a reason I cannot figure out. It seems that the formula used in dcast is wrong.
library(reshape2)  # provides melt() and dcast()

# Wide -> long: melt all non-identifier columns and drop missing values.
reshape_long <- function(data, identifiers) {
  data_long <- melt(data, id.vars = identifiers,
                    variable.name = "name", value.name = "value")
  data_long$value <- as.numeric(data_long$value)
  data_long <- data_long[!is.na(data_long$value), ]
  return(data_long)
}

# Long -> wide: cast back using the identifiers plus a running "series" index.
reshape_wide <- function(data, identifiers, name) {
  if (is.null(identifiers)) {
    formula_wide <- as.formula(paste(paste(identifiers, collapse = "+"), "series ~ ", name))
  } else {
    formula_wide <- as.formula(paste(paste(identifiers, collapse = "+"), "+ series ~ ", name))
  }
  # Number the observations within each variable name.
  series <- ave(1:nrow(data), data$name, FUN = function(x) { seq.int(along = x) })
  data <- cbind(data, series)
  data_wide <- dcast(data, formula_wide, value.var = "value")
  data_wide <- data_wide[, !(names(data_wide) %in% "series")]
  return(data_wide)
}

data <- data.frame(id = rep("k", 6),
                   type = c(rep("a", 3), rep("b", 3)),
                   x = c(NA, NA, 1, 2, 3, 4),
                   y = 5:10,
                   z = c(NA, 11, 12, NA, 14, NA))
data <- reshape_long(data, identifiers = c("id", "type"))
data
reshape_wide(data, identifiers = c("id", "type"), name = "name")
Here is a link to the R output I get when I run the code above:
What is wrong is that, in the column type, b appears 5 times rather than 3 times as it should. Shouldn't I get the same data.frame back?
Here is the R output of sessionInfo():
> sessionInfo()
R version 2.14.0 (2011-10-31)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] C

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
 [1] reshape2_1.2.1       outliers_0.14        lme4_0.999375-42
 [4] Matrix_1.0-1         gregmisc_2.1.2       gplots_2.10.1
 [7] KernSmooth_2.23-7    caTools_1.12         bitops_1.0-4.1
[10] gtools_2.6.2         gmodels_2.15.1       gdata_2.8.2
[13] lattice_0.20-0       dataframes2xls_0.4.5 RankProd_2.26.0
[16] R.utils_1.9.3        R.oo_1.8.3           R.methodsS3_1.2.1
[19] xlsx_0.3.0           xlsxjars_0.3.0       rJava_0.9-2
[22] rj_1.0.0-3

loaded via a namespace (and not attached):
[1] MASS_7.3-16   nlme_3.1-102  plyr_1.6      rj.gd_1.0.0-1 stats4_2.14.0
[6] stringr_0.5   tools_2.14.0
The example cannot work: since id and type do not form a primary key (i.e., since there are several rows with the same id and type), once the data is put in tall format you no longer know whether two values come from the same row.
Also, I am not sure what you are trying to do with the series column, but it does not seem to work.
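To make the primary-key point concrete, here is a minimal base R sketch. The helper name is_primary_key is hypothetical (it is not part of reshape2 or of the question's code); it only checks whether a set of columns identifies each row uniquely.

```r
# Dummy data from the question, in wide format.
d <- data.frame(
  id   = rep("k", 6),
  type = c(rep("a", 3), rep("b", 3)),
  x    = c(NA, NA, 1, 2, 3, 4),
  y    = 5:10,
  z    = c(NA, 11, 12, NA, 14, NA)
)

# Hypothetical helper: TRUE when the given columns identify every row uniquely.
is_primary_key <- function(data, cols) {
  anyDuplicated(data[, cols, drop = FALSE]) == 0
}

# (id, type) repeats, so it cannot tell the rows apart.
is_primary_key(d, c("id", "type"))

# Adding a row number makes (row, id, type) unique.
d$row <- seq_len(nrow(d))
is_primary_key(d, c("row", "id", "type"))
```

If the first check fails, a long-to-wide cast on those identifiers has no way to put values back into their original rows.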
library(reshape2)
d <- data.frame(
  id = rep("k", 6),
  type = c(rep("a", 3), rep("b", 3)),
  x = c(NA, NA, 1, 2, 3, 4),
  y = 5:10,
  z = c(NA, 11, 12, NA, 14, NA)
)
d$row <- seq_len(nrow(d))  # (row, id, type) is now a primary key
d
d1 <- reshape_long(d, identifiers = c("row", "id", "type"))
d1
dcast(d1, row + id + type ~ name)  # What you want
reshape_wide(d1, identifiers = c("row", "id", "type"), name = "name")
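One possible way to rewrite the wide-form wrapper along these lines is sketched below. It drops the series column entirely and instead refuses to cast when the identifiers do not form a key. The function name reshape_wide_keyed and its value argument are assumptions made for illustration, not the asker's code; melt's na.rm = TRUE stands in for the NA filtering done in reshape_long.

```r
library(reshape2)  # provides melt() and dcast()

# Hypothetical variant of reshape_wide: no "series" column, and an explicit
# check that (identifiers, name) picks out at most one value per cell.
reshape_wide_keyed <- function(data, identifiers, name = "name", value = "value") {
  key <- data[, c(identifiers, name), drop = FALSE]
  if (anyDuplicated(key) > 0) {
    stop("identifiers do not form a primary key; add a row id before melting")
  }
  formula_wide <- as.formula(
    paste(paste(identifiers, collapse = " + "), "~", name)
  )
  dcast(data, formula_wide, value.var = value)
}

d <- data.frame(
  id   = rep("k", 6),
  type = c(rep("a", 3), rep("b", 3)),
  x    = c(NA, NA, 1, 2, 3, 4),
  y    = 5:10,
  z    = c(NA, 11, 12, NA, 14, NA)
)
d$row <- seq_len(nrow(d))  # make (row, id, type) a primary key

# Wide -> long, dropping missing values.
d_long <- melt(d, id.vars = c("row", "id", "type"),
               variable.name = "name", value.name = "value", na.rm = TRUE)

# Long -> wide; rows line up again because the key survives the round trip.
d_wide <- reshape_wide_keyed(d_long, identifiers = c("row", "id", "type"))
d_wide
```

Note that a row whose measured values are all NA would vanish in the long form and so would not come back after casting; in this dummy data every row has a y value, so all six rows return.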