Question
I have a strange Excel- or CSV-formatted file that I want to import into R as a data frame. The problem is that some records span multiple rows. For example, the data below has three columns and two records, but the Tools column has several rows per record. Is there a way to reshape the data so that each record is a single row with multiple tool columns (say Tools1, Tools2, etc.)?
Task             Location  Tools
Raising ticket   Alabama   sharepoint
                           word
                           oracle
Changing ticket  Seattle   word
                           oracle
Expected final output:
Task             Location  Tools1      Tools2  Tools3
Raising ticket   Alabama   sharepoint  word    oracle
Changing ticket  Seattle   word        oracle
Answer 1:
With dplyr and tidyr, you can fill your data frame so that Task and Location are included in each row. Then group_by Task and mutate to add an id column that numbers the rows within each group. Then use spread to spread the newly created id column across multiple columns.
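library(dplyr)
library(tidyr)

# Sample data: continuation rows have empty Task and Location
df <- data.frame(
  Task = c("Raising ticket", "", "", "Changing ticket", ""),
  Location = c("Alabama", "", "", "Seattle", ""),
  Tools = c("sharepoint", "word", "oracle", "word", "oracle")
)

# Mark empty strings as missing so fill() can carry values down
df[df == ""] <- NA

df %>%
  fill(Task, Location) %>%                        # copy Task/Location into continuation rows
  group_by(Task) %>%
  mutate(id = paste0("Tools", row_number())) %>%  # Tools1, Tools2, ... within each task
  spread(id, Tools)                               # one column per id value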
# A tibble: 2 x 5
# Groups:   Task [2]
#   Task            Location Tools1     Tools2 Tools3
#   <fct>           <fct>    <fct>      <fct>  <fct>
# 1 Changing ticket Seattle  word       oracle <NA>
# 2 Raising ticket  Alabama  sharepoint word   oracle
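Since the question starts from a CSV file, here is a rough sketch of the same idea applied at import time. It is not part of the original answer and rests on a few assumptions: csv_text is a made-up stand-in for the real file, blank cells are assumed to be truly empty (so na.strings = "" can turn them into NA while reading), and pivot_wider is swapped in for spread, which requires tidyr 1.0.0 or later.

```r
library(dplyr)
library(tidyr)

# Hypothetical CSV content standing in for the real file;
# continuation rows leave Task and Location blank.
csv_text <- "Task,Location,Tools
Raising ticket,Alabama,sharepoint
,,word
,,oracle
Changing ticket,Seattle,word
,,oracle"

# na.strings = "" reads blank cells as NA, so no separate
# df[df == ""] <- NA step is needed afterwards.
raw <- read.csv(text = csv_text, na.strings = "", stringsAsFactors = FALSE)

wide <- raw %>%
  fill(Task, Location) %>%                        # carry Task/Location down
  group_by(Task, Location) %>%
  mutate(id = paste0("Tools", row_number())) %>%  # Tools1, Tools2, ...
  ungroup() %>%
  pivot_wider(names_from = id, values_from = Tools)

print(wide)
```

For a real file you would presumably pass its path to read.csv instead of text = csv_text. Grouping by both Task and Location is a choice made here so that the same task name at two locations stays in separate rows; if tasks are unique, grouping by Task alone behaves the same.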
Source: https://stackoverflow.com/questions/51442307/formatting-multi-row-data-into-single-row-in-r