formatting multi-row data into single row in R

我只是一个虾纸丫 submitted on 2019-12-13 20:15:51

Question


I have an oddly formatted Excel or CSV file that I want to import into R as a data frame. The problem is that some records span multiple rows. For example, the data below has three columns and two records, but the Tools column holds several values per record, each on its own row. Is there a way to reshape the data so that each record occupies a single row, with its tools spread across multiple columns (say Tools1, Tools2, etc.)?

Task             Location  Tools 
Raising ticket   Alabama   sharepoint
                           word
                           oracle
Changing ticket  Seattle   word 
                           oracle

Final output expected

Task             Location  Tools1      Tools2  Tools3
Raising ticket   Alabama   sharepoint  word    oracle
Changing ticket  Seattle   word        oracle

Answer 1:


With dplyr and tidyr: first fill the data frame so that Task and Location are present in every row. Then group_by Task and use mutate to add an id column that numbers the tools within each group (Tools1, Tools2, ...). Finally, use spread to turn the id values into column names, with the matching Tools values as the cell contents.

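library(dplyr)
library(tidyr)

# Sample data: blank cells mark the continuation rows of each record
df <- data.frame(Task = c("Raising ticket","","","Changing ticket",""),
                 Location = c("Alabama","","","Seattle",""),
                 Tools = c("sharepoint","word","oracle","word","oracle"))
df[df==""] <- NA                                 # turn blanks into NA so fill() can act on them
df %>%
  fill(Task,Location) %>%                        # carry Task and Location down into the blank rows
  group_by(Task) %>%
  mutate(id = paste0("Tools",row_number())) %>%  # Tools1, Tools2, ... within each task
  spread(id, Tools)                              # one column per id value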

# A tibble: 2 x 5
# Groups: Task [2]
#   Task            Location Tools1     Tools2 Tools3
#   <fct>           <fct>    <fct>      <fct>  <fct> 
# 1 Changing ticket Seattle  word       oracle <NA>  
# 2 Raising ticket  Alabama  sharepoint word   oracle
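
For readers on tidyr 1.0.0 or later, where spread() is marked as superseded, the same reshaping can be sketched with pivot_wider(). This is a variant, not the answer's original code. It assumes the same sample data; the result name `wide` is only illustrative, and grouping by both Task and Location is an added assumption to cover the case where a task name repeats across locations.

```r
library(dplyr)
library(tidyr)

# Same sample data as in the answer; blank cells mark continuation rows
df <- data.frame(
  Task     = c("Raising ticket", "", "", "Changing ticket", ""),
  Location = c("Alabama", "", "", "Seattle", ""),
  Tools    = c("sharepoint", "word", "oracle", "word", "oracle"),
  stringsAsFactors = FALSE
)
df[df == ""] <- NA  # blanks become NA so fill() can carry values down

wide <- df %>%
  fill(Task, Location) %>%                        # repeat Task/Location on every row
  group_by(Task, Location) %>%                    # assumption: a record is a Task + Location pair
  mutate(id = paste0("Tools", row_number())) %>%  # Tools1, Tools2, ... within each record
  ungroup() %>%
  pivot_wider(names_from = id, values_from = Tools)

wide
```

Records with fewer tools than the longest one get NA in the unused columns, just as with spread().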


Source: https://stackoverflow.com/questions/51442307/formatting-multi-row-data-into-single-row-in-r
