Reshaping a table in R while parsing information from column names and using it to collect information from specific columns

冷暖自知 submitted on 2021-02-11 13:00:22

Question


I was given a badly organized data table with hundreds of columns (a subset is shown below).

Column names are dot-delimited. The first field identifies a type of object (e.g. Item123, object_AB etc.) and follows no naming convention. The second field names a property of that object (e.g. color, manufacturer etc.), so several columns share the same object field. The columns are in no particular order.

Item123.type.value  Item123.mass.value  Item123.color.value  object_AB.type.value  object_AB.mass.value  object_AB.color.value
Desk                11.2                blue                 Chair                 2.3                   orange
Desk                14.2                red                  Sofa                  22                    grey
Armchair            23.3                black                Monitor               2.2                   white
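To make the naming scheme concrete, here is a minimal base R sketch that splits the sample column names into their object and property fields. The names `col_names`, `parts` and `name_map` are made up for illustration, and the sketch assumes every column name has exactly three dot-separated fields, as in the sample above.

```r
# Column names copied from the sample table in the question
col_names <- c("Item123.type.value", "Item123.mass.value", "Item123.color.value",
               "object_AB.type.value", "object_AB.mass.value", "object_AB.color.value")

# Split on the literal dot; fixed = TRUE stops "." being treated as a regex wildcard
parts <- do.call(rbind, strsplit(col_names, ".", fixed = TRUE))

# One row per column: which object it belongs to and which property it holds
name_map <- data.frame(column   = col_names,
                       object   = parts[, 1],
                       property = parts[, 2],
                       stringsAsFactors = FALSE)
print(name_map)

# Properties available for each object, independent of column order
print(split(name_map$property, name_map$object))
```

A lookup like this can help check, before reshaping, whether every object really carries the same set of properties.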

EDITED: Adding dput() structure:

df <- structure(list(Item123.type.value = structure(c(2L, 2L, 1L),
levels = c("Armchair", "Desk"), class = "factor"), Item123.mass.value = structure(1:3,
levels = c("11.2", "14.2", "23.3"), class = "factor"), Item123.color.value = structure(c(2L,
3L, 1L), levels = c("black", "blue", "red"), class = "factor"),
object_AB.type.value = structure(c(1L, 3L, 2L), levels = c("Chair",
"Monitor", "Sofa"), class = "factor"), object_AB.mass.value = structure(c(2L,
3L, 1L), levels = c("2.2", "2.3", "22"), class = "factor"),
object_AB.color.value = structure(c(2L, 1L, 3L), levels = c("grey",
"orange", "white"), class = "factor")), row.names = c(NA_integer_,
-3L), class = "data.frame")

I need to convert the table into something like this (order of rows does not matter):

type       name      mass  color
Item123    Desk      11.2  blue
Item123    Desk      14.2  red
object_AB  Chair     2.3   orange
object_AB  Sofa      22    grey
Item123    Armchair  23.3  black
object_AB  Monitor   2.2   white

I would really appreciate any help I could get!!


Answer 1:


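You can use pivot_longer here, with names_pattern extracting the pieces of each column name. The first capture group fills the name column; the second is mapped to '.value', so each property (type, mass, color) becomes its own output column.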

tidyr::pivot_longer(df, 
                    cols = everything(), 
                    names_to = c('name', '.value'),
                    names_pattern = '(\\w+)\\.(\\w+)\\.')

# A tibble: 6 x 4
#  name      type     mass  color 
#  <chr>     <fct>    <fct> <fct> 
#1 Item123   Desk     11.2  blue  
#2 object_AB Chair    2.3   orange
#3 Item123   Desk     14.2  red   
#4 object_AB Sofa     22    grey  
#5 Item123   Armchair 23.3  black 
#6 object_AB Monitor  2.2   white 



Answer 2:


I would suggest the following approach, although it is longer and more repetitive. It uses the data you added as df. The code selects the columns matching each property in turn, reshapes them to long format, and finally joins the pieces:

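library(tidyverse)
# Reshape the 'type' columns to long format, keeping a row id for the joins
df %>% select(contains('type')) %>%
  mutate(id=1:n()) %>%
  pivot_longer(-id) %>%
  # Split e.g. 'Item123.type.value' into V1..V3 and keep only the object name (V1)
  separate(name,into = c(paste0('V',1:3)),sep = '\\.') %>%
  select(-c(V2,V3)) %>%
  rename(Value1=value) %>%
  # Same steps for the 'mass' columns, joined on row id and object name
  left_join(df %>% select(contains('mass')) %>%
              mutate(id=1:n()) %>%
              pivot_longer(-id) %>%
              separate(name,into = c(paste0('V',1:3)),sep = '\\.') %>%
              select(-c(V2,V3)) %>%
              rename(Value2=value),
            by = c('id','V1')) %>%
  # Same steps for the 'color' columns
  left_join(df %>% select(contains('color')) %>%
              mutate(id=1:n()) %>%
              pivot_longer(-id) %>%
              separate(name,into = c(paste0('V',1:3)),sep = '\\.') %>%
              select(-c(V2,V3)) %>%
              rename(Value3=value),
            by = c('id','V1'))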

Output:

# A tibble: 6 x 5
     id V1        Value1   Value2 Value3
  <int> <chr>     <chr>     <dbl> <chr> 
1     1 Item123   Desk       11.2 blue  
2     1 object_AB Chair       2.3 orange
3     2 Item123   Desk       14.2 red   
4     2 object_AB Sofa       22   grey  
5     3 Item123   Armchair   23.3 black 
6     3 object_AB Monitor     2.2 white 
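For readers who prefer to avoid the tidyverse, the same reshaping can be sketched in base R by handling one object prefix at a time and stacking the results. This is not from either answer. The names `prefixes`, `pieces` and `result` are made up for illustration, and the sketch assumes every object has the same set of properties, since rbind() needs matching column names.

```r
# Sample data rebuilt from the question so the sketch runs on its own
df <- data.frame(
  Item123.type.value    = c("Desk", "Desk", "Armchair"),
  Item123.mass.value    = c("11.2", "14.2", "23.3"),
  Item123.color.value   = c("blue", "red", "black"),
  object_AB.type.value  = c("Chair", "Sofa", "Monitor"),
  object_AB.mass.value  = c("2.3", "22", "2.2"),
  object_AB.color.value = c("orange", "grey", "white"),
  stringsAsFactors = TRUE
)

# Object prefix = everything before the first dot
prefixes <- unique(sub("\\..*$", "", names(df)))

pieces <- lapply(prefixes, function(p) {
  # Columns of this object; the trailing dot avoids matching e.g. Item1 against Item12
  cols <- df[startsWith(names(df), paste0(p, "."))]
  # Keep only the property field (second dot-separated piece) as the column name
  names(cols) <- sub("^[^.]+\\.([^.]+)\\..*$", "\\1", names(cols))
  # Drop factor levels so pieces with different levels stack cleanly
  cols[] <- lapply(cols, as.character)
  data.frame(name = p, cols, stringsAsFactors = FALSE)
})

# rbind() matches data frame columns by name, so column order within each piece is irrelevant
result <- do.call(rbind, pieces)
result$mass <- as.numeric(result$mass)
print(result)
```

Unlike the pivot_longer output, rows here are grouped by object rather than interleaved by original row, which is fine given that row order does not matter.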


Source: https://stackoverflow.com/questions/63696646/reshaping-a-table-in-r-while-parsing-information-from-column-names-and-using-it
