Reshaping a table in R while parsing information from column names and using it to collect information from specific columns

Question

I was given a badly organized data table with hundreds of columns (a subset is shown below).

The column names are dot-delimited. The first field identifies the object (e.g. Item123, object_AB) and follows no naming convention. Columns belonging to the same object share that first field, and the second field names a property of the object (e.g. color, manufacturer). The columns are in no particular order.

Item123.type.value  Item123.mass.value  Item123.color.value  object_AB.type.value  object_AB.mass.value  object_AB.color.value
Desk                11.2                blue                 Chair                 2.3                   orange
Desk                14.2                red                  Sofa                  22                    grey
Armchair            23.3                black                Monitor               2.2                   white

EDITED: Adding dput() structure:

structure(list(Item123.type.value = structure(c(2L, 2L, 1L),
levels = c("Armchair", "Desk"), class = "factor"), Item123.mass.value = structure(1:3,
levels = c("11.2", "14.2", "23.3"), class = "factor"), Item123.color.value = structure(c(2L,
3L, 1L), levels = c("black", "blue", "red"), class = "factor"),
object_AB.type.value = structure(c(1L, 3L, 2L), levels = c("Chair",
"Monitor", "Sofa"), class = "factor"), object_AB.mass.value = structure(c(2L,
3L, 1L), levels = c("2.2", "2.3", "22"), class = "factor"),
object_AB.color.value = structure(c(2L, 1L, 3L), levels = c("grey",
"orange", "white"), class = "factor")), row.names = c(NA_integer_,
-3L), class = "data.frame")

I need to convert the table into something like this (order of rows does not matter):

type       name      mass  color
Item123    Desk      11.2  blue
Item123    Desk      14.2  red
object_AB  Chair     2.3   orange
object_AB  Sofa      22    grey
Item123    Armchair  23.3  black
object_AB  Monitor   2.2   white

I would really appreciate any help I could get!!

Answer 1:

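You can use pivot_longer here, specifying names_pattern to extract the object and the property from the column names. The special '.value' entry in names_to tells pivot_longer to use the second captured group (type, mass, color) as the names of the output columns.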

tidyr::pivot_longer(df, 
                    cols = everything(), 
                    names_to = c('name', '.value'),
                    names_pattern = '(\\w+)\\.(\\w+)\\.')

# A tibble: 6 x 4
#  name      type     mass  color 
#  <chr>     <fct>    <fct> <fct> 
#1 Item123   Desk     11.2  blue  
#2 object_AB Chair    2.3   orange
#3 Item123   Desk     14.2  red   
#4 object_AB Sofa     22    grey  
#5 Item123   Armchair 23.3  black 
#6 object_AB Monitor  2.2   white
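
To get exactly the layout asked for in the question, the columns still need renaming, and because the dput() data stores every column as a factor, mass comes back as <fct> rather than as a number. The following is a minimal sketch of one way to handle both; it is not part of the original answer. The names `long`, `result`, and the intermediate column `object` are assumptions chosen for illustration, and the data frame is a hand-typed copy of the sample data.

```r
library(tidyr)

# Hand-typed copy of the question's sample; every column is a factor,
# as in the dput() output.
df <- data.frame(
  Item123.type.value    = c("Desk", "Desk", "Armchair"),
  Item123.mass.value    = c("11.2", "14.2", "23.3"),
  Item123.color.value   = c("blue", "red", "black"),
  object_AB.type.value  = c("Chair", "Sofa", "Monitor"),
  object_AB.mass.value  = c("2.3", "22", "2.2"),
  object_AB.color.value = c("orange", "grey", "white"),
  stringsAsFactors = TRUE
)

# 'object' (an assumed name) receives the first field of each column name;
# '.value' turns the second field (type, mass, color) into output columns.
long <- pivot_longer(
  df,
  cols = everything(),
  names_to = c("object", ".value"),
  names_pattern = "^([^.]+)\\.([^.]+)\\.value$"
)

# Rename to the requested headers and convert the factors.
# as.numeric(as.character(x)) recovers the printed numbers; as.numeric(x)
# alone would return the internal level codes instead.
result <- data.frame(
  type  = long$object,
  name  = as.character(long$type),
  mass  = as.numeric(as.character(long$mass)),
  color = as.character(long$color)
)
```

Building `result` with data.frame() sidesteps the name clash between the object column and the pivoted type column, at the cost of listing each property by hand. With hundreds of columns, a rename step applied to `long` may be more practical.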

Answer 2:

I would suggest this approach, although it is longer and more tedious, using the data you added as df. The code selects the columns whose names contain each property, reshapes each set to long format, and finally joins them all:

library(tidyverse)
#Code
# Reshape the 'type' columns to long format; id keeps the original row number
df %>% select(contains('type')) %>%
  mutate(id=1:n()) %>%
  pivot_longer(-id) %>%
  separate(name,into = c(paste0('V',1:3)),sep = '\\.') %>%
  select(-c(V2,V3)) %>%
  rename(Value1=value) %>%
  # Same steps for the 'mass' columns, joined on row id and object name
  left_join(df %>% select(contains('mass')) %>%
              mutate(id=1:n()) %>%
              pivot_longer(-id) %>%
              separate(name,into = c(paste0('V',1:3)),sep = '\\.') %>%
              select(-c(V2,V3)) %>%
              rename(Value2=value),
            by = c('id','V1')) %>%
  # Same steps for the 'color' columns
  left_join(df %>% select(contains('color')) %>%
              mutate(id=1:n()) %>%
              pivot_longer(-id) %>%
              separate(name,into = c(paste0('V',1:3)),sep = '\\.') %>%
              select(-c(V2,V3)) %>%
              rename(Value3=value),
            by = c('id','V1'))

Output:

# A tibble: 6 x 5
     id V1        Value1   Value2 Value3
  <int> <chr>     <chr>     <dbl> <chr> 
1     1 Item123   Desk       11.2 blue  
2     1 object_AB Chair       2.3 orange
3     2 Item123   Desk       14.2 red   
4     2 object_AB Sofa       22   grey  
5     3 Item123   Armchair   23.3 black 
6     3 object_AB Monitor     2.2 white
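
If only the final table is needed, the repeated select/pivot/join blocks can be avoided by handling one object prefix at a time. The base R sketch below is an alternative, not the approach of this answer. The names `prefixes`, `pieces`, and `object` are illustrative, and it assumes that every column name has the form <object>.<property>.value and that every object has the same set of properties.

```r
# Hand-typed copy of the question's sample data (plain types for brevity).
df <- data.frame(
  Item123.type.value    = c("Desk", "Desk", "Armchair"),
  Item123.mass.value    = c(11.2, 14.2, 23.3),
  Item123.color.value   = c("blue", "red", "black"),
  object_AB.type.value  = c("Chair", "Sofa", "Monitor"),
  object_AB.mass.value  = c(2.3, 22, 2.2),
  object_AB.color.value = c("orange", "grey", "white")
)

# Object names are everything before the first dot.
prefixes <- unique(sub("\\..*$", "", names(df)))

pieces <- lapply(prefixes, function(p) {
  # Keep only this object's columns; the trailing dot stops one prefix
  # from matching another that merely starts with the same characters.
  cols <- df[startsWith(names(df), paste0(p, "."))]
  # Reduce "<object>.<property>.value" to "<property>".
  names(cols) <- sub("^[^.]+\\.([^.]+)\\.value$", "\\1", names(cols))
  cbind(object = p, cols)
})

# Stack the per-object tables; rbind matches columns by name.
result <- do.call(rbind, pieces)
```

Here the rows come out grouped by object rather than interleaved by original row, which the question says is acceptable.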

Source: https://stackoverflow.com/questions/63696646/reshaping-a-table-in-r-while-parsing-information-from-column-names-and-using-it

Tags

data.table

reshape