Question
I have a component list made of 3 columns: product, component and quantity of component used:
a <- structure(list(prodName = c("prod1", "prod1", "prod2", "prod3",
"prod3", "int1", "int1", "int2", "int2"), component = c("a",
"int1", "b", "b", "int2", "a", "b", "int1", "d"), qty = c(1L,
2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L)), row.names = c(NA, -9L), class = c("data.table",
"data.frame"))
prodName component qty
1 prod1 a 1
2 prod1 int1 2
3 prod2 b 3
4 prod3 b 4
5 prod3 int2 5
6 int1 a 6
7 int1 b 7
8 int2 int1 8
9 int2 d 9
Products with names starting with prod are final products, those with names like int are intermediate products, and those with letters are raw materials.
I need the full component list of final products with only raw materials as components. That is, I want to convert any int into raw materials.
- Intermediate products can be composed of raw materials and other intermediate products, hence my reference to "recursive".
- I can't know in advance the level of nesting / recursion of an intermediate product (2 levels in this example, in excess of 6 in actual data).
For this example, my expected result is (I explicitly stated the computation of the resulting number):
prodName |component |qty
prod1 |a |1+2*6 = 13
prod1 |b |0+2*7 = 14
prod2 |b |3
prod3 |b |4+5*8*7 = 284
prod3 |a |0+5*8*6 = 240
prod3 |d |0+5*9 = 45
What I have done:
I solved this by creating a very cumbersome sequence of joins with merge. While this approach worked for the toy data, it's unlikely I can apply it to the real one.
#load data.table
library(data.table)
# split the tables between products and different levels of intermediate
a1 <- a[prodName %like% "prod",]
b1 <- a[prodName %like% "int1",]
c1 <- a[prodName %like% "int2",]
# convert int2 to raw materials
d1 <- merge(c1,
            b1,
            by.x = "component",
            by.y = "prodName",
            all.x = TRUE)[
  is.na(component.y),
  component.y := component][
  is.na(qty.y),
  qty.y := 1][,
  .(prodName, qty = qty.x*qty.y),
  by = .(component = component.y)]
# Since int1 is already exploded into raw materials, rbind both tables:
d1 <- rbind(d1, b1)
# convert all final products into raw materials, except that the raw mats that go directly into the product won't appear:
e1 <- merge(a1,
            d1,
            by.x = "component",
            by.y = "prodName",
            all.x = TRUE)
# rbind the last calculated raw mats (those coming from intermediate products) with those coming _directly_ into the final product:
result <- rbind(e1[!is.na(qty.y),
                   .(prodName, qty = qty.x * qty.y),
                   by = .(component = component.y)],
                e1[is.na(qty.y),
                   .(prodName, component, qty = qty.x)])[,
  .(qty = sum(qty)),
  keyby = .(prodName, component)]
I'm aware I can split the data into tables and perform joins until every intermediate product is expressed in terms of raw materials only, but as mentioned above, that will be a last resort due to the size of the data and the levels of recursion of intermediate products.
Is there an easier / better way to do this sort of recursive join?
Answer 1:
Here's my attempt using your dataset.
It uses a while loop that checks whether there are any components that also appear in the prodName field. The loop always needs to have the same fields, so instead of adding a column for each recursive multiplier (i.e., 5*8*7 at the end), the iterative multipliers are integrated. That is, 5*8*7 becomes 5*56 at the end.
library(data.table)
a[, qty_multiplier := 1]
b <- copy(a)
while (b[component %in% prodName, .N] > 0) {
  b <- b[a
         , on = .(prodName = component)
         , .(prodName = i.prodName
             , component = ifelse(is.na(x.component), i.component, x.component)
             , qty = i.qty
             , qty_multiplier = ifelse(is.na(x.qty), 1, x.qty * qty_multiplier)
         )
  ]
}
b[prodName %like% 'prod', .(qty = sum(qty * qty_multiplier)), by = .(prodName, component)]
prodName component qty
1: prod1 a 13
2: prod1 b 14
3: prod2 b 3
4: prod3 b 284
5: prod3 a 240
6: prod3 d 45
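For comparison, the same "keep expanding until no intermediate is left" idea can be sketched in base R with merge. This is only a sketch under assumptions, not the code above: explode_bom and its max_iter argument are hypothetical names, the toy data is rebuilt as a plain data.frame called bom, and final products are assumed to be recognisable by the "prod" prefix as in the question.

```r
# Toy component list from the question, as a plain data.frame.
bom <- data.frame(
  prodName  = c("prod1", "prod1", "prod2", "prod3", "prod3",
                "int1", "int1", "int2", "int2"),
  component = c("a", "int1", "b", "b", "int2", "a", "b", "int1", "d"),
  qty       = c(1, 2, 3, 4, 5, 6, 7, 8, 9),
  stringsAsFactors = FALSE
)

# Hypothetical helper: replaces every intermediate component by its own
# components, multiplying quantities, until only raw materials remain.
explode_bom <- function(bom, max_iter = 100L) {
  # Lookup table with unambiguous column names for the join.
  sub <- setNames(bom, c("component", "sub_component", "sub_qty"))
  out <- bom
  for (i in seq_len(max_iter)) {
    is_inter <- out$component %in% bom$prodName
    if (!any(is_inter)) break
    done <- out[!is_inter, ]                          # already raw materials
    todo <- merge(out[is_inter, ], sub, by = "component")
    expanded <- data.frame(prodName  = todo$prodName,
                           component = todo$sub_component,
                           qty       = todo$qty * todo$sub_qty,
                           stringsAsFactors = FALSE)
    out <- rbind(done, expanded)
  }
  # Intermediates left after max_iter passes suggest a cycle in the data.
  if (any(out$component %in% bom$prodName)) {
    stop("did not converge: the component list may contain a cycle")
  }
  out
}

flat   <- explode_bom(bom)
final  <- flat[startsWith(flat$prodName, "prod"), ]
result <- aggregate(qty ~ prodName + component, data = final, FUN = sum)
result <- result[order(result$prodName, result$component), ]
print(result)
```

Each pass resolves one level of nesting, so the number of passes equals the deepest level of intermediates. The max_iter guard is there only so that a circular definition fails with an error instead of looping forever.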
Answer 2:
Essentially, your data represents a weighted edgelist in a directed graph. The code below uses the igraph library to find every simple path from a raw component to a final product, takes the product of the edge weights along each path, and sums these products:
library(igraph)
## transform edgelist into graph
graph <- graph_from_edgelist(as.matrix(a[, c(2, 1)])) %>%
set_edge_attr("weight", value = unlist(a[, 3]))
## combinations raw components -> final products
out <- expand.grid(prodname = c("prod1", "prod2", "prod3"), component = c("a", "b", "d"), stringsAsFactors = FALSE)
## calculate quantities
out$qty <- mapply(function(component, prodname) {
  ## all simple paths from component -> prodname
  all_paths <- all_simple_paths(graph, from = component, to = prodname)
  ## if simple paths exist, sum over product of weights for each path
  ifelse(length(all_paths) > 0,
         sum(sapply(all_paths, function(path) prod(E(graph, path = path)$weight))), 0)
}, out$component, out$prodname)
out
#> prodname component qty
#> 1 prod1 a 13
#> 2 prod2 a 0
#> 3 prod3 a 240
#> 4 prod1 b 14
#> 5 prod2 b 3
#> 6 prod3 b 284
#> 7 prod1 d 0
#> 8 prod2 d 0
#> 9 prod3 d 45
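The grid of combinations above is typed in by hand. With real data it could probably be derived from the component list itself. The sketch below assumes that a raw material is any component that never appears as a prodName, and that a final product is any prodName that never appears as a component. That holds for the toy data but is an assumption about the real data. The names raw_materials and final_products are made up for illustration.

```r
# The two name columns of the toy component list from the question.
prodName  <- c("prod1", "prod1", "prod2", "prod3", "prod3",
               "int1", "int1", "int2", "int2")
component <- c("a", "int1", "b", "b", "int2", "a", "b", "int1", "d")

# Assumption: raw materials are never built from anything else.
raw_materials <- setdiff(component, prodName)
# Assumption: final products are never used as a component.
final_products <- setdiff(prodName, component)

# Same shape as the hand-written grid: one row per (product, raw material).
out <- expand.grid(prodname = final_products,
                   component = raw_materials,
                   stringsAsFactors = FALSE)
print(out)
```

Everything that is both a prodName and a component (int1 and int2 here) is treated as an intermediate and left out of the grid.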
Answer 3:
I think you are better off representing the information in a set of adjacency matrices that tell you "how much of this is made of that". You need 4 matrices, corresponding to all the possible relationships. For example you put the relationship between final product and intermediate in a matrix with 3 rows and 2 columns like this:
QPI <- matrix(0,3,2)
row.names(QPI) <- c("p1","p2","p3")
colnames(QPI) <- c("i1","i2")
QPI["p1","i1"] <- 2
QPI["p3","i2"] <- 5
i1 i2
p1 2 0
p2 0 0
p3 0 5
This tells you that it takes 2 units of intermediate product i1 to make one unit of final product p1.
Similarly you define the other matrices:
QPR <- matrix(0,3,3)
row.names(QPR) <- c("p1","p2","p3")
colnames(QPR) <- c("a","b","d")
QPR["p1","a"] <- 1
QPR["p2","b"] <- 3
QPR["p3","b"] <- 4
QIR <- matrix(0,2,3)
row.names(QIR) <- c("i1","i2")
colnames(QIR) <- c("a","b","d")
QIR["i1","a"] <- 6
QIR["i1","b"] <- 7
QIR["i2","d"] <- 9
QII <- matrix(0,2,2)
row.names(QII) <- colnames(QII) <- c("i1","i2")
QII["i2","i1"] <- 8
For example looking at QIR we see it takes 6 units of raw material a to make one unit of intermediate product i1. Once you have it in this way you sum over all possible ways of going from raw material to final product using matrix multiplication.
You have 3 terms, which cover the two levels of intermediates in this example: you can go directly from raw to final [QPR], or go from raw to intermediate to final [QPI%*%QIR], or go from raw to intermediate to another intermediate to final [QPI%*%QII%*%QIR].
Your result is in the end represented by the matrix
result <- QPI%*%QIR + QPI%*%QII%*%QIR + QPR
I put all the code together below. If you run it you will see that the result looks like this:
a b d
p1 13 14 0
p2 0 3 0
p3 240 284 45
which says exactly the same thing as
prodName |component |qty
prod1 |a |1+2*6 = 13
prod1 |b |0+2*7 = 14
prod2 |b |3
prod3 |b |4+5*8*7 = 284
prod3 |a |0+5*8*6 = 240
prod3 |d |0+5*9 = 45
hope this helps
QPI <- matrix(0,3,2)
row.names(QPI) <- c("p1","p2","p3")
colnames(QPI) <- c("i1","i2")
QPI["p1","i1"] <- 2
QPI["p3","i2"] <- 5
QPR <- matrix(0,3,3)
row.names(QPR) <- c("p1","p2","p3")
colnames(QPR) <- c("a","b","d")
QPR["p1","a"] <- 1
QPR["p2","b"] <- 3
QPR["p3","b"] <- 4
QIR <- matrix(0,2,3)
row.names(QIR) <- c("i1","i2")
colnames(QIR) <- c("a","b","d")
QIR["i1","a"] <- 6
QIR["i1","b"] <- 7
QIR["i2","d"] <- 9
QII <- matrix(0,2,2)
row.names(QII) <- colnames(QII) <- c("i1","i2")
QII["i2","i1"] <- 8
result <- QPI%*%QIR + QPI%*%QII%*%QIR + QPR
print(result)
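The three-term sum handles at most two levels of intermediates. For deeper nesting, one possible generalisation (a sketch, not part of the code above) is to put every item into a single square matrix Q and add up its powers Q + Q^2 + Q^3 + ... until a power is all zeros, which must happen when the component list has no cycles. The names bom, items, Q, total and term are made up for this sketch, and raw materials and final products are identified by the assumption that they never appear as a prodName or as a component, respectively.

```r
# Toy component list from the question, as a plain data.frame.
bom <- data.frame(
  prodName  = c("prod1", "prod1", "prod2", "prod3", "prod3",
                "int1", "int1", "int2", "int2"),
  component = c("a", "int1", "b", "b", "int2", "a", "b", "int1", "d"),
  qty       = c(1, 2, 3, 4, 5, 6, 7, 8, 9),
  stringsAsFactors = FALSE
)

# One square matrix over all items: Q[p, c] = units of c used directly in p.
items <- union(bom$prodName, bom$component)
n <- length(items)
Q <- matrix(0, n, n, dimnames = list(items, items))
Q[cbind(bom$prodName, bom$component)] <- bom$qty

# Q^k holds the quantities reached through exactly k steps, so the sum of
# all powers holds the total quantity over every path.
total <- matrix(0, n, n, dimnames = list(items, items))
term <- Q
for (k in seq_len(n)) {
  if (all(term == 0)) break
  total <- total + term
  term <- term %*% Q
}
# A non-zero term after n steps would mean the data contains a cycle.
stopifnot(all(term == 0))

raw_materials  <- setdiff(bom$component, bom$prodName)
final_products <- setdiff(bom$prodName, bom$component)
result <- total[final_products, raw_materials]
print(result)
#         a   b  d
# prod1  13  14  0
# prod2   0   3  0
# prod3 240 284 45
```

The loop stops as soon as a power of Q is entirely zero, so the number of multiplications follows the actual nesting depth rather than being fixed in advance.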
Source: https://stackoverflow.com/questions/56822061/recursive-self-join-in-data-table