I have a list of stores and I have a product (apples). I ran a system of linear equations to get the column \'var\'; this value represents the amount of apples you will
The accepted answer works nicely but I thought I would add one that approaches the problem as a linear programming problem. This would be helpful if
The structure of your problem is a linear programming problem known as the transport problem. Yours is a tidy case where: 1. it costs the same to move a good from any sender to any receiver and 2. your system is balanced insofar as demand = supply.
The easiest way to think about the constraints on the solution to the problem is (I think) in terms of the matrix of places that are sending goods versus those receiving them. We can make that matrix from your toy example:
# Load the data
df <- data.frame(store = c('a', 'b', 'c', 'd', 'e', 'f'),
sku = c('apple', 'apple', 'apple', 'apple', 'apple', 'apple'),
var = c(1,4,-6,-1,5,-3))
df
#> store sku var
#> 1 a apple 1
#> 2 b apple 4
#> 3 c apple -6
#> 4 d apple -1
#> 5 e apple 5
#> 6 f apple -3
# Seeing the row-column constraints
sol.mat <- matrix(c(1,4,1,0,0,1,0,0,3), nrow = 3, byrow = TRUE)
rownames(sol.mat) <- -1 * df$var[df$var < 0]
colnames(sol.mat) <- df$var[df$var >= 0]
sol.mat
#> 1 4 5
#> 6 1 4 1
#> 1 0 0 1
#> 3 0 0 3
What this matrix shows us is that the solution to the system you have proposed satisfies the constraints that all of the row sums equal the amount to be sent from each store and all the column sums equal the amount to be received. Any solution needs to meet those criteria. Thus if we have S senders (the rows) and R receivers (the columns) we have SxR unknowns. If we call each unknown x_ij
where i
indexes the sender and j
the receiver we have the constraints that (A) sum_j x_ij = S_i
and (B) sum_i x_ij = R_j
. In a normal transport problem we would also have a cost associated with each link between a sender and receiver. This will be an SxR matrix, that we can call C. We would then seek the solution that minimizes costs, and solve numerically with min sum_i sum_j x_ij * c_ij
, subject to (A) and (B).
The fact that costs don't figure in your discussion just means all routes cost the same. We can still use this same structure of the problem to solve for a solution using the existing libraries R has for linear programming. I'm going to use the package lpSolve
which has a function for solving precisely this kind of problem called lp.transport
. Below I write a
wrapper function around lp.transport
that takes your known values and the store names and determines a valid solution. The function can also take a user supplied cost matrix (SxR), and can return output either in the compact form of a SxR matrix or as the larger matrix you are hunting for:
get_transport_matrix <- function(vals, labels, costs = NULL, bigmat = TRUE) {
if (sum(vals) != 0) {stop("Demand and Supply are Imbalanced!")}
S <- -1 * vals[which(vals < 0)]
names(S) <- labels[which(vals < 0)]
R <- vals[which(vals >=0)]
names(R) <- labels[which(vals >=0)]
if (is.null(costs)) {
costs.mat <- matrix(1, length(S), length(R))
} else {
costs.mat <- costs
}
solution <- lpSolve::lp.transport(costs.mat, direction = 'min',
row.signs = rep("=", length(S)),
row.rhs = S,
col.signs = rep("=", length(R)),
col.rhs = R)$solution
rownames(solution) <- names(S)
colnames(solution) <- names(R)
if (!bigmat) {
return(solution)
} else {
bigres <- matrix(0, length(vals), length(vals),
dimnames = list(labels, labels))
bigres[names(S), names(R)] <- solution
colnames(bigres) <- paste0("ship_to_", colnames(bigres))
return(bigres)
}
}
We can demo the function with your toy data to see how it works. Here I return just the small sender-receiver matrix. As we can see the solution is different to the one you supplied but also valid.
get_transport_matrix(df$var, df$store, bigmat = FALSE)
#> a b e
#> c 0 1 5
#> d 0 1 0
#> f 1 2 0
Using a linear programming package scales easily. Here for instance we solve for 10 stores:
get_transport_matrix(c(-10:-1, 10:1),
c(letters[1:10], letters[1:10]),
bigmat = FALSE)[1:6,]
#> a b c d e f g h i j
#> a 0 0 0 0 0 0 4 3 2 1
#> b 0 0 0 0 4 5 0 0 0 0
#> c 0 0 0 6 2 0 0 0 0 0
#> d 0 0 6 1 0 0 0 0 0 0
#> e 0 4 2 0 0 0 0 0 0 0
#> f 0 5 0 0 0 0 0 0 0 0
Finally, the default output of the function is in a large-matrix format and you can simply
cbind()
it to your dataframe to obtain your desired output:
cbind(df, get_transport_matrix(df$var, df$store))
#> store sku var ship_to_a ship_to_b ship_to_c ship_to_d ship_to_e
#> a a apple 1 0 0 0 0 0
#> b b apple 4 0 0 0 0 0
#> c c apple -6 0 1 0 0 5
#> d d apple -1 0 1 0 0 0
#> e e apple 5 0 0 0 0 0
#> f f apple -3 1 2 0 0 0
#> ship_to_f
#> a 0
#> b 0
#> c 0
#> d 0
#> e 0
#> f 0
Created on 2019-03-21 by the reprex package (v0.2.1)
Here's a tidyverse solution. It relies on there being a net zero of each sku.
If that's the case, then we should be able to line up all the donated items (one row for each unit in the negative var
s, sorted by sku) with all the received items (one row for each positive var
, sorted by sku).
Consequently, the first 5 donated apples are matched with the first 5 received apples, and so on.
Then we total up the total for each sku between each donor and recipient pair and spread so each recipient gets a column.
complete
to match OP solutionlibrary(tidyverse)
output <- bind_cols(
# Donors, for whom var is negative
df %>% filter(var < 0) %>% uncount(-var) %>% select(-var) %>%
arrange(sku) %>% rename(donor = store),
# Recipients, for whom var is positive
df %>% filter(var > 0) %>% uncount(var) %>%
arrange(sku) %>% rename(recipient = store)) %>%
# Summarize and spread by column
count(donor, recipient, sku) %>%
complete(donor, recipient, sku, fill = list(n = 0)) %>%
mutate(recipient = paste0("ship_to_", recipient)) %>%
spread(recipient, n, fill = 0)
> output
# A tibble: 6 x 8
donor sku ship_to_a ship_to_b ship_to_c ship_to_d ship_to_e ship_to_f
<fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 a apple 0 0 0 0 0 0
2 b apple 0 0 0 0 0 0
3 c apple 1 4 0 0 1 0
4 d apple 0 0 0 0 1 0
5 e apple 0 0 0 0 0 0
6 f apple 0 0 0 0 3 0
I bet there are simpler ways of doing this but this one works.
The function fun
outputs a result identical
to the expected one.
fun <- function(DF){
n <- nrow(DF)
mat <- matrix(0, nrow = n, ncol = n)
VAR <- DF[["var"]]
neg <- which(DF[["var"]] < 0)
for(k in neg){
S <- 0
Tot <- abs(DF[k, "var"])
for(i in seq_along(VAR)){
if(i != k){
if(VAR[i] > 0){
if(S + VAR[i] <= Tot){
mat[k, i] <- VAR[i]
S <- S + VAR[i]
VAR[i] <- 0
}else{
mat[k, i] <- Tot - S
S <- Tot
VAR[i] <- VAR[i] - Tot + S
}
}
}
}
}
colnames(mat) <- paste0("ship_to_", DF[["store"]])
cbind(DF, mat)
}
out <- fun(df)
identical(output, out)
#[1] TRUE