Question
This question is a follow-up to this question.
Let's say I have a large data.frame, df, with columns u and v. I'd like to number the observed combinations (interactions) of u and v in increasing order, i.e. in the order in which they are first seen when traversing the data.frame from top to bottom.
Note: assume df has some existing ordering, so it's not OK to temporarily reorder it.
The code at the bottom of this post works well, except that the returned vector does not number the combinations in increasing order of first appearance. That is, instead of the current result:
# current result (not in order of first appearance):
match(df$label, levels(df$label))
# [1] 5 6 3 7 4 7 2 2 1 1
# but we'd like it to be in increasing order like this:
# 1 2 3 4 5 4 6 6 7 7
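For context, the mismatch comes from how factor codes are assigned: each code is the value's position in levels(), and levels() is not, in general, the order in which values first appear. A minimal sketch on a hypothetical toy vector x (not from the question) contrasts the default sorted levels with levels taken from unique(x):

```r
# Hypothetical toy vector, used only to illustrate how factor codes are assigned
x <- c("b", "c", "a", "c", "b")

# Default factor(): levels are sorted ("a", "b", "c"), so each code is the
# value's position in that sorted list, not the order in which it appeared
by_level <- as.integer(factor(x))
by_level
# [1] 2 3 1 3 2

# Taking the levels from unique(x) lists them in order of first appearance,
# so the codes count up 1, 2, 3, ... as new values are encountered
by_appearance <- as.integer(factor(x, levels = unique(x)))
by_appearance
# [1] 1 2 3 2 1
```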
I've been experimenting with order(), rank(), factor(..., ordered=TRUE), etc., and nothing seems to work. I must be overlooking something obvious. Any ideas?
Note: it's also not allowed to cheat by reordering both u and v as individual factors.
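set.seed(1234)
# R >= 3.6.0 uses a different default sampling algorithm, so the values drawn may differ from those printed below
df <- data.frame(u=sample.int(3,10,replace=TRUE), v=sample.int(4,10,replace=TRUE))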
#    u v
# 1  1 3
# 2  2 3
# 3  2 2
# 4  2 4
# 5  3 2
# 6  2 4
# 7  1 2
# 8  1 2
# 9  2 1
# 10 2 1
(df$label <- factor(interaction(df$u,df$v), ordered=TRUE))
# [1] 1.3 2.3 2.2 2.4 3.2 2.4 1.2 1.2 2.1 2.1
# Levels: 2.1 < 1.2 < 2.2 < 3.2 < 1.3 < 2.3 < 2.4
# This is OK, except we want increasing order
match(df$label, levels(df$label))
# [1] 5 6 3 7 4 7 2 2 1 1
# No better:
match(df$label, levels(df$label)[rank(levels(df$label))])
# [1] 6 7 1 4 3 4 5 5 2 2
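A sketch of why these attempts fail, assuming the ten (u, v) rows printed above, typed in literally because newer R versions may draw different values from the same seed: the level order is fixed when the factor is built, ordered=TRUE does not change it, and rank() on the level strings produces yet another (alphabetical) order.

```r
# The ten (u, v) rows printed in the question, typed in literally so the
# example does not depend on which R version drew them
u <- c(1, 2, 2, 2, 3, 2, 1, 1, 2, 2)
v <- c(3, 3, 2, 4, 2, 4, 2, 2, 1, 1)
lab <- factor(interaction(u, v), ordered = TRUE)

# Level order comes from interaction() (first factor varies fastest);
# factor() keeps that order and only drops unused levels, and
# ordered = TRUE marks the factor as ordinal without re-sorting anything
levels(lab)
# [1] "2.1" "1.2" "2.2" "3.2" "1.3" "2.3" "2.4"

# Order of first appearance, which is what the question wants
as.character(unique(lab))
# [1] "1.3" "2.3" "2.2" "2.4" "3.2" "1.2" "2.1"

# rank() on the level strings ranks them alphabetically: a third order,
# unrelated to either of the above
rank(levels(lab))
# [1] 3 1 4 7 2 5 6
```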
Answer 1:
Duh! The solution is to call interaction(..., drop=TRUE). I still don't fully understand why leaving out drop=TRUE breaks things, though.
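One way to see why the unused levels matter, sketched on the same ten rows typed in literally: dropping levels keeps the surviving levels in their relative order, so rank(unique(...)) comes out the same with or without drop=TRUE. Those ranks are positions 1 to 7, though, and the lookup levels(...)[rank(unique(...))] used below only picks the right labels when levels() has exactly those 7 entries.

```r
# Same ten (u, v) rows as in the question, typed in literally
u <- c(1, 2, 2, 2, 3, 2, 1, 1, 2, 2)
v <- c(3, 3, 2, 4, 2, 4, 2, 2, 1, 1)

lab12 <- interaction(u, v)               # all 3 x 4 = 12 combinations as levels
lab7  <- interaction(u, v, drop = TRUE)  # only the 7 combinations observed
c(nlevels(lab12), nlevels(lab7))
# [1] 12  7

# Dropping unused levels keeps the survivors in the same relative order,
# so the ranks of the distinct observed values are identical either way
r <- rank(unique(lab12))
all(r == rank(unique(lab7)))
# [1] TRUE

# The ranks are positions 1..7 into a list of 7 labels. With 12 levels the
# same positions select different labels, so some rows find no match (NA)
match(lab12, levels(lab12)[r])
# [1]  4 NA  1 NA  2 NA  5  5  6  6
match(lab7, levels(lab7)[r])
# [1] 1 2 3 4 5 4 6 6 7 7
```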
# The original factor from interaction() had unused levels...
str(df$label)
# Factor w/ 12 levels "1.1","1.2","1.3",..: 3 7 6 8 10 8 2 2 5 5
# SOLUTION
df$label <- interaction(df$u,df$v, drop=TRUE)
str(df$label)
# Factor w/ 7 levels "2.1","1.2","2.2",..: 5 6 3 7 4 7 2 2 1 1
rank(unique(df$label))
# [1] 5 6 3 7 4 2 1
We use that rank to reorder the levels into the order in which they were observed, then match our vector against them:
# And now we get the desired result
match(df$label, levels(df$label)[ rank(unique(df$label)) ] )
# [1] 1 2 3 4 5 4 6 6 7 7
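When every level is observed, rank(unique(x)) just reproduces the codes of the distinct values in order of appearance, so levels(x)[rank(unique(x))] equals as.character(unique(x)). That suggests a shorter alternative, sketched below under the assumption of the same ten rows (a swapped-in idiom, not the code from the answer): match the labels against unique() directly. Because match() converts a factor to its labels before comparing, this version should not need drop=TRUE, and the rows of df are never reordered.

```r
# Same ten (u, v) rows as in the question, typed in literally
df <- data.frame(u = c(1, 2, 2, 2, 3, 2, 1, 1, 2, 2),
                 v = c(3, 3, 2, 4, 2, 4, 2, 2, 1, 1))

# Number each (u, v) combination by the row where it first appears.
# match() compares the factor by its labels, so the unused levels left by
# interaction() without drop = TRUE do no harm; row order is untouched.
df$label <- interaction(df$u, df$v)
df$id <- match(df$label, unique(df$label))
df$id
# [1] 1 2 3 4 5 4 6 6 7 7
```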
Source: https://stackoverflow.com/questions/23028406/how-to-reorder-arbitrary-integer-vector-to-be-in-increasing-order